If you grew up in the 90s with some kind of interest in computer security, there's a decent chance you only ever attempted to learn assembly for one reason: writing shellcode.
I didn't grow up in the golden era when most programmers wrote assembly, and I took only one computer systems subject at university, which had us learning assembly for the Motorola 68000 microprocessor. These days, given the ridiculous amounts of memory machines have, the lowest-level language I'll ever have to use is C, and then only in a few extreme circumstances, as an extension to my main programming language (Perl). Unless I end up working with embedded systems, or get into something like malware analysis or reverse engineering as a career, dealing directly with assembly is probably not going to be part of my job, so it has to be pursued on the side.
Slowly over the past year or so, I've been spending a couple of hours here and there playing with some very small programs, solving reverse engineering problems for wargames and writing my own shellcode from scratch rather than stealing it from shell-storm all the time.
It's been long and challenging, and I wanted to share a few rewarding experiences.
Writing shellcode that doesn't just spawn a shell
Since shellcode was what originally exposed me to assembly all those years ago, why not further that skill?
execve(/bin/sh) shellcode is what's used in pretty much every tutorial on buffer overflows and shellcode ever (I mean, I assume that's where the name shell code came from, right?). But there are other useful kinds of shellcode to write.
An easily modifiable and totally unoriginal shellcode that I wrote to read a file and write it to stdout has come in super handy when playing online wargames, since many of these games require reading a file - that you usually know, or can guess the name of, ahead of time - to get the password to the next level. I also had to write a less easily modified version of the same read/write shellcode without using the jmp-call-pop technique.
I like writing shellcode because I'm writing a small program from scratch that does one very specific thing and isn't overly complicated. It requires reading a little documentation to figure out which syscalls I need and how they are called, and it's often written quickly enough that the pay-off is immediate. There's also a practical purpose to the program, as opposed to writing one that uppercases whatever's in argv[0].
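For a rough idea of what that read/write shellcode has to do, here's a C-level sketch of the syscall sequence. This is not the shellcode itself: dump_file is a hypothetical name, the 256-byte buffer is an arbitrary choice, and it assumes Linux. It also uses openat rather than open so it builds on architectures that lack the plain open syscall; 32-bit x86 shellcode would more likely call open directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: copy the file at `path` to `out_fd` using raw
 * syscalls, the same open -> read -> write sequence the shellcode makes.
 * Returns the number of bytes written, or -1 on any error. */
long dump_file(const char *path, int out_fd)
{
    char buf[256];              /* arbitrary size, an assumption */
    long total = 0;

    assert(path != NULL);

    /* openat(AT_FDCWD, ...) behaves like open(path, O_RDONLY) */
    long fd = syscall(SYS_openat, AT_FDCWD, path, O_RDONLY);
    if (fd < 0)
        return -1;

    for (;;) {
        long n = syscall(SYS_read, (int)fd, buf, sizeof buf);
        if (n < 0) {            /* read error */
            total = -1;
            break;
        }
        if (n == 0)             /* end of file */
            break;
        if (syscall(SYS_write, out_fd, buf, (size_t)n) != n) {
            total = -1;         /* treat a short or failed write as an error */
            break;
        }
        total += n;
    }

    syscall(SYS_close, (int)fd);
    return total;
}
```

Calling dump_file("some_password_file", 1) would print the file to stdout, where the file name is whatever the level expects. The assembly version is the same handful of syscalls with the arguments loaded into registers by hand.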
A lot of reverse engineering is patterns
While reversing a program for a more difficult level in Blackbox, I came across some code that was indexing into an array of chars a lot, and I picked up a recurring pattern:
0x804859a: lea 0xfffffbe9(%ebp),%eax
0x80485a0: mov %eax,%ecx
0x80485a2: add 0xfffffbe4(%ebp),%ecx
At offset 0xfffffbe8(%ebp), I knew there was a char array stored on the stack, let's call it buf, so 0xfffffbe9(%ebp) is buf + 1. At offset 0xfffffbe4(%ebp) was an int that was being used to index into the array, so because we add that value, what we essentially end up with in ecx is the address of buf[index + 1].
0x80485a8: lea 0xfffffbe9(%ebp),%eax
0x80485ae: mov %eax,%edx
0x80485b0: add 0xfffffbe4(%ebp),%edx
This is the same thing, just placed into the edx register.
0x80485b6: lea 0xfffffbe8(%ebp),%eax
0x80485bc: add 0xfffffbe4(%ebp),%eax
This is almost the same thing, except it's the address of buf[index] that's placed into the eax register, and the compiler used one fewer instruction to do it.
0x80485c2: movzbl (%eax),%eax
0x80485c5: xor (%edx),%al
0x80485c7: mov %al,(%ecx)
So eax holds the address of buf[index], while ecx and edx both hold the address of buf[index + 1]. The first instruction loads the byte at buf[index] into eax, the second xors it with the byte at buf[index + 1], and the third stores the result back into buf[index + 1]. So all of this translates to the following bit of C code:
buf[index + 1] = buf[index] ^ buf[index + 1];
There were about 30-40 instructions that all did similar things all in a row - which is intimidating until you break it down into chunks of 2 or 3 instructions - and recognising the pattern and relating it to a higher-level concept made reversing the rest of the program much easier.
The more of these patterns you find, the more you understand how compilers break down higher-level ideas.
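To make the mapping concrete, here's a small sketch that spells out the same statement one step per instruction, with variables named after the registers. xor_step is a hypothetical helper, not something from the Blackbox binary, and I'm assuming unsigned char for the array since the load is a movzbl.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: performs buf[index + 1] = buf[index] ^ buf[index + 1]
 * one step at a time, mirroring the disassembly. */
void xor_step(unsigned char *buf, size_t index)
{
    assert(buf != NULL);

    unsigned char *ecx = buf + 1 + index;   /* lea buf+1, add index: &buf[index + 1], the destination */
    unsigned char *edx = buf + 1 + index;   /* same address again, used as the xor operand */
    unsigned char *eax = buf + index;       /* lea buf, add index: &buf[index] */

    unsigned char al = *eax;                /* movzbl (%eax),%eax */
    al ^= *edx;                             /* xor (%edx),%al */
    *ecx = al;                              /* mov %al,(%ecx) */
}
```

With buf = {0x0f, 0xf0}, calling xor_step(buf, 0) leaves 0xff in buf[1]. Once you can see the three address computations as one group, the load, xor and store that follow read as a single line of C.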
"Indexing into an array... how exciting..."
Finding gold along the way
The first real assembly nugget that I came across earlier this year was while playing Blackbox. While checking the disassembly of a program in gdb, I came across this sequence of instructions:
0x08048519: lea 0xfffffc10(%ebp),%eax
0x0804851f: mov $0xffffffff,%ecx
0x08048524: mov %eax,0xfffffc00(%ebp)
0x0804852a: mov $0x0,%al
0x0804852c: cld
0x0804852d: mov 0xfffffc00(%ebp),%edi
0x08048533: repnz scas %es:(%edi),%al
0x08048535: mov %ecx,%eax
0x08048537: not %eax
0x08048539: dec %eax
My reaction: what the fuck was that? What the hell is repnz? And scas? Is that two opcodes in one? What's that funky addressing syntax? And cld... an opcode with no operands? What is this?!
I won't go through and explain each instruction, since - as I discovered soon after figuring out what this little nugget was doing - there are already plenty of sites that have explained it before, but it was pretty cool to step through the code while reading reference documentation to figure out what those unfamiliar instructions were doing.
The short of it is, as explained by this StackOverflow answer, it's an implementation of strlen, and sometimes gcc will inline it as a little optimisation.
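If you want to convince yourself that the not/dec at the end really does produce a length, here's a rough C emulation of the sequence with variables named after the registers. scas_strlen is a hypothetical name and this is only a sketch of the arithmetic, not what gcc emits: ecx starts at 0xffffffff and drops by one for every byte compared, including the terminating NUL, so inverting it gives the length plus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulates the inlined strlen sequence in C. */
uint32_t scas_strlen(const char *s)
{
    assert(s != NULL);

    const unsigned char *edi = (const unsigned char *)s;  /* mov ...,%edi */
    uint32_t ecx = 0xffffffffu;                           /* mov $0xffffffff,%ecx */
    unsigned char al = 0;                                 /* mov $0x0,%al */
    int zf = 0;                                           /* zero flag */

    /* cld makes the scan move forward. repnz scas: compare al with the
     * byte at edi, advance edi, decrement ecx, and repeat until the
     * bytes match (zf set) or ecx reaches zero. */
    while (ecx != 0 && !zf) {
        zf = (*edi == al);
        edi++;
        ecx--;
    }

    uint32_t eax = ecx;   /* mov %ecx,%eax */
    eax = ~eax;           /* not %eax: number of bytes scanned, NUL included */
    eax--;                /* dec %eax: drop the NUL from the count */
    return eax;
}
```

For "abc" the loop compares four bytes, so ecx ends at 0xfffffffb; inverting that gives 4, and the decrement gives 3.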
Moving forward
I'll keep reverse engineering programs and writing shellcode for wargames. It's just a lot of fun once you get into the swing of it.
Lately something else that's become a lot of fun is learning how some low-level parts of systems work. An article about building a Tetris clone in x86 assembly does a great job at introducing some parts of systems that are much lower-level than I'd usually deal with (VRAM) and some more advanced topics (bootloaders).
2015-11-04 edit: I rewrote a few paragraphs, because I hated how they read.