Assembly: the old set – Prose Poetry Code

The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.

But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. Like with the 6502,, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.

The x86 instruction set

Even stripped down to this essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.

Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:

The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
Square brackets indicate indirection. Thus, mov ax, bx moves the contents of BX into AX, while mov ax, [bx] moves the value in the memory location pointed to by BX.
NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.

Flags

The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:

overflow:
    mov al, 127
    add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 ~~ -127)
    add al, 2
; now overflow is clear, because -127 + 2 = -125

The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put whole register onto the stack, or POPF to load the flags from there; these instructions weren’t on the original 8086, however.

MOV and addressing

The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.

The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)

Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:

Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov ax, [1234] or mov dh, [data].
Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)

So MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead; it works the same way. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)

Arithmetic and logic

After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There’s also a two’s-complement negation (NEG) and simple increment/decrement (INC, DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV above.

We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR or SAR moves to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)

Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.

A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.

Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL or AX, remainder in AH or DX. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV. These are for signed integers, and they let you use an immediate value instead: imul ax, -42.

Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.

Flow control

We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:

C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
Z and NZ, “zero” and “not zero”, based on the zero flag.
O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
B and NB, “below” and “not below”, synonyms for C and NC.
A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
AE, BE, NAE, NBE; the same as the last two pairs, but add “or equal”.
L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
LE and NLE, “or equal” versions of the above.
G, GE, NG, NGE, “greater than”, etc., for the opposites of the previous four.
CXZ and NCXZ, “if CX is/is not zero”, usually used for loops.

These can be confusing, so here are a couple of examples:

mov ax, [value1]
mov dx, [value2]

a_loop:
add ax, dx

; jump if ax > 127,
; otherwise try again
jo end
jmp a_loop

end:
; do something else

mov ax, [bp+4]
cmp ax, 0
; if ax == 0...
jz iszero

; else if ax > 0...
jg ispos

; else if ax < 0...
jl isneg

; or if something went wrong
jmp error

CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.

A special LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, stopping when it reaches zero. You might use it like this:

; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di  ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space

nextchar:
mov [es:di],dl
add di,2    ; skip video attribute
loop nextchar

Two variant instructions LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.

The stack

All x86 programs have use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. But even better are PUSH and POP, which take care of everything for you. They can be used on any register—except that you can’t pop into CS—and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.

The stack grows “downward”. When a value is pushed onto it, that value is moved into the position pointed to by SP, and SP is decremented by 2. Popping does the opposite. Effectively, the instructions work like this:

do_push:
    mov [sp], value
    sub sp, 2

do_pop:
    mov value, [sp]
    add sp, 2

PUSHA and POPA are shortcuts for pushing all of the main 8 registers, helpful when you need to save state before starting a big subroutine.

Strings

The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words using simple instructions. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.

In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments both SI and DI by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.

LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx moves AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP doesn’t really make sense with these, but they can work in hand-rolled loops.

MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:

; assumes SI and DI point to appropriate memory areas
; and CX holds a count of bytes to move
memcpy:
rep movsb
ret

CMPSx compares bytes or words, setting the flags accordingly. It could be used to implement a string comparison function like so:

; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
strncmp:
xor al, al
repe cmpsb
jg greater
dec al  ; sets to FFh, or -1
jmp exit

greater:
inc al  ; sets to 01h

ret

Finally, SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:

; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found

contains:
mov dx, cx
repne scasb
jncxz found

; character not found, since we ran out of string
mov ax, 0ffffh
jmp end

found:
; CX now holds the number of characters from string end,
; but we saved the original length in DX
; thus, the position is DX - CX - 1
inc cx
sub dx, cx
mov ax, dx

end:
ret

Input and output

Input and output send bytes or words between registers and the I/O space. This is a special set of 64K (65,536) memory locations, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0xff, you can instead use an immediate version: in al, 40h.

The 286 added in string I/O with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [DX:SI or [ES:DI] is the data pointer, as above.

Enough for now

And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.