The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.
But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. As with the 6502, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.
The x86 instruction set
Even stripped down to this essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.
Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:
- The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
- Square brackets indicate indirection. Thus, mov ax, bx moves the contents of BX into AX, while mov ax, [bx] moves the value in the memory location pointed to by BX.
- NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.
The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:
mov al, 127
add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 wraps around to -127)
add al, 2
; now overflow is clear, because -127 + 2 = -125
The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put the whole flags register onto the stack, or POPF to load the flags from there.
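As a rough sketch of how the flag instructions fit together (my own illustrative fragment, assuming a 16-bit NASM program with a working stack), here is one way to save the flags, change a few, and put everything back:

```nasm
; illustrative fragment, not a complete program
pushf           ; save the current flags on the stack
cli             ; interrupt flag = 0: maskable interrupts are held off
cld             ; direction flag = 0: string instructions count upward
stc             ; carry flag = 1
cmc             ; complement the carry flag: now 0
popf            ; restore every flag to its saved state
```

Wrapping a section in PUSHF and POPF like this means the surrounding code never notices which flags were touched in between.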
MOV and addressing
The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.
The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions, can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)
Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:
- Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov dh, [data].
- Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
- Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
- Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)
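To make the modes above concrete, here is a small sketch that reads the same four-word array each way. The label table is hypothetical, and I’m assuming DS already points at the segment that holds it (as it would in a .COM program).

```nasm
; illustrative fragment; "table" is a made-up label
        mov ax, [table]         ; direct: AX = 10
        mov bx, table           ; no brackets: BX = the address of table
        mov ax, [bx]            ; register indirect: AX = 10
        mov ax, [bx+2]          ; indexed: AX = 20
        mov si, 4
        mov ax, [bx+si]         ; base indexed: AX = 30
        mov ax, [bx+si+2]       ; base indexed with displacement: AX = 40

; data, kept somewhere execution won't run into it
table:  dw 10, 20, 30, 40
```

Every displacement is counted in bytes, which is why consecutive words sit 2 apart.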
MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead; it swaps its two operands and follows the same addressing rules, except that neither one can be an immediate value. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)
Arithmetic and logic
After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or the subtract-with-borrow SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There’s also a two’s-complement negation (NEG) and simple increment/decrement (INC and DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV.
We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR and SAR move to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)
Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, and RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.
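The differences are easiest to see on one bit pattern. This is just a sketch of my own, reloading AL with the same value before each operation:

```nasm
; illustrative fragment
        mov al, 10010110b
        shl al, 1       ; AL = 00101100b, carry = 1 (the old top bit)
        mov al, 10010110b
        shr al, 1       ; AL = 01001011b, carry = 0
        mov al, 10010110b
        sar al, 1       ; AL = 11001011b, carry = 0 (sign bit repeated)
        mov al, 10010110b
        rol al, 1       ; AL = 00101101b: the top bit wraps around to bit 0
        stc             ; carry = 1 (MOV leaves the flags alone)
        mov al, 10010110b
        rcr al, 1       ; AL = 11001011b: the old carry enters at the top
        mov cl, 4
        mov al, 10010110b
        ror al, cl      ; AL = 01101001b: the two nibbles trade places
```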
A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.
Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL and remainder in AH for the byte form, quotient in AX and remainder in DX for the word form. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX, respectively. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV. These are for signed integers, and IMUL (though not IDIV) lets you use an immediate value instead: imul ax, -42.
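Since the implied registers are the confusing part, here is a short sketch of the word-sized forms. The numbers are arbitrary ones I picked for illustration:

```nasm
; illustrative fragment
; unsigned: 1000 / 7
        mov ax, 1000
        xor dx, dx      ; the dividend is DX:AX, so clear the high word
        mov bx, 7
        div bx          ; AX = 142 (quotient), DX = 6 (remainder)

; signed: -1000 / 7
        mov ax, -1000
        cwd             ; sign-extend AX into DX, so DX:AX = -1000
        mov bx, 7
        idiv bx         ; AX = -142, DX = -6 (remainder keeps the dividend's sign)

; unsigned: 300 * 300
        mov ax, 300
        mov bx, 300
        mul bx          ; DX:AX = 90000, so DX = 1 and AX = 5F90h
```

Forgetting to set up DX before a word-sized DIV or IDIV is a classic bug: if the quotient won’t fit in AX, the processor raises a divide error.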
Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.
We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:
- C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
- Z and NZ, “zero” and “not zero”, based on the zero flag. (E and NE, “equal” and “not equal”, are synonyms.)
- O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
- S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
- B and NB, “below” and “not below”, synonyms for C and NC.
- A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
- AE, NAE, BE, and NBE; the same as the last two pairs, but add “or equal”.
- L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
- LE and NLE, “or equal” versions of the above.
- G, NG, GE, and NGE, “greater than”, etc., for the opposites of the previous four.
- CXZ, “CX is zero”, usually used for loops. Unlike the others, it tests a register rather than a flag, and it has no negated form.
These can be confusing, so here are a couple of examples:
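; again, bigger, is_zero, and the rest are placeholder labels
again:
mov ax, [value1]
mov dx, [value2]
add ax, dx
cmp ax, 127
; jump if ax > 127,
jg bigger
; otherwise try again
jmp again
bigger:
; do something else

mov ax, [bp+4]
cmp ax, 0
; if ax == 0...
je is_zero
; else if ax > 0...
jg is_positive
; else if ax < 0...
jl is_negative
; or if something went wrong
jmp error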
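The “above/below” and “greater/less” families look alike, but they read the same bits differently: the first treats values as unsigned, the second as signed. A small sketch of my own (all labels are made up) shows how one comparison can go both ways:

```nasm
; illustrative fragment
        mov ax, 0ffffh          ; 65535 if unsigned, -1 if signed
        cmp ax, 1
        jg  signed_greater      ; not taken: as a signed value, -1 < 1
        ja  unsigned_above      ; taken: as an unsigned value, 65535 > 1
signed_greater:
        nop                     ; skipped in this example
unsigned_above:
        test al, 1              ; AND AL with 1, keeping only the flags
        jnz is_odd              ; taken: bit 0 of AL (0FFh) is set
        nop                     ; skipped in this example
is_odd:
        nop
```

Picking the wrong family is an easy mistake, and it only shows up once a value crosses 7FFFh.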
CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.
The LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, stopping when it reaches zero. You might use it like this:
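; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space
next_cell:
mov [es:di], dl ; write a space into the character byte
add di, 2 ; skip video attribute
loop next_cell ; decrement CX, repeat until it reaches 0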
Two variant instructions, LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.
All x86 programs have use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. But even better are PUSH and POP, which take care of everything for you. They can be used on any register (except that you can’t pop into CS) and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.
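The stack grows “downward”. When a value is pushed onto it, SP is decremented by 2, and the value is then moved into the position that SP now points to. Popping does the opposite: it reads the value at SP, then adds 2. Effectively, the instructions work like this (treat it as pseudocode, since [sp] isn’t a valid 16-bit address):
; push value
sub sp, 2
mov [sp], value
; pop value
mov value, [sp]
add sp, 2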
PUSHA and POPA are shortcuts for pushing and popping all of the main 8 registers, helpful when you need to save state before starting a big subroutine.
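Putting CALL, RET, and the BP trick together, a subroutine that takes its argument on the stack might look something like this. It’s only a sketch: the labels double_it and finished are my own, and I’m assuming a near call, so the return address is a single word.

```nasm
; illustrative fragment
        mov ax, 21
        push ax                 ; the argument
        call double_it          ; pushes the return address, then jumps
        add sp, 2               ; the caller throws the argument away
        jmp finished            ; AX = 42 at this point

double_it:
        push bp                 ; save the caller's BP
        mov bp, sp              ; BP now points into our part of the stack
        mov ax, [bp+4]          ; [bp] = old BP, [bp+2] = return address,
                                ; [bp+4] = the argument
        add ax, ax
        pop bp                  ; restore the caller's BP
        ret                     ; pops the return address and jumps back

finished:
        nop
```

Because BP defaults to the stack segment, [bp+4] reads from the stack without any segment override.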
The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words using simple instructions. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.
In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments whichever of SI and DI it uses by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.
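LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx puts AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP doesn’t really make sense with LODSx, since every pass just overwrites the accumulator, but REP STOSx is a handy way to fill a block of memory, and both work well in hand-rolled loops.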
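As an example of a hand-rolled loop, here is a sketch that copies a run of text while converting lowercase letters to uppercase. The labels are my own, and the setup is assumed rather than shown.

```nasm
; assumes DS:SI points to the source text,
; ES:DI points to a buffer at least as large,
; and CX holds the number of bytes (more than 0)
        cld                     ; work upward through memory
next_char:
        lodsb                   ; AL = [DS:SI], then SI = SI + 1
        cmp al, 'a'
        jb  store               ; below 'a': leave it alone
        cmp al, 'z'
        ja  store               ; above 'z': leave it alone
        sub al, 20h             ; 'a'..'z' becomes 'A'..'Z'
store:
        stosb                   ; [ES:DI] = AL, then DI = DI + 1
        loop next_char          ; decrement CX, repeat until it reaches 0
```

The loop body never touches SI or DI itself; the string instructions do all the pointer bookkeeping.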
MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:
; assumes SI and DI point to appropriate memory areas
; and CX holds a count of bytes to move
cld ; copy upward through memory
rep movsb
CMPSx compares bytes or words, setting the flags accordingly. It could be used to implement a string comparison function like so:
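; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
; (cmp_greater and cmp_done are placeholder labels)
xor al, al
cld
repe cmpsb ; stop at the first mismatch, or when CX hits 0
je cmp_done ; everything matched
ja cmp_greater ; source byte is above destination byte
dec al ; sets to FFh, or -1
jmp cmp_done
cmp_greater:
inc al ; sets to 01h
cmp_done: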
SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:
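; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found
; (not_found, found, and scan_done are placeholder labels)
mov dx, cx
jcxz not_found ; an empty string can't contain anything
cld
repne scasb ; stop at the first match, or when CX hits 0
je found
not_found:
; character not found, since we ran out of string
mov ax, 0ffffh
jmp scan_done
found:
; CX now holds the number of characters from string end,
; but we saved the original length in DX
; thus, the position is DX - CX - 1
sub dx, cx
dec dx
mov ax, dx
scan_done: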
Input and output
Input and output send bytes or words between registers and the I/O space. This is a special set of 64K (65,536) locations, separate from main memory, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0ffh, you can instead use an immediate version: in al, 40h.
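Here is a quick sketch of both forms. The port numbers are the usual ones on a standard PC (3F8h for the first serial port’s data register, 60h for the keyboard controller’s data port), but treat them as assumptions about the hardware rather than anything the instructions themselves care about.

```nasm
; illustrative fragment
        mov dx, 3f8h            ; ports above 0FFh have to go through DX
        mov al, 'A'
        out dx, al              ; send one byte to that port
        in  al, 60h             ; low ports can be named directly
```

Whether anything useful happens depends entirely on the device at the other end; the serial port, for instance, has to be set up before it will transmit.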
The 286 added in string I/O with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [ES:DI] (for INSx) or [DS:SI] (for OUTSx) is the data pointer, as above.
Enough for now
And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.