The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.
But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. As with the 6502, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.
The x86 instruction set
Even stripped down to this essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.
Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:
- The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
- Square brackets indicate indirection. Thus, mov ax, bx moves the contents of BX into AX, while mov ax, [bx] moves the value in the memory location pointed to by BX.
- NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.
Flags
The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:
overflow:
mov al, 127
add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 wraps around to -127)
add al, 2
; now overflow is clear, because -127 + 2 = -125
The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put the whole flags register onto the stack, or POPF to load the flags from there.
MOV and addressing
The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.
The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions, can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)
Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:
- Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov ax, [1234] or mov dh, [data].
- Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
- Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
- Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)
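To see the four modes side by side, here’s a minimal sketch that reads from one small table in four different ways. The label table and its contents are invented for the example, and I’m assuming DS already points at the segment that holds it.

```nasm
    mov bx, table          ; BX = offset of the table (no brackets, so no memory access)
    mov si, 2              ; SI = an index into the table

    mov al, [table]        ; direct:            AL = 10
    mov al, [bx]           ; register indirect: AL = 10
    mov al, [bx+1]         ; indexed:           AL = 20
    mov al, [bx+si+1]      ; base indexed:      AL = 40 (table + 2 + 1)

table:                     ; hypothetical data; keep it where it won't be executed
    db 10, 20, 30, 40
```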
So MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead; it swaps its two operands, and it follows the same addressing rules, except that neither operand can be an immediate value. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)
Arithmetic and logic
After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There are also a two’s-complement negation (NEG) and simple increment/decrement (INC, DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV above.
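The “with carry” variants exist so you can chain operations on values wider than a register. As a rough sketch, assuming one 32-bit number is held in DX:AX and another in CX:BX (my choice of registers, nothing special about them):

```nasm
    ; DX:AX = DX:AX + CX:BX
    add ax, bx      ; add the low words; any carry out lands in the carry flag
    adc dx, cx      ; add the high words, plus that carry

    ; DX:AX = DX:AX - CX:BX
    sub ax, bx      ; subtract the low words; a borrow sets the carry flag
    sbb dx, cx      ; subtract the high words, minus that borrow
```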
We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR or SAR moves to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)
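The difference between the two right shifts only shows up when the top bit is set, so here’s a short sketch with a value chosen for exactly that reason:

```nasm
    mov al, 0F0h    ; 1111 0000b: 240 if unsigned, -16 if signed
    mov bl, al
    shr al, 1       ; AL = 78h (0111 1000b) = 120, right for unsigned 240 / 2
    sar bl, 1       ; BL = F8h (1111 1000b) = -8, right for signed -16 / 2

    mov dx, 5
    mov cl, 3
    shl dx, cl      ; DX = 40: three left shifts multiply by 8
```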
Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.
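One place the rotate-with-carry pair earns its keep is in shifting values that don’t fit in one register. A minimal sketch, again assuming a 32-bit value split across DX:AX:

```nasm
    ; shift the 32-bit value in DX:AX left by one bit
    shl ax, 1       ; the top bit of AX falls into the carry flag
    rcl dx, 1       ; the carry flag enters the bottom of DX

    ; shift the 32-bit value in DX:AX right by one bit (unsigned)
    shr dx, 1       ; the bottom bit of DX falls into the carry flag
    rcr ax, 1       ; the carry flag enters the top of AX
```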
A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.
Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL or AX, remainder in AH or DX. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV. These are for signed integers, and IMUL (though not IDIV) also lets you use an immediate value: imul ax, -42.
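Because the implied registers are easy to get wrong, here’s a quick sketch of each case with small numbers you can check by hand. The values are arbitrary; the important part is setting up the high half (DX or AH) before dividing.

```nasm
    ; unsigned word division: 1000 / 7
    mov ax, 1000
    xor dx, dx      ; clear the high word, so DX:AX = 1000
    mov bx, 7
    div bx          ; AX = 142 (quotient), DX = 6 (remainder)

    ; signed byte division: -100 / 7
    mov al, -100
    cbw             ; sign-extend AL into AH, so AX = -100
    mov bl, 7
    idiv bl         ; AL = -14 (quotient), AH = -2 (remainder)

    ; unsigned word multiplication: 300 * 400 doesn't fit in 16 bits
    mov ax, 300
    mov bx, 400
    mul bx          ; DX:AX = 120000, that is, DX = 0001h and AX = 0D4C0h
```

Forgetting the XOR or the CBW is a classic mistake: whatever junk was left in DX or AH becomes part of the dividend.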
Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.
Flow control
We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:
- C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
- Z and NZ, “zero” and “not zero”, based on the zero flag.
- O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
- S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
- B and NB, “below” and “not below”, synonyms for C and NC.
- A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
- AE, BE, NAE, NBE; the same as the last two pairs, but add “or equal”.
- L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
- LE and NLE, “or equal” versions of the above.
- G, GE, NG, NGE, “greater than”, etc., for the opposites of the previous four.
- CXZ, “if CX is zero”, usually used for loops. This one tests the register itself rather than a flag, and it has no negated form.
These can be confusing, so here are a couple of examples:
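; first example: keep adding until the sum overflows
    mov ax, [value1]
    mov dx, [value2]
a_loop:
    add ax, dx
    ; jump once the signed 16-bit sum overflows
    ; (goes past 32767), otherwise try again
    jo end
    jmp a_loop
end:
    ; do something else

; second example: three-way branch on the sign of a value
    mov ax, [bp+4]
    cmp ax, 0
    ; if ax == 0...
    jz iszero
    ; else if ax > 0...
    jg ispos
    ; else if ax < 0...
    jl isneg
    ; or if something went wrong
    jmp error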
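The “above/below” and “greater/less” families trip up a lot of people, because they read the same flags differently: the first pair treats the operands as unsigned, the second as signed. Here’s a small sketch where the two disagree. The labels are placeholders I’ve made up.

```nasm
    mov al, 0FFh        ; 255 if read as unsigned, -1 if read as signed
    cmp al, 1           ; flags afterward: CF=0, ZF=0, SF=1, OF=0
    jg  signed_bigger   ; not taken: SF and OF differ, so -1 is not greater than 1
    ja  unsigned_bigger ; taken: CF and ZF are both clear, so 255 is above 1

signed_bigger:
    ; (not reached in this sketch)
unsigned_bigger:
    ; execution continues here
```

Jumps don’t change the flags, so it’s fine to test the result of one CMP several times in a row.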
CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.
A special LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, stopping when it reaches zero. You might use it like this:
; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space
nextchar:
mov [es:di],dl
add di,2 ; skip video attribute
loop nextchar
Two variant instructions, LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.
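As a sketch of where the variants come in handy, this loop looks for the first zero byte in a block, giving up after 16 bytes. The label buffer is hypothetical, and I’m assuming DS points at its segment.

```nasm
    mov si, buffer - 1  ; hypothetical 16-byte buffer; start one byte before it
    mov cx, 16          ; look at 16 bytes at most
scan:
    inc si              ; step to the next byte
    cmp byte [si], 0    ; sets the zero flag if this byte is zero
    loopnz scan         ; keep going while CX isn't 0 and the byte wasn't zero
    jz found_zero       ; LOOPNZ leaves the flags alone, so this tests the last CMP
    ; not found: all 16 bytes were nonzero
    jmp done
found_zero:
    ; SI points at the zero byte
done:
```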
The stack
All x86 programs have the use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. But even better are PUSH and POP, which take care of everything for you. They can be used on any register (except that you can’t pop into CS) and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.
The stack grows “downward”. When a value is pushed onto it, SP is first decremented by 2, and the value is then moved into the position SP now points to. Popping does the opposite: it reads the value at SP, then adds 2 to SP. Effectively, the instructions work like this:
; pseudocode only: [sp] isn't a real 16-bit addressing mode
do_push:
    sub sp, 2
    mov [sp], value
do_pop:
    mov value, [sp]
    add sp, 2
PUSHA and POPA are shortcuts for pushing and popping all of the main 8 registers, helpful when you need to save state before starting a big subroutine.
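The BP trick mentioned above is easiest to see in a subroutine that takes its arguments on the stack. This is only a sketch of one possible convention (caller pushes, caller cleans up), and add_two is a name I’ve invented for it.

```nasm
    push word 5         ; first argument (an immediate PUSH needs a 186 or later)
    push word 3         ; second argument
    call add_two
    add sp, 4           ; the caller throws away the two arguments
    ; AX = 8 here; the rest of the program must not fall through into add_two

add_two:
    push bp             ; save the caller's BP
    mov bp, sp          ; BP now marks our place on the stack
    mov ax, [bp+4]      ; second argument (3); [bp] is the old BP,
                        ; and [bp+2] is the return address
    add ax, [bp+6]      ; first argument (5)
    pop bp              ; restore the caller's BP
    ret
```

Since BP defaults to the stack segment, those [bp+n] references need no segment override.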
Strings
The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words using simple instructions. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.
In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments SI, DI, or both (whichever ones it uses) by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.
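LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx moves AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP doesn’t really make sense with LODSx, since each pass would just overwrite the last value loaded, but it works well in hand-rolled loops. REP with STOSx, on the other hand, is a handy way to fill a block of memory with a single value.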
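One case where a repeat prefix does earn its keep is STOSx used as a fill. As a sketch, here’s the earlier screen-clearing loop redone that way, assuming the same 80x25 color text screen at segment B800h. Note one difference from the LOOP version: this writes the attribute byte as well as the character.

```nasm
    mov ax, 0b800h
    mov es, ax          ; ES = text-mode video segment
    xor di, di          ; start at the top-left corner
    mov ax, 0720h       ; AL = 20h (space), AH = 07h (light gray on black)
    mov cx, 80 * 25     ; one word per character cell
    cld                 ; make sure DI counts upward
    rep stosw           ; store AX at ES:DI, CX times over
```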
MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:
; assumes DS:SI and ES:DI point to appropriate memory areas
; and CX holds a count of bytes to move
memcpy:
    cld         ; copy upward, so SI and DI increase
    rep movsb
    ret
CMPSx compares the byte or word at [DS:SI] with the one at [ES:DI], setting the flags accordingly. It could be used to implement a string comparison function like so:
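; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
strncmp:
    xor al, al  ; AL = 0; also sets the zero flag, in case CX is 0
    repe cmpsb  ; compare while the bytes match and CX > 0
    jz exit     ; everything matched: return 0
    ja greater  ; unsigned test, so bytes above 7Fh sort high
    dec al      ; sets to FFh, or -1
    jmp exit
greater:
    inc al      ; sets to 01h
exit:
    ret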
Finally, SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:
; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found
contains:
    mov dx, cx
    jcxz notfound   ; empty string: nothing to scan
    repne scasb
    jz found        ; zero flag set means the last byte compared matched
notfound:
    ; character not found, since we ran out of string
    mov ax, 0ffffh
    jmp end
found:
    ; CX now holds the number of characters from string end,
    ; but we saved the original length in DX
    ; thus, the position is DX - CX - 1
    inc cx
    sub dx, cx
    mov ax, dx
end:
    ret
Input and output
Input and output send bytes or words between registers and the I/O space. This is a special set of 64K (65,536) port addresses, separate from main memory, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0xff, you can instead use an immediate version: in al, 40h.
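Here’s a brief sketch of both forms. I’m assuming a standard PC, where 3F8h is conventionally the data register of the first serial port; a real program would check that the port is ready before writing to it.

```nasm
    ; a port number above FFh has to go through DX
    mov dx, 3F8h    ; assumed: COM1 data register on a standard PC
    mov al, 'A'
    out dx, al      ; send one byte to that port

    ; a port number of FFh or below can be written as an immediate
    in al, 40h      ; same port as the example in the text
```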
The 286 added in string I/O with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [DS:SI] or [ES:DI] is the data pointer, as above.
Enough for now
And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.