Assembly: the architecture – Prose Poetry Code

The x86 gets a lot of hate for its architecture. In modern days, it’s really more out of habit than any technical reason, but the 16-bit variant really deserved some of its bad reputation. Other parts, in my opinion, were overstated. Hyperbole then became accepted wisdom, and positive reinforcement only made it grow.

But I’m not here to apologize for the x86. No, this post has a different purpose. It’s an overview of the x86 architecture from the point of view of an assembly programmer circa 1991. Although the 386 had been out for a few years, older processors were still around, and not too much was making full use of the 32-bit extensions that the 386 brought. DOS, particularly, remained 16-bit, though there were extensions to address more memory. For the most part, however, we’ll stick to “real mode”, like it was in the olden days.

The CPU

An x86 processor of this vintage was a lot less complex than today’s Intel i7 or AMD A10, and not just because it didn’t include integrated graphics, megabytes of on-die cache, power management functions, and so on. Later lines have also added lots of assembly-level goodness, like MMX and SSE. They’re 64-bit now, and that required the addition of “long mode”.

But let’s ignore all that and look at the “core” of the x86. There’s not that much to it, really. In the original conception, you have about a dozen programmer-accessible registers, all of which started out 16 bits wide, but exactly one of these is truly general-purpose. The registers can be divided into a few categories, and we’ll take each of them in turn.

General registers

These are the “big four” of the x86: AX, BX, CX, and DX. As I said last time, all four can also be accessed as a pair of byte-sized registers. The high bytes are identified by the first letter of the register name followed by H, while the low bytes use L. So we have AL or BH or whatever. It doesn’t actually increase the number of registers we have, but sometimes the savings from loading only a single byte can add up. Remember, older computers had less memory, so they had to use it more wisely.

Each of the four 16-bit general registers is used in its own special way. AX is the accumulator (like the A register from the 6502), and it’s usually the best for arithmetic; some instructions, like the multiplication instruction MUL, require it. BX is used as a “base” for a few addressing-type instructions. CX is semi-reserved as a loop counter. And DX is sometimes taken as an “extension” of AX, creating a kind of 32-bit register referred to as DX:AX.

Of course, if you’re not using the instructions that work on specific registers, you can do what you like with these. Unlike the 6502, where almost everything involved a memory access, x86 does let you work register-to-register. (On the other hand, it doesn’t have cheap access to the zero page, so there.) You can add AX to BX, for instance, and no one will care.

Pointer registers

The other four main registers all have something to do with pointers and addressing. You can use them as scratch space for arithmetic, but a lot of instructions assume they hold addresses. Unlike the general registers, all these are only 16-bit. (Modern systems do give you special access to the low byte of them, however.)

SP is the big one out of this group: the stack pointer. Stacks are a lot more important on the x86 than the 6502, mainly because that’s where you put your “extra” data that won’t fit in registers. But programmers usually don’t manipulate SP directly. They instead pop and push (note the terminology change from 6502), and those instructions change SP as needed. BP is an extra pointer register mostly used by languages like C to access stack “frames”, but assembly programmers can turn it into a general pointer.

The other two come in a pair: SI and DI. These stand for “source index” and “destination index”, respectively, and the processor uses them for certain load and store operations. Quite a few of the DOS and BIOS APIs expect them to hold pointers to input and output parameters. And on an early x86, they were the best option for indirect addressing, a bit like the 6502’s X and Y registers.

System registers

The instruction pointer, IP, controls the execution of code. It’s not directly accessible by programmers; instead, you change it through branching (jumping, in x86 parlance) and subroutine calls. In other words, you can mostly act like it’s not there.

The register that holds the flags, usually called FLAGS when it needs a name, also can’t directly be read from or written into. You can push it to the stack, however, then manipulate it from there, but the main three flags (carry, direction, and interrupt) have special instructions to set and clear them, similar to the 6502.

While the x86 has quite a few more flags than the 6502, most of them aren’t too important unless you’re delving deep into an operating system’s internals. The main ones to know about are the carry, zero, sign, direction, overflow, and interrupt flags. Most of them should be self-explanatory, while “overflow” works in a similar fashion to its 6502 counterpart. The direction flag is mostly used for string-handling instructions, which we’ll see in a later post.

One more register deserves a brief mention here. On the 286, it’s called the MSW, or “machine status word”. After that, it gets the official designation CR0. It’s used to control internal aspects of the processor, such as switching between real and protected modes or emulating a floating-point unit. I can’t think of a case where this series would use it, but now you know it’s there.

Segment registers

And then we come to the bane of many an assembly programmer, at least those of a generation or two ago: the segment registers. We’ll look at the x86 memory model in a moment; for now, just think of segments as something like overlapping banks of memory.

We’ve got four segment registers, all 16 bits wide even in 32-bit mode, for the code, data, stack, and extra segments. Their mnemonic names, conveniently enough, are initialisms: CS, DS, SS, and ES. CS points to the segment where execution is occurring; you can’t change it except with a “far” call, but you can read from it. SS holds the segment address of the stack, but you probably figured that one out already. DS is the default for reading and writing memory, while ES, as its name suggests, is for whatever you like.

Segment registers are weird. You can move values to and from them (except into CS, as I said), but you can’t operate on them. What you can do, however, is use them to “override” an address. For example, loading a value from memory uses DS as its base, but you can make it use ES instead: mov ax, [es:di] loads the value pointed to by DI, but in the ES segment.

Memory model

And that leads us to the x86 memory model. It’s a bit convoluted, since the original 8086 was designed as a 16-bit system that could address 1 MB of memory. Something had to give, but Intel took a…nonstandard approach.

Every address on the x86 has two parts: segment and offset. (This is true even on today’s processors, but 64-bit mode is hardwired to treat all segments as starting at address 0.) In real mode, as with an older x86 running DOS, an actual memory address can be obtained by shifting the segment 4 bits to the left and adding the offset. Or, to put it in code: address = (segment << 4) + offset. Each segment, then, can address a 64K block of memory in a fashion much like banking in the 8-bit world.

The difference between one segment and the next is only 16 bytes, thanks to the 4-bit shift. That means that segments will overlap. The addresses b000:8123 and b800:0173, for example, refer to the same memory location: 0xb8123. In practice, this doesn’t matter too much; the segment portion is mostly used as a base address, while the offset is, well, an offset.

In protected mode, throw those last paragraphs out. Segments instead are indexes into a lookup table, creating a virtual memory system that essentially went unused until its mercy-killing by AMD when they brought out the x86-64. (The segment-based virtual memory scheme of protected mode, interesting though it seemed, was basically an exercise in the knapsack problem.) We won’t be worrying about protected mode much, though, so let’s move on.

Back to real mode, the world of DOS and early x86s. A little bit of arithmetic shows that the segment:offset addressing method allows access to 1 MB of memory, more or less. 0000:0000 is, of course, address 0, and it’s the lowest possible value. The highest possible is ffff:ffff, and that presents a problem. Add it up, and you get 0x100fef. On old systems, this simply wrapped around the 1 MB “barrier”, to 0x00fef. (Once memory sizes expanded to multiple megabytes, it no longer overflowed, but some programs relied on that behavior, so a hardware hack was put into place. It’s called the A20 gate, and it was originally put in the keyboard controller, of all places. But I digress.)

Input/output

Also available to the x86 are the I/O ports. These are accessed using the IN and OUT instructions, a byte, word, or (in 32-bit mode) double-word at a time. They function like their own little address space, separate from main memory. The x86 architecture itself doesn’t really define which ports do what. That’s left to the PC platform—which will be the subject of the next post.

Modern operating systems also allow memory-mapped I/O access, but we’ll leave that alone for the time being. It’s far more useful when you go beyond the bounds of the 16-bit world.

Interrupts

Like the 6502, the x86 has support for interrupting the processor’s normal flow, but it goes about it in a different way. An interrupt can be caused by hardware; the old term IRQ, “interrupt request”, referred to this. But software can also directly invoke interrupts. As we saw last week, that’s how DOS implemented its API.

In real mode, the result is the same either way. The processor jumps to a specific memory location and executes a bit of code, then returns to what it was doing like nothing happened. We’ll see the details later on, but that’s the gist of it.

To be continued

So I’ve rambled on long enough for this post. Next up will be a look at the PC platform itself, at least as it stood a quarter century ago. That’ll be a look into deep history for many, but the choices made then still affect us today. Following that will be the dive into the deep end of “old-school” x86 assembly.