Assembly: the precursor

Before Windows was a thing, there was DOS. Before today’s 64-bit processors, or even their 32-bit predecessors, there was the 16-bit world. In a way, 16-bit DOS is a bridge between the olden days of 8-bit microprocessors like the 6502 and the modern behemoths of computing, like the A10 in the PC I’m writing this on, or the Core i5 in the next room.

A while back, I started writing a short series of posts about assembly language. I chose the 6502, and one of the reasons why was the availability of an online assembler and emulator. Very recently, a new challenger has come into this field, Virtual x86. Even better, it comes with a ready-to-go FreeDOS image with the NASM assembler pre-installed. Perfect.

So, I’m thinking about reviving the assembly language series, moving to the (slightly) more modern architecture of 16-bit x86 and DOS. It’s a little more relevant than the 6502, as well as much more forgiving, but the fundamentals of assembly are still there: speed, size, power. And, since 16-bit code doesn’t run at all on 64-bit x86 CPUs, we don’t have to worry as much about bad habits carrying over. The use of DOS (FreeDOS, specifically) helps, too, since essentially nothing uses it these days. Thus, we can focus on the code in the abstract, rather than getting bogged down in platform-specific details, as we’d have to do if we looked at “modern” x86.

A free sample

Assembly in this old-school fashion is fairly simple, though not quite as easy as the venerable 6502. Later on, we can delve into the intricacies of addressing modes and segment overrides and whatnot. For now, we’ll look at the 16-bit, real-mode, DOS version of everyone’s first program.

org 100h

    mov dx, data
    mov ah, 09h
    int 21h
    mov ah, 04ch
    int 21h

data:
    db 'Hello, World!$'

All this little snippet does is print the classic string to the screen and then exit, but it gives you a good idea of the structure of x86 assembly using what passes for the DOS API. (Oh, by the way, I copied the code from Virtual x86’s FreeDOS image, but I don’t see any other way you could write it.)

Here are the highlights:

  • org 100h defines the program origin. For the simplest DOS programs (COM files), this is always 100h, or 0x100. COM files are limited to a single 64K segment, and the first 256 bytes are reserved for the operating system, to hold command-line arguments and things like that.

  • The 16-bit variety of x86 has an even dozen programmer-accessible registers, but only four of these are anywhere close to general-purpose. These are AX, BX, CX, and DX, and they’re all 16 bits wide. However, you can also use them as byte-sized registers. AH is the high byte of AX, AL the low byte, and so on with BH, BL, CH, CL, DH, and DL. Sometimes, that’s easier than dealing with a full word at a time.

  • mov is the general load/store instruction for x86. It’s very versatile; in fact, it’s Turing-complete. Oh, and some would say it’s backwards: the first argument is the destination, the second the source. That’s just the way x86 does things (unless you’re one of those weirdos using the GNU assembler). You get used to it.

  • int, short for “interrupt”, is a programmatic way of invoking processor interrupts. The x86 architecture allows up to 256 of these, though the first 16 are for the CPU itself, and the next are taken up by the BIOS. DOS uses a few of its own for its API. Interrupt 0x21 (21h) is the main one.

  • Since there are only 256 possible interrupts and far more useful operations, the OS needs some way of subdividing them. For DOS, that’s what AH is for. A “function code” is stored in that register to specify which API function you’re calling. The other registers hold arguments or pointers to them.

  • Function 0x09 (09h) of interrupt 0x21 writes a string to the console. The string’s address is stored in DX (with some segment trickery we’ll discuss in a later post), and it must end with a dollar sign ($). Why? I don’t know.

  • Function 0x4c (04ch) exits the program. AL can hold a return code. Like on modern operating systems, 0 is “success”, while anything else indicates failure.

  • db isn’t an actual assembly instruction. Much like in 6502-land, it defines a sequence of literal bytes. In this case, that’s a string; the assembler knows how to convert this to an ASCII sequence. (“Where’s Unicode?” you might be wondering. Remember that DOS is halfway to retirement age. Unicode wasn’t even invented before DOS was obsolete.)

Going from here

If you like, you can run the FreeDOS image on Virtual x86. It comes with both source and executable for the above snippet, and the original source file even includes compilation directions. And, of course, you can play around with everything else the site offers. Meanwhile, I’ll be working on the next step in this journey.

One thought on “Assembly: the precursor”

Leave a Reply

Your email address will not be published. Required fields are marked *