Assembly: the old set

The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.

But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. Like with the 6502, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.

The x86 instruction set

Even stripped down to this essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.

Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:

  • The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
  • Square brackets indicate indirection. Thus, mov ax, bx moves the contents of BX into AX, while mov ax, [bx] loads AX with the value in the memory location BX points to.
  • NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.

Flags

The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:

overflow:
    mov al, 127
    add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 wraps around to -127)
    add al, 2
; now overflow is clear, because -127 + 2 = -125

The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put the whole flags register onto the stack, or POPF to load the flags from there.

MOV and addressing

The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.

The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions, can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)

Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:

  • Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov ax, [1234] or mov dh, [data].
  • Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
  • Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
  • Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)
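
Here’s a quick sketch of these modes side by side. The label and values are made up, and it assumes DS points at the segment holding the data:

array: dw 10, 20, 30, 40

    mov ax, [array]     ; direct: AX = 10
    mov bx, array       ; BX now holds the array's offset
    mov ax, [bx]        ; register indirect: AX = 10
    mov ax, [bx+2]      ; indexed: AX = 20
    mov si, 4
    mov ax, [bx+si]     ; base indexed: AX = 30
    mov ax, [bx+si+2]   ; base indexed with displacement: AX = 40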

So MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead, which swaps its two operands; it follows the same addressing rules. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)

Arithmetic and logic

After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There’s also a two’s-complement negation (NEG) and simple increment/decrement (INC, DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV above.

We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR or SAR moves to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)

Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.
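
As a quick illustration, here’s the old multiply-by-10 trick, which leans on the fact that 10x = 8x + 2x. (Just a sketch: it assumes the result still fits in a word, and it clobbers BX.)

mul_ten:
    shl ax, 1       ; AX * 2
    mov bx, ax      ; save the doubled value
    shl ax, 2       ; (AX * 2) * 4 = AX * 8
    add ax, bx      ; AX * 8 + AX * 2 = AX * 10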

A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.
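
For instance:

    mov al, -5      ; AL = 0fbh
    cbw             ; AH = 0ffh, so AX = 0fffbh: still -5
    cwd             ; DX = 0ffffh: DX:AX now holds -5 as a 32-bit value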

Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL or AX, remainder in AH or DX. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV for signed integers. IMUL even lets you multiply by an immediate value: imul ax, -42.
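
In practice, a word-sized division looks like this. (A sketch; remember that the full dividend is DX:AX, so DX has to be cleared first.)

    mov ax, 1234
    xor dx, dx      ; the dividend is DX:AX, so zero the top half
    mov bx, 10
    div bx          ; quotient: AX = 123; remainder: DX = 4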

Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.

Flow control

We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:

  • C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
  • Z and NZ, “zero” and “not zero”, based on the zero flag.
  • O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
  • S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
  • B and NB, “below” and “not below”, synonyms for C and NC.
  • A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
  • AE, BE, NAE, NBE; the same as the last two pairs, but add “or equal”.
  • L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
  • LE and NLE, “or equal” versions of the above.
  • G, GE, NG, NGE, “greater than”, etc., for the opposites of the previous four.
  • CXZ, “if CX is zero”, usually used for loops; it has no “not zero” counterpart.

These can be confusing, so here are a couple of examples:

mov ax, [value1]
mov dx, [value2]

a_loop:
add ax, dx

; jump if the signed addition overflowed,
; otherwise try again
jo end
jmp a_loop

end:
; do something else

mov ax, [bp+4]
cmp ax, 0
; if ax == 0...
jz iszero

; else if ax > 0...
jg ispos

; else if ax < 0...
jl isneg

; or if something went wrong
jmp error

CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.
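
A minimal subroutine might look like this fragment. (In a real program, of course, something would keep execution from falling straight into double_it.)

    mov ax, 21
    call double_it  ; pushes the return address, then jumps
    ; AX is now 42

double_it:
    shl ax, 1
    ret             ; pops the return address and jumps back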

A special LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, decrementing it and looping back until it reaches zero. You might use it like this:

; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di  ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space

nextchar:
mov [es:di], dl
add di, 2   ; skip video attribute
loop nextchar

Two variant instructions, LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.

The stack

All x86 programs have the use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. Even better are PUSH and POP, which take care of everything for you. They can be used on any register—except that you can’t pop into CS—and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.

The stack grows “downward”. When a value is pushed onto it, SP is first decremented by 2, and the value is then stored at the location SP now points to. Popping does the opposite: the value is read out, then SP is incremented. Effectively, the instructions work like this:

; pseudocode only: 16-bit mode has no real [sp] addressing
do_push:
    sub sp, 2
    mov [sp], value

do_pop:
    mov value, [sp]
    add sp, 2

PUSHA and POPA are shortcuts for pushing and popping all eight main registers, helpful when you need to save state before starting a big subroutine.
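
A typical use is a sketch like this:

big_sub:
    pusha           ; saves AX, CX, DX, BX, SP, BP, SI, DI in one go
    ; ... clobber registers to your heart's content ...
    popa            ; restores them all (the stored SP is discarded)
    ret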

Strings

The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.

In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments the index registers it used (SI, DI, or both) by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.

LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx moves AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP doesn’t make much sense with LODSx, but REP STOSx is the classic way to fill a block of memory, and both work fine in hand-rolled loops.
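
For example, a hand-rolled loop can uppercase a string in place. (A sketch assuming DS:SI and ES:DI both start at the string, CX holds its length, and the direction flag is clear.)

upcase:
    lodsb           ; AL = [DS:SI], and SI moves up
    cmp al, 'a'
    jb .keep
    cmp al, 'z'
    ja .keep
    sub al, 20h     ; ASCII lowercase to uppercase
.keep:
    stosb           ; [ES:DI] = AL, and DI moves up
    loop upcase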

MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:

; assumes SI and DI point to appropriate memory areas,
; CX holds a count of bytes to move,
; and the direction flag is clear
memcpy:
rep movsb
ret

CMPSx compares bytes or words, setting the flags accordingly. It could be used to implement a string comparison function like so:

; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
strncmp:
xor al, al
repe cmpsb
je exit ; all characters matched: AL stays 0
jg greater
dec al  ; sets to FFh, or -1
jmp exit

greater:
inc al  ; sets to 01h

exit:
ret

Finally, SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:

; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found

contains:
mov dx, cx
repne scasb
je found    ; the zero flag is set only if we stopped on a match

; character not found, since we ran out of string
mov ax, 0ffffh
jmp end

found:
; CX now holds the number of characters from string end,
; but we saved the original length in DX
; thus, the position is DX - CX - 1
inc cx
sub dx, cx
mov ax, dx

end:
ret

Input and output

Input and output send bytes or words between registers and the I/O space. This is a separate address space of 64K (65,536) ports, distinct from main memory, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0xff, you can instead use an immediate version: in al, 40h.
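
To make that concrete, here’s the classic PC speaker beep, which programs the timer chip through ports 42h and 43h and the speaker gate at port 61h. (A sketch based on the standard AT-era port assignments.)

    mov al, 10110110b
    out 43h, al     ; timer command: channel 2, square wave
    mov ax, 1193    ; divisor: 1193182 Hz clock / 1193 ~ 1000 Hz
    out 42h, al     ; low byte of the divisor first
    mov al, ah
    out 42h, al     ; then the high byte
    in al, 61h      ; read the current speaker gate bits
    or al, 03h      ; turn on the gate and the speaker
    out 61h, al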

The 286 added string I/O with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [DS:SI] (for OUTSx) or [ES:DI] (for INSx) is the data pointer, as above.

Enough for now

And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.

Mothers

Yesterday was Mother’s Day. (Coincidentally enough, it was also my mom’s birthday.) That’s a time to reflect on motherhood in general, or on how it has affected us. If you think about it, being a mother is the most altruistic thing a woman can do. Months of pregnancy, hours of labor, and years of care, all for the biological, psychological, and physical well-being of another. Look at it from that perspective, and they deserve a whole lot more than just one day.

I consider myself blessed to have grown up so close to my mother and, until about a year ago, her mother. Not everyone has that opportunity, unfortunately. So let’s take this day off to think about what we do have. Let’s bring a little bit of yesterday into today.

If you’re a writer, what are your characters’ mothers like? What do they think of the adventures of their children?

For worldbuilders, today’s the day to contemplate motherhood as a cultural notion. Are mothers revered among your fictitious people? Are they granted leniency or honor that a childless woman wouldn’t receive?

And for everybody: we all had a mother at some point. Some of us still do, and we should be thankful.

Let’s make a language – Part 15a: Color terms (Intro)

Once you have the grammar parts figured out, most of the rest of the conlanging process is making words. We began to see that in Part 14, when we discussed deriving new words from existing roots. This time around, we’re going back to the roots (pun intended) and looking at a very specific set of words: the color terms.

Color terms are, well, terms for colors. They’re the names you see on crayons or paint swatches. As anyone who has been to a hardware store knows, there are thousands of these, but we’ll focus on the absolute basics. Most colors are named after things that are that color, like “violet” or “salmon”. A few, however, are truly basic: “red”, “yellow”, “black”, and so on. These are the ones that most interest us here.

More importantly, which color terms are considered “basic” turns out to be a way in which languages differ. That makes this subject an excellent illustration of how a language can divide up the “semantic space”. Not every language is the same in this respect, and realizing that is a good step towards creating a more naturalistic conlang, rather than a simple cipher of your native tongue.

The color hierarchy

Every language has at least two basic colors. That seems to be a linguistic and cultural universal. But according to a study by Berlin and Kay (1969), what comes next follows a fairly regular trajectory. To be sure, there are outliers, but the past few decades have only reinforced the notion of a developmental hierarchy of color terms, making it a useful model for conlangs.

The first distinction in color is near-universal: light and dark. This can also be black and white or warm and cool; the specifics won’t matter too much. Mostly, yellow and red fall in with white in this scheme, while blue and green are dark. Other colors, like purple, brown, or orange, fall in somewhere along this spectrum. Exactly where is different for each language. It’s easy to see pink as “light” and purple as “dark”, but what about a soft lavender or a deep ruby?

At some point, probably fairly early in a culture’s history, a new color term comes about, splitting “light” into white and red. This seems obvious, as blood is red, and it’s a very important part of humanity. Yellow also tends to get lumped in with red in this scheme, meaning that most oranges do, too.

The next two colors to “break off” are green and yellow, in either order. Green can come first or yellow can, but they both need to be present before the next stage can begin. Once a language has these five color terms—black, white, red, green, yellow—then it’s on to the sixth and final major color: blue.

These six are the main group, then, and there’s a very good reason why. Human vision, as anybody who took biology knows, has two key parts: rods and cones. The rods are monochromatic, distinguishing only light and dark; in other words, just like a two-color-term language. The cones, however, are how we see color. They come in three flavors, roughly corresponding to red, green, and blue.

So that’s probably a good explanation for the first six basic color terms. Red has the longest wavelength, so it’s the easiest to see, hence why stop signs and a car’s brake lights are red. It stands to reason that it would be singled out first. The eye’s green cones tend to be the most sensitive, but green and yellow are pretty close together, spectrally speaking, so they’re the next two, but their similarity leads to the flip-flop in which comes first. And then that leaves blue.

What about the others, though? Well, there it gets murky. Brown is usually the seventh basic color, distinguished from red and yellow. After that, there’s no real set order among the next four: orange, pink, purple, and gray. But those eleven, possibly accompanied by one or more lighter or darker shades (cyan, magenta, azure, etc.), make up the core color terminology of the majority of languages.

The rest of the box

All the other colors’ names will be derived in some way, and that can include some from the above list, if a language doesn’t have a full complement of basic terms. One way of doing this is with adjectives that specify a particular shade of a color. English has lots of these: dark, light, pale, deep, etc. The new color names produced with them aren’t single words, but phrases like “dark blue” or “pale pink”; other languages might have ways of compounding them, though.

Compounds give us another way of making new color words. By combining two basic colors, we can get new ones. That’s how we have “red-orange” or “blue-green”, to name but two. They’re in-between colors, and they tend to be composed of two colors adjacent on the spectrum. It’s hard to imagine a “yellow-blue” that isn’t green, for instance.

Another possibility is the abstract color word. These aren’t basic terms; instead, they tend to come about as finer distinctions of shade. They may have started off with some other meaning, but they now refer almost exclusively to a specific range of colors. Maroon and cyan are a couple of English examples.

By far, though, the best way of making names for colors is through description. Something that has a certain color becomes a descriptor for that particular color—“navy blue”, for instance—then, eventually, the color’s name. That’s how it worked for salmon, coral, violet, lavender, and hundreds of others. It may have even been the case for orange, as the fruit’s name seems to be older than the color term. And if the original reason for one of these names is lost, then it may come to be considered an abstract term; indigo is one color that has gone through this process.

Using all these, a language can easily fill up even the biggest box of crayons. But the more color terms you have, the less of the color space each one covers. There will be overlap, of course, and the general terms will always cover more area than the more specific ones. And every language makes its own distinctions. The border between, say, red and yellow isn’t set in stone.

Even weirder

A few conlangers like making languages for speakers that aren’t ordinary humans. Since we’re moving into more culture-specific parts of language, this is a good opportunity to look at what needs to be done for that sort of conlang.

If the prevailing theory is accurate, basic color terms come about in the order they do because of human vision, as we saw above. A race that doesn’t follow normal human rules, however, will have a different color hierarchy. Some people, for example, have a fourth set of cone cells, purportedly letting them see otherwise “impossible” colors. Tetrachromats, as they’re called, effectively have a fourth primary color at their disposal.

An entire race (in the literary sense) of tetrachromats would have a language that reflects this. Where their fourth color fits into the hierarchy would depend on the specifics of how that fourth cone cell works, but it would certainly be in that first group alongside red, green, yellow, and blue.

Similarly, red-green colorblindness could be the norm for a race. In that case, red and green wouldn’t differentiate, obviously, but the rest of the diminished color space would also be changed. In fact, it’s easy to imagine such a race never getting past the light/dark stage.

And no discussion of color vision would be complete without including the neighboring portions of the spectrum. The human lens blocks ultraviolet, but some people report being able to see it. Vision reaching into the infrared is a little more plausible for our species. Aliens, though, could have their equivalent to cones reach their peak sensitivity at different points of the spectrum, allowing them to see into the deeper or higher ranges. Their color terms would likely reflect this, and an alien race could have a whole collection of words for color combinations that we simply cannot see.

Next up

Next time, we’ll look at our two conlangs and their color words. Then, it’ll be off to another part of the semantic realm, but I don’t yet know exactly which one. Stay tuned.

Assembly: the standard machine

This crazy thing we call the PC has been around for over 30 years now, and it’s quite far from its original roots as IBM’s business-oriented machine. The numbers involved are mind-boggling in their differences. The first PC ran at 4.77 MHz, while today’s can top 4 GHz. (And, thanks to parallelization, cache, and all those other advances we’ve seen, that adds up to far more than a 1000x increase in computing power.) Intel’s 8086 processor could address 1 MB of memory; modern computers have multiple gigabytes inside them. Back then, the primary medium for storage was the floppy disk, and the PC’s disks weighed in at a hefty 360 KB; nowadays, you can find USB flash drives capable of holding more than a million times that.

And people still managed to get things done with machines that are easily outdone by our watches, our refrigerators, our thermostats. A lot of people today can’t even comprehend that, including many—I know quite a few—who used to get by without computers at all! But histories of the PC are everywhere, and we have a specific reason for our trek down memory lane. So let’s take a look at these ancient contraptions from the assembly programmer’s point of view.

The ideal PC

Since we’re using an emulator for our programming, we don’t need to go into a lot of hardware specifics. It doesn’t particularly matter for the purposes of this series that the 8088 had only an 8-bit bus, or that the 486SX lacked a built-in floating-point unit. (I had one of those on my first PC. It sucked.) From our level, those details are mostly irrelevant. What does matter is, for lack of a better term, the hardware interface.

For this series, I’m assuming an “ideal” PC loosely based on real ones. It’s not an original PC, nor is it an XT, AT, or something later. It’s really PC compatible, but nothing more. It roughly conforms to the specs, though some things might be different. That’s intentional, and it’s for ease of explanation. (It also means I can reuse knowledge from all those old DOS tutorials.)

Our idealized machine will use a 286-level processor running in real mode. That lets us focus on a simplified architecture (no protected mode, no 32-bit stuff) that isn’t too far off from what people were using in the late 80s. It does mean limitations, but we’ll work around them.

We’ll also assume that we have access to VGA graphics. That was not the case for the early 286-based PCs; they had to make do with EGA and its 16 colors, at best. Again, this makes things easier, and you’re free to look up the “real thing” if you wish.

Finally, we’re using FreeDOS in the v86 emulator until further notice. It’s not exactly Microsoft’s old DOS from the days of yore, but it’s close enough for this series. Basically, assume it has the features a DOS is supposed to have.

Now that that’s out of the way, let’s see what we have to work with.

Memory

We effectively have a 286 processor, which we looked at last week. Since we’re ignoring protected mode entirely, that means we have access to a total of 1 MB of memory. (Note that this is a real megabyte, not the million bytes that hard drive manufacturers would have you believe.) That memory, however, is divided up into various portions.

  • The first 1 KB is reserved by the CPU as the interrupt vector table. The x86 allows 256 different interrupts, whether caused by hardware or software. This table, then, holds 256 4-byte addresses, each one a pointer to an interrupt handler. For example, the core DOS interrupt, 21h, causes execution to jump to the interrupt routine whose address is stored at 0x21 * 4 = 0x84, or 0000:0084.

  • The next 256 bytes, starting at 0040:0000, are the BIOS data area. (I’ll explain the BIOS in a bit.) Much of this space has special meaning for the BIOS, so it’s a bit like the zero page on most 6502 computers.

  • DOS, the BIOS, and the PC all use bits of the next 256 bytes, starting at 0050:0000.

  • DOS effectively begins loading at 0060:0000, though how much memory it uses from here depends on a wide variety of factors.

Whatever is left after this is available for program use, up to the PC’s RAM limit. Everything else in the address space, starting at A000:0000, is given to the various types of adapters supported by the architecture. Even on later systems with expanded memory capabilities, this conventional memory limit of 640 KB remained. Yes, the original PC was limited to 640K of RAM—leading to the famous quote Bill Gates may or may not have uttered—but this wasn’t as stringent as it seems; the first IBM models only had 256K, and that was still four times what the 6502 could understand.
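
Since the vector table is nothing but an array of far pointers at the bottom of memory, a program can pull one out directly. A minimal real-mode sketch, using the DOS interrupt mentioned above:

    xor ax, ax
    mov es, ax          ; segment 0000
    mov bx, 21h * 4     ; each vector is 4 bytes: offset, then segment
    mov ax, [es:bx]     ; offset half of the handler's address
    mov dx, [es:bx+2]   ; segment half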

The remaining 384 KB of the old PC’s range was for the display, the BIOS, and any expansions that may have been around:

  • A 64K block starting at a000:0000 is the video buffer for color graphics modes, including the VGA-alike our idealized PC uses. (Modern graphics cards use high-speed direct memory accesses, but this area is still there for “fallback” mode.)

  • The next 32K, from b000:0000, is used by monochrome video adapters for text. Some systems that neither had nor supported monochrome used this space as extra memory.

  • Another 32K block, starting with b800:0000 or b000:8000 (they’re the same thing), is a very big deal. It’s where the color text-mode memory lies, so manipulating this area will directly affect the contents of the screen. It’s a linear array of words: the low byte holds the character, the high byte its foreground and background colors. We haven’t seen the last of this space.

  • Starting at c000:0000, things get murkier. Usually, the 32K starting here is where the video BIOS lives. After that is the hard disk BIOS, if there is any. Everything from d000:0000 to f000:0000 was intended for expansion; add-ons were expected to set themselves up at a particular place and define how much memory they were going to use. Of course, conflicts were all too easy.

  • The last big block of memory begins at f000:0000. It’s meant for the BIOS, though the original one only used the last 8K (f000:e000 and up). By the way, the processor starts at f000:fff0, which is why the first BIOS went at the end of the block.

  • Finally, the last segment, ffff:xxxx, wraps around to zero on an 8086, but it can access a little bit of higher memory on the 286 and later. That’s the high memory area that DOS users learned so much about back in the day. It won’t really concern us much, but it’s good to know it exists, especially if you’re reading older literature that refers to it.

The BIOS

BIOS (Basic Input/Output System) is the name given to the PC’s internal code, as well as the programmer-accessible interfaces that code provides. The BIOS is mainly responsible for taking a bare x86 processor and turning it into a usable system; in that, it’s like the system ROMs of older computers. Different BIOSes over the years have offered numerous features—some can now run entire operating systems, and projects like coreboot blur the line between firmware and OS—but they all also contain the same basic functions that you had 30 years ago.

Rather than using a vector table, like that of a Commodore 64, the PC BIOS provides access to its API through interrupts. By placing certain values in registers before invoking the interrupt, you can select different functions and pass arguments to them. Return values would likewise be placed in registers, sometimes the same ones. As an example, BIOS interrupt 10h controls the video system, and one of its available functions is setting the cursor position. To do that, we need to load the number 2 into AH, among other things, as in this snippet:

curs_set:
    mov ah, 2   ; function number
    mov bh, 0   ; "page number": 0 is the "main" page

    ; these would probably be previously defined
    mov dh, row
    mov dl, column

    int 10h

We’ll be seeing the BIOS later on, but we definitely won’t meet more than a fraction of its interface.

Peripherals and I/O

The PC was designed with peripherals in mind. Floppy disks, hard disks, printers, and many other things came to be attached to these computers. Usually, they came with drivers, and the BIOS could talk to some of them, but programmers occasionally needed to access them directly. Nor were these methods mutually exclusive. After all, it was entirely common to bypass the BIOS and write directly to video memory, because that was far faster.

Under our assumptions, we’ve got a VGA graphics adapter. We’ve got our keyboard and mouse, which the emulator…well, emulates. The FreeDOS image we’ll be using includes one floppy disk, but that’s it. Other than that, we have the PC speaker and a few other odds and ends. Not much, but plenty to explore. But that’s for later.

The operating system

Today, we use Windows or Linux or OSX or whatever, and we only really think about the operating system when it won’t work. But almost every home PC of the late 80s ran DOS first, and that’s what people learned. FreeDOS is like an extension of that, an upgraded version that can run on modern machinery. It uses the same interfaces for “legacy” code, however, and that’s what we want. And v86 includes a preset FreeDOS disk image for us, so that’s a bonus.

DOS, as derided as it is today, actually packed a lot of functionality into its tiny innards. Sure, it didn’t have true multitasking, and you needed hacks to use more than 1 MB of memory. Some of its internal functions were slow enough that it was better to write to the hardware directly. And it even left some things up to the BIOS. But it was far better than nothing.

I could make this a “write a 16-bit OS” series, but I won’t. Because I don’t have to. That’s one of the few good things about DOS: it’s so simple that you can almost ignore it, but you can still use it when it’s what you need. So that’s what I’ll be doing. I mean, it worked for fifteen years, right?

And so on

There’s lots more I could say about the PC platform, whether the real thing or our nostalgic remembrance. But this post is already getting too long. Everything else can come up when it comes up, because now we have to move on to the whole reason for this series: the x86 assembly language. Next time, you’ll see what it looked like (not what it looks like now, though, because it’s totally different). Maybe we can even figure out why so many people hated it with such passion. Oh, and we’ll have some actual code, too.

Fantasy governments

As we get ever deeper into this seemingly unending election season, it’s hard not to think about governments. I’ve already done a post about the future of government, covering science fiction, but what about the past? What does government look like in a fantasy world?

Many works, when they think about it at all, default to the “feudalism-lite” model of D&D and video games. I call it “lite” because, while it does bear some of the hallmarks of medieval European feudalism—the hierarchical structure, the figurehead monarchy—it lacks the deeper roots of feudalism. Rarely will you see kings asserting their divine right to rule, for instance. The lower classes are given much more freedom, especially of movement, than they had historically.

In essence, fantasy feudalism is more like the later days, when the system was breaking down. After the Black Death wiped out so much of Europe’s population, those who survived effectively became that much wealthier. They did begin to gain some of the freedoms that Dark Age serfs lacked, simply because they knew that the labor vacuum made them more valuable. It’s in this era that we also see the rise of the merchant republics in Italy and the stirrings of absolutist monarchy in France, and this is when the idea of class warfare truly begins.

From a storytelling point of view, that’s a good compromise. Before the plague, it was much harder for people to rise above their station. After, they had a bit more upward mobility. It still wasn’t quite the free-for-all of much fantasy, where random peasants address the king with familiarity and candor, but we can make allowances for dramatic effect.

But the world of fantasy gives us so much more. With a little bit of worldbuilding work, we no longer have to settle for the stripped-down version of late feudalism popular in sword-and-sorcery fiction. If we put some thought into it, we can do better.

Low fantasy

It’s popular to divide fantasy into “low” and “high”, largely based on the amount and power of magic available to the world. Game of Thrones and the books that spawned it are, in this system, low fantasy (though getting higher with each volume or season). Something like the Dragonlance series, by contrast, has lots of powerful wizardry, so it’s classified as high fantasy. Since low fantasy is closer to our world, we’ll start with it.

Most systems of governing in a low fantasy world will resemble ours quite closely, as those worlds are very similar to our own. Monarchies are thousands of years old in our world, and those seem to be the most common everywhere, so they’ll be well-represented. Republics, of the Roman style, are rarer; if they exist in a fantasy realm, they should have a good backstory to explain why. Finally, as much as we uphold it as an ideal, democracy is historically highly uncommon on the national level. In older ages, it breaks down as populations grow; it’s entirely possible for an early democracy to evolve into a republic as people decide that voting on everything is a waste of time.

Each of these major types of government covers a broad range of political theory. Monarchies can be absolutist or dictatorial, with a king or emperor ruling with an iron fist, or they can be parliamentary, as England became in the 13th century. A republic can be full of partisan bickering, even in medieval times, and it has a clear path to a parliamentary system, simply by electing a leader from the representative body.

But fantasy also gives us the opportunity to explore other methods of government, those that didn’t gain purchase in our Western societies for whatever reason. Some might not have been possible for us, either given the evolutionary history of European culture or the limitations of the medieval world. So let’s take a look, shall we?

Socialism is a hot topic right now, no matter where you are in the Western Hemisphere. Definitions differ, but the general idea is a state where everyone contributes to the populace as a whole. It’s usually highly centralized, enforcing a redistribution of wealth from rich to poor (a welfare state, in other words), and offering numerous public services.

In earlier times, public and social services seem to have always existed on some scale, so it’s not beyond the realm of possibility to have a socialist state. Producing it from a monarchy might be unlikely, but republics can do it. Socialism does appear more likely to come about early in the development of a civilization, at the tribal or village level. After all, it’s easier to redistribute wealth when there’s not that much of it, and sharing—socialism is just institutionalized sharing—is as old as humanity.

Communism, on the other hand, is the product of 19th century political thinking. The original idea, basically, was to empower the working classes at the expense of the educated, noble, or otherwise privileged. That didn’t work, and just about every communist state in our world has either turned into a dictatorship or oligarchy (USSR, China) or grown towards capitalism (Vietnam). Medieval-era fantasy likely won’t have the chance to try, unless they have some bright thinkers to come up with the notion in the first place.

Theocracy is, literally, rule by religion. We’ve seen a few attempts at a theocratic state throughout history. Papal Rome might be considered one, at least when it wasn’t just a regular autocracy that happened to be ruled by the Pope. The followers of Muhammed after his death tried to implement a government based on their scriptural writings. And, of course, many of today’s terrorist groups claim to want the same. Nobody’s really succeeded for any length of time, though, except maybe the repressive, authoritarian regimes in Iran and Saudi Arabia.

For fantasy, it’s entirely possible to have a theocracy. (High fantasy has it even easier, since you can have the gods themselves intervene, but we’re getting ahead of ourselves.) At its core, theocracy isn’t that much more than a monarchy ruled by a religious leader. Its code of laws will be scripture. But theocracies are highly conservative by their very nature, and they don’t exactly tend to be breeding grounds for new advances in any field other than, well, theology. For that reason, fantasy theocracies might work best as a “bad guy” government.

High fantasy

With the addition of magic and the divine, high fantasy opens up a few more options for government, some that we cannot emulate in our world. That does mean it’s harder for an author to imagine how they would work, but they’re great for making a place truly exotic.

First, as noted above, theocracy gets a boost from being in high fantasy. This direct theocracy, as I’ll call it, is one where divine beings directly interfere with the workings of a state that follows them. At the far end, it degenerates into an absolute dictatorship, one controlled by a tyrannical deity, probably something far more horrific than anything ISIS could do. But there is a place for a less-awful direct theocracy, especially in a polytheistic culture. In a way, that one could conceivably turn into a kind of theocratic republic, where party lines are drawn based on which god’s teachings you follow.

The idea of a government run by magical means is probably as old as fantasy itself. This thaumatocracy can take many forms. Rule by the adept is a subset of oligarchy, roughly equivalent to republics where only landowners could be elected as representatives. Using magic itself to rule or otherwise control the populace edges closer to socialism or even communism. And if magic can in any way be used for warfare, then there’s also the potential for a strong practitioner to rise to autocracy. So this one is highly sensitive to conditions, and which outcome you get will depend on history.

If magic (sorcerous, divine, or whatever) can contact or summon the dead, then there’s the chance that a government based on this could form. It’s even got a name: thanatocracy, rule by the dead. The Inca are said to have believed that their deceased rulers could continue to influence the living; thanatocracy is the logical extension of that to a world where they really can. By its very nature, this would be a very conservative state, probably one founded by a culture practicing ancestor worship. There’s the potential for an oligarchy to form, if talking to the dead is a skill available only to a cabal of priests or wizards. But the nature of the afterlife will also play a big role, as will the number of dead consulted for questions of government.

Slight modifications

In a few cases, it’s not the type of government that’s unrealistic or ahistorical, but some defining quality of it. The following are a few subsets of governments that have the possibility of existing in fantasy:

  • Matriarchy is one of the most popular. Traditionally in most societies with inherited power, the right to rule passes down through the male line first. There are very good biological, sociological, and historical reasons for this, but fantasy cultures don’t have to follow our rules. It’s easy to envision, for instance, a matriarchal monarchy, one ruled by a queen who is succeeded by her eldest daughter. You likely want to have some reason why the men weren’t in power; perhaps this is a non-human race, like D&D’s drow.

  • Meritocracy is a high ideal of a lot of thinkers. Its goal is that rule goes to those most qualified, probably as determined through some sort of examination. China tried something like this, but it was never as successful as it could have been, because the political machinery needed to start a meritocracy is easy to “break”. Like a radioactive element, meritocracy decays into bureaucracy. Those in power adjust the qualifications so they stay in power. But maybe a fantasy culture could break that cycle.

  • The junta or other forms of military dictatorship can readily be adapted to a fantasy setting. We have all too many examples, both in the real world and in fiction, but there’s always space for new ideas. Militaries tend to come to power when they overthrow a legitimate government, so there is a ready-made source of conflict. And it doesn’t take much for them to break into factions, each led by a warlord who thinks he has sole right to rule.

Keep thinking

I’m sure you can come up with other ideas. An earlier post goes into a bit more detail about creating your own governments. Extrapolating to a fantasy world is fairly straightforward. Remember that a government, as with any part of society, is rarely created from scratch. It has a history, even if you never write it. The more outlandish it is, the greater the chance you’ll need to defend it at some point. So, for those “crazier” governments, think a little more about how they came about. Usually, you can find something that works.

The future of auxlangs

Auxlangs are auxiliary languages: conlangs specifically created to be a medium of communication, rather than for artistic purposes. In other words, auxlangs are made to be used. And two auxlangs have become relatively popular in the world. Esperanto is actually spoken by a couple million people, and it has, at times, been considered a possibility for adoption by large groups of people. Lojban, though constructed on different principles, is likewise an example of an auxlang being used to communicate.

The promise of auxlangs, of course, is the end of mistranslation. Different languages have different meanings, different grammars, different ways of looking at the world. That results in some pretty awful failures to communicate; a quick Internet search should net you hundreds of “translation fails”. But if we had a language designed to be a go-between for speakers of, say, English and Spanish, then things wouldn’t be so bad, right?

That’s the idea, anyway. Esperanto, despite its numerous flaws, does accomplish this to a degree. Lojban is…less useful for speaking, but it has a few benefits that we’ll call “philosophical”. And plenty of conlangers think they will make the one true international auxiliary language.

So let’s fast-forward a few centuries. Esperanto was invented on the very edge of living memory, as we know, and Lojban is even younger than that, but Rome wasn’t built in a day. Once auxlangs have a bit of history behind them, will any of them achieve that Holy Grail?

The obvious contender

They’d have to get past English, first. Right now, the one thing holding back auxlang adoption is English. Sure, less than a quarter of the world’s population speaks it, but it’s the language for global communication right now. Nothing in the near future looks likely to take its place, but let’s look at the next best options.

Chinese, particularly Mandarin, may have a slight edge in sheer numbers, but it’s, well, Chinese. It’s spoken by Chinese, written by Chinese, and it’s almost completely confined to China. Sure, Japan, Korea, and much of Southeast Asia took parts of its writing system and borrowed tons of words, but that was a thousand years ago. Today, Chinese is for China. No matter how many manufacturing jobs move there (and they’re starting to leave), it won’t be the world language. That’s not to say we won’t pick a few items from it, though.

On the surface, Arabic looks like another candidate. It’s got a few hundred million speakers right now, and they’re growing. It has a serious written history, the support of multiple nations…it’s almost the perfect setup. But that’s Classical Arabic, the kind used in the Koran. Real-life “street” Arabic is a horrible mess of dialects, some mutually unintelligible. But let’s take the classical tongue. Can it gain some purchase as an auxlang?

Probably not. Again, Arabic is associated with a particular cultural “style”. It’s not only used by Muslims or even Arabs, mind you, but that’s the common belief. There’s a growing backlash against Muslims in certain parts of the world, and some groups are taking advantage of this to further fan the flames. (I write this a few hours after the Brussels bombings on March 22.) But Arabic’s problems aren’t entirely political. It’s an awful language to try to speak, at least by European standards. Chinese has tones, yes, but you can almost learn those; pharyngeal and emphatic consonants are even worse for us. Now imagine the trouble someone from Japan would have.

Okay, so the next two biggest language blocks are out. What’s left? Spanish is a common language for most of two continents, although it has its own dialect troubles. Hindi is phonologically complex, and it’s not even a majority language in its own country. Latin is dead, as much as academics hate to acknowledge that fact. Almost nothing else has the clout of English, Chinese, and Arabic. It would take a serious upheaval to make any of them the world’s lingua franca.

Outliving usefulness

It’s entirely possible that we’ll never need an international auxiliary language at all, because automatic translation becomes good enough for daily use in real-time. Some groups are making great headway on this right now, and it’s only expected to get better.

If that’s the case, auxlangs are then obsolete. There’s no other way of putting it. If computers can translate between any two languages at will, then why do you need yet another one to communicate with people? It seems likely that computing will only become more ubiquitous. Wearables look silly to me, but I’ll admit that I’m not the best judge of such things. Perhaps they’ll go mainstream within the next decade.

Whatever computers you have on your person, whether in the form of a smartphone or headgear, likely won’t be powerful enough to do the instantaneous translation needed for conversation, but it’ll be connected to the Internet (sorry, the cloud), with all the access that entails. Speech and text could both be handled by such a system, probably using cameras for the latter.

For auxlang designers, that’s very much a dystopian future. Auxiliary languages effectively become a subset of artlangs. But never fear. Not everyone will have a connection. Not everyone will have the equipment. It’ll take time for the algorithms to learn how to translate the thousands of spoken languages in the world, even if half of them are supposed to go extinct in the coming decades.

The middle road

Auxlangs, then, have a tough road ahead. They have to displace English as the world language, then hold off the other natural contenders. They need real-time translation to be a much more intractable problem than Google and Microsoft are making it out to be. But there’s a sliver of a chance.

Not all auxlangs are appropriate as an international standard of communication. Lojban is nice in a logical, even mathematical way, but it’s too complicated for most people. A truly worldwide auxlang won’t look like that. So what would it look like?

It’ll be simple, that’s for sure. Think something closer to pidgins and creoles than lambda calculus. Something like Toki Pona might be too far down the scale of simplicity, but it’s a good contrast. The optimum is probably nearer to it than to Lojban. Esperanto and other simplified Latins can work, but you need to strip out a lot of filler. Remember, everyone has to speak this, from Europeans to Inuits to Zulus to Aborigines, and everywhere in between. You can’t please everybody, but you can limit the damage.

Phonology would also tend to err on the side of simplicity. No tones, no guttural sounds half the world would need to learn, no complex consonant clusters (but English gets a pass with that one, strangely enough). The auxiliary language of the future won’t be Hawaiian, but it won’t be Georgian, either. Again, on the lower side of medium seems to be the sweet spot.

The vocabulary for a hypothetical world language will be the biggest problem. There’s no way around it that I can see, short of doing some serious linguistic analysis or using the shortcut of “take the same term in a few big languages and find the average”. Because of this, I can seriously see a world auxlang as being a pidgin English. Think a much simplified grammar, with most of the extraneous bits thrown out. Smooth out the phonology (get rid of “wh”, drop the dental fricatives, regularize the vowels, etc.) and make the whole thing either more isolating or more agglutinative—I’m not sure which works best for this. The end result is a leaner language that is easier to pick up.

Or just wait for the computers to take care of things for us.

Assembly: the architecture

The x86 gets a lot of hate for its architecture. In modern days, it’s really more out of habit than any technical reason, but the 16-bit variant really deserved some of its bad reputation. Other parts, in my opinion, were overstated. Hyperbole then became accepted wisdom, and positive reinforcement only made it grow.

But I’m not here to apologize for the x86. No, this post has a different purpose. It’s an overview of the x86 architecture from the point of view of an assembly programmer circa 1991. Although the 386 had been out for a few years, older processors were still around, and not too much was making full use of the 32-bit extensions that the 386 brought. DOS, particularly, remained 16-bit, though there were extensions to address more memory. For the most part, however, we’ll stick to “real mode”, like it was in the olden days.

The CPU

An x86 processor of this vintage was a lot less complex than today’s Intel i7 or AMD A10, and not just because it didn’t include integrated graphics, megabytes of on-die cache, power management functions, and so on. Later lines have also added lots of assembly-level goodness, like MMX and SSE. They’re 64-bit now, and that required the addition of “long mode”.

But let’s ignore all that and look at the “core” of the x86. There’s not that much to it, really. In the original conception, you have about a dozen programmer-accessible registers, all of which started out 16 bits wide, but not a single one of them is truly general-purpose. The registers can be divided into a few categories, and we’ll take each of them in turn.

General registers

These are the “big four” of the x86: AX, BX, CX, and DX. As I said last time, all four can also be accessed as a pair of byte-sized registers. The high bytes are identified by the first letter of the register name followed by H, while the low bytes use L. So we have AL or BH or whatever. It doesn’t actually increase the number of registers we have, but sometimes the savings from loading only a single byte can add up. Remember, older computers had less memory, so they had to use it more wisely.

Each of the four 16-bit general registers is used in its own special way. AX is the accumulator (like the A register from the 6502), and it’s usually the best for arithmetic; some instructions, like the multiplication instruction MUL, require it. BX is used as a “base” for a few addressing-type instructions. CX is semi-reserved as a loop counter. And DX is sometimes taken as an “extension” of AX, creating a kind of 32-bit register referred to as DX:AX.

Of course, if you’re not using the instructions that work on specific registers, you can do what you like with these. Unlike the 6502, where almost everything involved a memory access, x86 does let you work register-to-register. (On the other hand, it doesn’t have cheap access to the zero page, so there.) You can add AX to BX, for instance, and no one will care.

Pointer registers

The other four main registers all have something to do with pointers and addressing. You can use them as scratch space for arithmetic, but a lot of instructions assume they hold addresses. Unlike the general registers, all these are only 16-bit. (Modern systems do give you special access to the low byte of them, however.)

SP is the big one out of this group: the stack pointer. Stacks are a lot more important on the x86 than the 6502, mainly because that’s where you put your “extra” data that won’t fit in registers. But programmers usually don’t manipulate SP directly. They instead pop and push (note the terminology change from 6502), and those instructions change SP as needed. BP is an extra pointer register mostly used by languages like C to access stack “frames”, but assembly programmers can turn it into a general pointer.
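
That dance is simple enough. Here’s a bare-bones frame for a near call with one word argument pushed by the caller (a sketch of the sort of thing a C compiler would generate):

my_func:
    push bp
    mov bp, sp          ; BP now anchors the frame
    mov ax, [bp+4]      ; the argument sits above the saved BP and return address
    pop bp
    ret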

The other two come in a pair: SI and DI. These stand for “source index” and “destination index”, respectively, and the processor uses them for certain load and store operations. Quite a few of the DOS and BIOS APIs expect them to hold pointers to input and output parameters. And on an early x86, they were the best option for indirect addressing, a bit like the 6502’s X and Y registers.
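
A sketch of the idea; source and dest here are just hypothetical labels for a couple of bytes set aside somewhere:

    mov si, source  ; SI points at the input
    mov di, dest    ; DI points at the output
    mov al, [si]    ; load the byte SI points to
    mov [di], al    ; store it where DI points

source: db 42
dest:   db 0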

System registers

The instruction pointer, IP, controls the execution of code. It’s not directly accessible by programmers; instead, you change it through branching (jumping, in x86 parlance) and subroutine calls. In other words, you can mostly act like it’s not there.

The register that holds the flags, usually called FLAGS when it needs a name, also can’t be read or written directly. You can push it onto the stack, however, and manipulate it from there. The main three flags (carry, direction, and interrupt) also get special instructions to set and clear them, similar to the 6502.
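
For the rest, you go through the stack. A minimal sketch, forcing the carry flag on by hand:

    pushf           ; copy FLAGS onto the stack
    pop ax          ; ...and into a register we can actually work with
    or ax, 0001h    ; bit 0 of FLAGS is the carry flag; force it on
    push ax
    popf            ; write the modified value back into FLAGS

(STC would do this particular job in one instruction; the stack dance is for the flags that don’t get instructions of their own.)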

While the x86 has quite a few more flags than the 6502, most of them aren’t too important unless you’re delving deep into an operating system’s internals. The main ones to know about are the carry, zero, sign, direction, overflow, and interrupt flags. Most of them should be self-explanatory, while “overflow” works in a similar fashion to its 6502 counterpart. The direction flag is mostly used for string-handling instructions, which we’ll see in a later post.

One more register deserves a brief mention here. On the 286, it’s called the MSW, or “machine status word”. After that, it gets the official designation CR0. It’s used to control internal aspects of the processor, such as switching between real and protected modes or emulating a floating-point unit. I can’t think of a case where this series would use it, but now you know it’s there.

Segment registers

And then we come to the bane of many an assembly programmer, at least those of a generation or two ago: the segment registers. We’ll look at the x86 memory model in a moment; for now, just think of segments as something like overlapping banks of memory.

We’ve got four segment registers, all 16 bits wide even in 32-bit mode, for the code, data, stack, and extra segments. Their mnemonic names, conveniently enough, are initialisms: CS, DS, SS, and ES. CS points to the segment where execution is occurring; you can’t change it except with a “far” call, but you can read from it. SS holds the segment address of the stack, but you probably figured that one out already. DS is the default for reading and writing memory, while ES, as its name suggests, is for whatever you like.

Segment registers are weird. You can move values to and from them (except into CS, as I said), but you can’t operate on them. What you can do, however, is use them to “override” an address. For example, loading a value from memory uses DS as its base, but you can make it use ES instead: mov ax, [es:di] loads the value pointed to by DI, but in the ES segment.
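
In practice, the combination looks something like this. The segment value 0b800h is just an example (it happens to be the color text screen on a PC, but any number shows the mechanics):

    mov ax, 0b800h  ; a segment register can't take an immediate...
    mov es, ax      ; ...so load it through a general register
    xor di, di      ; DI = 0
    mov ax, [es:di] ; read from ES:DI instead of the default DS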

Memory model

And that leads us to the x86 memory model. It’s a bit convoluted, since the original 8086 was designed as a 16-bit system that could address 1 MB of memory. Something had to give, but Intel took a…nonstandard approach.

Every address on the x86 has two parts: segment and offset. (This is true even on today’s processors, but 64-bit mode is hardwired to treat all segments as starting at address 0.) In real mode, as with an older x86 running DOS, an actual memory address can be obtained by shifting the segment 4 bits to the left and adding the offset. Or, to put it in code: address = (segment << 4) + offset. Each segment, then, can address a 64K block of memory in a fashion much like banking in the 8-bit world.

The difference between one segment and the next is only 16 bytes, thanks to the 4-bit shift. That means that segments will overlap. The addresses b000:8123 and b800:0123, for example, refer to the same memory location: 0xb8123. In practice, this doesn’t matter too much; the segment portion is mostly used as a base address, while the offset is, well, an offset.
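
Run both addresses through the formula, and the overlap is plain:

; b000:8123 -> (0xb000 << 4) + 0x8123 = 0xb0000 + 0x8123 = 0xb8123
; b800:0123 -> (0xb800 << 4) + 0x0123 = 0xb8000 + 0x0123 = 0xb8123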

In protected mode, throw those last paragraphs out. Segments instead are indexes into a lookup table, creating a virtual memory system that essentially went unused until its mercy-killing by AMD when they brought out the x86-64. (The segment-based virtual memory scheme of protected mode, interesting though it seemed, was basically an exercise in the knapsack problem.) We won’t be worrying about protected mode much, though, so let’s move on.

Back to real mode, the world of DOS and early x86s. A little bit of arithmetic shows that the segment:offset addressing method allows access to 1 MB of memory, more or less. 0000:0000 is, of course, address 0, and it’s the lowest possible value. The highest possible is ffff:ffff, and that presents a problem. Add it up, and you get 0x10ffef. On old systems, this simply wrapped around the 1 MB “barrier”, to 0x0ffef. (Once processors gained address lines beyond the original 20, the address no longer wrapped, but some programs relied on that behavior, so a hardware hack was put into place. It’s called the A20 gate, and it was originally put in the keyboard controller, of all places. But I digress.)
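
Worked out with the same formula:

; ffff:ffff -> (0xffff << 4) + 0xffff = 0xffff0 + 0xffff = 0x10ffef
; with only 20 address lines, the leading 1 falls off: 0x0ffef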

Input/output

Also available to the x86 are the I/O ports. These are accessed using the IN and OUT instructions, a byte, word, or (in 32-bit mode) double-word at a time. They function like their own little address space, separate from main memory. The x86 architecture itself doesn’t really define which ports do what. That’s left to the PC platform—which will be the subject of the next post.
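
As a sketch of the mechanics (the port numbers here are PC conventions, not anything the CPU itself guarantees; 0x60 is the keyboard controller’s data port, and 0x3f8 is a serial port):

    in al, 60h      ; read a byte from port 0x60
    mov dx, 3f8h    ; ports above 0xff have to go through DX
    out dx, al      ; write the byte to port 0x3f8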

Modern operating systems also allow memory-mapped I/O access, but we’ll leave that alone for the time being. It’s far more useful when you go beyond the bounds of the 16-bit world.

Interrupts

Like the 6502, the x86 has support for interrupting the processor’s normal flow, but it goes about it in a different way. An interrupt can be caused by hardware; the old term IRQ, “interrupt request”, referred to this. But software can also directly invoke interrupts. As we saw last week, that’s how DOS implemented its API.

In real mode, the result is the same either way. The processor jumps to a specific memory location and executes a bit of code, then returns to what it was doing like nothing happened. We’ll see the details later on, but that’s the gist of it.

To be continued

So I’ve rambled on long enough for this post. Next up will be a look at the PC platform itself, at least as it stood a quarter century ago. That’ll be a look into deep history for many, but the choices made then still affect us today. Following that will be the dive into the deep end of “old-school” x86 assembly.

Magic and tech: defenses

Last time, we looked at how magic can augment a civilization’s offenses. Now, let’s turn to the other side of the coin and see what we can do about protecting ourselves against such force. It’s time to look at defense.

In the typical fantasy setting, sans magic, the common personal defense is, of course, armor. Sword-and-sorcery fiction often throws in some sort of spell-based defense, anything from walls of force to circles of protection to arrow-deflecting fields. And it’s a fairly common thing to give most potential offensive magic some sort of counterbalance. (The spell that can’t be blocked or resisted usually has a very good reason, and it’ll probably be a superweapon.) First, though, let’s look at what the mundane world has to offer.

Real-world protection

For personal protection, armor of various sorts has been around for millennia. Just about anything can be used as an armor material, as long as it does the job of preventing puncture or dissipating kinetic energy. Cloth, leather, many kinds of metal, wood, paper…you name it, somebody’s probably made armor from it. Exactly which material is used will depend on a civilization’s technological status, their geography (no metal deposits means no metallic armor), their cultural outlook on warfare, the local climate, and many other factors. In general, though, pretty much everybody will use some armor, stories of naked Viking berserkers notwithstanding.

In the time period we’re focusing on in this series, the later Middle Ages, the best armor tended to be made of metal. But metal was relatively expensive, so not every single levied soldier is going to be running around in full plate. The best armor would be had by those with the means to procure it: nobles, knights, and the like. A well-equipped army will have better protection, naturally, while hurried musters of villagers will net you a company of men in whatever they could find, just like with weapons.

Remember that armor is designed as protection first, and most of its qualities will follow. The main type of injury it protected against was puncture: cutting and stabbing. Blunt trauma was a very distant runner-up. We’ll take a look at medicine in a future post, but it’s helpful to think about how deadly even the smallest open wounds were back then. Without antibiotics or a working knowledge of sanitation and antiseptics, infection and sepsis were far more commonplace and far more dangerous. The best medicine was not to be wounded in the first place, and most armors show this.

Armor evolves alongside weapons. That’s why, once gunpowder spread to every battlefield in Europe, the heavier types of armor began to fall out of fashion. When fifty or more pounds of plate could no longer render you impervious to everything, why bother wearing it in the first place? (In modern times, materials science has advanced enough to create new plate that can take a shot, and now we see heavier armor coming back into vogue.)

Shields, in a sense, are nothing more than handheld armor. Some of them, depending on the culture, might have specialized defenses for a particularly common kind of attack. Others instead use a greater bulk of a weaker material, like the typical round shield made of hardwood. Again, guns tended to make most shields obsolete, at least until science could catch up. Today’s riot shields would make a 14th-century soldier salivate, but they’re based on the same old principles.

Larger-scale defenses work a different way. The usual suspects for city protection are walls, ramparts, moats, killing fields, and the like. Each one has its own purpose, its own specific target. Some of them fell by the wayside, victims of progress—how many modern cities have walls?—and some were remade to keep up. Most of them represent a significant allocation of materials and labor; bigger cities can afford that, but smaller towns might not be able to.

Magically reinforced

When the world becomes more dangerous as a result of weaponized magic, it stands to reason that new defenses will be developed to protect against such threats. One of the best ways of preventing injury, as we know, is never being hit at all. A spell to sharpen one’s senses lets a soldier react more quickly to an attack, meaning that there’s a better chance of dodging it. But that’s a waste of magical talent. Armies can comprise hundreds or even thousands of soldiers, and there’s not enough time (or enough mages) to enchant them all on the eve of battle.

Our “easy out” of stores of magical energy won’t help much here, so what can we do? Since personal defenses are, well, personal, and we’ve already said that very few people are mages, it doesn’t seem like we have a lot of options. Enchanted materials are the best bet. Armor can be fortified against breaking, making it harder to penetrate. It’s not perfect, but it’s a good start, and it will take a lot of heat off our soldiers.

It’ll also have a secondary effect, one that will come to the fore in later years. Harder, stronger materials push back the date of gunpowder-induced obsolescence by quite a while. A fortified plate across your chest won’t keep you from feeling a bullet, but it’ll stop that bullet from piercing your skin and hitting something vital. Like Kevlar jackets today, these spread the impact energy out, which lowers the pressure on any one spot. That’s enough to save lives, especially if the enchantment isn’t too costly. And it won’t be, because it’s valuable enough that people will research better ways of doing it.

Fortified shields benefit in the same way, but there we get a side bonus. Shields can become stronger or they can become lighter. The second option might be a better one, if mobility is the goal.

Protecting against magical attacks is far tougher. Wards are the best way in our setting, but they have a severe downside: one ward only counters one specific type of attack. We’ve seen that magic gives us a bunch of new weapons. Warding against all of them is inconvenient at best, impossible at worst. This is a case for good espionage (another post idea!) and scouting—if you know what to expect, you’ll be able to defend against it. Still, armor can hold a few different wards, and those who can afford it will likely invest in a bit of extra protection.

On the large scale, we see the same ideas, just bigger. Wards can be made on walls, for example, and a gate can receive a fortifying enchantment. The increased size makes these ludicrously expensive, but can you put a price on the lives of your citizens? Moats, however, become practically useless, and drawbridges are little more than a degenerate case of a gate.

Picking up the pieces

Besieged settlements in our magical setting are far more perilous than anything medieval Europe knew. In pitched battles, too, the advantage will tend to go to the attacker. That isn’t too far off from what happened in our own world, from the Renaissance to the early days of the Industrial Revolution. Once gunpowder reigned supreme, defense took a back seat.

It’s the strategy and tactics that will change the most. Protracted sieges are less of a risk for the offensive side, as you can always bomb the city into oblivion. Staying in one place will only get you killed, so guerilla warfare becomes much more attractive for an outnumbered foe. It might be better for a defender to give up the city and work from the shadows as an organized resistance movement.

Magic, then, creates an asymmetry in warfare. This little bit of it gives the offense the edge. Defense needs a lot more help. Of course, it’s said that the best defense is a good offense. In our magical world, that won’t be so much a witty aphorism as a standard doctrine.

Let’s make a language – Part 14c: Derivation (Ardari)

Ardari takes a different approach for its word derivation. Instead of compounding, like Isian does, Ardari likes stacking derivational affixes. That doesn’t mean it totally lacks compounds, just that they take a bit of a back seat to affixes. Therefore, we should start with the latter.

Ardari’s three main parts of speech—noun, verb, and adjective—are mostly separate. Sure, you can use adjectives directly as nouns, and we’ve got ky to create infinitives, but there are usually insurmountable boundaries surrounding these three. The most regular and productive derivational affixes, then, are the ones that let us pass through those boundaries.

Making nouns

To make new nouns from other types of words, we’ve got a few choices:

  • -önda creates abstract nouns from verbs (luchönda “feeling”)
  • -kön makes agent nouns, like English “-er” (kwarkön “hunter”)
  • -nyn creates patient nouns from verbs, a bit like a better “-ee” (chudnyn “one who is guarded”)
  • -ymat takes an adjective and makes an abstract noun from it (agrisymat “richness”)

All of these are perfectly regular and widely used in the language. The nouns they create are, by default, neuter. -kön and -nyn, however, can be gendered: kwarköna denotes a male hunter, kwarköni a huntress.

Two other important nominal suffixes are -sö and -ölad. The first switches an abstract or mass noun to a concrete or count noun, while the second does the opposite. Thus, we have ichurisö “a time of peace”, oblasö “a drop of water”, sèdölad “childhood”, or kujdölad “kingship”. (Note that a final vowel disappears when -ölad is added.)

Ardari also has both a diminutive -imi and an augmentative -oza. These work on nouns about like you’d expect: rhasimi “puppy”, oskoza “ocean”. However, there is a bit of a sticking point. Diminutive nouns are always feminine, and augmentatives always masculine, no matter the original noun’s gender. This can cause oddities, especially with kinship terms: emönimi “little brother” is grammatically feminine!

The other main nominal derivation is po- (p- before vowels). This forms antonyms or opposites, like English “un-” or “non-”. Examples include poban “non-human” and polagri “gibberish”.

Most other derived nouns are, in fact, adjectives used as nouns, as we’ll see below.

Making adjectives

First of all, adjectives can be made by one of three class-changing suffixes:

  • -ösat makes an adjective from an abstract noun (idyazösat “warlike”)
  • -rät makes an adjective from a concrete noun (emirät “motherly”)
  • -ròs creates a “possibility” adjective from a verb (dervaròs “livable”)

Diminutives and augmentatives work as for nouns, but they take the forms -it and -ab, and they don’t alter gender, as Ardari adjectives must agree with head nouns in gender. Some examples: pòdit “oldish”, nejab “very wrong”.

We’ve already seen the general adjective negator ur- in the Babel Text. It works very similarly to English un-, except that it can be used anywhere. (The blended form u- from the Babel Text’s ulokyn is a special, nonproductive stem change.)

Most of the other adjective derivations are essentially postpositional phrases with the order reversed. Here are some of the most common:

  • nèch-, after (nèchidyaz “postwar”)
  • jögh-, before (jötulyan “pre-day”)
  • olon-, middle, centrally (olongoz “midnight”)
  • är-, above or over (ärdaböl “overland”, from dabla)
  • khow-, below or under (khowdyev “underground”)

Many of these are quickly turned into abstract nouns. For instance, olongoz is perfectly usable as a noun meaning “midnight”. Like any other adjective-turned-noun, it would be neuter: olongoze äl “at midnight”.

Making verbs

There are only two main class-changing suffixes to make verbs. We can add -ara to create a verb roughly meaning “to make X”, as in khèvara “to dry”. The suffix -èlo works on nouns, and its meaning is often more nuanced. For example, pämèlo “to plant”, from pämi “plant”.

Repetition, like English “re-“, is a suffix in Ardari. For verb stems ending in a consonant, it’s -eg: prèlleg- “to relearn”. Vowel-stems instead use -vo, as in bejëvo- “to rethink”.

Ardari also has a number of prefixes that can be added for subtle connotations. The following table shows some of these, along with their English equivalents.

Prefix   Meaning            English     Example
ej-      for, in favor of   pro-        ejsim “to speak for”
èk-      against            anti-       èksim “to speak against”
jès-     with               co-         jèzgrät “to co-create”
nich-    wrongly, badly     mis-        nichablon “to mishear”
ob-      after              post-/re-   opsim “to reply”
sèt-     before             pre-        sètokön “to precut”
wa-      into               in-         wamykhes “to inquire”
zha-     out of             ex-         zhalo “to expire”

Making compounds

Compounds aren’t as common in Ardari as they are in Isian, but they’re still around. Any noun can be combined with any other noun or adjective, with the head component coming last, as in the rest of the language.

Adjective-noun combinations are the most regular, like chelban “youth, young person”. Noun-agent is another productive combination: byzrivirdökön “bookseller”. Noun-noun compounds tend to be idiosyncratic: lagribyzri “dictionary”, from lagri “word” and byzri “book”.

Reduplicated adjectives are sometimes used for colloquial superlatives: khajkhaj “topmost”, slisli “most beautiful”.

A few words derived from nouns or verbs sit somewhere between compounds and derivational morphemes. An example is -allonda, from allèlönda “naming”. This one works a bit like English “-onomy”: palallonda “astronomy”. Another is -prèllönda, more like “-ology”: ondaprèllönda “audiology”. Finally, -benda and -bekön, from bejë-, work like “-ism” and “-ist”: potsorbekön “atheist” (po- + tsor + -bekön).

Make some words

As before, these aren’t all of the available derivations for Ardari. They’re enough to get started, though, and enough to accomplish our stated goal: creating lots of words!

Assembly: the precursor

Before Windows was a thing, there was DOS. Before today’s 64-bit processors, or even their 32-bit predecessors, there was the 16-bit world. In a way, 16-bit DOS is a bridge between the olden days of 8-bit microprocessors like the 6502 and the modern behemoths of computing, like the A10 in the PC I’m writing this on, or the Core i5 in the next room.

A while back, I started writing a short series of posts about assembly language. I chose the 6502, and one of the reasons why was the availability of an online assembler and emulator. Very recently, a new challenger has come into this field, Virtual x86. Even better, it comes with a ready-to-go FreeDOS image with the NASM assembler pre-installed. Perfect.

So, I’m thinking about reviving the assembly language series, moving to the (slightly) more modern architecture of 16-bit x86 and DOS. It’s a little more relevant than the 6502, as well as much more forgiving, but the fundamentals of assembly are still there: speed, size, power. And, since 16-bit code doesn’t run at all on 64-bit x86 CPUs, we don’t have to worry as much about bad habits carrying over. The use of DOS (FreeDOS, specifically) helps, too, since essentially nothing uses it these days. Thus, we can focus on the code in the abstract, rather than getting bogged down in platform-specific details, as we’d have to do if we looked at “modern” x86.

A free sample

Assembly in this old-school fashion is fairly simple, though not quite as easy as the venerable 6502. Later on, we can delve into the intricacies of addressing modes and segment overrides and whatnot. For now, we’ll look at the 16-bit, real-mode, DOS version of everyone’s first program.

org 100h                ; COM programs are loaded at offset 100h

    mov dx, data        ; DS:DX points to our string
    mov ah, 09h         ; DOS function 09h: write string to console
    int 21h
    mov ah, 04ch        ; DOS function 4ch: terminate program
    int 21h

data:
    db 'Hello, World!$' ; strings for function 09h end with '$'

All this little snippet does is print the classic string to the screen and then exit, but it gives you a good idea of the structure of x86 assembly using what passes for the DOS API. (Oh, by the way, I copied the code from Virtual x86’s FreeDOS image, but I don’t see any other way you could write it.)

Here are the highlights:

  • org 100h defines the program origin. For the simplest DOS programs (COM files), this is always 100h, or 0x100. COM files are limited to a single 64K segment, and the first 256 bytes are reserved for the operating system, to hold command-line arguments and things like that.

  • The 16-bit variety of x86 has an even dozen programmer-accessible registers, but only four of these are anywhere close to general-purpose. These are AX, BX, CX, and DX, and they’re all 16 bits wide. However, you can also use them as byte-sized registers. AH is the high byte of AX, AL the low byte, and so on with BH, BL, CH, CL, DH, and DL. Sometimes, that’s easier than dealing with a full word at a time.

  • mov is the general load/store instruction for x86. It’s very versatile; in fact, it’s Turing-complete. Oh, and some would say it’s backwards: the first argument is the destination, the second the source. That’s just the way x86 does things (unless you’re one of those weirdos using the GNU assembler). You get used to it.

  • int, short for “interrupt”, is a programmatic way of invoking processor interrupts. The x86 architecture allows up to 256 of these, though the first 16 are reserved for the CPU itself, and many of the next are claimed by the BIOS. DOS uses a few of its own for its API. Interrupt 0x21 (21h) is the main one.

  • Since there are only 256 possible interrupts and far more useful operations, the OS needs some way of subdividing them. For DOS, that’s what AH is for. A “function code” is stored in that register to specify which API function you’re calling. The other registers hold arguments or pointers to them.

  • Function 0x09 (09h) of interrupt 0x21 writes a string to the console. The string’s address is stored in DX (with some segment trickery we’ll discuss in a later post), and it must end with a dollar sign ($). Why? I don’t know.

  • Function 0x4c (04ch) exits the program. AL can hold a return code. Like on modern operating systems, 0 is “success”, while anything else indicates failure.

  • db isn’t an actual assembly instruction. Much like in 6502-land, it defines a sequence of literal bytes. In this case, that’s a string; the assembler knows how to convert this to an ASCII sequence. (“Where’s Unicode?” you might be wondering. Remember that DOS is halfway to retirement age. Unicode didn’t even exist for most of DOS’s lifetime.)

Going from here

If you like, you can run the FreeDOS image on Virtual x86. It comes with both source and executable for the above snippet, and the original source file even includes compilation directions. And, of course, you can play around with everything else the site offers. Meanwhile, I’ll be working on the next step in this journey.