Assembly: the building blocks – Prose Poetry Code

(Editor’s note: Sorry about the formatting for this one. The syntax highlighter I use on here had absolutely no idea what to do with assembly—I think it thought it was Perl—and it screwed everything up. To turn it off, I had to wrap each code sample in HTML code blocks, which killed the indentation.)

So here we are. One of the reasons I chose the 6502 for this little series is because it has such a simple assembly language. It could fit in one post, even covering the intricacies of addressing, the stack, and other bits we’ll get into. Compare that to, say, 16-bit x86, with its fairly complex addressing modes, its segmented memory model, and a completely different I/O system. Add to that the requirement to have an OS, even one such as FreeDOS, and you have quite the task just getting started. The 6502, by contrast, is easy, at least as far as any assembly language can be called easy.

The idea of assembly

In most modern programming languages, things are a bit abstract. You have functions, flow control statements (if, while, and for in C, for example), variables, maybe even objects and exceptions and other neat stuff like that. You’ve got a big standard library full of pre-built routines so you don’t have to reinvent the wheel. In some styles of programming, you aren’t even supposed to care about the low-level details of just how your code runs.

With assembly, all that is gone. It’s just you and the machine. You don’t write assembly code on the level of functions or even statements. You write instructions. You’re telling the computer exactly what to do at each step, and you have to tell it in its own language.

That leads us to a couple of basic concepts regarding assembly. First, each processor has an instruction set. This, obviously, is the set of instructions that it understands. Typically, these all have a prose description like “load accumulator” or “logical shift right”. This notation is a convenience for those studying the instruction set, not for those actually using it. The processor itself doesn’t understand them; it works in numbers like (hexadecimal) $A9 and $4A, what are often called opcodes (a shortened version of “operation codes”). Assembly programmers get a compromise between these extremes: a set of mnemonics, one for each kind of instruction that the processor understands. These are abbreviations, usually only a few letters—the 6502, for example, always uses 3-letter mnemonics. In this form, the two instructions above would be written as LDA and LSR. (Most assemblers nowadays are case-insensitive, so you can write lda and lsr if you want, and I will in assembly language program listings. For describing the instructions themselves, however, I’ll stick to all-caps.)

The second thing to know about assembly also regards its lack of abstractions, but concerning the computer’s memory. Especially on early microprocessors like the 6502, the assembly programmer needs intimate knowledge of the memory layout and how the CPU can access it. Remember, we can’t call a function like in C (f(x,y)). We have to convert even that to a language that the computer understands. How we do that depends very much on the specific system we’re using, so now it’s time to look at the 6502 in particular.

Addressing the 6502

Before we get to the meat of 6502 assembly, we need to look at how a programmer can communicate with the processor. Obviously, most of the work will be done by the registers we saw last time, namely A, X, and Y. Of course, three bytes of usable data isn’t that much, so we’ll be accessing memory almost constantly. And the 6502 offers a few ways to do that—called addressing modes—although some only work with certain instructions.

The first way we can access data not in a register is by putting it in the instruction itself, as an immediate value. On 6502 assemblers, this is usually indicated by a #. For example, LDA #$10 places the value 16 (or $10 in hexadecimal) into the accumulator.

If we want to work with a known location of memory, we might be able to give that location to the instruction using absolute addressing. For example, the Commodore 64’s screen memory is set up so that the upper-left character on the screen is at address $0400. To store the value in the A register there, we could use STA $0400. When using zero-page addresses ($00xx), we can omit the top byte: LDA $FE. This actually saves a byte of memory, which is a lot more important on a system with only 64K than on today’s multi-gig computers.

Most of the other addressing modes of the 6502 are in some way indirect, using a value in memory or a register like a pointer (for those of you that know a language like C). These include:

Absolute indirect. Only one instruction actually uses this one. JMP ($FFFE) jumps to the address stored at memory location $FFFE. Since the 6502 has a 16-bit address space, this actually uses the address you give and the one right after it—in this case, $FFFE and $FFFF. (The 6502, like the x86, is little-endian, meaning that the first byte is the low one.)
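Relative. This mode is only used by the branching instructions, and it indicates a relative “displacement”, counted from the instruction after the branch. BEQ $08, for example, would jump forward 8 bytes “if equal”. Negative values are allowed, but they’re encoded as two’s-complement numbers (basically, think of N bytes back as $100 - N ahead): BNE $FE jumps back 2 bytes, landing on the branch itself, which makes an awfully effective infinite loop as long as the Z flag stays clear. (In practice, you give the assembler a label and it works out the displacement for you.)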
Indexed. This is where the X and Y registers start coming into their own. With indexed addressing, one of these is added to the address you give (either 2-byte absolute or 1-byte zero-page). An example would be STA $0400,X, which stores the accumulator value in the address $0400 + X. So, if the X register contains $10, this writes to $0410. Note that some instructions can only use X, some can only use Y, and a few are limited to zero-page addresses.
Indexed indirect and indirect indexed. Don’t worry about mixing the names up on these; they don’t matter. What does matter is how they work. These both use a memory location as a pointer and a register value as an index. The difference is where they add the index in. Indexed indirect adds the X register to the address you give, and creates a pointer from that. Indirect indexed, on the other hand, adds the Y register to the stored value, then uses that as the pointer.

As an example, let’s say that memory locations $30 and $31 each contain the value $80, while $40 and $41 both have $20. Also, both X and Y are set to $10. In this setup, indexed indirect (LDA ($30,X)) takes the memory location $30 + X (i.e., $40) and loads whatever is at the address stored there, essentially as if you’d written LDA $2020. Indirect indexed (LDA ($30),Y) instead takes what is stored at the location you give ($8080, in our example), then adds Y ($10) to that to get the final pointer: $8080 + $10 = $8090. In this case, the effect is the same as LDA $8090.
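Here’s a minimal sketch of that setup in code, in case you want to step through it in an emulator. It assumes the usual assembler syntax used elsewhere in this post; the zero-page locations and values are the ones from the example above, and nothing meaningful is stored at $2020 or $8090, so the point is only to watch which address each load reads.

```asm
setup:
    lda #$80
    sta $30     ; $30/$31 now hold the pointer $8080
    sta $31
    lda #$20
    sta $40     ; $40/$41 now hold the pointer $2020
    sta $41
    ldx #$10
    ldy #$10

    lda ($30,X) ; indexed indirect: pointer at $30 + X = $40, so this reads $2020
    lda ($30),Y ; indirect indexed: pointer at $30 is $8080, plus Y, so this reads $8090
```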

Finally, assemblers allow the use of labels, so you don’t have to worry about exact addresses. These are the closest you’re going to get to something like named functions. In assembly source code, they’re defined like this: label:. Later on, you can refer to them like you would any memory address, e.g., LDA label or BEQ label. One of the assembler’s jobs is to replace the labels with the “real” addresses, and it’s pretty good at that.

The instructions

After all that, the actual instruction set of the 6502 is refreshingly uncomplicated. All told, there are only a few dozen possible instructions, all of them performing only the most basic of actions. Yet this small arsenal was enough for a generation of 8-bit computers.

Many assembly language references put the instructions in alphabetical order by mnemonics. But the 6502’s set is so small that we can get away with ordering them by what they do. As it turns out, there aren’t too many categories, only about a dozen or so. Also, I’ll have examples for some of the categories, but not all. In the code samples, a ; marks a comment; the rest of the line is ignored, just like // or #, depending on your favorite language.

Load and store

Especially on older, less capable systems like the 6502, moving data around is one of the most important tasks. And there are two ways that data can go: from memory into a register or the other way around. For our CPU, moving a byte from memory to a register is a load, while sending it from a register to a memory location is a store. (The x86, to illustrate a different way of doing things, uses a single instruction, MOV, to do both of these.)

There are three “user” registers on the 6502, and each one has a load and a store instruction. To load a value into a register, you use LDA, LDX, or LDY. To store a value from one of them into memory, it’s STA, STX, and STY. (I think you can figure out which one uses which register.)

In terms of addressing, these instructions are the most well-rounded. The accumulator gives you the most options here, offering immediate, absolute, and all the indirect options. With X and Y, you can’t use indirect addressing, and you can only use the other of the two registers as an index. So you can write LDX $30,Y, but not LDX $30,X.

This code example doesn’t do too much. It sets up the first two memory locations as a pointer to $0400, then writes the byte $36 to that location. For the online assembler I’m using, that makes a blue dot on the left side of the screen, in the middle. On a real C64 or Apple II, that address is the top-left corner of the screen, so it will display whatever the system thinks $36 should be, probably the number 6.



start:
    lda #$00    ; we need 2 loads & 2 stores
    sta $00     ; to set up a 16-bit address
    lda #$04
    sta $01
    lda #$36
    ldy #$00    ; clear Y to use as an index
    sta ($00),Y ; stores our byte at $0400

Arithmetic

Besides shuffling data around, computers mainly do math. It’s what they’re best at. As an older microprocessor, the 6502 had to cut corners; by itself, it can only add and subtract, and then only using the accumulator. These two instructions, ADC and SBC, are a little finicky, and they’re our first introduction to the processor status or “flags” register, P. So we’ll take a quick diversion to look at it.

The P register on the 6502, like all its other registers, is a single byte. But we usually don’t care about its byte value as a whole. Instead, we want to look at the individual bits. Since there are eight bits in a byte, there are eight possible flags. The 6502 uses seven of these, although the online assembler doesn’t support two of those, and a third was rarely used even back in the day. So that leaves four that are important enough to mention here:

Bit 7, the Negative (N) flag, is changed after most instructions that produce a result, such as loads and arithmetic. It’ll be equal to the high bit of that result, which indicates a negative number if you’re treating the byte as signed.
Bit 6, Overflow (V), is set when signed arithmetic gives a result with the wrong sign, such as adding two positive numbers and getting a negative one.
Bit 1 is the Zero (Z) flag, which is set if the last load, arithmetic, or comparison instruction produced a result of 0, and cleared otherwise. (Stores don’t change any flags.)
Bit 0, the Carry (C) flag, is the important one. It’s set when an addition produces a result that can’t fit into a byte, and cleared when a subtraction has to borrow. The shift and rotate instructions also use it.

Now, the two arithmetic instructions are ADC and SBC, which stand for “add with carry” and “subtract with carry”. The 6502 doesn’t have a way to add or subtract without involving the carry flag! So, if we don’t want it messing with us, we need to clear it (CLC, which we’ll see again below) before we start doing our addition. Conversely, before subtracting, we must set it with the SEC instruction. (The reason is that SBC treats the carry as an inverted borrow: it computes A - M - (1 - C), so a clear carry going in subtracts one extra.)

Also, these instructions only work with the accumulator and a memory address or immediate value. You can’t directly add to X or Y with them, but that’s okay. In the next section, we’ll see instructions that can help us.

The code example here builds on the last one. In the online assembler, it displays a brown pixel next to the blue one. On real computers, it should put a 9 to the right of the 6, because 8-bit coders have dirty minds.



    start:
        lda #$00    ; we need 2 loads & 2 stores
        sta $00     ; to set up a 16-bit address
        lda #$04
        sta $01
        lda #$36
        ldy #$00    ; clear Y to use as an index
        sta ($00),Y ; stores our byte at $0400
        clc         ; always clear carry first
        adc #$03    ; A += 3
        iny         ; move the position right 1
        sta ($00),Y ; store the new value
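The carry flag is also what makes arithmetic on numbers bigger than a byte possible. As a rough sketch, here is $01F0 + $0020 done one byte at a time. The zero-page locations $10 through $13 are arbitrary choices for illustration, not anything the hardware requires.

```asm
    lda #$f0
    sta $10     ; low byte of $01F0
    lda #$01
    sta $11     ; high byte of $01F0

    clc         ; no carry going into the low byte
    lda $10
    adc #$20    ; $F0 + $20 = $110: A = $10, carry set
    sta $12     ; low byte of the result
    lda $11     ; loads and stores leave the carry alone
    adc #$00    ; $01 + $00 + carry = $02
    sta $13     ; high byte: the result is $0210
```

Notice there is only one CLC, before the low byte. The second ADC is supposed to pick up whatever carry the first one produced.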

Increment and decrement

The INY (“increment Y register”) instruction I just used is one of a group of six: INC, DEC, INX, DEX, INY, DEY.

All these instructions do is add or subtract 1, an operation so common that just about every processor in existence has dedicated instructions for it, which is also why C has the ++ and -- operators. For the 6502, these can work on either of our index registers or a memory location. (If you’re lucky enough to have a later model such as the 65C02, you also have INA and DEA, which work on the accumulator.)

Our code example this time is an altered version of the last one. This time, instead of incrementing the Y register, we increment the memory location $00 directly. The effect, though, is the same.



    start:
        lda #$00    ; we need 2 loads & 2 stores
        sta $00     ; to set up a 16-bit address
        lda #$04
        sta $01
        lda #$36
        ldy #$00    ; clear Y to use as an index
        sta ($00),Y ; stores our byte at $0400
        clc         ; always clear carry first
        adc #$03    ; A += 3
        inc $00     ; move the position right 1
        sta ($00),Y ; store the new value

Flags

We’ve already seen CLC and SEC. Those are part of a group of instructions that manipulate the flags register. Since we don’t care about all the flags, there’s only one more of these that is important: CLV. All it does is clear the overflow flag, which can come in handy sometimes.

By the way, the other four are two pairs. CLI and SEI work on the interrupt flag, which the online assembler doesn’t support. CLD and SED manipulate the decimal flag, which doesn’t seem to get much use.

There’s no real code example this time, since we’ve already used CLC. SEC works the same way, and I can’t think of a quick use of the overflow flag.
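For the curious, here is a minimal sketch of how the overflow flag gets set in the first place, followed by CLV to clear it again. It isn’t a practical use of the flag, just a way to watch it change in an emulator’s register display.

```asm
    clc
    lda #$50    ; +80 as a signed byte
    adc #$50    ; 80 + 80 = 160, too big for a signed byte
                ; A = $A0, which reads as -96, so V is now set
    clv         ; put V back to 0
```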

Comparison

Sometimes, it’s useful to just compare numbers, without adding or subtracting. For this, the 6502 offers a trio of arithmetic comparison instructions and one bitwise one.

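CMP, CPX, CPY each compare a value in memory (or an immediate value) to the one in a register (CMP uses A, the others are obvious). Internally, they subtract the memory value from the register and throw the result away, keeping only the flags. If the register value is less than the memory one, treating both as unsigned, the C flag is cleared. Otherwise, the C flag gets set. If they’re equal, it also sets the Z flag. The N flag just copies the high bit of the discarded result, so on its own it isn’t a reliable “less than” test.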

BIT works a little differently. It sets the N and V flags to the top two bits of the memory location (no indirection or indexing allowed). Then, it sets the Z flag if the bitwise-AND of the memory byte and the accumulator is zero, i.e., if they have no 1 bits in common.

Comparison instructions are most useful in branching, so I’ll hold off on the example until then.

Branching

Branching is how we simulate the higher-level control structures like conditionals and loops. In the 6502, we have the option of conditionally hopping around our code by using any of nine different instructions. Eight of these come in pairs, each pair based on one of the four main flags.

BCC and BCS branch if the C flag is clear (0) or set (1), respectively.
BNE (“branch not equal”) and BEQ (“branch equal”) do the same for the Z flag.
BVC and BVS branch based on the state of the V flag.
BPL (“branch plus”) and BMI (“branch minus”) work on the N flag.

All of these use the “relative” addressing mode, limiting them to short jumps: at most 127 bytes forward or 128 bytes back, counted from the instruction after the branch.

The ninth instruction is JMP, and it can go anywhere. You can use it with a direct address (JMP $FFFE) or an indirect one (JMP ($0055)), and it always jumps. Put simply, it’s GOTO. But that’s not as bad as it sounds. Remember, we don’t have the luxury of while or for. JMP is how we make those.

This code sample, still building on our earlier attempts, draws ten dots (or the digits 0-9) on the screen.



    start:
        lda #$00
        sta $00
        lda #$04
        sta $01
        lda #$30
        ldy #$00
    loop:
        sta ($00),Y ; write the byte to the screen
        clc
        adc #$01    ; add 1 to A for next character
        iny         ; move 1 character to the right
        cpy #$0a    ; have we reached 10 yet?
        bne loop    ; if not, go again

For comparison, a pseudo-C version of the same thing:

char* screen = (char*)0x0400;
char value = 0x30;
for (int i = 0; i < 10; i++) {
    screen[i] = value;
    value++;
}
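The loop above uses CPY with BNE, which only asks “equal or not?”. For a “less than” test, the carry flag is the one to check after a comparison. A small sketch, assuming the value to test lives at zero-page location $10 (an arbitrary choice for illustration):

```asm
    lda $10     ; value to test
    cmp #$3a    ; compare A with $3A; A itself is unchanged
    bcc ok      ; carry clear means A < $3A, so leave it alone
    lda #$39    ; otherwise cap it at $39
ok:
    sta $0400   ; show the result in the top-left corner
```

In C terms, that’s roughly if (value >= 0x3a) value = 0x39;.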

The stack

The stack, on the 6502 processor, is the second page of memory, running from address $0100 to $01FF. It can be used to store temporary values, addresses, and other data, but it’s all accessed through the stack pointer (SP), which conventionally starts at the top of the page and moves down as the stack fills. You push a value onto the stack, then pop (or pull, to use 6502 terminology) it back off when you need it back.

We’ve got an even half dozen instructions to control the stack. We can push the accumulator value onto it with PHA, and we can do the same with the flags by using PHP. (Not the programming language with that name, thankfully.) Popping (or pulling, if you prefer the archaic term) the value pointed to by the SP uses PLA and PLP. The other two instructions, TSX and TXS, let us copy the stack pointer to the X register, or vice versa.
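A minimal sketch of the most common stack chore, saving the accumulator while it gets used for something else. The TXS at the top assumes this is the very start of a program; doing it later would throw away anything already on the stack.

```asm
    ldx #$ff
    txs         ; point SP at the top of the stack page ($01FF)

    lda #$36
    pha         ; save A on the stack
    lda #$00    ; A is now free for other work
    pla         ; pull the saved value back: A is $36 again
```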

Subroutines

Branches give us flow control, an important part of any high-level programming. For functions, the assembly programmer uses subroutines, and the 6502 has a pair of instructions that help us implement them. JSR (“jump to subroutine”) is an unconditional jump like JMP, except that it pushes a return address to the stack before jumping. Strictly speaking, that’s the address of the last byte of the JSR instruction, and RTS adds 1 to it on the way back. Since each call takes two bytes and we only have a page of stack space, this limits how “deep” you can go. When the subroutine is done, the RTS instruction sends you back to where you started, just after the JSR.

The code sample here shows a little subroutine. See if you can figure out what it does.



    start:
        lda #$00
        sta $00
        lda #$04
        sta $01
        lda #$31
        ldy #$09
        jsr show    ; call our subroutine
        jmp end     ; jump past when we're done
    show:
        sta ($00),Y ; write the byte to screen mem
        clc
        adc #$01    ; add 1 to accumulator
        dey
        bne show    ; loop until Y = 0
        rts         ; return when we're done
    end:            ; label so we can skip the subroutine

Bitwise

We’ve got a total of seven bitwise instructions (not counting BIT, which is different). Three of these correspond to the usual AND, OR, and XOR operations, and they work on a memory location and the accumulator. AND has an obvious name, ORA stands for “OR with accumulator”, and EOR is “exclusive OR”. (Why they used “EOR” instead of “XOR”, I don’t know.) If you’ve ever used the bit-twiddling parts of C or just about any other language, you know how these work. These three instructions also change the Z and N flags: Z if the result is 0, N if the highest bit of the result is set.

The other four manipulate the bits of memory or the accumulator themselves. ASL is “arithmetic shift left”, identical to the C << operator, except that it only works one bit at a time. The high bit is shifted into the C flag, while Z and N are altered like you’d expect. LSR (“logical shift right”) works mostly in reverse: every bit is shifted down, a 0 is moved into the high bit, and the low bit goes into C.

ROL and ROR (“rotate left/right”) are the oddballs, as few higher-level languages have a counterpart to them. Really, though, they’re simple. ROL works just like ASL, except that it shifts whatever was in the C flag into the low bit instead of always a 0. ROR is the same, but the other way around, putting the C flag’s value into the high bit.
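Here’s a short sketch of these in action. The first half runs the three logical operations on the accumulator; the second half shows why rotating through the carry is handy, by doubling a 16-bit number with ASL and ROL. The locations $10 and $11 are arbitrary zero-page addresses picked for illustration.

```asm
    lda #$36
    and #$0f    ; keep the low four bits: A = $06
    ora #$30    ; set bits 4 and 5: A = $36
    eor #$ff    ; flip every bit: A = $C9

    lda #$81
    sta $10     ; low byte of $0081
    lda #$00
    sta $11     ; high byte of $0081
    asl $10     ; $81 << 1: $10 holds $02, and bit 7 went into carry
    rol $11     ; carry moves into bit 0: $11 holds $01, so the pair is $0102
```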

Transfer

We could move bytes between the A, X, and Y registers by copying them to memory or using the stack instructions. That’s time-consuming, though. Instead, we’ve got the TAX, TAY, TXA, and TYA instructions. These transfer a value from one register to another, with the second letter of the mnemonic as the source and the third as the destination. (TAX copies A to X, etc.) The flags are set how you’d expect.
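The transfers are also how you get around ADC and SBC only working on the accumulator. As a quick sketch, this adds 5 to the X register by taking a detour through A:

```asm
    ldx #$10
    txa         ; copy X into A
    clc
    adc #$05    ; A = $15
    tax         ; copy the sum back: X = $15
```

Whatever was in A beforehand is lost, so push it with PHA first if you still need it.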

The other guys

There are two other 6502 assembly instructions that don’t do too much. BRK causes an interrupt, which the online assembler can’t handle and isn’t that important for user-level coding. NOP does nothing at all. It’s used to fill space, basically.

Next time

Whew. Maybe I was wrong about fitting all this in one post. Still, that’s essentially everything you need to know about the 6502 instruction set. The web has a ton of tutorials, all of them better than mine. But this is the beginning. In the next part, we’ll look at actually doing things with assembly. That one will be full of code, too.