Assembly: the first steps

(Editor’s note: I pretty much gave up on the formatting for this one. Short of changing to a new syntax highlighter, there’s not an awful lot I can do for it, so I just left it as is. I did add an extra apostrophe on one line to stop the thing from thinking it was reading an unclosed string. Sorry for any trouble you might have when reading.)

As I’ve been saying throughout this little series, assembly is the closest we programmers can get to bare metal. On older systems, it was all but necessary to forgo the benefits of a higher-level language, because the speed gains from using assembly outweighed the extra developer time needed to write it. Nowadays, of course, the pendulum has swung quite far in the opposite direction, and assembly is usually only used in those few places where it can produce massive speedups.

But we’re looking at the 6502, a processor that is ancient compared to those of today. And it didn’t have the luxury of high-level languages, except for BASIC, which wasn’t much better than a prettified assembly language. The 6502, before you add in the code stored in a particular system’s ROM, couldn’t even multiply two numbers, much less perform complex string manipulation or operate on data structures.

This post has two code samples that I wrote, and they demonstrate two things. First, they show you what assembly looks like, in something more than the little excerpts from last time. Second, they illustrate just how far we’ve come. These aren’t all that great, I’ll admit, and they’re probably not the fastest or smallest subroutines around. But they work for our purposes.

A debug printer

Debugging is a very important part of coding, as any programmer can (or should) agree. Assembly doesn’t give us too much in the way of debugging tools, however. Some assemblers do, and you might get something on your particular machine, but the lowest level doesn’t even have that. So this first snippet prints a byte to the screen as two hexadecimal digits.

; Prints the byte in A to the address ($10),Y
; as 2 characters, then a space
printb:
    tax          ; save for later
    ; Some assemblers prefer these as "lsr a" instead
    lsr          ; shift A right 4 bits
    lsr          ; this moves the high bits to the bottom
    lsr
    lsr
    jsr outb     ; we use a subroutine for each character
    txa          ; reload A
    and #$0F     ; mask out the top 4 bits
    jsr outb     ; now print the bottom 4 bits
    lda #$20     ; $20 = ASCII space
    sta ($10),Y
    iny          ; move the "cursor" past the space
    rts

; Prints the value 0-15 in A as one hex character
outb:
    clc          ; make sure no stray carry gets added
    adc #$30     ; ASCII codes for digits are $30-$39
    cmp #$3A     ; if A > 9, we print a letter, not a digit
    bcc digit
; Comment out the next two lines if you're using the online assembler '
    clc          ; the compare left carry set
    adc #$07     ; ASCII codes for A-F are $41-$46
digit:           ; either way, we end up here
    sta ($10),Y
    iny          ; move the "cursor" forward
    rts

You can call this with JSR printb, and it will do just what the comments say: print the byte in the accumulator. You’d probably want to set $10 and $11 to point to video memory (on many 6502-based systems, that starts at $0400), with the low byte of the address in $10, and set Y to the offset where the output should begin.
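To make the ($10),Y addressing concrete, here is a small Python model (not 6502 code) of how the processor works out where the store lands. The names mem and store_indirect_y are made up for this sketch, and $0400 is only the example screen address mentioned above.

```python
# A rough model of the 6502's indirect indexed mode, as in STA ($10),Y.
# The names here (mem, store_indirect_y) are invented for illustration.

mem = bytearray(0x10000)          # 64 KB of address space, all zeros

def store_indirect_y(mem, zp, y, value):
    """Store value at the 16-bit address held at zp/zp+1, plus Y."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)   # low byte first, then high
    addr = (base + y) & 0xFFFF
    mem[addr] = value & 0xFF
    return addr

# Point $10/$11 at $0400: low byte $00 goes in $10, high byte $04 in $11.
mem[0x10] = 0x00
mem[0x11] = 0x04

addr = store_indirect_y(mem, 0x10, 3, 0x41)   # Y = 3, value = ASCII 'A'
print(hex(addr))                               # 0x403
```

Bumping Y after each store, as the subroutine does with INY, moves the next character one cell to the right.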

Now, how does it work? The comments should help you—assembly programming requires good commenting—but here’s the gist. Hexadecimal is the preferred way of writing numbers when using assembly, and each hex digit corresponds to four bits. Thus, our subroutine takes the higher four bits (sometimes called a nibble, and occasionally spelled as nybble) and converts them to their ASCII text representation. Then it does the same thing with the lower four bits.

How does it do that part, though? Well, that’s the mini-subroutine at the end, starting at the label outb. I use the fact that ASCII represents the digits 0-9 as hexadecimal $30-$39. In other words, all you have to do is add $30. For hex A-F, this doesn’t work, because the next ASCII characters are punctuation. That’s what the compare-and-branch just before the digit label is for. The code checks to see if it should print a letter; if so, it adds a further correction of $07 to skip the punctuation and land on $41-$46. (Since the online assembler doesn’t support true text output, we should comment out this adjustment; we’re only printing pixels, and these don’t need to be changed.)
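The conversion described above can be modeled in a few lines of Python. This is only a sketch of the arithmetic, not a translation of the assembly, and the function names are my own.

```python
# Python model of the nibble-to-ASCII step; the function names are illustrative.

def nibble_to_ascii(n):
    """Convert a value 0-15 to the ASCII code of its hex digit."""
    code = n + 0x30            # 0-9 land on '0'-'9' ($30-$39)
    if code > 0x39:            # 10-15 would land on punctuation...
        code += 0x07           # ...so skip ahead to 'A'-'F' ($41-$46)
    return code

def byte_to_hex_text(b):
    """Return the two hex characters for a byte, followed by a space."""
    high = (b >> 4) & 0x0F     # four right shifts bring the high nibble down
    low = b & 0x0F             # mask off the top four bits
    return chr(nibble_to_ascii(high)) + chr(nibble_to_ascii(low)) + " "

print(repr(byte_to_hex_text(0x9F)))   # '9F '
```

The value 9 is the edge case worth checking by hand: it must stay a digit ($39), while 10 must become $41.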

This isn’t much, granted. It’s certainly not going to replace printf anytime soon. Then again, printf takes a lot more than 34 bytes. Yes, that’s all the space this whole subroutine needs, although it’s still about 1/2000 of the 64 KB that a 6502 can address.

If you’re using the online assembler, you’ll probably want to hold on to this subroutine. Coders using a real machine (or emulation thereof) can use the available ROM routines. On a Commodore 64, for example, the KERNAL routine at $FFD2 prints the character in A, so you might be able to use JSR $FFD2 in place of the store to ($10),Y.

Filling a gap

As I stated above, the 6502 processor can’t multiply. All it can do, as far as arithmetic is concerned, is add and subtract. Let’s fix that.

; Multiplies two 8-bit numbers at $20 and $21
; Result is a 16-bit number stored at $22-$23
; Uses $F0-$F2 as scratch memory
multiply:       ; entry point (the label name is arbitrary)
    ldx #$08    ; X holds our counter
    lda #$00    ; clear our result and scratch memory
    sta $22     ; these start at 0
    sta $23
    sta $F1

    lda $20     ; these can be copied
    sta $F0
    lda $21
    sta $F2

nxbit:
    lsr $F2     ; shift the next bit of the multiplier into carry
    bcc next    ; if no carry, skip the addition
    clc         ; carry is set here, so clear it before adding
    lda $22     ; 16-bit addition
    adc $F0
    sta $22
    lda $23
    adc $F1
    sta $23

next:
    asl $F0     ; 2-byte shift
    rol $F1
    dex         ; if our counter is > 0, repeat
    bne nxbit
    rts

This one will be harder to adapt to a real machine, since we use a few bytes of the zero page for “scratch” space. When you only have a single arithmetic register, sacrifices have to be made. On more modern machines, we’d be able to use extra registers to hold our temporary results. (We’d also be more likely to have a built-in multiply instruction, but that’s beside the point.)

The subroutine uses a well-known algorithm, sometimes called peasant multiplication, that actually dates back thousands of years. I’ll let Wikipedia explain the details of the method itself, while I focus on the assembly-specific bits.
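For readers who would rather see the method than look it up, here is a minimal Python sketch of the same shift-and-add idea. It works on ordinary integers and stops when the multiplier runs out of bits, whereas the assembly version always runs eight times; the function name is my own.

```python
def peasant_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds."""
    result = 0
    while b > 0:
        if b & 1:          # low bit of b set: this multiple of a contributes
            result += a
        a <<= 1            # double one operand...
        b >>= 1            # ...and halve the other
    return result

print(peasant_multiply(13, 11))   # 143
```

Tracing 13 x 11 by hand: 11 is 1011 in binary, so the running total picks up 13, then 26, skips 52, and finally adds 104, giving 143.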

Basically, our routine is only useful for multiplying a byte by another byte. The result of this is a 16-bit number, which shouldn’t be too surprising. Of course, we only have an 8-bit register to use, so we need to do some contortions to get things to work, one of the problems of using the 6502. (This is almost like a manual version of what compilers call register spilling.)
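The main contortion is the 16-bit addition, which has to be done one byte at a time with the carry passed along. A rough Python model, assuming the carry starts out clear before the low bytes are added (the helper names adc and add16 are invented for this sketch):

```python
def adc(a, operand, carry):
    """Model of an 8-bit add-with-carry: returns (result byte, carry out)."""
    total = a + operand + carry
    return total & 0xFF, total >> 8

def add16(lo1, hi1, lo2, hi2):
    """Add two 16-bit values held as separate low and high bytes."""
    lo, carry = adc(lo1, lo2, 0)       # carry is clear for the low bytes
    hi, _ = adc(hi1, hi2, carry)       # the high bytes absorb the carry
    return lo, hi

# $01F0 + $0020 = $0210: the low bytes overflow and carry into the high byte
print([hex(x) for x in add16(0xF0, 0x01, 0x20, 0x00)])   # ['0x10', '0x2']
```

If the carry were left set going into the low-byte add, every sum would come out one too high, which is why the state of the carry flag matters so much on the 6502.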

What’s most important for illustrative purposes isn’t the algorithm itself, though, but the way we call it. We have to set things up in just the right way, with our values at the precise memory locations; we must adhere to a calling convention. When you use a higher-level language, the compiler takes care of this for you. And when you use assembly to interface with higher-level code (the most common use for it today), it’s something you need to watch.
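Seen from the caller’s side, the convention looks something like the following Python sketch. The memory is a bytearray standing in for the zero page, and the routine body uses Python’s own multiplication as a stand-in for the loop; the only thing being illustrated is that the caller and the routine talk through agreed-upon addresses. All names are hypothetical.

```python
# Hypothetical model: the "routine" only talks to its caller through
# agreed-upon addresses, as the assembly version does.

ARG_A, ARG_B = 0x20, 0x21        # inputs, per the routine's header comment
RESULT_LO, RESULT_HI = 0x22, 0x23

def multiply_routine(mem):
    """Stand-in for the subroutine: reads $20/$21, writes $22-$23."""
    product = mem[ARG_A] * mem[ARG_B]      # Python's * replaces the loop here
    mem[RESULT_LO] = product & 0xFF        # low byte first
    mem[RESULT_HI] = product >> 8

mem = bytearray(0x100)           # just the zero page
mem[ARG_A], mem[ARG_B] = 200, 100
multiply_routine(mem)
result = mem[RESULT_LO] | (mem[RESULT_HI] << 8)
print(result)                    # 20000
```

Put the operands anywhere else, or read the result from the wrong bytes, and the call silently produces garbage. That is the whole contract.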

As an example, take a 32-bit x86 system using the GCC compiler. When you call a C function, the compiler emits a series of instructions to account for the function’s arguments and return value. Arguments are pushed to the stack in a call frame, then the function is called. It accesses those arguments by something like the 6502’s indexed addressing mode, then it does whatever it’s supposed to do, and returns a result either in a register (or two) or at a caller-specified memory location. Then, the caller manipulates the stack pointer (much faster than repeatedly popping from the stack) to remove the call frame, and continues execution. (On 64-bit x86, the first several arguments travel in registers instead, but the principle is the same.)
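The sequence above can be imitated with a toy stack in Python. This is only a sketch: the stack size, the function names, and the two-argument callee are all made up, and it leaves out details such as the return address that a real call also pushes.

```python
# Toy model of a stack-based call; nothing here is real x86 or GCC output.

stack = [0] * 16
sp = len(stack)                  # stack grows downward; sp is the top index

def push(value):
    global sp
    sp -= 1
    stack[sp] = value

def add_two():
    """Callee: reads its arguments at fixed offsets from the stack pointer."""
    first = stack[sp]            # last value pushed sits at the top
    second = stack[sp + 1]
    return first + second        # the return value stands in for a register

# Caller: push arguments right to left, call, then drop the frame
push(5)                          # second argument
push(7)                          # first argument
result = add_two()
sp += 2                          # one adjustment instead of two pops
print(result, sp)                # 12 16
```

The callee never pops anything. It just indexes off the stack pointer, and the caller cleans up with a single addition afterward.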

No matter how it’s done, assembly code that’s intended to connect to higher-level libraries, whether in C or some other language, has to respect that language’s calling conventions. Other languages do, too. That’s what extern "C" is for in C++, and it’s also why many other languages have a foreign function interface, or FFI. In our case, however, we’re writing those libraries, and the 6502 is such a small and simple system, so we can make our own calling conventions. And that’s another reason we need good documentation when coding assembly.

Coming up

We’ll keep going through this wonderful, primitive world a while longer. I’ll touch on data structures, because they have a few interesting implications when working at this low level, but we won’t spend too much time on them. After that, who knows?
