Assembly: a little bit more

Well, I’m back. Instead of giving you more apologies for missing a couple of weeks of this exciting series (sarcasm alert!), let’s jump right back in and look at some more old-school assembly language. This week, we’ll get to know homebrewed 6502 versions of a couple of C standard library staples, and we can start talking about how you use data structures in assembly.

Memory and data

The simplest, dumbest (for the computer, not the programmer) way to treat data is as raw memory. The problem is, there’s not much you can do with it. You can initialize it, copy it around, and that’s about it. Everything else needs some structure. Copying in assembly language is pretty easy, though, even in 6502-land:

; Arguments:
; $F0-$F1: Source address
; $F2-$F3: Destination address
; Y register: Byte count
memcpy:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    sta ($F2), Y    ; 2 bytes, 6 cycles
    dey             ; 1 byte, 2 cycles
    bne memcpy      ; 2 bytes, 2-3 cycles
    rts             ; 1 byte, 6 cycles

Yep, this is a stripped-down version of memcpy. It has its limitations—it can only copy a page of memory at a time, and it has no error checking—but it’s short and to the point. Note that, instead of a prose description of the subroutine’s arguments and return values and whatnot, I’m just putting that in the comments before the code. I trust that you can understand how to work with that.

Since the code is pretty self-explanatory, the comments for each line show the size and time taken by each instruction. A little bit of addition should show you that the whole subroutine is only 8 bytes; even on modern processors, the core of memcpy isn’t exactly huge.

The timing calculation is a little more complex, but it’s no less important on a slow, underpowered CPU like the 6502. In the case of our subroutine, it depends on how many bytes we’re copying. The core of the loop will take 13 cycles for each iteration. The branch instruction is 3 cycles when the branch is taken, 2 cycles when it’s missed. Altogether, copying n bytes takes 16n+5 cycles, a range of 21 to 4101. (A zero byte count is treated as 256.) In a modern computer, four thousand cycles would be a few microseconds at most. For the 6502, however, it’s more like a few milliseconds, but it’s hard to get faster than what we’ve got here.
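To see the calling convention in action, here's a sketch of a caller that copies one full page (a zero count, remember, means 256 bytes); the addresses are just examples, and the memcpy label is the one from the listing above. One quirk worth knowing for partial copies: since Y counts down and the loop stops at zero, a nonzero count of n touches the bytes at offsets 1 through n, not 0 through n-1.

    lda #$00
    sta $F0     ; source = $1000 (low byte first)
    lda #$10
    sta $F1
    lda #$00
    sta $F2     ; destination = $0400
    lda #$04
    sta $F3
    ldy #$00    ; 0 = copy the whole page
    jsr memcpy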

Strings

The first way we can give structure to our data is with strings. Particularly, we’ll look at C-style strings, series of bytes terminated by a null value, hex $00. One of the first interesting operations is taking the string’s length—the C standard library’s strlen—and this is one implementation of it in 6502 assembly:

; Arguments:
; $F0-$F1: String address
; Returns:
; A: Length of null-terminated string
strlen:
    ldy #$00        ; 2 bytes, 2 cycles
    clv             ; 1 byte, 2 cycles
  slloop:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    beq slend       ; 2 bytes, 2-3 cycles
    iny             ; 1 byte, 2 cycles
    bvc slloop      ; 2 bytes, 3 cycles
  slend:
    tya             ; 1 byte, 2 cycles
    rts             ; 1 byte, 6 cycles

All it does is count up through memory, starting at the pointed-to address, until it reaches a zero byte. When it’s done, it gives back the result in the accumulator. Now, this comes with an obvious restriction: our strings can’t be more than 255 bytes, or we get wraparound. For this example, that’s fine, but you need to watch out in real code. Of course, in modern processors, you’ll usually have at least a 32-bit register to work with, and there aren’t too many uses for a single string of a few billion bytes.

Our assembly version of strlen weighs in at 12 bytes. Timing-wise, it’s 12n+20 cycles for a string of length n, which isn’t too bad. The only real trickery is abusing the overflow flag to allow us an unconditional branch, since none of the instructions this subroutine uses will affect it. A plain JMP instruction would take the same three cycles but cost an extra byte, and it would also mean we can’t relocate the code once it has been assembled.
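Calling it follows the same pattern as memcpy: point the zero-page pair at the string, then jump in. A quick sketch, with the string's address made up for the example:

    lda #$00
    sta $F0
    lda #$20
    sta $F1     ; $F0-$F1 now point to a string at $2000
    jsr strlen
    ; A now holds the length, 0-255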

Another common operation is comparing strings, so here’s our version of C’s strcmp:

; Arguments:
; $F0-$F1: First string
; $F2-$F3: Second string
; Returns comparison result in A:
; -1: First string is less than second
; 0: Strings are equal
; 1: First string is greater than second
strcmp:
    ldy #$00        ; 2 bytes, 2 cycles
  scload:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    cmp ($F2), Y    ; 2 bytes, 5 cycles
    bne scdone      ; 2 bytes, 2-3 cycles
    iny             ; 1 byte, 2 cycles
    cmp #$00        ; 2 bytes, 2 cycles
    bne scload      ; 2 bytes, 2-3 cycles
    lda #$00        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles
  scdone:
    bcs scgrtr      ; 2 bytes, 2-3 cycles
    lda #$FF        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles
  scgrtr:
    lda #$01        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles

Like its C namesake, our strcmp doesn’t care about alphabetical order, only the values of the bytes themselves. The subroutine uses just 24 bytes, though, so you can’t ask for too much. (Timing for this one is…nontrivial, so I’ll leave it to the more interested reader.)

Other structures

Arrays, in theory, would work almost like strings. Instead of looking for null bytes, you’d have an explicit count, more like newer C functions such as strncmp. On the 6502, the indirect indexed addressing mode (e.g., LDA ($F0), Y) we’ve used in every example so far is your main tool for this. Other architectures have their own variations, like the x86’s displacement mode.
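To make that concrete, here's a sketch of a counted loop over an array. It's a hypothetical little routine; the pointer and count conventions are made up for the example, and the sum just wraps around if it overflows a byte.

; Sums the first N bytes of an array
; $F0-$F1: array address, X: count (1-255)
; Returns the 8-bit sum in A
sum8:
    lda #$00
    ldy #$00
sumloop:
    clc
    adc ($F0),Y ; add the next element (carry past 8 bits is ignored)
    iny
    dex
    bne sumloop
    rts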

More complex structures (like C structs or C++ classes) are tougher. This is where the assembly programmer needs a good understanding of how high-level compilers implement such things. Issues like layout, padding, and alignment come into play on modern computers, while the 6502 suffers from the slower speed of indirection.

Self-contained structures (those that won’t be interfacing with higher-level components) are really up to you. The most common layout is linear, with each member of the structure placed consecutively in memory. This way, you’re only working with basic offsets.
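As a sketch, imagine a three-byte record laid out linearly: x position at offset 0, y position at offset 1, color at offset 2 (the layout is made up for the example). With a pointer to the record in $F0-$F1, field access is just indirect indexed addressing with a constant offset:

    ldy #$02        ; offset of the color field
    lda ($F0),Y     ; A now holds the record's color
    ldy #$00        ; offset of the x field
    lda ($F0),Y     ; A now holds the record's x position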

But there’s a problem with that, as newer systems don’t really like to access any old byte. Rather, they’ll pull in some number of bytes (always a power of 2: 2, 4, 8, 16, etc.) all at once. Unaligned memory accesses, such as loading a 32-bit value stored at memory location 0x01230001 (using x86-style hex notation) will be slower, because the processor will want to load two 32-bit values—0x01230000 and 0x01230004—and then it has to do a little bit of internal shuffling. Some processors won’t even go that far; they’ll give an error at the first sign of an unaligned access.

For both of these reasons, modern languages generate structures with padding. A C struct containing a byte and a 32-bit word (in that order), won’t take up the 5 bytes you’d expect. No, it’ll be at least 8, and a 64-bit system might even make it 16 bytes. It’s a conscious trade-off of size for speed, and it’s a fair trade in these present days of multi-gigabyte memory. It’s not even that bad on embedded systems, as they grow into the space occupied by PCs a generation ago.

Coming up

For now, I think I’m going to put this series on hold, as I’m not sure where I want it to go. I might move on to a bigger architecture, something like the x86 in its 16-bit days. Until then, back to your regularly scheduled programming posts.

On learning to code

Coding is becoming a big thing right now, particularly as an educational tool. Some schools are promoting programming and computer science classes, even a full curriculum that lasts through the entirety of education. And then there are the commercial and political movements such as Code.org and the Hour of Code. It seems that everyone wants children to learn something about computers, beyond just how to use them.

On the other side of the debate are the detractors of the “learn to code” push, who argue that it’s a boondoggle at best. Not everybody can learn how to code, they argue, nor should they. We’re past the point where anyone who wants to use a computer must learn to program it, too.

Both camps have a point, and I can see some merit in either side of the debate. I was one of a lucky few that did have the chance to learn about programming early in school, so I can speak from experience in a way that most others cannot. So here are my thoughts on the matter.

The beauty of the machine

Programming, in my opinion, is an exercise that brings together a number of disparate elements. You need math, obviously, because computer science—the basis for programming—is all math. You also need logic and reason, talents that are in increasingly short supply among our youth. But computer programming is more than these. It’s math, it’s reasoning, it’s problem solving. But it’s also art. Some problems have more than one solution, and some of those are more elegant than others.

At first glance, it seems unreasonable to try to teach coding to children before its prerequisites. True, there are kid-friendly programming environments, like MIT’s Scratch. But these can only take you so far. I started learning BASIC in 3rd grade, at the age of 8, but that was little more than copying snippets of code out of a book and running them, maybe changing a few variables here and there for different effects. And I won’t pretend that that was anywhere near the norm, or that I was. (Incidentally, I was the only one that complained when the teacher—this was a gifted class, so we had the same teacher each year—took programming out of the curriculum.)

My point is, kids need a firm grasp of at least some math before they can hope to understand the intricacies of code. Arithmetic and some concept of algebra are the bare minimum. General computer skills (typing, “computer literacy”, that sort of thing) are also a must. And I’d want some sort of introduction to critical thinking, too, but that should be a mandatory part of schooling, anyway.

I don’t think that very young students (kindergarten through 2nd grade) should be fooling around with anything more than a simple interface to code like Scratch. (Unless they show promise or actively seek the challenge, that is. I’m firmly in favor of more educational freedom.) Actually writing code requires, well, writing. And any sort of abstraction—assembly on a fictitious processor or something like that—probably should wait until middle school.

Nor do I think that coding should be a fixed part of the curriculum. Again, I must agree somewhat with the learn-to-code detractors. Not everyone is going to take to programming, and we shouldn’t force them to. It certainly doesn’t need to be a required course for advancement. The prerequisites of math, critical thinking, writing, etc., however, do need to be taught to—and understood by—every student. Learning to code isn’t the ultimate goal, in my mind. It’s a nice destination, but we need to focus on the journey. We should be striving to make kids smarter, more well-rounded, more rational.

Broad strokes

So, if I had my way, what would I do? That’s hard to say. These posts don’t exactly have a lot of thought put in them. But I’ll give it a shot. This will just be a few ideas, nothing like an integrated, coherent plan. Also, for those outside the US, this is geared towards the American educational system. I’ll leave it to you to convert it to something more familiar.

  • Early years (K-2): The first years of school don’t need coding, per se. Here, we should be teaching the fundamentals of math, writing, science, computer use, typing, and so on. Add in a bit of an introduction to electronics (nothing too detailed, but enough to plant the seed of interest). Near the end, we can introduce the idea of programming, the notion that computers and other digital devices are not black boxes, but machines that we can control.

  • Late elementary (3-5): Starting in 3rd grade (about age 8-9), we can begin actual coding, probably starting with Scratch or something similar. But don’t neglect the other subjects. Use simple games as the main programming projects—kids like games—but also teach how programs can solve problems. And don’t punish students that figure out how to get the computer to do their math homework.

  • Middle school (6-8): Here, as students begin to learn algebra and geometry (in my imaginary educational system, this starts earlier, too), programming can move from the graphical, point-and-click environments to something involving actual code. Python, JavaScript, and C# are some of the better bets, in my opinion. Games should still be an important hook, but more real-world applications can creep in. You can even throw in an introduction to robotics. This is the point where we can introduce programming as a discipline. Computer science then naturally follows, but at a slower pace. Also, design needs to be incorporated sometime around here.

  • High school (9-12): High school should be the culmination of the coding curriculum. The graphical environments are gone, but the games remain. With the higher math taught in these grades, 3D can become an important part of the subject. Computer science also needs to be a major focus, with programming paradigms (object-oriented, functional, and so on) and patterns (Visitor, Factory, etc.) coming into their own. Also, we can begin to teach students more about hardware, robotics, program design, and other aspects beyond just code.

We can’t do it alone

Besides educators, the private sector needs to do its part if ubiquitous programming knowledge is going to be the future. There’s simply no point to teaching everyone how to code if they’ll never be able to use such a skill. Open source code, open hardware, free or low-cost tools, all these are vital to this effort. But the computing world is moving away from all of them. Apple’s iOS costs hundreds of dollars just to start developing. Android is cheaper, but the wide variety of devices means either expensive testing or compromises. Even desktop platforms are moving towards the walled garden.

This platform lockdown is incompatible with the idea of coding as a school subject. After all, what’s the point? Why would I want to learn to code, if the only way I could use that knowledge is by getting a job for a corporation that can afford it? Every other part of education has some reflection in the real world. If we want programming to join that small, elite group, then we must make sure it has a place.

Assembly: the first steps


As I’ve been saying throughout this little series, assembly is the closest we programmers can get to bare metal. On older systems, it was all but necessary to forgo the benefits of a higher-level language, because the speed gains from using assembly outweighed the extra developer time needed to write it. Nowadays, of course, the pendulum has swung quite far in the opposite direction, and assembly is usually only used in those few places where it can produce massive speedups.

But we’re looking at the 6502, a processor that is ancient compared to those of today. And it didn’t have the luxury of high-level languages, except for BASIC, which wasn’t much better than a prettified assembly language. The 6502, before you add in the code stored in a particular system’s ROM, couldn’t even multiply two numbers, much less perform complex string manipulation or operate on data structures.

This post has two code samples, written by myself, that demonstrate two things. First, they show you what assembly looks like, in something more than the little excerpts from last time. Second, they illustrate just how far we’ve come. These aren’t all that great, I’ll admit, and they’re probably not the fastest or smallest subroutines around. But they work for our purposes.

A debug printer

Debugging is a very important part of coding, as any programmer can (or should) agree. Assembly doesn’t give us too much in the way of debugging tools, however. Some assemblers do, and you might get something on your particular machine, but the lowest level doesn’t even have that. So this first snippet prints a byte to the screen in text form.

; Prints the byte in A to the address ($10),Y
; as 2 characters, then a space
printb:
    tax          ; save for later
    ; Some assemblers prefer these as "lsr a" instead
    lsr          ; shift A right 4 bits
    lsr          ; this moves the high bits to the bottom
    lsr
    lsr
    jsr outb     ; we use a subroutine for each character
    txa          ; reload A
    and #$0F     ; mask out the top 4 bits
    jsr outb     ; now print the bottom 4 bits
    lda #$20     ; $20 = ASCII space
    sta ($10),Y
    iny
    rts
outb:
    clc
    adc #$30     ; ASCII codes for digits are $30-$39
    cmp #$3A     ; if A > 9, we print a letter, not a digit
    bmi digit
    clc
; Comment out this next line if you're using 6502asm.com
    adc #$07     ; ASCII codes for A-F are $41-$46
digit:           ; either way, we end up here
    sta ($10),Y
    iny          ; move the "cursor" forward
    rts

You can call this with JSR printb, and it will do just what the comments say: print the byte in the accumulator. You’d probably want to set $10 and $11 to point to video memory. (On many 6502-based systems, that starts at $0400.)
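For instance, a caller might look something like this; the screen address and the test value are just assumptions for the sketch:

    lda #$00
    sta $10
    lda #$04
    sta $11     ; $10-$11 now point to $0400
    ldy #$00    ; start at the first character cell
    lda #$AB    ; some byte worth inspecting
    jsr printb  ; writes "AB " and advances Y by 3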

Now, how does it work? The comments should help you—assembly programming requires good commenting—but here’s the gist. Hexadecimal is the preferred way of writing numbers when using assembly, and each hex digit corresponds to four bits. Thus, our subroutine takes the higher four bits (sometimes called a nibble, and occasionally spelled as nybble) and converts them to their ASCII text representation. Then it does the same thing with the lower four bits.

How does it do that part, though? Well, that’s the mini-subroutine at the end, starting at the label outb. I use the fact that ASCII represents the digits 0-9 as hexadecimal $30-$39. In other words, all you have to do is add $30. For hex A-F, this doesn’t work, because the next ASCII characters are punctuation. That’s what the CMP #$3A...BMI digit check is for. The code checks to see if it should print a letter; if so, then it adds a further correction factor to get the right ASCII characters. (Since the online assembler doesn’t support true text output, we should comment out this adjustment; we’re only printing pixels, and these don’t need to be changed.)

This isn’t much, granted. It’s certainly not going to replace printf anytime soon. Then again, printf takes a lot more than 34 bytes. Yes, that’s all the space this whole subroutine needs, although it’s still about 1/2000 of the total memory of a 6502-based computer.

If you’re using the online assembler, you’ll probably want to hold on to this subroutine. Coders using a real machine (or emulation thereof) can use the available ROM routines. On a Commodore 64, for example, you might be able to use JSR $FFD2 instead.

Filling a gap

As I stated above, the 6502 processor can’t multiply. All it can do, as far as arithmetic is concerned, is add and subtract. Let’s fix that.

; Multiplies two 8-bit numbers at $20 and $21
; Result is a 16-bit number stored at $22-$23
; Uses $F0-$F2 as scratch memory
multi:
    ldx #$08    ; X holds our counter
    lda #$00    ; clear our result and scratch memory
    sta $22     ; these start at 0
    sta $23
    sta $F1

    lda $20     ; these can be copied
    sta $F0
    lda $21
    sta $F2

nxbit:
    lsr $F2
    bcc next    ; if no carry, skip the addition
    clc
    lda $22     ; 16-bit addition
    adc $F0
    sta $22
    lda $23
    adc $F1
    sta $23

next:
    asl $F0     ; 2-byte shift
    rol $F1
    dex         ; if our counter is > 0, repeat
    bne nxbit
    rts

This one will be harder to adapt to a true machine, since we use a few bytes of the zero page for “scratch” space. When you only have a single arithmetic register, sacrifices have to be made. On newer or more modern machines, we’d be able to use extra registers to hold our temporary results. (We’d also be more likely to have a built-in multiply instruction, but that’s beside the point.)

The subroutine uses a well-known algorithm, sometimes called peasant multiplication, that actually dates back thousands of years. I’ll let Wikipedia explain the details of the method itself, while I focus on the assembly-specific bits.

Basically, our routine is only useful for multiplying a byte by another byte. The result of this is a 16-bit number, which shouldn’t be too surprising. Of course, we only have an 8-bit register to use, so we need to do some contortions to get things to work, one of the problems of using the 6502. (This is almost like a manual version of what compilers call register spilling.)

What’s most important for illustrative purposes isn’t the algorithm itself, though, but the way we call it. We have to set things up in just the right way, with our values at the precise memory locations; we must adhere to a calling convention. When you use a higher-level language, the compiler takes care of this for you. And when you use assembly to interface with higher-level code (the most common use for it today), it’s something you need to watch.
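Concretely, a caller for our multiplier might look like this, using the locations from the listing above:

    lda #$19    ; first factor: 25
    sta $20
    lda #$0D    ; second factor: 13
    sta $21
    jsr multi
    ; $22 now holds the low byte of 325 ($45),
    ; $23 holds the high byte ($01)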

As an example, take an x86 system using the GCC compiler and the traditional 32-bit C calling convention. When you call a C function, the compiler emits a series of instructions to account for the function’s arguments and return value. Arguments are pushed to the stack in a call frame, then the function is called. It accesses those arguments by something like the 6502’s indexed addressing mode, then it does whatever it’s supposed to do, and returns a result either in a register (or two) or at a caller-specified memory location. Then, the caller manipulates the stack pointer—much faster than repeatedly popping from the stack—to remove the call frame, and continues execution. (On 64-bit systems, the first few arguments travel in registers instead, but the overall dance is the same.)

No matter how it’s done, assembly code that’s intended to connect to higher-level libraries—whether in C or some other language—has to respect that language’s calling conventions. Other languages have to do the same: that’s what extern "C" is for in C++, and it’s why so many languages provide a foreign function interface, or FFI. In our case, however, we’re writing those libraries, and the 6502 is such a small and simple system, so we can make our own calling conventions. And that’s another reason we need good documentation when coding assembly.

Coming up

We’ll keep going through this wonderful, primitive world a while longer. I’ll touch on data structures, because they have a few interesting implications when working at this low level, but we won’t spend too much time on them. After that, who knows?

Assembly: the building blocks


So here we are. One of the reasons I chose the 6502 for this little series is because it has such a simple assembly language. It could fit in one post, even covering the intricacies of addressing, the stack, and other bits we’ll get into. Compare that to, say, 16-bit x86, with its fairly complex addressing modes, its segmented memory model, and a completely different I/O system. Add to that the requirement to have an OS, even one such as FreeDOS, and you have quite the task just getting started. The 6502, by contrast, is easy, at least as far as any assembly language can be called easy.

The idea of assembly

In most modern programming languages, things are a bit abstract. You have functions, flow control statements (if, while, and for in C, for example), variables, maybe even objects and exceptions and other neat stuff like that. You’ve got a big standard library full of pre-built routines so you don’t have to reinvent the wheel. In some styles of programming, you aren’t even supposed to care about the low-level details of just how your code runs.

With assembly, all that is gone. It’s just you and the machine. You don’t write assembly code on the level of functions or even statements. You write instructions. You’re telling the computer exactly what to do at each step, and you have to tell it in its own language.

That leads us to a couple of basic concepts regarding assembly. First, each processor has an instruction set. This, obviously, is the set of instructions that it understands. Typically, these all have a prose description like “load accumulator” or “logical shift right”. This notation is a convenience for those studying the instruction set, not for those actually using it. The processor itself doesn’t understand them; it works in numbers like (hexadecimal) $A9 and $4A, which are often called opcodes (a shortened version of “operation codes”). Assembly programmers get a compromise between these extremes: a set of mnemonics, one for each kind of instruction that the processor understands. These are abbreviations, usually only a few letters—the 6502, for example, always uses 3-letter mnemonics. In this form, the two instructions above would be written as LDA and LSR. (Most assemblers nowadays are case-insensitive, so you can write lda and lsr if you want, and I will in assembly language program listings. For describing the instructions themselves, however, I’ll stick to all-caps.)
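Here are those same two instructions, written the way an assembler would accept them, with the raw bytes noted in the comments. (Don't worry about the operand syntax just yet; that's covered below.)

    lda #$10    ; "load accumulator, immediate": assembles to $A9 $10
    lsr         ; "logical shift right" on the accumulator: assembles to $4A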

The second thing to know about assembly also regards its lack of abstractions, but concerning the computer’s memory. Especially on early microprocessors like the 6502, the assembly programmer needs intimate knowledge of the memory layout and how the CPU can access it. Remember, we can’t call a function like in C (f(x,y)). We have to convert even that to a language that the computer understands. How we do that depends very much on the specific system we’re using, so now it’s time to look at the 6502 in particular.

Addressing the 6502

Before we get to the meat of 6502 assembly, we need to look at how a programmer can communicate with the processor. Obviously, most of the work will be done by the registers we saw last time, namely A, X, and Y. Of course, three bytes of usable data isn’t that much, so we’ll be accessing memory almost constantly. And the 6502 offers a few ways to do that—called addressing modes—although some only work with certain instructions.

The first way we can access data not in a register is by putting it in the instruction itself, as an immediate value. On 6502 assemblers, this is usually indicated by a #. For example, LDA #$10 places the value 16 (or $10 in hexadecimal) into the accumulator.

If we want to work with a known location of memory, we might be able to give that location to the instruction using absolute addressing. For example, the Commodore 64’s screen memory is set up so that the upper-left character on the screen is at address $0400. To store the value in the A register there, we could use STA $0400. When using zero-page addresses ($00xx), we can omit the top byte: LDA $FE. This actually saves a byte of memory, which is a lot more important on a system with only 64K than on today’s multi-gig computers.

Most of the other addressing modes of the 6502 are in some way indirect, using a value in memory or a register like a pointer (for those of you that know a language like C). These include:

  • Absolute indirect. Only one instruction actually uses this one. JMP ($FFFE) jumps to the address stored at memory location $FFFE. Since the 6502 has a 16-bit address space, this actually uses the address you give and the one right after it—in this case, $FFFE and $FFFF. (The 6502, like the x86, is little-endian, meaning that the first byte is the low one.)

  • Relative. This mode is only used by the branching instructions, and it indicates a relative “displacement”. BEQ $08, for example, would jump forward 8 bytes “if equal”. Negative values are allowed, but they’re encoded as two’s-complement numbers (basically, think of N bytes back as $100 - N ahead): BNE $FE jumps back 2 bytes, which makes an awfully effective infinite loop.

  • Indexed. This is where the X and Y registers start coming into their own. With indexed addressing, one of these is added to the address you give (either 2-byte absolute or 1-byte zero-page). An example would be STA $0400,X, which stores the accumulator value in the address $0400 + X. So, if the X register contains $10, this writes to $0410. Note that some instructions can only use X, some can only use Y, and a few are limited to zero-page addresses.

  • Indexed indirect and indirect indexed. Don’t worry about mixing the names up on these; they don’t matter. What does matter is how they work. These both use a memory location as a pointer and a register value as an index. The difference is where they add the index in. Indexed indirect adds the X register to the address you give, and creates a pointer from that. Indirect indexed, on the other hand, adds the Y register to the stored value, then uses that as the pointer.

As an example, let’s say that memory locations $30 and $31 each contain the value $80, while $40 and $41 both have $20. Also, both X and Y are set to $10. In this setup, indexed indirect (LDA ($30,X)) takes the memory location $30 + X (i.e., $40) and loads whatever is at the address stored there, essentially as if you’d written LDA $2020. Indirect indexed (LDA ($30),Y) instead takes what is stored at the location you give ($8080, in our example), then adds Y ($10) to that to get the final pointer: $8080 + $10 = $8090. In this case, the effect is the same as LDA $8090.
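Spelled out as code, that setup looks like this, using the same addresses and register values as the example:

    lda #$80
    sta $30
    sta $31     ; $30/$31 now hold the address $8080
    lda #$20
    sta $40
    sta $41     ; $40/$41 now hold $2020
    ldx #$10
    ldy #$10

    lda ($30,X) ; indexed indirect: pointer taken from $40/$41, loads from $2020
    lda ($30),Y ; indirect indexed: pointer is $8080, plus Y, loads from $8090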

Finally, assemblers allow the use of labels, so you don’t have to worry about exact addresses. These are the closest you’re going to get to something like named functions. In assembly source code, they’re defined like this: label:. Later on, you can refer to them like you would any memory address, e.g., LDA label or BEQ label. One of the assembler’s jobs is to replace the labels with the “real” addresses, and it’s pretty good at that.
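A quick sketch of a label in action, as a loop target:

    ldx #$05
countdown:
    dex
    bne countdown   ; the assembler turns "countdown" into a relative offset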

The instructions

After all that, the actual instruction set of the 6502 is refreshingly uncomplicated. All told, there are only a few dozen possible instructions, all of them performing only the most basic of actions. Yet this small arsenal was enough for a generation of 8-bit computers.

Many assembly language references put the instructions in alphabetical order by mnemonics. But the 6502’s set is so small that we can get away with ordering them by what they do. As it turns out, there aren’t too many categories, only about a dozen or so. Also, I’ll have examples for some of the categories, but not all. In the code samples, a ; marks a comment; the rest of the line is ignored, just like // or #, depending on your favorite language.

Load and store

Especially on older, less capable systems like the 6502, moving data around is one of the most important tasks. And there are two ways that data can go: from memory into a register or the other way around. For our CPU, moving a byte from memory to a register is a load, while sending it from a register to a memory location is a store. (The x86, to illustrate a different way of doing things, uses a single instruction, MOV, to do both of these.)

There are three “user” registers on the 6502, and each one has a load and a store instruction. To load a value into a register, you use LDA, LDX, or LDY. To store a value from one of them into memory, it’s STA, STX, and STY. (I think you can figure out which one uses which register.)

In terms of addressing, these instructions are the most well-rounded. The accumulator gives you the most options here, offering immediate, absolute, and all the indirect options. With X and Y, you can’t use indirect addressing, and you can only use the other of the two registers as an index. So you can write LDX $30,Y, but not LDX $30,X.

This code example doesn’t do too much. It sets up the first two memory locations as a pointer to $0400, then writes the byte $36 to that location. For the online assembler I’m using, that makes a blue dot on the left side of the screen, in the middle. On a real C64 or Apple II, that address is the top-left corner of the screen, so it will display whatever the system thinks $36 should be, probably the number 6.


start:
    lda #$00    ; we need 2 loads & 2 stores
    sta $00     ; to set up a 16-bit address
    lda #$04
    sta $01

    lda #$36
    ldy #$00    ; clear Y to use as an index
    sta ($00),Y ; stores our byte at $0400

Arithmetic

Besides shuffling data around, computers mainly do math. It’s what they’re best at. As an older microprocessor, the 6502 had to cut corners; by itself, it can only add and subtract, and then only using the accumulator. These two instructions, ADC and SBC, are a little finicky, and they’re our first introduction to the processor status or “flags” register, P. So we’ll take a quick diversion to look at it.

The P register on the 6502, like all its other registers, is a single byte. But we usually don’t care about its byte value as a whole. Instead, we want to look at the individual bits. Since there are eight bits in a byte, there are eight possible flags. The 6502 uses seven of these, although the online assembler doesn’t support two of those, and a third was rarely used even back in the day. So that leaves four that are important enough to mention here:

  • Bit 7, the Negative (N) flag, is changed after most instructions that affect the A register. It’ll be equal to the high bit of the result, which is the sign bit if you’re treating the value as a signed (two’s-complement) number.
  • Bit 6, Overflow (V), is set whenever the “sign” of the accumulator changes from arithmetic.
  • Bit 1 is the Zero (Z) flag, which is set whenever the last load, arithmetic, or comparison instruction produced a result of 0. (Stores, notably, don’t touch the flags at all.)
  • Bit 0, the Carry (C) flag, is the important one. It’s set when an addition or subtraction causes a result that can’t fit into a byte, as well as when we use some bitwise instructions.

Now, the two arithmetic instructions are ADC and SBC, which stand for “add with carry” and “subtract with carry”. The 6502 doesn’t have a way to add or subtract without involving the carry flag! So, if we don’t want it messing with us, we need to clear it (CLC, which we’ll see again below) before we start doing our addition. Conversely, before subtracting, we must set it with the SEC instruction. (The carry is what lets additions chain across multiple bytes; for subtraction, it doubles as an inverted “borrow” flag, which is why it has to start out set.)

Also, these instructions only work with the accumulator and a memory address or immediate value. You can’t directly add to X or Y with them, but that’s okay. In the next section, we’ll see instructions that can help us.

The code example here builds on the last one. In the online assembler, it displays a brown pixel next to the blue one. On real computers, it should put a 9 to the right of the 6, because 8-bit coders have dirty minds.


start:
    lda #$00    ; we need 2 loads & 2 stores
    sta $00     ; to set up a 16-bit address
    lda #$04
    sta $01

    lda #$36
    ldy #$00    ; clear Y to use as an index
    sta ($00),Y ; stores our byte at $0400

    clc         ; always clear carry first
    adc #$03    ; A += 3
    iny         ; move the position right 1
    sta ($00),Y ; store the new value
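Since the carry is what lets additions chain across bytes, here's a minimal sketch of adding one 16-bit number to another; the memory locations are arbitrary, with the low byte first, as the 6502 likes it:

    clc             ; start with a clean carry
    lda $22
    adc $20         ; add the low bytes; an overflow here sets C
    sta $22
    lda $23
    adc $21         ; add the high bytes, plus whatever carried over
    sta $23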

Increment and decrement

The INY (“increment Y register”) instruction I just used is one of a group of six: INC, DEC, INX, DEX, INY, DEY.

All these instructions do is add or subtract 1, an operation so common that just about every processor in existence has dedicated instructions for it, which is also why C has the ++ and -- operators. For the 6502, these can work on either of our index registers or a memory location. (If you’re lucky enough to have a later model, you also have INA and DEA, which work on the accumulator.)

Our code example this time is an altered version of the last one. This time, instead of incrementing the Y register, we increment the memory location $00 directly. The effect, though, is the same.


start:
    lda #$00    ; we need 2 loads & 2 stores
    sta $00     ; to set up a 16-bit address
    lda #$04
    sta $01

    lda #$36
    ldy #$00    ; clear Y to use as an index
    sta ($00),Y ; stores our byte at $0400

    clc         ; always clear carry first
    adc #$03    ; A += 3
    inc $00     ; move the position right 1
    sta ($00),Y ; store the new value

Flags

We’ve already seen CLC and SEC. Those are part of a group of instructions that manipulate the flags register. Since we don’t care about all the flags, there’s only one more of these that is important: CLV. All it does is clear the overflow flag, which can come in handy sometimes.

By the way, the other four are two pairs. CLI and SEI work on the interrupt flag, which the online assembler doesn’t support. CLD and SED manipulate the decimal flag, which doesn’t seem to get much use.

There’s no real code example this time, since we’ve already used CLC. SEC works the same way, and I can’t think of a quick use of the overflow flag.

Comparison

Sometimes, it’s useful to just compare numbers, without adding or subtracting. For this, the 6502 offers a trio of arithmetic comparison instructions and one bitwise one.

CMP, CPX, and CPY each compare a value in memory (or an immediate value) to the one in a register (CMP uses A, the others are obvious). They work by subtracting the operand from the register and throwing the result away, keeping only the flags: the C flag is cleared if the register value is less than the operand (as unsigned numbers) and set if it’s greater than or equal, the Z flag is set if the two are equal, and the N flag picks up the high bit of the discarded difference.

BIT works a little differently. It sets the N and V flags to the top two bits of the memory location (no indirection or indexing allowed). Then, it sets the Z flag if the bitwise-AND of the memory byte and the accumulator is zero, i.e., if they have no 1 bits in common.

Comparison instructions are most useful in branching, so I’ll hold off on the example until then.

Branching

Branching is how we simulate the higher-level control structures like conditionals and loops. In the 6502, we have the option of conditionally hopping around our code by using any of nine different instructions. Eight of these come in pairs, each pair based on one of the four main flags.

  • BCC and BCS branch if the C flag is clear (0) or set (1), respectively.
  • BNE (“branch not equal”) and BEQ (“branch equal”) do the same for the Z flag.
  • BVC and BVS branch based on the state of the V flag.
  • BPL (“branch plus”) and BMI (“branch minus”) work on the N flag.

All of these use the “relative” addressing mode, limiting them to short jumps.

The ninth instruction is JMP, and it can go anywhere. You can use it with a direct address (JMP $FFFE) or an indirect one (JMP ($0055)), and it always jumps. Put simply, it’s GOTO. But that’s not as bad as it sounds. Remember, we don’t have the luxury of while or for. JMP is how we make those.

This code sample, still building on our earlier attempts, draws ten dots (or the digits 0-9) on the screen.


start:
    lda #$00
    sta $00
    lda #$04
    sta $01

    lda #$30
    ldy #$00

loop:
    sta ($00),Y ; write the byte to the screen
    clc
    adc #$01    ; add 1 to A for next character
    iny         ; move 1 character to the right
    cpy #$0a    ; have we reached 10 yet?
    bne loop    ; if not, go again

For comparison, a pseudo-C version of the same thing:

char* screen = (char*) 0x0400;
char value = 0x30;
for (int i = 0; i < 10; i++) {
    screen[i] = value;
    value++;
}

The stack

The stack, on the 6502 processor, is the second page of memory, starting at address $0100. It can be used to store temporary values, addresses, and other data, but it’s all accessed through the stack pointer (SP). You push a value onto the stack, then pop (or pull, to use 6502 terminology) it back off when you need it back.

We’ve got an even half dozen instructions to control the stack. We can push the accumulator value onto it with PHA, and we can do the same with the flags by using PHP. (Not the programming language with that name, thankfully.) Popping—or pulling, if you prefer the archaic term—the value pointed to by the SP uses PLA and PLP. The other two instructions, TSX and TXS let us copy the stack pointer to the X register, or vice versa.
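As a quick sketch, the classic use is saving the accumulator around some work that's going to clobber it:

    pha             ; push A onto page 1 ($0100 + SP)
    lda #$00        ; ...do something that changes A...
    pla             ; pull the old value back off the stack into A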

Subroutines

Branches give us flow control, an important part of any high-level programming. For functions, the assembly programmer uses subroutines, and the 6502 has a pair of instructions that help us implement them. JSR (“jump to subroutine”) is an unconditional jump like JMP, except that it pushes the address of the next instruction to the stack before jumping. (Since we only have a page of stack space, this limits how “deep” you can go.) When the subroutine is done, the RTS instruction sends you back to where you started, just after the JSR.

The code sample here shows a little subroutine. See if you can figure out what it does.


start:
    lda #$00
    sta $00
    lda #$04
    sta $01

    lda #$31
    ldy #$09
    jsr show    ; call our subroutine
    jmp end     ; jump past when we're done

show:
    sta ($00),Y ; write the byte to screen mem
    clc
    adc #$01    ; add 1 to accumulator
    dey
    bne show    ; loop until Y = 0
    rts         ; return when we're done

end:
    ; label so we can skip the subroutine

Bitwise

We’ve got a total of seven bitwise instructions (not counting BIT, which is different). Three of these correspond to the usual AND, OR, and XOR operations, and they work on a memory location and the accumulator. AND has an obvious name, ORA stands for “OR with accumulator”, and EOR is “exclusive OR”. (Why they used “EOR” instead of “XOR”, I don’t know.) If you’ve ever used the bit-twiddling parts of C or just about any other language, you know how these work. These three instructions also change the Z and N flags: Z if the result is 0, N if the highest bit of the result is set.

The other four manipulate the bits of memory or the accumulator themselves. ASL is “arithmetic shift left”, identical to the C << operator, except that it only works one bit at a time. The high bit is shifted into the C flag, while Z and N are altered like you’d expect. LSR (“logical shift right”) works mostly in reverse: every bit is shifted down, a 0 is moved into the high bit, and the low bit goes into C.

ROL and ROR (“rotate left/right”) are the oddballs, as few higher-level languages have a counterpart to them. Really, though, they’re simple. ROL works just like ASL, except that it shifts whatever was in the C flag into the low bit instead of always a 0. ROR is the same, but the other way around, putting the C flag’s value into the high bit.
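The usual reason you want them is multi-byte shifts. Here's a sketch that shifts a 16-bit value one bit to the left; the addresses are arbitrary:

    asl $F0         ; low byte: bit 7 falls into the carry
    rol $F1         ; high byte: the carry is rotated into bit 0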

Transfer

We could move bytes between the A, X, and Y registers by copying them to memory or using the stack instructions. That’s time-consuming, though. Instead, we’ve got the TAX, TAY, TXA, and TYA instructions. These transfer a value from one register to another, with the second letter of the mnemonic as the source and the third as the destination. (TAX copies A to X, etc.) The flags are set how you’d expect.

The other guys

There are two other 6502 assembly instructions that don’t do too much. BRK causes an interrupt, which the online assembler can’t handle and isn’t that important for user-level coding. NOP does nothing at all. It’s used to fill space, basically.

Next time

Whew. Maybe I was wrong about fitting all this in one post. Still, that’s essentially everything you need to know about the 6502 instruction set. The web has a ton of tutorials, all of them better than mine. But this is the beginning. In the next part, we’ll look at actually doing things with assembly. That one will be full of code, too.

Assembly: welcome to the machine

Well, we have a winner, and it’s the 6502. Sure, it’s simple and limited, and it’s actually the oldest (this year, it turned 40) of our four finalists. But I chose it for a few reasons. First, the availability of an online assembler and emulator at 6502asm.com. Second, because of its wide use in 8-bit computers, especially those I’ve used (you might call this nostalgia, but it’s a practical reason, too). And finally, its simplicity. Yes, it’s limited, but those limitations make it easier to understand the system as a whole.

Before we get started

A couple of things to point out before I get into the details of our learning system. I’m using the online assembler at the link above. You can use that if you like. If you’d rather use something approximating a real 6502, there are plenty of emulators out there. There’s a problem, though. The 6502 was widely used as a base for the home computers of the 70s and 80s, but each system changed things a little. And the derivative processors, such as the Commodore 64’s MOS 6510 or the Ricoh 2A03 used in the NES, each had their own quirks. So, this series will focus on an “ideal” 6502 wherever possible, only entering the real world when necessary.

A second thing to note is the use of hexadecimal (base-16) numbers. They’re extremely common in assembly programming; they might even be used more than decimal numbers. But writing them down poses a problem. The mathematically correct way is to use a subscript: 0400₁₆. That’s awfully hard to do, especially on a computer, so programmers developed alternatives. In modern times, we mostly use the prefix “0x”: 0x0400. In the 8-bit days, however, the convention was a prefixed dollar sign: $0400. Since that’s the tradition for 6502-based systems, that’s what we’ll use here.

The processor

Officially, we’re discussing the MOS Technology 6502 microprocessor. Informally, everybody just calls it the 6502. It’s an 8-bit microprocessor that was originally released in 1975, a full eight years before I was born, in what might as well be the Stone Age in terms of computers. Back then, it was cheap, easy to use, and developer-friendly, exactly what most other processors weren’t. And those qualities gave it its popularity among hobbyists and smaller manufacturers, at a time when most people couldn’t even imagine wanting a computer at home.

Electronically, the 6502 is a microprocessor. Basically, all that really means is that it isn’t meant to do much by itself, but it’s intended to drive other chips. It’s the core of the system, not the whole thing. In the Commodore 64, for example, the 6502 (actually 6510, but close enough) was accompanied by the VIC-II graphics chip, the famous SID chip for sound, and a pair of 6526 CIA chips for input and output (I/O). Other home computers had their own families of companion chips, and it was almost expected that you’d add peripherals for increased functionality.

Internally, there’s not too much to tell. The 6502 is 8-bit, which means that it works with byte-sized machine words. It’s little-endian, so larger numbers are stored from the lowest byte up. (Example: the decimal number 1024, hexadecimal $0400, would be listed in a memory readout as 00 04.) Most 6502s run at somewhere around 1 MHz, but some were clocked faster, up to 2 MHz.

For an assembly programmer, the processor is very much bare-bones. It can access a 16-bit memory space, which means a total of 65,536 bytes, or 64K. (We’ll ignore the silly “binary” units here.) You’ve got a mere handful of registers, including:

  • The accumulator (A), which is your only real “working” register,
  • Two index registers (X and Y), used for indirect memory access,
  • A stack pointer (SP), an 8-bit register whose effective address always gets $01 as its upper byte,
  • The processor status register (P), a set of “flag” bits that are used to determine certain conditions,
  • A program counter (PC) that keeps track of the address of the currently-executing assembly instruction.

That’s…not a lot. By contrast, the 8086 had 14 registers (4 general purpose, 2 index, a base pointer and a stack pointer, 4 segment registers, an instruction pointer, and the processor flags). Today’s x86-64 processors add quite a few more (8 extra general-purpose registers, plus large banks of floating-point and SIMD registers). But, in 1975, it wasn’t easy to make a microprocessor that sold for $25. Or $100, for that matter, so that’s what you had to work with. The whole early history of personal computers, in fact, is an epic tale of cutting corners. (The ZX81, for example, had a “slow” and a “fast” mode. In fast mode, it disabled video output, because that was the only way the processor could run code at full speed!)

Memory

Because of the general lack of registers inside the processor, memory becomes of paramount importance on the 6502. Now, we don’t think of it much today, but there are two main kinds of memory: read-only (ROM) and writable (“random access”, hence RAM). How ROM and RAM were set up was a detail for the computer manufacturer or dedicated hobbyist; the processor itself didn’t really care. The Apple IIe, for example, had 64K of RAM and 16K of ROM; the Commodore 64 gave you the same amount of RAM, but had a total of 24K of ROM.

Astute readers—anyone who can add—will note that I’ve already said the 6502 had a 16-bit memory address space, which caps at 64K. That’s true. However much memory you really had (e.g., 80K total on the Apple IIe), assembly code could only access 64K of it at a time. Different systems had different ways of coping with this, mostly by using bank switching, where a part of address space could be switched to show a different “window” of the larger memory.

One quirk of the 6502’s memory handling needs to be mentioned, because it forms a very important part of assembly programming on the processor. Since the 6502 is an 8-bit processor, it’s natural to divide memory into pages, each page being 256 bytes. In hexadecimal terms, pages start at $xx00 (where xx can be anything) and run to $xxFF. The key thing to notice is that the higher byte stays the same for every address in the same page. Since the computer only works with bytes, the less we have to cross a page “boundary” (from $03FF to $0400, for instance), the better.

The processor itself even acknowledges this. The zero page, the memory located from $0000 to $00FF, can be used in 6502 assembly as one-byte addresses. And because the 6502 wasn’t that fast to begin with, and memory wasn’t that much slower, it’s almost like having an extra 256 registers! (Of course, much of this precious memory space is reserved on actual home computers, meaning that it’s unavailable for us. Even 6502asm uses two bytes of the zero page, $FE and $FF, for its own purposes.)
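To put numbers on that, here's the same load written both ways; the byte and cycle counts are the standard figures for the plain 6502, and the exact instructions get their proper introduction in the next post:

    lda $0200       ; absolute addressing: 3 bytes, 4 cycles
    lda $FE         ; zero-page addressing: 2 bytes, 3 cycles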

Video and everything else

Video display depended almost completely on the additional hardware installed alongside a 6502 processor. The online assembler I’ll be using has a very simplified video system: 32×32 pixels, each pixel taking up one byte, running from address $0200 to $05FF, with 16 possible colors. Typically, actual computers gave you much more. Most of them had a text mode (40×24, 80×25, or something like that) that may or may not have offered colors, along with a high-res mode that was either monochrome or very restricted in colors.
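On the online assembler, for instance, lighting up the very first pixel is a two-instruction job. Treat the color value as an assumption; $01 should be white in its default palette:

    lda #$01        ; color 1 (white, in the default palette)
    sta $0200       ; the first byte of the 32x32 display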

Almost any other function you can think of is also dependent on the system involved, rather than being a part of the 6502 itself. Our online version doesn’t have any extra bells and whistles, so I won’t be covering them in the near future. If interest is high enough, however, I might go back and delve deeper into one of the many emulators available.

Coming up

So that’s pretty much it for the basics of the 6502. A bare handful of registers, not even enough memory to hold the stylesheet for this post, and a bunch of peripherals that were wholly dependent upon the manufacturer of the specific computer you used. And it was still the workhorse of a generation. After four decades, it looks primitive to us, because it is. But every journey starts somewhere, and sometimes we need to go back to a simpler time because it was simpler.

That’s the case here. While I could certainly demonstrate assembly by making something in modern, 64-bit x86 code, it wouldn’t have the same impact. Modern assembly, on a larger scale, is often not much more than a waste of developer resources. But older systems didn’t have the luxury of optimizing compilers and high-level functional programming. For most people in the 80s, you used BASIC to learn how to program, then you switched to assembly when you wanted to make something useful. That was okay, because everybody else had the same limitations, and the system itself was so small that you could be productive with assembly.

In the next post of this series, we’ll actually start looking at the 6502 assembly language. We’ll even take a peek under that hood, going down to the ultimate in low-level, machine code. I hope you’ll enjoy reading about it as much as I am writing it.

Assembly: narrowing the field

Okay, I guess I told a little fib. But I’ve got a good reason. I’ve been thinking this over for a while, and I still haven’t come to a decision. I want to do a few posts introducing an assembly language and the “style” of programming it, but I don’t know which assembly language to use. So, in the interest of better informing you (and giving myself an extra week to figure out an answer), I’ve decided to put some of my thought processes into words.

The contenders

I know a handful of assembly variants, and I know of quite a few more, especially if you count different versions of architectures as separate languages. Some of these are more useful for introductory purposes than others, while some are much more useful in actual work. These two subsets, unfortunately, don’t overlap much. But I’ve come up with four finalists for this coveted crown, and here they are, complete with my own justifications.

6502

If you’re older than about 25, then you’ve probably used a 6502-based system before, whether you knew it or not. It was a mainstay for 80s “home computers”, including the Apple II and Commodore 64, and a customized version was used in the NES. It’s still a hobbyist favorite, mostly for nostalgia reasons rather than any technical superiority, and there’s no sign that it will ever become truly extinct. (Bender, after all, has one in his head, which explains a lot.)

Pros:

  • 8-bit processors are about as simple as you can get while still being useful.
  • There’s a lot of work out there already, in terms of tutorials, programming guides, etc.
  • An online assembler exists, which makes things much easier.
  • Plenty of emulators are available, although these are usually for specific 6502 computers.

Cons:

  • 8-bit can be very limiting, mostly because it is so simple.
  • There aren’t many registers, which slows down a lot of work and means a lot of memory trickery.
  • Despite what its followers might think, 6502 is pretty much a dead end for development, as it has been for about 20 years.

Early x86

By “early”, I specifically mean the 16-bit x86 processors, the 8086 and 80286. These are the CPUs of the first IBM-compatible personal computers, and the ancestors of the i5 and A10 we use today. You would think that would give it direct relevance, but you’d actually be wrong. Today’s x86 processors, when running in 64-bit mode, actually can’t directly run 16-bit code. But we’d be using an emulator of some sort, anyway, so that’s not a problem that would concern us.

Pros:

  • Very familiar and widespread, as (a descendent of) x86 is still used in just about every PC today.
  • 16-bit isn’t that much more complicated than 8-bit.
  • All those old DOS assembly tutorials are out there, somewhere, and most of them contain useful information.
  • Even though current processors can’t execute 16-bit code directly, you can still use one of the dozens of emulators out there, including DOSBox.

Cons:

  • The segmented memory architecture is weird and hard to explain.
  • It’s easy to fall into the trap of using skills and tricks that were relevant here in more modern applications, where they simply don’t carry over.
  • A lot of people just don’t like x86 and think it’s horrible; I don’t understand this, but I respect it.

AVR

Atmel’s AVR line of microcontrollers is pretty popular in the embedded world. One of them powers the Arduino, for example. Thus, there’s actually a built-in market for AVR assembly, although most programmers now use C. Of course, AVR has its own problems, not the least of which is its odd way of segregating program and data memory.

Pros:

  • Very relevant today as an embedded platform.
  • A ton of support online, including tutorials, forums, etc.
  • The assembly language and architecture, despite a few warts, are actually nice, in my opinion.
  • Lots of good tools, including a port of GCC.

Cons:

  • Emulator quality is hit or miss. (AVR Studio was okay when I used it 4 years ago, but it’s non-free and Windows only, and the free options are mostly beta quality.)
  • The Harvard architecture (totally separate memory spaces for code and data) is used by almost nothing else today, and it’s cumbersome at best.
  • AVR is very much a microcontroller platform, not a microprocessor one. It’s intended for use in embedded systems, not PCs.

MIPS

MIPS is a bit of an oddball. Sure, it’s used in a few places out there. There’s a port of Android to it, and MIPS CPUs powered several of the 90s consoles, like the N64 and PlayStation. There’s not much modern development on it, though, except for the Chinese Loongson, which started out as a knock-off but then became a true MIPS implementation. But its real value seems to be in its assembly language, which is often recommended as a good way to learn the ropes.

Pros:

  • Fairly simple assembly language and a sensible architecture.
  • Tools are widespread, including cross-compilers, emulators, and even native versions of Linux.
  • RISC, if you like that. After all, “RISC is good”, to quote Hackers.

Cons:

  • Not quite as relevant as it once was. (If I had written this 20 years ago, the choice would probably have been between this and x86.)
  • A bit more complex than the others, which actually removes some of the need for assembly.
  • I don’t really know it that well, so I would have to learn it first.

The also-rans

Those aren’t the only assembly language platforms out there. There are quite a few that are popular enough that they could fit here, but I didn’t pick them for whatever reason. Some of them include:

  • PowerPC: Used in Macs from about a decade ago, as well as the consoles of the 2000s (GameCube, Wii, PS3, Xbox 360), but I don’t know much about it, and it’s mostly on servers now.

  • 68000: The 16-bit CPU from the Sega Genesis and the original Macintosh, later developed into a true 32-bit processor. It has its supporters, and it’s not that bad, but again, I don’t know it like I do the others.

  • Z80: This one was used by a few home computers, like the ZX80, Kaypro, and (my favorite) the TRS-80. It’s a transparent remake of the 8080 (forerunner to the 8086) with just enough detail changed to avoid lawsuits. But I know x86 better.

  • PIC: One of the most popular architectures in the 8-bit embedded world. I’ve read a little bit about it, and I don’t exactly like what I see. Its assembly requires a few contortions that I think distract from the task at hand.

  • ARM: The elephant in the room. Technically, ARM is a whole family of architectures, each slightly different, and that is the main problem. The tools are great, the assembly language isn’t half bad (but increasingly less necessary). It’s just that there’s too much choice.

  • MIX: Donald Knuth invented this fictional assembly language for his series The Art of Computer Programming. Then, after 30 years of work, he scrapped it for the next edition, replacing it with the “modern” MMIX. Neither one of them works here, in my opinion. MMIX lacks tool support (since it’s not a “real” machine language), and the best tutorial is Knuth’s own book. MIX is even worse, since it’s based on the horrible architectures of the 60s and early 70s.

Better times ahead

Hopefully, I’ll come to a decision soon. Then I can start the hard part: actually making assembly language interesting. Wish me luck.

Assembly: back in time

Assembly language, in some form, dates all the way back to the first programmable computers. It started out as machine code, a series of numbers written out and somehow entered into the computer. Soon enough, though, it became a programming language in its own right. Modern assemblers have most of the same features as compilers, and many of them allow you to use macros, constant definitions, and lots of other helpful additions, but the idea is still the same: you’re working with the computer itself, by yourself.

Sacrifices

Using assembly is, in a very real sense, like going back in time. All those things we take for granted in higher languages—functions, classes, even if and while—are lost. If we want them, we have to make them ourselves. Sure, there are standards, and we can use libraries (even those written in other languages), but assembly is…primitive. It doesn’t hold your hand. It doesn’t lead you anywhere. And it’s very much working without a net.

Think of a cabin in the woods. You might go there to relax, to find yourself, to commune with nature, or to plot a string of murders. Assembly, similarly, will make you want to do all these things. (Well, maybe not the murder part. Then again, it can be awfully frustrating.) It’s a return to your roots that appeals to the self-reliant instinct of us all, a programming Walden.

Tools of the trade

Rather than dwelling on what we don’t have when working with assembly, let’s focus on what we do have. There’s not all that much, but remember this: everything we use today is built out of these simple tools.

First, types. Most high-level languages have them. Those that don’t are lying. You typically have various kinds of numbers (integer, floating-point, maybe some higher-precision deal), single characters and strings, and then whatever else the language needs. In assembly, there’s one basic type, the machine word. It’s an integer, the size of which gives us the “bits” of the processor; 64-bit processors have 64-bit word sizes, for example. (Most modern processors have additional support for floating-point numbers, but these are usually kept separate, and lower-end embedded hardware often doesn’t bother.) It’s spartan in the extreme, true, but it’s really all we need, because that same N-bit integer can be used as a number, a character, an index, or a pointer.

Next, assembly language in itself doesn’t have variables (though many assemblers offer a simulation of them), but it does have registers. These are small areas of memory in the innermost core of the processor. They’re ultrafast but truly limited, both in size (almost always a single word) and number. It was a big deal when AMD created their 64-bit extensions to the x86 architecture and added a whopping eight new registers, because that doubled what Intel had offered since the 1980s. The 6502, mainstay of the 80s home computing revolution, managed with only six, and only one of those (the A register) was truly general-purpose. (The others were two index registers—X and Y—a stack pointer, program counter, and status or “flag” register.)

Instead of dedicated control constructs like the for-loop, while-loop, or switch, assembly works on a lower level, using basic instructions. These differ for different architectures, but there are a few that are near-universal. These include arithmetic (adding and subtracting are nearly everywhere, but multiplying and dividing aren’t), bit-twiddling (shifts, AND, OR, and XOR, but also the bitwise rotate, which high-level languages ignore), loading and storing, branching or jumping (conditionally or not), some sort of “indexing” operation, and direct hardware access.

From these little blocks, great things can be accomplished, but it takes time. The tradeoff of development time for execution time is a hard one to make, and it’s almost always uneconomical, which is why assembly is so rare today. But it’s possible and sometimes necessary.

The great divide

Like any movement, assembly language has splintered into two main camps. These are divided on the way an architecture is constructed. Particularly, the sticking point is how many instructions, and of what kinds, are provided. Some like their assembly to be full of features, with instructions for everything (the CISC style). Others prefer a simpler set of building blocks (RISC). The RISC camp has all but won the battle, as most new systems are of that style, and even the CISC holdouts like x86 now break their instructions down into simpler, RISC-like operations underneath the assembly language we can access. Still, it’s a good historical footnote, and both sides do have their merits.

To be continued

Next time, we’ll look at an actual assembly language instruction set and architecture. (As of this writing, I haven’t decided which. I’ve narrowed it down to a few possibilities: 6502, early x86, AVR, or MIPS.) I’ll hopefully have links to online assemblers and emulators, so that readers won’t have to download anything. Then we’ll get to see what the old days were really like.

Assembly: back to basics

When it comes to programming languages, there are limitless possibilities. We have hundreds of choices, filling just about every niche you can think of. Some languages are optimized for speed (C, C++), some for use in a particular environment (Lua), some to be highly readable (Python), some for a kind of purity (Haskell), and some for sheer perversity (Perl). Whatever you want to do, there’s a programming language out there for you.

But there is one language that underlies all the others. If you’ll pardon the cliché, there is one language to rule them all. And that’s assembly language. It’s the original, in a sense, as it existed even before the first compilers. It’s the native language of a computer, too. Like native languages, though, each computer has its own. So we can’t really talk about “assembly language” in the general sense, except to make a few statements. Rather, we have to talk about a specific kind of assembly, like x86, 6502, and so on. (We can also talk about “intermediate languages” as a form of assembly, like .NET’s IL or the LLVM instruction set. Here, I’ll mostly be focusing on processor-specific assembly.)

The reason for assembly

Of course, assembly is the lowest of low-level languages, at least of those a programmer can access. And that is the first hurdle, especially in these days of ever-increasing abstraction. When you have Python and JavaScript and Ruby, why would you ever want to do anything with assembly?

The main, overriding purpose of assembly today is speed. There’s literally nothing faster. Nothing can possibly be faster, because everything basically gets turned into assembly, anyway. Yes, compilers are good at optimizing. They’re great. On modern architectures, they’re almost always better than writing assembly by hand. But sometimes they aren’t. New processors have new features that older compiler versions might not know about, for example. And high-level languages, with their object systems and exceptions and lambdas, can get awfully slow. In a few cases, even the relatively tiny overhead of C might be too much.

So, for raw speed, you might need to throw out the trappings of the high level and drop down to the low. But size is also a factor, and for many of the same reasons. Twenty or thirty years ago, everybody worried about the size of a program. Memory was much smaller, hard drives less spacious (or absent altogether, in the earlier days), and networking horrendously slow or nonexistent. Size mattered.

At some point, size stopped mattering. Programs could be distributed on CDs, then DVDs, and those were both seen (in their own times) as near-infinite in capacity. And hard drives and installed memory were always growing. From about the mid 90s to the late 2000s, size was unimportant. (In a way, we’re starting to see that attitude again, sometimes even serving as an “anti-piracy” measure: look at PC games that now take tens of gigabytes of storage, including uncompressed audio and video.)

Then, just as suddenly, size became a big factor once again. The tipping point seems to be sometime around 2008 or so. While hard drives, network speeds, and program footprints kept on increasing, we started becoming more aware of the cost of size. That’s because of cache. Cache, if you don’t know, is nothing more than a very fast bit of memory tucked away inside your computer’s CPU. It’s way faster than regular RAM, but it’s much more limited. (That’s relative, of course. Today’s processors actually have more cache memory than my first PC had total RAM.) To get the most out of cache—to prevent the slowdowns that come from needing to fetch data from main memory—we do need to look at size. And there’s nothing smaller than assembly.

Finally, there are other reasons to study at least the basics of assembly language. It’s fun, in a bizarre sort of way. (So fun that there’s a game for it, which is just as bizarre.) It’s informative, in that you get an understanding of computers at the hardware level. And there are still a few places where it’s useful, like microcontrollers (such as the Arduino) and the code generators of compilers.

To be continued

If you made it this far, then you might even be a little interested. That’s great! Next week, we’ll look at assembly language in some of its different guises throughout history.

Transparent terrain with Tiled

Tiled is a great application for game developers. One of its niftiest features is the Terrain tool, which makes it pretty easy to draw a tilemap that looks good with minimal effort.

Unfortunately, the Terrain tool does have its limitations. One of those is a big one: it doesn’t work across layers. Layers are essential for any drawing but the simplest MS Paint sketches, and it’s a shame that such a valuable development tool can’t use them to their fullest potential.

Well, here’s a quick and dirty way to work around that inability in a specific case that I ran into recently.

The problem

A lot of the “indie” tile sets out there use transparency (or a color key, which has the same effect) to make nice-looking borders. The one I’m using here, Kenney’s excellent Roguelike/RPG pack, is one such set.

The problem comes when you want to use it in Tiled. Because of the transparency, you get an effect like this:

[Image: transparent terrain, with gaps showing through the road tiles]

Normally, you’d just use layers to work around this, maybe by making separate “grass” and “road” layers. If you’re using the Terrain tool, though, you can’t do this. The tool relies on “transitions” between tile types. Drawing on a new layer means you’re starting with a blank slate. And that means no transitions.

The solution

The solution is simple, and it’s pretty much what you’d expect. In a normal tilemap, you might have the following layers (from the bottom up):

  1. The bare ground (grass, sand, water, whatever),
  2. Roads, paths, and other terrain modifications,
  3. Buildings, trees, and other placeable objects.

My solution to the Terrain tool’s limitation is to draw all the “terrain” effects on a single layer. Below that layer would be a “base”, which only contains the ground tiles needed to fill in the gaps. So our list would look more like this:

  1. Base (only needs to be filled in under tiles with transparency),
  2. Terrain, including roads and other mods,
  3. Placeable objects, as before.

For our road on grassland above, we can use the Terrain tool just as described in the official tutorial. After we’re done, we can create a new layer underneath that one. On it, we would draw the base grass tiles where we have the transparent gaps on our road. (Of course, we can just bucket fill the whole thing, too. That’s quicker, but this way is more flexible.) The end result? Something like this:

[Image: filling in the gaps with a base layer underneath]

It’s a little more work, but it ends up being worth it. And you were going to have to do it anyway.

ES6 iterators and generators

With ES6, JavaScript now has much better support for iteration. Before, all we had to work with was the usual for loop, either as a C-style loop or the property-based for...in. Then we got the functional programming tools for arrays: forEach, map, reduce, and so on. Now we have even more options that can save us from needing an error-prone C-style loop or the cumbersome for...in.

The new loop

ES6 adds a new subtype of for loop: for...of. At first glance, it looks almost exactly the same as for...in, but it has one very important difference: for...in iterates over property names (the keys), while for...of iterates over the values themselves. Prettified JavaScript variants (like CoffeeScript) have had this for years, but now it comes to the base language, and we get to do things like this:

var vowels = ['a','e','i','o','u','y'];

for (var v of vowels) {
    if (v != 'y') {
        console.log(v);
    } else {
        if (Math.random() < 0.5) {
            console.log("and sometimes " + v);
        }
    }
}

Most developers will, at first, use for...of to iterate through arrays, and it excels at that. Just giving the value in each iteration, instead of the index, will save the sanity of thousands of JavaScript programmers. And it’s a good substitute for Array.forEach() for those of you that don’t like the FP style of coding.
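
For comparison, a stripped-down forEach version of the same traversal (minus the “y” special case) looks something like this; which one reads better is mostly a matter of taste:

vowels.forEach(function(v) {
    console.log(v);     // each value in turn, handed to a callback instead of a loop body
});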

But for...of isn’t just for arrays. It works on other objects, too. For strings, it gives you each character (properly supporting Unicode, thanks to other ES6 updates), and the new Map and Set objects work in the way you’d expect (i.e., each entry, in insertion order). Even better, you can write your own classes to support the new loop, because it can work with anything that uses the new iterable protocol.
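
As a quick sketch of that (the names and numbers here are made up), both of these loops work with no extra machinery:

for (var ch of "wow") {
    console.log(ch);    // w, o, w -- one character per iteration
}

var scores = new Map([["alice", 3], ["bob", 5]]);
for (var entry of scores) {
    console.log(entry[0] + ": " + entry[1]);    // each entry is a [key, value] pair
}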

Iterables

Protocols are technically a new addition to ES6. They lurked behind the scenes in ES5, but they were out of sight of programmers. Now, though, they’re front and center, and the iterable protocol is one such example.

If you’ve ever written code in Java, C#, C++, Python, or even TypeScript, then you already have a good idea of what a protocol entails. It’s an interface. An object conforms to a protocol if it (or some other object up its prototype chain) properly implements the protocol’s methods. That’s all there is to it.

The iterable protocol is almost too easy. For a custom iterable object, all you need to do is implement a method called @@iterator that returns an object that meets the iterator protocol.

Okay, I know you’re thinking, “How do I make a @@iterator method? I can’t use at-signs in names!” And you’re right. You can’t, and they don’t even want you to try. @@iterator is a special method name that basically means “a symbol with the name of iterator”.

So now we need to know what a symbol is. In ES6, it’s a new data type that we can use as a property identifier. There’s a lot of info about creating your own symbols out there, but we don’t actually need that for the iterable protocol. Instead, we can use a special symbol that comes built-in: Symbol.iterator. We can use it like this:

var myIterable = {
    [Symbol.iterator]: function() {
        // return an iterator object
    }
}

The square brackets mean we’re using a symbol as the name of the property (a computed property name), and Symbol.iterator is exactly the @@iterator the protocol asks for.

Iterators

That gets us halfway to a proper iterable, but now we need to create an object that conforms to the iterator protocol. That’s not that hard. The protocol only requires one method, next(), which must be callable without arguments. It returns another object that has two properties:

  • value: Whatever value the iterator wants to return. This can be a string, number, object, or anything you like. Internally, String returns each character, Array each successive value, and so on.

  • done: A boolean that states whether the iterator has reached the end of its sequence. If it’s true, the iterator is finished, and value is treated as its final return value (a for...of loop ignores it). Setting it to false says that you can keep getting more out of the iterator.

So, by implementing a single method, we can make any kind of sequence, like this:

var evens = function(limit) {
    return {
        [Symbol.iterator]: function() {
            var nextValue = 0;
            return {
                next: function() {
                    nextValue += 2;
                    return { done: nextValue > limit, value: nextValue };
                }
            }
        }
    }
}

for (var e of evens(20)) {
    console.log(e);
} // prints 2, 4, 6..., each on its own line

This is a toy example, but it shows the general layout of an iterable. It’s a great idea, and it’s very reminiscent of Python’s iteration support, but it’s not without its flaws. Mainly, just look at it. We have to go three objects deep to actually get to a return value. With ES6’s shorthand object literals, that’s a bit simplified, but it’s still unnecessary clutter.
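
For what it’s worth, here’s what that shorthand version might look like: the method definitions lose their function keywords, but the nesting stays.

var evens = function(limit) {
    return {
        [Symbol.iterator]() {           // shorthand method, same behavior as before
            var nextValue = 0;
            return {
                next() {
                    nextValue += 2;
                    return { done: nextValue > limit, value: nextValue };
                }
            };
        }
    };
};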

Generators

Enter the generator. Another new addition to ES6, generators are special functions that give us most of the power of iterators with much cleaner syntax. To make a function that’s a generator, in fact, we only need to make two changes.

First, generators are defined as function*, not the usual function. The added star indicates a generator definition, and it can be used for function statements and expressions.

Second, generators don’t return like normal functions. Instead, they yield a value. The new yield keyword works just like its Python equivalent, immediately returning a value but “saving” its position. The next time the generator is resumed, it picks up right where it left off, immediately after the yield that paused it. You can have multiple yield statements, and they will be executed in order, one for each call to the generator’s next() method:

function* threeYields() {
    yield "foo";
    yield "bar";
    yield "The End";
}

var gen = threeYields();

gen.next().value; // returns "foo"
gen.next().value; // returns "bar"
gen.next().value; // returns "The End"
gen.next().value; // undefined

You can also use a loop in your generators, giving us an easier way of writing our evens function above:

var evens = function(limit) {
    return {
        [Symbol.iterator]: function*() {
            var nextValue = 0;
            while (nextValue < limit) {
                nextValue += 2;
                yield nextValue;
            }
        }
    }
}

It’s still a little too deep, but it’s better than writing it all yourself.
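
In fact, because the object a generator function returns is itself iterable, we can usually drop the wrapper object entirely. A minimal sketch:

function* evens(limit) {
    var nextValue = 0;
    while (nextValue < limit) {
        nextValue += 2;
        yield nextValue;
    }
}

for (var e of evens(20)) {
    console.log(e);
} // same output: 2, 4, 6, ..., 20

The one catch is that a generator object is single-use: loop over it twice and the second pass gets nothing, whereas the wrapper version hands out a fresh iterator every time.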

Conclusion

Generators, iterators, and the for...of loop all have a common goal: to make it easier to work with sequences. With these new tools, we can now treat a sequence as just that—a sequence—getting its values as we need them instead of loading the whole thing into memory at once. This lazy evaluation is common in FP languages like Haskell, and it’s found its way into others like Python, but it’s new to JavaScript, and it will take some getting used to. But it allows a new way of programming. We can even have infinite sequences, which would have been impractical before now.
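
Here’s a contrived sketch of what an infinite sequence looks like in practice; the consumer, not the producer, decides when to stop:

function* naturals() {
    var n = 0;
    while (true) {
        yield n++;      // never runs out on its own
    }
}

for (var n of naturals()) {
    if (n > 4) break;   // the loop decides when enough is enough
    console.log(n);     // 0, 1, 2, 3, 4
}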

Iterators encapsulate state, meaning that generators can replace the old pattern of defining state variables and returning an IIFE that closes over them. (For a game-specific example, think of random number generators. These can now actually be generators.) Coroutines and async programming are two other areas where generators come into play, and a lot of people are already working on this kind of stuff. Looking ahead, there’s a very early ES7 proposal to add comprehensions, and these would be able to use generators, too.
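
For instance, here’s a sketch of a seeded random number generator written as a generator; the constants are the usual 32-bit linear congruential parameters, and the whole thing is illustrative rather than production-grade:

function* lcg(seed) {
    var state = seed >>> 0;                                     // force an unsigned 32-bit value
    while (true) {
        state = (Math.imul(state, 1664525) + 1013904223) >>> 0; // one LCG step
        yield state / 4294967296;                               // scale to [0, 1)
    }
}

var rng = lcg(42);
rng.next().value;   // the same sequence every time you use the same seed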

Like most other ES6 features, these aren’t usable by everyone, at least not yet. Firefox and Chrome currently have most of the spec, while the others pretty much have nothing at all. For now, you’ll need to use something like Babel if you need to support all browsers, but it’s almost worth it.