First glance: C++17, part 2

Last time, we got a glimpse of what the future of C++ will look like from the language perspective. But programming isn’t just about the language, so here are some of the highlights of the C++ Standard Library. It’s getting a bit of a makeover, as you’ll see, but not enough to cover its roots.

Variants

Without even looking through the whole list of changes and additions, I already knew this was the big one, at least for me. The variant is a type-safe (or tagged) union. It’s an object that holds a value of one type chosen from a compile-time list. You know, like C’s union. Except variant keeps track of what kind of value it’s holding at the moment, and it’ll stop you from doing bad things:

variant<int, double> v;
v = 42;

// This works...
auto w = get<int>(v);    // w = 42

// ...but this one throws std::bad_variant_access
auto u = get<double>(v); // nope!
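
If you’d rather not deal with exceptions, the proposal also includes a couple of gentler accessors. A quick sketch, using the names from the current C++17 draft (describe and get_int_or are just illustrative helpers):

```cpp
#include <string>
#include <variant>

// Report which alternative the variant currently holds
std::string describe(const std::variant<int, double>& v)
{
    if (std::holds_alternative<int>(v))
        return "int";
    else
        return "double";
}

// get_if returns a pointer to the value, or nullptr on a type mismatch,
// so no exception is ever thrown
int get_int_or(const std::variant<int, double>& v, int fallback)
{
    if (const int* p = std::get_if<int>(&v))
        return *p;
    return fallback;
}
```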

Optional values

This one’s similar, and it was supposed to be in C++14. An optional either holds a value or it holds nothing at all. In that, it’s like Haskell’s Maybe. You can use it to hold the result of a function that can fail, with the empty state acting as an error signal. When converted to a boolean (as in an if), it acts as true if it contains a value, or false if it doesn’t. Not huge, but a bit of a time-saver:

optional<unsigned int> i;

// some function that can return an int or fail
i = f();

if (i)
{
    // work with a proper value
}
else
{
    // handle an error condition
}
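
Here’s a fuller sketch of that pattern, with a hypothetical parse_uint standing in for the fallible f() above (overflow checking omitted, since this is just an illustration):

```cpp
#include <optional>
#include <string>

// A hypothetical fallible function: parse a non-negative integer,
// returning an empty optional on bad input instead of a magic value.
std::optional<unsigned int> parse_uint(const std::string& s)
{
    if (s.empty())
        return std::nullopt;

    unsigned int result = 0;
    for (char c : s) {
        if (c < '0' || c > '9')
            return std::nullopt;  // not a digit: signal failure
        result = result * 10 + (c - '0');
    }
    return result;
}
```

As a bonus, value_or gives you a default when the optional is empty: parse_uint("oops").value_or(0) comes back as 0.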

Any values

The third of this little trinity is any, an object that can—as you might imagine—hold a value of any type. You’re expected to access the value through the any_cast function, which will throw an exception if you try the wrong type. It’s not quite a dynamic variable, but it’s pretty close, and it’ll likely be faster.
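
A minimal sketch of that, with a made-up string_or helper wrapping the any_cast and its exception:

```cpp
#include <any>
#include <string>

// Pull a string out of an any, falling back to a default
// if it's holding something else entirely.
std::string string_or(const std::any& a, const std::string& fallback)
{
    try {
        return std::any_cast<std::string>(a);
    }
    catch (const std::bad_any_cast&) {
        return fallback;
    }
}
```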

apply

If you’ve ever used JavaScript, you know about its apply method. Well, C++ will soon have something similar, but it’s a free function. It calls a function (or object or lambda or whatever) with a tuple of arguments, expanding the tuple as if it were a parameter pack.
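
A tiny sketch of how that looks; add3 and sum_tuple are just illustrative names, but std::apply itself is straight from the draft:

```cpp
#include <tuple>

int add3(int a, int b, int c) { return a + b + c; }

// std::apply expands the tuple into the function's argument list,
// as if we'd written add3(1, 2, 3) by hand
int sum_tuple(const std::tuple<int, int, int>& t)
{
    return std::apply(add3, t);
}
```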

Searching

Yes, C++ has had std::search from the very beginning, but it only ever offered one way of doing things. Some searches can be made faster by using a different algorithm, and that’s how C++17 upgrades std::search: you can now plug in a searcher object of your choosing. And they’re nice enough to give you a couple to get started: boyer_moore_searcher and boyer_moore_horspool_searcher. No points for guessing which algorithms those use.
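
Here’s roughly what using one of the new searchers looks like. find_with_bm is a hypothetical wrapper, but the std::search overload and boyer_moore_searcher are the library’s own:

```cpp
#include <algorithm>
#include <functional>  // home of the new searcher classes
#include <string>

// Find a needle in a haystack with the Boyer-Moore searcher,
// returning the offset or std::string::npos if it's not there.
std::size_t find_with_bm(const std::string& haystack, const std::string& needle)
{
    auto it = std::search(haystack.begin(), haystack.end(),
                          std::boyer_moore_searcher(needle.begin(), needle.end()));
    if (it == haystack.end())
        return std::string::npos;
    return static_cast<std::size_t>(it - haystack.begin());
}
```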

Clamping

It’s common to need to clamp a value to within certain bounds, but programming languages don’t seem to realize this. Libraries have functions for this, but languages rarely do. Well, C++ finally did it, with std::clamp. That’ll instantly shave off 50% of the lines of code working with lighting and colors, and the rest of us will find some way to benefit.
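
Using it is about as simple as you’d hope. clamp_channel here is a made-up example from that lighting-and-colors world:

```cpp
#include <algorithm>  // std::clamp lives here

// Clamp an 8-bit color channel into the displayable range
int clamp_channel(int value)
{
    return std::clamp(value, 0, 255);
}
```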

Mathematical special functions

C++ is commonly used for numeric computation, but this set of functions is something else. They likely won’t be of interest to most programmers, but if you ever need a quick beta function or exponential integral, C++17 has got you covered.
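
For instance, here’s the beta function, B(x, y) = Γ(x)Γ(y) / Γ(x + y). Note that you’ll need a standard library that actually ships these; GCC’s libstdc++ does, but support elsewhere is still spotty:

```cpp
#include <cmath>  // the special functions live alongside the usual math

// B(1, 1) should come out to exactly 1
double beta_one_one()
{
    return std::beta(1.0, 1.0);
}
```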

Filesystem

Alright, I’ll admit, I was holding back. Everything above is great, but the real jewel in the C++17 Standard Library is the Filesystem library. If you’ve ever used Boost.Filesystem, you’re in luck! It’s the same thing, really, but it’s now standard. So everybody gets to use it. Files, paths, directories, copying, moving, deleting…it’s all here. It certainly took long enough.
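
A small sketch of the basics: create a directory, write a file, measure it, and clean up. make_and_measure is just an illustrative name, but the fs calls are the library’s own:

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Create a scratch directory under the system temp path, drop a
// five-byte file in it, and report the file's size before cleaning up.
// (Assumes the process is allowed to write to the temp directory.)
std::uintmax_t make_and_measure(const std::string& name)
{
    fs::path dir = fs::temp_directory_path() / "fs_demo";
    fs::create_directories(dir);

    fs::path file = dir / name;
    std::ofstream(file) << "hello";  // temporary stream closes at the semicolon

    std::uintmax_t size = fs::file_size(file);
    fs::remove_all(dir);  // removes the file and the directory
    return size;
}
```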

Still not done

That’s not nearly everything, but those are my favorite parts. In next week’s finale, we’ll switch to the lowlights. We’ll see those features that just didn’t make the cut.

First glance: C++17, part 1

C++ is a language that is about as old as I am. Seriously. It was first called “C++” in December 1983, two months after I was born, although it had been in the works for a few years before that. So it’s an old language, but that doesn’t mean it’s obsolete or dead. No, far from it. In fact, the latest update to the language, called C++17, is scheduled for release in—you guessed it—2017, i.e., next year.

Why is that important? Well, if you know the history of C++, you know the story of its standardization. The first true standard only came out in 1998, and it was only then that all the template goodness was finally available to all. (Let’s all try to imagine Visual C++ 6 never happened.) Five years later, in 2003, we got a slight update that didn’t do much more than fill in a few blanks. Really, for over a decade, C++ was essentially frozen in time, and that was a problem. It missed the dot-com boom and the Java explosion, and the growth of the Internet and dynamic scripting languages seemed to relegate it to the dreaded “legacy” role.

Finally, after what seemed like an eternity, C++11 came about. (It was so delayed that its original codename was C++0x, because everyone thought it’d be out before 2010.) And it was amazing. It was such a revolution that coders speak of two different languages: C++ and Modern C++. Three years later, C++14 added in a few new bits, but it was more evolutionary than revolutionary.

What C++14 did, though, was prepare programmers for a faster release schedule. Now, we’ve seen how disastrous that has been for projects like Firefox, but hear the committee out. Instead of waiting forever for all the dust to settle and a new language standard to form, they want to do things differently, and C++17 will be their first shot.

C++ is now built on a model that isn’t too different from version control systems. There’s a stable trunk (standard C++, of whatever vintage), and that’s the “main” language. Individual parts are built in what they call Technical Specifications, which are basically like Git branches. There’s one for the standard library, networking, filesystem support, and so on. These are largely independent of the standard, at least in development terms. When they’re mature enough, they’ll get merged into the next iteration of Standard C++. (Supposedly, that’ll be in 2019, but 2020 is far more likely.) But compilers are allowed—required, actually, as the language needs implementations before standardization—to support some of these early; these go under std::experimental until they’ve cooked long enough.

So C++17 is not exactly the complete overhaul of C++11, but neither is it the incremental improvement of C++14. It stands between the two, but it sets the stage for a future more in line with, say, JavaScript.

New features

I have neither the time nor the knowledge to go through each new feature added to C++17. Instead, I’ll touch on those I feel are most important and interesting. Some of these are available in current compilers. Others are in the planning stages. None of that matters as long as we stay in the realm of theory.

Fold expressions

Okay, I don’t care much for Haskell, but these look pretty cool. They take a parameter pack and reduce or fold it using some sort of operation, in the same way as Haskell’s foldl and foldr. Most of the binary operators can be used, which gives us some nifty effects. Here are a few basic examples:

// Returns true if all arguments are true
template <typename... Args>
bool all(Args... args) { return (... && args); }

// Returns true if *any* argument is true
template <typename... Args>
bool any(Args... args) { return (... || args); }

// Returns the sum of all arguments
template <typename... Args>
int sum(Args... args) { return (args + ... + 0); }

// Prints all values to cout (name references JS)
template <typename... Args>
void console_log(Args&&... args)
    { (std::cout << ... << args) << '\n'; }

Yeah, implementing any, all, and even a variadic logging function can now be done in one line. And any functional fan can tell you that’s only the beginning.

Structured bindings

Tuples were a nice addition to C++11, except that they’re not terribly useful. C++, remember, uses static typing, and the way tuples were added made that all too evident. But then there’s the library function std::tie. As its name suggests, one of its uses is to “wire up” a connection between a tuple and free variables. That can be used for a kind of destructuring assignment, as found in Python. But C++17 is going beyond that by giving this style of value binding its own syntax:

using Point3D = tuple<double, double, double>;

// This function gives us a point tuple...
Point3D doSomething() { /* ... */ }

// ...but we want individual X/Y/Z

// With std::tie, we have to do this:
//
// double x, y, z;
// std::tie(x,y,z) = doSomething();

// But C++17 will let us do it this way:
auto [x,y,z] = doSomething();

Even better: this works with arrays and pairs, and even with plain structs whose data members are public. It’s a win all around, if you ask me.
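
The pair case comes up constantly with the standard containers. std::map::insert, for instance, returns a pair<iterator, bool>, and bindings finally give its halves readable names. insert_once here is a made-up helper:

```cpp
#include <map>
#include <string>

// Insert a key only if it isn't already present, reporting
// whether the insert actually happened.
bool insert_once(std::map<std::string, int>& m, const std::string& key, int value)
{
    auto [pos, inserted] = m.insert({key, value});
    (void)pos;  // we only care about the bool half here
    return inserted;
}
```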

if initializers

This one’s less “Wow!” than “Finally!”, but it’s good to have. With C++17, you’ll be able to declare and initialize a variable inside the condition of an if or switch, just like you’ve been able to do with (old-style) for loops for decades:

// getValue() stands in for any function returning an int
if (int value = getValue(); value >= 0)
{
    // do stuff for positive/zero values
}
else
{
    // do stuff for negative values
    // Note: value is still in scope!
}

Again, not that big a deal, but anything that makes an overcomplicated language more consistent is for the best.
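
The classic use case is a map lookup, where the iterator only matters inside the if and shouldn’t leak any further. lookup_or is an illustrative helper:

```cpp
#include <map>
#include <string>

// Look up a key, using the if-initializer to keep the iterator's
// scope as small as possible.
int lookup_or(const std::map<std::string, int>& m,
              const std::string& key, int fallback)
{
    if (auto it = m.find(key); it != m.end())
        return it->second;
    else
        return fallback;
}
```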

constexpr if

This was one of the later additions to the standard, and it doesn’t look like much, but it could be huge. If you’ve paid any attention to C++ at all in this decade, you know it now has a lot of compile-time functionality. Really, C++ is two separate languages at this point, the one you run and the one that runs while you compile.

That’s all thanks to templates, but there’s one big problem. Namely, you can’t use the run-time language features (like, say, if) based on information known only to the compile-time half. Languages like D solve this with “static” versions of these constructs, and C++17 gives us something like that with the constexpr if:

template<typename H, typename... Ts>
void f(H&& h, Ts&& ...ts)
{
    doItTo(h);

    // Now, we need to doItTo all of the ts,
    // but what if there aren't any?
    // That's where constexpr if helps.
    if constexpr(sizeof...(ts) > 0)
        doItTo(ts...);
}

If implemented properly (and I trust that they’ll be able to do that), this will get rid of a ton of template metaprogramming overhead. For simple uses, it may be able to replace std::enable_if and tag dispatch, and novice C++ programmers will never need to learn how to pronounce SFINAE.
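
As a sketch of that simple-case replacement, here’s one template that would otherwise need two overloads or an enable_if dance (stringify is a made-up name):

```cpp
#include <string>
#include <type_traits>

// One template handles both cases; the untaken branch is discarded
// at compile time, so it never has to compile for the "wrong" type.
template <typename T>
std::string stringify(const T& value)
{
    if constexpr (std::is_arithmetic_v<T>)
        return std::to_string(value);   // numbers go through to_string
    else
        return std::string(value);      // anything string-like converts directly
}
```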

Continuation

Those are some of my favorite features that are on the table for C++17. In the next post, we’ll look at the changes to the standard library.

On game jams

This post is going up in August, and that’s the month for the summer version of everyone’s favorite game jam, Ludum Dare. But I’m writing this at the end of June, when there’s still a bit of drama regarding whether the competition will even take place. If it does, then that’s great. If not, well, that’s too bad. Neither outcome affects the substance of this text.

Ludum Dare isn’t the only game jam on the market, anyway. It’s just the most popular. But all of them have a few things in common. They’re competitive programming, in the sense of writing a program that follows certain rules (such as a theme) in a certain time—two or three days, for example, or a week—with the results being judged and winners declared. In that, a game jam is a little more serious than something like NaNoWriMo.

And it’s not for me. Now, that’s just my opinion. I’m not saying game jams are a bad thing in general, nor am I casting aspersions at LD in particular. I simply don’t feel that something like this fits my coding style. It’s the same thing with NaNoWriMo, actually. I’ve never truly “competed” in it, though I have followed along with the “write 50,000 words in November” guideline. Again, that’s because it’s not my style.

One reason is shyness. I don’t want people to see my unfinished work. I’m afraid of what they’d say. Another reason is the schedule, and that’s far more of a factor for a three-day game jam than a month-long writing exercise. I don’t think I could stand to code for the better part of 48 or 72 hours. Call it flightiness or a poor attention span, but I can’t code (or write) for hours on end. I have to take a break and do something else for a while.

Finally, there are the rules themselves. I don’t like rules intruding on my creative expression. In my view, trying to direct art of any kind is a waste of time. I have my own ideas and themes, thank you very much. All I need from you is the gentle nudge to get me to put them into action. That’s why I do a kind of “shadow” NaNoWriMo, instead of participating in the “real thing”. It seems antisocial, but I feel it’s a better use of my time and effort. What’s important is the goal you set for yourself. Climbing into a straitjacket to achieve it just doesn’t appeal to me.

But I do see why others look at game jams differently. They are that nudge, that impetus that helps us overcome our writing (or coding) inertia. And that is a noble enough purpose. I doubt I’ll join the next Ludum Dare or whatever, but I won’t begrudge the existence of the game jam. It does what it needs to do: it gets people to express themselves. It gets them to write code when they otherwise wouldn’t dare. There’s nothing bad about that, even if it isn’t my thing.

Software internals: Trees

We’ve talked quite a bit about data structures in this series, although not as much in recent entries. We’ve seen lists and arrays, two of the most important structures around, but they both have a severe limitation: they’re linear. While that’s great for a lot of data that computer programs manipulate, there comes a time when you need to represent both the data and its organization, and that calls for something more substantial.

The tree is a data structure that represents a hierarchy. Like a real tree, it has a root. It has branches and leaves. And a collection of data trees is also called a “forest”. But those similarities are a bit artificial; the terms were chosen for the analogy. What these trees do isn’t much like the natural ones. To a programmer, however, they’re far more useful.

In the abstract

A tree starts with a root. This is a data node, like one in a linked list, and it can hold one value of any type. If you’ll recall, nodes in linked lists also carry along a pointer to the following node (and the preceding one, for a doubly-linked list). Well, trees do something like that, too. The root node, in addition to its data value, also contains a list of pointers to its children.

These children are roots in their own right, each the start of its own subtree. Each will have its own data and list of children, and those, in turn, will have the same. If the child list is empty, then that node is at the “end” of one branch; it’s a leaf node. (If there are children, you might be wondering, are there parents? Yes. A node having another node in its child list is that second node’s parent. But data trees are asexual: each child only has one parent.)

Trees are a perfect example of recursion. A tree’s children are also trees—childless ones, if they’re leaf nodes—and they can be treated as such. The most used algorithms lend themselves naturally to a recursive style even in imperative languages. So that’s how we’ll be looking at them here.

Growing a tree

Trees consist of nodes, and each node has two main parts: a value and a list of children. Thus, a simple way of representing a tree is as an object (or similar structure) with two fields, like so. (This example uses TypeScript for illustration purposes.)

class TreeNode {
    value: any;
    children: TreeNode[];
}

Not that hard, huh? It’s nothing more than putting our words into code. A TreeNode’s value could be anything, though you’d probably restrict it in a real application, and its children are more TreeNodes. Adding, removing, and altering children works in almost the same way as with linked lists, at least with this “basic” tree setup.

Traversing the tree (sometimes called walking it) is a different matter. Here, we want to go through each of the root’s children, then their children, and so on, until we’ve visited every node in the tree. There are two main ways to do this, but the depth-first method is more common and easier to explain. In code, it looks about like this:

function traverse(tree, callback) {
    for (var child of tree.children) {
        traverse(child, callback);
    }

    // visit the node itself only after all of its children
    callback(tree);
}

This algorithm is “depth-first” because it visits each of a node’s children before moving to its siblings. This is where the recursion comes into play. We loop through each of the children of the root node. They, in turn, become the roots of a new depth-first traversal. Only when we’ve exhausted the “depth” of the tree—when we reach a leaf node—do we start doing anything. (We’re assuming here that our tree doesn’t have any cycles, child nodes that point to higher levels of the tree. Those make things much more difficult.)

Now, there are a lot of things you can do with this function, but I’ve limited it to a simple, nebulous callback that is run for each node. In the abstract, that’s all there is to it. That simple block of code effectively describes the operation of some code generators, AI systems, and many other complex actions.

Special kinds of trees

The general tree described above suffices for a great many applications. Sometimes, though, it’s more efficient to be more restricted. Computer scientists have therefore developed a number of specializations of the tree for different uses.

The binary tree is probably the most well-known of these. It’s called “binary” because each node has at most two children: left and right. These children are themselves trees, of course, but empty trees are often represented as null values. Here’s what it looks like in code, once again using TypeScript:

class BinaryTree {
    data: any;
    left: BinaryTree | null;
    right: BinaryTree | null;
}

Binary trees add an extra wrinkle to the depth-first traversal we saw earlier. Now, we have three possible ways to do things: pre-order, in-order, and post-order traversal. The only thing that changes is when we process the “current” node.

function preOrder(node, callback) {
    if (node == null) return;

    callback(node.data);
    preOrder(node.left, callback);
    preOrder(node.right, callback);
}

function inOrder(node, callback) {
    if (node == null) return;

    inOrder(node.left, callback);
    callback(node.data);
    inOrder(node.right, callback);
}

function postOrder(node, callback) {
    if (node == null) return;

    postOrder(node.left, callback);
    postOrder(node.right, callback);
    callback(node.data);
}

Each approach has its pros and cons. Pre-order works for copying trees, and it’s essentially how expressions are represented in abstract syntax trees or ASTs, an important part of compilers and interpreters. In-order traversal is best for sorted trees, where an extra “key” value is added to each node, arranged so that it’s greater than all the keys in its left subtree but smaller than all those in the right; these binary search trees are used for set structures, lookup tables, and the like. (Maybe we’ll look at them in more detail later in the series.) Finally, post-order is the option of choice for deleting a node’s children and for making RPN expressions.

The binary tree also has a few of its own alterations. The red-black tree uses a “color bit” to keep the tree balanced. This is helpful because most algorithms working on binary trees work best when the numbers of nodes on the left and right are about equal. (In the worst case, all of a node’s descendants are on one “side”. Then, you’re left with nothing but a more expensive linked list.) The AVL tree is a similar variation that has its own pluses and minuses.

B-trees are a special case that sits between binary trees and the general category of trees. They’re allowed to have more than two children, but only up to a certain number. The trick is that each node is given a number of sort keys (like the one in a binary search tree), one for every child past the first. These are calculated so they fit “between” the children, allowing them to serve as index values, speeding up traversal. B-trees are mostly used in databases and filesystems—the two are practically the same thing these days—including the Windows filesystem of choice, NTFS, Apple’s HFS+, and Ext4 and Btrfs on Linux.

Seeing the forest

Trees are everywhere in programming. Whenever you have a set of data with obvious parent-child relationships, it’s probably going to be represented in tree form. Programming languages often don’t offer direct access to the trees, but they’re in there. Sets, maps, and hash tables all use them under the hood. Any kind of parsing, such as XML or a game’s custom scripting language, is going to be tree-based. Even if you never see them, you’ll be using them somewhere.

Fortunately, if you’re not dealing with the innards of the structures themselves, there’s not much you need to worry about. All the hard work has been taken care of. No one needs to manually balance a binary search tree, for instance. But if you work with them directly, it helps to know how they work.

Statistical sampling in JavaScript

It’s 2016, and that’s an election year, which means we’ll be spending the rest of the summer (and half of the fall) watching America’s most ridiculous spectator sport. The pundits and panels and polls are all quite fun, but I find that the methodology is far more interesting than the results.

One of the greatest weapons in the pollster’s arsenal is sampling, and one of those footnotes you’ll see in opinion polls in the coming weeks is the margin of error. These are basic concepts of statistics, but most people might not know how they work. Worse, some programmers might not. So here’s my attempt at rectifying that.

Definitions

Sampling is, in effect, a way of drawing conclusions about a population (such as a state or country) based on surveying only a small fraction of its members. It’s not perfect, but it turns out that, say, 500 people are actually a pretty good indicator of the rest of the nation…as long as you pick the right 500 people. In terms relatable to current events, a presidential poll that only asks people from rural parts of the South is going to get very different results from one that surveys nothing but New York City. That’s selection bias, and it’s one of the hardest things for pollsters to avoid. They’ve got a few ways around it, such as cold-calling random phone numbers, but it’s always a battle.

That very randomness is why sampling works in the first place. If you truly choose your data points (i.e., the people you ask) randomly, then they will, when put together, approximate the “true” nature of things. The more you get, the closer your picture is to the real thing. Eventually, as you’d expect, your sample size approaches the total population; at that limit, the sample obviously represents the whole to perfection.

For smaller samples, however, things aren’t so perfect. Let’s say we have two candidates: H and T. (You get no points for guessing what those stand for.) In “reality”, they aren’t quite even. Say that H has 50% of the vote, T has 45%, and the remaining 5% are undecided, independent, or whatever. Now, take some sort of random number generator and set it to give numbers from 1 to 100. Everything up to 50 is a “vote” for candidate H, 51-95 are for T, and 96-100 are undecided.

After a single generated number, you’ve got a score of 1 for one of the three, 0 for the other two. Not very predictive, but keep going. With a sample size of 10, my results were H 6, T 3, and 1 undecided. Yours might be different, but notice that that’s already looking a lot closer to the true numbers. Give it 100 tries, and it’s probably even better. (Doing this three times with a different generator, my results were: 52 T, 44 H, 4 undecided; 48 T, 47 H, 5; and 57 T, 40 H, 3. Clearly, this RNG leans right.)

The larger the sample size, the more likely the sample will match the population. If you don’t mind a bit of math, then we can look at just how good a match we can get. The basic formula is e = 1 / sqrt(N), where N is the sample size and e is the margin of error. So, for our sample size of 100 above, the math says that our expected error is somewhere within 1/sqrt(100) = 1/10 = 0.1, or 10% either way. Or, as the polls put it, ±10%. Most samples like this are assumed to be conducted at a 95% confidence interval; this basically means that there’s a 95% chance that the true results lie within that margin. (Note, however, that our third poll in the last paragraph didn’t. It’s an outlier. They happen.)

As counter-intuitive as it may seem, this formula doesn’t really depend on the population size at all, as long as the sample is sufficiently small in relation. For national polls surveying a thousand or so people, that assumption holds, so they can safely tout a margin of error of ±3% from their sample of 1,016.

The code

Now we’ll look at how you can do your own sampling. This isn’t just for opinion polls, though. Any kind of analytics could make use of sampling.

The basic function, in JavaScript, would look something like this:

/*
 * Select `k` random choices from a population
 */
function sample(population, k) {
    var psize = population.length;
    var choices = new Set();
    var result = [];

    // We can't pick more unique elements than the population has;
    // without this check, the loop below would never terminate
    if (k > psize) {
        throw new RangeError("sample larger than population");
    }

    // Choose `k` elements from the population, without repeats
    for (var i = 0; i < k; i++) {
        var ch;

        do {
            ch = Math.trunc(Math.random() * psize);
        } while (choices.has(ch));

        choices.add(ch);
    }

    for (var c of choices) {
        result.push(population[c]);
    }

    return result;
}

As always, this isn’t the only way to do what we’re trying to do, but it’s very close to what Python’s random.sample function does, so the idea is sound. To get our sample, we generate a number of array indexes, and the Set guarantees we won’t get any repeats. Our result is a new array containing only those elements we chose. We can then do whatever we need.

But how do we determine what sample size we need? Well, one way is to work backwards from the margin of error we want. Remember that this usually won’t depend on the size of the population.

/*
 * Given a desired margin of error,
 * find an appropriate sample size.
 */
function sampleSize(margin) {
    return Math.round(1 / (margin * margin));
}

This is nothing more than a rearrangement of our formula. Instead of saying e = 1 / sqrt(N), we move things around to solve for N: N = 1 / e^2. Rounding to the nearest integer is just for convenience.

In a nutshell, that’s all there is behind those opinion polls you’ll be hearing so much about over the coming months. Pick enough people at random, and you’ll get a picture of the general populace. It’ll be a blurry picture, but even that’s enough to make out some of the larger details.

Software internals: Searching

Big Data is all the rage today, but what good is all that data if you can’t find what you’re looking for? Google, among others, has made billions of dollars by using and developing search algorithms, the backbone of analytics. While I couldn’t even begin to show you how their algorithms work (nor would you likely ever need to know), we can look at their precursors, the simple searches common to many programming languages.

Brute force

The simplest way to search for an item in a list (a value in an array, a substring in a larger body of text, or whatever) is to loop through the list and check each value to see if it’s the one you want. This is a linear search, and it’s not hard to implement, even in JavaScript:

function linear_search(array, value) {
    for (var i = 0; i < array.length; i++) {
        if (array[i] == value) {
            return i;
        }
    }

    return -1;
}

There’s more than one way to do it, but that’s the most straightforward. It’s also the “classic” form familiar to programmers of C and related languages. But it comes at a price: execution time. Linear searches are O(n); the time they take is dependent on the number of items you’re searching over. (Statistically speaking, a linear search of a list of size N will, on average, take N/2 iterations.) Surely we can do better, right?

Divide and conquer

If our list is sorted, we can take an approach that you might recognize from “higher or lower” number guessing games. The binary search starts at a point somewhere in the middle of a list and checks to see if it’s at the right spot. If it’s not, then—because the list is sorted—it knows which way to go for the next guess. It looks a bit like this:

function binary_search(array, value) {
    var idmin = 0;
    var idmax = array.length - 1;

    while (idmin <= idmax) {
        // Note: older versions of JS use Math.floor() here
        var midpoint = idmin + Math.trunc((idmax - idmin) / 2);

        if (array[midpoint] == value) {
            return midpoint;
        }
        else if (array[midpoint] < value) {
            idmin = midpoint + 1;
        }
        else {
            idmax = midpoint - 1;
        }
    }

    return -1;
}

Again, there are other ways of writing this function, but the one shown here works for our purposes.

The binary search requires a sorted array. That’s how it’s able to deduce where in the list, relative to its pointer, the correct value is. If you ignore the sorting step (probably because you were going to do it anyway), then it comes out to O(log n) time. That’s faster than a simple linear search, and good enough to put a version of the algorithm into the standard libraries of C (bsearch()), C++ (std::binary_search()), and Java (Arrays.binarySearch()), among others.

Substrings

Searching strings for a specific substring is its own problem with its own considerations. If the substring you’re looking for is only a single character, then the whole exercise degenerates into a case of the above, and you can use linear search (as C’s strchr() likely does behind the scenes). But searching for multi-character strings is a bit harder. A naive approach might look like this:

function substring_search(text, ss) {
    var initial = ss[0];
    var found = false;

    for (var i = 0; i < text.length - (ss.length - 1); i++) {
        if (text[i] == initial) {
            found = true;
            for (var j = 1; j < ss.length && found; j++) {
                if (text[i+j] != ss[j]) {
                    found = false;
                }
            }

            if (found) {
                return i;
            }
        }
    }

    return -1;
}

Basically, this is a linear search for the first character of the search string, but with a twist. Whenever that first character is found, an inner loop runs, checking the following characters in the text to see if they match as well. If they do, then we’ve got a perfect match. Otherwise, we keep going from where we left off.

This “dumb” approach takes O(n·m) time in the worst case, and it’s hard to improve on without creating some more complicated data structures. (The cleverer algorithms, like Knuth-Morris-Pratt, get it down to O(n+m).) Fortunately, that’s exactly where the series is going next, so stay tuned.

Godot game, part 2: Abort, retry, fail?

I don’t believe in fate. Problem is, fate doesn’t seem to care.

The week started off just fine. I got a bit of work done on the game late Wednesday night and early Thursday morning. Then, disaster struck.

For most of the next few days, I was almost totally bedridden, shivering and sweating in turns, coughing my head off, getting dizzy every time I stood up, and generally feeling awful. I figured it was nothing more than the usual allergy flareup of late spring/early summer at first, but as the days wore on, I suspected something more was afoot.

It was my mother’s idea to take me to the ER yesterday evening. I’m a poor, white man living in the rural South, so that’s effectively my only option, and it’s one I only like using as a last resort. When I go, it’s more to find out exactly what’s wrong with me than out of any hope that they can fix it. The ease of mind is just as valuable as the diagnosis and treatment.

After a 20-mile drive down there (again, rural South) and about half an hour of waiting, the doctor gave the verdict: bronchitis. Nothing worse than that, thank goodness, but that’s already bad enough, if you ask me. In the grand spirit of American doctors, he gave me a round of antibiotics (for what is probably a viral infection, naturally) and some lovely opioid-based cough syrup that is about as appealing to me as the coughing itself. Honestly, I can’t complain too much; I know from experience that there’s only so much you can do for bronchitis, apart from letting it run its course. But my mind is at ease, and that’s a far better medicine.

What does this mean for my grand “Godot Game Month” project, you ask? Well, total failure. Nothing less. Even if I felt 100% better today, I doubt I could work hard enough to catch up on the days I’ve lost. And I don’t feel much better. (Just as I wrote that sentence, I had another mild fit of coughing. Fortunately, nothing bad came of it. Correction: more bloody mucus. Yay.)

I know my limits. I know how far I can push them. I hate to admit defeat, but I am well aware when something is beyond my capability. This is one of those cases.

So, to sum up, the game is on hold, indefinitely. Once I get at least somewhat healed, I’ll start working on it again, but as a long-term project, something I do in my spare time. I tempted fate with this idea, and she slapped me down for it. I’ve learned my lesson; it won’t happen again.

As for the other posts, I have a nice queue full of them, enough to take me through the middle of July. Those will proceed as scheduled. Hopefully, by the time I need to write again, I’ll feel like doing it.

A Godot game, part 1

So I’ve really been neglecting the “code” aspect of Prose Poetry Code lately. I’ve been far more focused on writing than writing programs, and it shows. Let’s change that. Let’s make a game.

The month of June has five Wednesdays this year. The first one is today, June 1. That’s the starting line. By June 29, four weeks from the day this is posted, I hope to have…something. I don’t know what it’ll be, but I want it to be more than my previous aborted attempts at actually creating a game.

Here’s the idea. I think Godot is a great game engine for indie developers on a budget, as I’ve said more than once on here. Now, I’m going to put my (lack of) money where my mouth is. My game will be simple enough, a little word-building game that’s a bit like a cross between Tetris and Boggle. Blocks containing letters fall from the top of the screen, and the player has to arrange them to form English words. Each word formed scores points (based on the length of the word and possibly other factors) and is removed from the board once it’s made. If the player runs out of space on the board, it’s Game Over.

That’s what I have so far. I’ll worry about combos, difficulty levels, and things like that along the way. Think of this more like a game jam entry than a design document. Hopefully, by starting small, I’ll manage to end the month with something playable.

For those interested, here are the technicalities:

  1. Although I’m writing this post on May 4, and I’ve already created an empty Godot project, I won’t start coding until this post is up, on June 1.

  2. The goal is to have a playable game by June 29. It won’t be a finished product by any means, but I want something that could reasonably be called an alpha by that time.

  3. I’m using Godot for the engine, and the assets are coming from OpenGameArt. At the end of this, I may put all of it up for download, but I don’t know yet.

  4. Each Wednesday, I’ll post a bit of a status report. Those will make up the Code posts for this month. (Software Internals will return the first week of July.)

  5. Other posts (worldbuilding, conlang, etc.) are still on schedule. I write those beforehand, mostly when I’m bored, so I should have a month of them stored up.

You’re welcome to play along at home. This isn’t a competition, though. If I can’t do it, then I can’t. So don’t expect me to push myself like I do for NaNoWriMo. Besides, there aren’t really any metrics for development like a word count. Counting lines of code isn’t that helpful, because nobody can predict how many of them you’d need. And finally, this is meant to be fun and educational for me, but I hope you’ll take some time to explore for yourself, too.

With that, I’m out for this week. Wish me luck. I’ll need it.

Software internals: Sorting

We’ve looked at quite a few data structures in this series, from simple arrays to objects. Now, let’s turn our focus to algorithms. One class of algorithms is very important to high-level programming: the sorting algorithms. For decades, computer scientists have been developing new ways to sort data, trying to balance the needs of size and speed, and there’s no silver bullet. Some sorting algorithms are good for general use, but they fall apart on certain degenerate cases. Others are bad all around, but they’re interesting from a teaching perspective.

In this post, we’ll look at three specific algorithms that illustrate the evolution of sorting and some of its trade-offs. Mostly, we’ll treat each one as working on a simple array of integers that we want sorted in increasing order, but it’s easy enough to generalize the sort function to whatever you need. Also, the code will all be in JavaScript, but I won’t be using any trickery or obfuscation, so it shouldn’t be too hard to convert to your favorite language. (Of course, most languages—JavaScript included—already have sorting functionality built into their standard libraries, so it’s better to use those than to write your own.)

Simple and stupid

Bubble sort is the first sorting algorithm many coders see. It’s stupidly simple: move through a list, comparing each element to the one before it. If they’re out of order, swap them. When you get to the end, start over, and repeat until the list is sorted. In code, it might look like this:

function bubblesort(arr) {
    var len = arr.length;

    do {
        var swapped = false;

        // run through the list
        // compare each element to the last
        // swap those that are out of order
        for (var i = 1; i < len; i++) {
            if (arr[i] < arr[i-1]) {
                // swap elements
                var temp = arr[i];
                arr[i] = arr[i-1];
                arr[i-1] = temp;
                swapped = true;
            }
        }

        // optimization (see below)
        len--;

        // repeat until everything's sorted
    } while (swapped);
}

Even this is already optimized a bit from the basic bubble sort method. Because of the way higher values “bubble up” to the end of the list (hence the name “bubble sort”), we know that, after n iterations, the last n values will be sorted. Thus, the line len-- tells the loop to ignore those values.

That optimization doesn’t help the overall performance of bubble sort. It remains O(n²), not very good. But bubble sort’s value is in its simplicity. You wouldn’t use it in the real world, but it’s a good way to show how sorting works. It’s easy to follow along with it. Take a handful of small numbers and try sorting them by hand, maybe even on paper. It’s not that hard.

Quick and dirty

For a few decades, the go-to choice for sorting has been quicksort. It’s the reason that C’s standard sorting function is called qsort, and it’s still used in many languages, both for performance and simplicity. (Haskell lovers will be quick to point out that that language can implement quicksort in two lines of code.)

Quicksort works by a recursive approach that boils down to “divide and conquer”. Imagine a list as a line of values. Now pick one of those values as the pivot. Take everything less than the pivot and sort it, then take everything greater, and sort that. (This is the recursive step.) Your sorted list is everything in the first sublist, then all the values equal to the pivot, then the second sublist. Or, in code:

function quicksort(arr) {
    if (arr.length <= 1) {
        // recursion base case
        return arr;
    }

    // find the pivot value
    var pivotIndex = Math.floor(arr.length / 2);
    var pivotValue = arr[pivotIndex];

    // these will hold our sublists
    var left = [], right = [], pivots = [];

    // partition the list into three:
    // 1. less than the pivot
    // 2. greater than the pivot
    // 3. equal to the pivot
    for (var i = 0; i < arr.length; i++) {
        if (arr[i] < pivotValue) {
            left.push(arr[i]);
        } else if (arr[i] > pivotValue) {
            right.push(arr[i]);
        } else {
            pivots.push(arr[i]);
        }
    }

    // the sorted list is (left + pivots + right)
    return quicksort(left).concat(pivots).concat(quicksort(right));
}

That’s essentially an expanded version of the Haskell two-liner. It’s not the best from a memory or speed standpoint, but it serves to show how the algorithm works. Another version works in-place, directly operating on the array: it swaps elements around so that all the values less than the pivot come before it, puts those greater after it, and then recurses on the resulting partitions. That one is a lot faster, but it’s a bit harder to grasp. It also needs either a helper function or a bit of logic to allow sorting both an entire list and a portion of one.

The gains from that added complexity are huge, though. With it, quicksort becomes one of the faster sorting methods around, and its space efficiency (with the in-place version) can’t be beat. That’s why quicksort remains so popular, even despite some well-known shortcomings. It’s good enough for most purposes.
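As a rough sketch of that in-place idea (using the common Lomuto partition scheme, which is only one of several ways to partition), it might look like this; the lo and hi parameters are the extra bit of logic mentioned above:

```javascript
function quicksortInPlace(arr, lo, hi) {
    // default to the whole array when called without bounds
    if (lo === undefined) { lo = 0; hi = arr.length - 1; }
    if (lo >= hi) {
        return arr;
    }

    // partition: move everything less than the pivot
    // to the left of the boundary
    var pivot = arr[hi];
    var boundary = lo;
    for (var i = lo; i < hi; i++) {
        if (arr[i] < pivot) {
            var temp = arr[i];
            arr[i] = arr[boundary];
            arr[boundary] = temp;
            boundary++;
        }
    }

    // drop the pivot into its final position
    var t = arr[hi];
    arr[hi] = arr[boundary];
    arr[boundary] = t;

    // recurse on each side; no extra arrays needed
    quicksortInPlace(arr, lo, boundary - 1);
    quicksortInPlace(arr, boundary + 1, hi);
    return arr;
}
```

Unlike the version above, this one sorts the array it’s given rather than building new lists, which is where the memory savings come from.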

Good all around

The last of the “simple” sorts we’ll look at is merge sort. This one is very much like quicksort in that it uses a strategy of repeatedly subdividing a list, but it works without a pivot element. At each step, it breaks the list in half and sorts each half separately. (A list with only one element is sorted by definition, so that’s the stopping point.) Then, it merges those halves an element at a time. Here’s the code:

function mergesort(arr) {
    // An array with 0 or 1 elements is always sorted
    if (arr.length < 2) {
        return arr;
    }

    // break the list into halves
    var middle = Math.floor(arr.length / 2);
    var left = arr.slice(0, middle);
    var right = arr.slice(middle);

    // sort each half separately
    left = mergesort(left);
    right = mergesort(right);

    // now merge the halves
    // take the 1st element of each array, and compare
    // the lower one moves into the "result" list
    // repeat until there's nothing left
    var result = [];
    while (left.length && right.length) {
        if (left[0] <= right[0]) {
            result.push(left.shift());
        } else {
            result.push(right.shift());
        }
    }
    // add in anything we didn't get
    // (in case we had uneven lists)
    result = result.concat(left).concat(right);

    return result;
}

Merge sort uses more space than a good quicksort, but its worst-case performance is better, and it’s especially good for sorting linked lists. It isn’t the fastest overall, however, nor is it the best for tight memory situations. But it’s a happy medium, all else being equal. Not that it ever is.

Never enough

These aren’t the only sorting algorithms around. They’re merely the easiest to explain. Heapsort is another good one that gets used a lot. Radix sort works best with a lot of data that can be easily indexed. And there are many more than that. There’s even a site that has more details and graphical visualizations of many of the algorithms, including those we’ve discussed in this post.

Of course, it’s very likely that you’ll never need to implement a sorting algorithm yourself. Almost every language already has dozens of them implemented either in the standard library or in open 3rd-party code. Most are even fine-tuned for you. But this whole series is about peeling back those layers to see how things work on the inside. Calling array.sort() is easy, but there are times when it might not be the best option. Learning the algorithms—how they work, their advantages and disadvantages—can give you the knowledge to find those spots when the “standard” sort is a bad idea. That’s a rare case, but finding it just once pays for itself.
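One example of such a spot: JavaScript’s own array.sort() compares elements as strings unless you hand it a comparator, which trips up numeric data:

```javascript
var nums = [10, 9, 80, 7];

// the default sort compares string forms: "10" < "7" < "80" < "9"
var asStrings = nums.slice().sort();                                  // [10, 7, 80, 9]

// a comparator function gives a proper numeric sort
var asNumbers = nums.slice().sort(function (a, b) { return a - b; }); // [7, 9, 10, 80]
```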

Next up, we’ll look at the other side of sorting: searching. With Google and “Big Data”, searching has become more important than ever, so stay tuned.

Assembly: the old set

The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.

But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. Like with the 6502, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.

The x86 instruction set

Even stripped down to its essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.

Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:

  • The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
  • Square brackets indicate indirection. Thus, mov ax, bx copies the contents of BX into AX, while mov ax, [bx] loads AX from the memory location pointed to by BX.
  • NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.

Flags

The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:

overflow:
    mov al, 127
    add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 wraps around to -127)
    add al, 2
; now overflow is clear, because -127 + 2 = -125

The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put the whole flags register onto the stack, or POPF to load the flags from there.

MOV and addressing

The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.

The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions, can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)

Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:

  • Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov ax, [1234] or mov dh, [data].
  • Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
  • Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
  • Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)

So MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead; it works the same way. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)

Arithmetic and logic

After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There’s also a two’s-complement negation (NEG) and simple increment/decrement (INC, DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV above.

We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR or SAR moves to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)

Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.
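Assembly aside, the difference between SHR and SAR can be sketched in JavaScript, keeping the value to 8 bits by hand:

```javascript
var value = 0x90; // 1001 0000 in binary; -112 as a signed byte

// SHR: logical shift, a 0 always enters at the top
var shr = (value >>> 1) & 0xff;         // 0x48 (0100 1000)

// SAR: arithmetic shift, the old top bit is repeated,
// which preserves the sign of a two's-complement value
var sar = ((value << 24) >> 25) & 0xff; // 0xc8 (1100 1000)
```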

A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.
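A JavaScript sketch (not assembly, obviously) of what that sign extension does:

```javascript
// duplicate bit 7 of an 8-bit value through the upper byte,
// the way CBW extends AL into AX
function cbw(al) {
    return ((al << 24) >> 24) & 0xffff;
}

cbw(0x7f); // 0x007f: a positive value just gains zero bits
cbw(0x90); // 0xff90: the high bit fills the upper byte with ones
```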

Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL or AX, remainder in AH or DX. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV for signed integers. IMUL even gained forms (from the 186 onward) that take an immediate value: imul ax, -42.

Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.

Flow control

We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:

  • C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
  • Z and NZ, “zero” and “not zero”, based on the zero flag.
  • O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
  • S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
  • B and NB, “below” and “not below”, synonyms for C and NC.
  • A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
  • AE, BE, NAE, NBE; the same as the last two pairs, but add “or equal”.
  • L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
  • LE and NLE, “or equal” versions of the above.
  • G, GE, NG, NGE, “greater than”, etc., for the opposites of the previous four.
  • CXZ, for “if CX is zero”, usually used with loops.

These can be confusing, so here are a couple of examples:

mov ax, [value1]
mov dx, [value2]

a_loop:
add ax, dx

; jump when the addition overflows
; (past 32,767); otherwise try again
jo end
jmp a_loop

end:
; do something else

mov ax, [bp+4]
cmp ax, 0
; if ax == 0...
jz iszero

; else if ax > 0...
jg ispos

; else if ax < 0...
jl isneg

; or if something went wrong
jmp error

CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.

A special LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, stopping when it reaches zero. You might use it like this:

; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di  ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space

nextchar:
mov [es:di],dl
add di,2    ; skip video attribute
loop nextchar

Two variant instructions, LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.

The stack

All x86 programs have use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. But even better are PUSH and POP, which take care of everything for you. They can be used on any register—except that you can’t pop into CS—and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.

The stack grows “downward”. When a value is pushed onto it, SP is first decremented by 2, and then the value is stored at the location SP now points to. Popping does the opposite. Effectively, the instructions work like this:

do_push:
    sub sp, 2
    mov [sp], value

do_pop:
    mov value, [sp]
    add sp, 2

PUSHA and POPA are shortcuts for pushing or popping all 8 of the main registers, helpful when you need to save state before starting a big subroutine.

Strings

The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words using simple instructions. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.

In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments the index registers it uses (SI, DI, or both) by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.

LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx moves AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP makes little sense with LODSx (each iteration would just overwrite the accumulator), but REP STOSx is the classic way to fill a block of memory with a single value, and both can also work in hand-rolled loops.

MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:

; assumes SI and DI point to appropriate memory areas
; and CX holds a count of bytes to move
memcpy:
rep movsb
ret
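In JavaScript terms (ignoring segments entirely, so this is only a model), that memcpy boils down to:

```javascript
// a model of "rep movsb" with the direction flag clear:
// copy cx bytes from src starting at si into dst starting at di
function repMovsb(src, si, dst, di, cx) {
    while (cx > 0) {       // rep: repeat until CX hits zero
        dst[di] = src[si]; // movsb: copy one byte...
        si++;              // ...then advance both indexes
        di++;
        cx--;
    }
}

var src = [10, 20, 30, 40];
var dst = [0, 0, 0, 0];
repMovsb(src, 0, dst, 0, 4); // dst is now [10, 20, 30, 40]
```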

CMPSx compares bytes or words, setting the flags accordingly. It could be used to implement a string comparison function like so:

; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
strncmp:
xor al, al
repe cmpsb  ; compare until a mismatch or CX runs out
je done     ; all characters equal: AL stays 0
ja greater  ; the "source" string is greater (unsigned)
dec al      ; sets AL to FFh, or -1
jmp done

greater:
inc al      ; sets AL to 01h

done:
ret

Finally, SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:

; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found

contains:
mov dx, cx
repne scasb ; scan until a match or CX runs out
je found    ; the zero flag is set if we found it

; character not found, since we ran out of string
mov ax, 0ffffh
jmp end

found:
; CX now holds the number of characters from string end,
; but we saved the original length in DX
; thus, the position is DX - CX - 1
inc cx
sub dx, cx
mov ax, dx

end:
ret

Input and output

Input and output send bytes or words between registers and the I/O space. This is a separate set of 64K (65,536) port addresses, distinct from main memory, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0xff, you can instead use an immediate version: in al, 40h.

The 186 added string I/O with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [DS:SI] or [ES:DI] is the data pointer, as above.

Enough for now

And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.