May 2016 – Prose Poetry Code

Summer reading list 2016

In the US, Memorial Day is the last Monday in May, and it is considered the unofficial start of summer. Time for the kids to get out of school, time to fire up the grills or hit the water. Although the solstice itself isn’t for three more weeks, late May feels like summer, and that’s good enough for most people.

But there’s one hint of school that stays with us through these next glorious weeks of peace: the summer reading list. Many will remember that awful thing, the educational system’s attempt to infringe on a child’s last refuge. I hated it, and you probably did, too. The books they chose were either awful (Ayn Rand’s Anthem) or tainted by association with school (Into Thin Air, by Jon Krakauer). Just like the reading assignments of the other nine months, the summer reading list seemed designed to suck all the enjoyment out of a book.

Now that we’ve outgrown it, things are different. We no longer need to read to please others. But that doesn’t mean we stop reading. No, we instead choose our own path.

So here’s a bit of a challenge for you. Read three books between now and Labor Day (September 5). That’s roughly one a month, so it shouldn’t be too hard. There won’t be any reports due, so you don’t have to worry about that, either. Remember, adults can read for fun rather than work.

It wouldn’t be a challenge if there weren’t rules, so here they are:

You have to read three (3) complete books between May 30 and September 5 of this year. (For following years, it’s the last Monday in May to the first Monday in September.) Giving up halfway doesn’t get you partial credit, so make sure you pick something you can stand to finish.
One (1) of these books should be nonfiction. It can be anything from history to self-help, but it has to be real. (Historical fiction doesn’t count for this, by the way.)
If you’re an aspiring fiction writer, then one (1) of the books must not be from your preferred genre. For example, a fantasy writer should read a non-fantasy book, perhaps sci-fi or a modern detective story. The idea is to branch out, expand your horizons.
Graphic novels count, but comic books don’t. The distinction is subtle, I’ll admit. I’d say a comic book is a short periodical, usually in magazine-style binding, while a graphic novel is a longer work presented in the same way as a text-only work. You can be your own judge, as long as you’re honest with yourself.

And that’s it!

Rhyme in conlangs

I’ve been doing posts on here for a year now, and there’s been one glaring omission. The name of the place is “Prose Poetry Code”, but we have yet to see any actual posts about poetry. So let’s fix that by looking at how we can add a poetic touch to a constructed language by using that most famous of poetic devices: rhyme.

You most likely already know what rhyme is, so we can skip the generalities. It’s all around us, in our songs, in our video game mysteries, in our nursery rhymes and limericks and everywhere else you look. Do we need a definition?

The sound of a rhyme

From a linguistic perspective, we probably do. Yes, it’s easy to point at two words (“sing” and “thing”, for instance) and say that they rhyme. But where’s the boundary? Rhyme is a similarity in the final sounds of words or syllables, but we have to define how close these sounds must be before they’re considered to rhyme. Do “sing” and “seen” rhyme? The English phonemes /n/ and /ŋ/ aren’t too far apart, as any dialectal speech illustrates.

So there’s your first “dimension” to rhyme. Clearly, there are limits, but they can be fluid. (Poetry is all about breaking the rules, isn’t it?) Most languages would allow inexact rhymes, as long as there’s enough of a connection between the sounds, but how much is necessary will depend on the language and its culture. You can go where you want on this, but a good starting point is the following set of guidelines:

A sound always rhymes with itself. (This one’s obvious, but there’s always allophonic variation to worry about.)
Two consonants rhyme if they differ only in voice. (You can add aspiration or palatalization here, if that’s appropriate for your conlang.)
Two vowels rhyme if they differ only in length. (Again, if this is a valid distinction.)
A diphthong can rhyme with its primary vocalic component. (In other words, /ei/ can rhyme with /e/ but not /i/.)
Nasal consonants rhyme with any other nasal. (This is a generalization of the explanation above.)
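
If you like to test your ideas in code, the guidelines above can be sketched as a small lookup. Everything here is an assumption made for illustration: the tiny phoneme table, the feature names, and the soundsRhyme function are all made up, and a real conlang would need its own inventory.

```javascript
// Hypothetical feature table for a tiny, made-up phonology.
const PHONEMES = {
  "p": { type: "consonant", place: "labial", manner: "stop", voiced: false },
  "b": { type: "consonant", place: "labial", manner: "stop", voiced: true },
  "t": { type: "consonant", place: "alveolar", manner: "stop", voiced: false },
  "d": { type: "consonant", place: "alveolar", manner: "stop", voiced: true },
  "s": { type: "consonant", place: "alveolar", manner: "fricative", voiced: false },
  "z": { type: "consonant", place: "alveolar", manner: "fricative", voiced: true },
  "m": { type: "consonant", place: "labial", manner: "nasal", voiced: true },
  "n": { type: "consonant", place: "alveolar", manner: "nasal", voiced: true },
  "ŋ": { type: "consonant", place: "velar", manner: "nasal", voiced: true },
  "a": { type: "vowel", quality: "a", long: false },
  "aː": { type: "vowel", quality: "a", long: true },
  "e": { type: "vowel", quality: "e", long: false },
  "i": { type: "vowel", quality: "i", long: false },
  "ei": { type: "diphthong", primary: "e" }
};

function soundsRhyme(x, y) {
  // A sound always rhymes with itself.
  if (x === y) return true;

  const a = PHONEMES[x];
  const b = PHONEMES[y];
  if (!a || !b) return false;

  if (a.type === "consonant" && b.type === "consonant") {
    // Any nasal rhymes with any other nasal.
    if (a.manner === "nasal" && b.manner === "nasal") return true;
    // Otherwise, consonants may differ only in voice.
    return a.place === b.place && a.manner === b.manner;
  }

  if (a.type === "vowel" && b.type === "vowel") {
    // Vowels may differ only in length.
    return a.quality === b.quality;
  }

  // A diphthong rhymes with its primary vocalic component.
  if (a.type === "diphthong" && b.type === "vowel") return a.primary === b.quality;
  if (a.type === "vowel" && b.type === "diphthong") return b.primary === a.quality;

  return false;
}

console.log(soundsRhyme("n", "ŋ"));  // true
console.log(soundsRhyme("ei", "i")); // false
```

Loosening or tightening a rule is then a one-line change, which is a cheap way to see how it affects your wordlist.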

This isn’t perfect, and it’s especially not intended to be a list of commandments from on high. Feel free to tweak a bit to give your conlang its own flavor. And if you’re using an odder phonology, look for the contrasts that make it distinct.

Where to begin, where to end

Another thing to think about is how much of a syllable is considered for a rhyme. In Chinese, for instance, it’s basically everything but an initial consonant. English, with its complicated phonological and syllabic systems, allows more freedom. Clusters can count as simplified consonants or stand on their own. Reduced or unstressed vowels can be omitted, as can consonants: “’twas”, “o’er”.

Once again, this is creativity at work, so I can’t tell you what to do. It’s your decision. Really, the only “hard” rule here is that the initial part of a syllable rarely, if ever, has to match for a rhyme. Everything else is up for grabs.

With longer words, it’s the same way, but this is a case where different languages can do things differently. Stress patterns can play a role, and so can the grammar itself. To take one example, Esperanto’s system for marking word class by means of a change in final vowels is interesting from a mechanical point of view, but it’s awful for rhyming poetry. One could argue that all nouns rhyme, which is…suboptimal. (A better option there might be to require rhyming of the penultimate syllable, since Esperanto always stresses it, ignoring the “marker” vowel altogether.)

Going in a different direction, it’s easy to see that a language with CV syllables—think something in the Polynesian family here—will tend to have very long words. With a small set of phonemes, there aren’t too many combinations, and that could lead to too much rhyming. Maybe a language like that requires multiple matching syllables, but it might just discard rhyme as a poetic device instead.

And then there’s tone. I don’t speak a tonal language, so I’ve got little to go on here, but I can see a couple of ways this plays out. Either tone is ignored for rhyming, in which case you have nothing to worry about, or it’s important. If that’s true, then you have to work out which tones are allowed to rhyme. For “level” tones (high, low, medium), you could say that they have to be within one “step”. “Contour” tones may have to end at roughly the same pitch. Why the end, you may ask? Because rhyming is inherently tied to the ends of syllables.
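
One way to picture the tone rule is a rough sketch like the following. It assumes a notation of my own invention, where each tone is a list of pitch levels (1 for low, 3 for high), so a level tone has one entry and a contour tone has two or more. The tonesRhyme name and the maxStep parameter are hypothetical.

```javascript
// Tones are arrays of pitch levels: [3] is a level high tone,
// [1, 3] is a rising contour. This notation is an assumption.
function tonesRhyme(a, b, maxStep = 1) {
  // Only the final pitch matters, because rhyme lives at the end of the syllable.
  const endA = a[a.length - 1];
  const endB = b[b.length - 1];
  return Math.abs(endA - endB) <= maxStep;
}

console.log(tonesRhyme([3], [2]));       // true: high and mid are one step apart
console.log(tonesRhyme([3], [1]));       // false: high and low are two steps apart
console.log(tonesRhyme([1, 3], [3]));    // true: a rising tone ends high
console.log(tonesRhyme([3, 1], [1, 3])); // false: the contours end far apart
```

Setting maxStep to 0 would demand an exact match, and ignoring tone altogether amounts to never calling the function.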

Different strokes

As rhyme is tied to the spoken form of a language, it will be affected by the different ways that language is spoken—in other words, dialects.

One good example of this in English is “marry”. Does it rhyme with “tarry”? Most people would say so. What about “gory”? Probably not. “Berry”? Ah, there you might have a problem. Many dialects, including most American ones, merge the vowels in “marry” and “merry”, while others keep them distinct.

Rhyming verse is made to be spoken, recited, chanted, or sung, not merely read, so this is not a theoretical problem. It’s important for anyone writing in a natural language with any significant dialectal variation. Nor is it limited to slight changes in vowel quality. What about English /r/? It disappears at the end of words in England, but not America…at least in speech. Except for country music, most singers tend to drop the R because it sounds better, which has the side effect of creating more opportunities to rhyme.

Of course, for a conlang, you probably don’t have to think about dialects unless you’re specifically creating them. Still, it might be useful to think about for more “hardcore” worldbuilding.

Sing a song

Rhyming isn’t everything in poetry. It’s not even the most important part, and many types of verse get by just fine without it. But I started with it for two reasons: it’s the easiest to explain, and it’s the simplest to build into your conlangs. In fact, you’ve probably already got it, if you look closely enough. (If you’re using random generation to create your words, however, you may not have enough similar words to get good rhymes. That’s where author fiat has to come in. Get in there and make them.)

If you don’t care for rhymes, that’s not a problem. Others do, and if you’re making a language for other people to speak, such as an auxlang, you have to be prepared for it. Poetry is all about wordplay, and creativity is an unstoppable force. Whether song or spoken word, people will find ways to make things work.

Democratization of development

Ten years ago, you only had a very few options for making a game. I know. I was there. For indies, you were basically limited to a few open-source code libraries (like Allegro) or some fairly expensive professional stuff. There were a few niche options—RPG Maker has been around forever, and it’s never cost too much—but most dev tools fell into the “free but bad” or “good if you’ve got the money” categories. And, to top it off, you were essentially limited to the PC. If you wanted to go out of your way, Linux and Mac were available, but most didn’t bother, and consoles were right out.

Fast forward five years, to 2011. That’s really around the time Unity took off, and that’s why we got so many big indie games around that time. Why? Because Unity had a free version that was more than just a demo. Sure, the “pro” version cost an arm and a leg (from a hobbyist perspective), but you only had to get it once you made enough profit that you could afford it. And so the 2010s have seen a huge increase in the number—and quality—of indie titles.

Five more years bring us to the present, and it’s clear that we’re in the midst of a revolution. Now, Unity is the outlier because it costs too much. $1500 (or $75/month) is now on the high end for game engines. Unreal uses a royalty model. CryEngine V is “pay what you want”. Godot and Atomic lead a host of free engines that are steadily gaining ground on the big boys. GameMaker, RPG Maker, and the like are still out there, and even the code-only guys are only getting better.

Engines are a solved problem. Sure, there’s always room for one more, and newcomers can bring valuable insights and new methods of doing things. The basics, though, are out there for everyone. Even if you’re the most hardcore free-software zealot, you’ve got choices for game development that simply can’t be beat.

If you follow development in other media, then you know what’s happening. Game development is becoming democratized. It’s the same process that is bringing movie production out of the studio realm. It’s the same thing that gave every garage band or MIDI tinkerer a worldwide audience through sites like SoundCloud and Bandcamp. It’s why I was able to put Before I Wake into a store offering a million other e-books.

Games are no different in this respect. The production costs, the costs of the “back-end” necessities like software and distribution, are tending towards zero. Economists can tell you all about the underlying reasons, but we, as creators, need only sit back and enjoy the opportunity.

Of course, there’s still a ways to go. There’s more to a game than just the programming. Books are more than collections of words, and it takes more than cameras to make a movie. But democratization has cut down one of the barriers to entry.

Looking specifically at games, what else needs to be done? Well, we’ve got the hard part (the engine) out of the way. Simpler ways of programming are always helpful; Unreal’s Blueprints and all the other “code-less” systems have some nifty ideas. Story work doesn’t look like it can be made any easier than it already is, i.e., not at all. Similarly, game balance is probably impossible to solve in a general sense. Things like that will always have to be left to a developer.

But there is one place where there’s lots of room for improvement: assets. I’m talking about the graphics, sounds, textures, and all those other things that go into creating the audiovisual experience of a game. Those are still expensive or time-consuming, requiring their own special software and talent.

For asset creation, democratization is hard at work. From the venerable standbys of GIMP and Inkscape and Blender to the recently-freed OpenToonz, finding the tools to make game assets isn’t hard at all. Learning how to use them, on the other hand, can take weeks or months. That’s one of the reasons why it’s nearly impossible to make a one-man game these days: audiences expect the kind of polish that comes with having a dedicated artist, a dedicated musician, and so on.

So there’s another option, and that’s asset libraries. We’ve got a few of those already, like OpenGameArt.org, but we can always use more. Unity has grown so popular for indie devs not because it’s a good engine, but because it’s relatively inexpensive and because it has a huge number of assets that you can buy from right there in the editor. When you can get everything you need for a first-person shooter for less than a hundred dollars (or that Humble Bundle CryEngine collection from a while back), that goes a long way towards cutting your development costs even further.

Sure, asset libraries won’t replace a good team of artists working on custom designs specifically for your game, but they’re perfect for hobbyists and indies in the “Early Access” stage. If you hit it big, then you can always replace the stock artwork with something better later on. Just label the asset-library version “Alpha”, and you’re all set.

Looking ahead to the beginning of the next decade, I can’t say how things will play out. The game engines that are in the wild right now won’t go away, especially those that have been released for free. And there’s nowhere to go but up for them. On the asset side of things, it’s easy to see the same trend spreading. A few big “pack” releases would do wonders for low-cost game development, while real-world photography and sound recording allow amateurs to get very close to professional quality without the “Pro” markup.

As a game developer, there’s probably no better time to be alive. The only thing that comes close is the early generation of PC games, when anyone could throw together something that rivaled the best teams around. Those days are long past, but they might be coming back. We may be seeing the beginning of the end for the “elite” mentality, the notion that only a chosen few are allowed to produce, and everyone else must be a consumer. Soon, the difference between “indie” and “AAA” won’t come down to the tools used. And that’s democracy in action.

Building aliens – Introduction

Is there anything more “sci-fi” than an alien? Sure, some of the best science fiction stories are wholly concerned with humanity, but the most popular tend to be the ones with aliens. Star Trek, Star Wars, and any other franchise beginning with the word “star” are the best illustrations of that point, but it’s easy to see anywhere you look. Aliens are all over the place, in movies, TV, video games, books, and every other creative medium you can think of.

But there are aliens and there are aliens. In earlier days of TV and movies, for example, most aliens were typically just actors in makeup, which severely limited their appearance to the humanoid. Modern video games have returned to this state, mainly because of the cost of 3D modeling. (In other words, if everything is close enough to human, then they can all use the same base model.) Books were never under this sort of pressure, so authors’ imaginations could run wild. Think of Larry Niven’s two-headed, three-legged Puppeteers, for instance.

Looks, however, aren’t everything. In visual media, they’re a lot, but for the written word, it’s more about how an alien thinks, acts, sees the world. It’s how aliens are characterized. In harder sci-fi, it’s even about how they exist in the first place.

This series of posts, if I may be so ambitious, will cover all of that. I’ll probably only write about one of these a month, each covering a small part of the topic. As has become my usual pattern, we’ll start with the broader strokes, then fill in the details later on. Along the way, I’ll try to keep a balance between the hard worldbuilding bits and the space-opera fun. Because aliens are both.

First, though, let’s cover the basics.

What we know

This is an easy one: nothing. At this point in time—unless something has happened in the three weeks since I wrote this post—we don’t know if aliens exist. (Ignore fringe theories for the moment.) We really don’t even know if they can exist. All we have are theories, hypotheses, and speculation. In other words, a perfect breeding ground for the imagination.

Of course, we’ve worked out the basics of astrobiology (the study of life outside of Earth). We can confidently say that a few old theories are wrong, like the fabled canals on Mars or jungles of Venus. But what we don’t know is a vast field. Are we alone? The premise of this whole series is that we are not, but we can’t yet be sure. Are we the first intelligent life in the universe…or the last? Did life arise on Earth, or did it spread here from elsewhere?

Today, in 2016, we simply cannot answer any of those questions in a rigorously scientific manner. Thus, it falls to us creative writers to fill in the blanks. How you do that will depend on the expectations of your genre and medium, but also how deep you wish to delve.

Forks in the road

We have a few different ways to play this. Some will work better than others, obviously, and some will resonate better with different segments of your audience. So this is our first big “branching point”, the first decision you’ll have to make.

The hard way

Here, “hard” doesn’t mean “difficult”. Well, it kinda does, but not in the way you think. No, this is a reference to hard science fiction, where the object is the most realistic and plausible scenario, based on as few “miracles” of technology, biology, and the like as possible. Yes, that does make the creation of aliens more difficult, because you have to think more about them, but the results can be amazing.

Hard SF aliens are best suited to written works, if for no other reason than they’re the least likely to be humanoid in body or mind. (We’ll see why in a later post.) Those visual media that have tried to build aliens in this harder style tend to make them incomprehensible to mere humans, or they focus on the ramifications of their existence more than their appearance. But hard sci-fi is often seen as too boring and too “smart” for movie and TV audiences, so there aren’t very many good examples.

The easy way

Now, this time I’m talking about difficulty. In total contrast to the harder style above, many works opt to make their aliens fit the needs of the story, with varying degrees of care for their actual plausibility. In a few cases, they can be made to illustrate a concept or explore a particular section of human psychology. (Older Star Trek series often did both of those.) This might be termed the space-opera method of alien creation.

Obviously, this is more palatable for visual and interactive media, because space-opera aliens tend to fall into the category of Humans But. In other words, this type of alien race can be described as, “They’re humans, but…” Maybe they’re all women, or they have catlike features, or they’re overly aggressive. They could have multiple differences, but they’re still largely human at heart. What makes them special is how they are different from humanity.

Examples of this style aren’t hard to find at all. They probably make up the majority of aliens in fiction. Why? Because they’re easy. Easy to create, easy to visualize, easy to characterize.

The PPC way

For our series, we’ll take a hybrid approach, if only because we have so much ground to cover. I’ll spare you the highly technical intricacies of biology and biochemistry, but we’ll certainly be going deeper into those fields than the shoulder-pads-and-forehead-ridges crowd. The idea is to keep suspension of disbelief while still allowing for a good story. (Honestly, the hard sci-fi approach only really makes for one good story: discovery.)

Likewise, I’ll assume you’re the best one to know what sort of character you need, so we won’t really cover that too much. We’ll probably touch on the psychological aspects, but those are most definitely not my specialty. And we’ll try to make something more interesting than humans in makeup.

Where we go from here

As I said, this series will probably be something close to monthly, but I already have the first few posts planned out. Again, these mostly cover things from a higher level. The finer details will be in the nebulous future.

Here’s what I have so far:

This introduction
Biochemistry, DNA, and alternative forms of life
Evolution and genetics
Interaction with the environment
Physiology
Intelligence, sentience, and sapience

These won’t be the only posts, and they likely won’t even be consecutive. If I come up with something that I think needs to be said, I’ll say it, no matter what the schedule reads. But these six are a good start, and they outline the main areas I feel should be covered.

Remember, we’re making “softer” aliens out of “harder” stuff. That’s why you don’t see a post dedicated to characterization, or one specifically focusing on appearance or mating rituals. Those can come later. (If you’re worried about the lack of language as a topic, also remember that a third of the site is dedicated to exactly that. I will be writing “alien languages” posts, but those will show up on Fridays.)

So that’s it for the intro. Come back soon for the real start to the series. I’ll see you then.

On writing systems

Most conlangers work with text: text files, wordlists, and the like. It’s very much a visual process, quite the opposite of “real” languages. Yes, we think about the sound of a language while we’re making it, but the bulk of the creation is concerned with the written word. It’s just easier to work with, especially on a computer.

Writing, of course, has a long history in the real world, and many cultures have invented their own ways of recording the spoken word. For a conlang, however, the usual form of writing is a transcription into our own alphabet. Few go to the trouble of creating their own system of writing, their own script. Tolkien did, to great effect, but he was certainly an outlier. That makes sense. After all, creating a language is hard enough. Giving it its own script is much more effort for comparatively little payoff.

But some are willing to try. For those who are, let’s see what it takes to create writing. Specifically, we’ll look at the different kinds of scripts out there in this post.

Alphabet

The alphabet is probably the simplest form of script, from the point of view of making one. You don’t really need an example of an alphabet—unless this post was translated into Chinese while I wasn’t looking, you’re reading one! Still, our familiar letters aren’t the only possibility. There’s the Greek alphabet, for example, as well as Cyrillic and a few others.

Alphabets generally have a small inventory of symbols, each used (more or less) for a single phoneme. Obviously, English is far from perfect on that front, but that’s okay. It doesn’t have to be perfect. The principle stands, even if it’s stretched a bit. None of our 26 letters stands for a full syllable, right?

That’s why alphabets are so easy to make, and why they’re (probably) the most common form of writing for conlangs. You only need a few symbols—and there’s nothing saying you can’t borrow a few—and you’re all but done. Writing in the script you make can be as simple as exchanging letters for glyphs.

Abjad and abugida

These two foreign terms name two related variations on the alphabet. The abjad is a script where only consonants are directly written; vowels are represented by diacritics, if at all. That’s the basic system used by Arabic and many of its cousins, as in “كتابة” (kitāba). Note that Arabic isn’t a “pure” abjad, though. The third letter (reading right-to-left) stands for the long a, while the final a has its own letter. As with English, that’s fine. Nobody’s perfect.

The abugida is similar to the abjad, but it does mark vowels. Unlike an alphabet, this is usually with some form of diacritic or as an “inherent” vowel, but it’s always there. Many of the various languages of India use this type of script, such as the Devanagari used by Hindi: लेखन (lekhan). This particular word has three “letters”, roughly standing for l, kh, and n. The vowel a (actually a schwa) is implicit, and it’s omitted at the end of words in Hindi, so only the first letter needs a diacritic to change its vowel. Once more, the scheme isn’t perfect, but it works for a few hundred million people, so there you go.
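
If you want to see that structure for yourself, a few lines of JavaScript can pull the Hindi word apart. This is only a sketch: the splitAbugida function is a name I made up, and it leans on Unicode property escapes (\p{L} for letters, \p{M} for combining marks), which assumes a reasonably modern JavaScript engine.

```javascript
// Split a string into base letters and combining marks (vowel signs, etc.).
function splitAbugida(text) {
  const letters = [];
  const marks = [];
  for (const ch of text) {
    const label = "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    if (/\p{M}/u.test(ch)) {
      marks.push(label);
    } else if (/\p{L}/u.test(ch)) {
      letters.push(label);
    }
  }
  return { letters, marks };
}

console.log(splitAbugida("लेखन"));
// letters: U+0932 (la), U+0916 (kha), U+0928 (na)
// marks:   U+0947 (the vowel sign e, attached to the first letter)
```

Three consonant letters and a single vowel sign: the other vowels are never written.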

Syllabary

Alphabets, abjads, and abugidas all have one thing in common: they work on the level of phonemes. That makes intuitive sense, particularly in languages with complex phonotactics. When there are hundreds of thousands of possible syllables, but only a few dozen individual phonemes, the choice is clear. (That hasn’t stopped some crazy people from trying to make a syllabary for English, but I digress.)

The syllabary, by contrast, gives each syllable its own symbol. Realistically, to use a “pure” syllabary, a language almost has to have a very simple syllabic structure. It works best with the CV or CVC languages common to Asia and Oceania, and that’s probably why the most well-known syllabary comes from that region, the Japanese kana: てがき (tegaki).

A syllabary will always have more symbols than an alphabet (about 50 for Hiragana, plus diacritics for voicing), but not an overwhelming number of them. Syllabaries made for more complicated structures usually have to make a few sacrifices; look at the contortions required in Japanese to convert foreign words into Katakana. But with the right language, they can be a compact way of representing speech.

Featural

A featural alphabet is another possibility, sitting somewhere between an alphabet and a syllabary. In this type of script, the letter forms are phonemic, but they are constructed to illustrate their phonetic qualities. Korean is the typical example of a featural script: 필적 (piljeog). As you can see (hopefully; I don’t seem to have the right font installed on this computer), each character does encode a syllable, but it’s obviously made up of parts that represent the portions of that syllable.
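
Unicode happens to know about those parts, so you can ask it to split a syllable block for you. The sketch below uses the standard String.prototype.normalize method with the "NFD" form, which decomposes each precomposed Hangul syllable into its component letters. The decomposeHangul function name is my own.

```javascript
// Break each Hangul syllable block into its component letters (jamo).
function decomposeHangul(text) {
  return Array.from(text, syllable =>
    Array.from(syllable.normalize("NFD"), jamo =>
      "U+" + jamo.codePointAt(0).toString(16).toUpperCase()
    )
  );
}

console.log(decomposeHangul("필적"));
// [ [ 'U+1111', 'U+1175', 'U+11AF' ],   p + i + l
//   [ 'U+110C', 'U+1165', 'U+11A8' ] ]  j + eo + g
```

Two characters on the screen, six letters underneath.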

Featural alphabets might be overrepresented in conlanging, because they appeal to our natural rationality. Like agglutinative languages, they’re almost mechanical in their elegance. They only require the creation of an alphabet’s worth of symbols, but they give the “look” of a more complex script. If you like them, go for it, but they’re probably rare in the world for a reason.

Logographic

Finally, we come to the logographic script. In this system, each glyph stands for a morpheme or word, with the usual caveat that no real-world system is perfectly pure. Chinese is far and away the most popular logographic script these days: 写作 (xiězuò). Chinese characters have also been borrowed into Korean, Japanese, and other neighboring languages, but they aren’t the only logograms around. Cuneiform, hieroglyphs (Egyptian, Mayan, or whatever), and a few other ancient scripts are logographic in nature.

It should be blatantly obvious what the pros and cons are. The biggest downside to logograms is the sheer number of them you need. About half of Unicode’s Basic Multilingual Plane is composed of Chinese characters, and that’s still not enough. Everything about them is harder, whether writing, inputting, or even learning them. In exchange, you get the most compressed, most unambiguous script possible. But the task might be too daunting for a conlanger.

The mix

In truth, no language falls neatly into one of the above categories. English is written in an alphabet, yes, but we also have quite a few logograms, such as those symbols on the top row of your keyboard. And with the advent of emoji, the logographic repertoire has grown exponentially. Similarly, Arabic has alphabetic properties, Japanese uses Chinese logograms and Latin letters in addition to its syllabic kana, and the phonetic diacritics used by languages such as German are essentially featural.

For your conlang, the style you choose is just that: a style. It’s an artistic choice. Alphabets (including abjads and abugidas) are far easier. Syllabaries can work if you have the right language, or are willing to play around. Logograms require an enormous effort, but they’re so rare that they might be interesting in their own right. And featural systems have the same “logical” appeal as conlangs like Lojban. Which you choose is up to you, but a natural script won’t be limited to one of them. It will borrow parts from the others.

Creating a script for a conlang can be a rewarding task. It’s not the type of thing to undertake lightly, however. It’s a lot of work, and it takes a bit of artistic vision. But you wouldn’t be making a language if you weren’t something of an artist, right?

Software internals: Sorting

We’ve looked at quite a few data structures in this series, from simple arrays to objects. Now, let’s turn our focus to algorithms. One class of algorithms is very important to high-level programming: the sorting algorithms. For decades, computer scientists have been developing new ways to sort data, trying to balance the needs of size and speed, and there’s no silver bullet. Some sorting algorithms are good for general use, but they fall apart on certain degenerate cases. Others are bad all around, but they’re interesting from a teaching perspective.

In this post, we’ll look at three specific algorithms that illustrate the evolution of sorting and some of its trade-offs. Mostly, we’ll treat each one as working on a simple array of integers that we want sorted in increasing order, but it’s easy enough to generalize the sort function to whatever you need. Also, the code will all be in JavaScript, but I won’t be using any trickery or obfuscation, so it shouldn’t be too hard to convert to your favorite language. (Of course, most languages—JavaScript included—already have sorting functionality built into their standard libraries, so it’s better to use those than to write your own.)
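
One caution about the built-in route in JavaScript: with no arguments, Array.prototype.sort compares elements as strings, which is rarely what you want for numbers. A quick sketch, with made-up sample data:

```javascript
var numbers = [10, 9, 1, 25, 3];

// Default sort: elements are compared as strings.
var asStrings = numbers.slice().sort();
console.log(asStrings); // [ 1, 10, 25, 3, 9 ]

// Numeric sort: pass a comparison function.
var asNumbers = numbers.slice().sort(function (a, b) {
    return a - b;
});
console.log(asNumbers); // [ 1, 3, 9, 10, 25 ]
```

The slice() calls are there because sort works in place, and I wanted to keep the original array untouched for the second run.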

Simple and stupid

Bubble sort is the first sorting algorithm many coders see. It’s stupidly simple: move through a list, comparing each element to the one before it. If they’re out of order, swap them. When you get to the end, start over, and repeat until the list is sorted. In code, it might look like this:

function bubblesort(arr) {
    var len = arr.length;

    do {
        var swapped = false;

        // run through the list
        // compare each element to the last
        // swap those that are out of order
        for (var i = 1; i < len; i++) {
            if (arr[i] < arr[i-1]) {
                // swap elements
                var temp = arr[i];
                arr[i] = arr[i-1];
                arr[i-1] = temp;
                swapped = true;
            }
        }

        // optimization (see below)
        len--;

        // repeat until everything's sorted
    } while (swapped);
}

Even this is already optimized a bit from the basic bubble sort method. Because of the way higher values “bubble up” to the end of the list (hence the name “bubble sort”), we know that, after n iterations, the last n values will be sorted. Thus, the line len-- tells the loop to ignore those values.

That optimization doesn’t change the overall complexity of bubble sort. It remains O(n²) in the average and worst cases, which is not very good. But bubble sort’s value is in its simplicity. You wouldn’t use it in the real world, but it’s a good way to show how sorting works. It’s easy to follow along with it. Take a handful of small numbers and try sorting them by hand, maybe even on paper. It’s not that hard.
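To put a number on that O(n²), here’s a small sketch that instruments the same loop to count comparisons. The name countBubbleComparisons is my own invention for illustration; it isn’t part of any library.

```javascript
// Hypothetical helper: the same bubble sort as above,
// but it returns how many comparisons were made.
function countBubbleComparisons(arr) {
    var len = arr.length;
    var comparisons = 0;
    var swapped;

    do {
        swapped = false;
        for (var i = 1; i < len; i++) {
            comparisons++;
            if (arr[i] < arr[i-1]) {
                var temp = arr[i];
                arr[i] = arr[i-1];
                arr[i-1] = temp;
                swapped = true;
            }
        }
        len--;
    } while (swapped);

    return comparisons;
}

// a reversed list is the worst case
console.log(countBubbleComparisons([5, 4, 3, 2, 1]));   // 10
console.log(countBubbleComparisons([10, 9, 8, 7, 6, 5, 4, 3, 2, 1]));   // 45
```

Doubling the list from 5 to 10 items takes the count from 10 to 45, a bit more than four times the work. That’s what quadratic growth looks like in practice.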

Quick and dirty

For a few decades, the go-to choice for sorting has been quicksort. It’s the reason that C’s standard sorting function is called qsort, and it’s still used in many languages, both for performance and simplicity. (Haskell lovers will be quick to point out that that language can implement quicksort in two lines of code.)

Quicksort works by a recursive approach that boils down to “divide and conquer”. Imagine a list as a line of values. Now pick one of those values as the pivot. Take everything less than the pivot and sort it, then take everything greater, and sort that. (This is the recursive step.) Your sorted list is everything in the first sublist, then all the values equal to the pivot, then the second sublist. Or, in code:

function quicksort(arr) {
    if (arr.length <= 1) {
        // recursion base case
        return arr;
    }

    // find the pivot value
    var pivotIndex = Math.floor(arr.length / 2);
    var pivotValue = arr[pivotIndex];

    // these will hold our sublists
    var left = [], right = [], pivots = [];

    // partition the list into three:
    // 1. less than the pivot
    // 2. greater than the pivot
    // 3. equal to the pivot
    for (var i = 0; i < arr.length; i++) {
        if (arr[i] < pivotValue) {
            left.push(arr[i]);
        } else if (arr[i] > pivotValue) {
            right.push(arr[i]);
        } else {
            pivots.push(arr[i]);
        }
    }

    // the sorted list is (left + pivots + right)
    return quicksort(left).concat(pivots).concat(quicksort(right));
}

That’s essentially an expanded version of the Haskell two-liner. It’s not the best from a memory or speed standpoint, but it works to show you the way the algorithm works. Another way works in-place, directly operating on the array by swapping elements around so that all the values less than the pivot are placed before it, then putting those greater after it, and then recursing on the resulting partially-sorted list. That one is a lot faster, but it’s a bit harder to grasp. It also needs either a helper function or a bit of logic to allow both sorting of an entire list and of a portion of one.
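For comparison, here’s a rough sketch of one in-place approach. It uses what’s usually called Lomuto-style partitioning, which is only one of several ways to do it. The names quicksortInPlace and swap, and the optional lo and hi arguments, are my own choices for the example.

```javascript
// A sketch of in-place quicksort. The optional lo/hi arguments
// let the same function sort a whole array or just part of one.
function quicksortInPlace(arr, lo, hi) {
    if (lo === undefined) { lo = 0; }
    if (hi === undefined) { hi = arr.length - 1; }

    if (lo >= hi) {
        // zero or one element: nothing to do
        return arr;
    }

    // move the middle element to the end to act as the pivot
    var mid = lo + Math.floor((hi - lo) / 2);
    swap(arr, mid, hi);
    var pivotValue = arr[hi];

    // everything before "store" is less than the pivot
    var store = lo;
    for (var i = lo; i < hi; i++) {
        if (arr[i] < pivotValue) {
            swap(arr, i, store);
            store++;
        }
    }

    // put the pivot between the two partitions
    swap(arr, store, hi);

    // recurse on each side, skipping the pivot itself
    quicksortInPlace(arr, lo, store - 1);
    quicksortInPlace(arr, store + 1, hi);
    return arr;
}

// Hypothetical helper: exchange two elements of an array
function swap(arr, a, b) {
    var temp = arr[a];
    arr[a] = arr[b];
    arr[b] = temp;
}

var data = [5, 3, 8, 1, 9, 2, 8];
quicksortInPlace(data);
console.log(data);      // [1, 2, 3, 5, 8, 8, 9]

// sort only indexes 1 through 3
var partial = [9, 7, 5, 3, 1];
quicksortInPlace(partial, 1, 3);
console.log(partial);   // [9, 3, 5, 7, 1]
```

Defaulting the lo and hi arguments is the “bit of logic” route; a separate helper function that takes the bounds would work just as well.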

The gains from that added complexity are huge, though. With it, quicksort becomes one of the faster sorting methods around, and its space efficiency (with the in-place version) is hard to beat. That’s why quicksort remains so popular, despite some well-known shortcomings, such as O(n²) behavior when the pivots are chosen badly. It’s good enough for most purposes.

Good all around

The last of the “simple” sorts we’ll look at is merge sort. This one is very much like quicksort in that it uses a strategy of repeatedly subdividing a list, but it works without a pivot element. At each step, it breaks the list in half and sorts each half separately. (A list with only one element is sorted by definition, so that’s the stopping point.) Then, it merges those halves an element at a time. Here’s the code:

function mergesort(arr) {
    // A list with 0-1 elements is always sorted
    if (arr.length < 2) {
        return arr;
    }

    // break the list into halves
    var middle = Math.floor(arr.length / 2);
    var left = arr.slice(0, middle);
    var right = arr.slice(middle);

    // sort each half separately
    left = mergesort(left);
    right = mergesort(right);

    // now merge the halves
    // take the 1st element of each array, and compare
    // the lower one moves into the "result" list
    // repeat until there's nothing left
    var result = [];
    while (left.length && right.length) {
        if (left[0] <= right[0]) {
            result.push(left.shift());
        } else {
            result.push(right.shift());
        }
    }
    // add in anything we didn't get
    // (in case we had uneven lists)
    result = result.concat(left).concat(right);

    return result;
}

Merge sort uses more space than a good quicksort, but it guarantees O(n log n) time even in the worst case, it’s stable (equal elements keep their original order), and it’s especially good for sorting linked lists. It isn’t the fastest overall, however, nor is it the best for tight memory situations. But it’s a happy medium, all else being equal. Not that it ever is.

Never enough

These aren’t the only sorting algorithms around. They’re merely the easiest to explain. Heapsort is another good one that gets used a lot. Radix sort works best with a lot of data that can be easily indexed. And there are many more than that. There’s even a site that has more details and graphical visualizations of many of the algorithms, including those we’ve discussed in this post.

Of course, it’s very likely that you’ll never need to implement a sorting algorithm yourself. Almost every language already has dozens of them implemented either in the standard library or in open 3rd-party code. Most are even fine-tuned for you. But this whole series is about peeling back those layers to see how things work on the inside. Calling array.sort() is easy, but there are times when it might not be the best option. Learning the algorithms—how they work, their advantages and disadvantages—can give you the knowledge to find those spots when the “standard” sort is a bad idea. That’s a rare case, but finding it just once pays for itself.
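JavaScript itself has one such spot. With no arguments, array.sort() compares elements as strings, which is rarely what you want for a list of integers like the ones in this post. Here’s a minimal illustration; the variable names are just for the example.

```javascript
var numbers = [10, 9, 1, 100, 25];

// the default sort compares elements as strings
var asStrings = numbers.slice().sort();
console.log(asStrings);    // [1, 10, 100, 25, 9]

// a comparator function gives numeric order
var asNumbers = numbers.slice().sort(function (a, b) {
    return a - b;
});
console.log(asNumbers);    // [1, 9, 10, 25, 100]
```

The slice() calls are only there to keep the original array untouched, since sort() works in place.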

Next up, we’ll look at the other side of sorting: searching. With Google and “Big Data”, searching has become more important than ever, so stay tuned.

Magic and tech: medicine

Human history is very much a history of medicine and medical technology. You can even make the argument that the very reason we’re able to have the advanced society we have today is because of medical breakthroughs. Increased life expectancy, decreased infant mortality, hospitals, vaccines, antibiotics—I can go on for hours. It all adds up to a longer, healthier life, and that means more time to participate in society. The usual retirement age is 65, and it’s entirely likely it’ll hit 70 before I do, and the quality of life at such an advanced age is also steadily rising. That means more living grandparents (and great-grandparents and great-uncles and so on) and more people with the wisdom that hopefully comes with age.

Not too long ago, things were different. The world was full of dangers, many of them fatal. Disease could strike at any time, without warning, and there was little to be done but wait or pray. Childbirth was far more often deadly to the mother or the child…or both. Even the simplest scratches could become infected. Surgery was as great a risk as the problems it was trying to solve. (Thanks to MRSA and the like, those last two are becoming true again.) If you dodged all those bullets, you still weren’t out of the woods, because you had to worry about all those age-related troubles: blindness, deafness, weakness.

Life in, say, the Middle Ages was very likely a life of misery and pain, but that doesn’t mean there wasn’t medicine, as we’ll see. It was far from what we’re used to today, but it did exist. And there is probably no part of civilization more strongly connected to magic than medicine. What would happen if the two really met?

Through the ages

Medicine, in the sense of “things that can heal you”, dates back about as far as humanity itself. And for all of that history except the last few centuries, it was almost exclusively herbal. Every early culture has its own collection of natural pharmaceuticals (some of them even work!) accompanied by a set of traditional cures. In recent decades, we’ve seen a bit of a revival of the old herbalism, and every drugstore is stocked with ginkgo and saw palmetto and dozens of other “supplements”. Whether they’re effective or not, they have a very long history.

Non-living cures also existed, and a few were well-known to earlier ages. Chemical medicine, however, mostly had to wait for, well, chemistry. The alchemists of old had lists of compounds that would help this or that illness, but many of those were highly toxic. We laugh and joke about the side effects of today’s drugs, but at least those are rare; mercury and lead are going to be bad for you no matter what.

Surgery is also about as old as the hills. The Egyptians were doing it on eyes, for example, although I think I’d rather keep the cataracts. (At least then I’d be like the Nile, right?) Amputation was one of the few remedies for infection…which could also come from surgery. A classic Catch-22, isn’t it? Oh, and don’t forget the general lack of anesthesia.

What the earlier ages lacked most compared to today was not the laundry list of pills or a dictionary of disorders. No, the thing that most separates us from earlier times when it comes to medicine is knowledge. We know how diseases spread, how germs affect the body, how eyes and ears go bad. We’re unsure on a few minor details, but we’ve got the basics covered, and that’s why we can treat the sick and injured so much better than before. Where it was once thought that an illness was the will of God, for instance, we can point to the virus that is its true cause.

And then comes magic

So let’s take that to the magical world. To start, we’ll assume the mundane niceties of medieval times. That’s easier than you might think, because our world’s magic won’t be enough to let its users actually see viruses and other infectious agents. Nor will it allow them to see into the human body at the same level of detail as a modern X-ray, CT scan, or ultrasound. And we’ll short-circuit the obvious idea by saying that there are no cure-all healing spells. Real people don’t have hit points.

But improvements aren’t hard to find. Most of medicine is observation, and we’ve already seen that the magical world has spells that can aid in knowledge, recall, and sensory perception. An increase in hearing, if done right, is just as good as a stethoscope, and we can imagine similar possibilities for the other senses.

Decreasing the ability of the senses is another interesting angle. In normal practice, it’s bad form to blind someone, but a numbing spell would be an effective anesthetic. A sleeping spell is easy to work and has a lot of potential in a hospital setting. And something to kill the sense of smell might be a requirement for a doctor or surgeon as much as the patient!

The practice of surgery itself doesn’t seem like it can benefit much from the limited magic we’re giving this world. It’s more the peripheral aspects that get improved, but that’s enough. Think sharper scalpels, better stitches, more sterilization.

Herbal medicine gets better in one very specific way: growth. It’s not that our mages can cast a spell to make a barren field bloom with plant life, but those plants that are already there can grow bigger and faster. That includes the pharmaceutical herbs as well as grain crops. Magic and alchemy are closely related, so it’s not a stretch to get a few early chemical remedies; magic helps here by allowing easier distillation and the like.

Some of the major maladies can be cured by magical means in this setting. Mostly, this goes back to the sensory spells earlier, but now as enchantment. We’ve established that spells can be “stored”, and this gets us a lot of medical technology. An amulet or bracelet to deaden pain (pain is merely a subset of touch, after all) might be just as good as opium—or its modern equivalents. Sharpened eyesight could be achieved by magic as easily as eyeglasses or Lasik surgery.

In conclusion

The field of medicine isn’t one that can be solved by magic alone. Not as we’ve defined it, anyway. But our magical kingdom will have counterparts to a few of the later inventions that have helped us live longer, better lives. This world will still be dangerous, but prospects are a bit brighter than in the real thing.

What magic does give our fantasy world is a kind of analytical framework, and that’s a necessary step in developing modern medicine. Magic in this world follows rules, and the people living there know that. It stands to reason that they’ll wonder if other things follow rules, as well. Investigating such esoteric mysteries will eventually bear fruit, as it did here. Remember that chemistry was born from alchemy, and thus Merck and Pfizer owe their existence to Geber and Paracelsus.

Chemistry isn’t the only—or even the most important—part of medicine. Biology doesn’t directly benefit from magic, but it shares the same analytical underpinnings. Physical wellness is harder to pin down, but people in earlier times tended to be far more active than today. For the most part, they ate healthier, too. But magic won’t help much there. Indeed, it might make things worse, as it means less need for physical exertion. Also, the “smaller” world it creates is more likely to spread disease.

In the end, it’s possible that magic’s medical drawbacks outweigh its benefits. But that’s okay. Once the rest of the world catches up, it’ll be on its way to fixing those problems, just like we have.

Let’s make a language – Part 15b: Color terms (Conlangs)

So we’ve seen how real-world languages (or cultures, to be more precise) treat color. Now let’s take a look at what Isian and Ardari have to say about it.

Isian

Isian has a fairly short list of basic color terms. It’s got the primary six common to most “developed” languages, as follows:

Color	Word
white	bid
black	ocom
red	ray
green	tich
yellow	majil
blue	sush

We’ve actually seen these before, in the big vocabulary list a few parts back, but now you know why those colors were picked.

There are also three other “secondary” terms. Mesan is the Isian word for “gray”, and it runs the gamut from black to white. Sun covers browns and oranges, with an ochre or tawny being close to the “default”. In the same way, loca is the general term for purple, pink, magenta, fuchsia, and similar colors. Finally, mays and gar are “relative” terms for light and dark, respectively; gar sush is “dark blue”, which could be, say, a navy or royal blue.

All these words are adjectives, so we can say e sush lash “the blue dress” or ta ocom bis “a black eye”. Making them into nouns takes the same effort as any other adjective, using the suffix -os. Thus, rayos refers to the color red; we could instead say rayechil “red-color”.

Derivation is also at the heart of most other Isian color names. Compounds of two adjectives aren’t too common in the language, but they are used for colors. In all cases, the “primary” color is taken as the head of the compound. Some examples include:

raysun, a reddish-brown or red-orange; some hair colors, like auburn, might also fit under this term.
majiltich, a yellow-green close to chartreuse.
tichmajil, similar to majiltich, but more yellow, like lime.
locasush, a mix of blue and purple, a bit like indigo.

Most other colors are named after those things that have them. “Blood red”, for instance, is mirokel (using the adjectival form of miroc “blood”). Halakel is “sky blue”, and so on. As with English, many of the names come from flowers, fruits, woods, and other botanical origins. We’ll look at those in a later post, though.

Ardari

To look at Ardari’s color terminology, we’ll need to work in stages, as this uncovers a bit of the language’s history. First, it seems that Ardari went a long time with four basic colors:

Color	Word
white	ayzh
black	zar
red	jor
green	rhiz

Yellow (mingall) and blue (uswall) got added later, likely beginning as derivations from some now-lost roots. (The sun and the sky are good bets, based on what we know about real-world cultures.)

Next came a few more unanalyzable roots:

Color	Word
brown	dir
orange	nòrs
purple	plom
pink	pyèt
gray	rhuk

That gives the full array of eleven that many languages get before moving on to finer distinctions. Add in wich “light” and nyn “dark”, and you’re on your way to about 30 total colors.

Ardari doesn’t use compounds very often, so most of the other color terms are derived in some fashion. Two good examples are the similar-sounding wènyät “gold” and welyät “sky blue”. These started out as nothing more than adjectival forms of owènyi “gold” and weli “sky”, turned into adjectives by the -rät suffix we met not too long ago, and worn down a bit over time.

Another color word, josall, is an example of a more abstract or general term. It covers very light colors like beige and the pastels. It’s lighter even than wich nòrs or wich jor would be, but with more color than pure white. The word itself probably derives from josta “shell”, so you could describe it as a seashell color.

Grammatically, Ardari color terms are adjectives, so they inflect for gender just like any other. They can be used directly as nouns. And you can add the suffix -it to make something like English “-ish”: jorit “reddish”. That’s really all there is to it.

Moving on

Both our conlangs could easily have a hundred more words for various colors, but these are enough for now. You get the idea, after all. So it’s time to head to the next topic. I still haven’t thought of what that will be. At some point (probably by the time I write Part 16), I’ll have to make some tough decisions about the world around Isian and Ardari, because we’re fast approaching the point where that will matter. So the series might go on a hiatus of a few weeks while I brainstorm. We’ll see.

Assembly: the old set

The x86 architecture, at the assembly level, is quite a bit more complex than our old friend, the 6502. Part of that comes from the fact that it does so much more: it has more registers, a bigger address space, and just more functionality altogether. So it has to be more complex. Maybe not as much as it is, but then it also had to maintain some backward compatibility with Intel’s older processors, and compatibility always complicates matters.

But we won’t judge it in this post. Instead, we’ll look at it as impartially as possible. If you’ve read the earlier posts in this series on assembly language, you know most of the conventions, so I won’t repeat them here. As with the 6502, I’ll look at the different instructions in groups based on their function, and I’m ignoring a lot of those that don’t have much use (like the BCD arithmetic group) or are specifically made for protected mode. What you get is the “core” set at the 286 level.

The x86 instruction set

Even stripped down to this essence, we’re looking at around 100 instructions, about double the 6502’s set. But most of these have obvious meanings, so I won’t have to dwell on them. Specifics will mostly come when we need them.

Also, since I’m assuming you’re using NASM (and an older version, at that), the instruction format in the examples I use will also be for that assembler. That means a couple of things:

The destination always comes first. So, to move the contents of the DX register to AX, you say mov ax, dx.
Square brackets indicate indirection. Thus, mov ax, bx moves the contents of BX into AX, while mov ax, [bx] moves the value in the memory location pointed to by BX.
NASM requires size suffixes on certain instructions. These are mostly the “string” instructions, such as MOVS, which you’d have to write as MOVSB or MOVSW, depending on the width of the data.

Flags

The x86, like most processors, comes with a number of flags that indicate internal state. And, as with the 6502, you can use these to control the flow of your own programs. Those that concern us the most are the carry, zero, sign, overflow, direction, and interrupt flags. The first three should be pretty obvious, even if you didn’t read the first parts of the series. The interrupt flag is likewise mostly self-explanatory. “Direction” is used for string instructions, which we’ll see later. And the overflow flag indicates that the last operation caused signed overflow based on two’s-complement arithmetic, as in this example:

overflow:
    mov al, 127
    add al, 2
; overflow flag is now set because 127 + 2 = 129,
; which overflows a signed byte (129 wraps around to -127)
    add al, 2
; now overflow is clear, because -127 + 2 = -125
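If the flag logic is easier to follow in a high-level language, here’s a rough JavaScript model of an 8-bit ADD. It’s only a sketch of the arithmetic, not how the processor is built, and the function name addByte is made up for the example.

```javascript
// Hypothetical model of an 8-bit ADD: returns the result byte
// plus the carry and overflow flags it would produce.
function addByte(a, b) {
    a &= 0xff;
    b &= 0xff;
    var sum = a + b;
    var result = sum & 0xff;

    return {
        result: result,
        // carry: the unsigned sum didn't fit in 8 bits
        carry: sum > 0xff,
        // overflow: both inputs have the same sign bit,
        // but the result's sign bit is different
        overflow: ((a ^ result) & (b ^ result) & 0x80) !== 0
    };
}

var first = addByte(127, 2);
console.log(first.result, first.overflow);     // 129 true
var second = addByte(first.result, 2);
console.log(second.result, second.overflow);   // 131 false
```

The results print as unsigned bytes: 129 is the same bit pattern as -127, and 131 is the same as -125.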

The carry, direction, and interrupt flags can be directly altered through code. The CLC, CLD, and CLI instructions clear them, while STC, STD, and STI set them. CMC complements the carry flag, flipping it to the opposite value. You can also use PUSHF to put the whole flags register onto the stack, or POPF to load the flags from there.

MOV and addressing

The MOV instruction is the workhorse of x86. It covers loads, stores, and copying between registers, and later extensions have made it Turing-complete in its own right. But in its original form, it wasn’t quite that bad. Plus, it allows all the different addressing modes, so it’s a good illustration of them.

The function of MOV is simple: copy the data in the source to the destination. Despite being short for “move”, it doesn’t do anything to the source data. The source, as for most x86 instructions, can be a register, memory location, or an “immediate” value, and the destination can be memory or a register. The only general rule is that you can’t go directly from memory to memory in the same instruction. (There are, of course, exceptions.)

Moving registers (mov dx, ax) and immediate values (mov ah, 04ch) is easy enough, and it needs no further explanation. For memory, things get hairier. You’ve got a few options:

Direct address: a 16-bit value (or a label, in assembly code) indicating a memory location, such as mov ax, [1234] or mov dh, [data].
Register indirect: three registers, BX, SI, and DI, can be used as pointers within a segment: mov al, [bx] loads AL with the byte at location DS:BX.
Indexed: the same registers above, now with BP included, but with a displacement value added: mov al, [bx+4]. (BP is relative to the stack segment, though.)
Base indexed: either BX or BP plus either SI or DI, with an optional displacement: mov [bx+si+2], dx. (Again, BP uses the stack segment, all others the data segment.)

So MOV can do all of that, and that’s before it got expanded with 32-bit mode. Whew. If you don’t like clobbering the old value at the destination, you can use XCHG instead; it works the same way. (Interestingly, the x86 do-nothing instruction NOP is encoded as xchg ax, ax, which really does do nothing.)

Arithmetic and logic

After all the moving around, computing on values is the next most important task. We’ve got most of the usual suspects here: addition (ADD or the add-with-carry ADC); subtraction (SUB or SBB); logical AND, OR, NOT, and XOR (those are their mnemonics, too). There’s also a two’s-complement negation (NEG) and simple increment/decrement (INC, DEC) operations. These all do about what you’d expect, and they follow the same addressing rules as MOV above.

We can shift and rotate bits, as well. For shifting, SHL goes to the left, while SHR or SAR moves to the right; the difference is that SHR always shifts 0 into the most significant bit, while SAR repeats the bit that was already there. (Shifting left, as you probably know, is a quick and dirty way of multiplying by 2.)
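JavaScript happens to have the same pair of right shifts (>>> and >>), so the difference is easy to try out. This is a rough 16-bit model, assuming a shift count from 0 to 15; shr16 and sar16 are names I made up for it.

```javascript
// Hypothetical 16-bit models of SHR and SAR
// (assumes a shift count between 0 and 15)
function shr16(value, count) {
    // logical shift: zeros come in from the left
    return (value & 0xffff) >>> count;
}

function sar16(value, count) {
    // sign-extend the 16-bit value to 32 bits, shift
    // arithmetically, then trim back down to 16 bits
    var signed = (value << 16) >> 16;
    return (signed >> count) & 0xffff;
}

console.log(shr16(0x8000, 1).toString(16));   // 4000
console.log(sar16(0x8000, 1).toString(16));   // c000
console.log(sar16(0x4000, 1).toString(16));   // 2000
```

With the top bit set, SAR keeps the value negative (8000h becomes C000h), while SHR treats it as a plain unsigned number.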

Rotating moves the bits that get shifted “off the edge” back around to the other side of the byte or word, but it can optionally use the carry flag as an “extra” bit, so we have four possible permutations: ROL, ROR, RCL, RCR. The “rotate with carry” instructions effectively place the carry flag to the left of the most significant bit. Note that both shifting and rotating can take an immediate value for the number of times to shift, or they can use the value in CL.

A couple of instructions perform sign-extension. CBW takes the top bit in AL and duplicates it throughout AH. CWD works the same way, cloning the high bit of AX into every bit of DX. These are mainly used for signed arithmetic, and the registers they work on are fixed.

Unlike the 6502, the x86 has built-in instructions for multiplication and division. Unlike modern systems, the 16-bit versions are a bit limited. DIV divides either AX by a byte or DX:AX by a word. This implied register (or pair) also holds the result: quotient in AL or AX, remainder in AH or DX. MUL goes the other way, multiplying AL by a byte or AX by a word, and storing the result in AX or DX:AX. Those are more than a little restrictive, and they’re unsigned by design, so we also have IMUL and IDIV. These are for signed integers. On the 186 and later, IMUL also lets you use an immediate value instead: imul ax, -42.
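The implied registers are the confusing part, so here’s a rough JavaScript model of the word-sized DIV. The function name divWord and its return shape are assumptions for the sake of the example.

```javascript
// Hypothetical model of the 16-bit DIV instruction.
// dx:ax is the 32-bit dividend, divisor is a 16-bit word.
function divWord(dx, ax, divisor) {
    var dividend = (dx & 0xffff) * 0x10000 + (ax & 0xffff);
    divisor &= 0xffff;

    if (divisor === 0) {
        throw new Error("divide error: division by zero");
    }

    var quotient = Math.floor(dividend / divisor);
    if (quotient > 0xffff) {
        // the real processor raises a divide error here, too
        throw new Error("divide error: quotient doesn't fit in AX");
    }

    // quotient goes to AX, remainder to DX
    return { ax: quotient, dx: dividend % divisor };
}

var small = divWord(0, 100, 7);
console.log(small.ax, small.dx);    // 14 2

var big = divWord(1, 0, 16);        // 65536 / 16
console.log(big.ax, big.dx);        // 4096 0
```

Note the second error case: a quotient too big for AX isn’t truncated. It’s treated as a failure, the same as dividing by zero.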

Two other useful instructions can go here. CMP subtracts its source value from its destination and sets the flags accordingly, but throws away the result. TEST is similar, logical-ANDing its operands together for flag-changing purposes only. Both of these are mainly used for conditional flow control, as we’ll see below.

Flow control

We can move data around, we can operate on it, but we also need to be able to change the execution of a program based on the results of those operations. As you’ll recall, the 6502 did this with branching instructions. The x86 uses the same mechanism, but it calls them jumps instead. They come in two forms: conditional and unconditional. The unconditional one, JMP, simply causes the processor to pick up and move to a new location, and it can be anywhere in memory. Conditional jumps are only taken if certain conditions are met, and they take the form Jcc, where cc is a condition code. Those are:

C and NC, for “carry” and “no carry”, depending on the carry flag’s state.
Z and NZ, “zero” and “not zero”, based on the zero flag.
O and NO, for “overflow” and “no overflow”; as above, but for the overflow flag.
S and NS, “sign” and “no sign”, based on the sign flag; “sign” implies “negative”.
B and NB, “below” and “not below”, synonyms for C and NC.
A and NA, “above” and “not above”; “above” means neither the carry nor zero flag is set.
AE, BE, NAE, NBE; the same as the last two pairs, but add “or equal”.
L and NL, “less than” and “not less than”; “less than” requires either the sign or overflow flag set, but not both.
LE and NLE, “or equal” versions of the above.
G, GE, NG, NGE, “greater than”, etc., for the opposites of the previous four.
CXZ, “if CX is zero”, usually used for loops. (This one tests the CX register rather than a flag, and it has no “not” counterpart.)

These can be confusing, so here are a couple of examples:

mov ax, [value1]
mov dx, [value2]

a_loop:
add ax, dx

; jump if the addition caused signed overflow
; (the result no longer fits in 16 bits),
; otherwise try again
jo end
jmp a_loop

end:
; do something else

mov ax, [bp+4]
cmp ax, 0
; if ax == 0...
jz iszero

; else if ax > 0...
jg ispos

; else if ax < 0...
jl isneg

; or if something went wrong
jmp error
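The split between the unsigned conditions (above, below) and the signed ones (greater, less) trips up a lot of people. Here’s a rough JavaScript model of the flags CMP sets for 16-bit operands, with four of the conditions built from them. The function cmpWord and the property names are my own invention.

```javascript
// Hypothetical model: the flags that CMP a, b would set
// for 16-bit operands, and a few conditions built from them.
function cmpWord(a, b) {
    a &= 0xffff;
    b &= 0xffff;
    var diff = (a - b) & 0xffff;

    var flags = {
        carry: a < b,                  // unsigned borrow
        zero: diff === 0,
        sign: (diff & 0x8000) !== 0,
        // signed overflow: the operands differ in sign, and the
        // result's sign differs from the first operand's
        overflow: ((a ^ b) & (a ^ diff) & 0x8000) !== 0
    };

    return {
        above: !flags.carry && !flags.zero,                      // JA
        below: flags.carry,                                      // JB
        greater: !flags.zero && flags.sign === flags.overflow,   // JG
        less: flags.sign !== flags.overflow                      // JL
    };
}

// 0FFFFh is 65535 unsigned, but -1 signed
var result = cmpWord(0xffff, 1);
console.log(result.above, result.less);     // true true
```

The same comparison comes out “above” and “less” at once, which is why picking the right jump for your data matters.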

CALL calls a subroutine, pushing a return address onto the stack beforehand. RET is the counterpart for returning from one. INT and IRET work the same way, but for interrupts rather than subroutines; INT doesn’t take an address, but an interrupt number, as we have seen.

A special LOOP instruction allows you to easily create, well, loops. It uses CX as an implicit counter, stopping when it reaches zero. You might use it like this:

; clear the screen
mov ax, 0b800h
mov es, ax
xor di, di  ; quicker clear to 0
mov cx, 80 * 25
mov dl, 20h ; ASCII code for space

nextchar:
mov [es:di],dl
add di,2    ; skip video attribute
loop nextchar

Two variant instructions, LOOPZ and LOOPNZ, require that the zero flag be set or cleared, respectively, or they end the loop prematurely.

The stack

All x86 programs have use of a stack, and it’s not limited to 256 bytes like its 6502 cousin. Accessing the stack can’t be done directly in 16-bit land, as there’s no way to address relative to SP, but copying its value into BP and accessing from that is common. But even better are PUSH and POP, which take care of everything for you. They can be used on any register—except that you can’t pop into CS—and even memory; PUSH can also put an immediate value on the top of the stack, though not on the original 8086.

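The stack grows “downward”. When a value is pushed onto it, SP is first decremented by 2, and then the value is moved into the position SP now points to. Popping does the opposite: it reads the value at SP, then increments SP by 2. Effectively, the instructions work like this (it’s pseudo-code, since SP can’t really be used as a pointer in 16-bit mode):

do_push:
    sub sp, 2
    mov [sp], value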

do_pop:
    mov value, [sp]
    add sp, 2
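The ordering matters: on the real chip, PUSH decrements SP before it stores the value, and POP loads the value before it increments SP. Here’s a rough JavaScript model of that behavior within a single 64K segment. Stack16 is a made-up name, and starting SP at zero is just a convenient assumption.

```javascript
// Hypothetical model of a 16-bit stack in a 64K segment
function Stack16() {
    this.memory = new Uint8Array(0x10000);
    this.sp = 0;    // wraps around to FFFEh on the first push
}

Stack16.prototype.push = function (value) {
    // decrement first, then store (low byte first)
    this.sp = (this.sp - 2) & 0xffff;
    this.memory[this.sp] = value & 0xff;
    this.memory[(this.sp + 1) & 0xffff] = (value >> 8) & 0xff;
};

Stack16.prototype.pop = function () {
    // load first, then increment
    var value = this.memory[this.sp] |
        (this.memory[(this.sp + 1) & 0xffff] << 8);
    this.sp = (this.sp + 2) & 0xffff;
    return value;
};

var stack = new Stack16();
stack.push(0x1234);
stack.push(0xabcd);
console.log(stack.sp.toString(16));       // fffc
console.log(stack.pop().toString(16));    // abcd
console.log(stack.pop().toString(16));    // 1234
```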

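PUSHA and POPA are shortcuts for pushing and popping all of the main 8 registers, helpful when you need to save state before starting a big subroutine. Like the immediate form of PUSH, they arrived with the 186, so they aren’t on the original 8086.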

Strings

The x86 can’t really work on strings, but it can work with arrays of bytes or 16-bit words using simple instructions. This is done through five instructions that operate on either bytes or words; NASM requires a suffixed “B” or “W” for these, but I’ll refer to them with a general “x”.

In all these cases, the “source” address is, by default, DS:SI, and the destination is ES:DI. Also, because these instructions were meant to be done in blocks, they can take a special REP prefix. This works a bit like LOOP, in that it stops when CX reaches 0. (REPE and REPNE are also available, and they work like LOOPZ and LOOPNZ.) After the instruction performs its operation, it increments both SI and DI by 1 for the byte version, 2 for the word version. This is where the direction flag comes into play, however: if it’s set, these instructions instead subtract 1 or 2 from those registers, effectively performing the operation in reverse.

LODSx and STOSx load and store, respectively. LODSx puts the value at [DS:SI] into AL or AX, while STOSx moves AL or AX into memory at [ES:DI]. Either way, they then change the index register (SI or DI) as described above. REP doesn’t really make sense with LODSx, since each pass would overwrite the last value loaded, but rep stosb or rep stosw is a handy way to fill a block of memory. Both also work in hand-rolled loops.

MOVSx is a little more general, and it’s one of the few memory-to-memory operations available on the early x86. It copies a byte or word at [DS:SI] to [ES:DI], then changes both index registers based on the data width (1 for byte, 2 for word) and the direction flag (up for cleared, down for set). It’s all but made for block copying, as in this example:

; assumes SI and DI point to appropriate memory areas,
; CX holds a count of bytes to move,
; and the direction flag is clear (use CLD if unsure)
memcpy:
rep movsb
ret
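To see what that one-line loop is doing, here’s a rough JavaScript model of REP MOVSB. It ignores segments entirely and treats memory as a flat array; the function repMovsb and the shape of the state object are assumptions for illustration.

```javascript
// Hypothetical model of REP MOVSB on a flat byte array.
// "state" holds si, di, cx, and df (the direction flag).
function repMovsb(memory, state) {
    var step = state.df ? -1 : 1;

    while (state.cx !== 0) {
        memory[state.di] = memory[state.si];
        state.si += step;
        state.di += step;
        state.cx--;
    }
}

var memory = [1, 2, 3, 4, 0, 0, 0, 0];
var state = { si: 0, di: 4, cx: 4, df: false };
repMovsb(memory, state);
console.log(memory);              // [1, 2, 3, 4, 1, 2, 3, 4]
console.log(state.si, state.di);  // 4 8
```

Setting df to true would walk both pointers downward instead, which is the usual trick when the source and destination blocks overlap.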

CMPSx compares bytes or words, setting the flags accordingly. It could be used to implement a string comparison function like so:

; assumes SI and DI point where they should,
; and CX contains the max number of characters to test
; returns a value in AL:
; -1 if the "source" (1st) string is less,
; +1 if it's greater,
; 0 if they're equal
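strncmp:
xor al, al  ; assume equal; this also sets ZF in case CX is 0
repe cmpsb
je exit     ; ran out of characters without a mismatch
ja greater  ; unsigned compare, since these are character codes
dec al  ; sets to FFh, or -1
jmp exit

greater:
inc al  ; sets to 01h

exit:
ret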

Finally, SCASx sets the flags based on a comparison between AL (for bytes) or AX (for words) and the value at [ES:DI]. The mnemonic stands for “scan string”, and that’s what it can do:

; assumes DI points to a string,
; CX holds the length of the string,
; and AL holds the character to search for
; returns in AX:
; position of found character, or -1 if not found

contains:
mov dx, cx
repne scasb
je found    ; ZF is set if the last comparison matched
            ; (this assumes CX was nonzero on entry)

; character not found, since we ran out of string
mov ax, 0ffffh
jmp end

found:
; CX now holds the number of characters from string end,
; but we saved the original length in DX
; thus, the position is DX - CX - 1
inc cx
sub dx, cx
mov ax, dx

end:
ret

Input and output

Input and output send bytes or words between registers and the I/O space. This is a special set of 64K (65,536) port addresses, separate from main memory, though only the first 1,024 were used on early PCs. Using them involves the IN and OUT instructions. These are fairly restrictive, in that they imply the AL or AX register for the data and DX for the I/O port: in ax, dx or out dx, al. However, for the “low” ports, those with addresses up to 0xff, you can instead use an immediate version: in al, 40h.

By the 286, string I/O had been added with the INSx and OUTSx instructions. These work similarly to LODSx and STOSx above, but the data is either coming from or going to an I/O port instead of main memory. (This was a bit faster than doing a manual loop, and some early peripherals actually couldn’t handle that!) The port number is always in DX, while [DS:SI] or [ES:DI] is the data pointer, as above.

Enough for now

And we’re finally done. Next time, we can start programming this thing, but this post is already way too long, so I’ll see you later.

Mothers

Yesterday was Mother’s Day. (Coincidentally enough, it was also my mom’s birthday.) That’s a time to reflect on motherhood in general, or on how it has affected us. If you think about it, being a mother is the most altruistic thing a woman can do. Months of pregnancy, hours of labor, and years of care, all for the biological, psychological, and physical well-being of another. Look at it from that perspective, and they deserve a whole lot more than just one day.

I consider myself blessed to have grown up so close to my mother and, until about a year ago, her mother. Not everyone has that opportunity, unfortunately. So let’s take this day off to think about what we do have. Let’s bring a little bit of yesterday into today.

If you’re a writer, what are your characters’ mothers like? What do they think of the adventures of their children?

For worldbuilders, today’s the day to contemplate motherhood as a cultural notion. Are mothers revered among your fictitious people? Are they granted leniency or honor that a childless woman wouldn’t receive?

And for everybody: we all had a mother at some point. Some of us still do, and we should be thankful.