Software internals: Searching

Big Data is all the rage today, but what good is all that data if you can’t find what you’re looking for? Google, among others, has made billions of dollars by using and developing search algorithms, the backbone of analytics. While I couldn’t even begin to show you how their algorithms work (nor would you likely ever need to know), we can look at their precursors, the simple searches common to many programming languages.

Brute force

The simplest way to search for an item in a list (a value in an array, a substring in a larger body of text, or whatever) is to loop through the list and check each value to see if it’s the one you want. This is a linear search, and it’s not hard to implement, even in JavaScript:

function linear_search(array, value) {
    for (var i = 0; i < array.length; i++) {
        if (array[i] == value) {
            return i;
        }
    }

    return -1;
}

There’s more than one way to do it, but that’s the most straightforward. It’s also the “classic” form familiar to programmers of C and related languages. But it comes at a price: execution time. Linear searches are O(n); the time they take is dependent on the number of items you’re searching over. (Statistically speaking, a linear search of a list of size N will, on average, take N/2 iterations.) Surely we can do better, right?

Divide and conquer

If our list is sorted, we can take an approach that you might recognize from “higher or lower” number guessing games. The binary search starts at a point somewhere in the middle of a list and checks to see if it’s at the right spot. If it’s not, then—because the list is sorted—it knows which way to go for the next guess. It looks a bit like this:

function binary_search(array, value) {
    var idmin = 0;
    var idmax = array.length - 1;

    while (idmin < idmax) {
        // Note: older versions of JS use Math.floor() here
        var midpoint = idmin + Math.trunc((idmax - idmin) / 2);

        if (array[midpoint] == value) {
            return midpoint;
        }
        else if (array[midpoint] < value) {
            idmin = midpoint + 1;
        }
        else {
            idmax = midpoint - 1;
        }
    }

    return -1;
}

Again, there are other ways of writing this function, but the one shown here works for our purposes.

The binary search requires a sorted array. That’s how it’s able to deduce where in the list, relative to its pointer, the correct value is. If you ignore the sorting step (probably because you were going to do it anyway), then it comes out to O(log n) time. That’s faster than a simple linear search, and good enough to put a version of the algorithm into the standard libraries of C (bsearch()), C++ (std::binary_search()), and Java (Arrays.binarySearch()), among others.

Substrings

Searching strings for a specific substring is its own problem with its own considerations. If the substring you’re looking for is only a single character, then the whole exercise degenerates into a case of the above, and you can use linear search (as C’s strchr() likely does behind the scenes). But searching for multi-character strings is a bit harder. A naive approach might look like this:

function substring_search(text, ss) {
    var initial = ss[0];
    var found = false;

    for (var i = 0; i < text.length - (ss.length - 1); i++) {
        if (text[i] == initial) {
            found = true;
            for (var j = 1; j < ss.length, found == true; j++) {
                if (text[i+j] != ss[j]) {
                    found = false;
                }
            }

            if (found) {
                return i;
            }
        }
    }

    return -1;
}

Basically, this is a linear search for the first character of the search string, but with a twist. Whenever that first character is found, an inner loop runs, checking the following characters in the text to see if they match as well. If they do, then we’ve got a perfect match. Otherwise, we keep going from where we left off.

This “dumb” approach takes O(n+m) time, but it’s hard to improve on it without creating some more complicated data structures. Fortunately, that’s exactly where the series is going next, so stay tuned.

Revolution

On this day, it’s hard for an American to not think about revolution and rebellion. But what does it mean to revolt? And how can we incorporate that into worldbuilding?

In real life, revolutions are bloody business. Here, we’re familiar with the American Civil War. Essentially, the southern portion of the US declared itself an independent, sovereign state. The rest took exception to that, and the next four years were spent fighting it out. Less than a century before that, British colonists did the same thing, but with one key difference: they won.

Some rebellions are successful, like the American one or Russia’s October Revolution of November 1917 (don’t ask about the dates). Others, such as the Confederacy (1861-65) and the Boxer Rebellion of China (1899-1901), were crushed. In modern times, further examples are the Arab Spring uprisings of five years ago, the Sudanese Civil War that created the nation of South Sudan (now undergoing its own rebellion), or the seemingly interminable conflict in Syria.

All these have one thing in common: they were or are violent affairs. When Scotland attempted to secede from the United Kingdom, it was seeking to become recognized peacefully, and such a feat is nearly unheard of. Nationalism is a strong force, and those who have power rarely want to give it up; those factors combine to ensure that independence will almost always come from the barrel of a gun, not a signature and a stamp.

Building the revolution

For authors, that’s a good thing. Horrible though it may be to live through a revolution, writing about it from afar is safe enough. The setting is ripe with conflict, from the military to the dramatic. And since these times of upheaval are all but guaranteed to be laced with violence, they fit any of the bloodier genres, too.

What makes a good revolution, though? First, there needs to be a reason for it to exist. Why do certain people think they’d be better off without their mother country? The colonists in America complained about unfair taxation and, well, colonialism. So did India, for that matter. The Bolsheviks had a particular political ideology they wanted to enforce on their country. ISIS seems to want general discord and chaos. The reasons are varied, and the exact nature of the revolutionaries’ aims will determine much about the setting.

After you know why these guys are fighting, you can look at how. Pitched battles are nice if you can afford the manpower, the weapons, and the skilled officers, and if the time period fits. Nowadays, the mother country would just drop bombs on the front lines. But your options don’t have to go straight to armed conflict. Some revolutions might start that way, but many begin as political movements that later snowball into an unstoppable—or entirely stoppable, for those that failed—forces of change. Early on, a small, close-knit group can wage a war of words, spreading propaganda and information, whispering against the establishment, and so on.

Once the true fighting starts, then you’ve moved beyond a simple revolution and into a civil war. A civil war is, at its heart, a war, and we know how to write those. But its nature will add new dimensions. It’s not a case of fighting a bunch of strangers from a faraway land. Now you’re fighting your neighbors, your former countrymen. Nor will it be a symmetrical conflict. The rebels will most likely be outnumbered, outgunned, and outmaneuvered. They’ll have, at best, the support of the people and whatever materiel they were able to smuggle, steal, or bring from home. They are what we might call an underdog.

That leads us to a third question, one that only the author can answer for certain. Are the rebels right? It’s not necessarily a case of good versus evil. The mother country may very well have a legitimate government, rather than being a tyrannical empire. The rebels could openly advocate terrorism. And there can be factions on both sides. Black and white worked for Star Wars, but some works need to take things more seriously, and that means shades of gray.

A simple illustration of our own time should make all of this more clear. Take Syria, as it has been for the entirety of this decade. Bashar al-Assad is the country’s head of state. By all accounts, he’s a nasty sort, with the dictatorial bent so common in the Middle East, but he has the legitimate claim to rule. The wave of rebellions sparked by Arab Spring came to Syria, and the populace rose up against him, just like they did in Tunisia, Egypt, Algiers, and elsewhere. Assad fought back, and the two sides have been locked in civil war ever since.

But here’s the rub. The rebels of Syria were not one cohesive unit. They were made up of a number of smaller groups, each with its own grievances and goals. But in 2011, most people would be rooting for their side, because they were perceived as fighting the good fight. All would have been well, except that one of those rebel factions was ISIS. A work as simplistic as Star Wars could never hope to convey the machinations of a revolutionary force containing a subgroup objectively worse than the Evil Empire, not while trying to maintain the “David and Goliath” narrative.

Revolutions, to put it plainly, are complex. They’re tricky business. Writing one is hard work, a juggling act that many might want to avoid. For those who try, I wish you all the luck in the world. Whether you go the easy “rebels fighting the empire” route or all the way to Game of Thrones-level political scheming, you know it’s gonna be alright.

Alien phonologies

As promised, this post will begin a look at creating a conlang for aliens. By “alien”, I mean any intelligent, non-human species, not just those living on other planets. A race of uplifted cats in a far-future science fiction story would be no less alien than an Area 51 Gray. Either way, we’re not dealing with the same structures that give humans the power of speech.

And that’s the first thing to think about when designing an alien conlang. How is this race different? It may not be as satisfying from an astrobiological standpoint, but readers (or players or viewers, depending on your chosen media) will tend to judge aliens contrastively. We’re only human, so we’ll look at aliens through human-tinted glasses. If you’re taking the “harder” design approach, then it’s your job to please those anthropocentric sensibilities and build the creatures that express what you’re trying to say. At the same time.

It’s a difficult, thankless task. I can’t help you earn recognition and praise, but I can make things a little easier. Just as our interminable series on creating human languages began with sounds, so will our look at alien conlangs. But first, we’re going to see what makes human language, well, human.

The ability of speech

Human beings have a vast capability for making sounds. Developed over countless generations through the slow, gradual process of evolution, our speech organs are far and away our primary means of communication. Other species can create sound. Dogs bark, cats meow, and so on. Some of them even have a small capacity for rudimentary language.

What makes humans different is a matter of speculation, but one thing we have over “lesser” animals is our larger brain. Among its many other useful properties, it gives us a greater grasp on the abstract. Language, at its core, can be described as a system of logical relations, and humans are equipped—uniquely so, on this planet—to consider these relations in an abstract context.

For aliens, brainpower will be of utmost importance. How capable are they of this “higher” thought? The better they are able to comprehend the abstractions of language, the more complex that language can become.

The vocal tract

Higher thought wouldn’t be very useful without a way to express it, so we have a well-developed means of creating a vast array of sounds: the vocal tract. Put simply, sounds are produced in the larynx, then modified in various ways on their journey into and out of the mouth and nose. If, for example, the vocal cords vibrate, then the effect is to create a “voiced” sound. A phoneme produced with the back of the tongue placed against the soft palate (or velum, hence velar) will sound different from one made when the front of the tongue touches the teeth (a dental consonant). It’s these distinctions—and many others—that combine to give us the IPA chart.

If you know anything about conlanging in general, you probably already have a good understanding of this part of phonetics, so let’s switch our focus to how it affects alien races. Here, physiology will play a role. The human body’s linguistic implements are not a given. They are by no means universal. We have a mouth, a nose, a tongue, a set of teeth, and so on, but there’s no guarantee that aliens will have all of those…or only those.

It’s always easier to take away, so we’ll start with that. “Humanoid” aliens without noses, for instance, will obviously find nasal phonemes impossible. Those with less dextrous tongues would have a hard time with a retroflex consonant. Without our level of conscious control over the vocal folds, voicing distinctions are out of the question. If the lips can’t round, you can’t have rounded vowels. Basically, any part of the vocal tract you remove obliterates an entire series of potential phonemes.

On the other hand, each new area will add whole classes of phonemes that are beyond human capabilities. A race with a larger, more complex mouth would have more points of articulation. One with extra tongue muscles might have whole new manners of articulation.

Mental and sensory development can come into play here, too. An alien species with poor pitch detection (in other words, a tone-deaf race) won’t be speaking something like Chinese, while one with evolved perfect pitch might be more likely to speak a tonal language. Races with better hearing may have an entire class of “whispered” phonemes. Anything you can think of, as long as it makes sense for the body and brain of your alien creation, is in play here.

Other cues

But language isn’t all about generating and shaping sound. There’s more to it than that, even in the realm of phonology and phonetics. Tone, as mentioned above, can be integral to a race’s language. Stress and rhythm also play a role. Add in syllabic constraints like the sonority hierarchy (which may be different for aliens), and you’ve got an enormous playground for creation.

For the truly alien, though, it’s not entirely clear that they’ll speak with words as we know them. Some species might also use other auditory cues, from grunts to whistles to whatever else you can think of. Others may have a complex secondary language of gesture and sign, which could accompany spoken language or even replace it under certain circumstances. It’s even possible that the other senses may come into play. It’s been said that if they could write, dogs’ stories would mostly be about smells. Aliens with advanced senses of smell and the means to generate a small range of scents could incorporate those into their language as a suprasegmental component, an olfactory counterpart to tone.

In the end, it comes down to this: how do your aliens work? Once again, that contrast between human and alien helps us. Find the ways they’re different, and you’ll know where to begin. From there, the sky really is the limit.

A final word

Lastly, I’d like to make a note about orthography, because I don’t have the slightest idea how it would work. Alien languages with similar sounds can be transcribed into something resembling human tongues. If the biology works out, it might even be the right thing to do.

But how do we represent new points (or manners) of articulation? How do you indicate that this syllable is spoken at a pitch so high it’s above the range of human hearing? Or say a word means “mother” if you speak it while holding your left hand up, but it’s “death” if you raise your right eyebrow. How would you write that? Figuring out how to represent alien language might be just as hard as communicating in it, and that’s before we even start looking at the differences in grammar and lexicon.

In case you haven’t noticed, I’m on a bit of an alien kick right now, and this is the conlang portion of that. Later on, I hope to explore the other dark corners of xenolinguistics, a word I may have made up this very moment. If you like that, be sure to check out my sporadic Monday posts about creating the aliens themselves.