Let’s make a language, part 18a: Geography (Intro)

The world is a very big place, and it contains a great many things. Even before you start counting those that are living—from plants and animals down to microbes—you can find a need for hundreds or thousands of words. So that’s what we’ll do in this entry. We’ll look at the natural world, but we’ll avoid talking about its flora and fauna for the moment. Instead, the focus will be on what we might call the natural geography. The lay of the land, if you will.

The world itself

For us, “world” is virtually synonymous with “earth” and “planet”. But that’s an artifact of our high-tech society. In older days, these concepts were pretty separate. The earth was the surface, the ground—the terra firma. Planets were wandering stars in the sky, so named because they seemed to change their positions from night to night, relative to the “fixed” background stars. And the world was everything that could be observed, closer to what we might call the “universe” or “cosmos”.

Within this definition of the world, many cultures (and thus languages) create a three-way distinction between the earth, sea, and sky. Earth is solid, dry land, where people live and work and farm and hunt. Sea is the open water, from the Mediterranean to the Pacific, but not necessarily rivers and lakes; it’s the place where man cannot live. And the sky is the vast dome above, home of the sun, moon, and stars, and often whatever deity or deities the speakers worship. In pre-flight cultures, it tends to have dreamlike connotations, due to its effective inaccessibility. People can visit the sea, even if they can’t stay there, but the sky is always out of our reach.

Here, the details of your speakers’ world come into play. If they’re on Earth, then they’ll probably follow this terrestrial model to some extent. Aliens, however, will tailor their language to their surroundings. A world without a large moon like ours likely won’t have a word for “moon”; ancient Martians, for instance, might consider Phobos and Deimos nothing more than faster planets. Those aliens lucky enough to have multiple moons, on the other hand, will develop a larger vocabulary for them. The same goes for other astronomical phenomena, from the sun to the galaxy.

Land and sea

Descending to that part of the world we can reach, we find a bounty of potential words. There’s flat land, in the form of plains and valleys and fields. More rugged are the hills and mountains, distinguished with separate words in many languages; hills are really not much more than small mountains, but few languages conflate the two. Abundant plant life can create forests or, in some places, jungles, and a culture adapted to either of these areas will likely make far finer distinctions than we do. On the opposite end are the dry deserts, which aren’t necessarily hot (the Gobi is a cold desert, as is Antarctica). These don’t seem truly hospitable for life, but desert cultures exist all across the globe, from the Bedouins of the Middle East to the natives of the American Southwest, though they’ll always seek out sources of water.

Fresh water is most evident in two forms. We have the static lakes and the moving rivers as the most generic descriptors, but they’re far from all there is. Ponds are small lakes, for example, and swamps are a bit like a combination of lake and land. Rivers, owing to their huge importance for travel in past ages, get a sizable list: streams, creeks, brooks, and so on. All of these have slightly different meanings, but those can vary between dialects: what I call a creek, someone in another state may deem a brook. Nor do the shades of meaning cross language barriers; still, a culture that depends on moving bodies of water will tend to come up with quite a few words describing the different kinds.

In another of the grand cycles of life, fresh water spills into the seas. Now, English has two words for salty bodies of water, “sea” and “ocean”, but that doesn’t mean they’re two separate things. Many languages have only one word covering both, and that’s fine. Besides, a landlocked language won’t really need to spend two valuable words on something that might as well not exist.

In addition to the broad range of terrain, terms also exist for smaller features. Caves, beaches, waterfalls, islands, and cliffs are just some of the things we name. Each one tends to be distinctive, in that speakers of a language have a set image in their minds of the “ideal” cave or bluff or whatever. That ideal will be different for different people, of course, but few would, for instance, think of the fjords of Norway when imagining a beach.

Talking about the weather

The earth and sea are, for the most part, unchanging. Scientifically, we know that’s not the case, but it’s close enough for linguistic purposes. The weather, however, is anything but static. (Don’t like the weather in {insert place name here}? Wait five minutes.) Languages have lots of ways to talk about the weather, and not just so that speakers will have a default topic for conversation.

Clouds are the most visible sign of a change in weather, but the wind can also tell you what’s to come. And for reasons that are probably obvious, there seems to be a trend: the worse the weather, the more ways a language has to talk about it. We can have a rain shower, a drizzle, maybe some sprinkles, or the far more terrible torrent, deluge, or flood. Thunder, lightning, snow (in places that have it), and more also get in on the weather words. In some locales, you can add in the tornado (or whirlwind) and hurricane to that list.

Culture and geography

Hurricane is a good example of geographical borrowing. It refers to a storm that can only form in the tropics, generally moving westward. That’s why the Spanish had to borrow a name from Caribbean natives—it was something they never really knew. True, hurricanes can strike Spain. Hurricane Vince made landfall in 2005, but 2005 was a weird year for weather all around, and there’s no real evidence that medieval and Renaissance Spaniards had ever seen a hurricane.

And that’s an important point for conlangers. Speakers of languages don’t exist in a vacuum, but few languages ever achieve the size of English or Spanish. Most are more limited in area, and their vocabulary will reflect that. We’ll see it more in future parts looking at flora and fauna, but it’s easy to illustrate in geography, too, as the hurricane example shows.

People living in a land that doesn’t have some geographical or meteorological feature likely won’t have a native word for it. The Spanish didn’t have a word for a hurricane. England never experienced a seasonal change in prevailing winds, so English had to borrow the word monsoon. Europe doesn’t have a lot of tectonic activity, but Japan does, so they’re the ones that came up with tsunami. The fjords of Scandinavia are defining features, but ones specific to that region, so we use the local name for them.

Conversely, those things a culture experiences more often will gain the focus of its wordsmiths. It says something about the English speaker’s native climate that there are so many ways to describe rain. Eskimo words for “snow” are a running linguistic joke, but there’s a kernel of truth in there. And English’s history had plenty of snow, otherwise we wouldn’t have flurries, flakes, and blizzards.

Time is also a factor in which lexical elements a language will have. Some finer distinctions require a certain level of scientific advancement. The cloud types—cumulus, nimbus, cirrus, etc.—were only really named two centuries ago, and they used terms borrowed from Latin. That doesn’t mean no one noticed the difference between puffy clouds and the grim deck of a nimbostratus before 1800, just that there was never a concerted effort to adopt fixed names for them. The same can be said for most other classification schemes.

Weather verbs

Finally, the weather deserves a second look, because it’s the reason for a very special set of verbs. In English, we might say, “It’s raining.” Other languages use an impersonal verb in this situation, with no explicit subject. (Our example conlang Ardari uses a concord marker of -y in this case.) For whatever reason, weather verbs are some of the most likely to appear in a form like this.

Perhaps it’s because the weather is beyond anyone’s control. It’s a force of nature. There’s no subject making it rain. It’s just there. But it’s one more little thing to consider. How does your conlang talk about the weather? You need to know, because how else are you going to start a conversation with a stranger?

First glance: C++17, part 2

Last time, we got a glimpse of what the future of C++ will look like from the language perspective. But programming isn’t just about the language, so here are some of the highlights of the C++ Standard Library. It’s getting a bit of a makeover, as you’ll see, but not enough to cover its roots.

Variants

Without even looking through the whole list of changes and additions, I already knew this was the big one, at least for me. The variant is a type-safe (or tagged) union. It’s an object that holds a value of one type chosen from a compile-time list. You know, like C’s union. Except variant keeps track of what kind of value it’s holding at the moment, and it’ll stop you from doing bad things:

#include <variant>

std::variant<int, double> v;
v = 42;

// This works...
auto w = std::get<int>(v);    // w = 42

// ...but this one throws std::bad_variant_access
auto u = std::get<double>(v); // nope!

Optional values

This one’s similar, and it was supposed to be in C++14. An optional is either a value, or it’s not. In that, it’s like Haskell’s Maybe. You can use it to hold the value of a function that can fail, and it works like an error signal. When converted to a boolean (as in an if), it acts as true if it contains a value, or false if it doesn’t. Not huge, but a bit of a time-saver:

#include <optional>

// some function that can return a value or fail (defined elsewhere)
std::optional<unsigned int> f();

std::optional<unsigned int> i = f();

if (i)
{
    // work with a proper value (*i)
}
else
{
    // handle an error condition
}

Any values

The third of this little trinity is any, an object that can—as you might imagine—hold a value of any type. You’re expected to access the value through the any_cast function, which will throw an exception if you try the wrong type. It’s not quite a dynamic variable, but it’s pretty close, and it’ll likely be faster.
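
A quick sketch of how that looks, with arbitrary values:

#include <any>
#include <string>

std::any a = 42;                    // currently holds an int
int n = std::any_cast<int>(a);      // fine: n is 42

std::any s = std::string("hello");  // this one holds a string
// std::any_cast<int>(s);           // would throw std::bad_any_cast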

apply

If you’ve ever used JavaScript, you know about its apply method. Well, C++ will soon have something similar, but it’s a free function. It calls a function (or object or lambda or whatever) with a tuple of arguments, expanding the tuple as if it were a parameter pack.
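
Something like this minimal sketch, where add is just a made-up function:

#include <tuple>

int add(int a, int b) { return a + b; }

auto args = std::make_tuple(2, 3);
int result = std::apply(add, args);  // expands the tuple: add(2, 3) == 5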

Searching

Yes, C++ has always had standard ways of searching a sequence for a value; std::find and std::search have been around since the beginning. What it lacked was a way to choose how that searching gets done. Some searches can be made faster by using a different algorithm, and that’s how C++17 upgrades std::search: it now accepts a “searcher” object. And they’re nice enough to give you a couple to get started: boyer_moore_searcher and boyer_moore_horspool_searcher. No points for guessing which algorithms those use.
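
A rough sketch of the new overload, with an arbitrary string and pattern:

#include <algorithm>
#include <functional>
#include <string>

std::string haystack = "the quick brown fox jumps over the lazy dog";
std::string needle = "fox";

auto it = std::search(haystack.begin(), haystack.end(),
                      std::boyer_moore_searcher(needle.begin(), needle.end()));
// it points into haystack at "fox..." if found, or to haystack.end() if not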

Clamping

It’s common to need to clamp a value to within certain bounds, but programming languages don’t seem to realize this. Libraries have functions for this, but languages rarely do. Well, C++ finally did it, with std::clamp. That’ll instantly shave off 50% of the lines of code working with lighting and colors, and the rest of us will find some way to benefit.
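
In use, it’s about as simple as it sounds (the numbers here are just for show):

#include <algorithm>

int brightness = std::clamp(300, 0, 255);  // 255: too high, clamped to the maximum
int contrast   = std::clamp(-20, 0, 255);  // 0: too low, clamped to the minimum
int normal     = std::clamp(128, 0, 255);  // 128: already in range, unchanged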

Mathematical special functions

C++ is commonly used for numeric computation, but this set of functions is something else. They likely won’t be of interest to most programmers, but if you ever need a quick beta function or exponential integral, C++17 has got you covered.
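
They live in <cmath> alongside the usual suspects; here are a couple of examples, with arguments picked at random:

#include <cmath>

double b = std::beta(2.0, 3.0);    // Euler beta function B(2, 3)
double e = std::expint(1.0);       // exponential integral Ei(1)
double l = std::legendre(3, 0.5);  // Legendre polynomial P3(0.5)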

Filesystem

Alright, I’ll admit, I was holding back. Everything above is great, but the real jewel in the C++17 Standard Library is the Filesystem library. If you’ve ever used Boost.Filesystem, you’re in luck! It’s the same thing, really, but it’s now standard. So everybody gets to use it. Files, paths, directories, copying, moving, deleting…it’s all here. It certainly took long enough.
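
A small taste of the API, wrapped in a made-up function with made-up paths:

#include <filesystem>

namespace fs = std::filesystem;

void backupNotes()
{
    fs::path p = "notes/todo.txt";

    if (fs::exists(p))
        fs::copy_file(p, "backup/todo.txt");

    for (const auto& entry : fs::directory_iterator("notes"))
    {
        // entry.path(), entry.file_size(), and so on
    }
}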

Still not done

That’s not nearly everything, but those are my favorite parts. In next week’s finale, we’ll switch to the lowlights. We’ll see those features that just didn’t make the cut.

First glance: C++17, part 1

C++ is a language that is about as old as I am. Seriously. It was first called “C++” in December 1983, two months after I was born, although it had been in the works for a few years before that. So it’s an old language, but that doesn’t mean it’s obsolete or dead. No, far from it. In fact, the latest update to the language, called C++17, is scheduled for release in—you guessed it—2017, i.e., next year.

Why is that important? Well, if you know the history of C++, you know the story of its standardization. The first true standard only came out in 1998, and it was only then that all the template goodness was finally available to all. (Let’s all try to imagine Visual C++ 6 never happened.) Five years later, in 2003, we got a slight update that didn’t do much more than fill in a few blanks. Really, for over a decade, C++ was essentially frozen in time, and that was a problem. It missed the dot-com boom and the Java explosion, and the growth of the Internet and dynamic scripting languages seemed to relegate it to the dreaded “legacy” role.

Finally, after what seemed like an eternity, C++11 came about. (It was so delayed that its original codename was C++0x, because everyone thought it’d be out before 2010.) And it was amazing. It was such a revolution that coders speak of two different languages: C++ and Modern C++. Three years later, C++14 added in a few new bits, but it was more evolutionary than revolutionary.

What C++ did, though, was prepare programmers for a faster release schedule. Now, we’ve seen how disastrous that has been for projects like Firefox, but hear them out. Instead of waiting forever for all the dust to settle and a new language standard to form, they want to do things differently, and C++17 will be their first shot.

C++ is now built on a model that isn’t too different from version control systems. There’s a stable trunk (standard C++, of whatever vintage), and that’s the “main” language. Individual parts are built in what they call Technical Specifications, which are basically like Git branches. There’s one for the standard library, networking, filesystem support, and so on. These are largely independent of the standard, at least in development terms. When they’re mature enough, they’ll get merged into the next iteration of Standard C++. (Supposedly, that’ll be in 2019, but 2020 is far more likely.) But compilers are allowed—required, actually, as the language needs implementations before standardization—to support some of these early; these go under std::experimental until they’ve cooked long enough.

So C++17 is not exactly the complete overhaul of C++11, but neither is it the incremental improvement of C++14. It stands between the two, but it sets the stage for a future more in line with, say, JavaScript.

New features

I have neither the time nor the knowledge to go through each new feature added to C++17. Instead, I’ll touch on those I feel are most important and interesting. Some of these are available in current compilers. Others are in the planning stages. None of that matters as long as we stay in the realm of theory.

Fold expressions

Okay, I don’t care much for Haskell, but these look pretty cool. They take a parameter pack and reduce or fold it using some sort of operation, in the same way as Haskell’s foldl and foldr. Most of the binary operators can be used, which gives us some nifty effects. Here are a few basic examples:

#include <iostream>

// Returns true if all arguments are true
template <typename... Args>
bool all(Args... args) { return (... && args); }

// Returns true if *any* argument is true
template <typename... Args>
bool any(Args... args) { return (... || args); }

// Returns the sum of all arguments
template <typename... Args>
int sum(Args... args) { return (args + ... + 0); }

// Prints all values to cout (name references JS)
template <typename... Args>
void console_log(Args&&... args)
    { (std::cout << ... << args) << '\n'; }

Yeah, implementing any, all, and even a variadic logging function can now be done in one line. And any functional fan can tell you that’s only the beginning.
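
And calling them works about how you’d expect (the values are just examples):

bool b = all(true, true, false);  // false
bool c = any(false, false, true); // true
int  s = sum(1, 2, 3, 4);         // 10
// console_log("sum is ", s);     // would print "sum is 10"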

Structured bindings

Tuples were a nice addition to C++11, except that they’re not terribly useful. C++, remember, uses static typing, and the way tuples were added made that all too evident. But then there’s the library function std::tie. As its name suggests, one of its uses is to “wire up” a connection between a tuple and free variables. That can be used for a kind of destructuring assignment, as found in Python. But C++17 is going beyond that by giving this style of value binding its own syntax:

#include <tuple>

using Point3D = std::tuple<double, double, double>;

// This function gives us a point tuple...
Point3D doSomething() { /* ... */ }

// ...but we want individual X/Y/Z

// With std::tie, we have to do this:
//
// double x, y, z;
// std::tie(x,y,z) = doSomething();

// But C++17 will let us do it this way:
auto [x,y,z] = doSomething();

Even better: this works with arrays and pairs, and it’s a straight shot from there to any other kind of object. It’s a win all around, if you ask me.

if initializers

This one’s less “Wow!” than “Finally!”, but it’s good to have. With C++17, you’ll be able to declare a variable inside the conditional of an if or switch, just like you’ve been able to do with (old-style) for loops for decades:

// getValue() here stands in for whatever actually produces the number
if (int value = getValue(); value >= 0)
{
    // do stuff for positive/zero values
}
else
{
    // do stuff for negative values
    // Note: value is still in scope!
}

Again, not that big a deal, but anything that makes an overcomplicated language more consistent is for the best.

constexpr if

This was one of the later additions to the standard, and it doesn’t look like much, but it could be huge. If you’ve paid any attention to C++ at all in this decade, you know it now has a lot of compile-time functionality. Really, C++ is two separate languages at this point, the one you run and the one that runs while you compile.

That’s all thanks to templates, but there’s one big problem. Namely, you can’t use the run-time language features (like, say, if) based on information known only to the compile-time half. Languages like D solve this with “static” versions of these constructs, and C++17 gives us something like that with the constexpr if:

#include <utility> // for std::forward

template<typename H, typename... Ts>
void f(H&& h, Ts&& ...ts)
{
    doItTo(h);

    // Now, we need to doItTo all of the ts,
    // but what if there aren't any?
    // That's where constexpr if helps: the recursive
    // call below is only compiled when the pack is non-empty.
    if constexpr(sizeof...(ts) > 0)
        f(std::forward<Ts>(ts)...);
}

If implemented properly (and I trust that they’ll be able to do that), this will get rid of a ton of template metaprogramming overhead. For simple uses, it may be able to replace std::enable_if and tag dispatch, and novice C++ programmers will never need to learn how to pronounce SFINAE.

Continuation

Those are some of my favorite features that are on the table for C++17. In the next post, we’ll look at the changes to the standard library.

Building aliens – Evolution

Whether life is made from DNA, some sort of odd molecule, or binary data, it will be subject to evolution. That’s inherent in the definition of life. Everything living reproduces, and reproduction is the reason why evolution takes place. Knowing the how and the why of evolution can help you delve deeper into the creation of alien life.

How it happens

For life as we know it, evolution is the result of, basically, copying errors. DNA doesn’t replicate perfectly; there are always some bits that get flipped, or segments that are omitted or repeated. In that, our cells are a bit like an old record or CD player, skipping at the slightest bump. Sometimes, it knocks playback ahead, and you don’t get to hear a few seconds of your favorite song. Other times, it goes back, replaying the same snippet again. It’s the same for a strand of DNA.

Mutations, as these genetic alterations are called, happen for a variety of reasons. Maybe there was a glitch in the chemical reactions that carry out DNA replication. Perhaps a stray bit of radiation hit a base molecule at just the right time. (Digital organisms would not be immune to that one. Programs can crash due to bad memory, but also from cosmic rays—interstellar radiation—hitting the components. And as our processors and memory chips get ever smaller, the risk only increases.) Anything that can interrupt the reproduction process can be at fault, and there’s almost no way to predict what will happen on the base level.

Most of the time, these errors are harmless. A single base being swapped usually doesn’t do much by itself, although there are cases where it does. Our genetic code has built-in redundancy and error-correction mechanisms to prevent this “drift” from causing too much harm. Single-celled organisms have a little more trouble, as they don’t have billions of copies of their genes lying around. They tend to bear the brunt of evolution, but it can be in their best interest, as anyone who knows about MRSA can attest.

A few larger errors (or a compounding of many smaller ones) can cause a greater change in an organism. That’s where natural selection comes in. Species adapt to their environments. All else being equal, those that are better adapted tend to reproduce more, thus ensuring their genes have a higher likelihood of passing on to further generations. In this way, evolution acts as a sort of feedback loop: beneficial mutations ensure their own survival, while harmful ones are stopped before they can get a foothold. Neutral mutations, however, can linger on, as they have little outward effect; it’s these that can give a species its variety, such as human hair and eye color.

How you can use it

Assuming current theories are anywhere close to correct, all life on Earth derives from some microbial organism that lived three or four billion years ago. Through evolution, everything from dogs to sharks to apple trees to, well, us came to be. There are a few open questions (What was that primordial organism? Is there a “shadow” biosphere? Etc.), but that’s the gist of it. And that tells us something important about alien life. If it exists, it’s probably going to work the same way. The Grays of Planet X, for example, would be related to everything native to their homeworld, but not to the aquatic beings of Planet Y. (Unless you count panspermia, but that’s another story.)

That does not mean that all life on a planet will look the same. How could it? A quick glance out your window should show you anywhere from ten to a thousand species, none of which are visibly alike, and that’s not counting the untold millions that we can’t see. Gut bacteria are necessary for life, and they’re also our ten-billionth cousins. Nobody would mistake a dog for a dogwood, but they both ultimately come from the same stock. So try to avoid the tired trope of “everything on this planet looks the same”.

On the other hand, the vagaries of evolution also mean that life on one planet probably won’t look like life on another. Sure, there may be broad similarities (physiology will be the subject of the next part of this series), but it’s highly unlikely that an alien world will have, say, lions or bears. (However, this doesn’t necessarily apply at microscopic scales, as there are fewer permutations.)

Classification

For worldbuilding, you’ll likely be most interested in the species level. That’s how we define humans, as well as many of the “higher” animals. We’re Homo sapiens, our faithful pets are Canis familiaris or Felis catus, and that nasty bug we picked up is Escherichia coli.

But closely related species share a genus, and this might be something to keep in mind, especially if you’re creating a…less-realistic race. Unfortunately for us, genus Homo doesn’t have any other (surviving) members; the Neanderthals, Homo erectus, and the “hobbits” of Flores Island were all wiped out millennia ago. But that doesn’t mean your world can’t have multiple intelligent species that are closely related. They can even interbreed.

Higher levels of classification (family, order, etc.) are less useful to the builder of worlds. The traits that members of these share are more broad, like mammals’ method of live birth or the social patterns of the hominids. Really, everything above the genus is an implementation detail, as far as we’re concerned.

Adaptation

Now, back to natural selection. Species, as I’ve already said, adapt to their environments over time. We can see that in animals, plants, and any other organism you care to name. Fur changes color to provide camouflage, beaks alter their shape to better fit in nooks and crannies. Blood cells change to protect against malaria—but that leaves them more susceptible to sickle-cell anemia.

If an organism’s environment shifts, then that can render the adaptations useless. The most dramatic instances of this are impact events such as the one that killed the dinosaurs, but ice ages, “super” El Niños, and other climate change can destroy those species that find themselves no longer suited to their surroundings. And species are interconnected, so the loss of population in one can trigger the same in another that depends on it, and so on.

Apex

Much of this is background material for most aliens. The ones that are most interesting to the public at large are those that are intelligent, civilized. Like us, in other words.

We are not immune to natural selection. Far from it. But we have managed to short-circuit it to a degree. People with debilitating disorders can live long lives, potentially even reproducing and thus furthering their genetic lines. Adding to this is artificial selection, as we have performed on hundreds of plant and animal species. That’s how domestication works, as much for a wolf as for a grapevine. We take those individuals with the most desirable qualities and work things out so those are the ones that get to reproduce. It works, as attested by the vast array of dog breeds.

So aliens like us—in the sense of having civilization and technology—won’t be as beholden to their environment as their “lesser” relations. They won’t be bound to a specific climate, and they’ll be largely immune to the small shifts. Does that mean evolution stops?

Nope. We’re still evolving. It’s just that the effects haven’t really shown themselves that much. We’re taller than our ancestors, for example, because taller men and women are generally seen as more attractive. (A personal data point: I’m 6 feet tall, a full 12 inches taller than my mother, and my father was 5’8″. Not that that seems to make me any more attractive.) We live longer, but that’s more a function of medicine, hygiene, and diet, not so much genetics. Parts of us that have evolved relatively recently include Caucasian skin and adult lactose tolerance.

If our species continues to thrive, it will continue to evolve. One sci-fi favorite is space colonization, and that’s a case where evolution will make a difference. It won’t take too many generations before denizens of Mars have adapted to lower gravity, for instance. People living on rotating stations might learn to cope with the Coriolis forces they would constantly feel. It’s possible that there may come a time when there are living humans that cannot survive on their original homeworld.

And the same may be true for aliens. As an example, take Mass Effect‘s quarians. In the third installment of the series, they can (if you play things right) return to their homeworld of Rannoch. But centuries of living as space nomads spoil the homecoming, as they find themselves poorly adapted to their species’ original environment. A race of many worlds will discover the same truth: evolution is unceasing.

On alliteration and assonance

When most people think about verse, they tend to think of rhyme first and foremost. Understandable, since that’s the defining quality of so much poetry. But there’s a whole other side of the word to explore, a front-end counterpart to the back-end rhyme.

Alliteration

Alliteration is the repetition of a sound at the beginning of a word, a mirror image to rhyming. It’s not quite as obvious these days, as rhyme and rhythm have won our hearts and minds, but it has an illustrious history. Some of the earliest Anglo-Saxon verse was composed using alliteration, as were epics from around the Western world. Classics such as “The Raven” and “Rime of the Ancient Mariner” have sections of alliterative verse, as do children’s nursery rhymes. Peter Piper probably needed something to catch the spit from all those P sounds. And who can forget all those old cartoons with hilariously alliterative newspaper headlines? Those were a thing, and they still are in places.

Echoes of alliteration are all around us. Like rhyme, the reason borders on the psychological. In oration, the beginning of the word tends to be more forceful than the end, more evocative. So punctuating your point with purpose (see what I did there?) helps to get your message stuck in the minds of your listeners. They can “latch on” to the repetition. Wikipedia’s article on alliteration uses King’s “I Have a Dream” speech as an example: “not by the color of their skin but by the content of their character.” Notice how the hard K sounds beginning each of the “core” words grab your attention.

To be alliterative, you don’t have to use the same sound at the beginning of every word. The rules of English simply can’t accommodate that. (Newspapers cheated by removing extraneous words such as “a” and “the”.) It’s the content words that are most important, especially the adjectives and nouns. However, alliteration tends to be stricter than rhyme in what’s considered the “same” sound. Voicing differences change the quality of the sound, so they’re out. Clusters are in the same boat. On the other hand, sometimes an unstressed syllable (like un- or a-) can be ignored for the purposes of alliteration.

Assonance

Alliteration is concerned with consonant sounds. (I did it again!) Assonance is different; it’s all about the vowels. What’s more, it’s not limited to the beginnings of words. Rather, it’s a vowel sound repeated throughout a phrase or line of verse. Vowel rhyming can be considered a form of assonance, but it’s so much more than that.

Assonance pops up everywhere there are vowels, which means everywhere. It’s very well suited to small utterances, such as a single line of a song or a proverb. As with alliteration, it’s not an absolute requirement for all the vowels to be the same, but those that are need to be essentially identical. And it’s the content words that are most important. Schwas, ineffectual as they are, don’t even appear on the radar; a and the aren’t going to mess up assonance. But any other vowel is fair game, in English or whatever language you’re using.

In conlangs

Alliteration and assonance are perfectly usable in any context, and they can be made to fit any language. They might not be quite as permissive as rhyme, but they can have a greater lyrical effect when used properly. (And sparingly. Don’t overdo it.)

These literary devices work best in languages with patterns of stress. That stress can be fixed, but that narrows your options slightly. Inflectional languages with fixed final stress are probably the worst for alliteration, while initial stress gives the most “punch”. For assonance, it’s not so vital, but you want to make sure your vowels aren’t being forced to a fit a pattern.

Both alliteration and assonance are easiest to accomplish in languages with smaller phonemic inventories. That shouldn’t be surprising. It’s far less work to find two words that both begin with a P if your only other options are B, D, K, and S. With these smaller sound sets (are you kidding me?), you can even create more complex styles of alliterative verse. Imagine a CV-type language with interwoven alliteration patterns, where the first and third words of a line start with one sound, while the second and fourth begin with a different one.

The other end of the spectrum holds English and most European languages, and it’s less amenable. You need lots of words, or you’ll have to get some help from stress and syllabics. That’s how we can have alliterative English: by ignoring those tiny, unstressed prefixes that pop up everywhere. It’s possible to make it work, but you have to try harder. But trying is what this is all about.

On game jams

This post is going up in August, and that’s the month for the summer version of everyone’s favorite game jam, Ludum Dare. But I’m writing this at the end of June, when there’s still a bit of drama regarding whether the competition will even take place. If it does, then that’s great. If not, well, that’s too bad. Neither outcome affects the substance of this text.

Ludum Dare isn’t the only game jam on the market, anyway. It’s just the most popular. But all of them have a few things in common. They’re competitive programming, in the sense of writing a program that follows certain rules (such as a theme) in a certain time—two or three days, for example, or a week—with the results being judged and winners declared. In this, it’s a little more serious than something like NaNoWriMo.

And it’s not for me. Now, that’s just my opinion. I’m not saying game jams are a bad thing in general, nor am I casting aspersions at LD in particular. I simply don’t feel that something like this fits my coding style. It’s the same thing with NaNoWriMo, actually. I’ve never truly “competed” in it, though I have followed along with the “write 50,000 words in November” guideline. Again, that’s because it’s not my style.

One reason is shyness. I don’t want people to see my unfinished work. I’m afraid of what they’d say. Another reason is the schedule, and that’s far more of a factor for a three-day game jam than a month-long writing exercise. I don’t think I could stand to code for the better part of 48 or 72 hours. Call it flightiness or a poor attention span, but I can’t code (or write) for hours on end. I have to take a break and do something else for a while.

Finally, there are the rules themselves. I don’t like rules intruding on my creative expression. In my view, trying to direct art of any kind is a waste of time. I have my own ideas and themes, thank you very much. All I need from you is the gentle nudge to get me to put them into action. That’s why I do a kind of “shadow” NaNoWriMo, instead of participating in the “real thing”. It seems antisocial, but I feel it’s a better use of my time and effort. What’s important is the goal you set for yourself. Climbing into a straitjacket to achieve it just doesn’t appeal to me.

But I do see why others look at game jams differently. They are that nudge, that impetus that helps us overcome our writing (or coding) inertia. And that is a noble enough purpose. I doubt I’ll join the next Ludum Dare or whatever, but I won’t begrudge the existence of the game jam. It does what it needs to do: it gets people to express themselves. It gets them to write code when they otherwise wouldn’t dare. There’s nothing bad about that, even if it isn’t my thing.

Summer Reading List 2016: halfway home

We’re halfway through the official summer, about two-thirds of the way done with the unofficial season we’re using for our Summer Reading List. I don’t know about you, but I’ve got two out of three.

Fiction

  • Title: Shadows of Self
  • Author: Brandon Sanderson
  • Genre: Fantasy
  • Year: 2015

This is the fifth book in Brandon Sanderson’s Mistborn series, the second of the second trilogy. It’s a pretty good one, though I feel it’s a bit weaker than some of the previous four. Compared to its predecessor, The Alloy of Law, it’s a bit lighter on the action, but far heavier on the worldbuilding. That’s fine by me. If you haven’t noticed, I love worldbuilding, and Sanderson is one of the best there is when it comes to it. I’ll definitely give this one high marks, and I can’t wait to read the trilogy’s finale, The Bands of Mourning. (It’s already out, by the way.)

Nonfiction

  • Title: A Million Years in a Day: A Curious History of Everyday Life
  • Author: Greg Jenner
  • Genre: History
  • Year: 2016

I found this one not too long ago (somewhere…), and I’m glad I did. It’s a fun look back through the history of everyday things and activities, following and relating to one modern man’s Saturday. I love history, and I especially love those smaller, less popular bits of it. History is not all about wars and religion and politics and race. It’s about people living their lives, and those lives never really change that much. And that’s the message of this book. Definitely worth a look, especially from a worldbuilding perspective. (Funny how that works out, huh?)

And one more…

I haven’t decided what the final book on the list will be, but I’ve got another month, so I should be okay. I hope you’re playing along at home, and that you’re having fun doing it.

Conlangs as passwords

Keeping our information secure is a high priority these days. We hear a lot about “two-factor authentication”, which usually boils down to “give us your mobile number so we can sell it”, but the first line of defense for most accounts remains the humble password.

The problem, as eloquently stated by XKCD #936, is that we’ve trained ourselves to create passwords that are all but impossible to remember. And the arcane rules required by some services—banks are the worst offenders—can actually serve to make passwords less secure than they otherwise could be. There are two reasons for that. One, the rules of what’s an “acceptable” password restrict the options available to us. An eight-character password where one of those characters must be a capital letter, one must be a number, and a third must be a “special” character (but not those that might interfere with the site’s code, like the semicolon) really only gives you five characters of leeway.

The obvious solution is to make passwords even longer, but that brings into play the second problem. A password like eX24!mpR is hard to remember, and that’s only eight characters. Extend that to twelve (Ty93M@tsD14k) or sixteen (AsN3P45.tVK23hU!) and you’ve created a monster. Yes, muscle memory can help here, but the easiest way to “remember” a password like that is to write it down, which defeats the whole purpose.

The XKCD comic linked above outlines a way to solve this mess. By taking a few common English words and smashing them together, we can create passwords that are easy to remember yet hard to crack by brute force. It’s ingenious, and a few sites already claim to be “XKCD-936 compliant”.

But I had a different idea. I’ve made my own languages, and I’m still making them. What if, I thought, I could use those for passwords? So I tried it, and it works. In the last year or so, I’ve created a few of these “conlang passwords”. And here’s how I did it, and how you can use the same method.

Rather than a few unrelated words, a conlang password is a translation of a simple phrase. Usually, I try to use something closely related to the function of the site. For example, my account on GOG.com is the phrase “good old games”—the site’s original name—translated into one of my older (and unpublished) conlangs. Similarly, my start page/feed reader has a passphrase that means “first page”. My password on Voat translates as “free speech”. All very easy to guess, except for the fact that you don’t know the language. Only I do, so only I can do the necessary translation.

Going this way gives you a couple of extra benefits. Case is up to you, so you could use a phrase in title case for those sites which require a capital letter. Or you can use a language like Klingon, with capital letters already in the orthography. Special characters work about the same way; add them if you need to, but in a more natural way than the line-noise style we’re used to. And since our password is a full phrase, it’s likely going to be near the upper end of the length range, making brute-forcing an impossible task. If it’s allowed, you can even add proper spacing between words, further lengthening the password and frustrating hackers. Also, if the site requires a “security question” (a misnomer if I’ve ever heard one), and it lets you use a custom one, then you never have to worry about forgetting the password, as long as you remember the language.

There are, of course, downsides to this method. Numbers are…difficult; the best option I’ve found for places that make you put one in is a kind of checksum. At the end of the password, simply put the number of letters you used. As an example, let’s say we want to use our example conlang Isian to make a password at Amazon.com. (By the way, that’s a bad idea, as information on Isian is open to all, even if no one’s really looking.) In my opinion, a good phrase to describe Amazon is “they sell everything”. In Isian, that translates to is dule lichacal. Thus, our password could be something like IsDuleLichacal. Fourteen characters, three of them capital letters. And we can tack on a 14 at the end to up the strength a little more, or satisfy overly strict systems. As long as you’re consistent, memorization is less of a problem. And you don’t need to write down the password itself; just the key phrase is enough.

Now, not every language works for this. For very good reasons, passwords using Unicode characters are not recommended, even in those rare cases where they’re supported. The ideal conlang for password use is something more like Isian: no diacritics, no funky letters like ə, just basic ASCII. Letters, numbers, and a few symbols—in other words, the same set of characters that passwords can use.

The best conlangs are probably the most English-like in style. Somewhat isolating, but not too much. Relatively short words. A reasonably uncomplicated grammar, so you don’t have to sort through all the rules. Oh, and you’ll definitely need a sizable vocabulary to cover all the concepts you might want to use in your passwords. Just a grammar sketch and the Swadesh List won’t cut it.

Not everybody will want to go through the effort needed for this scheme. But, if you’ve got an extra little conlang around, one you’re not using for anything else, you might want to give it a shot. It can hardly be less secure than the sticky note on your monitor, right?

Software internals: Trees

We’ve talked quite a bit about data structures in this series, although not as much in recent entries. We’ve seen lists and arrays, two of the most important structures around, but they both have a severe limitation: they’re linear. While that’s great for a lot of data that computer programs manipulate, there comes a time when you need to represent both the data and its organization, and that calls for something more substantial.

The tree is a data structure that represents a hierarchy. Like a real tree, it has a root. It has branches and leaves. And a collection of data trees is also called a “forest”. But those similarities are a bit artificial; the terms were chosen for the analogy. What these trees do isn’t much like the natural ones. To a programmer, however, they’re far more useful.

In the abstract

A tree starts with a root. This is a data node, like one in a linked list, and it can hold one value of any type. If you’ll recall, nodes in linked lists also carry along a pointer to the following node (and the preceding one, for a doubly-linked list). Well, trees do something like that, too. The root node, in addition to its data value, also contains a list of pointers to its children.

These children are root nodes, and thus trees in their own right. Each will have its own data and list of children, and those, in turn, will have the same. If the child list is empty, then that node is at the “end” of one branch; it’s a leaf node. (If there are children, you might be wondering, are there parents? Yes. A node having another node in its child list is that second node’s parent. But data trees are asexual: each child only has one parent.)

Trees are a perfect example of recursion. A tree’s children are also trees—trees with no children of their own, if they’re leaf nodes—and they can be treated as such. The most common algorithms lend themselves naturally to a recursive style, even in imperative languages. So that’s how we’ll be looking at them here.

Growing a tree

Trees consist of nodes, and each node has two main parts: a value and a list of children. Thus, a simple way of representing a tree is as an object (or similar structure) with two fields, like so. (This example uses TypeScript for illustration purposes.)

class TreeNode {
    value: any;
    children: TreeNode[];
}

Not that hard, huh? It’s nothing more than putting our words into code. A TreeNode‘s value could be anything, though you’d probably restrict it in a real application, and its children are more TreeNodes. Adding, removing, and altering children works in almost the same way as with linked lists, at least with this “basic” tree setup.
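
Here’s a minimal sketch of that idea, with made-up values: we build a small tree by hand, then attach and remove children just as we would with list nodes.

const root: TreeNode = { value: "root", children: [] };
const a: TreeNode = { value: "A", children: [] };
const b: TreeNode = { value: "B", children: [] };

root.children.push(a, b);                       // attach two children
a.children.push({ value: "A1", children: [] }); // and a grandchild under A

// Removing a child is just taking it back out of the list.
root.children.splice(root.children.indexOf(b), 1);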

Traversing the tree (sometimes called walking it) is a different matter. Here, we want to go through each of the root’s children, then their children, and so on, until we’ve visited every node in the tree. There are two main ways to do this, but the depth-first method is more common and easier to explain. In code, it looks about like this:

function traverse(tree, callback) {
    // Visit all of a node's children (and their children, recursively)
    // before handing the node itself to the callback.
    for (const child of tree.children) {
        traverse(child, callback);
    }
    callback(tree);
}

This algorithm is “depth-first” because it visits each of a node’s children before moving to its siblings. This is where the recursion comes into play. We loop through each of the children of the root node. They, in turn, become the roots of a new depth-first traversal. Only when we’ve exhausted the “depth” of the tree—when we reach a leaf node—do we start doing anything. (We’re assuming here that our tree doesn’t have any cycles, child nodes that point to higher levels of the tree. Those make things much more difficult.)

Now, there are a lot of things you can do with this function, but I’ve limited it to a simple, nebulous callback that is run for each node. In the abstract, that’s all there is to it. That simple block of code effectively describes the operation of some code generators, AI systems, and many other complex actions.
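
As a quick illustration (using the root node from the sketch above, though any TreeNode would do), a callback that simply collects values gives us a flattened view of the whole tree:

const values: any[] = [];
traverse(root, node => values.push(node.value));
// values now holds every value in the tree, children before their parents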

Special kinds of trees

The general tree described above suffices for a great many applications. Sometimes, though, it’s more efficient to be more restricted. Computer scientists have therefore developed a number of specializations of the tree for different uses.

The binary tree is probably the most well-known of these. It’s called “binary” because each node has exactly two children: left and right. These children are themselves trees, of course, but empty trees are often represented as null values. Here’s what it looks like in code, once again using TypeScript:

class BinaryTree {
    data: any;
    left: BinaryTree | null;   // an empty subtree is just null
    right: BinaryTree | null;
}

Binary trees add an extra wrinkle to the depth-first traversal we saw earlier. Now, we have three possible ways to do things: pre-order, in-order, and post-order traversal. The only thing that changes is when we process the “current” node.

function preOrder(node, callback) {
    if (node == null) return;

    callback(node.data);
    preOrder(node.left, callback);
    preOrder(node.right, callback);
}

function inOrder(node, callback) {
    if (node == null) return;

    inOrder(node.left, callback);
    callback(node.data);
    inOrder(node.right, callback);
}

function postOrder(node, callback) {
    if (node == null) return;

    postOrder(node.left, callback);
    postOrder(node.right, callback);
    callback(node.data);
}

Each approach has its pros and cons. Pre-order works for copying trees, and it’s essentially how expressions are represented in abstract syntax trees or ASTs, an important part of compilers and interpreters. In-order traversal is best for sorted trees, where an extra “key” value is added to each node, arranged so that it’s greater than all the keys in its left subtree but smaller than all those in the right; these binary search trees are used for set structures, lookup tables, and the like. (Maybe we’ll look at them in more detail later in the series.) Finally, post-order is the option of choice for deleting a node’s children and for making RPN expressions.
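
For instance, if a tree’s data values happen to be arranged search-tree style (everything smaller down the left branch, everything larger down the right), an in-order walk visits them in sorted order. The little tree below is made up purely to show that:

// The tree:     4
//              / \
//             2   6
//            / \
//           1   3
const bst: BinaryTree = {
    data: 4,
    left: {
        data: 2,
        left:  { data: 1, left: null, right: null },
        right: { data: 3, left: null, right: null }
    },
    right: { data: 6, left: null, right: null }
};

inOrder(bst, value => console.log(value)); // prints 1, 2, 3, 4, 6 in order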

The binary tree also has a few of its own alterations. The red-black tree uses a “color bit” to keep the tree balanced. This is helpful because most algorithms working on binary trees work best when the total numbers of left and right nodes are about equal. (In the worst case, all of a tree’s descendants are on one “side”. Then, you’re left with just a more expensive linked list.) The AVL tree is a similar variation that has its own pluses and minuses.

B-trees are a special case that sits between binary trees and the general category of trees. They’re allowed to have more than two children, but only up to a certain number. The trick is that each node is given a number of sort keys (like the one in a binary search tree), one for every child past the first. These are calculated so they fit “between” the children, allowing them to serve as index values, speeding up traversal. B-trees are mostly used in databases and filesystems—the two are practically the same thing these days—including the Windows filesystem of choice, NTFS, Apple’s HFS+, and Ext4 and Btrfs on Linux.

Seeing the forest

Trees are everywhere in programming. Whenever you have a set of data with obvious parent-child relationships, it’s probably going to be represented in tree form. Programming languages often don’t offer direct access to the trees, but they’re in there. Sets, maps, and hash tables all use them under the hood. Any kind of parsing, such as XML or a game’s custom scripting language, is going to be tree-based. Even if you never see them, you’ll be using them somewhere.

Fortunately, if you’re not dealing with the innards of the structures themselves, there’s not much you need to worry about. All the hard work has been taken care of. No one needs to manually balance a binary search tree, for instance. But if you work with them directly, it helps to know how they work.