The problem with emoji

Emoji are everywhere these days. Those little icons like 📱 and 😁 show up on our phones, in our browsers, even on TV. In a way, they’re great. They give us a concise way to express some fairly deep concepts. Emotions are hard to sum up in words. “I’m crying tears of joy” is so much longer than 😂, especially if you’re limited to 140 characters of text.

From the programmer’s point of view, however, emoji can rightfully be considered a pox on our house. This is for a few reasons, so let’s look at each of them in turn. In general, these are in order from the most important and problematic to the least.

  1. Emoji are Unicode characters. Yes, you can treat them as text if you’re using them, but we programmers have to make a special effort to properly support Unicode. Sure, some languages say they do it automatically, but deeper investigation shows the hollowness of such statements. Plain ASCII doesn’t even have room for all the accented letters used by the Latin alphabet, so we need Unicode, but that doesn’t mean it’s easy to work with.

  2. Emoji are on a higher plane. The Unicode character set is divided into planes. The first 65,536 code points are the Basic Multilingual Plane (BMP), running from 0x0000 to 0xFFFF. Each further plane is considered supplementary, and most emoji fall in the first supplementary plane (Plane 1, the Supplementary Multilingual Plane), with code points around 0x1F000. At first glance, the only problem seems to be the extra byte or two required to represent each emoji, but…

  3. UCS-2 sucks. UCS-2 is the fixed-width predecessor to UTF-16. It’s obsolete precisely because it can’t handle higher planes, but we still haven’t rid ourselves of it. JavaScript, among others, essentially uses UCS-2 strings, and this is a very bad thing for emoji. They have to be encoded as a surrogate pair, using two otherwise-invalid code points in the BMP. It breaks finding the length of a string. It breaks string indexing. It even breaks simple parsing, because…

  4. Regular expressions can’t handle emoji. At least in present-day JavaScript, they can’t. And that’s the most used language on the web. It’s the front-end language of the here and now. But JS regexes work in UCS-2, which means they don’t understand higher-plane characters. (This is being fixed; ES2015 added the u flag, and there are libraries out there to help mitigate the problem, but we’re still not at the point where we can count on full support.)

  5. Emoji are hard to type. This applies mostly to desktops. Yeah, people still use those, myself included. For us, typing emoji is a complicated process. Worse, it doesn’t work everywhere. I’m on Linux, and my graphical applications are split between those using GTK+ and those using Qt. The GTK+ ones allow me to type any Unicode character by pressing Ctrl+Shift+U and then the hexadecimal code point. For example, 😂 has code point 0x1F602, so I typed Ctrl+Shift+U, then 1f602, then a space to actually insert the character. Qt-based apps, on the other hand, don’t let me do this; in an impressive display of finger-pointing, Qt, KDE, and X all put the responsibility for Unicode handling on each other.
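
To make the breakage concrete, here’s how a current JavaScript engine treats 😂. These results are easy to check in any recent browser console or Node; the u-flag lines assume an ES2015-capable engine.

```javascript
// 😂 is U+1F602, stored in a JS string as the surrogate pair 0xD83D 0xDE02.
const joy = "😂";
const results = [
  joy.length,                      // 2, not 1: length counts UTF-16 units
  joy.codePointAt(0).toString(16), // "1f602", the real code point
  [...joy].length,                 // 1: the string iterator is surrogate-aware
  /^.$/.test(joy),                 // false: a plain regex sees two "characters"
  /^.$/u.test(joy),                // true: the ES2015 u flag understands planes
];
```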

So, yeah, emoji are a great invention for communication. But, speaking as a programmer, I can’t stand working with them. Maybe that’ll change one day. We’ll have to wait and see.

The future of government

This year of 2016 is, in the US, an election year. For weeks we’ve been mired in the political process, and we’ve had to suffer through endless debating and punditry. The end isn’t near, either. We’ve got to endure this all the way to November.

It’s impossible to not think about government right now. As a builder of worlds and settings, I’m naturally drawn to the idea of government as a concept, rather than as its concrete implementation today. Churchill is usually quoted as saying that democracy is the worst form of government, apart from all others that have been tried. We know what others have been tried: republic, monarchy, communism, theocracy, and so on. Looking at the list, maybe it’s true.

What about the future, though? We’re in the midst of a technological revolution that shows little sign of stopping, yet it seems that little of that has paid off in the political sphere. (If you look at some of the computerized voting systems in use today, you might even think we’ve regressed!) But that could be a transitional thing. In the far future, when we of humanity have moved outward, to the rest of the Solar System and beyond, what will government look like then?

Status quo

It’s easy to think that the way things are is the way they will forever be. Conservatism is a natural thing, because it’s the path of least resistance. And in the near-term, it’s the most likely outcome. Barring some major upheaval, the US will remain a federal republic, China an authoritarian, communist regime, and most of the Middle East an anarchic disaster.

There will be a few slight changes, for sure. The Commonwealth nations are always talking about dissolving the monarchy; it’s reasonable to assume that, one day, talk will beget action. The same with most of the other Western monarchies remaining. As jobs are increasingly given over to robots, socialist tendencies will only increase, as they are doing right now in Europe. Something will eventually bring stability to Iraq and Syria. (Okay, that last one is awfully far-fetched.)

But the advance of technology will open up new avenues of government. And if we do manage self-sustaining colonies beyond Earth, then “self-sustaining” may eventually become “self-governing”. A well-settled Solar System means ample opportunity for new nations to spring up, a breeding ground for new experiments in government. So what might those look like?

Direct democracy

One possibility that isn’t that hard to imagine is direct democracy. As opposed to a democratic republic—like most democracies today—a direct democracy dispenses with the elected officers. It is literally of, by, and for the people. Everybody gets to vote. On everything. (Within reason, of course.)

We can’t really do this today on anything higher than a local level, because nobody would have time for anything else! But a few special situations can arise that would make it palatable. Small colonies are the obvious place for a direct democracy; they work just like towns. A very well-connected and well-educated society could bring direct democracy to a larger populace, but likely only on a limited scale. Mundane things might be left to the elected, while serious matters are voted on by the public at large.

The chief downside to direct democracy is that it relies on the knowledge and wisdom of the masses. It requires faith in humanity, not to make the right decision, but only to make an informed one. And, as I said, it’s also too easy to overload the populace. Partisan voting seems like a major trap here, if only because choosing a party is easier than voting on each individual issue.

Techno-socialism

By 2020, a mere four years away, millions of people will have lost their jobs to robots, and it’ll only go downhill from there. A few decades out, and half the world’s population will be looking for work in the ever-fewer fields left to living humans. There are some things computers can’t do, but not everybody has the skills necessary for them.

One solution to this looming employment crunch is already being tested in parts of Europe: the universal basic income. It’s nothing more than a monthly stipend, a kind of all-encompassing unemployment/welfare check. Combine this with the possibility of technology ending the “demand economy”, and you have the makings of a true socialist state: a planned government and economy designed to create and uphold a welfare state. Most people would live on the basic income, with their needs met by government-provided facilities; those who do have jobs are a cut above, but there’s always the chance of moving up in the world.

This one’s big flaw is human nature. We’re greedy, and we don’t really trust other people to know what’s best for us. This kind of techno-socialism doesn’t remove either need or want, but leaves it in the hands of a (hopefully) benevolent government, and it easily falls prey to a pigeonholing “everyone’s the same” mentality. For the “have-nots”, basic income is enough to provide for, well, basic needs, but not much else. The “haves” would be able to get more in the way of amenities, but the high taxes they would have to pay to provide the public services are definitely a turn-off.

AI autocrats

If you believe the AI singularity folks, advanced artificial intelligence isn’t that far away. The day it surpasses human ingenuity might even be within our lifetimes. It’s only natural to put faith in a higher power, and the AIs might become higher powers, relative to us.

There are two ways this could go: computer-controlled utopia or tyrannical killbots. Those, however, are two sides of the same coin. Either way, it’s the AI in charge, not us. If artificial intelligence reaches a point where we can no longer understand it, then we won’t know what it’s thinking. At that point, it’s almost like a “direct” theocracy.

We might willingly put ourselves in such a situation, though. How alluring would it be, the idea of handing control to somebody, something else? You don’t have to worry about anything anymore, because The Computer Is Your Friend.

An AI-controlled society all but lends itself to being planned to the point of ruthless efficiency. It might even work out like an extreme version of the techno-socialism above, except that an even smaller fraction of the populace is gainfully employed.

Corporate oligarchy

Corporations already control most governments from behind the scenes. At some point in the future, they might come out of the shadows. If land rights in space are granted to private firms—under the Outer Space Treaty, they can’t be claimed by nations—then we may see a revival of the old “company town” idea. You work for the Company, you live in its houses, you buy its food, and so on. They’re in control, but you can always end up as one of the shareholders, or make your own corporation.

In practice, this form of government isn’t all that exciting. It boils down to a kind of neo-feudalism where the corporations are the lords and their employees are the serfs…with one exception. Corporations try to maximize profits. If they’re allowed to openly run the show, that will be the number one goal for everybody.

This kind of oligarchy can work, especially if you’re one of the higher-ups, but it’s not without its faults. All those people need to be employed somehow, not to mention fed, clothed, educated, and protected. The ideal corporatist system would have all those needs met by private industry, of course, but automation means there’s only so much work left to be done. Still, for a small society, it might work.

Other possibilities

The imagination can run wild here. The only limits are in the mind. But people are going to be people—unless they’re transhumans and cyborgs—and human nature is one of the strongest forces we know. Most importantly, we won’t change overnight. There will be transition periods, no matter what form of government we eventually reach. There’s even the chance that, given some sort of apocalyptic event, we’ll revert to the tried and true methods of the past. A town in the middle of a disaster will, by necessity, be authoritarian, even dictatorial. With years of peace, though, new ideas can find their footing. With time and space, they may even have their moment in the sun.

Sound changes: everything else

Not every sound change works on just consonants or just vowels. Some can transmute one into the other. Others affect entire syllables or words. A few work on a different level entirely. So, we’ll finish this series by looking at these “miscellaneous” types of evolution.

Tone

Tones have to come from somewhere. One of the ways they can appear (tonogenesis) is through the loss of consonants preceding or following a vowel. A voiced consonant, for instance, can cause the vowel after it to be spoken at a lower pitch. If that consonant goes away, the change in pitch can remain: a low tone. As another example, a number of consonant elisions led to the tonal system of Chinese, along with its restrictive syllable structure.

Once a language has tone, it becomes a target for evolution. Tones can change, merge, split, and disappear, exactly as phonemes do. Unstressed syllables may develop a neutral tone, which might get reanalyzed as one of the existing tones. Sequences of tones can affect each other, as well, a complex process called tone sandhi.

Like any other part of language, tone is subject to the same forces that drive all sound change, which can be summed up as human laziness. More on that later.

Sandhi

The term sandhi comes from Sanskrit; roughly speaking, it means “joining”. In modern linguistics, it’s a catch-all term used for any kind of sound change that crosses the boundary between morphemes. The “linking” R in some English dialects is a kind of sandhi, and so is the use of the article “an” before vowels. Romance languages show a couple more instances of the process: Spanish de el → del; Italian della → dell’; the heavy use of liaison in French.

When sandhi becomes systematic, it can create new words, like Spanish del and al. These, of course, can then be changed by any other sound change. And it’s not limited to vowels. Consonants can also be affected by sandhi. The most common expression of this is anticipatory voicing across word boundaries, but other types of assimilation are equally valid.

Epenthesis

Epenthesis is the addition of a sound, the opposite of elision. It’s another way of breaking up a cluster that violates a language’s phonological rules or aesthetic sensibilities. Some epenthesis is a kind of sandhi, like English “an”, and the diaeresis discussed last week is another form. Those aren’t the only possibilities, though.

An epenthetic vowel can be inserted between two consonants, and this will usually be a neutral vowel, whatever the language considers that to be. Schwa (/ə/) is a common choice, but /e/, /a/, and /o/ also pop up. /i/ and /u/, however, are usually too strong.

Similarly, strings of vowels may be broken up by epenthetic consonants. Again, something weak and unassuming is needed, something like /r/, /n/, /l/, /h/, or /ʔ/. /w/ and /j/ can be used as glides, as we have seen, but they’ll tend to be used only when they can relate to one of the vowels.

Another option for consonant clusters is an epenthetic consonant, one that bridges the gap between the two. Greek, for example, shows a sound change /mr/ → /mbr/, as seen in words like “ambrosia”. Many speakers of English insert epenthetic consonants like this all over, without even knowing it, like the [p] in “something”. (If this became phonemic, it would be essentially the same thing that happened to Greek.)

Haplology

Two syllables that are fairly close in sound may not stay together for very long. Haplology is a sound change that involves the deletion of one syllable of such a pair. It can be either one, and there’s no standard for how “close” two syllables need to be to trigger the change. English examples include the common pronunciations of “probably” and “February”, and others aren’t hard to find. (In another one of those linguistic oddities, “haplology” itself can fall victim to this, becoming “haplogy”.)

Applying the rules

Although there are plenty of other sound changes out there—again, I refer you to Index Diachronica for more—we have gathered enough over the last three posts to start looking at how to apply them to a conlang. There are plenty of programs out there that can do this for you, but it helps to know the rules. These aren’t set in stone, mind you, but you should have a good reason for breaking them. (That reason would probably lead to more conlanging, so I’m not complaining.)

First, evolutionary sound changes are regular. They’ll almost always happen when the right conditions are met. If you’ve got devoicing of final stops, as in German, then essentially every final stop is going to get devoiced. Sure, there may be exceptions, but those exceptions can be explained. Maybe those words appeared in their current forms after the sound change.
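
A sound change applier is, at heart, just pattern replacement. Here’s a minimal sketch of the final-devoicing rule in JavaScript; the mapping and the sample words are simplified stand-ins for illustration, not a full treatment of German.

```javascript
// Regular sound change as a rewrite rule: devoice stops at the end of a word.
const devoice = { b: "p", d: "t", g: "k" };
const finalDevoicing = (word) => word.replace(/[bdg]$/, (c) => devoice[c]);

const results = ["hund", "tag", "haus"].map(finalDevoicing);
// → ["hunt", "tak", "haus"]: every word meeting the condition changes
```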

Second, remember that sound changes don’t care. This is a subset of regularity, but it bears repeating. A sound change will affect a word no matter what that word’s history is. A particular evolutionary condition may be met because of an earlier sound change, but later changes won’t know that. They’ll only “see” a word ripe for alteration.

Third, sound changes operate on a lower level. They’re “below” grammar and, as such, aren’t affected by it. But this means that grammatical ambiguity can arise, as when sounds of case endings are merged or dropped. (This one happened in both English and the Romance languages.) Speakers will then need to find ways of clearing things up, leading to innovations on the grammar side of things.

Fourth, sound change stems from laziness, a desire to minimize the effort required in speaking and conveying our thoughts. Weak sounds disappear, similar sounds merge, and it’s all because we, as a whole, know we can get away with it. As long as there’s enough left to get the message across, all else is simply extraneous baggage. And that’s what’s most likely to change.

Finally, evolution is unceasing. When it comes to language, the only constant is change. Even our best efforts at writing and education and language academies can’t stop sound change. There will always be differences in speech. Those will form dialects, and then those may split into new, mutually unintelligible languages.

Software internals: Classes

Whether you use object-oriented programming or not, you’ve most likely encountered the class and the object. You’re probably coding in a language that has or uses both of them. If you’re not (say you’re using C or something), you still might be using the idea of a class and an object. How they actually work depends on the language, library, or framework in question, but the basic implementation isn’t too different from one to the next.

Objects, no matter what kind you’re using, are data structures. Classes are data types. In many languages, both of these work together to make OOP possible. But you can have one without the other. JavaScript, for instance, has objects, but no classes. (ES6’s class is mere syntactic sugar over a prototype-based implementation.) Since that’s more common than the alternative (classes but no objects), we’ll look at objects first, then add in the bits that classes or their replacements provide.

Objects

Objects, from a language-neutral point of view, are made up of a number of different variables usually called fields. They’re a composite kind of data, like C’s struct, except that higher-level languages have a lot more support for doing things with them. But that’s all they really are: a bunch of fields.

That already suggests one way of laying out an object in memory: allocate the fields consecutively. For a struct, you don’t need to do anything else except initialize the block of memory. If one of your fields is itself an object, then that’s okay. You can nest them. You have (rather, the compiler has) the knowledge of what goes where, so it’s not a problem.

The above option works fine in a system where objects are static, where their layouts don’t change, only the contents of their fields. Dynamic languages that let you add and remove object fields need a different approach. One that’s commonly used is a map. Basically, field names (as strings or a special, faster “symbol” type) are associated with chunks of data, and the object is nothing more than a collection of name-value pairs. Using hashes and other tricks (that we may visit in a future post), this can be very fast, though never as fast as direct memory access.
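
JavaScript’s own objects are essentially this kind of map under the hood; the idea can also be sketched directly with a Map:

```javascript
// A dynamic object as nothing more than name→value pairs in a map.
const obj = new Map();
obj.set("x", 1);  // add a field at runtime
obj.set("y", 2);
obj.delete("y");  // remove one just as easily
const x = obj.get("x"); // field access is a map lookup, not a memory offset
```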

Methods

Methods are functions that just happen to have a special binding to a particular type of object. Different object systems define them in different ways, but that core is always the same. When code calls a method, that method “knows” which object it belongs to, even though it was (usually) defined generically.

Python and object-oriented C libraries like GTK+ make this explicit: every method takes an object as its first parameter. JavaScript takes a different tack, adding methods to the prototype object. In most other cases where methods exist, they’re implicitly made such by their definitions. C++, C#, and Java, for instance, simply let you define functions inside the class, and those are the methods for objects of that class. When they’re called, they receive a “hidden” parameter, a reference to the object they’re called on: this.
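
A quick sketch of that hidden parameter, using JavaScript’s prototype mechanism:

```javascript
// A method is a function with an implicit reference (`this`) to its object.
function Point(x, y) { this.x = x; this.y = y; }
Point.prototype.norm = function () {
  // `this` is the hidden parameter: the object the method was called on
  return Math.sqrt(this.x * this.x + this.y * this.y);
};
const n = new Point(3, 4).norm(); // → 5
```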

As functions can contain arbitrary code, we can’t exactly put them in memory with the fields. One, it kills caching, because you might have kilobytes of method code in between fields. Two, some operating systems have protection systems in place to prevent code and data from intermingling, for very good security reasons.

Instead of having the code and data together, we must separate them. But that’s fine. They don’t need to be mixed in together anyway. However methods are defined, they’ll always have some sort of connection to an object—a pointer or reference, in low-level terms—and they can use that to access its fields. Conversely, the structure of an object can contain function pointers that refer to its methods.

Inheritance

We don’t really start getting into object-oriented programming until we add in inheritance. Not coincidentally, here’s where the internals start to become more and more complex. Simple single inheritance lets an object take on parts of a parent object. It can use the parent’s methods as if they were its own, as well as some of the fields. Multiple inheritance is the same thing, but with more than one parent; it can get quite hairy, so most common languages don’t allow it.

The methods don’t really care whether they’re operating on a base or derived class, a parent or a child. For static languages, this is because of the way objects using inheritance are laid out in memory. Broadly speaking, the parent object’s fields come first, and those are followed by the child’s fields. As you go further down the inheritance chain, this means you can always backtrack to the root object. Just by doing that, we get a few things for free. A pointer to an object is the same as a pointer to its parent, for example. (GTK+ makes this explicit: objects are structs, and a child object simply lists its parent as its first field. Standard C memory access does the rest. Problem is, you have to use pointers for this to work, otherwise you get slicing, a mistake every C++ programmer knows all too well.)

Dynamic languages don’t get this benefit, but they can come very close to emulating it. Objects might have a hidden field pointing to the parent object, or they may just copy the parent’s map of names and values into their own as their first act. The latter means extra metadata to keep track of which fields were defined where, but the former is slower for every access of an inherited field. It’s a classic size/speed tradeoff; most languages opt for the faster, but slightly more bloated, map-mixing approach.
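
The hidden-parent-pointer approach is exactly what JavaScript exposes as the prototype link; a small sketch:

```javascript
// Lookup walks the chain: check the object itself, then its hidden parent.
const parentObj = { greet() { return "hello"; } };
const childObj = Object.create(parentObj); // child's hidden link → parentObj
childObj.own = 42;                 // fields added to the child stay on the child
const greeting = childObj.greet(); // not found locally, found on the parent
```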

For multiple inheritance, well, it’s a lot harder. In dynamic languages, it’s not quite as crazy, but the order of inheritance can make a difference. As an example, take a class C that inherits from classes D and E. If both of those have a field named foo, there’s a problem. C can’t have two foos, but the different base classes might use theirs in different ways. (The only modern static language I know that allows multiple inheritance is C++, and I don’t want to try to explain the memory scheme it uses. I’ll leave that to you to find out.)

Polymorphism

What makes object-oriented programming truly object-oriented is polymorphism. Child classes are allowed to effectively redefine the methods they inherit from their parents, customizing them, but the caller neither knows nor cares about this fact. This is used for abstraction, and it’s not immediately obvious how they do it.

Dynamic languages have a map for their methods, as they do for fields. For them, polymorphism is as easy as changing an entry in the map to refer to a different function…if that’s the way they choose to do it. Another option is only keeping track of the methods directly defined by this object, deferring access to all others to the parent class, which might pass it up another level, and so on. For a language using single inheritance, this is linear, and not too bad. With multiple inheritance, method resolution becomes a tree walk, and it can get quite intensive.
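
In JavaScript terms, swapping the map entry looks like this:

```javascript
// Overriding is just shadowing: the entry found first wins the lookup.
const baseObj = { describe() { return "base"; } };
const derivedObj = Object.create(baseObj);
derivedObj.describe = function () { return "derived"; }; // new map entry
const results = [baseObj, derivedObj].map((o) => o.describe());
// ["base", "derived"]: the calling code neither knows nor cares which ran
```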

Static languages can take advantage of the fixed nature of their classes and reduce polymorphism to a table of function pointers. C++ programmers know this as the v-table (or vtbl), so called because polymorphic methods in that language are prefixed with the keyword virtual; hence, “virtual table”. This table is usually kept in memory somewhere close to the rest of the object, and it will contain, at the very least, a function pointer for each polymorphic method. Those that aren’t overriding a method from a parent class don’t have to be listed, but not every language lets you make that decision.

Construction and destruction

An object’s constructor isn’t necessarily a method. That’s because it also has to do the work of allocating memory for the object, setting up any inheritance-related framing (v-tables, prototypes, whatever), and general bookkeeping. Thus, the constructor doesn’t even have to be connected to the class. It could just as easily be a factory-like function. Destructors are the same way. They aren’t specifically methods, but they’re too bound to a class to be considered free functions. They have to deallocate memory, handle resource cleanup, call parent destructors, and so on.
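
A factory-like constructor is easy to sketch in JavaScript; the names here are invented for illustration:

```javascript
// A constructor as a plain function: it allocates, wires up the
// inheritance framing (here, the prototype), and initializes fields.
const counterMethods = {
  increment() { this.count += 1; },
};
function makeCounter(start) {
  const obj = Object.create(counterMethods); // inheritance framing
  obj.count = start;                         // field setup and bookkeeping
  return obj;
}
const c = makeCounter(10);
c.increment(); // c.count is now 11
```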

On the lower levels, constructors and destructors aren’t really part of an object. The object can never call them directly in most languages. (In garbage-collected languages, nobody can call a destructor directly!) Therefore, they don’t need to know where they are. The same is generally true for the C++-specific notions of copy and move constructors. The only wrinkle comes in with inheritance in the case of destructors, and then only in C++, where it’s possible to give a class polymorphic methods but forget to make its destructor polymorphic; deleting a derived object through a base pointer then skips the derived destructor, a classic newbie mistake.

Next up

I’ll admit that this post felt a lot smoother than the last two. Next time, we’ll look at another data structure that shows up everywhere, the list. Linked lists, doubly-linked lists, list processing, we’ll see it all. As it turns out, there’s not too much to it. Maybe those Lisp guys were onto something…

Building the pantheon

In fantasy worlds, unlike our modern, Western one, monotheism seems to be quite uncommon. Maybe it’s a way to show the “otherness” of the story, or a method of inserting larger-than-life characters into the world in a way that they can interact with the protagonist. Perhaps the intent is to illustrate a “war of ideas” in a metaphorical way. I’m sure you can think of plenty of other reasons, but they all end with the same result: a pantheon.

Now, there are two different concepts at work here. First is the “traditional” polytheism, like the Greeks, Romans, Norse, and Egyptians. In all of these cases—and others from around the world—you have a multitude of gods. They all have their own niches (Aphrodite, goddess of love, for example) and they have a body of lore surrounding them. This is the idea we’ll be exploring in this post. The other is pantheism, which you’d expect to be related to the word pantheon. It’s not; “pantheism” isn’t the belief in multiple gods, but the belief that (roughly speaking) God is everywhere and everything. From a worldbuilding perspective, that doesn’t offer too much, so we’ll stick with polytheism. We can live with the minor etymological confusion.

The pantheon

As usual, the best way to start creating something is to look at similar things that already exist. Most early cultures in history were polytheistic, and a few have left a large amount of mythology. That’s the key to polytheism: the myth. With dozens, hundreds, or even thousands of gods, stories are the way to keep them straight. Stories bring them to life, bring them into the world. They show why these gods should be worshipped…or even how.

Polytheistic gods, unlike the solitary God of monotheistic religions, are, in a very real way, superhuman. They wouldn’t be gods if they didn’t have some sort of supernatural power or ability attributed to them, although heroic humans can be, and often were, deified. (Castor, Imhotep, and Guan Yu are all examples here.) But gods of a pantheon are unlike a single God in another way: they can be flawed. Zeus is well-known as lecherous, while Hera was the personification of jealousy, and Loki would, today, be a troll in the Internet sense. A far cry from the perfect divinity of the Judeo-Christian God.

This humanization of the divine means that gods can be characters in a literary sense. They can have conflict, both with each other and with outside forces. They can walk the mortal world, interact with living people in more than just visions. But they’re still gods. They can just as easily be unseen, nothing more than the intended recipients of prayers and pleas and sacrifices. They can work behind the scenes as easily as on the stage. And if they’re never visible, then you get to pose the interesting question: did they ever truly exist?

Creating the creators

Of course, no matter how you use the pantheon, you’re going to need one. This doesn’t have to be too elaborate. A list of names will suffice, maybe with a note as to the purpose of each one. If you want to go deeper, though, you can.

One question you don’t have to answer is “how many?” The trick with polytheism is that there doesn’t have to be a set number of deities. You can have two, or twenty, or twelve hundred, and it won’t matter much. If you have a small, set number, it’ll be easier to enumerate them all, but you can always leave room for expansion.

In a way, creating a pantheon is dividing up the universe, decomposing it into its fundamental parts. The exact criteria will depend on the culture—a typical medieval fantasy people won’t have a god of computers, for instance—but a few things are near-universal. Remember that the more gods you have, the less each one has to do. With a vast array of deities, you can get into some pretty fine distinctions.

Creator gods are probably everywhere. Naturally, monotheistic faiths only have (exactly) one of these, but polytheism gives you more authorial options. Creators can be distant, aloof from their creation. Alternatively, they might prefer to be up close and personal with their masterpiece. Maybe there are multiple creators, each given a different element; one god created the land, another the sea, for example.

The creation of the world can be extremely interesting in its own right. Perhaps there was a great battle among the gods. Or the world could have been created by more primordial beings, with the gods as their children. Or maybe the world is a song given physical form by the highest of gods, while the others merely inhabit and protect it.

Local gods exist in many pantheons. These are typically small-time guys, possibly deified humans. Ancient, half-legendary rulers or wise men are good candidates. But it’s also possible that the local god is a “spirit” of a place, like the Roman genius loci. Another possibility is a more powerful god who is intimately connected with a city, such as Athena. Any way you look at it, local gods will have the center of their worship in a particular area. Their greatest shrines or temples will be there, and outsiders may not even consider them true gods.

Elemental deities make up another common type. These are your gods of fire and water and weather and the like. In larger pantheons, especially early on, these will form the bulk of the roll of divinity, if only because older cultures, lacking modern technology, had less control over the natural world. Everything that man couldn’t control, almost by definition, the gods could, so one of them would be given an elemental role. Plenty of overlap is possible here; creators can be elemental. Local gods can, too, especially if a type of weather is strongly associated with a certain place, like snow on the highest mountaintop.

Patron deities come to the fore as a polytheistic civilization develops. Eventually, they will begin to outnumber the elemental gods. Patrons can be of a craft (Vulcan and smithing), an act (Ares and war), or just about anything else. Like some theological Rule 34, if people can do something, there will be a patron for it. (We see this even in monotheism, with the Catholic patron saints.) This is a place where the fine divisions of a vast pantheon come into the spotlight. Why have a single god of agriculture, when you can have one for grain, another for fruit, and half a dozen for different kinds of trees? Patrons can be creators, too; art and fertility work well for these. (Why? Because these are both acts of creation.) Local gods, by contrast, are often patrons of those things the local place is known for.

Antagonistic gods sometimes exist. These don’t necessarily have to be evil—look at Loki—but they can be: Titans, frost giants, etc., feature in many myths. Nor is the god of death necessarily an antagonist. Still, the idea of a god or set of gods opposing the primary pantheon appears very often. Myths are stories, and stories need conflict. Someone with godly power can only be truly rivaled by another such being, and a dedicated foil is quite handy. Any of the gods can fill this role, as can any other being with power approaching godlike. (In many forms of Christianity, Satan has practically become an antagonistic god. This, combined with the elevation of saints, the hierarchy of angels, and so on, might even provide a glimpse of monotheism in the process of becoming polytheistic.)

Family matters

Once you have a sizable pool of deities, they can be related. Greece shows a nice portrait of the extreme end of this: the Olympian gods are one big, unhappy, inbred family, the very model for the European aristocracy of later centuries.

In a pantheon, gods can marry. (Whether they remain faithful, however, is another story. Or a lot of them, in the case of Zeus.) They can have children, and these will likely be gods in their own right. Some of the deities might be brothers and sisters. They may become lovers. They could even be all of these at once, since gods don’t necessarily have to play by mortal rules.

This fooling around can also extend to the inhabitants of the world. Every culture with polytheistic leanings has a story about a god (almost always a man) having relations with a mortal (nearly always a woman). Sometimes this is simply for love. Other times, it’s out of lust. In a few cases, it’s neither. The many lovers Zeus took are well-known; there are so many of them, we still haven’t run out of names for Jupiter’s moons. But everywhere you look in polytheism, gods and men are coming together.

And these unions, in mythology, often lead to children. A child with one divine parent might also become a god. Usually, there’s a tale as to why they are or aren’t fully divine. They could also be relegated to a separate rank of demigods, immortal beings with less power than the highest deities, but far more than any normal human. These might then go on to develop their own myths, like Heracles. (And don’t think this is limited to polytheism. A divine child is sort of the central figure of one of the world’s major monotheistic religions.)

The more gods a pantheon has, the more opportunity for relation. And the stories become endless. Not only that, but they can also echo the world itself. Children may follow in their parents’ footsteps, taking on similar roles, as with Aphrodite and Eros. Or they could become a blend of their two parents; the son of a sky god and a sea goddess might be the patron of the trade winds…or bringer of hurricanes. A forsaken child may become an antagonist. A city might choose to worship a demigod believed to be the offspring of a god and a local priestess or seer. The only limit is the imagination.

The story begins

Any way you slice it, polytheism has a reason for its popularity in fantasy. In real life, pantheons came about naturally, through centuries of cultural evolution. Fantasy creations didn’t. But they’re fun to think about, and they add a dimension to a world and its peoples. From a storytelling point of view, there’s not that much to be said of an omnipotent deity. But a hundred lesser beings, human in their flaws and faults, breathe a kind of life into a story’s religious backdrop.

That doesn’t mean you should go wild with the idea, though. Unless you’re writing a “mythic” story, where mortal and divine regularly intermingle, multiple gods should probably be just like one—out of the way. But they will leave their mark, everywhere from the calendar (Saturday) to place names (Athens) to any other facet of life. Any kind of religion shapes a culture. In the worlds you create, how they do it is up to you.

Sound changes: vowels

In this part of our little series, we’ll look at some of the sound changes that can affect vowels. Since there tend to be far fewer vowels than consonants in a language’s phonemic inventory, there aren’t as many places for these sound changes to go. For the same reason, however, the vowel system of a language is more prone to change, with new phonemes coming into use and old ones disappearing.


Before we begin, think of (or look at) the IPA vowel chart. It’s usually depicted as something like a trapezoid, but it’s just as easy to imagine it as a triangle with vertices at /i/, /a/, and /u/. All the other vowel sounds—/e/, /y/, /ø/, and so on—are along the sides or in the middle. This conception will make a few of the sound changes described below seem more obvious.

Fronting

The process of umlaut, as found in German, is an example of a larger phenomenon usually referred to as fronting. Either term is fine for amateur conlangers, because everyone will know what you mean. Whatever you call it, it’s a change that causes vowels to move towards the front of the mouth.

Most commonly, fronting occurs under the influence of an /i/ sound. (In that, it’s almost like a kind of vowel harmony, or the vowel version of assimilation.) Sometimes, the /i/ later disappears, leaving behind the affected vowel as its only trace.

The Germanic languages embraced fronting to varying degrees, and they’re the best example around. German itself, of course, has the front rounded vowels ü and ö; the diacritic is often called an umlaut for just this reason. In Old English, meanwhile, back /ɑ/ was fronted to /æ/. Swedish brought its /uː/ frontward to become /ʉː/. And the list goes on.

Fronting doesn’t always happen, so the back vowels aren’t totally lost. Instead, it can become a way to add in more front vowels; overall, languages tend to have more in the front than the back. Or it can cause mergers, as [y] becomes reinterpreted as /i/. This very thing happened in Greek, for instance.
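To make this concrete, here’s a toy Python sketch of an i-triggered fronting rule, loosely in the spirit of Germanic umlaut. The rule, the pseudo-words, and the function name are all my inventions for illustration, not any real language’s data:

```python
import re

# Toy fronting rule: a back vowel is fronted when /i/ follows in the
# next syllable, and the conditioning /i/ is later lost word-finally.
FRONTED = {"u": "y", "o": "ø", "a": "æ"}

def umlaut(word):
    # front a back vowel that stands before consonant(s) + /i/
    word = re.sub(r"[uoa](?=[^aeiouyøæ]+i)",
                  lambda m: FRONTED[m.group(0)], word)
    # the trigger /i/ disappears, leaving the fronted vowel as its trace
    return re.sub(r"i$", "", word)

print(umlaut("musi"))  # -> "mys"
print(umlaut("gast"))  # -> "gast" (no trigger, no change)
```

The mechanics here mirror how mouse/mice-style alternations arose: the plural suffix fronted the root vowel, then eroded away.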

Raising and lowering

Instead of bringing a vowel to the front, raising brings it up. Usually, this moves a sound one “step” up on the vowel chart: /a/ → /e/ → /i/. Intermediate steps like /ɛ/ can come into play, as well. An example of this process happening right now is in my own dialect of US Southern English, where some vowels are raised before nasal sounds. Thus, “pin” and “pen” sound alike.

The environment usually causes raising, but it’s not any specific sound that triggers it. Nasals can, as they do for me, but raised vowels later in the word can do it, too. So can other consonants. In general, it works out to yet another form of assimilation—vowels will tend to be raised by proximity to other “high” sounds. The reason it works so well for nasals is that they’re as high in the mouth as you can get: in the nose.

Compared to fronting, raising seems to be more “effective”. But this makes it possible for other sound changes to come into play, sweeping into the vocalic void left behind. If raising gets rid of most instances of /a/, for example, some other sound will likely change to fill that gap.

The opposite of raising, lowering, is one such way of accomplishing this. It’s the same thing as raising, but in reverse: /u/ → /o/ → /a/ is a common trend. Front vowels appear to be harder to lower, likely from the massive influence of /i/, but it’s possible to do, say, /e/ → /ɛ/.
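The pin–pen raising mentioned above is just a conditioned rewrite, easy to sketch in Python. The broad IPA-ish transcription is my own choice of notation:

```python
NASALS = set("nmŋ")

def raise_before_nasal(word):
    # /ɛ/ becomes /ɪ/ when a nasal immediately follows
    out = []
    for i, ch in enumerate(word):
        if ch == "ɛ" and i + 1 < len(word) and word[i + 1] in NASALS:
            out.append("ɪ")
        else:
            out.append(ch)
    return "".join(out)

print(raise_before_nasal("pɛn"))  # -> "pɪn", now a homophone of "pin"
print(raise_before_nasal("pɛt"))  # -> "pɛt", unchanged before a non-nasal
```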

Nasalization

Vowels near nasal sounds might assimilate to them, in a change called nasalization. If the change is thorough enough, it can even result in the loss of the nasal consonant, leaving only a nasal vowel. That was the case in French and Portuguese, both of which have a set of nasalized vowels.

Any of the nasal sounds work for this, from /m/ to /ɴ/, but the “big three” of /n/, /m/, and /ŋ/ are the most common in languages, in that order. They’ll be the likely suspects. If nasalization occurs, then it will probably be on those vowels that precede these sounds; vowels following nasals are less susceptible to the change. Nasals at the end of a word or right before another consonant are the best candidates for the total nasalization that results in their disappearance.

A similar change can occur with /r/-like (rhotic) sounds, but this is much less common. It is a way to get a series of rhotic vowels like those in American English, and it’s conceivable that the difference between “regular” and “rhoticized” could become phonemic.
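A rough sketch of French-style nasalization, with made-up words: the vowel nasalizes, and the nasal consonant is deleted when it stands at the end of a word or before another consonant, exactly the environments described above.

```python
import re

NASAL_VOWELS = {"a": "ã", "e": "ẽ", "i": "ĩ", "o": "õ", "u": "ũ"}

def nasalize(word):
    # vowel + nasal at word end or before a consonant:
    # the vowel nasalizes and the nasal consonant vanishes
    return re.sub(r"([aeiou])[nm](?=$|[^aeiou])",
                  lambda m: NASAL_VOWELS[m.group(1)], word)

print(nasalize("bon"))    # -> "bõ"
print(nasalize("kanta"))  # -> "kãta"
print(nasalize("luna"))   # -> "luna" (the nasal survives before a vowel)
```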

Lengthening and shortening

Solitary vowel phonemes can, in some cases, become long vowels or diphthongs. On the other hand, it’s easy for those to revert to short vowels. (And those can be shortened further, dropping out altogether, but we’ll get to that in a moment.)

These changes are very connected to the stress pattern of a word. Stressed vowels are more likely to be lengthened or broken into diphthongs. Unstressed vowels, by contrast, get the opposite treatment: reduction and shortening. That’s not the only reason these processes can happen, but it is the primary one.

The total elision of unstressed vowels is also quite possible. This can happen between consonants (syncope), at the beginning of a word (apheresis), or at its end (apocope). All of these are historically attested, both in natural language evolution and in borrowed words. Syncope, for example, occurs in British English pronunciations of words like secretary, while apocope turns American “going” into “goin’”.
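All three flavors amount to deleting a segment at some position; the interesting part is which position the stress pattern picks out. A minimal sketch, with example words that are only roughly historical:

```python
def elide(word, index):
    # syncope if word-internal, apheresis at index 0, apocope at the end
    return word[:index] + word[index + 1:]

# Vulgar Latin-style syncope: domina -> domna (the post-stress vowel drops)
print(elide("domina", 3))  # -> "domna"
# apocope of a final unstressed vowel
print(elide("amore", 4))   # -> "amor"
```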

Combining and breaking

Two vowels that end up beside each other (probably because of consonant changes) can create an unstable situation. As with consonant clusters, vowel clusters “want” to simplify. They can go about this in a couple of different ways.

The easiest way is for the two to combine into a diphthong or long vowel. Where this isn’t possible, one of the vowels may assimilate to the other, much like consonants. Alternatively, the two might “average out”, fusing into a sort of compromise sound, like /au/ → /o/ (or /oː/, if that’s possible in the language).

Another potential outcome is a separation into two syllables by adding a glide. For example, one form of this diaeresis is /ie/ → /ije/, with /j/ stepping in as the glide. Once the vowel cluster is broken apart, other sound changes can then alter the new structure, potentially even re-merging the cluster.
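Both outcomes are easy to sketch. The fused and broken forms below are illustrative pseudo-words, not drawn from any particular language:

```python
def smooth(word):
    # monophthongization: /au/ fuses into the compromise vowel /oː/
    return word.replace("au", "oː")

def break_hiatus(word):
    # diaeresis: split /ie/ across two syllables with a glide, /ie/ -> /ije/
    return word.replace("ie", "ije")

print(smooth("kaupa"))       # -> "koːpa"
print(break_hiatus("dies"))  # -> "dijes"
```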


Plenty of other vowel changes exist, but these are the most common and most defining. Next time, we’ll wrap up the series with a look at some of the sound changes that sit outside of the usual consonant/vowel dichotomy, as well as those that can affect a whole word. Also, we’ll conclude with a few rules of thumb to help you get the most out of your conlang’s evolution.

Languages I hate

Anyone who has been writing code for any length of time—anyone who isn’t limited to a single programming language—will have opinions on languages. Some are to be liked, some to be loved, and a few to be hated. Naturally, which category a specific language falls into depends on who you’re talking to. In the years I’ve been coding, I’ve seriously considered probably a dozen different languages, and I’ve glanced at half again as many. Along the way, I have seen the good and the bad. In this post, I’ll give you the bad, and why I think they belong there. (Later on, I’ll do the same for my favorite languages, of course.)

Java

Let’s get this one out of the way first. I use Java. I know it. I’ve even made money writing something in it, which is more than I can say for any other programming language. But I will say this right now: I have never met anyone who likes Java.

The original intent of Java was a language that could run “everywhere” with minimal hassle. Also, it had to be enough like C++ to get the object-oriented goodness that was in vogue in the nineties, but without all that extraneous C crap that only led to buffer overflows. So everything is an object—except for “primitive” types like, say, integers. You don’t get to play with pointers—but you can get null-pointer exceptions. In early versions of the language, there was no way to make an algorithm that worked with any type; the solution was to cast everything to Object, the root class underlying the whole system. But then they bolted on generics, in a mockery of C++ templates. They do work, except for the niggling bit called type erasure.

And those are just some of the design decisions that make Java unbearable. There’s also the sheer verbosity of the language, a problem compounded by the tendency of new Java coders to overuse object-oriented design patterns. Factories and abstract classes have their place, but that place is not “everywhere I can put them”. Yes, that’s the fault of inexperienced programmers, but the language and its libraries (standard and 3rd-party) only reinforce the notion.

Unlike most of the other languages I hate, I have to grin and bear it with Java. It’s too widespread to ignore. Android uses it, and that’s the biggest mobile platform out there. Like it or not, Java won’t go away anytime soon. But if it’s possible, I’d rather use something like Scala.

Ruby

A few years ago, Ruby was the hipster language of choice, mostly thanks to the Rails framework. Rails was my first introduction to Ruby, and it left such a bad taste in my mouth that I went searching for something better. (Haven’t found it yet, but hope springs eternal…) Every bit of Ruby I see on the Internet only makes me that much more secure in my decision.

This one is far more subjective. Ruby just looks wrong to me, and it’s hard to explain why. Most of it is the cleverness it tries to espouse. Blocks and symbols are useful things, but the syntax rubs me the wrong way. The standard library methods let you write things like 3.times, which seems like it’s trying to be cute. I find it ugly, but that might be my C-style background. And then there’s the Unicode support. Ruby had to be dragged, kicking and screaming, into the modern world of string handling, and few of the reasons why had anything to do with the language itself.

Oh, and Ruby’s pitifully slow. That’s essentially by design. If any part of the code can add methods to core types like integers and strings, optimization becomes…we’ll just say non-trivial. Add in the Global Interpreter Lock (a problem Python also has), and you don’t even get to use multithreading to get some of that speed back. No wonder every single Ruby app out there needs such massive servers for so little gain.

And even though most of the hipsters have moved on, the community doesn’t seem likely to shed the cult-like image that they brought. Ruby fans, like those of every other mildly popular language, are zealous when it comes to defending their language. Like the “true” Pythonistas and those poor, deluded fools who hold up PHP as a model of simplicity, Ruby fanboys spin their language’s weaknesses into strengths.

Java is everywhere, and that helps spread out the hate. Ruby, on the other hand, is concentrated. Fortunately, that makes it easy to ignore.

Haskell

This one is like stepping into quicksand—I don’t know how far I’m going to sink, and there’s no one around to help me.

Haskell gets a lot of praise for its mathematical beauty, its almost-pure functional goodness, its concision (quicksort is only two lines!) and plenty of other things. I’ll gladly say that one Haskell application I use, Pandoc, is very good. But I would not want to develop it.

The Haskell fans will be quick to point out that I started with imperative programming, and thus I don’t understand the functional mindset. Some would even go as far as Dijkstra, saying that I could never truly appreciate the sheer beauty of the language. To them, I would say: then who can? The vast majority of programmers didn’t start with a functional programming language (unless you count JavaScript, but it still has C-like syntax, and that’s how most people are going to learn it). A language that no one can understand is a language no one can use. Isn’t that what we’re always hearing about C++?

But Haskell’s main problem, in my opinion, is its poor fit to real-world problems. Most things that programs need to do simply don’t fit the functional mold. Sure, some parts of them do, but the whole doesn’t. Input/output, random numbers, the list goes on. Real programs have state, and functional programming abhors state. Haskell’s answer to this is monads, but the only decent description of a monad I’ve ever seen had to convert it to JavaScript to make sense!

I don’t mind functional programming in itself. I think it can be useful in some cases, but it doesn’t work everywhere. Instead of a “pure” functional language, why can’t I have one that lets me use FP when I can, but switch back to something closer to how the system works when I need it? Oh, wait…

PHP

I’ll just leave this here.

Magic and tech: power

One of the great drivers of technological innovation throughout history has been the need for power. Not military power, nor electrical, but motive power, mechanical power. Long before the Industrial Revolution transformed the way we think about power, machines were invented. Simple machines, complex machines, even some that we don’t quite understand. But every machine requires an input of force to get things started.


Today, we have electricity, obtained from a vast array of methods: solar energy, fossil fuels, nuclear fission, all the way down to wind and water. Many of our modern forms of power generation, however, are, well, modern. They rely on technology developed relatively recently. Man-made nuclear reactors didn’t—couldn’t—exist 80 years ago. Although the mechanism that makes solar panels work was worked out by Einstein, we need present-day electronics to actually use it.

Go back not all that long ago, and you miss out on a lot of ways to generate power. Solar and nuclear are less than a century old. Coal and oil and natural gas have only been used in industrial capacities for two or three times that. For a large majority of our history, power was hard to come by, and there weren’t a lot of options. Yes, earlier generations didn’t use anywhere near as much power as we do, and they didn’t use electricity at all—except maybe in Baghdad—but you can argue cause and effect all day long. Did they not use power because they didn’t have as much of it, or did they not produce as much because they didn’t need it?

However you come down on that argument, the truth is plain to see: all the way through the Renaissance, at least, there weren’t a lot of ways to produce power. You could use human or animal power, as many cultures did. It works for travel, but also for machines that require an impetus, such as millstones, potters’ wheels, pulleys, and most other things that the people of a thousand years ago would need.

Wind and water provide a better path to power, and this was figured out some two thousand years ago. Since then, the technology has only been refined. A blowing breeze or flowing stream can spin a wheel with far less human intervention than muscle power, and they’re cheaper than beasts of burden in the long run. Even the first windmills and waterwheels, built backwards by the standards of our imagination (horizontal blades for wind and undershot wheels for water), nonetheless freed up the labor of both man and beast for other, better things.

Now with magic

This triumvirate of wind, water, and muscle was enough to get us through the ages. But what can our little bit of magic add to the mix? We’ve already seen that magical stores of energy are available to our fictional culture, and they can be used to propel a wheeled vehicle. Hook them up to any other type of wheel, and they’ll do the same thing. For a relatively small price, the people of this land have a magical alternative to wind and water. That’s not to say those won’t be used; it’s more likely that the magical means will complement them.

Even this is a huge development, but let’s see if we can do anything else before we look at how it would transform society. Most magic involves manipulating natural forces, especially fire and water and air. So why not lightning? Now, that’s not to say that mages can summon thunderbolts from the sky, any more than they can call a tidal wave or shoot fireballs from their fingertips. This is more subtle.

Static electricity is pretty easy to discover. We encounter it all the time. In the winter, it’s even worse, because the air’s drier and we tend to wear thicker clothing. I know that I cringe whenever I go to open a door this time of year, and I’m sure I’m not alone. The small shocks we get don’t have a lot of energy (on the order of millijoules), but you can ask anyone who’s ever been struck by lightning or hit with the discharge from an old CRT about the potential power of static electricity.

Electric current is a bit harder to get, but that’s where the magic comes in. As of now, it’s in its early stages, but mages have begun to store an electric charge in much the same fashion that they store mechanical power. Charging is easier, for those who know the proper lightning-element spells, and some truly massive containers can be built, resembling globe-sized versions of those plasma balls that used to be all the rage. Using the current requires some way of interfacing with the containing sphere, typically by wrapping a lightly infused bit of metal around it. This, for all intents and purposes, creates an electrode.

The first uses of this magical technology were purely medical. “Shock therapy” was briefly considered a cure-all, until it was found that it didn’t really cure much of anything. A few practical uses came out of the earliest generations: an easy spark generator, handy for starting fires (if far more expensive than sticks and rocks); a way of creating better magnets than any lodestone; electroplating metals. For a decade, the fashion among mages was to find a new and exciting way of using this captured lightning.

Then somebody figured out how to make an electric motor. This was very recently in our magical society’s history—not just within living memory, but within a generation—and it’s mostly a curiosity right now. Small electric spheres can’t provide enough current to produce a significant amount of power, and the larger versions are too costly for practical use. However, that hasn’t stopped people from trying. Some very rich individuals have contracted higher mages to develop a mill powered by this new source of energy, but no one else thinks it’s a viable replacement for the motive spheres…yet.

A few mages are traveling down a different path. Instead of trying to harness the lightning they have imprisoned for mechanical power, they are investigating the possibilities of using the electrical energy directly. They’ve made some interesting discoveries in doing this, like the fact that some materials conduct electricity, while others stop it. Small mundane devices can store tiny amounts of energy and dissipate it slowly—capacitors. And, of course, our mages are learning about the intimate connection between electricity and magnetism.

In the end, our magical society can be said to have the beginnings of electrical technology, although they arrived at it by a different route. As of yet, they haven’t been able to do too much with it, apart from toys, scientific experiments, and a new form of lighting that aims to be better than the old oil lamp in every way. They have, in our terms, early batteries, motors, and light filaments. Once these get out of the mage’s laboratory, they will have the same effect as their Earthly equivalents had on us.

The development of magic-powered propulsion, however, is much more of a culture shock. With the storage of mechanical energy, most repetitive labor can be automated. Looms, mills, mints, forges, nearly every aspect of medieval-style living benefits from this. The need for workers (or slaves, for that matter) has decreased severely in our fictional society’s recent times. People still need to be able to feed their families, but the unskilled masses are finding new jobs.

And they won’t remain unskilled for too long. The machines have already taken over the roles once relegated to child labor, but the children have to go somewhere. Why not school? Trade schools, whether operated by guilds or skilled craftsmen, are beginning to appear in the cities, a supply coming into existence to meet the demand. And many of these schools must teach the basics of education, as well.

Power to the people

Just by giving the populace a way to move things, we can transform a people. Muscle power is very limited, and it’s tiring, even with the endurance spells we’ve already said this society has. Waterwheels need specific conditions to be productive. Not everywhere is lucky enough to have the sustained winds to make that form of power practical. But magical power levels the playing field.

Historically, the increase of power with technology has had the immediate effect of giving the affected segment of the population more time to spend not working. They naturally find ways to fill those gaps. Art, hobbies, education—the same things we do in our free time. Some of those spare-time activities end up becoming full-time jobs of their own, and so the cycle continues.

But it’s a positive feedback cycle. Each time the power available to a society increases, that’s that much less work that has to be done by its people. As we know, the less time you spend doing what you have to do, the more time you get to do the things you want to do. Greater power, then, leads to a higher standard of living, even if it’s hard to see the tangible benefits.

Sound changes: consonants

Languages change all the time. Words, of course, are the most obvious illustration of this, especially when we look at slang and such. Grammar, by contrast, tends to be a bit more static, but not wholly so; English used to have noun case, but it no longer does.

The sounds of a language fall into a middle ground. New words are invented all the time, while old ones fall out of fashion, but the phonemes that make up those words take a longer time to change. This does, however, occur more often than wholesale grammatical alterations. (In fact, sound change can lead to changes in grammar, but it’s hard to see how the opposite can happen.)

This brief miniseries will detail some of the main ways sounds can change in a language. The idea is to give you, the conlanger, a new tool for making naturalistic languages. I won’t be covering everything here—I don’t have time for that, nor do you. Examples will be necessarily brief. The Index Diachronica is a massive catalog of sound changes that have occurred in real-world languages, and it’s a good resource for conlangers looking for this sort of thing.


We’ll start by looking at some of the main sound changes that can happen to consonants. Yes, some effects are equally valid for consonants and vowels, but I had to divide this up somehow.

Lenition

Lenition is one of the most common sound changes. Basically, it’s a kind of “weakening” of a consonant into another. Stops can weaken into affricates or fricatives, for instance; German did this after English and its relatives broke away, hence “white” versus weiß. Another word is “father”, which shows two examples of this—compare it to Latin pater, which isn’t too far off from the ancestral form. (Interestingly, you can even say that “lenition” itself is a victim.)

Fricatives can weaken further into approximants (or even flaps or taps): one such change, of /s/ to /h/, happened early on in Greek, hence “heptagon”, using the Greek-derived root “hepta-“. Latin didn’t take this particular route, giving us “September” from Latin septem “seven”.

Approximants don’t really have anywhere to go. They’re already weak enough as it is. The only place for them to go is away, and that sometimes happens, a process called elision. Other sounds can be elided, but the approximants are the most prone to it. In English, for instance, we’ve lost /h/ (and older /x/) in a lot of places. (“im” for “him” is just the same process continuing in the present day.)

Lenition and elision tend to happen in two main places: between vowels and at the end of a word. Those aren’t the only places, however.
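Intervocalic weakening can be sketched as a simple substitution table. The stop-to-fricative mapping below is a typical one, not a claim about any single language, and the function name is mine:

```python
LENITE = {"p": "f", "t": "θ", "k": "x", "b": "v", "d": "ð", "g": "ɣ"}
VOWELS = set("aeiou")

def lenite(word):
    # weaken a stop to the matching fricative between two vowels
    out = list(word)
    for i in range(1, len(word) - 1):
        if word[i] in LENITE and word[i - 1] in VOWELS and word[i + 1] in VOWELS:
            out[i] = LENITE[word[i]]
    return "".join(out)

print(lenite("pater"))  # -> "paθer"
print(lenite("tata"))   # -> "taθa" (the word-initial stop is untouched)
```

Note how the initial consonant survives: it isn’t in the weakening environment, which is exactly why “father” kept its first sound while losing the middle one.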

Assimilation

Assimilation is when a sound becomes more like another. This can happen with any pair of phonemes, but consonants are more susceptible, if only because they’re more likely to be adjacent.

Most assimilation involves voicing or the point of articulation. In other words, an unvoiced sound next to a voiced one is an unstable situation, as is a cluster like /kf/. Humans are lazy, it seems, and they want to talk with the least effort possible. Thus, disparate sequences of sounds like /bs/ or /mg/ tend to become more homogenized. (Good examples in English are all those Latin borrowings where ad- shows up as “al-” or “as-“, like “assimilation”.)

Obviously, there are a few ways this can play out. Either sound can be the one to change—/bs/ can end up as /ps/ or /bz/—but it tends to be the leading phoneme that gets altered. How it changes is another factor, and this depends on the language. If the two sounds are different in voicing, then that’ll likely shift first. If they’re at different parts of the vocal tract, then the one that changes will slide towards the other. Thus, /bs/ will probably come out as /ps/, while /mg/ ends up as /ŋg/.

Assimilation is also one way to get rid of consonant clusters. Some of the consonants will assimilate, and then they’ll disappear. Or maybe they won’t, and they’ll create geminates instead, as in Italian.

Metathesis

Anyone who’s ever heard the word “ask” pronounced as “ax” can identify metathesis, the rearranging of sounds. This can happen just about anywhere, but it often seems to occur with sound sequences that are relatively uncommon in a language, like the /sk/ cluster in English.

This one isn’t quite as systematic in English, but other languages do have regular metathesis sound changes. Spanish often swapped /l/ and /r/, for example, sometimes in different syllables. One common thread that crosses linguistic barriers involves the sonority hierarchy. A cluster like /dn/ is more likely to turn into /nd/ than the other way around.
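The sonority-driven case is easy to model with a toy sonority scale. The numeric ranks below are arbitrary, chosen only so that stops < fricatives < nasals < liquids:

```python
SONORITY = {"t": 1, "d": 1, "k": 1, "b": 1, "s": 2,
            "n": 3, "m": 3, "l": 4, "r": 5}

def reorder_coda(cluster):
    # swap a word-final cluster whose sonority rises toward the edge
    a, b = cluster
    if SONORITY.get(a, 0) < SONORITY.get(b, 0):
        return b + a
    return cluster

print(reorder_coda("dn"))  # -> "nd"
print(reorder_coda("nd"))  # -> "nd" (already falls correctly, so it stays)
```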

Palatalization, etc.

Any of the “secondary” characteristics of a consonant can be changed. Consonants can be palatalized, labialized, velarized, glottalized, and so on. This usually happens because they’re next to a sound that displays one of those properties. It’s like assimilation, in a way.

Palatalization appears to be the most common of these, often affecting consonants adjacent to a front vowel. (/i/ is the likely culprit, but /e/ and /y/ work, too.) Labialization sometimes happens around back rounded vowels like /u/. Glottal stops, naturally, tend to cause glottalization, etc. Often, the affecting sound will disappear after it does its work.
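Here’s a sketch of palatalization before front vowels, roughly the shape of the change that took Latin centum with /k/ to Italian cento with /tʃ/. The mapping is illustrative, not exhaustive:

```python
PALATALIZE = {"k": "tʃ", "g": "dʒ", "s": "ʃ"}
FRONT = set("iey")

def palatalize(word):
    # a consonant shifts to its palatal(ized) counterpart before a front vowel
    out = []
    for i, ch in enumerate(word):
        if ch in PALATALIZE and i + 1 < len(word) and word[i + 1] in FRONT:
            out.append(PALATALIZE[ch])
        else:
            out.append(ch)
    return "".join(out)

print(palatalize("kentum"))  # -> "tʃentum"
print(palatalize("kantum"))  # -> "kantum" (no front vowel, no change)
```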

Dissimilation

Dissimilation is the opposite of assimilation: it makes sounds more different. This can occur in response to a kind of phonological confusion, but it doesn’t seem to be very common as a regular process. Words like “colonel” (pronounced as “kernel”) show dissimilation in English, and examples can be found in many other languages.

Even more…

There are a lot of possible sound changes we haven’t covered, and that’s just in the consonants! Most of the other ways consonants can evolve are much rarer, however. Fortition, for example, is the opposite of lenition, but instances of it are vastly outnumbered by those of lenition.

Vowels present yet more opportunities to change up the sound of a language, and we’ll see them next week. Then, we’ll wrap up the series by looking at all the other ways the sound of a word can change over time.

Software internals: Strings

A string, as just about any programmer knows, is a bit of text, a sequence of characters. Most languages have some built-in notion of strings, usually as a fundamental data type on par with integers. A few older programming languages, including C, don’t have a separate “string” type, but they still have strings. Even many assemblers allow you to define strings in your assembly language code, though you’re left to deal with them yourself.

The early string

At its heart, a string really isn’t much more than a bunch of characters. It’s a sequence, like an array. Indeed, that’s one way of “making” strings: stuff some characters into an array that’s big enough to hold them. Very old code often did exactly that, especially with strings whose contents were known ahead of time. And there are plenty of places in modern C code where text is read into a buffer—nothing more than an array—before it is turned into a string. (This is a notorious source of buffer overflows, but that’s not the point.)

Once you actually need to start working with strings, you’ll want something better. Historically, there were two main schools of thought on a “better” way of representing strings. Pascal went with a “length-prefixed” data structure, where an integer representing the number of characters in the string was followed by the contents. For example, "Hi!" as a Pascal string might be listed in memory as the hexadecimal 03 48 69 21. Of course, this necessarily limits the length of a string to 255, the highest possible value of a byte. We could make the length field 16 bits (03 00 48 69 21 on a little-endian x86 system), bringing that to 65535, but at the cost of making every string a byte longer. Today, in the era of terabyte disks and gigs of memory, that’s a fair trade; not so in older times.

But Pascal was intended more for education and computer science than for run-of-the-mill software development. On the other side of the fence, C took a different approach: the null-terminated string. C’s strings aren’t their own type, but rather an array of characters ending with a null (00) byte. Thus, our example in C becomes 48 69 21 00.

Which style of string is better is still debated today, although modern languages typically don’t use a pure form of either of them. Pascal strings have the advantage of easily finding the length (it’s right there!), while C’s strlen has to count characters. C strings also can’t have embedded null bytes, because all the standard functions will assume that the null is only at the end. On the other hand, a few algorithms are easier with null-terminated strings, they can be as long as you like, and they’re faster if you don’t need the length.

In modern times

In today’s languages, the exact format of a string doesn’t matter. What you see as the programmer is the interface. Most of the time, that interface is similar to an array’s, except with a few added functions for comparison and the like. In something like C#, you can’t really make your own string type, nor would you want to. But it’s helpful to know just how these things are implemented, so you’ll know their strengths and weaknesses.

Since everything ultimately has to communicate with something written in C, there’s probably a conversion to a C-style string somewhere in the bowels of any language. That doesn’t mean it’s what the language works with, though. A Pascal-like data structure is perfectly usable internally, and it’s possible to use a “hybrid” approach.

Small strings are a little special, too. As computers have gotten more powerful, and their buses and registers have grown wider, there’s now the possibility that strings of a few characters can be loaded in a single memory access. Some string libraries use this to their advantage, keeping a “small” string in an internal buffer. Once the string becomes bigger than a pointer (8 bytes on a 64-bit system), putting it in dynamic memory is a better deal, space-wise. (Cache concerns can push the threshold of this “small string optimization” up a bit.)

There are also a few algorithms and optimizations that string libraries can use internally to speed things up. “Copy-on-write” means just that: a new copy of a string isn’t created until there’s a change. Otherwise, two variables can point to the same memory location. The string’s contents are the same, so why bother taking up space with exact copies? This also works for “static” strings whose text is fixed; Java, for one, is very aggressive in eliminating duplicates.

Enter Unicode

Nowadays, there’s a big problem treating strings as nothing more than an array of characters. That problem is Unicode. Of course, Unicode is a necessary evil, and it’s a whole lot better than the mess of mutually incompatible solutions for international text that we used to have. (“Used to”? Ha!) But Unicode makes string handling exponentially harder, particularly for C-style strings, because it breaks a fundamental assumption: one byte equals one character.

Since the world’s scripts together have far more than 256 characters (the most a byte can distinguish), we have to do something. So we have two options. One is a fixed-size encoding, where each character—or code point—takes the same amount of space. Basically, it’s ASCII extended to more bits per character. UTF-32 does this, at the huge expense of making every code point 4 bytes. Under this scheme, any plain ASCII string is inflated to four times its original size.

The alternative is variable-length encoding, as in UTF-8. Here, part of the “space” in the storage unit (byte for UTF-8, 2 bytes for UTF-16) is reserved to mark a “continuation”. For example, the character ë has the Unicode code point U+00EB. In UTF-8, that becomes C3 AB. The simple fact of the first byte being greater than 7F (decimal 127) marks this as a non-ASCII character, and that byte’s upper bits indicate how many continuation bytes follow. In UTF-32, by contrast, ë comes out as 000000EB, twice as big.

The rules for handling Unicode strings are complex and unintuitive. Once you add in combining diacritics, the variety of spaces, and all the other esoterica, Unicode becomes far harder than you can imagine. And users of high-level, strings-are-black-boxes languages aren’t immune. JavaScript, for instance, uses UCS-2, a 16-bit fixed-width encoding. Until very recently, if you wanted to work with “high plane” characters—including emoji—you had some tough times ahead. So there’s still the possibility, in 2016, that you might need to know the internals of how strings work.