The future of auxlangs

Auxlangs are auxiliary languages: conlangs specifically created to be a medium of communication, rather than for artistic purposes. In other words, auxlangs are made to be used. And two auxlangs have become relatively popular in the world. Esperanto is actually spoken by a couple million people, and it has, at times, been considered a possibility for adoption by large groups of people. Lojban, though constructed on different principles, is likewise an example of an auxlang being used to communicate.

The promise of auxlangs, of course, is the end of mistranslation. Different languages have different meanings, different grammars, different ways of looking at the world. That results in some pretty awful failures to communicate; a quick Internet search should net you hundreds of “translation fails”. But if we had a language designed to be a go-between for speakers of, say, English and Spanish, then things wouldn’t be so bad, right?

That’s the idea, anyway. Esperanto, despite its numerous flaws, does accomplish this to a degree. Lojban is…less useful for speaking, but it has a few benefits that we’ll call “philosophical”. And plenty of conlangers think they will make the one true international auxiliary language.

So let’s fast-forward a few centuries. Esperanto was invented on the very edge of living memory, as we know, and Lojban is even younger than that, but Rome wasn’t built in a day. Once auxlangs have a bit of history behind them, will any of them achieve that Holy Grail?

The obvious contender

They’d have to get past English, first. Right now, the one thing holding back auxlang adoption is English. Sure, less than a quarter of the world’s population speaks it, but it’s the language for global communication right now. Nothing in the near future looks likely to take its place, but let’s look at the next best options.

Chinese, particularly Mandarin, may have a slight edge in sheer numbers, but it’s, well, Chinese. It’s spoken by Chinese, written by Chinese, and it’s almost completely confined to China. Sure, Japan, Korea, and much of Southeast Asia took parts of its writing system and borrowed tons of words, but that was a thousand years ago. Today, Chinese is for China. No matter how many manufacturing jobs move there (and they’re starting to leave), it won’t be the world language. That’s not to say we won’t pick a few items from it, though.

On the surface, Arabic looks like another candidate. It’s got a few hundred million speakers right now, and they’re growing. It has a serious written history, the support of multiple nations…it’s almost the perfect setup. But that’s Classical Arabic, the kind used in the Koran. Real-life “street” Arabic is a horrible mess of dialects, some mutually unintelligible. But let’s take the classical tongue. Can it gain some purchase as an auxlang?

Probably not. Again, Arabic is associated with a particular cultural “style”. It isn’t used only by Muslims, or even only by Arabs, mind you, but that’s the common perception. There’s a growing backlash against Muslims in certain parts of the world, and some groups are taking advantage of this to further fan the flames. (I write this a few hours after the Brussels bombings on March 22.) But Arabic’s problems aren’t entirely political. It’s an awful language to try to speak, at least by European standards. Chinese has tones, yes, but you can almost learn those; pharyngeal and emphatic consonants are even worse for us. Now imagine the trouble someone from Japan would have.

Okay, so the next two biggest language blocs are out. What’s left? Spanish is a common language for most of two continents, although it has its own dialect troubles. Hindi is phonologically complex, and it’s not even a majority language in its own country. Latin is dead, as much as academics hate to acknowledge that fact. Almost nothing else has the clout of English, Chinese, and Arabic. It would take a serious upheaval to make any of them the world’s lingua franca.

Outliving usefulness

It’s entirely possible that we’ll never need an international auxiliary language at all, because automatic translation may become good enough for real-time daily use. Some groups are making great headway on this right now, and it’s only expected to get better.

If that’s the case, auxlangs are then obsolete. There’s no other way of putting it. If computers can translate between any two languages at will, then why do you need yet another one to communicate with people? It seems likely that computing will only become more ubiquitous. Wearables look silly to me, but I’ll admit that I’m not the best judge of such things. Perhaps they’ll go mainstream within the next decade.

Whatever computer you have on your person, whether in the form of a smartphone or headgear, likely won’t be powerful enough to do the instantaneous translation needed for conversation, but it’ll be connected to the Internet (sorry, the cloud), with all the access that entails. Speech and text could both be handled by such a system, probably using cameras for the latter.

For auxlang designers, that’s very much a dystopian future. Auxiliary languages effectively become a subset of artlangs. But never fear. Not everyone will have a connection. Not everyone will have the equipment. It’ll take time for the algorithms to learn how to translate the thousands of spoken languages in the world, even if half of them are supposed to go extinct in the coming decades.

The middle road

Auxlangs, then, have a tough road ahead. They have to displace English as the world language, then hold off the other natural contenders. They need real-time translation to be a much more intractable problem than Google and Microsoft are making it out to be. But there’s a sliver of a chance.

Not all auxlangs are appropriate as an international standard of communication. Lojban is nice in a logical, even mathematical way, but it’s too complicated for most people. A truly worldwide auxlang won’t look like that. So what would it look like?

It’ll be simple, that’s for sure. Think something closer to pidgins and creoles than lambda calculus. Something like Toki Pona might be too far down the scale of simplicity, but it’s a good contrast. The optimum is probably nearer to it than to Lojban. Esperanto and other simplified Latins can work, but you need to strip out a lot of filler. Remember, everyone has to speak this, from Europeans to Inuits to Zulus to Aborigines, and everywhere in between. You can’t please everybody, but you can limit the damage.

Phonology would also tend to err on the side of simplicity. No tones, no guttural sounds half the world would need to learn, no complex consonant clusters (but English gets a pass with that one, strangely enough). The auxiliary language of the future won’t be Hawaiian, but it won’t be Georgian, either. Again, on the lower side of medium seems to be the sweet spot.

The vocabulary for a hypothetical world language will be the biggest problem. There’s no way around it that I can see, short of doing some serious linguistic analysis or using the shortcut of “take the same term in a few big languages and find the average”. Because of this, I can seriously see a world auxlang as being a pidgin English. Think a much simplified grammar, with most of the extraneous bits thrown out. Smooth out the phonology (get rid of “wh”, drop the dental fricatives, regularize the vowels, etc.) and make the whole thing either more isolating or more agglutinative—I’m not sure which works best for this. The end result is a leaner language that is easier to pick up.

Or just wait for the computers to take care of things for us.

Assembly: the architecture

The x86 gets a lot of hate for its architecture. These days, that’s really more out of habit than for any technical reason, but the 16-bit variant really did deserve some of its bad reputation. Other complaints, in my opinion, were overstated. Hyperbole then became accepted wisdom, and each retelling only made it grow.

But I’m not here to apologize for the x86. No, this post has a different purpose. It’s an overview of the x86 architecture from the point of view of an assembly programmer circa 1991. Although the 386 had been out for a few years, older processors were still around, and not too much was making full use of the 32-bit extensions that the 386 brought. DOS, particularly, remained 16-bit, though there were extensions to address more memory. For the most part, however, we’ll stick to “real mode”, like it was in the olden days.

The CPU

An x86 processor of this vintage was a lot less complex than today’s Intel i7 or AMD A10, and not just because it didn’t include integrated graphics, megabytes of on-die cache, power management functions, and so on. Later lines have also added lots of assembly-level goodness, like MMX and SSE. They’re 64-bit now, and that required the addition of “long mode”.

But let’s ignore all that and look at the “core” of the x86. There’s not that much to it, really. In the original conception, you have about a dozen programmer-accessible registers, all of which started out 16 bits wide, but not one of them is truly general-purpose. The registers can be divided into a few categories, and we’ll take each of them in turn.

General registers

These are the “big four” of the x86: AX, BX, CX, and DX. As I said last time, all four can also be accessed as a pair of byte-sized registers. The high bytes are identified by the first letter of the register name followed by H, while the low bytes use L. So we have AL or BH or whatever. It doesn’t actually increase the number of registers we have, but sometimes the savings from loading only a single byte can add up. Remember, older computers had less memory, so they had to use it more wisely.

Each of the four 16-bit general registers is used in its own special way. AX is the accumulator (like the A register from the 6502), and it’s usually the best for arithmetic; some instructions, like the multiplication instruction MUL, require it. BX is used as a “base” for a few addressing-type instructions. CX is semi-reserved as a loop counter. And DX is sometimes taken as an “extension” of AX, creating a kind of 32-bit register referred to as DX:AX.
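
To make that pairing concrete, here’s the arithmetic in C++ (an illustration of the math only, of course, not code that runs on the processor we’re discussing). MUL, for instance, leaves the low word of a 32-bit product in AX and the high word in DX:

#include <cstdint>

// Reassemble the 32-bit value that the DX:AX pair represents.
uint32_t dx_ax(uint16_t dx, uint16_t ax)
{
    return (static_cast<uint32_t>(dx) << 16) | ax;
}

A product of 100,000 (0x186a0), too big for any single 16-bit register, comes out as DX = 0x0001 and AX = 0x86a0.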

Of course, if you’re not using the instructions that work on specific registers, you can do what you like with these. Unlike the 6502, where almost everything involved a memory access, x86 does let you work register-to-register. (On the other hand, it doesn’t have cheap access to the zero page, so there.) You can add AX to BX, for instance, and no one will care.

Pointer registers

The other four main registers all have something to do with pointers and addressing. You can use them as scratch space for arithmetic, but a lot of instructions assume they hold addresses. Unlike the general registers, all these are only 16-bit. (Modern systems do give you special access to the low byte of them, however.)

SP is the big one out of this group: the stack pointer. Stacks are a lot more important on the x86 than the 6502, mainly because that’s where you put your “extra” data that won’t fit in registers. But programmers usually don’t manipulate SP directly. They instead pop and push (note the terminology change from 6502), and those instructions change SP as needed. BP is an extra pointer register mostly used by languages like C to access stack “frames”, but assembly programmers can turn it into a general pointer.

The other two come in a pair: SI and DI. These stand for “source index” and “destination index”, respectively, and the processor uses them for certain load and store operations. Quite a few of the DOS and BIOS APIs expect them to hold pointers to input and output parameters. And on an early x86, they were the best option for indirect addressing, a bit like the 6502’s X and Y registers.

System registers

The instruction pointer, IP, controls the execution of code. It’s not directly accessible by programmers; instead, you change it through branching (jumping, in x86 parlance) and subroutine calls. In other words, you can mostly act like it’s not there.

The register that holds the flags, usually called FLAGS when it needs a name, also can’t be directly read or written. You can push it onto the stack, however, then manipulate it from there, but the main three flags (carry, direction, and interrupt) have special instructions to set and clear them, similar to the 6502.

While the x86 has quite a few more flags than the 6502, most of them aren’t too important unless you’re delving deep into an operating system’s internals. The main ones to know about are the carry, zero, sign, direction, overflow, and interrupt flags. Most of them should be self-explanatory, while “overflow” works in a similar fashion to its 6502 counterpart. The direction flag is mostly used for string-handling instructions, which we’ll see in a later post.

One more register deserves a brief mention here. On the 286, it’s called the MSW, or “machine status word”. After that, it gets the official designation CR0. It’s used to control internal aspects of the processor, such as switching between real and protected modes or emulating a floating-point unit. I can’t think of a case where this series would use it, but now you know it’s there.

Segment registers

And then we come to the bane of many an assembly programmer, at least those of a generation or two ago: the segment registers. We’ll look at the x86 memory model in a moment; for now, just think of segments as something like overlapping banks of memory.

We’ve got four segment registers, all 16 bits wide even in 32-bit mode, for the code, data, stack, and extra segments. Their mnemonic names, conveniently enough, are initialisms: CS, DS, SS, and ES. CS points to the segment where execution is occurring; you can’t change it except with a “far” call, but you can read from it. SS holds the segment address of the stack, but you probably figured that one out already. DS is the default for reading and writing memory, while ES, as its name suggests, is for whatever you like.

Segment registers are weird. You can move values to and from them (except into CS, as I said), but you can’t operate on them. What you can do, however, is use them to “override” an address. For example, loading a value from memory uses DS as its base, but you can make it use ES instead: mov ax, [es:di] loads the value pointed to by DI, but in the ES segment.

Memory model

And that leads us to the x86 memory model. It’s a bit convoluted, since the original 8086 was designed as a 16-bit system that could address 1 MB of memory. Something had to give, but Intel took a…nonstandard approach.

Every address on the x86 has two parts: segment and offset. (This is true even on today’s processors, but 64-bit mode is hardwired to treat all segments as starting at address 0.) In real mode, as with an older x86 running DOS, an actual memory address can be obtained by shifting the segment 4 bits to the left and adding the offset. Or, to put it in code: address = (segment << 4) + offset. Each segment, then, can address a 64K block of memory in a fashion much like banking in the 8-bit world.

The difference between one segment and the next is only 16 bytes, thanks to the 4-bit shift. That means that segments will overlap. The addresses b000:8123 and b800:0123, for example, refer to the same memory location: 0xb8123. In practice, this doesn’t matter too much; the segment portion is mostly used as a base address, while the offset is, well, an offset.

In protected mode, throw those last paragraphs out. Segments instead are indexes into a lookup table, creating a virtual memory system that essentially went unused until its mercy-killing by AMD when they brought out the x86-64. (The segment-based virtual memory scheme of protected mode, interesting though it seemed, was basically an exercise in the knapsack problem.) We won’t be worrying about protected mode much, though, so let’s move on.

Back to real mode, the world of DOS and early x86s. A little bit of arithmetic shows that the segment:offset addressing method allows access to 1 MB of memory, more or less. 0000:0000 is, of course, address 0, and it’s the lowest possible value. The highest possible is ffff:ffff, and that presents a problem. Add it up, and you get 0x10ffef. On old systems, this simply wrapped around the 1 MB “barrier”, to 0x0ffef. (Once processors gained enough address lines to go beyond 1 MB, the address no longer wrapped, but some programs relied on that behavior, so a hardware hack was put into place. It’s called the A20 gate, and it was originally put in the keyboard controller, of all places. But I digress.)
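
If you want to check the arithmetic yourself, here it is as a C++ snippet (purely illustrative; the final mask mimics the 20-bit wraparound of the original chips):

#include <cstdint>
#include <cstdio>

// address = (segment << 4) + offset, truncated to 20 bits like an 8086
uint32_t real_address(uint16_t segment, uint16_t offset)
{
    return ((static_cast<uint32_t>(segment) << 4) + offset) & 0xfffff;
}

int main()
{
    std::printf("%05x\n", real_address(0xb000, 0x8123)); // b8123
    std::printf("%05x\n", real_address(0xb800, 0x0123)); // b8123 again
    std::printf("%05x\n", real_address(0xffff, 0xffff)); // 0ffef: wrapped around
}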

Input/output

Also available to the x86 are the I/O ports. These are accessed using the IN and OUT instructions, a byte, word, or (in 32-bit mode) double-word at a time. They function like their own little address space, separate from main memory. The x86 architecture itself doesn’t really define which ports do what. That’s left to the PC platform—which will be the subject of the next post.

Modern operating systems also allow memory-mapped I/O access, but we’ll leave that alone for the time being. It’s far more useful when you go beyond the bounds of the 16-bit world.

Interrupts

Like the 6502, the x86 has support for interrupting the processor’s normal flow, but it goes about it in a different way. An interrupt can be caused by hardware; the old term IRQ, “interrupt request”, referred to this. But software can also directly invoke interrupts. As we saw last week, that’s how DOS implemented its API.

In real mode, the result is the same either way. The processor jumps to a specific memory location and executes a bit of code, then returns to what it was doing like nothing happened. We’ll see the details later on, but that’s the gist of it.

To be continued

So I’ve rambled on long enough for this post. Next up will be a look at the PC platform itself, at least as it stood a quarter century ago. That’ll be a look into deep history for many, but the choices made then still affect us today. Following that will be the dive into the deep end of “old-school” x86 assembly.

Magic and tech: defenses

Last time, we looked at how magic can augment a civilization’s offenses. Now, let’s turn to the other side of the coin and see what we can do about protecting ourselves against such force. It’s time to look at defense.

In the typical fantasy setting, sans magic, the common personal defense is, of course, armor. Sword-and-sorcery fiction often throws in some sort of spell-based defense, anything from walls of force to circles of protection to arrow-deflecting fields. And it’s a fairly common thing to give most potential offensive magic some sort of counterbalance. (The spell that can’t be blocked or resisted usually has a very good reason, and it’ll probably be a superweapon.) First, though, let’s look at what the mundane world has to offer.

Real-world protection

For personal protection, armor of various sorts has been around for millennia. Just about anything can be used as an armor material, as long as it does the job of preventing puncture or dissipating kinetic energy. Cloth, leather, many kinds of metal, wood, paper…you name it, somebody’s probably made armor from it. Exactly which material is used will depend on a civilization’s technological status, their geography (no metal deposits means no metallic armor), their cultural outlook on warfare, the local climate, and many other factors. In general, though, pretty much everybody will use some armor, stories of naked Viking berserkers notwithstanding.

In the time period we’re focusing on in this series, the later Middle Ages, the best armor tended to be made of metal. But metal was relatively expensive, so not every single levied soldier is going to be running around in full plate. The best armor would be had by those with the means to procure it: nobles, knights, and the like. A well-equipped army will have better protection, naturally, while hurried musters of villagers will net you a company of men in whatever they could find, just like with weapons.

Remember that armor is designed as protection first, and most of its qualities will follow. The main type of injury it was protecting against was puncture—cutting and stabbing. Blunt trauma was a very distant runner-up. We’ll take a look at medicine in a future post, but it’s helpful to think about how deadly even the smallest open wounds were back then. Without antibiotics or a working knowledge of sanitation and antiseptics, infection and sepsis were far more commonplace and far more dangerous. The best medicine was not to be wounded in the first place, and most armors show this.

Armor evolves alongside weapons. That’s why, once gunpowder spread to every battlefield in Europe, the heavier types of armor began to fall out of fashion. When fifty or more pounds of plate could no longer render you impervious to everything, why bother wearing it in the first place? (In modern times, materials science has advanced enough to create new plate that can take a shot, and now we see heavier armor coming back into vogue.)

Shields, in a sense, are nothing more than handheld armor. Some of them, depending on the culture, might have specialized defenses for a particularly common kind of attack. Others instead compensate with sheer bulk of a weaker material, like your typical round shield made of hardwood. Again, guns tended to make most shields obsolete, at least until science could catch up. Today’s riot shields would make a 14th-century soldier salivate, but they’re based on the same old principles.

Larger-scale defenses work a different way. The usual suspects for city protection are walls, ramparts, moats, killing fields, and the like. Each one has its own purpose, its own specific target. Some of them fell by the wayside, victims of progress—how many modern cities have walls?—and some were remade to keep up. Most of them represent a significant allocation of materials and labor; bigger cities can afford that, but smaller towns might not be able to.

Magically reinforced

When the world becomes more dangerous as a result of weaponized magic, it stands to reason that new defenses will be developed to protect against such threats. One of the best ways of preventing injury, as we know, is never being hit at all. A spell to sharpen one’s senses lets a soldier react more quickly to an attack, meaning that there’s a better chance of dodging it. But that’s a waste of magical talent. Armies can comprise hundreds or even thousands of soldiers, and there’s not enough time (or enough mages) to enchant them all on the eve of battle.

Our “easy out” of stores of magical energy won’t help much here, so what can we do? Since personal defenses are, well, personal, and we’ve already said that very few people are mages, it doesn’t seem like we have a lot of options. Enchanted materials are the best bet. Armor can be fortified against breaking, making it harder to penetrate. It’s not perfect, but it’s a good start, and it will take a lot of heat off our soldiers.

It’ll also have a secondary effect, one that will come to the fore in later years. Harder, stronger materials push back the date of gunpowder-induced obsolescence by quite a while. A fortified plate across your chest won’t make you not feel a bullet, but it’ll stop that bullet from piercing your skin and hitting something vital. Like Kevlar jackets today, these would cause the impact energy to spread out, which lowers the pressure on any one spot. That’s enough to save lives, especially if the enchantment isn’t too costly. And it wouldn’t be, because it’s valuable enough to research better ways of doing it.

Fortified shields benefit in the same way, but there we get a side bonus. Shields can become stronger or they can become lighter. The second option might be a better one, if mobility is the goal.

Protecting against magical attacks is far tougher. Wards are the best way in our setting, but they have a severe downside: one ward only counters one specific type of attack. We’ve seen that magic gives us a bunch of new weapons. Warding against all of them is inconvenient at best, impossible at worst. This is a case for good espionage (another post idea!) and scouting—if you know what to expect, you’ll be able to defend against it. Still, armor can hold a few different wards, and those who can afford it will likely invest in a bit of extra protection.

On the large scale, we see the same ideas, just bigger. Wards can be made on walls, for example, and a gate can receive a fortifying enchantment. The increased size makes these ludicrously expensive, but can you put a price on the lives of your citizens? Moats, however, become practically useless, and drawbridges are little more than a degenerate case of a gate.

Picking up the pieces

Besieged settlements in our magical setting are far more perilous than anything medieval Europe knew. In pitched battles, too, the advantage will tend to go to the attacker. That isn’t too far off from what happened in our own world, from the Renaissance to the early days of the Industrial Revolution. Once gunpowder reigned supreme, defense took a back seat.

It’s the strategy and tactics that will change the most. Protracted sieges are less of a risk for the offensive side, as you can always bomb the city into oblivion. Staying in one place will only get you killed, so guerrilla warfare becomes much more attractive for an outnumbered foe. It might be better for a defender to give up the city and work from the shadows as an organized resistance movement.

Magic, then, creates an asymmetry in warfare. This little bit of it gives the offense the edge. Defense needs a lot more help. Of course, it’s said that the best defense is a good offense. In our magical world, that won’t be so much a witty aphorism as a standard doctrine.

Let’s make a language – Part 14c: Derivation (Ardari)

Ardari takes a different approach for its word derivation. Instead of compounding, like Isian does, Ardari likes stacking derivational affixes. That doesn’t mean it totally lacks compounds, just that they take a bit of a back seat to affixes. Therefore, we should start with the latter.

Ardari’s three main parts of speech—noun, verb, and adjective—are mostly separate. Sure, you can use adjectives directly as nouns, and we’ve got ky to create infinitives, but there are usually insurmountable boundaries surrounding these three. The most regular and productive derivation affixes, then, are the ones that let us pass through those boundaries.

Making nouns

To make new nouns from other types of words, we’ve got a few choices:

  • -önda creates abstract nouns from verbs (luchönda “feeling”)
  • -kön makes agent nouns, like English “-er” (kwarkön “hunter”)
  • -nyn creates patient nouns from verbs, a bit like a better “-ee” (chudnyn “one who is guarded”)
  • -ymat takes an adjective and makes an abstract noun from it (agrisymat “richness”)

All of these are perfectly regular and widely used in the language. The nouns they create are, by default, neuter. -kön and -nyn, however, can be gendered: kwarköna denotes a male hunter, kwarköni a huntress.

Two other important nominal suffixes are -sö and -ölad. The first switches an abstract or mass noun to a concrete or count noun, while the second does the opposite. Thus, we have ichurisö “a time of peace”, oblasö “a drop of water”, sèdölad “childhood”, or kujdölad “kingship”. (Note that a final vowel disappears when -ölad is added.)

Ardari also has both a diminutive -imi and an augmentative -oza. These work on nouns about like you’d expect: rhasimi “puppy”, oskoza “ocean”. However, there is a bit of a sticking point. Diminutive nouns are always feminine, and augmentatives always masculine, no matter the original noun’s gender. This can cause oddities, especially with kinship terms: emönimi “little brother” is grammatically feminine!

The other main nominal derivation is po- (p- before vowels). This forms antonyms or opposites, like English “un-” or “non-”. Examples include poban “non-human” and polagri “gibberish”.

Most other derived nouns are, in fact, adjectives used as nouns, as we’ll see below.

Making adjectives

First of all, adjectives can be made by one of three class-changing suffixes:

  • -ösat makes an adjective from an abstract noun (idyazösat “warlike”)
  • -rät makes an adjective from a concrete noun (emirät “motherly”)
  • -ròs creates a “possibility” adjective from a verb (dervaròs “livable”)

Diminutives and augmentatives work as for nouns, but they take the forms -it and -ab, and they don’t alter gender, as Ardari adjectives must agree with head nouns in gender. Some examples: pòdit “oldish”, nejab “very wrong”.

We’ve already seen the general adjective negator ur- in the Babel Text. It works very similarly to English un-, except that it can be used anywhere. (The blended form u- from the Babel Text’s ulokyn is a special, nonproductive stem change.)

Most of the other adjective derivations are essentially postpositional phrases with the order reversed. Here are some of the most common:

  • nèch-, after (nèchidyaz “postwar”)
  • jögh-, before (jötulyan “pre-day”)
  • olon-, middle, centrally (olongoz “midnight”)
  • är-, above or over (ärdaböl “overland”, from dabla)
  • khow-, below or under (khowdyev “underground”)

Many of these are quickly turned into abstract nouns. For instance, olongoz is perfectly usable as a noun meaning “midnight”. Like any other adjective-turned-noun, it would be neuter: olongoze äl “at midnight”.

Making verbs

There are only two main class-changing suffixes to make verbs. We can add -ara to create a verb roughly meaning “to make X”, as khèvara “to dry”. The suffix -èlo works on nouns, and its meaning is often more nuanced. For example, pämèlo “to plant”, from pämi “plant”.

Repetition, like English “re-“, is a suffix in Ardari. For verb stems ending in a consonant, it’s -eg: prèlleg- “to relearn”. Vowel-stems instead use -vo, as in bejëvo- “to rethink”.

Ardari also has a number of prefixes that can be added for subtle connotations. The following table shows some of these, along with their English equivalents.

Prefix | Meaning          | English   | Example
ej-    | for, in favor of | pro-      | ejsim “to speak for”
èk-    | against          | anti-     | èksim “to speak against”
jès-   | with             | co-       | jèzgrät “to co-create”
nich-  | wrongly, badly   | mis-      | nichablon “to mishear”
ob-    | after            | post-/re- | opsim “to reply”
sèt-   | before           | pre-      | sètokön “to precut”
wa-    | into             | in-       | wamykhes “to inquire”
zha-   | out of           | ex-       | zhalo “to expire”

Making compounds

Compounds aren’t as common in Ardari as they are in Isian, but they’re still around. Any noun can be combined with any other noun or adjective, with the head component coming last, as in the rest of the language.

Adjective-noun combinations are the most regular, like chelban “youth, young person”. Noun-agent is another productive combination: byzrivirdökön “bookseller”. Noun-noun compounds tend to be idiosyncratic: lagribyzri “dictionary”, from lagri “word” and byzri “book”.

Reduplicated adjectives are sometimes used for colloquial superlatives: khajkhaj “topmost”, slisli “most beautiful”.

A few words derived from nouns or verbs sit somewhere between compounds and derivational morphemes. An example is -allonda, from allèlönda “naming”. This one works a bit like English “-onomy”: palallonda “astronomy”. Another is -prèllönda, more like “-ology”: ondaprèllönda “audiology”. Finally, -benda and -bekön, from bejë-, work like “-ism” and “-ist”: potsorbekön “atheist” (po- + tsor + -bekön).

Make some words

As before, these aren’t all of the available derivations for Ardari. They’re enough to get started, though, and they’re enough to accomplish our stated goal: creating lots of words!

Assembly: the precursor

Before Windows was a thing, there was DOS. Before today’s 64-bit processors, or even their 32-bit predecessors, there was the 16-bit world. In a way, 16-bit DOS is a bridge between the olden days of 8-bit microprocessors like the 6502 and the modern behemoths of computing, like the A10 in the PC I’m writing this on, or the Core i5 in the next room.

A while back, I started writing a short series of posts about assembly language. I chose the 6502, and one of the reasons why was the availability of an online assembler and emulator. Very recently, a new challenger has come into this field, Virtual x86. Even better, it comes with a ready-to-go FreeDOS image with the NASM assembler pre-installed. Perfect.

So, I’m thinking about reviving the assembly language series, moving to the (slightly) more modern architecture of 16-bit x86 and DOS. It’s a little more relevant than the 6502, as well as much more forgiving, but the fundamentals of assembly are still there: speed, size, power. And, since 16-bit code doesn’t run at all on 64-bit x86 CPUs, we don’t have to worry as much about bad habits carrying over. The use of DOS (FreeDOS, specifically) helps, too, since essentially nothing uses it these days. Thus, we can focus on the code in the abstract, rather than getting bogged down in platform-specific details, as we’d have to do if we looked at “modern” x86.

A free sample

Assembly in this old-school fashion is fairly simple, though not quite as easy as the venerable 6502. Later on, we can delve into the intricacies of addressing modes and segment overrides and whatnot. For now, we’ll look at the 16-bit, real-mode, DOS version of everyone’s first program.

org 100h                  ; COM programs load at offset 100h in their segment

    mov dx, data          ; DX holds the address of our string
    mov ah, 09h           ; DOS function 09h: print a $-terminated string
    int 21h               ; call DOS
    mov ah, 04ch          ; DOS function 4Ch: exit program (return code in AL)
    int 21h               ; call DOS again

data:
    db 'Hello, World!$'   ; DOS strings end with a dollar sign

All this little snippet does is print the classic string to the screen and then exit, but it gives you a good idea of the structure of x86 assembly using what passes for the DOS API. (Oh, by the way, I copied the code from Virtual x86’s FreeDOS image, but I don’t see any other way you could write it.)

Here are the highlights:

  • org 100h defines the program origin. For the simplest DOS programs (COM files), this is always 100h, or 0x100. COM files are limited to a single 64K segment, and the first 256 bytes are reserved for the operating system, to hold command-line arguments and things like that.

  • The 16-bit variety of x86 has an even dozen programmer-accessible registers, but only four of these are anywhere close to general-purpose. These are AX, BX, CX, and DX, and they’re all 16 bits wide. However, you can also use them as byte-sized registers. AH is the high byte of AX, AL the low byte, and so on with BH, BL, CH, CL, DH, and DL. Sometimes, that’s easier than dealing with a full word at a time.

  • mov is the general load/store instruction for x86. It’s very versatile; in fact, it’s Turing-complete. Oh, and some would say it’s backwards: the first argument is the destination, the second the source. That’s just the way x86 does things (unless you’re one of those weirdos using the GNU assembler). You get used to it.

  • int, short for “interrupt”, is a programmatic way of invoking processor interrupts. The x86 architecture allows up to 256 of these, though the first 16 are reserved for the CPU itself, and the next several are taken up by the BIOS. DOS uses a few of its own for its API. Interrupt 0x21 (21h) is the main one.

  • Since there are only 256 possible interrupts and far more useful operations, the OS needs some way of subdividing them. For DOS, that’s what AH is for. A “function code” is stored in that register to specify which API function you’re calling. The other registers hold arguments or pointers to them.

  • Function 0x09 (09h) of interrupt 0x21 writes a string to the console. The string’s address is stored in DX (with some segment trickery we’ll discuss in a later post), and it must end with a dollar sign ($). Why? I don’t know.

  • Function 0x4c (04ch) exits the program. AL can hold a return code. Like on modern operating systems, 0 is “success”, while anything else indicates failure.

  • db isn’t an actual assembly instruction. Much like in 6502-land, it defines a sequence of literal bytes. In this case, that’s a string; the assembler knows how to convert this to an ASCII sequence. (“Where’s Unicode?” you might be wondering. Remember that DOS is halfway to retirement age. Unicode didn’t even exist for most of the DOS era.)

Going from here

If you like, you can run the FreeDOS image on Virtual x86. It comes with both source and executable for the above snippet, and the original source file even includes compilation directions. And, of course, you can play around with everything else the site offers. Meanwhile, I’ll be working on the next step in this journey.

Creating a sport

Humans have probably played games for about as long as they’ve been human. Some of these are mental (chess, etc.), while others are mostly physical in nature. These physical games, when they become somewhat organized and competitive (two other universals in humanity), can be called sports.

This post, then, looks at what it takes to create the rudiments of a fictional sport. I’ll admit, very few stories will need such fine detail. The specifics of a sport likely won’t feature in any work of fiction, though there are examples of sports being a focus. The video game Final Fantasy X has its Blitzball, for example; it’s both a mini-game and a major part of the culture of Spira, the game’s fictional world. Similarly, the Harry Potter book/movie series has its game of Quidditch, which forms a backdrop for certain events of its story. (And that fictitious sport later received its own video game, Harry Potter: Quidditch World Cup.)

Again, let’s spell out what the post considers a sport. It has to be mainly physical, first of all. Go and chess are both classic games with long histories and intricate strategies, but they are tests of the mind, not the body, so they don’t meet our definition.

Second, sports are competitive. They pit one person or group against one or more others of relatively equal strength. The opposing forces don’t have to be present at the same time—baseball is effectively 9 against 1—but each side must have an equal opportunity to claim victory.

Third, sports have goals. This can be a literal goal, as in soccer or basketball, or a figurative one, like the highest or lowest score. Goals also imply an ending condition, such as time, score, or distance. Otherwise, you don’t have a sport.

Finally, the key factor in turning a game that meets all of the above criteria into a sport is some form of organization. This can be nothing more than a common set of rules, or it can be organized leagues with sponsorship and broadcast rights and billion-dollar contracts. Pickup games of street basketball and gym-class dodgeball fail this test, but they are simplified versions of “true” sports, so they get a pass.

Historical sports

In modern times, we’re familiar with quite a few sports. America has its triumvirate of football, baseball, and basketball, all very popular. Hockey, soccer, and rugby are three other big ones around the world, and the Olympics this summer will showcase dozens more. And that’s not counting track and field events, racing (whether on foot or using a vehicle), golf, cricket, and all those others we tend to overlook.

Each of these “modern” sports has a history, but all those histories, whether long (soccer dates back centuries) or short (BMX racing, now an Olympic sport, started in the 1970s), boil down to the same thing. Someone, somewhere, started playing a game. More people then began playing. With more players, rules evolved. As the game grew in popularity, it became more fixed in its form, and thus a sport was born.

But sports don’t remain fixed forever. Different rule sets can emerge, and those can give rise to new sports. Rugby split off from soccer when players decided they wanted to pick up the ball and run with it. (Later on, Americans decided they liked a turn-based version better: football.) Cricket never caught on much in the US, but rounders, a simplified version played in English schoolyards, did; after a lot of tweaking, it developed into baseball. The list of “derivative” sports goes on: street hockey, beach soccer and volleyball, Australian rules football…

Nor do sports ever truly die. The Mesoamerican civilizations (Aztec, Maya, Olmec, etc.) have become famous in recent years for the archaeological evidence for their ball game, which dates back as far as 1600 BC. Despite all that has happened since then, a descendant of the Aztec game, now known as ulama, is still played in parts of Mexico. Over in the Old World, the Greek fighting sport pankration, a staple of the classical Olympics that was dropped when they were modernized, has been modified, organized, and subsumed into mixed martial arts.

Birth of a sport

Every culture has its sports. Sometimes, they’re inextricably linked. Few play cricket outside of Britain and its former colonies. Racing on an oval, as in NASCAR, is quintessentially American. Others gain more widespread appeal. Soccer—whatever you want to call it—is a worldwide game. Baseball has become popular throughout the Americas and Asia. And so on.

Most sports will come about because of a culture. They’ll be part of it, at the start. Sometimes, they’re related to warfare, possibly as training (running, javelin throwing) or as a war “proxy” (the Mesoamerican ball games, maybe). Alternatively, they can be childhood games that “grew up”.

Which sports a culture plays can depend on its outlook on life, its technological advancement, and plenty of other factors. Technology’s role, of course, is easy to understand. After all, you can’t race automobiles until they’re invented. In the same vein, European games before the 1500s didn’t use rubber balls, because they didn’t have them; they tended to use wrapped animal bladders or things like that.

The level of organization is also dependent on these factors. Video replays obviously require video, but that’s an easy one. Precise timing is also necessary for many sports, but it took a long time to master. And from a cultural perspective, it’s not hard to imagine that a more egalitarian society might focus on loosely defined individual competitions rather than team games, while a martial civilization may see rigorously regulated team sports as a perfect metaphor for squad-level battles.

Taking steps

So let’s think about what it takes to make a sport. Looking back at the introduction, we see that we need an organized, competitive, and physical endeavor with well-defined goals. That’s a pretty good start. Let’s break it down in a different way, though, by asking some basic questions.

  1. Who’s playing? Options include one-on-one, like martial arts; one against the “field”, like racing and golf; or team-against-team, as in baseball or football. Anything other than a contest between opposing individuals also requires a total count of players. For “serious” team sports, you can also work out rules for substitutions and things like that.

  2. Where are they playing? Indoors or outdoors is the natural first approximation. But you’ll also want to know the size and shape of the playing area. This is usually the same for every event, but not always. Baseball fields have a bit of variation in the size of the outfield, and the racetrack at Daytona is almost five times as long as the one in Bristol.

  3. What do they need to play? In other words, what equipment does the sport require? Balls are very common, though their composition (rubber, bladder, wood, etc.) can vary. Sticks show up in quite a few sports: baseball, hockey, and cricket are just three. Nets, posts, racquets—the possibilities are virtually endless. That’s not even counting vehicles or, as in polo, animals.

  4. What are they trying to do? “Get the ball in the goal” is one possible objective. “Reach a certain point before X” is another. Those two, in fact, cover most sports Americans recognize as such. Add in “Don’t let the ball touch the ground”, and you’re pretty much set. You can also substitute “puck” or whatever for “ball”, if your sport uses one of those instead. Note that this is the main objective, not the entirety of the rules.

  5. What is and is not allowed? These are the finer rules of the game. They’re the bulk of the gameplay, but a fictitious story is allowed to gloss over them when they’re not pivotal to the action. You do have to be consistent, though, from a storytelling perspective. A sport’s rules don’t necessarily have to make sense. Football’s “catch rule”, the definition of “charging” in basketball, and the whole sport of cricket are evidence of this.

  6. Who wins, and how? This is the victory condition. Some games are time-based, where they end after a certain period has elapsed. Others, such as baseball or tennis, finish after a set number of turns or scores. Sports where score is kept will generally be won by the side with the most scores; golf, though, is a counterexample. Races, of course, go to the one who finishes first, and a few sports (gymnastics and figure skating, for instance, but also boxing) are scored by judges.

There are quite a few other details you can add, like what happens after an event, whether there is enough organization for leagues and championships, etc. The level of detail is important here, though: don’t get lost in impertinent trivia. It’s fun, but you probably don’t need it for the story.

In those stories where it’s warranted, on the other hand, an invented sport can add flavor to a culture. It’s a good illustration that we’re looking at a different set of people. This is what they think is fun. Sure, many cultures will have similarities in their sports. Soccer could plausibly be created just about anywhere, at almost any time. Many of the martial events at the original Olympics came about from soldierly pursuits, and everybody has soldiers. But it’s the differences that we notice the most.

With fantasy, there’s also the potential for new sports that are beyond our capability. Anything involving magic fits this bill; our two fantastic examples above are both physically impossible for ordinary humans. But fantasy worlds might be more amenable to bizarre sports. The same is true in futuristic science fiction. We can’t play games in zero-G today, but that doesn’t mean people on 24th century starships can’t. As with everything in worldbuilding, the only limits are in your mind.

Let’s make a language – Part 14b: Derivation (Isian)

Both of our conlangs have a wide variety of ways to construct new words without having to resort to full-on coinages. We’ll start with Isian, as always, since it tends to be the simpler of the two.

Isian compounds

Isian is a bit more like German or Swedish than English, in that it prefers compounds of whole words rather than tacking on bound affixes. That’s not to say the language doesn’t have a sizable collection of those, but they’re more situational. Compounding is the preferred way of making new terms.

Isian compounds are mostly head-final, and the most common by far are combinations of two or more nouns:

  • hu “dog” + talar “house” → hutalar “doghouse”
  • acros “war” + sam “man” → acrosam “soldier” (“war-man”)
  • tor “land” + domo “lord” → tordomo “landlord”

Note that acrosam shows a loss of one s. This is a common occurrence in Isian compounds. Any time two identical sounds would meet, they merge into one. (In writing, the letters might remain separate.) Two sounds that “can’t” go together are instead linked by -r- or -e-, whichever fits better.

Adjectives can combine with nouns, too. The noun always goes last. Only the stress patterns and the occasional added or deleted sound tell you that you’re dealing with a compound rather than a noun phrase:

  • sush “blue” + firin “bird” → sufirin “bluebird”
  • bid “white” + ficha “river” → bificha “rapids” (“white river”)

In the latter example, which shows elision, the noun phrase “a white river” would be ta bid ficha, with bid receiving its own stress. The compound “some rapids” is instead ta bificha, with only one stress.

Most verbs can’t combine directly with anything else; they have to be changed to adjectives first. A few “dynamic” verbs, however, can be derived from wasa “to go” plus another verb. An example might be wasotasi “to grab”, from otasi “to hold”.

Changing class

Isian does have ways of deriving, say, a noun from an adjective. The language has a total of eight of these class-changing morphemes that are fairly regular and productive. All of them are suffixes, and the table below shows their meaning, an example, and their closest English equivalent.

Suffix | Function                  | English | Example
-do    | State adjective from verb | -ly     | ligado “lovely”
-(t)e  | Verb from noun            | -fy     | safe “to snow”
-el    | Adjective from noun       | -y, -al | lakhel “royal”
-m     | Agent noun from verb      | -er     | ostanim “hunter”
-mer   | Adjective from verb       | -able   | cheremer “visible”
-nas   | Abstract noun from verb   | -ance   | gonas “speech”
-(r)os | Noun from adjective       | -ness   | yaliros “happiness”
-(a)ti | Verb from adjective       | en-     | haykati “to anger”

For the most part, these can’t be combined. Instead, compounds are formed. As an example, “visibility” can be translated as cheremered “visible one”, compounding cheremer with the generic pronoun ed.

-do is very commonly used to make compounds of verbs (in the form of gerund-like adjectives) and nouns. An example might be sipedototac “woodcutting”, from which we could also derive sipedototakem “woodcutter”.

More derivation

The other productive derivational affixes don’t change a word’s part of speech, but slightly alter some other aspect. While the class-changers are all suffixes, this small set contains suffixes, prefixes, and even a couple of circumfixes. (We already met one of those in the Babel Text, as you’ll recall.)

  • -chi and -go are diminutive and augmentative suffixes for nouns. Most nouns can take these, although the meanings are often idiosyncratic. For example, jedechi, from jed “boy”, means “little boy”, and secago “greatsword” derives from seca “sword”.

  • -cat, as we saw in the Babel Text, turns a noun into a “mass” noun, one that represents a material or some other uncountable. One instance there was gadocat “brick”, meaning the material of brick, not an individual block.

  • a-an was also in the Babel Text. It’s a circumfix: the a- part is a prefix, the -an a suffix. Thus, we can make ayalian “unhappy” from yali “happy”.

  • Two other productive circumfixes are i-se and o-ca, the diminutive and augmentative for adjectives, respectively. With these, we can make triplets like hul “cold”, ihulse “cool”, and ohulca “frigid”.

  • The prefix et- works almost exactly like English re-, except that you can put it on just about any verb: roco “to write”, eteroco or etroco “to rewrite”.

  • ha-, another verbal prefix, makes “inverse” forms of verbs. For example, hachere might mean “to not see” or “to miss”. It’s different from the modal adverb an.

  • mo- is similar in meaning, but it’s a “reverse”: mochere “to unsee”.

That’s not all

Isian has a few other derivation affixes, but they’re mostly “legacy”. They aren’t productive, and some of them are quite irregular. We’ll meet them as we go on, though. For now, it’s time to switch to Ardari.

Software internals: Lists

Up to now, we’ve been looking at the simplest of data structures. Arrays are perfectly linear. Strings are mostly so, until you add in the complications of modern character sets. Objects need a little bit more help, but they can often be reduced to arrays of data. Not so this time around.

Lists are incredibly important to a lot of programming work. In today’s popular languages, they’ve essentially taken over the fundamental position once occupied by arrays. In Python, Ruby, and so many other languages, it’s the list that is the main sequence. In Lisp and Clojure, it’s the universal one, in the same way that NAND is the universal gate in electronics: everything else can be built from it.

But why are lists so useful? Unlike arrays, they lend themselves to dynamic allocation. Elements packed into an array don’t have anywhere to go. If you want to resize the structure, you often have to move everything around. In C++, for example, a vector (essentially a dynamically-sized array) can have elements added and removed, but that can trigger a full resizing, which can move the vector to a different address in memory, thus rendering all your pointers and iterators useless. Higher-level languages don’t have these problems with arrays, but lists never have them in the first place. They’re made to be changed.

Linked lists

The most common method for making a list is the linked list. At its core, a linked list is nothing but a sequence of nodes. A node is typically defined as a structure with a data member (one piece of the data you’re storing in the list) and a link member. This will usually be a pointer to the next node in the list, or, for the last item, a null pointer. In code, it might look like this:

// "It" is the item type
template <typename It>
struct Node
{
    It data;          // one element stored in the list
    Node<It>* next;   // the next node, or nullptr at the end
};

A list, then, could be represented as nothing more than a pointer to the first Node. By “walking” through each next pointer, a function can visit each node in order. And that’s all there really is to it.

Working with this kind of linked list isn’t too hard. Finding its length, a common operation in code, doesn’t take much:

#include <cstddef>  // for size_t (assumed throughout these snippets)

// This is assumed throughout, but we'll make it explicit here
template <typename It>
using List = Node<It>*;

template <typename It>
size_t length(List<It> li)
{
    size_t counter = 0;

    while (li)          // a null pointer marks the end of the list
    {
        ++counter;
        li = li->next;  // walk to the next node
    }

    return counter;
}

There are plenty of optimizations we can do to improve on this (it’s O(n)), but it should illustrate the basic idea. If you prefer a functional approach, you can do that, too:

// FP version
template <typename It>
size_t length(List<It> li)
{
    if (!li)
        return 0;                     // an empty list has length zero
    else
        return 1 + length(li->next);  // one node, plus the rest
}

That one looks a lot better in a proper FP language, but I wanted to stick to a single language for this post.

Inserting new elements is just manipulating pointers, and changing their value can be done by altering the data members. In general, that’s all you need to know. Even higher level languages are largely based on this same foundation, no matter what interface their lists present to the programmer.
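
As a minimal sketch of that (insert_after is a helper name I’m making up here, not a standard routine), splicing a new node in after a known position takes exactly two pointer updates:

template <typename It>
void insert_after(Node<It>* pos, It value)
{
    Node<It>* node = new Node<It>{value, pos->next}; // new node adopts pos's successor
    pos->next = node;                                // pos now links to the new node
}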

More links

But this singly-linked list can be a bit cumbersome. Almost every operation involves walking the list from the beginning, and there’s no real way to get to an earlier element. That very need naturally leads to the creation of the doubly-linked list.

Here, each element has two pointers: one to the next element, the other to the previous one. It’s anchored on both sides by null links, and it’s otherwise the same principle as the singly-linked version, with the only downside being a slight increase in memory use. In code, such a structure might look like this one:

template <typename It>
struct DNode
{
    It data;
    DNode<It>* prev;  // the previous node, or nullptr at the front
    DNode<It>* next;  // the next node, or nullptr at the end
};

Code that doesn’t care about going “backwards” can ignore the prev pointer, meaning our length function from earlier works with doubly-linked lists, too. (We’d need to change the argument type, of course.) Now, though, we get to reap the rewards of having two links. We no longer need to worry about getting a pointer to the beginning of the list, for one; any pointer to a list element can now be used to find its start.
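
Here’s a sketch of that last trick (head_of is my own throwaway helper, not a library function): starting from any node, keep backing up until there’s nothing before you.

template <typename It>
DNode<It>* head_of(DNode<It>* node)
{
    while (node->prev)     // back up until there is no previous node
        node = node->prev;

    return node;           // the first node in the list
}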

Doubly-linked lists are so much more useful than singly-linked ones that they’re really the default in modern times. The C++ standard library, for instance, had only a doubly-linked list (std::list) until C++11 added the singly-linked std::forward_list. And high-level languages usually don’t give you a choice. If they use a linked list, it’ll have (at least) two links per node.

Another option

The drawback of linked lists is the time it takes to find anything in them. Most operations are going to require you to walk some, if not all, of the list. The bigger the list gets, the longer it takes to walk it. In other words, it’s O(n) time.

Different systems get around this in different ways. One possibility is to forgo the use of linked lists entirely, instead basing things around an array list. This is nothing more than a dynamically-sized array, like the C++ vector or Java ArrayList, that can be used as a list but also offers a random-access interface.

Most of the time, an array list will have a reference to its own block of memory, enough to hold all its current elements plus a few extra. When it runs out of that space, it’ll allocate a new, larger buffer, and move everything to that. On the inside, it would look something like:

template <typename It>
struct AList
{
    size_t capacity;      // the number of elements currently stored
    size_t max_capacity;  // how many the buffer can hold before it must grow
    It* data;             // the backing block of memory
};

Array lists work a lot better in an object-oriented system, because you can use methods to simplify the interface, but there’s no reason you need them. Here, for example, is a non-OOP access method for our AList above:

#include <stdexcept>  // for std::out_of_range

template<typename It>
It at(AList<It> li, size_t pos)
{
    if (pos >= li.capacity)
        throw std::out_of_range("AList: out of bounds");

    return li.data[pos];
}

Inserting is trickier, though, because of the possibility of reallocation:

template<typename It>
void insert(AList<It>& li, It item)  // by reference, since we modify the list
{
    if (li.capacity == li.max_capacity)
    {
        resize_array(li, li.max_capacity * 2);
    }

    li.data[li.capacity] = item;
    ++li.capacity;
}

Our hypothetical resize_array function would then fetch a new block of memory with double the space, copy all the elements over to it, update max_capacity, and change data to point to the new block. Not hard to do, but non-trivial, and the copy can be time-consuming with large array lists. (Insertion runs in what’s called “amortized” constant time, or O(1). If you don’t cause a reallocation, an insert is constant-time. If you do, it’s linear, because of the copy, but doubling the buffer each time makes reallocations rare enough that the average stays constant.)
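For the curious, here’s a minimal sketch of how that might be written, using plain new and delete. (A real implementation would likely use a lower-level allocation scheme, and this assumes max_capacity starts out nonzero, since doubling zero gets us nowhere.)

template<typename It>
void resize_array(AList<It>& li, size_t new_max)
{
    It* new_data = new It[new_max];  // grab the bigger block

    // Copy every current element into the new buffer
    for (size_t i = 0; i < li.capacity; ++i)
        new_data[i] = li.data[i];

    delete[] li.data;  // release the old block
    li.data = new_data;
    li.max_capacity = new_max;
}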

Array lists are probably used more often than linked lists as the backing for high-level languages, and they’re the go-to option even in C++, Java, and C#. That’s because typical code does most of its inserting at the end of a list. If the system can allocate a big enough buffer at the start, then appending to an array list is no different, internally, from writing into an empty slot of a regular array. Deletions at the end are just as easy; it’s deleting from the middle that gets expensive, because everything after the gap has to shift down.

But the linked list approach will come in handy later on, when we look at trees, and they also have a few key advantages. As always, it’s a balance. If you need random access to elements, array lists are better. If you’re doing a lot of inserting at the ends of the structure, and not all at once, linked lists start to become a more attractive option. And if you’re using a high-level language, you’d use whatever is available. It still helps to know what you’re getting into, though. Knowing how the different types of list work can help you plan your own code’s structure better, even if you never have the choice of which kind to use.

Worldbuilding and the level of detail

Building a world, whether a simple stage, a vast universe, or anywhere in between, is hard work. Worse, it’s work that never really ends. One author, one team of writers, one group of players—none of these could hope to create a full-featured world in anything approaching a reasonable time. We have to cut corners at every turn, just to make the whole thing possible, let alone manageable.

Where to cut those corners is the hard part. Obviously, everything pertinent to the story must be left in. But how much else do we need? Only the creator of the work can truly answer that one. Some stories may only require the most rudimentary worldbuilding. Action movies and first-person shooters, for example, don’t need much more than sets and (maybe) some character motivations. A sprawling, open-world RPG has to have a bit more effort put into it. The bigger the “scale”, the more work you’d think you need.

Level of detail

But that’s not necessarily true. In computer graphics programming, there’s a concept called mipmapping. A large texture (like the outer surface of a building) takes up quite a chunk of memory. If it’s far enough away, though, it’ll only be a few pixels on the screen. That’s wasteful—and it slows the game down—so a technique was invented where smaller versions of the texture could be loaded when an object was too far away to warrant the “full-sized” graphics. As you get closer, the object’s texture is progressively changed to better and better versions, until the game engine determines that it’s worth showing the original.

The full set of these textures, from the original possibly down to a single pixel, is called a mipmap. In some cases, it’s even possible to control how much of the mipmap is used. On lower-end machines, some games can be set to use lower-resolution textures, effectively taking off the top layer or two of the mipmap. Lower resolution means less memory usage, and less processing needed for lighting and other effects. The setting for this is usually called the level of detail.

Okay, so that’s what happens with computer graphics, but how, you might be wondering, does that apply to worldbuilding? Well, the idea of mipmapping is a perfect analogy to what we need to do to make a semi-believable world without spending ages on it. The things that are close to the story need the most detail, while people, places, and events far away can be painted in broad, low-res strokes. Then, if future parts of the story require it, those can be “swapped out” for something more detailed.

The level of detail is another “setting” we can tweak in a fictional work. Epic fantasy cries out for great worldbuilding. Superhero movies…not so much. Books have the room to delve deeper into what makes the world tick than the cramped confines of film. Even then, there’s a limit to how much you should say. You don’t need to calculate how many people would be infected by a zombie plague in a 24-hour period in your average metropolis. But a book might want to give some vague figures, whereas a movie would just show as many extras as the producers could find that day.

Branches of the tree

The level of detail, then, is more like a hard cutoff. This is how far down any particular path you’re willing to go for the story. But you certainly don’t need to go that far for everything. You have to pick and choose, and that’s where the mipmap-like idea of “the closer you are, the more detail you get” comes in.

Again, the needs of the story, the tone you’re trying to set, and the genre and medium are all going to affect your priorities. In this way, the mipmap metaphor is exactly backwards. We want to start with the least detail. Then, in those areas you know will be important, fill in a bit more. (This can happen as you’re writing, or in the planning stages, depending on how you work.)

As an example, let’s say you’re making something like a typical “space opera” type of work. You know it’s going to be set in the galaxy at large, or a sizable fraction of it. Now, our galaxy has 100 billion stars, but you’d be crazy to worry about one percent of one percent of that. Instead, think about where the story needs to be: the galactic capital, the home of the lost ancients, and so on. You might not even have to put them on a map unless you expect their true locations to become important, and you can always add those in later.

In the same vein, what about government? Well, does it matter? If politics won’t be a central focus, then just make a note to throw in the occasional mention of the Federation/Empire/Council, and that’s that. Only once you know what you want do you have to fill in the blanks. Technology? Same thing. Technobabble about FTL or wormholes or hyperspace would make for usable filler.

Of course, the danger is that you end up tying yourself in knots. Sometimes, you can’t reconcile your broad picture with the finer details. If you’re the type of writer who plans before putting words on the page, that’s not too bad; cross out your original idea, and start over. Seat-of-the-pants writers will have a tougher time of it. My advice there is to hold off as long as feasible before committing to any firm details. Handwaving, vague statements, and the unreliable narrator can all help here, although those can make the story seem wishy-washy. It might be best to steer the story around such obstacles instead.

The basic idea does need you to think a little bit beforehand. You have to know where your story is going to know how much detail is enough. Focus on the wrong places, and you waste effort and valuable time. Set the “level of detail” dial too low, and you might end up with a shallow story in a shallower world. In graphics, mipmaps can often be created automatically. As writers, we don’t have that luxury. We have to make all those layers ourselves. Nobody ever said building a world was easy.

Let’s make a language – Part 14a: Derivation (Intro)

By this point in the series, we’ve made quite a few words, but a “real” language has far more. English, for instance, is variously quoted as having anywhere from 100,000 to over a million distinct words. How does any language get so many? Up to now, we’ve been creating words in our conlangs in a pretty direct manner. Here’s a concept, so there’s a word, and then it’s on to the next. But that only takes you a very short way into a vocabulary. What we need is a faster method.

Our words so far (with a few exceptions) have been roots. These are the basic stock of a language’s lexicon, but not its entirety. Most languages can take those roots and construct from them a multitude of new, related words. This process is called derivation, and it might be seen as one of the most powerful weapons in the conlanger’s arsenal.

How to build a word

Derivation is different from inflection. Where inflection is the way we make roots into grammatically correct words, derivation is more concerned with making roots into bigger roots. These can then be inflected like any other, but that’s for after they’re derived.

The processes of derivation and inflection, however, work in similar ways. We’ve got quite a few choices for ways to build words. Here are some of the most common, with English examples where possible.

  • Prefixes: morphemes added to the beginning of a root; “un-” or “anti-”.
  • Suffixes: morphemes added to the end of a root; “-ize” and “-ly”.
  • Compounding: putting two or more roots together to make a new one; “football” or “cellphone”.
  • Reduplication: repeating part or all of a root; “no-no”, “chit-chat”.
  • Stress: changing the stress of a root; the noun “PERmit” versus the verb “perMIT”.

Stem changes (where some part of the root itself changes) are another possibility, but these are more common as inflections in English, as in singular “mouse” versus plural “mice”. Tone can be used in derivation in languages that have it, though this seems to be a little rarer.

Also, although I only listed prefixes and suffixes above, there are a few other types of affixes that sometimes pop up in derivation. Infixes are inserted inside the root; English doesn’t do this, except in the case of expletives. Circumfixes combine prefixes and suffixes, like German’s inflectional ge-t. The only English circumfix I can think of is en-en, used to make a few verbs like “enlighten” and the humorous “embiggen”. Finally, many languages’ compounds contain a linking element. German has the ubiquitous -s-, and English has the -o- in words like “speedometer”.

Derivations of any kind can be classified based on how productive they are. A productive derivation is one which can be used on many words with predictable results. Unproductive derivations might be limited to a few idiosyncratic uses. These categories aren’t fixed, though. Over time, some productive affixes can fall out of fashion, while unproductive ones become more useful due to analogy. (“Trans-” is undergoing the latter transformation—ha!—as we speak, and some are pushing for wider use of the near-forgotten “cis-”.)

Isolating languages are a special case that deserves a footnote. Since the whole point of such a language is that words are usually simple, you might wonder how they can have derivation. Sometimes, they will allow a more “traditional” derivation process, typically compounding or some sort of affix. An alternative is to create phrases with the desired meaning. These periphrastic compounds might be fixed and regular enough in form to be considered derivations, in which case they’ll follow the same rules.

What it means

So we have a lot of ways to build new words (or phrases, for the isolating fans out there) out of smaller parts. That’s great, but now we need those parts. For compounds, it’s pretty easy, so we’ll start with those.

Compounding is the art of taking two smaller words and creating a larger one from them. (And it is indeed an art; look at German if you don’t believe me.) This new word is somehow related to its parts, but how depends a lot on the language. It can be nothing more than the sum of its parts, as in “input-output”. Or the compound may express a subset of one part, like “cellphone”.

Which words can be compounded also changes from language to language. Putting two nouns together (“railroad”) is very common; which one goes first depends on the language, and it’s not always as simple as head-first or head-final. Combinations of two verbs are rarer in Western languages, though colloquial English has phrasal compounds like “go get” and “come see”. Adjective-noun compounds are everywhere in English: “redbird”, “loudspeaker”, and so on.

Verbs and nouns can fit together, too, as they often do in English and related languages. “Breakfast” and “touchscreen” are good examples. Usually, these words combine a verb and an object into a new noun, but not always. Instrumental compounds can also be formed, where the noun is the cause or means of the action. In English, these are distinguished by being noun-verb compounds: “finger-pointing”, “screen-looking”. They start out as gerunds (hence the -ing), but it’s trivially easy to turn them into verbs.

Really, any words can be compounded. “Livestreaming” is an adjective-verb compound. “Aboveboard” combines a preposition and a noun. The possibilities are endless, and linguistic prescription can’t stop the creative spirit. You don’t even have to use the whole word these days. “Simulcast”, “blog”, and the hideous “staycation” are all examples of “blended” compounds.

All the rest

Compounds are all made from words or, more technically, free morphemes. Most of the other derivational processes work by attaching bound morphemes to a root. Some of these are highly productive, able to make a new word out of just about anything. Others are more restricted, like the rare examples of English reduplication.

Changing class

Most derivations of this type change some part of a word’s nature, shifting it from one category to another. English, as we know, is full of these, and its collection makes a good reference list for a conlanger. We’ve got -ness (adjective to noun), -al (noun to adjective), -fy (noun to verb), -ize (adjective to verb), -able (verb to adjective), and -ly (adjective to adverb), just to name a few. Two special ones of note are -er, which changes a verb to an agent noun, and its patient counterpart -ee.

In general, a language with a heavy focus on derivation (especially agglutinative languages) will have lots of these. One for each possible pair isn’t out of the question. Sometimes, you’ll be able to stack them, as in words like “vilification” (adjective to verb and on to noun) or “internationalization” (noun to adjective to verb to noun!).

Changing meaning

Those derivations that don’t alter a lexical category will instead change the meaning of the root. We’ve got a lot of options here, and English seems happy to use every single one of them, but we’ll look at just a few. Most, it must be said, were borrowed from Latin or Greek starting a couple hundred years ago; those classical languages leaned far more heavily on derivational affixes than the English of the time.

Negation is common, particularly for verbs and adjectives. In English, for example, we’ve got un-, non-, in-, dis-, de-, and a-, among others. For nouns, it’s usually more of an antonym than a negation: anti-.

Diminutives show up in a lot of languages, where they indicate “smallness” or “closeness” of some sort. Spanish, for instance, has the diminutive suffix -ito (feminine form -ita). English, on the other hand, doesn’t have a good “general” diminutive. We’ve got -ish for adjectives (“largish”) and -y for some nouns (“daddy”), but nothing totally regular. By a kind of sound symbolism, diminutives often have high, front vowels in them.

Augmentatives are the opposite: they connote greatness in size or stature. Prefixes like over-, mega-, and super- might be considered augmentatives, and they’re starting to become more productive in modern English. By the same logic as above, augmentatives tend to use back, low vowels.

Most of the others are concerned with verbal aspect, noun location, and the like. In a sense, they replace adverbs or prepositions. Re-, for example, stands in for “again”, as pre- does for “before”. And then there are the outliers, mostly borrowed from classical languages. -ology and -onomy are good examples of this.

Non-English

We’ve heavily focused on English so far, and that’s for good reason: I know English, you know English, and it has a rich tradition of derivation. Other languages work their own ways. The Germanic family likes its compounding. Greek and Latin had tons of affixes you could attach to a word. Many languages of Asia, Africa, and the Pacific have very productive reduplication. Although I used English examples above, that’s no reason to slavishly follow that particular language when constructing your own.

In the next two posts, we’ll see how Isian and Ardari make new words. Each will have its own “style” of derivation, but the results will be the same: near-infinite possibilities.