Novel Month 2015 – Day 2, continued

I told you I’d be back.

I slept really late today, and that’s probably going to be the case for the rest of this week, so I’ll likely be doing these “split” updates the next few days. I’ve been writing some more on Chapter 1, and I’m close to the 3/4 mark. Maybe I can finish it late tonight. If not, it’ll definitely be tomorrow.

Today’s word count: 1,086
Total word count: 3,873

Novel Month 2015 – Day 2

It’s after midnight, so this counts as a new day, right?

I got bored, and I’m not sleepy, so I wrote a bit more. It’s not much, just finishing the scene I was in yesterday evening. If nothing else, it’s that much less I’d have to write later. This all but finishes the first half of Chapter 1, and it gives me time to think, so I’m calling it a win.

Whatever I write after I wake up will probably go in a continuation post tonight.

Today’s word count: 950
Total word count: 2,787

Novel Month 2015 – Day 1

Let’s get this started.

The first day could’ve gone better, but I beat the average. Most of the writing was setup work, building the bridge from the first part of the overarching story. I’d say I’m about a third of the way through Chapter 1, maybe a little less. I had planned for more, but I’ll take it.

I’m using the built-in word count in Vim, so this may not reflect the actual count, but it should be close enough. If we get to the end of the month and I’m that close, then I didn’t do a good enough job.

Today’s word count: 1,837
Total word count: 1,837

At the starting gate

When this post goes up, it’ll be Halloween, even though I’m writing it a couple of days ahead of time. Tomorrow, then, will be November 1st, and that means it’s time to write a novel. Officially, this isn’t NaNoWriMo, because I’m not following their rules to the letter. But I am going by what I feel is the original spirit of the challenge.

So here’s the goal: 50,000 words or a complete novel, whichever comes first. The deadline? Midnight on the 30th. Each day, I’ll try to post a little update about my progress. This certainly won’t be some kind of live blog, though, so don’t expect up-to-the-minute results. After all, I can only write so much. Regular posts (writing stuff on Mondays, code on Wednesdays, conlangs on Fridays) will resume December 2. Until then, I’ll be in hardcore writing mode.

I already have the basic idea for the story I’ll be writing. It’s a continuation of the one I did in 2013. To be honest, I have written parts 2 and 3, along with about half of part 4, but I’ve decided to scrap that work, because I have a better understanding of the setting now, and the old parts don’t fit into it anymore. (Technically, NaNoWriMo requires an original story, and you’re not supposed to start thinking about it until October. Yet another reason why I’m not following the letter of the rules.)

Now, my sleeping schedule is a bit…odd, and my writing schedule is even worse, so I’m not going to schedule these daily updates like I have been with everything else on the site. They’ll go up when I feel I’m “done” writing for the day. That may be at 2 PM or 2 AM. There’s not much I can do about that, short of forcing myself to stay on a schedule, and…let’s just say that circumstances tend to conspire against that.

If you want to play along at home, that’s great! Whether you stick to the NaNoWriMo rules or follow my lead and take it easy, just go for it. If you can’t do it, there’s always next year.

Let’s make a language – Part 8b: Pronouns (Conlangs)

We’ve gotten away with neglecting pronouns in our budding conlangs of Isian and Ardari so far, but now the time has come to fill the gap. In this post, we’ll give both of them a nice set of pronouns to use, checking off all the boxes from the last theory post.

Isian

Isian will have a fair amount of complexity in its pronominal system, and it will contain more than one irregularity. In that sense, we’re making it much like the languages common in the West.

If you’ll recall, Isian doesn’t use case on its nouns, much like English. But we will have personal pronouns that change depending on their role in a sentence. Specifically, Isian has, for most of them, a subject, object, and possessive form. Here’s the full list:

Pronouns        Subject   Object   Possessive
1st Singular    em        men      mi
1st Plural      mit       mida     mich
2nd Person      so        tas      ti
3rd M. Sing.    i         im       ey
3rd F. Sing.    sha       shim     shi
3rd M. Pl.      is        sim      si
3rd F. Pl.      shas      sham     shay

In the third person, there are separate pronouns for masculine and feminine; unlike English, the plural also changes for gender. (Masculine is the default in “formal” Isian, but we’ll see a way to change that in a moment.)

We can use the subject and object pronouns in sentences anywhere a noun would go: sha fusas men “she kissed me”; em hame tas “I love you”. The possessive pronouns, however, function more like articles, and they always go at the beginning of a noun phrase: mi doyan “my brother”; ey wa talar “his big house”.

We also have a “generic” third-person pronoun, which doesn’t change for case. In the singular, it’s ed, while the plural form is des. This can be used like the English generic “you” or “one”: ed las an yoweni “you can’t enter”. In informal speech, we can also use these as genderless personal pronouns, more like English singular “they”: ed an daliga e talar “they don’t live in the house”.

Finally, we have the reflexive or intensive pronoun lan. This covers the functions of all of English’s “-self” pronouns all by, well, itself: e sam sipes lan “the man cut himself”; e esher hishis lan “the girls washed themselves”; em ocata lan “I asked myself”.

Beyond the personal pronouns, we have a couple more classes. We’ll start with Isian’s demonstratives, which come in distinct singular and plural forms. For near things, we have the singular ne and plural nes. Far things are denoted by to and tos. These four words are close in meaning and scope to English “this”, “these”, “that”, and “those”, respectively, and they can be used in much the same way, either as independent pronouns or like adjectives: nes “these”, nes jedi “these boys”.

Next are the interrogatives, or question words. Isian has two of these. For people, we use con, while things take cal. All the other possible questions (where, when, etc.) can be made from compounds or phrases based on one of these, which we’ll see in a later post, when we look at forming questions.

More relevant to today’s subject are the indefinite pronouns, which are derived from the question words. We have four pairs of these, each of them created by means of a prefix:

  • je- “some”: jecon “someone”, jecal “something”.
  • es- “any”: escon “anyone”, escal “anything”.
  • licha- “every”: lichacon “everybody”, lichacal “everything”.
  • ano- “none”: anocon “nobody” or “no one”; anocal “nothing”.

Finally, “standard” Isian (assuming a culture that has such a thing) doesn’t normally allow pronoun omission, or pro-drop. We’ve been using it so far, but that’s because we didn’t have any pronouns up to this point. Our hypothetical speakers of Isian would find it a little informal, though.

Ardari

Ardari has quite a few more pronouns than Isian, but the idea is still the same. First, let’s take a look at the personal pronouns:

Pronouns            Subject   Object    Possessive
1st Singular        my        myne      mynin
1st Excl. Plural    nyr       nyran     nyri
1st Incl. Plural    sinyr     sinran    sinri
2nd Informal        sy        syne      synin
2nd Form. Sing.     tro       trone     tronin
2nd Form. Pl.       trowar    trone     tronin
3rd Masc. Sing.     a         anön      ani
3rd Masc. Pl.       ajo       ajon      oj
3rd Fem. Sing.      ti        tise      tini
3rd Fem. Pl.        tir       ti        tisin
3rd Neuter Sing.    ys        yse       ysin
3rd Neuter Pl.      ysar      ysar      ysoj
Impersonal          mantö     manetö    manintö

That looks like a lot, but it’s really not too much. It’s the different distinctions that Ardari makes that can be hard to understand. The cases are largely the same as they were in the simpler conlang. It’s the left-hand column where the complexity lies.

For the first person, the singular should be obvious. But we have two plurals, labeled “exclusive” and “inclusive”. Which one to use is determined by whether you want to include the listener in the action. If you do, you use the inclusive; otherwise, you need the exclusive.

The second person again has a distinction unfamiliar to speakers of English, but this one shows up in plenty of other languages. The informal is used, surprisingly enough, in informal situations, such as among friends, and it works for singular and plural. The formal is for people you don’t know as well, when you need to show deference, or similar situations. It does change for the plural, but only if it’s the subject.

The third person shouldn’t be that hard to figure out. Remember that Ardari has masculine, feminine, and neuter. Here, we can use the neuter for the case of the unknown or of mixed gender; it doesn’t carry the same connotations of inhumanity as English “it”.

The impersonal form can be used for generic instances and cases where you’re not sure which person is right; it’s transparently derived from man “one”, with the definite article attached.

Reflexive pronouns can be made by adding the regular suffix -das to any object pronoun: mynedas “myself”; anöndas “himself”. Attach it to a subject pronoun, and you get an intensive meaning: mydas “I myself”.

And then we have a special, irregular pronoun lataj. This one roughly means “each other”, and it’s used anywhere you’d need a “reciprocal” meaning: ysar lataj salmedi “they love each other”.

Finally, to add flavor and that hint of verisimilitude, Ardari has vocative forms of a few pronouns. These are: second-person formal troda and plural trodavar; third-person masculine anaj and aja; third-person feminine tija (singular and plural); and third-person neuter singular ys.

Of course, few of these are really needed in Ardari, because the language employs pro-drop liberally, thanks to the concord marking on verbs. If they can get away without a subject or even an object pronoun, our hypothetical Ardari speakers will, except in the most formal situations.

For demonstratives, we have a threefold division. The table below shows the “determiner” form; separate pronouns can be made by adding the suffix -man. (Literally, zaman translates to “this one”, and so on for the rest.)

               Near    Middle   Far
Masc. Sing.    za      pro      gyon
Fem. Sing.     zi      pri      gyen
Neut. Sing.    zall    prall    alyör
Plural         zej     prej     ejn

“Near” covers things near or known only to the speaker, or something specifically referred to recently in conversation, so that both speaker and hearer know it. “Middle” is used for things closer to the listener, or something that is well-known to both parties but absent. The “Far” demonstratives are used for things that are far away from both speaker and listener, are not known to the listener at all, or are speculative in some way.

A few examples of these, since there are so many, and they don’t fit the same pattern as English:

  • ablonyje zallman “listen to this”; uses the “near” form because the speaker knows it, but the listener doesn’t.

  • sinyr prallman virdondall “we’ll sell that one”; takes the middle form, indicating something nearby and known to both parties.

  • mynin tyeri ejnman majtasa “my daughter wants some of those”; the far form connotes something that neither the speaker nor the listener has.

After all that, the interrogatives are easy. In fact, they’re all derived from a single word, qom “what”. From this, we get qomban “who”, qomren “where”, qomlajch “when”, and qoman “which (one)”. These inflect like any other neuter noun, but they can’t take an article suffix.

Indefinite pronouns can be formed from these just like in Isian. (Call it linguistic borrowing or author laziness, the effect is the same.) We have four possibilities here: ta- “some”, za- “every”, du- “no”, and manö- “any”. Making whatever you need is as simple as slapping these in front of an interrogative: taqomban “someone”, zaqom “everything”, and so on.

Pausing the game

After this post, the series is going on temporary hiatus. You’ll see why tomorrow, but I’ll be back with more conlanging action on December 4. In the meantime, have fun playing with Isian, Ardari, or your own language.

When I come back, we’ll work on prepositional phrases, relative clauses, and whatever else I can think of. Then, for the start of the new year, you’ll get to see the first significant writing in both languages.

Assembly: optimization in the past and present

In this post, I won’t be discussing assembly language in any depth. Rather, I want to focus on one of the main reasons to use assembly: optimization. Actually, it might be the main reason today, because there’s not much need for assembly coding these days; it’s only when we want something to be as fast as possible that it comes into play.

Also, I’m moving away from the 6502 for this one, instead using the x86 architecture for my examples. Why? Because x86 is still the leading processor family for desktops, and so much has been written about it over the decades. There’s a lot of optimization info out there, far more than for just about any other architecture. Yes, ARM is growing, especially in the lower-end part of the market where assembly can still be very useful, but ARM—due to its very nature—is so fragmented that it’s hard to give concrete examples. Also, because x86 is so long-lived, we can trace the development of various processor features through its evolution. For that, though, we’ll need a bit of a history lesson.

Starting gates

The first microprocessors, from a bird’s-eye view, weren’t exactly complicated in function. They took an instruction from memory, decoded it, executed it, then moved on, sometimes jumping around the executable code space. Decoding each instruction and performing it were the only real hard parts. That’s one reason why RISC was so highly touted, as the smaller, more fundamental instruction set required less chip space for decoding and execution. (Look up something like VAX assembly for the opposite—CISC—end of the spectrum.)

Fetching the instruction was a simple load from memory, something every processor does as a matter of course. Decoding required a major portion of the circuit (the 6502 used a programmable array a bit like a modern FPGA, except that its function was fixed in the factory) but a comparatively small portion of processor time. Executing could require more memory accesses for the instruction’s operands, and it could take a lot of time, especially for something complex like multiplication—an ability the 6502, among others, lacks.

The birth of parallelism

But memory has always been slower than the processor itself. On all but the most complicated instructions, memory access takes the most time of any phase of execution. Thus, the prefetch queue was born. In a sense, this was the forerunner of today’s cache. Basically, it tried to predict the future by fetching the next few bytes from memory. That way, the large time constants required for RAM access could be amortized.

The problem with the prefetch queue, as with all cache, comes with branching. Branches, as we saw in earlier posts, are the key to making decisions in assembly language. But they force the processor to jump around, instead of following a linear path through memory. A branch, then, negates the advantage of the prefetch queue.

Processor designers (and assembly programmers) tried a few methods of working around the cost of branching. That’s why, at a certain time long ago, loop unrolling was considered a very important optimization technique. If you needed to run a particular group of instructions, say, ten times, then it was a bit faster to “copy and paste” the assembly instructions than it was to set up a branching loop. It used more space, but the speed gains made up for that.
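For a rough idea of what unrolling looks like, here is a minimal sketch in C rather than assembly. The function names sum_rolled and sum_unrolled4 are made up for this illustration, and the loop is only partially unrolled (four copies of the body per pass) instead of pasted out in full. A modern compiler may well do this on its own, or undo it, so treat it as a picture of the idea and not a recommendation.

```c
#include <stddef.h>

/* Plain loop: one compare-and-branch for every element. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        total += data[i];
    }
    return total;
}

/* Unrolled by four: one compare-and-branch for every four elements,
   paid for with a loop body that is four times as large. */
long sum_unrolled4(const int *data, size_t n)
{
    long total = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
    }

    /* Pick up the zero to three elements left over. */
    for (; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same total for the same input; the only difference is how many times the loop condition is checked and how much code the loop body takes up.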

Another optimization trick was rearranging the branch instructions so that they would fall through (that is, not be taken) more often than not. For example, the x86 has a pair of instructions, JZ and JNZ, that branch if the zero flag is set or clear, respectively. (This is equivalent to the 6502’s BEQ and BNE, except that the x86 has more registers and more instructions that can change the flag.) If you have a section of code that is run only when an argument is 0, and 0 rarely shows up, the naive way of writing it would be to skip over that section with a JNZ. But it might be faster (on these earlier processors, at least) to put the “only if 0” code at the end of the subroutine (or some other place that’s out of the way) and use JZ to branch to it when you need it.

In the pipeline

Eventually, the interests of speed caused a fundamental shift in the way processors were made. This was the birth of the pipeline, which opened a whole new world of possibilities, but also brought new optimization problems. The prefetch queue described above was one of the first visible effects of pipelining, but not the last.

The idea of a pipeline is that the processor’s main purpose, executing code, is broken into smaller tasks, each given over to a dedicated circuit. These can then work on their own, like stations on an assembly line. The instruction fetcher gets the next instruction, passes it on to the decoder, and so on. A well-oiled machine, in theory. In practice, it’s hard to get all the parts to work as one, and sometimes the pipeline would be stalled, waiting on one part to finish its work.

The beauty of the pipeline is that each stage is distinctly ordered. Once an instruction has been retrieved, the fetcher isn’t needed, so it can do something else. Specifically, it can fetch the next instruction. If the timing works out, it can fill up the prefetch queue and keep it topped off when it has the free time.

Fortune-telling

But branches are the wrenches in the works. Since they break the linear flow of instructions, they force the pipeline to stall. This is where the processor designers had to get smart. They had to find a way of predicting the future, and thus branch prediction was popularized.

When it works, branch prediction can completely negate the cost of a conditional jump. (Of course, when it fails, it stalls the whole thing, but that’s no worse than not predicting at all.) From an assembly language point of view, it means that we could mostly ditch the clever tricks like loop unrolling and condition negation. They would still have their uses, but they wouldn’t need to be quite so common. That’s a good thing, because the extra code size brought by loop unrolling affected another part of these newfangled processors: the cache.

Cache really came about as another way to make memory access faster. The method was a little roundabout, but it worked, and cache has stuck with us through today. It’s only getting bigger, too; most of the physical space on today’s processors is, in fact, cache memory. Many chips actually have more memory on the die than the 4 MB my first PC had in total.

The trick to cache comes from looking at how code accesses memory. As it turns out, there’s a pattern, and it’s called the principle of locality. Put simply, reading one memory location is a pretty good indicator that you’re going to be reading the ones right next to it. If we could just load all of those at once, then we’d save a ton of time. So that’s what they did. Instead of loading memory a byte or a word at a time, they started pulling in 16 or more bytes at once. And it was faster, but only while you stayed in the region of memory loaded into the cache.
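Locality is easier to see from the high-level side. The sketch below is only an illustration: the names ROWS, COLS, sum_row_major, and sum_col_major are invented here, and the array is far too small to show a real speed difference. It adds up the same two-dimensional array twice, once walking through memory in order and once jumping a whole row ahead on every access.

```c
#include <stddef.h>

#define ROWS 64
#define COLS 64

/* Row-major walk: each access is right next to the previous one in
   memory, so it usually hits data that was already pulled in. */
long sum_row_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t r = 0; r < ROWS; r++) {
        for (size_t c = 0; c < COLS; c++) {
            total += grid[r][c];
        }
    }
    return total;
}

/* Column-major walk: same answer, but each access lands a full row
   (COLS * sizeof(int) bytes) past the previous one. On an array too
   big for the cache, this order tends to be the slower of the two. */
long sum_col_major(int grid[ROWS][COLS])
{
    long total = 0;
    for (size_t c = 0; c < COLS; c++) {
        for (size_t r = 0; r < ROWS; r++) {
            total += grid[r][c];
        }
    }
    return total;
}
```

The two functions always agree on the result. What changes is the order of the memory accesses, and that is the part the cache cares about.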

Soon, cache was neither fast enough nor big enough, and designers had to find ways to fix both of these problems. And that’s where we are today. Modern x86 chips have three levels of cache. The first, L1, is the smallest, but also the fastest. L2 cache is a bit slower, but there’s more of it. And L3 is the slowest (though still faster than RAM), but big enough to hold the entirety of, say, Windows 95.

The present state of affairs

So now the optimization strategy once again focuses on space. Speed is mostly a non-factor, as the desktop x86 processors can execute most of their instructions in a single clock cycle, branch prediction saves us from the cost of jumps, and huge amounts of cache mean fewer of the horrifically slow memory accesses. But cache is limited, especially the ultra-fast L1. Every instruction counts, and we want them to all be as close together as possible. (Data is the same way, but we’ll ignore it for now.) Unrolling loops, for example, is a waste of valuable cache.

A few other optimizations have been lost along the way, made obsolete by the march of progress. One of my favorite examples is that of clearing a register. It’s a common need in assembly language, and the favored method of doing it early in the x86 days was by using the XOR instruction, e.g., XOR AX, AX. Exclusive-OR, when given the same value twice, always produces 0, and this method was quicker (and shorter) than loading an immediate 0 value (MOV AX, 0).
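The identity behind the trick can be checked from C. This is just a sketch, with the hypothetical function names clear_with_mov and clear_with_xor standing in for the two instructions; a compiler is free to emit whatever it likes for either one. The byte counts in the comments are for the 16-bit encodings of the instructions named above.

```c
#include <stdint.h>

/* Stands in for MOV AX, 0: the zero is an immediate value carried
   inside the instruction (three bytes in 16-bit code). */
uint16_t clear_with_mov(uint16_t ax)
{
    (void)ax;   /* the old contents are simply overwritten */
    return 0;
}

/* Stands in for XOR AX, AX: every bit is XORed with itself, and
   1 ^ 1 and 0 ^ 0 are both 0 (two bytes in 16-bit code). */
uint16_t clear_with_xor(uint16_t ax)
{
    return (uint16_t)(ax ^ ax);
}
```

Whatever value goes in, both functions hand back 0, which is why the shorter instruction could be swapped in freely.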

The self-XOR trick was undone by a combination of factors. The first was register renaming, which essentially gave the processor a bunch of “virtual” registers that it could use as internal scratch space, moving data to and from the “real” ones as needed. The second was out-of-order execution, and that takes a little more explaining.

If you’ve ever looked at the optimized assembly output of your favorite high-level compiler (and you should, at least once), then you might have noticed that things aren’t always where you put them. Every language allows some form of rearranging, as long as the compiler can prove that it won’t affect the outcome of the program. For example, take a look at this C snippet:

int a, d;
int b = 4;
int c = 5;
for (a = 0; a < b; a++) {
    f(a);
}
d = b * c;

The final statement, assigning the value 20 to d, can be moved before the loop, since there’s no way that loop can change the value of b or c; the only thing the loop changes, a, has nothing to do with any other variable. (We’re assuming more code than this, otherwise the compiler would replace b, c, and d all with the simple constant 20.)
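To make the reordering concrete, here are the two orderings of the snippet above side by side, as a sketch. The body of f() and the counter f_calls are stand-ins invented for this example so the code can be compiled. The real f() could be anything, as long as it can’t touch b or c.

```c
/* Hypothetical stand-in for f(): it only counts how often it runs. */
int f_calls = 0;

void f(int a)
{
    (void)a;
    f_calls++;
}

/* The statements in the order they were written. */
int original_order(void)
{
    int a, d;
    int b = 4;
    int c = 5;
    for (a = 0; a < b; a++) {
        f(a);
    }
    d = b * c;
    return d;
}

/* The same statements with the multiplication moved ahead of the
   loop. Nothing in the loop writes to b or c, so d comes out the
   same either way. */
int reordered(void)
{
    int a, d;
    int b = 4;
    int c = 5;
    d = b * c;
    for (a = 0; a < b; a++) {
        f(a);
    }
    return d;
}
```

Both versions return 20 and call f() four times each. From the outside, there’s no way to tell which order the statements ran in, and that is exactly what gives the compiler permission to shuffle them.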

A processor can do this on the assembly level, too. But it has the same restriction: it can only rearrange instructions if there are no dependencies between them. And that’s where our little XOR broke. Because it uses the same register for source and destination, it created a choke point. If the next few instructions read from the AX register, they had to wait. (I don’t know for sure, but I’ve heard that modern processors have a special case just for self-XOR, so it lives again.)

Ending the beginning

This has been a bit of a rambling post, I’ll admit. But the key point is this: optimization is a moving target. What might have worked long ago might not today. (There’s a reason Gentoo is so fast, and it’s because of this very thing.) As processors advance, assembly programmers need new tricks. And we’re not the only ones. Compilers produce virtually all assembly that runs these days, but somebody has to give them the knowledge of new optimization techniques.

Calm before the storm

October is almost over, and November is upon us. November, as you may know, is National Novel Writing Month (NaNoWriMo). If you’ve never tried this, it’s a great time to give it a shot. I will be, and I’ll have a post up later this week about how that will affect the update schedule here. (There’s no way I could write an average of 1,667 words per day on a novel and 3 posts a week on here, and I don’t have enough of a backlog to make up the difference, so something has to give.)

With this break in the schedule, though, I’ll have time to come up with more ideas for posts in all three of the main categories (writing, code, and languages). Then, when December rolls around and I get back to regular posting, I’ll be able to build up a bigger and better queue, which will give me a little bit more free time.

Here’s what I’ve got so far, starting with the “prose” part of Prose Poetry Code (I’ll get to that “poetry” part one of these days, I promise!):

  • Politics and religion in a fantasy world
  • Colonization, in both sci-fi and general fiction
  • A look at space battles, and how they might really play out
  • A set of posts about alien life, including what’s plausible, possible, and maybe even likely
  • An irregular series about the interaction between magic and technology, covering both “technomancy” and “magitech”

For the coding aspect, I’ve got some ideas about C++, ES6, game programming, procedural generation, and a few others. As for Let’s Make a Language, well, the second half of Part 8 will go up on Friday, and I have plans out to Part 13. Since each part takes 2-3 posts, that’s at least a good two months’ worth of content. I’ve also got a couple of themes for more general conlang posts that don’t fit the series, and I can slot them in whenever I need a break from creation.

So that’s what you have to look forward to, starting in December. Again, in a few days I’ll give an update on what will happen over the course of next month, including my plans to write a novel in 30 days.

Let’s make a language – Part 8a: Pronouns (Intro)

Pronouns are, at the most basic level, words that stand in for other words. Think of “he” or “them” in English. Those words don’t really mean anything by themselves. They usually have to be said with reference to some other thing, like “a man” or “a bunch of kids”. A few of them, like “someone”, don’t, that’s true, but most pronouns do tend to refer to another noun.

Also, the definition of “pronoun” covers more ground than you might think. And the way this ground is divided up varies from one language to another. Sure, it’s obvious that the examples above are pronouns, but so are words like “these” and “who”. However, some languages don’t have an equivalent to “these”, because they don’t need a plural form of “this”. The word for “who” might be different, too, based on various factors. So let’s take a look at all the kinds of pronouns we can find in a language, all those that might fit in a conlang.

Getting personal

The most well-known class of pronouns has to be the personal ones, exemplified by words like English “he”, “she”, and “us”. Despite the name, these don’t necessarily refer to people (“it” normally doesn’t, for example), but they match up fairly well with the person distinction on verbs, where the first person is the speaker, the second is the listener, and the third is everybody else.

That’s the ideal situation, anyway. In practice, even the three-way person distinction can be a bit nebulous. Some languages have two sets of third-person pronouns, one each for those things close by (proximate) and far away (obviative); the latter is sometimes called “fourth person”. We’re off to a good start, aren’t we?

For many languages, pronouns are distinguished in most or all of the ways that nouns are, whether by number, gender, case, or whatever else. In quite a few, they actually have finer distinctions than ordinary nouns. English is one of these, as its pronouns can be marked for case (“we” versus “us”) and gender and even animacy (masculine “he”, feminine “she”, inanimate neuter “it”, and—informally—animate neuter “they”).

Personal pronouns also often show contrasts in ways that are relatively rare for common nouns. Honorific or formal pronouns are common, mostly in the second person. Spanish Usted is an example, as are the many possibilities in Japanese. Animacy is another case of this, as you can see in the English example above. And the first person can come in inclusive and exclusive forms, depending on whether “we” is supposed to include the listener.

Beyond the basic three (or four) persons, we have a few other odds and ends. Impersonal pronouns exist in many languages; the English form is “one”, which isn’t much used in modern speech. Generic pronouns, like the “you” that has largely supplanted impersonal “one”, are a close relative. You can have reflexive pronouns, like the “-self” group, which refer to…well, themselves. Emphatic pronouns, in English, take the same forms as reflexives, but they’re meant to emphasize a specific noun, rather than simply refer to it: “I will go myself.” Possessive pronouns are another important class. Languages with case might treat them as genitive forms of personal pronouns, but they could also be independent. And finally, a reciprocal pronoun (English “each other”) pops up in many places, specifically to deal with a single situation.

Demonstration

The demonstratives are another group of pronouns. This is the group that includes English “this” and “that”, used to refer to a specific, known instance of something. English has a pair of these, a bit like the proximate/obviative split mentioned earlier. “This” is for nearby things, while “that” is used to refer to something at a distance. We can add a third degree into this—as in Spanish, for example—either between “this” and “that” or beyond both of them, like “yonder”, which is non-standard in most dialects, but not mine. Four or even five contrasting degrees of distance aren’t unheard of, either, and a few languages have none at all.

Questions and others

Interrogative pronouns, like English “who” and “what”, are used to form questions. (We’ll see exactly how that’s done in a later post.) We use these when referring to a noun we don’t know, as when we ask, “What is it?” This class isn’t limited to people and things, either. Many languages have specific pronouns to ask about time (“when”), place (“where”), and reasons (“why”), among others.

It’s also common to derive a few other pronouns from the interrogatives. Relative pronouns, for those languages that have them, often come from the question words: “the man who hired me”. Relative clauses are worth a whole post by themselves, though, so we’ll hold off further discussion about them.

The indefinite pronouns, on the other hand, we’ll talk about right now. They’re a big group of words that tend to be derived in some fashion. Some languages, like English, make them out of interrogatives, as in “somewhere” or “anyhow”. Others, like English (funny how that works out), create them from generic nouns like “thing” or “one”: “someone”, “nothing”, “everything”, “anybody”. And then a few of them have special cases, as in Spanish algo “something”, which is a morpheme to itself.

The making of

In form, pronouns can take just about any shape. They can be separate words that function as nouns in their own right, as they are in many languages. They can appear as verbal suffixes, as is the case in polysynthetic tongues. Or they could be a mix of these.

One interesting notion we can discuss here is the idea of pro-drop, omitting pronouns that would be redundant due to verbal conjugations or other factors. We don’t have it in English—pronouns are always required—but it’s one of the first grammatical aspects students learn about Spanish, and many other languages allow it. Japanese might be considered an extreme example of pro-drop, as context allows—and decorum sometimes requires—a speaker to omit subject pronouns, object pronouns, and any other extraneous bits.

As far as the specific sounds used to create a pronoun, there are a couple of trends. Quite a few languages, for example, have a first-person pronoun with a labial nasal like /m/, and many of those then go on to have a second-person pronoun with a coronal consonant like /t/ or /s/. Most European languages show this pattern (Spanish me/te; English me/thee; German mich/dich), enough to make you think it’s an Indo-European thing. But then you have Finnish, a Uralic language, with minä/sinä. And then WALS gives the example of Nanai, a language of eastern Siberia: mi/si. Clearly, there’s some process at work here. That is also made clear by a contrary trend, where the first person shows /n/, the second /m/. This one is more widespread in the Americas, with occasional occurrences elsewhere in the world, in unrelated languages.

For your information

When making a conlang, pronouns can be a hassle to get right. Their very definition lends itself well to a mechanical approach, especially in agglutinating languages, where you can just attach the right markers to some generic base. It’s harder to make a full set like English, where just about every personal pronoun on the chart has a different history.

The personal pronouns are probably the easiest, though it’s not exactly hard to go overboard. Indefinites, relatives, and all the rest aren’t as necessary at the start, if only because the things you’d most likely say in the early stages won’t need them. But they shouldn’t be too far behind, because they’re no less useful.

Remember that pronouns often follow a paradigm, but there are plenty of irregularities. In natural languages, that’s from borrowing, sound change, and all the other natural factors of linguistic evolution. But there are languages out there with very regular pronoun systems, too.

Future reference

The next post in this series will have all the pronouns you could ever want for Isian and Ardari. Since this post covered most of the theory, there won’t be that much left to do, so we’ll get words, words, and more words. After that, we’ll move to the things that we call prepositional phrases, which aren’t always what they seem.

Assembly: a little bit more

Well, I’m back. Instead of giving you more apologies for missing a couple of weeks of this exciting series (sarcasm alert!), let’s jump right back in and look at some more old-school assembly language. This week, we’ll get to know homebrewed 6502 versions of a couple of C standard library staples, and we can start talking about how you use data structures in assembly.

Memory and data

The simplest, dumbest (for the computer, not the programmer) way to treat data is as raw memory. The problem is, there’s not much you can do with it. You can initialize it, copy it around, and that’s about it. Everything else needs some structure. Copying in assembly language is pretty easy, though, even in 6502-land:

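; Arguments:
; $F0-$F1: Source address
; $F2-$F3: Destination address
; Y register: Byte count
; Note: a nonzero count n copies offsets n down to 1, not 0 to n-1,
; so pass each address minus one to copy n bytes starting there.
; (A count of 0 copies offsets 0 through 255.)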
memcpy:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    sta ($F2), Y    ; 2 bytes, 6 cycles
    dey             ; 1 byte, 2 cycles
    bne memcpy      ; 2 bytes, 2-3 cycles
    rts             ; 1 byte, 6 cycles

Yep, this is a stripped-down version of memcpy. It has its limitations—it can only copy a page of memory at a time, and it has no error checking—but it’s short and to the point. Note that, instead of a prose description of the subroutine’s arguments and return values and whatnot, I’m just putting that in the comments before the code. I trust that you can understand how to work with that.

Since the code is pretty self-explanatory, the comments for each line show the size and time taken by each instruction. A little bit of addition should show you that the whole subroutine is only 8 bytes; even on modern processors, the core of memcpy isn’t exactly huge.

The timing calculation is a little more complex, but it’s no less important on a slow, underpowered CPU like the 6502. In the case of our subroutine, it depends on how many bytes we’re copying. The core of the loop will take 13 cycles for each iteration. The branch instruction is 3 cycles when the branch is taken, 2 cycles when it’s missed. Altogether, copying n bytes takes 16n+5 cycles, a range of 21 to 4101. (A zero byte count is treated as 256.) In a modern computer, four thousand cycles would be a few microseconds at most. For the 6502, however, it’s more like a few milliseconds, but it’s hard to get faster than what we’ve got here.
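If you want to double-check that arithmetic, here is a small C model of the loop's timing. It's only a sketch: the function names are made up for this post, and it assumes that no memory access crosses a page boundary (which would add a cycle here and there).

```c
#include <assert.h>

/* Cycles used by the memcpy loop above for n bytes (1 to 256),
   using the per-instruction costs from the listing's comments.
   Assumption: no page-crossing penalties. */
static unsigned memcpy_cycles(unsigned n)
{
    unsigned cycles = 0;
    for (unsigned i = 1; i <= n; i++) {
        cycles += 5 + 6 + 2;          /* lda, sta, dey */
        cycles += (i < n) ? 3 : 2;    /* bne: taken on every pass but the last */
    }
    return cycles + 6;                /* rts */
}

/* Confirms the closed form 16n+5 over the whole range. */
static void check_memcpy_formula(void)
{
    for (unsigned n = 1; n <= 256; n++)
        assert(memcpy_cycles(n) == 16 * n + 5);
}
```

Here, memcpy_cycles(1) comes out to 21 and memcpy_cycles(256) to 4101, matching the range above. Assuming a 1 MHz clock, that worst case works out to about 4.1 milliseconds.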

Strings

The first way we can give structure to our data is with strings. Particularly, we’ll look at C-style strings, series of bytes terminated by a null value, hex $00. One of the first interesting operations is taking the string’s length—the C standard library’s strlen—and this is one implementation of it in 6502 assembly:

; Arguments:
; $F0-$F1: String address
; Returns:
; A: Length of null-terminated string
strlen:
    ldy #$00        ; 2 bytes, 2 cycles
    clv             ; 1 byte, 2 cycles
  slloop:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    beq slend       ; 2 bytes, 2-3 cycles
    iny             ; 1 byte, 2 cycles
    bvc slloop      ; 2 bytes, 3 cycles
  slend:
    tya             ; 1 byte, 2 cycles
    rts             ; 1 byte, 6 cycles

All it does is count up through memory, starting at the pointed-to address, until it reaches a zero byte. When it’s done, it gives back the result in the accumulator. Now, this comes with an obvious restriction: our strings can’t be more than 255 bytes. Any longer, and Y wraps around to zero, so the loop starts over at the beginning of the string and never reaches the terminator. For this example, that’s fine, but you need to watch out in real code. Of course, in modern processors, you’ll usually have at least a 32-bit register to work with, and there aren’t too many uses for a single string of a few billion bytes.

Our assembly version of strlen weighs in at 12 bytes. Timing-wise, it’s 12n+20 cycles for a string of length n, which isn’t too bad. The only real trickery is abusing the overflow flag to allow us an unconditional branch, since none of the instructions this subroutine uses will affect it. Using a simple JMP instruction takes the same 3 cycles but costs one more byte, and it means we can’t relocate the code once it has been assembled.
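The 12n+20 figure can be checked the same way, by walking a string in C and adding up the costs from the listing. This is a rough model, not a 6502 emulator: the function name is hypothetical, and it assumes a string shorter than 256 bytes and no page-crossing penalties.

```c
#include <assert.h>
#include <stdint.h>

/* Walks a C string the way the strlen listing does, with an 8-bit
   index, and adds up the cycle costs from the listing's comments.
   Assumptions: the string is shorter than 256 bytes, and no access
   crosses a page boundary. */
static unsigned strlen_cycles(const char *s, uint8_t *length)
{
    unsigned cycles = 2 + 2;          /* ldy, clv */
    uint8_t y = 0;
    for (;;) {
        cycles += 5;                  /* lda */
        if (s[y] == '\0') {
            cycles += 3;              /* beq taken */
            break;
        }
        cycles += 2 + 2 + 3;          /* beq not taken, iny, bvc */
        y++;
    }
    *length = y;                      /* tya leaves the count in A */
    return cycles + 2 + 6;            /* tya, rts */
}

static void check_strlen_cycles(void)
{
    uint8_t len;
    assert(strlen_cycles("hello", &len) == 12 * 5 + 20);
    assert(len == 5);
    assert(strlen_cycles("", &len) == 20);   /* empty string: fixed cost only */
    assert(len == 0);
}
```

Each character that isn't the terminator costs 12 cycles, and the fixed 20 covers the setup, the final load and branch, and the return.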

Another common operation is comparing strings, so here’s our version of C’s strcmp:

; Arguments:
; $F0-$F1: First string
; $F2-$F3: Second string
; Returns comparison result in A:
; -1: First string is less than second
; 0: Strings are equal
; 1: First string is greater than second
strcmp:
    ldy #$00        ; 2 bytes, 2 cycles
  scload:
    lda ($F0), Y    ; 2 bytes, 5 cycles
    cmp ($F2), Y    ; 2 bytes, 5 cycles
    bne scdone      ; 2 bytes, 2-3 cycles
    iny             ; 1 byte, 2 cycles
    cmp #$00        ; 2 bytes, 2 cycles
    bne scload      ; 2 bytes, 2-3 cycles
    lda #$00        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles
  scdone:
    bcs scgrtr      ; 2 bytes, 2-3 cycles
    lda #$FF        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles
  scgrtr:
    lda #$01        ; 2 bytes, 2 cycles
    rts             ; 1 byte, 6 cycles

Like its C namesake, our strcmp doesn’t care about alphabetical order, only the values of the bytes themselves. The subroutine uses just 24 bytes, though, so you can’t ask for too much. (Timing for this one is…nontrivial, so I’ll leave it to the more interested reader.)
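To see what comparing by byte value means in practice, here is a C model of the same logic. It's a sketch under a few assumptions: the name strcmp_model is invented for illustration, the strings are ASCII, and each fits in 255 bytes plus the terminator.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the strcmp listing: compares raw byte values with an
   8-bit index and returns $FF, $00, or $01 like the accumulator would.
   Assumption: both strings fit in 255 bytes plus the terminator. */
static uint8_t strcmp_model(const char *first, const char *second)
{
    uint8_t y = 0;
    for (;;) {
        uint8_t a = (uint8_t)first[y];
        uint8_t m = (uint8_t)second[y];
        if (a != m)
            return (a > m) ? 0x01 : 0xFF;   /* bcs: carry set when a >= m */
        if (a == 0)
            return 0x00;                    /* both strings ended together */
        y++;
    }
}

static void check_strcmp_model(void)
{
    assert(strcmp_model("abc", "abc") == 0x00);
    assert(strcmp_model("Zebra", "apple") == 0xFF);  /* 'Z' ($5A) < 'a' ($61) */
    assert(strcmp_model("ab", "a") == 0x01);         /* 'b' > terminator */
}
```

Note the second check: "Zebra" comes out as less than "apple", because every uppercase ASCII letter has a smaller byte value than any lowercase one.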

Other structures

Arrays, in theory, would work almost like strings. Instead of looking for null bytes, you’d have an explicit count, more like newer C functions such as strncmp. On the 6502, the indirect indexed addressing mode (e.g., LDA ($F0), Y) we’ve used in every example so far is your main tool for this. Other architectures have their own variations, like the x86’s displacement mode.

More complex structures (like C structs or C++ classes) are tougher. This is where the assembly programmer needs a good understanding of how high-level compilers implement such things. Issues like layout, padding, and alignment come into play on modern computers, while the 6502 suffers from the slower speed of indirection.

Self-contained structures (those that won’t be interfacing with higher-level components) are really up to you. The most common layout is linear, with each member of the structure placed consecutively in memory. This way, you’re only working with basic offsets.

But there’s a problem with that, as newer systems don’t really like to access any old byte. Rather, they’ll pull in some number of bytes (always a power of 2: 2, 4, 8, 16, etc.) all at once. Unaligned memory accesses, such as loading a 32-bit value stored at memory location 0x01230001 (using x86-style hex notation), will be slower, because the processor will want to load two 32-bit values (0x01230000 and 0x01230004), and then it has to do a little bit of internal shuffling. Some processors won’t even go that far; they’ll give an error at the first sign of an unaligned access.
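The address arithmetic in that example is easy to reproduce. The helpers below are hypothetical names of my own, and they assume the processor fetches memory in aligned 4-byte words, which is a simplification of what real hardware does.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the aligned 32-bit word containing the given byte. */
static uint32_t word_base(uint32_t addr)
{
    return addr & ~UINT32_C(3);
}

/* Nonzero if a 4-byte load at addr straddles two aligned words. */
static int straddles_words(uint32_t addr)
{
    return word_base(addr) != word_base(addr + 3);
}

static void check_alignment_example(void)
{
    uint32_t addr = UINT32_C(0x01230001);
    assert(straddles_words(addr));
    assert(word_base(addr) == UINT32_C(0x01230000));      /* first word */
    assert(word_base(addr + 3) == UINT32_C(0x01230004));  /* second word */
    assert(!straddles_words(UINT32_C(0x01230004)));       /* aligned load */
}
```

The first byte of the value lives in the word at 0x01230000 and the last byte in the word at 0x01230004, so both have to be fetched.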

For both of these reasons, modern languages generate structures with padding. A C struct containing a byte and a 32-bit word (in that order) won’t take up the 5 bytes you’d expect. No, it’ll be at least 8, and a 64-bit system might even make it 16 bytes. It’s a conscious trade-off of size for speed, and it’s a fair trade in these present days of multi-gigabyte memory. It’s not even that bad on embedded systems, as they grow into the space occupied by PCs a generation ago.
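You can ask a C compiler directly what layout it picked. The struct below is just an illustration (the names are mine), and the exact numbers are implementation-defined, so the checks only assert what the language guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A byte followed by a 32-bit word, as in the example above. */
struct tagged_word {
    uint8_t  tag;
    uint32_t value;
};

static void check_layout(void)
{
    /* The word must sit at an offset that satisfies its alignment. */
    assert(offsetof(struct tagged_word, value) % _Alignof(uint32_t) == 0);
    /* At least the 5 bytes of data; the rest is compiler-chosen padding. */
    assert(sizeof(struct tagged_word) >= 5);
    /* The total size is rounded up so arrays of the struct stay aligned. */
    assert(sizeof(struct tagged_word) % _Alignof(struct tagged_word) == 0);
}
```

On common systems where a 32-bit word needs 4-byte alignment, you would typically see the word at offset 4 and a total size of 8, with three bytes of padding after the tag. Printing offsetof and sizeof on your own machine is the only way to be sure.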

Coming up

For now, I think I’m going to put this series on hold, as I’m not sure where I want it to go. I might move on to a bigger architecture, something like the x86 in its 16-bit days. Until then, back to your regularly scheduled programming posts.

Writing The One

Many movies, books, and other works of fiction involve a protagonist who is destined (or fated or whatever other term you choose) to save the world. Only he (or she, but this is rarer) can do this. No one else has the power, or the will, or the knowledge necessary to accomplish this feat. But this character does, for some reason. He is The One.

Stories of The One aren’t hard to find. For example, Neo, in The Matrix, is explicitly referred to by that moniker. But the idea of a single savior of the world, someone who can do what no other person can, goes back centuries, if not more. After all, it’s the founding idea of Christianity. Perhaps that’s why it’s so embedded in the Western mind.

Writing a story about The One is fairly straightforward, but there are pitfalls. The most obvious is similarity: how do you distinguish your hero from all those who have come before? That part’s up to you, and it’s so dependent on your specific story that I’m not sure I can say much that would be relevant. However, I can offer some food for thought on the general notion of The One.

Begin at the beginning

Let’s start with the origin story, since that’s what is so popular these days. How did your One come about? More importantly, how did he gain that status? Here are a few ideas:

  • The One was born that way. This one works best when it’s fate driving the story. The One is somehow marked from birth as such. Maybe he was born in a time of omen, like an eclipse. Or he could be the child of a supernatural being. In any event, this kind of story can deal with the conflict inherent in growing up as The One. Another option is that The One’s status is fixed at birth, but his power comes later.

  • The One received the destined status at a certain time. This could be at a coming of age (18 years old or the cultural equivalent), or at the time of a particular event. Basically, this idea is just a delayed form of the one above, and most of the same caveats apply. The benefit is that you don’t have to write a story about a character growing both physically and metaphysically at the same time.

  • Something changed the course of fate. In other words, The One wasn’t always meant to be; he only came into his own after a specific event. The death of his parents, for example, or a plague ravaging his homeland. Or, perhaps, he finds a sage or a sword or whatever, setting him on the path of becoming The One. Before that, he was a kid on a farm or something like that. Clearly, in this case, some part of your story needs to tell that story, whether through a prologue, a series of flashbacks, or some other storytelling device. (Another option, if you’re making a trilogy or similar multi-part story, is to have the first “act” tell the protagonist’s origin.)

Method to the madness

Now we have another question: how does The One work? Rather, how does his status manifest itself? Jesus could work miracles. Neo had essentially godlike powers while he was in the Matrix. Luke Skywalker was simply more powerful and more adept at the Force. None of these are wrong answers, of course; the one you want largely depends on the goal of your story. Some options include:

  • All-powerful, all the time. Sometimes The One really does have the power of a deity. That can work for movies, and even for books. It’s harder for a video game, though, and it can be tricky in any medium. The hardest part is finding a way to challenge someone who has such a vast amount of power. Look to superheroes, especially overpowered ones like Superman and Thor, for ideas here. (If this kind of The One gains his powers after a life-changing event, then you have a nice, neat solution for the first part of your story.)

  • Increasing over time. This one is popular in fantasy literature and video games, mostly because it fits the progression model of RPGs. If you’ve ever played a game where you slowly “level up” as the story unfolds, then you know what’s going on here. Either The One grows in overall strength, or his powers gradually unlock. Both ways can work, but a non-game needs to be written so that it doesn’t seem too “gamey”.

  • Unlocking your full potential. Instead of a slow rise in power, it’s also possible that The One’s path follows a pattern more like a staircase. Here, pivotal events serve to mark the different “stages” to The One. In actuality, this is another way of leveling up, but it’s guided by the story. The final confrontation (or whatever would end the world, if not for The One) is then the final level, and drama dictates that this is when the protagonist would reach the apex of his ability—probably shortly after a failure or setback.

Supporting cast

The One isn’t always alone. Any proper world-saving hero is going to have a set of helpful allies and companions. By necessity, they won’t be as powerful, but they can each help in their own way. Almost any type of character works here, as long as they can fade a little bit into the background when it’s time for The One to take center stage. Here are some of the more common ones.

  • The love interest. It’s a given nowadays that a hero needs romance. In video games, the current fad is to let the player choose which character gets this role. For less interactive works, it’s obviously a fixed thing. Whoever it is, the point is to give the hero someone to love, someone who is utterly dependent on his success, in a more personal way than the rest of the world.

  • The childhood friend. This is another way to add a personal element to the catastrophe. Like the love interest (which can actually be the same person), the childhood friend “grounds” The One in reality, giving a human side to someone who is, by definition, superhuman. (Note that you can also substitute a family member here, but then you can’t really combine this role with the love interest.)

  • The strongman. Unless The One is physically strong, he’ll likely need additional muscle, possibly even in the form of a bodyguard. This works in traditional fantasy, where it’s standard for the mages to be weaklings with massive hidden power. For most other styles, it’s harder to justify, but a tough guy is welcome in any party.

  • The academic. Some stories rely on the fact that The One doesn’t know everything about his potential, his destiny, his enemies, or even himself. The academic, then, serves the role of exposition, allowing the audience to learn about these things at the same time as the hero. This kind of character shines in the early acts of a story; by the end, dramatic pacing takes precedence, and the academic is no longer needed.

  • The otherworldly. In stories with a significant supernatural element, The One might have an inhuman friend or ally. This could be anything from a guardian angel, to an elemental creature, to a bound demon, to even an alien. This otherworldly character can break the rules the story sets for “normal” humans, as well as giving the protagonist an outside perspective. It can also function as a kind of academic, as beings from other worlds or planes often have hidden knowledge.

  • The turncoat. There are two ways you can go with this character. Either he’s someone who turned on The One—in which case, the turncoat makes a good secondary villain—or he turned on The One’s enemy to join the “good guys”. This second possibility is the more interesting, story-wise, because it’s almost like adding a second origin story. Why did he turn? Is he going to try and double-cross The One? The turncoat can also be a way to provide inside information that the protagonist logically shouldn’t have access to.

Conclusion

Writing The One is easy. Writing one of them to be more than simple wish-fulfillment is much harder. Put yourself in your characters’ shoes. Not just the protagonist, but the supporting crew, too. Think about the mechanism of fate, as it exists in the world you’re creating. And think about how you show the power that The One has. Sure, explosions are eye-catching, but they aren’t everything. The One can outwit his foes just as easily as he can overpower them, and sometimes that’s exactly what he must do.