🖼🗣 : the emoji conlang, part 2

In the previous article, I showed that it is possible to create a kind of modern-day hieroglyphic script using the ~1200 emoji characters available in Unicode. Now, let’s expand on that.

Rather than go through a formal grammar, we’ll work our way up from a few simple phrases and sentences, much in the same way as a student learning a new language. 🖼🗣 is, after all, a bit like its own language.

Preliminaries

First off, let’s define a few very simple words. These are all “content” words, as you’ll see; grammatical particles (what few we truly need in 🖼🗣) can come later.

  • 👨 – man
  • 👩 – woman
  • 👤 – person
  • 🧒 – child
  • 🐕 – dog
  • 🐈 – cat
  • 👁 – eye
  • 👄 – mouth
  • ✋ – hand
  • 👣 – foot
  • 🍴 – to eat
  • 🥤➡ – to drink
  • 👀 – to see
  • 👂➡ – to hear
  • 🧠➡ – to know
  • 🚶 – to walk
  • 💧 – water
  • 🌬 – air
  • 🔥 – fire
  • 🌐 – earth
  • 🌞 – sun
  • 🌝 – moon
  • ⛅ – sky
  • 🔴 – red
  • 💚 – green
  • 🔷 – blue
  • ◻🌈 – white
  • ◼🌈 – black
  • ♨ – hot
  • ❄ – cold
  • 😃 – happy
  • 😢 – sad
  • 💪 – strong
  • ⬜ – big
  • 🧠〰 – smart

For most of these, the meanings should be fairly obvious. Some, however, are compounds. As an example, the color terms for white and black, ◻🌈 and ◼🌈, combine their first glyphs (ordinarily simple nominal particles) with 🌈, a regular derivation that makes color terms. Similarly, the numerous verbs with ➡ are derived from nouns; the second symbol here acts as a verbalizing suffix. And for adjectives, you can often use 〰, as we did with “smart”: 🧠〰.
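To make the derivation pattern concrete, here is a minimal Python sketch that splits a derived word back into its root and its category. The function name `analyze` and the category labels are my own inventions for illustration; only the three suffix glyphs (🌈, ➡, 〰) come from the description above. It also assumes each suffix is a single code point, ignoring any invisible variation selector some keyboards add.

```python
# Suffix glyphs from the article, written as escapes so the exact
# code points are unambiguous. The category labels are hypothetical.
SUFFIXES = {
    "\U0001F308": "color term",  # 🌈
    "\u27A1": "verb",            # ➡
    "\u3030": "adjective",       # 〰
}

VS16 = "\uFE0F"  # invisible variation selector some keyboards append


def analyze(word: str) -> tuple:
    """Split a word into (root, category); underived words get category None."""
    word = word.replace(VS16, "")  # assumption: selectors carry no meaning
    if len(word) > 1 and word[-1] in SUFFIXES:
        return word[:-1], SUFFIXES[word[-1]]
    return word, None


print(analyze("\U0001F9E0\u27A1"))      # 🧠➡  "to know"
print(analyze("\U0001F9E0\u3030"))      # 🧠〰  "smart"
print(analyze("\u25FC\U0001F308"))      # ◼🌈  "black"
print(analyze("\U0001F415"))            # 🐕   plain root, no suffix
```

A real analyzer would need a lexicon, since a bare 🌈 or ➡ at the end of a compound might be part of the root rather than a suffix. This sketch simply trusts the final glyph.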

Simplicity

The simplest sentences are those with nothing more than a subject, verb, and object. And, to make things even simpler, we’ll start with the most basic verb of all: “to be”. In 🖼🗣, that’s 👉. No need to worry about agreement suffixes or anything like that, though. For our purposes here, 👉 is all we need. (We’ll get to tenses in a later part.)

Here are a few examples to show what I mean:

  • 🧒 👉 😃. – The child is happy.
  • 👩 👉 🧠〰. – The woman is smart.
  • 👨 👉 💪. – The man is strong.
  • 🔥 👉 ♨. – Fire is hot.

Note that we don’t need any special word for “the”, either. It’s understood. (If you really think you need it, you can use 👇, though its meaning is closer to “this”.)

Our other verbs aren’t quite as easy to work with, but we can manage. The principle’s the same, after all:

  • 🧒 👀 🐕. – The child sees a dog.
  • 👨➿ 🚶. – The men are walking.
  • 🐈 🥤➡ 💧. – The cat drinks water.

Once again, don’t worry about the difference between English simple and progressive forms. 🖼🗣 doesn’t bother distinguishing the two.

You, me, and all the rest

Today, everybody’s worried about pronouns. Well, I’ve got you covered there, because 🖼🗣 has plenty of them.

Most languages make a distinction between persons: first, second, and third. To some extent, that’s what we’ll do here, but modern communication, especially on the Internet, is more geared towards a distinction between speaker, audience, and others. (Technically, that’s all the three degrees of person represent, but bear with me.)

A speaker’s solo pronoun is 👁️‍🗨. If they’re including others (whether inside or outside their audience), then this becomes 🤲. These are like “I” and “we”, respectively:

  • 👁️‍🗨 👉 🧠〰. – I am smart.
  • 🤲 👉 😃. – We’re happy.
  • 👁️‍🗨️ 👀 🔷 ⛅. – I see the blue sky.

(Important note: Some systems are not able to display or input the “compound” emoji 👁️‍🗨️. If yours is one of them, don’t despair. You can use 🤳 instead. It doesn’t mean exactly the same thing, as we’ll see in the next part, but it’s close enough.)
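If you’re curious why that one glyph causes trouble, it helps to look at what it is made of. The Python sketch below spells out the fully qualified “eye in speech bubble” sequence as escapes and lists its code points. The helper names `code_points` and `visible_parts` are hypothetical, chosen only for this illustration.

```python
# "Eye in speech bubble" is not one character but a sequence of five
# code points, two of which are visible emoji in their own right.
EYE_IN_SPEECH_BUBBLE = "\U0001F441\uFE0F\u200D\U0001F5E8\uFE0F"

ZWJ = "\u200D"   # zero width joiner: asks the font to fuse its neighbors
VS16 = "\uFE0F"  # variation selector-16: requests emoji presentation


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def visible_parts(text: str) -> list:
    """Drop the invisible joiner and selectors, keeping the component emoji."""
    return [ch for ch in text if ch not in (ZWJ, VS16)]


print(code_points(EYE_IN_SPEECH_BUBBLE))
# ['U+1F441', 'U+FE0F', 'U+200D', 'U+1F5E8', 'U+FE0F']
print(len(visible_parts(EYE_IN_SPEECH_BUBBLE)))
# 2
```

A system whose font lacks the fused glyph typically falls back to drawing the two visible parts, an eye and a speech bubble, side by side.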

But here’s where it gets interesting. If you’re only speaking of yourself, there’s really no reason to need that cumbersome pronoun. It’s implied, because you’re the one talking. So that first sentence can become “👉 🧠〰.” instead, and it’ll mean the same thing.

Only the singular speaker pronoun can be dropped like this, which sets 🖼🗣 apart from most spoken languages that allow pronouns to be dropped.

The listener pronouns are much simpler. In fact, “pronouns” is the wrong word, because there’s only one of them: 💮. Example:

  • 👁️‍🗨️️ 👀 💮. – I see you.

As with English “you”, this works for both singular and plural.

Last are what most languages call the third-person pronouns. Here, 🖼🗣 has a wide variety to choose from, so let’s take a look.

  • For talking about people in general: singular 👤, plural 👥
  • For talking about anything not human: singular ◻, plural ◻◻
  • For talking about only men: singular ♂, plural 👥♂
  • For talking about only women: singular ♀, plural 👥♀

Mostly, the first two pairs should be preferred, and the “general” form is required when you’re referring to mixed groups. And, of course, using the “non-human” pronouns when you want to talk about people is just wrong.

Some examples using these pronouns:

  • 👥 👉 😃. – They are happy.
  • ♂ 👉 ⬜ 👨. – He is a big man.
  • 👀 ◻. – I see it.
  • 👥♀ 👂➡ 💮. – They (i.e., those women) can hear you.

Possessed

Last in this little lesson, we’ll discuss the possessive form. As with many parts of 🖼🗣, that’s a little different from what you might expect. In fact, it’s one of the few cases where the script recycles English punctuation.

Our key here is the apostrophe, or single quote mark: ’. When put between two nouns (pronouns included), it indicates that the first possesses the second. So we might say 🧒’🐈 for “the child’s cat” or ♂’✋ for “his hand”.

These aren’t exactly compound nouns, but they can function much like them, fitting into sentences with ease.

  • 👁️‍🗨️’👁 👉 🔷. – My eyes are blue.
  • ♀’🧒➿ 👉 😃. – Her children are happy.
  • 👀 ⬜ 👨’🐕➿. – I see the man’s big dogs.

In the last example above, you can see a difference between the script and English, as far as word order is concerned. The possessive “attaches” to the head noun, even if there are modifying adjectives before it.

Also, you can “chain” possessives, as in ♂’🧒’👁➿ “his child’s eyes”.
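The possessive rules above are regular enough to sketch in a few lines of Python. The function names `possessive` and `noun_phrase` are hypothetical, and the sketch assumes the curly apostrophe used in the examples. It is an illustration of the pattern, not a reference implementation.

```python
APOSTROPHE = "\u2019"  # ’ as used in the examples above
PLURAL = "\u27BF"      # ➿ optional plural marker


def possessive(*nouns: str) -> str:
    """Chain nouns so each possesses the next; the last one is the head."""
    if len(nouns) < 2:
        raise ValueError("a possessive needs an owner and a possessed noun")
    return APOSTROPHE.join(nouns)


def noun_phrase(head: str, *adjectives: str) -> str:
    """Adjectives come first, then the (possibly possessed) head noun."""
    return " ".join([*adjectives, head])


he, child, eye = "\u2642", "\U0001F9D2", "\U0001F441"    # ♂ 🧒 👁
man, dog, big = "\U0001F468", "\U0001F415", "\u2B1C"     # 👨 🐕 ⬜

# "his child's eyes": ♂’🧒’👁➿
print(possessive(he, child, eye + PLURAL))

# "the man's big dogs": ⬜ 👨’🐕➿ (the adjective precedes the whole possessive)
print(noun_phrase(possessive(man, dog + PLURAL), big))
```

Treating the whole possessive as a single head is what puts the adjective before the owner, matching the word order in the last bulleted example.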

Moving forward

Now that you’ve seen a little bit more of this experiment, does it still seem so outlandish? Stay tuned, as this series will delve even deeper into the weird world of emoji, and the strange things we can accomplish when our language is allowed to use nothing else. 👀▶ 💮 🔜!

🖼🗣 : the emoji conlang, part 1

I talked about this a while back, but now it’s for real. Today, I introduce to you a new conlang: 🖼🗣. Or, to put the name in something pronounceable, Pictalk. Yes, the glyphs making up the name are emoji. Yes, so are all the characters used in the entire language.

Strictly speaking, Pictalk isn’t a full-fledged conlang. It’s written-only, first of all. There is no true spoken form. Instead, it should be considered something closer to a conscript, an artificial writing system, modeled after hieroglyphic and ideographic scripts. But that’s enough to encode ideas, thoughts, sayings, and anything that might need to be written in this modern, digital age.

Glyph inventory

The hardest part about making Pictalk is the very restricted set of available glyphs. True, there are over 1200 emoji characters available, and they cover a wide variety of concepts, from animals to emotions to transportation and more. But I don’t have control over which symbols the Unicode Consortium adds to the list. While that list will grow (they add more each year, it seems), there’s little rhyme or reason to which new characters come in.

But that’s okay. We can do this. English only needs 26 letters, right?

Even with the wide array we have, it’s safe to discard quite a few right off the bat. First, I’ll drop the “cat face” group, such as 😸, because they really only repeat the normal human smileys. Next, toss out the handful of CJK ideographs in circles or squares, like 🈹—I’m an English speaker, and even Unicode gives up on giving them reasonable names. The skin tone modifiers (🏻 and friends) don’t make sense in the context of language; Pictalk thus won’t give them meanings, but will allow them to modify other symbols as a kind of synonym.

Likewise (and here’s where we start getting into the grammar bits), gendered forms like 👩‍🏫 or 👨‍⚕️ are synonymous with their “base” forms. With many languages, particularly in the West, where there is no neuter form, masculine is considered the default. Pictalk, however, is gender-neutral. That’s not out of some misguided idea of social justice or diversity, but simple expedience. Unicode has neuter forms for most of what we might call agentive glyphs. Where it doesn’t, we can use either, and that’s fine.

Last, flags. These take up a good chunk of the emoji list (about 15%, all told), and they’re mostly country flags. Well, for Pictalk, those flags represent their countries, and that’s that. Unlike most other characters, they don’t really participate in the construction processes we’ll see later on.
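Two of these inventory rules map neatly onto how Unicode encodes the characters, so here is a rough Python sketch. It assumes skin tones are the five modifier code points U+1F3FB to U+1F3FF and that a country flag is a pair of regional indicator symbols. The function names are hypothetical, and gendered forms and non-country flags are deliberately left out.

```python
# The five skin tone modifiers, 🏻 through 🏿.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)}


def strip_skin_tone(text: str) -> str:
    """Reduce toned glyphs to their base forms, which Pictalk treats as synonyms."""
    return "".join(ch for ch in text if ch not in SKIN_TONES)


def is_regional_indicator(ch: str) -> bool:
    """Regional indicators are the 26 letter-like symbols flags are built from."""
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def is_country_flag(glyph: str) -> bool:
    """A country flag is exactly two regional indicators, e.g. F + R for France."""
    return len(glyph) == 2 and all(is_regional_indicator(ch) for ch in glyph)


waving_hand_medium = "\U0001F44B\U0001F3FD"   # 👋🏽
flag_france = "\U0001F1EB\U0001F1F7"          # 🇫🇷

print(strip_skin_tone(waving_hand_medium) == "\U0001F44B")  # True
print(is_country_flag(flag_france))                          # True
print(is_country_flag("\U0001F415"))                         # False
```

This check only confirms the shape of the sequence. It does not verify that the two letters form a real country code.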

Non-emoji characters

Before we get to that, let’s go over the rest of Unicode. Obviously, since the whole point of Pictalk is to create a hieroglyphic script using the emoji characters, they’re the focus. But we’ve got a few other options available. One I won’t use is Latin letters. Or, for that matter, any other alphabetic script. In earlier versions of the language, I did utilize them for derivation and some small grammatical particles, but I’ve since removed the need for them. Only proper names use alphabetic characters; these are written as they would be in either the speaker’s or the audience’s preferred language.

Numbers, on the other hand, are perfectly usable. They’re already a little bit ideographic, after all, so it wouldn’t destroy the purity of Pictalk to include them. So 0-9 work exactly as they would in English: as the numerals zero through nine. And you can build on that as you do in English. (Pictalk is base-10, by the way.)

Punctuation works the same, as it’s very difficult to design a conlang that doesn’t need it. So sentences can end with a period, question mark, or exclamation point. Quote marks work for, well, quotes. Commas aren’t as necessary, but you can still use them to mark off clauses. Colons, besides having their normal English function, are used as attention-getters, in a sense, following the intended recipient of a statement or question. And we’ll see the other “special” characters as they come up.

Building words

Quite a few emoji work as words by themselves. Think of 🐕, 😄, or ✈, for instance. In Pictalk, that’s the most basic sort of word, and most symbols can function alone. Some are considered nouns, others adjectives or verbs, but there’s always a way to convert them.

Other symbols are “bound”, in that they can only occur fixed to others. An example here would be the (optional) plural marker ➿. By itself, it has no meaning. Suffixed to a root, whether a single symbol or a string of them, it gains meaning: 🐕➿ “dogs”.

More complex are the compound symbols that make up the bulk of the lexicon. In general, nominal compounds are head-final, as in 🐕🏠 “doghouse”, while verbal compounds are often head-initial, as with 📖🏫 “study”, from 📖 “read”. I’ve tried to refrain from being cute with meanings, striving instead for transparency, but some compounds remain idiosyncratic in meaning.

Last, a form of word-building that English doesn’t often employ comes into its own in Pictalk. Reduplication is productive for many basic words. For nouns, it can create a kind of collective sense: 🏠🏠 “neighborhood”. Verbs instead use reduplication as an intensifier: 💭💭 “to contemplate” (or possibly “to overthink”).
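Since all three word-building processes boil down to string concatenation, a short Python sketch can show them side by side. The function names are made up for this example, and the glyphs are written as escapes only to keep the code points unambiguous.

```python
PLURAL = "\u27BF"  # ➿ bound plural marker, meaningless on its own


def pluralize(word: str) -> str:
    """Suffix the optional plural marker to a root or a whole compound."""
    return word + PLURAL


def compound(*parts: str) -> str:
    """Concatenate glyphs into one word; the ordering rules are up to the caller."""
    return "".join(parts)


def reduplicate(word: str) -> str:
    """Double a word: collective for nouns, intensifier for verbs."""
    return word * 2


dog, house, thought = "\U0001F415", "\U0001F3E0", "\U0001F4AD"  # 🐕 🏠 💭

print(pluralize(dog))                    # 🐕➿   "dogs"
print(compound(dog, house))              # 🐕🏠   "doghouse" (head-final)
print(reduplicate(house))                # 🏠🏠   "neighborhood"
print(reduplicate(thought))              # 💭💭   "to contemplate"
print(pluralize(compound(dog, house)))   # 🐕🏠➿  plural of the compound
```

Nothing here knows whether a glyph is a noun or a verb, so the meaning of a reduplicated form still depends on the lexicon.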

Moving on

All in all, I think this just might work. We can make words using only emoji characters. Next up, we’ll see how far we can go in making a language.

A mad experiment

Today, most of the world uses alphabetic scripts, or something fairly close to them. With the major exception of Chinese (and the writing systems derived from it, such as those in Japan and Korea), alphabets, consonantal scripts, and the like reign supreme. They’re easier to learn, obviously, and far more suited to computers, so it’s only natural. Simple scripts, in the vast majority of cases, work just fine, so that’s what we use.

But it wasn’t always this way.

If you look back at the history of writing, you see that alphabets were not the original form of script. Indeed, assuming current theories are correct, writing developed first as pictorial representations of people, animals, etc. Abstractions came in later, as did the practice of using glyphs to represent spoken language, rather than as something closer to an aide mémoire.

The oldest evidence of writing we have all points in the same direction. Egyptian hieroglyphs, Sumerian cuneiform, and ancient Chinese symbols share the common feature of being, at least in some part, logographic scripts. The same may be true of other, mostly undeciphered writing, such as the Proto-Elamite script or that of the Indus Valley; given their age, it doesn’t seem out of the realm of possibility. While China kept its style of writing through the millennia, occasionally simplifying but never throwing away, the rest have mostly died out, replaced by Latin, Greek, Cyrillic, Arabic, the various scripts of India and Southeast Asia, and so on.

Enter madness

But wait. Anyone with a cellphone (which is to say, well, anybody) has at their disposal a vast and growing collection of bona fide ideograms: emoji. Can we use those as the basis for a modern-day hieroglyphic script?

I know what you’re thinking. “Michael, you’ve gone completely crazy!” you probably shouted at your computer screen.

You’d be right, but hear me out. I am being totally serious. Think about it. As of 2018, there are over 1000 emoji symbols in the Unicode standard, and they’re adding more with every update. Granted, most of the new ones are gender-specific versions of older ones, but you still see a genuine emoji every now and then. (“Lobster” was in the newest batch, I think.)

Most emoji fall into one of two categories. One is clearly nominal in nature: animals, vehicles, people, and so on. The other is the emotional set: grinning faces, smilies, and the like. Those can be considered adjectives, if you look at it the right way. Verbs, now, those are harder, but not impossible.

So here’s what I propose. Take the emoji, minus a few that aren’t really all that useful to English speakers (think the “cat faces”, or the numerous symbols containing Japanese writing), and construct a script. Or, if you will, a written-only conlang. Technically speaking, it would be something more akin to a pidgin. It would have no vocabulary of its own, and the grammar would necessarily be very stripped-down.

The limitations are severe, but operating under limiting conditions is the time-honored path of the hacker (in the original sense of the word). Here, we have no control over the inventory of symbols, no convenient way of even typing them, much less pronouncing them. And there’s no real payoff, either. If I did this, it would be for fun, not for glory.

Yet none of that ever stopped me before, so why should it now?

If you’re interested, stick around. I’ll post something more about this mad scheme in the coming weeks.

The problem with emoji

Emoji are everywhere these days. Those little icons like 📱 and 😁 show up on our phones, in our browsers, even on TV. In a way, they’re great. They give us a concise way to express some fairly deep concepts. Emotions are hard to sum up in words. “I’m crying tears of joy” is so much longer than 😂, especially if you’re limited to 140 characters of text.

From the programmer’s point of view, however, emoji can rightfully be considered a pox on our house. This is for a few reasons, so let’s look at each of them in turn. In general, these are in order from the most important and problematic to the least.

  1. Emoji are Unicode characters. Yes, you can treat them as text if you’re using them, but we programmers have to make a special effort to properly support Unicode. Sure, some languages say they do it automatically, but deeper investigation shows the hollowness of such statements. Plain ASCII doesn’t even have room for all the accented letters used by the Latin alphabet, so we need Unicode, but that doesn’t mean it’s easy to work with.

  2. Emoji are on a higher plane. The Unicode character set is divided into planes. The first 65,536 code points are the Basic Multilingual Plane (BMP), running from 0x0000 to 0xFFFF. Each further plane is considered supplemental, and many emoji fall in the second plane, with code points around 0x1F000. At first glance, the only problem seems to be the extra storage required to represent each emoji (four bytes in UTF-8, versus at most three for anything in the BMP), but…

  3. UCS-2 sucks. UCS-2 is the fixed-width predecessor to UTF-16. It’s obsolete precisely because it can’t handle higher planes, but we still haven’t rid ourselves of it. JavaScript, among others, essentially uses UCS-2 strings, and this is a very bad thing for emoji. They have to be encoded as a surrogate pair, using two otherwise-invalid code points in the BMP. It breaks finding the length of a string. It breaks string indexing. It even breaks simple parsing, because…

  4. Regular expressions can’t handle emoji. At least in present-day JavaScript, they can’t. And that’s the most used language on the web. It’s the front-end language of the here and now. But the JS regex works in UCS-2, which means it doesn’t understand higher-plane characters. (This is getting fixed, and there are libraries out there to help mitigate the problem, but we’re still not to the point where we can count on full support.)

  5. Emoji are hard to type. This applies mostly to desktops. Yeah, people still use those, myself included. For us, typing emoji is a complicated process. Worse, it doesn’t work everywhere. I’m on Linux, and my graphical applications are split between those using GTK+ and those using Qt. The GTK+ ones allow me to type any Unicode character by pressing Ctrl+Shift+U and then the hexadecimal code point. For example, 😂 has code point 0x1F602, so I typed Ctrl+Shift+U, then 1f602, then a space to actually insert the character. Qt-based apps, on the other hand, don’t let me do this; in an impressive display of finger-pointing, Qt, KDE, and X all put the responsibility for Unicode handling on each other.
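The arithmetic behind points 2 and 3 is short enough to sketch in Python. Python itself counts whole code points, so the sketch computes the UTF-16 view explicitly to mimic what a UCS-2 style length (such as JavaScript’s `length`) reports. The function names are hypothetical.

```python
def plane(ch: str) -> int:
    """Unicode plane of a single character; plane 0 is the BMP."""
    return ord(ch) >> 16


def surrogate_pair(ch: str) -> tuple:
    """Compute the two UTF-16 surrogates for a character outside the BMP."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("BMP characters need no surrogates")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


def utf16_length(text: str) -> int:
    """Count UTF-16 code units, the number a UCS-2 style length reports."""
    return len(text.encode("utf-16-le")) // 2


joy = chr(0x1F602)  # 😂, the same hex value typed after Ctrl+Shift+U

print(plane(joy))                                   # 1
print([hex(unit) for unit in surrogate_pair(joy)])  # ['0xd83d', '0xde02']
print(len(joy), utf16_length(joy))                  # 1 2
print(len(joy.encode("utf-8")))                     # 4
```

That gap between 1 and 2 is why length and indexing go wrong: code that indexes by UTF-16 unit can land in the middle of a surrogate pair and return half a character.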

So, yeah, emoji are a great invention for communication. But, speaking as a programmer, I can’t stand working with them. Maybe that’ll change one day. We’ll have to wait and see.