Every so often, you go through a period of deep metaphysical malaise. You look around at the world, wondering how such evil could flourish and such suffering could endure. You descend deeper into darkness, your faith in humanity waning, wondering why we were ever born into this cruel world. Then, suddenly, you realize that somebody has written a programming language based off of the dialect of Lolcats/Cat Macros, and your faith in humanity’s inherent good is completely restored.

LOLCode is a computer programming language concept which draws its vocabulary from the recent internet sensation of captioned cat pictures. Although not fully functional yet, it’s still linguistically fascinating on many different levels, and deserves mention.

i has dialect

One of the most interesting parts of this programming language is that it can exist at all, and the fact that it can goes a long way towards establishing the legitimacy of a feline dialect.

Imagine that I wanted to create a programming language based solely off of Star Wars vocabulary. I would likely start by finding a donor language, whose basic syntax and ideas I would borrow. Then, I would begin to slowly find equivalents and their translations.

Some equivalent/translation pairs might be obvious. ‘Death Star’ for a verb which meant “remove file”, maybe ‘carbonite’ for “pause process”. One could even get a bit more ornate and incorporate some movie quotes. Perhaps “there is an error” could be coded with ‘It’s a Trap!’, and “load this program” could be ‘Commence Primary Ignition’.

However, no matter how nerdy I felt at the time, my plan would be fatally flawed from the outset. Sooner or later, I would find an expression that was too niche (fulfilling just a small purpose) to have a Star Wars equivalent. I’d have to rely on a set canon of phrases to fill in the blanks, and there’s no way to work around it and still maintain the Star Wars theme.
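To make the problem concrete, here is a minimal Python sketch of that vocabulary-mapping step. The phrase pairs come from the examples above; the operation names on the right-hand side (`remove_file`, `bitwise_shift`, and so on) and both helper functions are invented purely for illustration and don't belong to any real language.

```python
# Hypothetical vocabulary: themed phrase -> donor-language operation.
# The operation names are made up for this sketch.
STAR_WARS_VOCAB = {
    "Death Star": "remove_file",
    "carbonite": "pause_process",
    "It's a Trap!": "raise_error",
    "Commence Primary Ignition": "load_program",
}


def translate(phrase):
    """Return the donor operation for a themed phrase, or None if there is none."""
    return STAR_WARS_VOCAB.get(phrase)


def missing_equivalents(needed_operations):
    """List the donor operations that have no themed phrase yet."""
    covered = set(STAR_WARS_VOCAB.values())
    return [op for op in needed_operations if op not in covered]


# The canon covers the obvious cases, but a niche operation falls through.
print(translate("carbonite"))
print(missing_equivalents(["remove_file", "bitwise_shift"]))
```

The table can only ever be as large as the canon it draws from, so `missing_equivalents` keeps returning something as the donor language's list of operations grows.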

The reason that LOLCode is so awesome is that, based on what I’ve seen so far, it doesn’t seem to have that limit. Based on my highly scientific research at icanhascheezburger.com, it would appear that LOLCat has become a full-fledged dialect. There are many captioned images there, each slightly different, and each seems to fit a coherent grammatical pattern. Some linguists are starting to pick up on distinct patterns and grammatical rules, and based on the fact that any sentence can now be LOLCatted, I’m quite tempted to say that LOLCat has become a productive and functional dialect of English.

Because of this productivity of the LOLCat dialect, it would be quite possible for somebody to take any given sentence or idea and put it into LOLCat, thus ensuring that LOLCode could, in theory, become fully functional without ever breaking character. This is very exciting, and very awesome.

mai translationz r not straitforwerd

LOLCode is a very special sort of translation. Conventionally, when one sits down to label a cat, the source is an English sentence (I’ve yet to find any cats “en mi refrigeradora, comiendo mis comidaz”). However, here, what people are doing is finding equivalents in human/feline language for concepts, verbs, and ideas within a computer language.

Rather than being able to simply translate, they’re forced to create the inflexible, ambiguity-free grammar required to tell a computer what to do. This is tough enough to do even using all sorts of abstract symbols, but to do it within LOLCat dialect and syntax is wonderfully difficult. They’re adapting a human language into a dialect, then bending it into a computer language. This is by no means an easy ask, and it’s a far more complex sort of translation than many.

For this alone, I salute the creator and contributors to LOLCode. Although it may seem silly to some, this is really some top-of-the-line linguistic work.

d00d. ur dialect is teh suxx0rs

Perhaps even more interesting than the mere fact that LOLCat has become a translatable dialect is the fact that, well, there are already people who are arguing about the “correct” way to say something in LOLCat. Take, for instance, this post on the LOLCode wiki:

I know VISIBLE is the current output command, but it’s so not LOLCAT. What if we used LOL as the output instead? So, the Count-1 example becomes:

(Code)

I think this works very well, is funny to read and matches actual LOLCAT protocol, sorta. I guess the LOL would be at the end normally.

As a linguist, this is really, really exciting. People are already trying to step in and enforce the “rules” of the LOLCat dialect. It seems like, as a “native speaker” of LOLCat, the author of this page had a distinct intuition about the “proper” means of expressing a concept in this dialect. Truly incredible.

Although this community of people has only arisen recently, I’m very excited at the potential for the later discussions of “proper” LOLCat, and the sociolinguistic goodness sure to arise from it.

o hai. i discussed ur werk.

So, author of (and contributors to) LOLCode: I salute you. This is a unique, wonderful, and groundbreaking project, and I really hope that it continues to yield such fascinating linguistic insight into the future.

Keep up the good work, and don’t let anybody convince you that what you’re building is silly or unnecessary. If there are two things that the world of technology needs, it’s probably humor and cute, fuzzy animals, and really, I can’t think of a better way to combine the two.

Alright, I’m done. kthxbye

Tagged with Computational Linguistics, Conventional Linguistics, Dialects and Idiolects, Language Humor, Language Usage, Language, Computers, and the Internet, Sociolinguistics, Translation and Translation Theory | 32 Comments


I’m a big fan of the Quote Database at bash.org (Not safe for work, may contain strong language and subject matter). The site is a pasteboard for funny quotes taken from online chats on IRC and other instant message chat services. Although some of them are just wonderful in their own right (here, here and here), many of them have to do with language and language related issues.

One example of a Bash.org quote about language is this one, reproduced here in its entirety:

< %kiwibonga> Je ne donne pas un merde – I don’t give a shit
< %kiwibonga> THAT MAKES NO SENSE
< %kiwibonga> you cannot give a shit to someone
< %kiwibonga> in french
< %kiwibonga> that sounds like “I’m taking a shit in my hands and I’m keeping it for myself”

(For those unfamiliar with the source here, the above quote is referring to the English idiom “I don’t give a shit”, which means, roughly, “I really don’t care” or “I couldn’t care less”.)

This is a wonderful (and humorous) example of the fact that one cannot literally translate some idioms into another language and expect them to retain their meaning.

In many ways, an idiom is a phrase which has cultural meaning independent of the words that make it up. If I say “that’s the way the cookie crumbles” to a politician who just lost an election, I’m not implying that his campaign sat out too long, got stale, and then broke into small pieces when touched. Instead, I expect him to know that I’m saying that such things happen in life, and that I sympathize. There’s nothing in the words per se that carries the meaning, but instead, it’s based in a certain cultural knowledge shared by the two people.

When you start translating these idioms, you end up copying over the words, but the meaning is lost because there’s no shared cultural background. Once that’s lost, one has to read the literal meaning of the words, and thus, “I’m taking a shit in my hands and keeping it for myself”.

This principle isn’t necessarily universal. If I said “A bird in hand is worth one hundred flying” (from Spanish), most people could understand it to mean the same thing as the idiom “A bird in the hand is worth two in the bush”. “That’s flour from a different sack” (also Spanish), in context, would likely be understood to mean “That’s a whole different story”.

However, in most cases, the meaning of an idiom comes not from the words themselves, but from the originating culture. The moral of this story: When you translate idioms word-for-word, if the snake bites you, there’s no remedy in the pharmacy.

(That, or you’re playing with fire. Either way.)

Tagged with Conventional Linguistics, Language Humor, Language Usage, Translation and Translation Theory, Words, Phrases, and Idioms | 2 Comments


So, I’m somewhat obsessed with checking the statistics of who comes here, who gets referred from where, and what search terms they used to find me. Well, the other day, somebody came here from Google searching for “IPA translation widget”. For those of you unfamiliar with the terms, a “widget” is a small program written for Apple’s Dashboard interface, and IPA refers to the International Phonetic Alphabet. What this person seemed to want was a widget that, like some existing translation widgets, could take a block of text and immediately turn it into IPA characters. For the first few moments, I thought “Wow! That’d be a great idea!”.

Now, as somebody who uses the IPA very, very frequently, such a thing would be wonderful if it worked well. However, I think it would be impossible to actually create a program that goes from English writing to IPA transcriptions without incredible advances in Artificial Intelligence and speech recognition. Here’s why…

Transcription, not translation

At the surface, this doesn’t seem so crazy. Apple includes a widget to do rough, automated translations with Dashboard, and although I never trust automated translations, it does alright for basic words and phrases. I suspect that our anonymous searcher saw that widget and thought “Wow, cool! I wonder if it can help me put something into the IPA”. However, the fundamental difference between translating a sentence into Spanish and putting that same sentence into the IPA is that the IPA isn’t really a language at all, but instead, it’s a method of writing sounds.

The International Phonetic Alphabet is really a set of symbols, each of which represents a sound, sound characteristic, or other element of spoken language. What the IPA allows a linguist (or speech pathologist, or teacher…) to do is to take spoken language and put it onto paper (‘transcription’) with a great deal more precision than most other writing systems. The IPA isn’t a language in itself; it’s just an alternative, phonetic writing system for other languages. The beauty of this is that the IPA is designed to work not just for English, but for any language: the same symbols can be used to transcribe sounds from languages all over the world.

Broad vs. Narrow Transcription

The IPA can be used to transcribe sounds with two different degrees of precision.

If one takes advantage of all the symbols and diacritics, one can make a “narrow” or “phonetic” transcription. At this level, the linguist aims to capture all the detail possible about the word or phrase, including variations across word boundaries, sounds that occur in speech but are unnoticed or unrecognized by native speakers, and even features like intonation and pauses. From these transcriptions, a well-trained linguist could pronounce the words and phrases almost exactly as the speaker did, based simply on the transcriptions. The first, smallest line in the title graphic is a narrow transcription of me pronouncing the site’s title.

This degree of precision would be impossible for a modern computer widget to produce, simply because narrow transcriptions are based on actual words and phrases as pronounced by a particular speaker, and really, one needs a fairly trained ear to make an accurate narrow transcription of a word or phrase. Sure, it could use a database of narrowly transcribed words from other speakers, but really, that’s not a narrow transcription. It’s not going to pick up on the variations that each speaker produces, like accents, vowel changes, unusual sound choices, or even tiny speech errors.

The alternative, called “broad” or “phonemic” transcription, expresses the basic sounds of a language or phrase, often more precisely than the native writing system, but at the same time, leaves out detail that’s not necessary to a native speaker. The middle line in the title graphic for this page is a phonemic transcription. Some dictionaries, including the built-in OS X dictionary (if you enable IPA in Dictionary Preferences), can show you the standard American broad IPA transcription of a word.

Now, using a dictionary of words in a given language and their IPA equivalents, a computer could likely match things and give a passable broad transcription. However, there are variations that occur between people that show up even at a broad level, and are large enough to identify a speaker’s accent, dialect, or even idiolect. For some people (myself included), “caught” and “cot” have the same vowel, but for others, they’re two distinct vowels. So, even at a broad level, you’re not going to get any sort of reliable transcription of one’s actual speech from a computer widget, just a rough approximation.
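For the curious, here is a rough Python sketch of what that dictionary-matching approach might look like. Everything in it is an assumption made for illustration: the four-word dictionary, the `transcribe` function, and the `merged` switch are all hypothetical, and a real widget would need a pronunciation dictionary with tens of thousands of entries.

```python
import re

# Tiny hypothetical pronunciation dictionary (broad, General American-style).
BROAD_IPA = {
    "the": "ðə",
    "cat": "kæt",
    "cot": "kɑt",
    "caught": "kɔt",
}

# Speakers with the cot-caught merger use the same vowel in both words.
MERGED_OVERRIDES = {"caught": "kɑt"}


def transcribe(text, merged=False):
    """Look each word up in the dictionary; flag words that are missing."""
    words = re.findall(r"[a-z']+", text.lower())
    out = []
    for word in words:
        if merged and word in MERGED_OVERRIDES:
            out.append(MERGED_OVERRIDES[word])
        else:
            # Unknown words are marked instead of guessed at.
            out.append(BROAD_IPA.get(word, f"<{word}?>"))
    return "/" + " ".join(out) + "/"


print(transcribe("The cat caught the cot"))
print(transcribe("The cat caught the cot", merged=True))
```

Note that the program has to be told which kind of speaker it’s transcribing. It has no way of working that out from the text alone, and that’s before getting into anything finer than one vowel merger.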

Why are you transcribing anyways?

In the end, whether such a widget would be useful at all boils down to your reason for needing a transcription. Some people might be learning English and would want a better method of knowing how a given word is supposed to sound. For that, any good dictionary’s pronunciation key should do the trick.

Some people might be interested in the IPA, or want to know how a given word sounds. For that, they’d be better off getting a good phonetics textbook and learning a bit of the IPA themselves, along with some knowledge of phonetics.

However, our widget searcher might just be stuck in an introductory Linguistics course, having to transcribe their speech for an assignment. If so, I offer just one piece of advice: Don’t plagiarize transcriptions off the web or from a dictionary. Your professor should have no trouble noticing if you’re not transcribing your own dialect, and everybody’s got a dialect.

Remember, if there’s one thing that phonetics professors are good at, it’s picking out a phone-y.

Tagged with Computers and Software, Conventional Linguistics, Phonetics and Phonology, Translation and Translation Theory | 21 Comments

