How I write a lexicon

Yeah, another conlanging post. I’ll move on to other things sooner or later.

So, I have the sprawling mess of a grammar for Ŋyjichɯn (which I need to upload again), but that’s not where I store my lexicon (or dictionary, or whatever you personally want to call it). The way I do it is culled from experience and things I’ve learned, mostly from the CBB board (aka, the conlang forum – there are others out there though) and Zompist (linking to the online Language Construction Kit, but looking through his other stuff is helpful as well).

I do it in Excel, because I tend to work on several computers and I haven’t found a portable lexicon program. It always starts with three columns: (conlang), English, and notes. Then columns get added in as I need them. The Ŋyjichɯn lexicon has the following columns, in order:

  • Alpha – the Ŋyjichɯn alphabetical order. It looks like gibberish because I use find and replace in another sheet or text document (or Zompist’s Sound Change Applier 2, aka SCA2) to change every letter to something else.
  • Group – a temporary column, to pull things out that I want to work on. Right now there are three groups: 1) 75 words I pulled out to completely fill in, 2) words that I want to fix right now, and 3) blank, aka everything else.
  • Lexeme – a lexeme is the basic form of the word before inflection. This is so I can sort it in English alphabetical order, because of the next column.
  • Modern – Ŋyjichɯn is going to get split into two dialects with the same grammar (mostly), but different vocabulary. Modern Ŋyjichɯn is what I’m working on right now. This column has the words, with some inflections. It also screws up the sorting, because some things (mostly pronouns) start with notes like ‘subject singular’ or something in parentheses because the singular form isn’t used (eg, “(sing: rɯs)”). It could have been done in a better way. In the same cell, on a different line, are the paucal (small group) and plural of each noun, and the full inflection of the pronouns.
  • Stress – using no Unicode, I spell out the stress pattern of each word. This then gets run through SCA2 to give me the next column.
  • Phonetic – like most languages, spoken Ŋyjichɯn doesn’t exactly fit the written version. It’s not as bad as some, but stress, phonemes interacting, and other factors lead to things like ‘nyma’ being pronounced ‘mima’ or ‘miftyk’ becoming ‘mistych’.
  • Combo – an abbreviated form of the words is used in certain situations.
  • Part of speech – normal stuff, except in Ŋyjichɯn most words can be used as verbs, so I’ve split the parts of speech into things like ‘descriptive (verb)’, ‘noun/static verb’, and then the normal stuff.
  • English – self-explanatory, but it’s important to avoid one-to-one relations with English words as much as possible.
  • Irr? – notes of whether and how a word is irregular
  • Etymology – Mostly empty (or actually marked ‘same’), but has things like ‘onomatopoeic’ and what words compounded to make another word.
  • Category – this is so I can find similarly themed words when I want to. I’ve got things like color, language, directions, animals, etc, in a drop-down list (which I keep breaking when I add new columns) (I got the code for how to do it here.)
  • Notes – basically, anything else, including usage notes, historical notes, and what words it’s related to.
  • Wanrin & Tajin – these two columns are empty right now.
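
For anyone who would rather script the Alpha column than do the find and replace by hand, here is a rough sketch of the same idea in Python. The alphabet order below is made up purely for illustration (it is not the real Ŋyjichɯn order), and the names ALPHABET and alpha_key are just placeholders. Each letter gets swapped for a plain ASCII letter, so the result looks like gibberish but sorts in the conlang's own order.

```python
# Hypothetical alphabet order, for illustration only.
# This is NOT the real Ŋyjichɯn alphabetical order.
ALPHABET = ["ŋ", "y", "j", "i", "ch", "ɯ", "n", "m", "a", "r", "s"]

# Each conlang letter is replaced by a plain ASCII letter, so any
# spreadsheet or text tool can sort on the result.
PLACEHOLDERS = "abcdefghijklmnopqrstuvwxyz"
LETTER_TO_KEY = {letter: PLACEHOLDERS[i] for i, letter in enumerate(ALPHABET)}

# Try longer letters first so the digraph "ch" is not split up.
LETTERS_BY_LENGTH = sorted(ALPHABET, key=len, reverse=True)


def alpha_key(word):
    """Return the gibberish-looking sort key for a word."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        for letter in LETTERS_BY_LENGTH:
            if word.startswith(letter, i):
                key.append(LETTER_TO_KEY[letter])
                i += len(letter)
                break
        else:
            # Unknown character: sort it after every known letter.
            key.append("~")
            i += 1
    return "".join(key)


words = ["nyma", "rɯs", "ŋyji", "chɯn"]
print(sorted(words, key=alpha_key))
# → ['ŋyji', 'chɯn', 'nyma', 'rɯs']
```

With this made-up order, ‘ŋyji’ gets the key ‘abcd’ and sorts first, which is the same trick the Alpha column plays inside the spreadsheet.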

This is what it ends up looking like:


(you’ll probably have to click on it to actually be able to read it)

I also keep adding worksheets. Besides the lexicon, I’ve got the notes for alphabetizing, the category list, and a list of parts of speech. And I have most of the numbers in a separate file altogether. There’s also an Excel file of words I need to translate, along with notes about etymology and stuff.

So, does every lexicon need all this junk?

Nope. It depends on the language and what you’re doing with it. If you’re just making a naming language that will end up with a couple handfuls of words, you’ll probably need less detail. You have to customize it to what you need. Some people end up programming things for themselves (I can’t do that). If I wasn’t using multiple computers, I’d love to use Lexique Pro, which is designed for linguists. But Excel works well for sorting and finding things.
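
For the small end of that scale, the three starting columns (conlang, English, notes) saved as a CSV file are about all that's needed. A minimal sketch in Python, assuming a CSV export with those three headers; every word and gloss in it is an invented placeholder, and load_lexicon and find are hypothetical helper names:

```python
import csv
import io

# Stand-in for a small CSV export of the spreadsheet. Every word and gloss
# here is an invented placeholder, not real vocabulary from any language.
SAMPLE = """conlang,english,notes
tala,"river, stream",placeholder entry
miro,"stone, rock",placeholder entry
suke,"to flow",placeholder entry
"""


def load_lexicon(text):
    """Parse CSV text into a list of dicts, one per entry."""
    return list(csv.DictReader(io.StringIO(text)))


def find(entries, term):
    """Return entries whose English gloss contains the search term."""
    term = term.lower()
    return [e for e in entries if term in e["english"].lower()]


lexicon = load_lexicon(SAMPLE)

# Sorting: list every entry in plain alphabetical order of the conlang word.
for entry in sorted(lexicon, key=lambda e: e["conlang"]):
    print(entry["conlang"], "=", entry["english"])

# Finding: look up which words have "stone" among their glosses.
print([e["conlang"] for e in find(lexicon, "stone")])
# → ['miro']
```

Keeping several glosses in one English cell, as in the placeholder rows, is one simple way to avoid the one-to-one trap while still being able to search on any of them.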

