<TITLE>Mnemonic encoding wordlist</TITLE>
<H1>About the wordlist</H1>
The most time-consuming part of this project has been the compilation of the
wordlist. Writing the reference implementation has been trivial in comparison.
In the compilation of the wordlist I have scanned hundreds of thousands of
words using a combination of automated tools and manual sorting. I estimate
that over 300 hours and around 50 ad-hoc awk scripts have gone into the
creation of the wordlist. <P>
The wordlist still isn't perfect. <A href="/web/20090918202746/">See how you can help improve it.</A><P>
Remember that at this stage the list is not final and encoded information will
not be compatible with future versions. When the list reaches version 1.0 it
will be frozen and no modifications will be made except, perhaps, spelling
corrections that are still accepted by the soundalike matching. <P>
<UL><LI><A href="">View the wordlist.</A><P></UL>
<H2>Word selection criteria</H2>
<LI>The wordlist contains 1626 words.
<LI>All words are between 4 and 7 letters long.
<LI>No word in the list is a prefix of another word (e.g. visit, visitor).
<LI>Five letter prefixes of words are sufficient to be unique.
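The strict criteria above lend themselves to a mechanical check. The following is a minimal sketch in Python, not one of the awk scripts actually used for the list; the function name <CODE>check_strict_criteria</CODE> and the sample words are made up for illustration.<P>

```python
from collections import defaultdict


def check_strict_criteria(words, min_len=4, max_len=7, prefix_len=5):
    """Return a list of violations of the strict wordlist criteria.

    Hypothetical helper for illustration only: each violation is a tuple
    whose first element names the broken rule.
    """
    problems = []
    unique = sorted(set(words))

    # Rule: all words are between min_len and max_len letters long.
    for word in unique:
        if not min_len <= len(word) <= max_len:
            problems.append(("length", word))

    # Rule: no word is a prefix of another word. In sorted order, a word
    # that is a prefix of some other word is always immediately followed
    # by a word that starts with it, so comparing neighbours is enough.
    for shorter, longer in zip(unique, unique[1:]):
        if longer.startswith(shorter):
            problems.append(("prefix", shorter, longer))

    # Rule: the first prefix_len letters must identify a word uniquely.
    by_prefix = defaultdict(list)
    for word in unique:
        by_prefix[word[:prefix_len]].append(word)
    for prefix, group in sorted(by_prefix.items()):
        if len(group) > 1:
            problems.append(("shared-prefix", prefix, *group))

    return problems


if __name__ == "__main__":
    sample = ["hotel", "radio", "respect", "respond", "visit", "visitor"]
    for problem in check_strict_criteria(sample):
        print(problem)
```

With the sample above, only the pair "visit" and "visitor" is reported, once as a prefix violation and once as a shared five letter prefix.<P>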
The rest of the criteria are less strict. You may find exceptions to all of
them because it is difficult to satisfy them all at the same time.<P>
<LI>The words should be usable by people all over the world. The list is far
from perfect in that respect. It is heavily biased towards western culture
and English in particular. The international vocabulary is simply not big
enough. One can argue that even words like "hotel" or "radio" are not truly
international. You will find many English words in the list but I have tried
to limit them to words that are part of a beginner's vocabulary or words that
have close relatives in other European languages. In some cases a word has
a different meaning in another language or is pronounced very differently
but for the purpose of the encoding it is still ok - I assume that when the
encoding is used for spoken communication both sides speak the same language.<P>
<LI>The words should have more than one syllable. This makes them easier to
recognize when spoken, especially over a phone line. Again, you will find
many exceptions. For one syllable words I have tried to use words with 3 or
more consonants or words with diphthongs, making for a longer and more distinct
pronunciation. As a result of this requirement the average word length has
increased. I do not consider this to be a problem since my goal in limiting
the word length was not to reduce the average length of encoded data but to
limit the maximum length to fit in fixed-size fields or a terminal line.<P>
<LI>No two words on the list should sound too much alike. Soundalikes
such as "sweet" and "suite" are ruled out. One of the two is chosen and the
other should be accepted by the decoder, either through its soundalike
matching code or through explicit aliases for some words. <P>
<LI>No offensive words. The rule was to avoid words that I would not
like to be printed on my business card. I have extended this to words that
by themselves are not offensive but are too likely to create combinations
that someone may find embarrassing or offensive. This includes words dealing
with religion such as "church" or "jewish" and some words with negative
meanings like "problem" or "fiasco". I am sure that a creative mind (or a
random number generator) can find plenty of embarrassing or offensive word
combinations using only words in the list but I have tried to avoid the more
obvious ones. One of my tools for this was simply a generator of random word
combinations - the problematic ones stick out like a sore thumb.<P>
<LI>Avoid words with tricky spelling or pronunciation. Even if the receiver
of the message can probably spell the word close enough for the soundalike
matcher to recognize it correctly I prefer avoiding such words. I believe this
will help users feel more comfortable using the system, increase the level
of confidence and decrease the overall error rate. Most words in the list can
be spelled more or less correctly from hearing, even without knowing the word.<P>
<LI>The word should feel right for the job. I know, this one is very
subjective but some words would meet all the criteria and still not feel
right for the purpose of mnemonic encoding. The word should feel like one of
the words in the radio phonetic alphabets (alpha, bravo, charlie, delta etc).<P>
When checking for soundalikes I have found that the standard soundex
algorithms are far too liberal and find too many words that supposedly sound
similar. It may be true that all vowels are pronounced as schwa in certain
cases, but completely eliminating vowels from the soundex comparison is going
a little too far. The consonant groups in the soundex algorithm are too
general while at the same time ignoring consonants that sound alike over a
limited bandwidth channel such as "F" and "S".<P>
If you need a shorter wordlist for any purpose please use words from the
beginning of the list. It is sorted according to my ranking for word
quality. <P>
The phonetic pronunciation database in the Moby wordlist has been particularly
useful in finding soundalikes by comparing the distance between the phonetic
representations rather than the standard spelling. <P>
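To illustrate the idea of comparing phonetic representations, here is a minimal sketch in Python. It is not the tool I used: it assumes a plain Levenshtein edit distance over phoneme symbols, and the transcriptions, the threshold and the names <CODE>phoneme_distance</CODE> and <CODE>soundalike_pairs</CODE> are hypothetical. Real transcriptions would come from a pronunciation database such as the one in Moby.<P>

```python
from itertools import combinations


def phoneme_distance(a, b):
    """Levenshtein edit distance between two sequences of phoneme symbols."""
    previous = list(range(len(b) + 1))
    for i, symbol_a in enumerate(a, start=1):
        current = [i]
        for j, symbol_b in enumerate(b, start=1):
            cost = 0 if symbol_a == symbol_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def soundalike_pairs(pronunciations, max_distance=1):
    """List word pairs whose phoneme sequences are within max_distance."""
    return [(first, second)
            for first, second in combinations(sorted(pronunciations), 2)
            if phoneme_distance(pronunciations[first],
                                pronunciations[second]) <= max_distance]


# Hypothetical transcriptions, written by hand for this example only.
SAMPLE = {
    "sweet": ["s", "w", "iy", "t"],
    "suite": ["s", "w", "iy", "t"],
    "radio": ["r", "ey", "d", "iy", "ow"],
}

if __name__ == "__main__":
    print(soundalike_pairs(SAMPLE))
```

The spellings "sweet" and "suite" differ considerably, but with these transcriptions their phoneme sequences are identical, so the pair is flagged. A more careful version would also weight substitutions by how similar two phonemes sound over a phone line.<P>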
Actually, not all words are 4 to 7 letters long. 7 extra words with 3 letters
each are used for encoding 24-bit remainders, i.e. when the encoded data length
is 3 modulo 4. <P>
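The numbers 1626 and 7 can be checked with a little arithmetic. The sketch below assumes, which this page does not state explicitly, that each full group of 4 bytes (32 bits) is encoded as three words, and that a 3 byte remainder is encoded as two ordinary words followed by one of the extra words. The helper name <CODE>smallest_base</CODE> is made up for this example.<P>

```python
BASE_WORDS = 1626   # size of the main wordlist, as stated above
EXTRA_WORDS = 7     # three letter words reserved for 24-bit remainders


def smallest_base(bits, digits):
    """Smallest n such that n ** digits can represent every bits-bit value."""
    n = 1
    while n ** digits < 2 ** bits:
        n += 1
    return n


# Assumption: three words carry one 32-bit group.
three_word_capacity = BASE_WORDS ** 3

# Assumption: two ordinary words plus one extra word carry a 24-bit remainder.
remainder_capacity = BASE_WORDS ** 2 * EXTRA_WORDS

# Fewest extra words that still cover 24 bits (ceiling division).
min_extra_words = -(-2 ** 24 // BASE_WORDS ** 2)

if __name__ == "__main__":
    print(three_word_capacity >= 2 ** 32)   # three words cover 32 bits
    print(smallest_base(32, 3))             # smallest list size that does so
    print(remainder_capacity >= 2 ** 24)    # remainder words cover 24 bits
    print(min_extra_words)                  # fewest extra words needed
```

Under these assumptions 1626 is the smallest list size whose cube reaches 2^32, and 7 is the smallest number of extra words for which 1626 * 1626 * 7 reaches 2^24.<P>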
<A name="help"></A>
<H2>How you can help</H2>
You can help improve the wordlist by suggesting new words. Finding new words
that meet the criteria is not easy. As I approached my target of 1626 words
I found it was becoming asymptotically more difficult to find new words.<P>
If you have a word to suggest:
<LI>Verify that the word does not appear in <A href="/web/20090918202746/">the list</A>.
<LI>Check if the word already appears in the <A href="/web/20090918202746/">rejects</A> list.
<LI>Evaluate the word according to the above criteria.
<LI>Send me email.
I would also appreciate your opinion about words already in the list. Remember
that if you find a word that you think shouldn't be there I can't just remove
it - I need a replacement first. If English is not your native language I
would especially like to hear your opinion about the usability of the list.
If you can show the list to potential users who know even less English than
you it would be even better.<P>
The list is quite long. If you only have time for reviewing a small part make
sure it isn't the beginning of the list. I don't want to get only comments
about the first words in the list. To display a random selection of words you
can use this command:
head --bytes nnn /dev/urandom | mnencode
I must say it was quite frustrating to find so many good 8 letter
international words, but the 7 letter limit is still one of my primary criteria
that have no exceptions. If you feel that increasing the word quality and
making the words more international is important enough to reconsider 8 letter
words please tell me.<P>
<A name="resources"></A>
<H2>Resources used in compiling this wordlist</H2>
<LI><A href="/web/20090918202746/">The Moby lexicon project</A> / Grady Ward <P>
<LI><A href="/web/20090918202746/">The stolfi wordlists</A> / Jorge Stolfi and the original wordlist authors<BR>
The wordlists in different languages were very useful in finding international words by intersection.<P>
<LI><A href="/web/20090918202746/">The wordlist collection at Oxford</A> / administered by Paul Leyland <P>
<LI><A href="/web/20090918202746/">British National Corpus</A> / frequency data by Adam Kilgarriff <BR>
This database has both word frequency data and part-of-speech classification.
It could have saved me a lot of work if I had found it sooner... <P>
<LI><A href="/web/20090918202746/">The General Service List</A> / John Bauman and Brent Culligan <P>
<LI><A href="/web/20090918202746/">Behind The Name - Name lists and etymological data</A> / Mike Campbell <P>
<LI><A href="/web/20090918202746/">One Time Password</A> / Neil Haller et al.<BR>
<LI><A href="/web/20090918202746/">The Diceware passphrase generator</A> / Arnold G. Reinhold<P>
<LI><A href="/web/20090918202746/">Interlingua</A><P>
<LI><A href="/web/20090918202746/">Basic English</A><P>
<LI><A href="/web/20090918202746/">Name frequency reports from U.S. census data.</A><P>
<LI><A href="/web/20090918202746/">The CIA world factbook</A><P>
<LI>Name frequency reports in various languages from "baby name" sites.<P>
<LI>Various geographic references<P>
<A href="/web/20090918202746/">back to homepage</A>
acrobat africa alaska albert albino album
alcohol alex alpha amadeus amanda amazon
america analog animal antenna antonio apollo
april aroma artist aspirin athlete atlas
banana bandit banjo bikini bingo bonus
camera canada carbon casino catalog cinema
citizen cobra comet compact complex context
credit critic crystal culture david delta
dialog diploma doctor domino dragon drama
extra fabric final focus forum galaxy
gallery global harmony hotel humor index
japan kilo lemon liter lotus mango
melon menu meter metro mineral model
music object piano pirate plastic radio
report signal sport studio subject super
tango taxi tempo tennis textile tokyo
total tourist video visa academy alfred
atlanta atomic barbara bazaar brother budget
cabaret cadet candle capsule caviar channel
chapter circle cobalt comrade condor crimson
cyclone darwin declare denver desert divide
dolby domain double eagle echo eclipse
editor educate edward effect electra emerald
emotion empire eternal evening exhibit expand
explore extreme ferrari forget freedom friday
fuji galileo genesis gravity habitat hamlet
harlem helium holiday hunter ibiza iceberg
imagine infant isotope jackson jamaica jasmine
java jessica kitchen lazarus letter license
lithium loyal lucky magenta manual marble
maxwell mayor monarch monday money morning
mother mystery native nectar nelson network
nikita nobel nobody nominal norway nothing
number october office oliver opinion option
order outside package pandora panther papa
pattern pedro pencil people phantom philips
pioneer pluto podium portal potato process
proxy pupil python quality quarter quiet
rabbit radical radius rainbow ramirez ravioli
raymond respect respond result resume richard
river roger roman rondo sabrina salary
salsa sample samuel saturn savage scarlet
scorpio sector serpent shampoo sharon silence
simple society sonar sonata soprano sparta
spider sponsor abraham action active actor
adam address admiral adrian agenda agent
airline airport alabama aladdin alarm algebra
alibi alice alien almond alpine amber
amigo ammonia analyze anatomy angel annual
answer apple archive arctic arena arizona
armada arnold arsenal arthur asia aspect
athena audio august austria avenue average
axiom aztec bagel baker balance ballad
ballet bambino bamboo baron basic basket
battery belgium benefit berlin bermuda bernard
bicycle binary biology bishop blitz block
blonde bonjour boris boston bottle boxer
brandy bravo brazil bridge british bronze
brown bruce bruno brush burger burma
cabinet cactus cafe cairo calypso camel
campus canal cannon canoe cantina canvas
canyon capital caramel caravan career cargo
carlo carol carpet cartel cartoon castle
castro cecilia cement center century ceramic
chamber chance change chaos charlie charm
charter cheese chef chemist cherry chess
chicago chicken chief china cigar circus
city clara classic claudia clean client
climax clinic clock club cockpit coconut
cola collect colombo colony color combat
comedy command company concert connect consul
contact contour control convert copy corner
corona correct cosmos couple courage cowboy
craft crash cricket crown cuba dallas
dance daniel decade decimal degree delete
deliver delphi deluxe demand demo denmark
derby design detect develop diagram diamond
diana diego diesel diet digital dilemma
direct disco disney distant dollar dolphin
donald drink driver dublin duet dynamic
earth east ecology economy edgar egypt
elastic elegant element elite elvis email
empty energy engine english episode equator
escape escort ethnic europe everest evident
exact example exit exotic export express
factor falcon family fantasy fashion fiber
fiction fidel fiesta figure film filter
finance finish finland first flag flash
florida flower fluid flute folio ford
forest formal formula fortune forward fragile
france frank fresh friend frozen future
gabriel gamma garage garcia garden garlic
gemini general genetic genius germany gloria
gold golf gondola gong good gordon
gorilla grand granite graph green group
guide guitar guru hand happy harbor
harvard havana hawaii helena hello henry
hilton history horizon house human icon
idea igloo igor image impact import
india indigo input insect instant iris
italian jacket jacob jaguar janet jargon
jazz jeep john joker jordan judo
jumbo june jungle junior jupiter karate
karma kayak kermit king koala korea
labor lady lagoon laptop laser latin
lava lecture left legal level lexicon
liberal libra lily limbo limit linda
linear lion liquid little llama lobby
lobster local logic logo lola london
lucas lunar machine macro madam madonna
madrid maestro magic magnet magnum mailbox
major mama mambo manager manila marco
marina market mars martin marvin mary
master matrix maximum media medical mega
melody memo mental mentor mercury message
metal meteor method mexico miami micro
milk million minimum minus minute miracle
mirage miranda mister mixer mobile modem
modern modular moment monaco monica monitor
mono monster montana morgan motel motif
motor mozart multi museum mustang natural
neon nepal neptune nerve neutral nevada
news next ninja nirvana normal nova
novel nuclear numeric nylon oasis observe
ocean octopus olivia olympic omega opera
optic optimal orange orbit organic orient
origin orlando oscar oxford oxygen ozone
pablo pacific pagoda palace pamela panama
pancake panda panel panic paradox pardon
paris parker parking parody partner passage
passive pasta pastel patent patient patriot
patrol pegasus pelican penguin pepper percent
perfect perfume period permit person peru
phone photo picasso picnic picture pigment
pilgrim pilot pixel pizza planet plasma
plaza pocket poem poetic poker polaris
police politic polo polygon pony popcorn
popular postage precise prefix premium present
price prince printer prism private prize
product profile program project protect proton
public pulse puma pump pyramid queen
radar ralph random rapid rebel record
recycle reflex reform regard regular relax
reptile reverse ricardo right ringo risk
ritual robert robot rocket rodeo romeo
royal russian safari salad salami salmon
salon salute samba sandra santana sardine
school scoop scratch screen script scroll
second secret section segment select seminar
senator senior sensor serial service shadow
sharp sheriff shock short shrink sierra
silicon silk silver similar simon single
siren slang slogan smart smoke snake
social soda solar solid solo sonic
source soviet special speed sphere spiral
spirit spring static status stereo stone
stop street strong student style sultan
susan sushi suzuki switch symbol system
tactic tahiti talent tarzan telex texas
theory thermos tiger titanic tomato topic
tornado toronto torpedo totem tractor traffic
transit trapeze travel tribal trick trident
trilogy tripod tropic trumpet tulip tuna
turbo twist ultra uniform union uranium
vacuum valid vampire vanilla vatican velvet
ventura venus vertigo veteran victor vienna
viking village vincent violet violin virtual
virus vision visitor visual vitamin viva
vocal vodka volcano voltage volume voyage
water weekend welcome western window winter
wizard wolf world xray yankee yoga
yogurt yoyo zebra zero zigzag zipper
zodiac zoom acid adios agatha alamo
alert almanac aloha andrea anita arcade
aurora avalon baby baggage balloon bank
basil begin biscuit blue bombay botanic
brain brenda brigade cable calibre carmen
cello celtic chariot chrome citrus civil
cloud combine common cool copper coral
crater cubic cupid cycle depend door
dream dynasty edison edition enigma equal
eric event evita exodus extend famous
farmer food fossil frog fruit geneva
gentle george giant gilbert gossip gram
greek grille hammer harvest hazard heaven
herbert heroic hexagon husband immune inca
inch initial isabel ivory jason jerome
joel joshua journal judge juliet jump
justice kimono kinetic leonid leopard lima
maze medusa member memphis michael miguel
milan mile miller mimic mimosa mission
monkey moral moses mouse nancy natasha
nebula nickel nina noise orchid oregano
origami orinoco orion othello paper paprika
prelude prepare pretend promise prosper provide
puzzle remote repair reply rival riviera
robin rose rover rudolf saga sahara
scholar shelter ship shoe sigma sister
sleep smile spain spark split spray
square stadium star storm story strange
stretch stuart subway sugar sulfur summer
survive sweet swim table taboo target
teacher telecom temple tibet ticket tina
today toga tommy tower trivial tunnel
turtle twin uncle unicorn unique update
valery vega version voodoo warning william
wonder year yellow young absent absorb
absurd accent alfonso alias ambient anagram
andy anvil appear apropos archer ariel
armor arrow austin avatar axis baboon
bahama bali balsa barcode bazooka beach
beast beatles beauty before benny betty
between beyond billy bison blast bless
bogart bonanza book border brave bread
break broken bucket buenos buffalo bundle
button buzzer byte caesar camilla canary
candid carrot cave chant child choice
chris cipher clarion clark clever cliff
clone conan conduct congo costume cotton
cover crack current danube data decide
deposit desire detail dexter dinner donor
druid drum easy eddie enjoy enrico
epoxy erosion except exile explain fame
fast father felix field fiona fire
fish flame flex flipper float flood
floor forbid forever fractal frame freddie
front fuel gallop game garbo gate
gelatin gibson ginger giraffe gizmo glass
goblin gopher grace gray gregory grid
griffin ground guest gustav gyro hair
halt harris heart heavy herman hippie
hobby honey hope horse hostel hydro
imitate info ingrid inside invent invest
invite ivan james jester jimmy join
joseph juice julius july kansas karl
kevin kiwi ladder lake laura learn
legacy legend lesson life light list
locate lopez lorenzo love lunch malta
mammal margin margo marion mask match
mayday meaning mercy middle mike mirror
modest morph morris mystic nadia nato
navy needle neuron never newton nice
night nissan nitro nixon north oberon
octavia ohio olga open opus orca
oval owner page paint palma parent
parlor parole paul peace pearl perform
phoenix phrase pierre pinball place plate
plato plume pogo point polka poncho
powder prague press presto pretty prime
promo quest quick quiz quota race
rachel raja ranger region remark rent
reward rhino ribbon rider road rodent
round rubber ruby rufus sabine saddle
sailor saint salt scale scuba season
secure shake shallow shannon shave shelf
sherman shine shirt side sinatra sincere
size slalom slow small snow sofia
song sound south speech spell spend
spoon stage stamp stand state stella
stick sting stock store sunday sunset
support supreme sweden swing tape tavern
think thomas tictac time toast tobacco
tonight torch torso touch toyota trade
tribune trinity triton truck trust type
under unit urban urgent user value
vendor venice verona vibrate virgo visible
vista vital voice vortex waiter watch
wave weather wedding wheel whiskey wisdom
android annex armani cake confide deal
define dispute genuine idiom impress include
ironic null nurse obscure prefer prodigy
ego fax jet job rio ski
