belisarius222/vars.md

## vars.md

      
It's useful to have standard names for types, but it's also useful to have standard names for variables. In Hoon, we strive for unambiguous and fast pronounceability: if you read a Hoon program to me over the phone, I should be able to transcribe it perfectly without wondering how something is spelled. This goal is probably not completely achievable, but we can get a lot closer than we are now. I propose a scheme for getting there.

In many programming languages, single-letter variable names are common; in Hoon, this is called "ultra-lapidary style". Single letters usually satisfy Hoon's pronunciation requirement, but they don't map naturally onto meaningful concepts unless there are very few variables. For a comparator function, taking in 'a' and 'b' is fine. But for arguments whose meaning is semantic rather than positional, which is most of the time, a name should have some mnemonic connection to the value it signifies.

The natural way to do this is to use a full word, which should be considered good style in Hoon in many circumstances. Shorter words are preferable because they are faster to subvocalize and less likely to obscure the geometric form of a written expression, which is crucial to reading Hoon source quickly.

In many cases, though, it's difficult to find a word that fully and accurately describes the variable in question. For fields in a complex data structure, full words are to be preferred. But for top-level variable names, variable names for standard constructs (such as the current duct), and short-lived local variables, we can do better.

In fact, we already have a standard for these, called "lapidary style": three letters, consonant-vowel-consonant. This standard has mostly fallen out of use, but I contend that this is because it was overused, not because it's a bad idea. On the contrary, when I read old Hoon, some parts of it are much faster to read because of this naming scheme. It only becomes a problem when the number of variable names in scope that aren't full words or standard names (like 'sut' for subject) climbs high enough that some fall out of working memory.

English words should be used for the names of all standalone types (those with their own +$ definitions, not types used once to define a gate sample, which are more like local variables) and for the names of all fields within those types. The exception is a field that refers to a variable with a standard name (like 'our' for this ship), in which case the standard name should be used. Using the type-based autonamer (=some-type) for fields of type definitions is fine as long as it's unambiguous.

The type-based autonamer (=some-type) is only appropriate when a second instance of that type is unlikely to show up in the same scope, because the variable name shadows the type name. Note that this favors standard names for long-lived singleton instances of common types, like 'hen' for duct: if a second duct variable appears, the type can still be used directly, since it hasn't been shadowed.

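
Here is a minimal sketch of that difference. It assumes the `duct` and `wire` molds from Arvo's standard library are in scope, as they are in the dojo. The gate itself and the face `old` are hypothetical. The current duct is named 'hen' rather than autonamed with `=duct`, so `duct` still works as a mold for a second duct. Autonaming the wire with `=wire` is harmless because no second wire appears.

```hoon
::  hypothetical gate: .hen is the standard name for the
::  current duct, so the mold name duct is not shadowed
|=  [hen=duct =wire]
^-  (list duct)
::  =wire shadows the wire mold, which is harmless here
::  because no second wire is needed; duct is still a mold
=/  old=duct  ~[wire]
~[hen old]
```

If the sample were written `[=duct =wire]` instead, `duct` in the body would name the value rather than the mold, and `old=duct` would no longer mean what it says.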
Similarly, English words should be used for all nonstandard long-lived variables. 'fex' for a list of output moves is standard, so a long core definition can include a bunted 'fex' at the top, and anyone familiar with Hoon will be able to tell what it means.

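
For readers who haven't seen the pattern, here is a minimal sketch of a bunted 'fex' at the top of a computation. To keep it standalone, it collects plain cords. In real Arvo or Gall code the list would hold moves or cards. The two effects here are hypothetical.

```hoon
::  bunt .fex to the empty list, then push effects onto it
=|  fex=(list @t)
=.  fex  ['first' fex]
=.  fex  ['second' fex]
::  effects were pushed newest-first, so flip them back
(flop fex)
```

The result is the list `~['first' 'second']`.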
Conversely, standard variable names should not be used to mean something else, especially the more commonly used ones. The set of standard names should therefore be small enough that an intermediate Hoon programmer can easily remember not to reuse them by accident. Any reasonably experienced C programmer knows that `i` is likely to be the counter in a `for (int i = 0; i < n; i++)` loop, and so is less likely to use it for other purposes. This works as long as the number of standard variable names stays relatively low and easy to remember.

Here are the rules I propose for variables that are not full English words:

- consonant-vowel-consonant
- the consonants are not silent
- unambiguous English pronunciation
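
Only the first rule is mechanical enough to check in code. Below is a minimal sketch of that check. It is written as a hypothetical gate and is not part of any standard library. It assumes 'y' counts as a vowel only in the middle slot, as in 'vyr' and 'syn'.

```hoon
::  hypothetical gate: checks only that a name is three
::  lowercase letters in consonant-vowel-consonant order;
::  silent or ambiguous consonants need a human ear
|=  nam=@tas
^-  ?
=/  tex=tape  (trip nam)
?.  =(3 (lent tex))  %.n
::  is .c a lowercase ASCII letter?
=/  lit  |=(c=@tD &((gte c 'a') (lte c 'z')))
::  is .c in "aeiou", or 'y' when .yes is set?
=/  vow
  |=  [c=@tD yes=?]
  ^-  ?
  ?|  (lien "aeiou" |=(v=@tD =(v c)))
      &(yes =('y' c))
  ==
=/  one  (snag 0 tex)
=/  two  (snag 1 tex)
=/  tre  (snag 2 tex)
?&  (lit one)
    (lit two)
    (lit tre)
    !(vow one %.n)
    (vow two %.y)
    !(vow tre %.n)
==
```

Suppose the gate is bound to a face like `cvc`. Then `(cvc %vyr)` should give `%.y` and `(cvc %tea)` should give `%.n`. `(cvc %sih)` also gives `%.y`: a silent 'h' is exactly the kind of thing only a human reader can catch.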

The second of these two examples is to be preferred:
```hoon
?~  some-long-variable-name=(~(get by my-long-state-name) my-long-map-key)
  [~ this-whole-big-core]
(my-prolix-function-name-with-extras thingamajig u.some-long-variable-name)
```

```hoon
?~  val=(~(get by bag) key)
  [~ cor]
(fun thing u.val)
```

Counter-examples:

- 'sih' (silent 'h')
- 'ger' (ambiguous: 'jer' or 'gur')
- 'tea' (no final consonant)
- 'old' (vowel-consonant-consonant)

In general, mnemonic connection to an English word is to be preferred.  Taking a full English word and shortening it to fit these conventions should be encouraged.  Here's an algorithm for doing that:

1. Take the first consonant and the first vowel, then the first consonant after that vowel.
2. If the initial consonant-vowel combination or the final vowel-consonant combination has an ambiguous pronunciation (e.g. 'gi' or 'er'), try changing the vowel ('er' -> 'ar'). If that doesn't work, try softening the consonant ('gi' -> 'ji') or specializing it ('ci' -> 'ki' or 'ce' -> 'se').
3. If that yields a silent final consonant, try the next consonant in the word.
4. If the result still has an ambiguous pronunciation or a silent final consonant, try softening or specializing the final consonant.
5. If this fails, try shortening a different, related word.
6. If you can't think of a related word, make up a random valid three-letter name.
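
Only the first step of this algorithm is mechanical. The rest depends on how the result sounds. Below is a minimal sketch of that first step as a hypothetical gate. It reads "first vowel" as the first vowel after the first consonant, and it assumes 'y' counts as a vowel.

```hoon
::  hypothetical gate: step 1 only; gives ~ when the word
::  has no consonant-vowel-consonant sequence to take
|=  wor=tape
^-  (unit tape)
::  assumption: 'y' counts as a vowel, so "type" -> "typ"
=/  vow  |=(c=@tD (lien "aeiouy" |=(v=@tD =(v c))))
=/  con  |=(c=@tD !(vow c))
::  drop letters from the front of .tex until .hit holds
=/  til
  |=  [tex=tape hit=$-(@tD ?)]
  ^-  tape
  ?~  tex  ~
  ?:  (hit i.tex)  tex
  $(tex t.tex)
::  first consonant, first vowel after it, next consonant
=/  one  (til wor con)
?~  one  ~
=/  two  (til t.one vow)
?~  two  ~
=/  tre  (til t.two con)
?~  tre  ~
`~[i.one i.two i.tre]
```

Suppose the gate is bound to a face like `short`. Then `(short "beam")` should give `[~ "bem"]` and `(short "type")` should give `[~ "typ"]`, both matching names in the proposed list. `(short "wire")` gives `[~ "wir"]`, whereas the proposed name for wire is 'vyr'.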

Random three-letter names are not much better than single-letter names. Because there are more of them, though, they're more likely to be unique and therefore memorable. There is also merit in consistently using three letters. I think it's worth sticking to three-letter names unless single-letter names would be clearer, as in a comparator.

There should never be more than three or so nonstandard three-letter variables in use at once. It's fine to have more than that in scope, as long as all but three or so of them stop being used after some point. This mostly matters for a sequence of =+'s.

The following is ok, because at no point is the reader required to hold all the variables in working memory:
```hoon
=+  reb=4
=+  fyn=5
=+  teg=(add reb fyn)
=+  lun=6
=+  sog=(sub teg lun)
[teg sog]
```

However, 'teg' and 'sog' might deserve English names, since they could be meaningful intermediate values that can be named more descriptively. This is also good:
```hoon
=+  reb=4
=+  fyn=5
=+  datum=(add reb fyn)
=+  lun=6
=+  loss=(sub datum lun)
[datum loss]
```

Plurals should end in z, x, or s. Other variable names should not, except for shortened forms of words that end in 'th', such as 'pax' for 'path'.

Three-letter English words may also be used as variable names, but acronyms and contractions should not.

I think it would be too draconian to forbid variable names that are also @p syllables. Still, it would clearly be at least a little better not to mix those namespaces, especially since people don't usually pronounce the ~ at the start of a @p. Editor support could help here, but I don't think this should be a hard requirement. That said, at least don't use 'zod' as a variable name. The same applies to syllable names like 'pat' or 'bar'.

Proposed standard names for common variables (I'm not attached to these, and we might want to keep some standards that we've gotten used to):
```
duct:     dug
effects:  fex
state:    bag
bowl:     bol
wire:     vyr
path:     pax
beam:     bem
beak:     bek
desk:     dek
head:     hed
tail:     tal
core:     cor
gate:     gat
type:     typ
mold:     mol
subject:  sut
sample:   sam
context:  con
formula:  fol
vase:     vax
hoon:     hun
text:     tex
vane:     van
event:    eve
ovum:     ovo
ship:     our, bud/pal/fam/cuz?
time:     now
entropy:  eny
scry:     sky
goof:     guf
tank:     tan
tang:     taz
task:     tak
sign:     syn
service:  sev
old:      old
new:      new
key:      key
value:    val
first:    one
second:   two
third:    tre
fourth:   qua
fifth:    pen
```