publik-void/Should indexing be 0-based or 1-based.adoc

## Should indexing be 0-based or 1-based.adoc

      
    Raw
  

              Should indexing be 0-based or 1-based.adoc
            
          
    Should numbering start at 0 or 1?


Arguments for 1-based indexing and/or against 0-based indexing


The number which denotes the index of an element is equal to the number one
would get by counting up to (and including) that element.


We are very used to labeling elements with their corresponding count number.
This means 1-based indexing has a lot of inertia in our everyday lives and we
need to build new intuition when working with 0-based indexing. This is the main
source of the classic "off by one" error.  This argument is only valid if
switching to 0-based indexing hurts more in the short-term than it helps in the
long run.


A big amount of literature uses 1-based indexing, since it is the traditional
indexing scheme in mathematics. Translating formulas to 0-based indexing can be
cumbersome and a convention shift in all of mathematics does not happen overnight.
Computer algebra systems could massively mitigate this issue (and many more),
though.


Arguments for 0-based indexing and/or against 1-based indexing


Counting and numbering are two distinct concepts. The fact that an element is
the third one in a sequence does not necessarily imply that it is the element
number three. Taking this implication for granted might come as a convenience
sometimes, but it also tends to obscure the fact that there is a distinction at
all. The two concepts are called ordinal and cardinal numbers. The letter A is
the first of the alphabet, not the Ath. And there is no such thing as
"zeroth" – that is unless we want to change this convention in our natural
languages as well.


Continuous units start at 0. The day does not start at 1 o’clock and a baby is
not born at age 1.

These units are obviously still a thing in discrete domains: Lists of indexed
things (i.e. tuples, arrays, etc.) often represent measurements over time,
space, etc. or get interpolated (think of image oversampling or rotation, lookup
tables, plots of points joined by lines, and so on). Discrete signals have a
first frequency of 0 (also called DC). Polynomials have a first term which
is ax⁰.

Discrete data that do not correspond to units, i.e. ordinal or nominal data,
have no reason in the first place to justify one numbering scheme over the
other.  Think of a building where the ground floor has the number 1, or a book
where the first section is 0.0.0.  You can’t really count the sections of that
book or the stories of a building with a basement easily by means of any
specific numbering scheme and you’d be unlikely to do any other arithmetic on
these values. For all intents and purposes, the stories of a building could as
well be labeled Alice, Bob, Carol, and so on.


Index arithmetic becomes a lot simpler. Here are a few examples:


Indices and index offsets become effectively the same. Adding two indices i
and j together can be done as i + j instead of i + j - 1. This simplifies
nested arrays, for example. It also simplifies intuition in the sense that an
index is merely an offset relative to the first element.


0-based indices compose better with multidimensional arrays. For instance,
consider an n×m matrix stored in a one-dimensional buffer.  If we want to
access the element in the third row and the seventh column, the corresponding
index is 2 * n + 6, 2 * m + 6, 6 * n + 2 or 6 * m + 2, depending on how
exactly we store the matrix. The point is that the index calculation is of the
form i * n + j instead of ((i - 1) * n + (j - 1)) + 1. Similarly, an index
i which represents an element in the one-dimensional buffer can be converted
to the row and column indices with {div(i, n), mod(i, n)}. This would change
into {div(i - 1, n) + 1 , mod(i - 1, n) + 1} for 1-based indices.


Related to the previous bullet point: Every time a cyclic domain is sampled
at discrete points, there is a need for some sort of a modulo operation when
iterating past the end of one period. Examples include the complex roots of
unity, wavetable oscillators, discrete Fourier transforms, FIFOs implemented as
ring buffers, and many more. The modulo operation mod(i, n) returns 0 when i
and n are equal, which is the convention because then div(i, n) + mod(i, n)
== i holds if div is the integer division.


In some cases (such as tree structures), 1-based indexing can be beneficial,
since 1 is the neutral element of multiplication while 0 is the neutral element
of addition. These are usually less common, though. For array-like structures,
we mostly perform addition on indices.


Defining a range (subsequence) as i:j where i determines the first element
in the range and j determines the first element past the last element of the
range (i.e. as half-open intervals) ties in naturally with 0-based indexing.
Then, a range 0:n fully covers an array containing n elements. For 1-based
indexing, one would either need to do 0:n (where the first index determines
the last element before the first element of the range) or 1:n+1. These are
both as "unintiutive" as the leap from 1-based to 0-based indexing was in the
first place. + Whether ranges should be defined in this way is of course a
subjective question of its own. Here are some arguments for doing so:


The range 2:2 is the empty range, 2:3 contains exactly the third element
(which has the index 2), and so on. Adjacent ranges can then be written as i:j,
j:k, …. The common alternative would be i:j-1, j:k-1, … for ranges where
the second index determines the last element of the range.


Also, the length of such a range i:j is j-i instead of j-i+1.


Leading zeros become more consistent. Let’s say your number format includes
n leading zeros and m significant figures, i.e. for the index 00042, n =
3 and m = 2. For such a number format, m = 0 still makes sense, since the
default value is obviously zero. Granted, this is an unusual corner case, but
it improves composability.  Furthermore, the range of possible index values is
always ordered in the same way as ℤ for all values of m. There is no
"wrapover" from 9 to 0, 99 to 00, etc. since the index always starts with zeros
and ends with the highest digit in all significant figures (9 in the decimal
number system).

As a result, there is one more usable ordered index value in total when
including 0. This might be beneficial if, for instance, the index is stored
using a very short binary word (e.g. 8-bit unsigned integer).


Irrelevant arguments


"We are humans, not computers." 0-based numbering is exactly as human as
1-based numbering. This is merely a question of intuition. Also, please consider
the Church–Turing thesis.


"But product/system/language/feature x uses the 1-based scheme!" as in "My
computer keyboard starts at 1!". This is only a convention, and it is exactly
the convention questioned by this very document. The inertia argument for
1-based numbering is already covered above.


"But I don’t want to write n-1 everywhere!" You don’t need to. You also
don’t need to write i:j-1, j:k-1, … for adjacent sequences. This boils down
to a related question about notation which is covered above.


Many high-level languages tend to avoid indices altogether in favor of other
ways to express iteration. While this reduces the importance of the choice of
indexing scheme, it does not really provide any insight into which one is better
in principle.


There are arguments to be made about using both styles as needed, possibly
even allowing way more arbitrary indexing schemes, and introducing language
constructs which make the distinctions clear. This document is intended to be a
discussion about which style should be chosen in cases where one particular
default is needed.


0-based indexing reflects the way the elements are laid out in memory, meaning
no additional arithmetic needs to be performed. In C for instance, *p == p[0]
holds true. This is not really an argument for (or against) 0-based indexing
in my opinion, since optimizing compilers should be able to remove the overhead
from 1-based indexing almost entirely and in that case, any implementation
details should ideally be separate from the language concept.


Some languages use the index 0 as a reference to a special value, such as the
head of an expression or an indication of an error. However, this has most often
proven error-prone or otherwise problematic. Hence this is not an argument for
either indexing scheme.


It often makes sense to initialize memory or variables to 0 by default. Then,
an index variable automatically corresponds to the first element by default when
using 0-based indexing. An index variable automatically starts as invalid when
using 1-based indexing. Either of these could be beneficial or problematic.


Context


This is one of those old (and in fact, not that important) convention fights
like τ vs. π or tabs vs. spaces. But even though the subject has been beaten
to death for ages by now, there still doesn’t seem to be a real consensus.


In computer languages, 1-based indexing is often seen in high-level, "more
scientific" languages such as Matlab, R, Mathematica, Julia, Erlang, APL. The
usual argument for 1-based indexing is that it is "more intuitive". I would
argue intuition is something which is learnable and as such should be trained to
whatever reduces the amount of cognitive overhead the most (be it across a broad
range of domains or in a very specific domain).


In this document, I want to compile a list of arguments for or against either of
the two indexing schemes. Other indexing schemes will not be discussed in this
particular document.


Of course I am aware of
this
infamous stance by Dijkstra. I felt the need for a more comprehensive list of
considerations regarding the topic.


You may notice that I am a proponent of 0-based indexing at this point in time
and that I may or may not be biased due to my background in signal processing.


If you have suggestions or arguments for either side, feel free to contribute to
this document by forking, commenting, etc.