When we look at any Java code using arrays, we’d be convinced that computers count starting from zero, as opposed to people who count from one.
Well, actually everyone counts from zero.
When we calculate a distance from one place to another, we always starting counting at zero.
We count floors in a building in the same way. The ground floor is zero because we haven’t gone up or down. The ground floor is the first floor we come to; it’s our starting position. We start counting from zero when we enter the building on the ground floor, and we add floors as we go up (and subtract as we go down).
Let’s also think about birthdays and age. We don’t say a person is one year old on the day they are born. We say they’re one year old a year later on their birthday. So we start counting from zero in practical real-life terms.
So the topic (and title) of this post should really be, “Why Do Computers Index Arrays From Zero?”. Which is a totally different and much more relevant topic.
Have you ever wondered why it’s done in this way? Are there historical reasons for it? Are there any advantages in counting from zero?
So Why Do Computers Index Arrays From Zero?
The first element of an array in C-based programming languages (C, C++, C#, Java, etc.) is at position (or index) zero, and not one.
As I said before, we count from zero all the time in the real world. Using count-from-zero indexing answers the question, “How many positions after the start of the array is the element we want?”. We don’t say that the first element is one position after the start. It is the start.
No one says the “zeroth” element. We say the first element is at index 0. Think of the index as how far an element is offset from the first position. The first element is then at the first position, so it doesn’t have an offset, and its index is 0. The second element is one element after the first, so it’s offset by 1 element and is then at index 1, etc.
C doesn’t have array objects like Java has. C only works with pointers. The C compiler translates an array expression into the appropriate pointer-based expression.
Given the following C code:
int array[10];
array[0] = 123;
array[1] = 456;
array[2] = 789;
Accessing the array elements with the []
gets translated by the compiler into:
*(array + 0 * sizeof(int)) = 123;
*(array + 1 * sizeof(int)) = 456;
*(array + 2 * sizeof(int)) = 789;
The asterisk (*
) here is the indirection or dereference operator. It accesses the value of the variable whose address is to the right of the asterisk. So it dereferences the address to get the value stored there.
Using the array name by itself in an expression, i.e. array
, gives us the address of the first element of the array.
Any array index/subscript expression array[n]
is translated to *(array + n * sizeof(element_type))
In C arrays are just syntactic sugar over pointers. The index is interpreted as an offset from the beginning of the array. array[n]
and *(array + n * sizeof(element_type))
works for all indexes (positive, zero, or negative), but only if the index numbering starts from 0.
We must never think of the index as being the position of an element in an array. Rather we must think of it as being the number of elements before that element in the array.
What If The Index Started At One?
Counting arrays from index 0 simplifies the calculation of the memory address of each element.
Remember that the array index expression array[n]
is translated to *(array + n * sizeof(element_type))
.
If we wanted to index the first element using one instead of zero, we’d have to do an extra subtraction, as follows:
*(array + (n-1) * sizeof(element_type))
This isn’t a big difference, but it adds an unnecessary subtraction whenever accessing an array element.
When the memory management system allocates memory through a call to say C’s malloc()
function or Java’s new
call, we get back the address where the allocated chunk of memory starts. If a C compiler sees something like array[n]
it has to first calculate the address of the expression, then get the value stored at that address. If the indexes start at 0, the calculation is array + n*size
. If the indexes start at 1, the calculation is array + (n-1)*size
. So array[1]
will correspond to array + (1-1)*size
, which is array
itself.
Arrays starting at zero allows a better representation of ranges. Expressing a range of indexes makes the most sense if we include the lower bound and exclude the upper bound, e.g. for (int i=0; i<length;++i)
. It’s more natural to start at zero when we use exclusive end bounds. It avoids extra additions and subtractions.
Do All Computer Languages Index Arrays From Zero?
Many languages are descendants of or heavily influenced by C. These languages all use zero-based indexing. But zero-based indexing is not universal across all computer languages; it’s just a very common convention.
A number of languages start at 1, e.g. COBOL, Fortran and Basic. A number of languages allow the programmer to choose whether to start at 0 or 1. Some languages even allow any starting index number (even negative); whatever makes sense to the programmer to match the required algorithm.
Having the index start from 1 requires an extra calculation (the unnecessary subtraction). Some languages do the extra calculations because they feel that counting from one is more intuitive for many programmers.
Fortran uses 1-based arrays, which works well when we’re writing mathematical algorithms using vectors and matrices, where indices naturally go 1 .. N
. But as soon as we start coding algorithms such as searching and sorting, or allocating and freeing chunks of memory, then using 1-based arrays becomes very confusing. This easily leads to off-by-one errors, which can be a nightmare to debug.
Ending Off
Just out of interest, let’s go back to the birthday example. Surprisingly enough, not everyone counts age from zero. In some East Asian countries (China, Japan, Korea and Vietnam), age was traditionally counted starting from 1 at birth, and incremented at each New Year rather than at each birthday. This practice has largely been dropped, except in some rural areas. See the Wikipedia article.
Edsger W. Dijkstra published an opinion article in 1982 explaining the reasons why numbering should count from zero. It’s a fairly philosophical computer science/mathematical article which makes for very interesting reading. Dijkstra wrote all of his more than 1300 articles by hand. You can find a scan of the beautifully hand-written article here.
Was this post interesting? Please share your comments, and as always, stay safe and keep learning!