Why Do Computers Count From Zero?

Image of an abacus

When we look at any Java code using arrays, we’d be convinced that computers count starting from zero, as opposed to people who count from one.

Well, actually everyone counts from zero.

When we calculate a distance from one place to another, we always starting counting at zero.

We count floors in a building in the same way. The ground floor is zero because we haven’t gone up or down. The ground floor is the first floor we come to; it’s our starting position. We start counting from zero when we enter the building on the ground floor, and we add floors as we go up (and subtract as we go down).

Let’s also think about birthdays and age. We don’t say a person is one year old on the day they are born. We say they’re one year old a year later on their birthday. So we start counting from zero in practical real-life terms.

So the topic (and title) of this post should really be, “Why Do Computers Index Arrays From Zero?”. Which is a totally different and much more relevant topic.

Have you ever wondered why it’s done in this way? Are there historical reasons for it? Are there any advantages in counting from zero?

So Why Do Computers Index Arrays From Zero?

The first element of an array in C-based programming languages (C, C++, C#, Java, etc.) is at position (or index) zero, and not one.

As I said before, we count from zero all the time in the real world. Using count-from-zero indexing answers the question, “How many positions after the start of the array is the element we want?”. We don’t say that the first element is one position after the start. It is the start.

No one says the “zeroth” element. We say the first element is at index 0. Think of the index as how far an element is offset from the first position. The first element is then at the first position, so it doesn’t have an offset, and its index is 0. The second element is one element after the first, so it’s offset by 1 element and is then at index 1, etc.

C doesn’t have array objects like Java has. C only works with pointers. The C compiler translates an array expression into the appropriate pointer-based expression.

Given the following C code:

int array[10];

array[0] = 123;
array[1] = 456;
array[2] = 789;

Accessing the array elements with the [] gets translated by the compiler into:

*(array + 0 * sizeof(int)) = 123;
*(array + 1 * sizeof(int)) = 456;
*(array + 2 * sizeof(int)) = 789;

The asterisk (*) here is the indirection or dereference operator. It accesses the value of the variable whose address is to the right of the asterisk. So it dereferences the address to get the value stored there.

Using the array name by itself in an expression, i.e. array, gives us the address of the first element of the array.

Any array index/subscript expression array[n] is translated to *(array + n * sizeof(element_type))

In C arrays are just syntactic sugar over pointers. The index is interpreted as an offset from the beginning of the array. array[n] and *(array + n * sizeof(element_type)) works for all indexes (positive, zero, or negative), but only if the index numbering starts from 0.

We must never think of the index as being the position of an element in an array. Rather we must think of it as being the number of elements before that element in the array.

What If The Index Started At One?

Counting arrays from index 0 simplifies the calculation of the memory address of each element.

Remember that the array index expression array[n] is translated to *(array + n * sizeof(element_type)).

If we wanted to index the first element using one instead of zero, we’d have to do an extra subtraction, as follows:

*(array + (n-1) * sizeof(element_type))

This isn’t a big difference, but it adds an unnecessary subtraction whenever accessing an array element.

When the memory management system allocates memory through a call to say C’s malloc() function or Java’s new call, we get back the address where the allocated chunk of memory starts. If a C compiler sees something like array[n] it has to first calculate the address of the expression, then get the value stored at that address. If the indexes start at 0, the calculation is array + n*size. If the indexes start at 1, the calculation is array + (n-1)*size. So array[1] will correspond to array + (1-1)*size, which is array itself.

Arrays starting at zero allows a better representation of ranges. Expressing a range of indexes makes the most sense if we include the lower bound and exclude the upper bound, e.g. for (int i=0; i<length;++i). It’s more natural to start at zero when we use exclusive end bounds. It avoids extra additions and subtractions.

Do All Computer Languages Index Arrays From Zero?

Many languages are descendants of or heavily influenced by C. These languages all use zero-based indexing. But zero-based indexing is not universal across all computer languages; it’s just a very common convention.

A number of languages start at 1, e.g. COBOL, Fortran and Basic. A number of languages allow the programmer to choose whether to start at 0 or 1. Some languages even allow any starting index number (even negative); whatever makes sense to the programmer to match the required algorithm.

Having the index start from 1 requires an extra calculation (the unnecessary subtraction). Some languages do the extra calculations because they feel that counting from one is more intuitive for many programmers.

Fortran uses 1-based arrays, which works well when we’re writing mathematical algorithms using vectors and matrices, where indices naturally go 1 .. N. But as soon as we start coding algorithms such as searching and sorting, or allocating and freeing chunks of memory, then using 1-based arrays becomes very confusing. This easily leads to off-by-one errors, which can be a nightmare to debug.

Ending Off

Just out of interest, let’s go back to the birthday example. Surprisingly enough, not everyone counts age from zero. In some East Asian countries (China, Japan, Korea and Vietnam), age was traditionally counted starting from 1 at birth, and incremented at each New Year rather than at each birthday. This practice has largely been dropped, except in some rural areas. See the Wikipedia article.

Edsger W. Dijkstra published an opinion article in 1982 explaining the reasons why numbering should count from zero. It’s a fairly philosophical computer science/mathematical article which makes for very interesting reading. Dijkstra wrote all of his more than 1300 articles by hand. You can find a scan of the beautifully hand-written article here.

Was this post interesting? Please share your comments, and as always, stay safe and keep learning!

Leave a Comment

Your email address will not be published. Required fields are marked *

Thank You

We're Excited!

Thank you for completing the form. We're excited that you have chosen to contact us about training. We will process the information as soon as we can, and we will do our best to contact you within 1 working day. (Please note that our offices are closed over weekends and public holidays.)

Don't Worry

Our privacy policy ensures your data is safe: Incus Data does not sell or otherwise distribute email addresses. We will not divulge your personal information to anyone unless specifically authorised by you.

If you need any further information, please contact us on tel: (27) 12-666-2020 or email info@incusdata.com

How can we help you?

Let us contact you about your training requirements. Just fill in a few details, and we’ll get right back to you.