Now for something completely different from the last few tips.
I was teaching the Java Collection API the other day, and obviously the question came up about how do we choose which collection class to use for a particular use case.
Our choice is obviously determined by the specific use case. Are we accessing each element often or only once? Does the collection grow and shrink a lot? Do we need the elements to be sorted? Do we have limited memory? How important is speed? That’s the role of Big O notation – something programmers should know about.
What is Big O notation?
Big O notation is one of the tools that we as computer scientists and software developers can use to analyse the cost of an algorithm.
I’m not going to quote a Wikipedia or computer science definition of the term. These get way too technical.
Simply put, Big O notation describes the speed or complexity of an algorithm in algebraic terms. Collections implement algorithms internally, so we can think of the terms algorithm and collection as interchangeable here. If we use a particular algorithm/collection, it’s important that we know how slow or fast it is compared to other options.
Big O notation tells us the maximum number of operations an algorithm will make as the size of the input grows. This is the upper bound, or the worst-case scenario. Big O notation only provides an upper bound on the growth rate of the function.
Big O notation gets its name from the capital letter O (or “big” O) in front of the number representing the operations. The letter O is used because the growth rate of a function is also referred to as the order of the function. Different algorithms with the same growth rate can be represented with the same Big O notation.
Big O notation doesn’t tell us exactly how fast an algorithm will run in seconds. There are way too many other factors that affect the actual speed. We use it to compare different algorithms by the number of operations they make.
Example: Sequential Search
Let’s say we want to search for a particular element in a collection (or a particular record in a database table). If we used a simple sequential search, we can easily reason that, in the worst case, we’ll have to search through every single element to find the one we’re looking for. If there are n elements, a sequential search will take n operations to run. This will be referred to as O(n).
If we run the sequential search, it may happen that the element we want is the very first element in the collection. We were lucky – we didn’t need to look at every element because we found it on our first try.
So did this algorithm actually take O(n) time? Didn’t it take O(1) time because we found it on the first try?
Here 0(1) is the best-case scenario. We were just lucky. Big O notation focuses on the worst-case scenario, which is 0(n) for a simple sequential search. O(n) reassures us that a simple search will never be worse than O(n) time.
Common Algorithm Forms
Some of the most common forms of algorithm efficiency are:
-
Constant: This means that a constant number of instructions is executed regardless of the number of data elements. This is referred to as O(1). Examples would be an ideal hash table lookup or finding an array element by index.
-
Logarithmic: If we double the number of data elements, the number of operations only increases by a fixed number. This is O(log n). Examples would be a binary search on a sorted array or accessing/searching a balanced binary tree.
-
Linear: If we double the number of data elements, the number of operations also doubles. This is O(n). Examples would be a linear (sequential) search or a string comparison.
-
Quadratic: If we double the number of data elements, the number of operations increases by a factor of four. This is O(n2). Examples would be a bubble sort, a selection sort, or the worst case of a quicksort.
Extra, Extra, Read All About It!
Big-O notation is based on algorithm performance for large values of n. This means that any arithmetic constants in the algorithm are ignored and the logarithmic multipliers and powers dominate and are the only factors that determine the order. When n is small, the constant factor is important. As n grows, the O(n) algorithm takes less and less time relative to the O(n2) algorithm. Therefore for large values of n, constant factors don’t matter, just the Big-O bounds. Interestingly, for small values of n, the constants used in the algorithm may end up dominating over the the collection size.
Big-O notation classifies the algorithm itself, and not how it is physically implemented. An algorithm has the same rating, regardless of whether it is physically implemented in 10 instructions or 100 instructions per element. Similarly, an algorithm that requires either one pass or a number of passes for all of its n elements is still referred to as O(n).
Stay safe, and I’ll see you next week!
1 thought on “Big O Notation”
Pingback: Choosing a Java Collection • 2022 • Incus Data Programming Courses