It is important to understand these, because they’re the backbone of modern databases. It is also important to understand the notion of database index.
In this post, we will go through two types of data structures, arrays and trees. There is also another important data structure type known as Hash Table, but we will cover that in another post!
The two dimensional array is the simplest data structure. A table can be seen as an array.
Each row represents a subject, and columns the features that describe the subjects. Each column stores a certain type of data, such an integer, string, date…etc
Let’s say there’s a table array displaying worker information. if you wanted to find all the people who worked in the UK, you’ll have to look at each row to find if that row belongs to the UK.
*This will cost you N operations* (With N being the number of row).
It’s not a bad solutions – but could there be a faster way? This is where trees come into play. Though, most modern databases do provide advanced arrays to store tables efficiently like heap-organized tables or index-organized tables. But this does not change the problem of fast searching for a specific condition with a group of columns.
A binary search tree is a binary tree with a special property, the key in each node must be:
- Greater than all keys stored in the left sub-tree
- Smaller than all keys stored in the right sub-tree
Let’s look at a tree with N =15 elements. Let’s say within this tree, I’m looking for a specific key of 208:
I start with the root, compare it to 208 to see which is greater or smaller, then continue down the appropriate path for the tree.
Let’s say 208 > root.
This means we move to the right side of the tree.
If the next value is greater than 208, then we look at the left side of the sub tree.
We continue this comparison until the value is either found, or does not exist.
In the end, the search costs the amount of levels inside the tree, and there are log(N) levels. Look again at the tree:
Top root has starts with one, the root itself. As we move along the tree, the amount of keys in each row increases by a certain pattern:
Level 1: root
Level 2: two sub trees
Level 3: four sub trees
Level 4: eight sub trees
This means that there are log(N) levels. This means the cost of search is log(N) – a definite upgrade over using N operations if we directly used the array.
Do you see why this is efficient? It scales exponentially with bigger numbers.
It is similar to a Phonebook, though a phonebook is invariably faster since we also know the exact name of who we’re searching for.
Unfortunately, this tree does have problems. A big problem is when you have to get multiple elements between two values. It will cost O(N) because you’ll have to look at each node in the tree and check if it’s between these two values. This operation is also not disk I/O friendly.
I love computer science concepts, even though I never took a class on it. Though iOS developers don’t necessarily need it, it is almost good to have this knowledge in the back of our heads. I will continue exploring and learning more about various databases and sorting algorithms.