VOOZH about

URL: https://thenewstack.io/learning-storage-and-retrieval-with-a-hash-function/

⇱ Learning Storage and Retrieval with a Hash Function - The New Stack


TNS
SUBSCRIBE
Join our community of software engineering leaders and aspirational developers. Always stay in-the-know by getting the most important news and exclusive content delivered fresh to your inbox to learn more about at-scale software development.
REQUIRED
It seems that you've previously unsubscribed from our newsletter in the past. Click the button below to open the re-subscribe form in a new tab. When you're done, simply close that tab and continue with this form to complete your subscription.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.
Welcome and thank you for joining The New Stack community!
Please answer a few simple questions to help us deliver the news and resources you are interested in.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Great to meet you!
Tell us a bit about your job so we can cover the topics you find most relevant.
REQUIRED
REQUIRED
REQUIRED
REQUIRED
REQUIRED
Welcome!

We’re so glad you’re here. You can expect all the best TNS content to arrive Monday through Friday to keep you on top of the news and at the top of your game.

What’s next?

Check your inbox for a confirmation email where you can adjust your preferences and even join additional groups.

Follow TNS on your favorite social media networks.

Become a TNS follower on LinkedIn.

Check out the latest featured and trending stories while you wait for your first TNS newsletter.

PREV
1 of 2
NEXT
VOXPOP
As a JavaScript developer, what non-React tools do you use most often?
Angular
0%
Astro
0%
Svelte
0%
Vue.js
0%
Other
0%
I only use React
0%
I don't use JavaScript
0%
Thanks for your opinion! Subscribe below to get the final results, published exclusively in our TNS Update newsletter:
NEW! Try Stackie AI
From clobbered drafts to real-time sync
Apr 14th 2026 10:00am, by David Moore
TypeScript 6.0 RC arrives as a bridge to a faster future
Mar 14th 2026 9:00am, by Darryl K. Taft
Mastra empowers web devs to build AI agents in TypeScript
Jan 28th 2026 11:00am, by Loraine Lawson
2023-01-16 06:00:36
Learning Storage and Retrieval with a Hash Function
tutorial,
Frontend Development / Software Development

Learning Storage and Retrieval with a Hash Function

David Eastman channels Samuel Johnson's dictionary in order to explain arrays, hash tables, collisions, and associated computing concepts.
Jan 16th, 2023 6:00am by David Eastman
👁 Featued image for: Learning Storage and Retrieval with a Hash Function
Image via Shutterstock 

If you walk around the city of London, you won’t have too much trouble finding Gough Square, hidden behind Fleet Street. There stands the home of Dr. Samuel Johnson, who (among other things) wrote the first English Dictionary.

There is something surprisingly magical about a dictionary. There is probably only one long-ordered chain you know well, and have known well for a long time: your alphabet. This gives a dictionary a few remarkable characteristics. Each entry has just one key. And the indexing works only one way; so using only a dictionary, you can’t discover an unknown word just be knowing its meaning. Even a young child can look up Steatopygia, but finding that word simply by searching the entire book for “the state of having substantial levels of tissue on the buttocks and thighs” might be a very long task.

Some basics of computing have overlapping definitions and get obscured by their own ubiquity. While hashing has a simple definition, different aspects of it are in focus at different times. In today’s world, hash security is of interest, but in this article, we are looking into storing and retrieving items with a key. We’ll use elementary computing concepts, with a small degree of maths curiosity. It starts with the first hash table.

A hash function is any function that can be used to map data of arbitrary size to fixed-size values.

Arrays and Hash Tables

From a computing point of view, the English Dictionary is not quite so wonderful, as there was never any need to restrict its size. The English language was never created with efficiency in mind; if it was then the dictionary would start with the word “A”, and end with the word “ZZZZ”. Indeed, just four letters if used efficiently could represent all English words (with enough room for another language).

For a digital dictionary, we are interested in numeric keys and indices. We know that the simplest form of storage is a fixed-size array. For our example, we will play with a five-element array for storing the worrying mugshots of our five friends: Alan, Beth, Cath, Dave and Eddy. We want to retrieve the images using just their names; and eventually, delete them.

Now, we could use just two arrays: one with the image data, and one with a matching index holding the keys. The data array would need cells big enough to hold image data, while the index array just needs cells big enough to hold small strings.

In the example below, the key array holds the key “Alan” whose index matches the slightly angry dark-bearded visage in the image array above. I guess Midjourney was having a stressful day. This system is simple to maintain, as entries can be created and deleted easily in sync.

👁 Image

For our small example, this seems fine. But for a large array, the problem with this scheme is that as the dual entries increase, you have to potentially search for longer in order to find the right key. Worse, you could search the entire array just to confirm that the key wasn’t present. Once I start deleting entries, I also have to track disparate empty cells somehow. In other words, this system doesn’t scale well.

What we want is a function that sits between the nice human-readable key and an indexed array of images. Something that can look at the key and say “oh, I put your image data here”.

👁 Image

You might also recognize this model from valet parking. I give the attendant my car key, and in return, he gives me a ticket. Then some other person goes and parks my car. Later I give you the ticket, this matches my car details and my car is retrieved. Addition and retrieval take a fixed amount of time. (At least that is how I believe the system works; my only reference is “Ferris Bueller’s Day Off”.)

So, what is inside the “hash function”? Surprisingly it can be more or less anything. As long as it can produce valid index numbers.

Modulo Operation

The starting point for examples is always the modulo operation, as that precisely fits the bill — it provides a number guaranteed to be less than a given maximum.

In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another.

So if we have a dictionary with a five-slot array, we will need to output index positions of 0-4. As you can see, modulo 5 does this with any integer value:

17 mod 5 = 2
2345 mod 5 = 0
73 mod 5 = 3

I’m sure you grok the pattern. So we have our hash function, right? We should have no need for the index array — just get an index from our name key! Well, not quite.

Given that 17, 207, and 2347 all produce the same modulo result, this can’t possibly work on its own. Even for five random numbers, there will likely be some collisions. The only way to avoid collisions, certainly for a trivial algorithm, is to store the key as we were doing before. This isn’t so surprising really — for a fixed-size array, no algorithm could possibly perfectly fill our index without remembering what had gone before. This would be like a valet parking where the details of the car and where it is parked don’t get recorded anywhere. So our hash function is really saying “oh, start looking for your image data here.” What I’m describing is called open addressing.

I want keys to be the name of the person’s mugshot, e.g. “Cath” — but that isn’t a number. However, I can get a fairly unique number from a string. We could produce a value based on the positional encoding of numbers (think of English dictionary order, counting in base 26), but with an eye on our very simple modulo function let’s just add all the ASCII values of the characters together.

Alan = 65 + 108 + 97 + 110 = 380

Beth = 66 + 101 + 116 + 104 = 387

Cath = 67 + 97 + 116 + 104 = 384

Dave = 68 + 97 + 118 + 101 = 384

Eddy = 69 + 100 + 100 + 121 = 390

(Tip: If you are using a Mac, you can do both modulo and other lines of math directly in the Spotlight Search bar.)

Collisions

So, finally, let’s deal with collisions. If we store the key, then we can fulfill our expectation of keeping the information close to where we first looked, by simply putting it in the next available place down the key array. This method is called linear probing.

So now we have all the ingredients. First, we tell our hash algorithm that we want to store an image of “Alan.” It converts the string “Alan” to an integer, then uses modulo to produce an index into the key array. Then it adds the data, using the same index, to the value array.

👁 Image

Then we add ”Beth” in the same way. This time, the index is two.

👁 Image

When we try and add Eddy next, we get a potential collision with Alan’s entry. So we move to the next available slot in the index array:

👁 Image

To retrieve our data, we just follow the algorithm again, but this time to confirm the index. When we search for “Eddie,” we see “Alan” — so move on one. We didn’t have too far to go. The hashing algorithm told us where to start looking like it promised.

Conclusion

Naturally, this makes a little more sense for the larger arrays you would use in real applications. In modern computing languages, you don’t need to specify the size of the dictionary — they resize as needed. But for now, I think Johnson, and his cat Hodge, would be happy with their indirect contribution to computing.

👁 Image

TRENDING STORIES
David has been a London-based professional software developer with Oracle Corp. and British Telecom, and a consultant helping teams work in a more agile fashion. He wrote a book on UI design and has been writing technical articles ever since....
Read more from David Eastman
SHARE THIS STORY
TRENDING STORIES
SHARE THIS STORY
TRENDING STORIES
TNS DAILY NEWSLETTER Receive a free roundup of the most recent TNS articles in your inbox each day.
The New Stack does not sell your information or share it with unaffiliated third parties. By continuing, you agree to our Terms of Use and Privacy Policy.