Hashing
Hashing is the process of converting input data (of any size) into a fixed-size string of characters, typically a short string of letters and numbers, using a mathematical function called a hash function.
Purpose of Hashing
In system design, hashing is crucial for:
- Data lookup efficiency (e.g., using hash tables)
- Security (e.g., storing passwords or verifying data integrity)
- Data partitioning (e.g., consistent hashing in distributed systems)
- Load balancing (distributing requests to servers)
How Hashing Works
A hash function takes an input (e.g., a username) and returns a hash value (a fixed-size code). A good hash function ensures:
- Determinism – same input gives same output
- Uniformity – distributes input evenly across the output space
- Efficiency – fast computation
- Non-reversibility (in cryptographic hashing) – hard to reverse-engineer
Example with Hash Map
Problem: You want to store and retrieve user records by username quickly.
Solution: Use a hash table where the key is the username, and the value is the user’s data.
Input username: "alice123"
Hash function: hash("alice123") → 142
Store in table at index 142: users[142] = {name: "Alice", age: 25}
When retrieving:
Get hash("alice123") → 142
Fetch users[142]
This lookup is done in constant time O(1) on average.
Example of Password Storage
You should never store raw passwords. Instead, hash them:
Password: "mySecret123"
Hash: SHA256("mySecret123") = "a5dcb3b7e69..."
Store this hash in DB
When the user logs in, hash the provided password and compare it to the stored hash.
This keeps passwords safe even if the database is compromised.
Common Hash Functions
| Function | Use Case |
|---|---|
| MD5 | Fast but insecure (legacy) |
| SHA-1 | Deprecated (vulnerable) |
| SHA-256 | Secure password hashing |
| MurmurHash | Fast and non-cryptographic |
| CRC32 | Data integrity |
Hashing Caveats
- Collisions: Different inputs may produce the same hash. Good hash functions minimize this.
- Security: Cryptographic hash functions are needed for passwords or digital signatures.
- Hash table limitations: Poor hash functions may cause clustering and performance drops.