Skip to main content

Hashing

Hashing is the process of converting input data (of any size) into a fixed-size string of characters, typically a short string of letters and numbers, using a mathematical function called a hash function.

Purpose of Hashing

In system design, hashing is crucial for:

  • Data lookup efficiency (e.g., using hash tables)
  • Security (e.g., storing passwords or verifying data integrity)
  • Data partitioning (e.g., consistent hashing in distributed systems)
  • Load balancing (distributing requests to servers)

How Hashing Works

A hash function takes an input (e.g., a username) and returns a hash value (a fixed-size code). A good hash function ensures:

  • Determinism – same input gives same output
  • Uniformity – distributes input evenly across the output space
  • Efficiency – fast computation
  • Non-reversibility (in cryptographic hashing) – hard to reverse-engineer

Example with Hash Map

Problem: You want to store and retrieve user records by username quickly.

Solution: Use a hash table where the key is the username, and the value is the user’s data.

Input username: "alice123"
Hash function: hash("alice123") → 142
Store in table at index 142: users[142] = {name: "Alice", age: 25}

When retrieving:

Get hash("alice123") → 142
Fetch users[142]

This lookup is done in constant time O(1) on average.

Example of Password Storage

You should never store raw passwords. Instead, hash them:

Password: "mySecret123"
Hash: SHA256("mySecret123") = "a5dcb3b7e69..."
Store this hash in DB

When the user logs in, hash the provided password and compare it to the stored hash.

This keeps passwords safe even if the database is compromised.

Common Hash Functions

FunctionUse Case
MD5Fast but insecure (legacy)
SHA-1Deprecated (vulnerable)
SHA-256Secure password hashing
MurmurHashFast and non-cryptographic
CRC32Data integrity

Hashing Caveats

  • Collisions: Different inputs may produce the same hash. Good hash functions minimize this.
  • Security: Cryptographic hash functions are needed for passwords or digital signatures.
  • Hash table limitations: Poor hash functions may cause clustering and performance drops.