Denormalization

Denormalization is the process of intentionally introducing redundancy into a database by combining tables or adding derived data.
It’s the opposite of normalization.
The goal is not to “break the rules” but to improve performance in real-world applications.

In short:

Normalization = Optimize for consistency & integrity.
Denormalization = Optimize for query performance (sometimes at the cost of redundancy).

Why Do We Denormalize?

Fully normalized databases are great for data integrity, but they have some problems in practice:

Performance Issues
- In normalized design, retrieving data often requires many joins.
- Joins are expensive for large datasets.
Reporting and Analytics
- Applications like dashboards, reports, or data warehouses need fast aggregation, not complex joins.
Read-Heavy Workloads
- If the database is mostly used for reads (not writes), denormalization makes sense.
Caching Frequently Used Data
- Instead of computing results repeatedly (like totals or counts), we may store them directly.

You should consider denormalization when:

Avoid denormalization if:

Normalized Design (3NF)

Customer

To get Order with Customer Name & City, we must join:

SELECT o.OrderID, o.OrderDate, c.Name, c.City
FROM Order o
JOIN Customer c ON o.CustomerID = c.CustomerID;

We combine Order and Customer into one table:

OrderID	CustomerID	Name	City	OrderDate
101	1	Alice	Dhaka	2023-01-01
102	2	Bob	Chittagong	2023-01-05

Combining Tables
- Merging related tables into one (as in Customer + Order).
Adding Redundant Columns
- E.g., Store CustomerName directly in Order table.
Precomputing Derived Data
- E.g., Store OrderTotal in the Order table instead of calculating it from OrderDetails every time.
Duplicating Tables Create summary tables for reporting (like monthly sales).
Star Schema / Snowflake Schema (in Data Warehouses)
- Fact table (sales) + dimension tables (customer, product, time).

Aspect	Normalization	Denormalization
Goal	Reduce redundancy, improve integrity	Improve performance, reduce joins
Pros	Consistent data, avoids anomalies	Faster queries, better for reporting
Cons	Slow queries (many joins), not optimized for reads	Redundancy, anomalies possible
Best Use	OLTP (transactional systems)	OLAP (reporting, analytics, warehouses)