Normalization
Functional Dependency
A Functional Dependency (FD) is a constraint that describes a relationship between attributes (columns) in a relation (table).
Formally:
For a relation R, an attribute (or a set of attributes) X is said to functionally determine another attribute (or set of attributes) Y if, for every pair of tuples in R, whenever the values of X are the same, the values of Y are also the same.
We write this as:
X → Y
(read: X functionally determines Y)
Example of Functional Dependency
Suppose we have a relation Student:
| StudentID | Name | Department | DeptHead |
|---|---|---|---|
| 101 | Alice | CSE | Dr. Rahman |
| 102 | Bob | CSE | Dr. Rahman |
| 103 | Charlie | EEE | Dr. Hasan |
Functional Dependencies:
- StudentID → Name, Department
- A StudentID uniquely determines a single Name and Department.
- If two rows have the same StudentID, they must have the same Name and Department.
- Department → DeptHead
- A department has exactly one Head.
- If two rows have the same Department, they must have the same DeptHead.
Types of Functional Dependencies
Trivial Functional Dependency
- A functional dependency is trivial if the dependent is a subset of the determinant.
- Example:
\{StudentID, Name\}→ StudentID- Always true, because StudentID is already part of
\{StudentID, Name\}.
Non-Trivial Functional Dependency
- A dependency is non-trivial if the dependent is not a subset of the determinant.
- Example:
StudentID → Name- Here, Name is not part of StudentID, so it is non-trivial.
Completely Non-Trivial
- When
X → Yand X ∩ Y = ∅ (they share no attributes). - Example:
StudentID → DepartmentStudentIDandDepartmenthave no attributes in common.
Importance of Functional Dependencies in Normalization
Functional Dependencies are the foundation of Normalization.
- They help detect redundancy.
- They identify candidate keys.
- They are used to decide the normal form of a relation.
Example Problem:
Consider this unnormalized table Employee:
| EmpID | EmpName | DeptID | DeptName | DeptHead |
|---|---|---|---|---|
| 1 | Alice | 10 | CSE | Dr. Rahman |
| 2 | Bob | 10 | CSE | Dr. Rahman |
| 3 | Charlie | 20 | EEE | Dr. Hasan |
Functional Dependencies:
-
EmpID → EmpName, DeptID (An employee ID determines a unique name and department).
-
DeptID → DeptName, DeptHead (Each department ID determines one department name and head).
Problem: Redundancy
- DeptName and DeptHead are repeated for every employee in the same department.
- If the head of DeptID 10 changes, multiple rows must be updated.
Solution: Normalization using FDs
- Split into two relations:
Employee(EmpID, EmpName, DeptID) Department(DeptID, DeptName, DeptHead)
Now redundancy is removed, thanks to identifying FDs.
Summary of Functional Dependency
| Concept | Meaning | Example |
|---|---|---|
| Functional Dependency (FD) | X determines Y (X → Y) | StudentID → Name |
| Trivial FD | Dependent ⊆ Determinant | {EmpID, Name} → EmpID |
| Non-Trivial FD | Dependent not subset of determinant | EmpID → DeptID |
| Role in Normalization | Detect redundancy & guide decomposition | DeptID → DeptHead |
1st Normal Form (1NF)
First Normal Form (1NF) is the basic level of normalization in relational database design. It deals with the structure of data inside a table and ensures that the relation follows the principles of a relational model.
A relation is in 1NF if it satisfies the following rules:
-
Atomic values only – Each cell must contain a single indivisible value (no lists, sets, repeating groups).
-
No repeating groups/arrays – Each column should represent one attribute, not multiple.
-
Unique rows (tuples) – Each record must be unique, identified by a primary key.
Why is 1NF important?
- Ensures the database structure is simple and consistent.
- Eliminates multi-valued attributes (like storing multiple phone numbers in one column).
- Sets the foundation for higher normal forms (2NF, 3NF, BCNF).
Example: Table NOT in 1NF
| StudentID | Name | Subjects | Phone Numbers |
|---|---|---|---|
| 101 | Alice | Math, Physics | 111-2222, 333-4444 |
| 102 | Bob | Chemistry | 555-6666 |
| 103 | Charlie | Math, Chemistry | 777-8888, 999-0000 |
Problems:
- Subjects column contains multiple values (
Math, Physics). - Phone Numbers column contains multiple values (
111-2222, 333-4444). - The table violates atomicity (not atomic values).
Conversion to 1NF
We split multi-valued attributes into separate rows.
| StudentID | Name | Subject | Phone |
|---|---|---|---|
| 101 | Alice | Math | 111-2222 |
| 101 | Alice | Physics | 333-4444 |
| 102 | Bob | Chemistry | 555-6666 |
| 103 | Charlie | Math | 777-8888 |
| 103 | Charlie | Chemistry | 999-0000 |
- Each cell now holds a single value (atomic).
- No repeating groups → “Subjects” split into rows.
- Primary Key can be
\{StudentID, Subject, Phone\}or separate surrogate key.
Key Points about 1NF
| Rule of 1NF | Explanation | Example |
|---|---|---|
| Atomic values | No multi-valued attributes | One phone number per row |
| No repeating groups | No lists or arrays in a cell | Math, Physics must be split |
| Unique rows | Each row must be distinguishable by a key | \{StudentID, Subject\} |
2nd Normal Form (2NF)
A relation is in 2NF if:
- It is already in 1NF.
- It has no partial dependency → meaning no non-prime attribute depends on part of a composite primary key.
Key Terms
- Prime attribute → An attribute that is part of a candidate key.
- Non-prime attribute → An attribute that is not part of any candidate key.
- Partial dependency → When a non-prime attribute depends only on part (not all) of a composite key.
In short: 2NF removes partial dependency.
Example: Table in 1NF but not in 2NF
| StudentID | CourseID | StudentName | CourseName | Instructor |
|---|---|---|---|---|
| 101 | C1 | Alice | DBMS | Dr. Rahman |
| 102 | C1 | Bob | DBMS | Dr. Rahman |
| 101 | C2 | Alice | Networks | Dr. Hasan |
| 103 | C2 | Charlie | Networks | Dr. Hasan |
Candidate Key:
- Composite key = {StudentID, CourseID} (because each student can take multiple courses).
Functional Dependencies:
\{StudentID, CourseID\} → StudentName, CourseName, InstructorStudentID → StudentName(partial dependency problem)CourseID → CourseName, Instructor(partial dependency problem)
Problem:
- StudentName depends only on
StudentID. - CourseName and Instructor depend only on
CourseID. - This creates redundancy:
CourseNameandInstructorrepeat for every student enrolled in that course.- If Dr. Rahman changes department, we must update multiple rows.
Conversion to 2NF
We remove partial dependencies by decomposing into smaller relations.
Decomposed Tables:
Student Table
| StudentID | StudentName |
|---|---|
| 101 | Alice |
| 102 | Bob |
| 103 | Charlie |
Course Table
| CourseID | CourseName | Instructor |
|---|---|---|
| C1 | DBMS | Dr. Rahman |
| C2 | Networks | Dr. Hasan |
Enrollment Table
| StudentID | CourseID |
|---|---|
| 101 | C1 |
| 102 | C1 |
| 101 | C2 |
| 103 | C2 |
Explanation of Fix
- Now:
StudentID → StudentNameis stored in Student table.CourseID → CourseName, Instructoris stored in Course table.Enrollmenttable just links students and courses.
- No partial dependency remains, because:
- In Enrollment, the only dependency is {StudentID, CourseID} (the whole composite key).
Key Points about 2NF
| Rule of 2NF | Explanation | Example |
|---|---|---|
| Must be in 1NF | Atomic values, no repeating groups | Already achieved in Enrollment table |
| No partial dependency | Non-prime attributes cannot depend on part of composite key | StudentName depends only on StudentID → moved to Student table |
| Goal | Remove redundancy caused by composite keys | Avoid repeating Instructor name for each student |
3rd Normal Form (3NF)
A relation is in 3NF if:
- It is already in 2NF.
- It has no transitive dependency → meaning, no non-prime attribute depends on another non-prime attribute.
In other words:
- Every non-prime attribute must depend only on the candidate key and nothing else.
- The rule: For every functional dependency X → Y, either:
- X is a superkey, OR
- Y is a prime attribute (part of a candidate key).
Key Term: Transitive Dependency
A transitive dependency occurs when:
A → B and B → C ⇒ then A → C (indirect dependency).
Example:
- If
StudentID → DeptIDandDeptID → DeptName, - Then
StudentID → DeptNameis a transitive dependency.
Table in 2NF but not in 3NF
| EmpID | EmpName | DeptID | DeptName | DeptLocation |
|---|---|---|---|---|
| 1 | Alice | 10 | HR | Dhaka |
| 2 | Bob | 10 | HR | Dhaka |
| 3 | Charlie | 20 | IT | Chittagong |
| 4 | David | 30 | Finance | Khulna |
Candidate Key:
EmpID(each employee has a unique ID).
Functional Dependencies:
EmpID → EmpName, DeptID, DeptName, DeptLocationDeptID → DeptName, DeptLocation
Problem:
DeptNameandDeptLocationdepend onDeptID, not directly onEmpID.- This is a transitive dependency:
EmpID → DeptIDDeptID → DeptName, DeptLocation⇒ EmpID → DeptName, DeptLocation(transitive).
Redundancy Issues:
- DeptName & DeptLocation repeat for every employee in the same department.
- If DeptLocation changes, many rows need updating (update anomaly).
Conversion to 3NF
We remove transitive dependencies by splitting the table into smaller relations.
Decomposed Tables:
Employee Table
| EmpID | EmpName | DeptID |
|---|---|---|
| 1 | Alice | 10 |
| 2 | Bob | 10 |
| 3 | Charlie | 20 |
| 4 | David | 30 |
Department Table
| DeptID | DeptName | DeptLocation |
|---|---|---|
| 10 | HR | Dhaka |
| 20 | IT | Chittagong |
| 30 | Finance | Khulna |
Explanation of Fix
- Now:
- In Employee, non-prime attributes (
EmpName) depend only on the keyEmpID. - In Department, non-prime attributes (
DeptName,DeptLocation) depend only on the keyDeptID.
- In Employee, non-prime attributes (
- All transitive dependencies are removed.
Key Points about 3NF
| Rule of 3NF | Explanation | Example |
|---|---|---|
| Must be in 2NF | No partial dependencies | Achieved |
| No transitive dependency | Non-prime attributes must depend only on key | DeptName & DeptLocation moved to Department table |
| Goal | Eliminate redundancy & anomalies | Avoid repeating Dept details for each employee |
Boyce-Codd Normal Form (BCNF)
BCNF is an advanced version of Third Normal Form (3NF).
A relation is in BCNF if:
For every functional dependency (X → Y):
- X must be a superkey (a key that can uniquely identify a row).
Difference from 3NF:
- In 3NF, a functional dependency is allowed if Y is a prime attribute (part of a candidate key).
- In BCNF, this exception is not allowed.
So:
BCNF = A stricter form of 3NF.
Why do we need BCNF?
Even if a table is in 3NF, it may still have anomalies if a non-trivial FD exists where determinant (X) is not a superkey.
BCNF eliminates:
- Update anomalies
- Insertion anomalies
- Deletion anomalies
Table in 3NF but not in BCNF
| CourseID | Instructor | Room |
|---|---|---|
| C1 | Dr. Smith | R101 |
| C2 | Dr. Smith | R102 |
| C3 | Dr. Jones | R101 |
Candidate Keys:
\{CourseID, Instructor\}(a course + instructor pair is unique).
Functional Dependencies:
CourseID → RoomRoom → Instructor
Check 3NF:
- Yes, table is in 3NF because all non-prime attributes depend on keys.
Problem (Why not BCNF?):
Room → Instructorviolates BCNF.- Here,
Roomis not a superkey. - But it determines
Instructor.
- Here,
- This causes anomalies:
- If Dr. Smith moves to another room, multiple rows must be updated.
Conversion to BCNF
We decompose into two relations:
Course Table
| CourseID | Room |
|---|---|
| C1 | R101 |
| C2 | R102 |
| C3 | R101 |
RoomInstructor Table
| Room | Instructor |
|---|---|
| R101 | Dr. Smith |
| R102 | Dr. Smith |
| R101 | Dr. Jones |
- In Course,
CourseIDis the key → No violation. - In RoomInstructor,
\{Room, Instructor\}is the key → No violation.
Key Points about BCNF
| Normal Form | Rule | Example of Violation |
|---|---|---|
| 1NF | Atomic values, no repeating groups | Multi-valued subjects |
| 2NF | No partial dependency | Non-prime depends on part of composite key |
| 3NF | No transitive dependency | Non-prime depends on another non-prime |
| BCNF | For every FD X → Y, X must be a superkey | Room → Instructor when Room is not a key |
Fourth Normal Form (4NF)
A relation is in 4NF if:
- It is already in BCNF.
- It has no multi-valued dependency (MVD), except when it is implied by a candidate key.
Multi-Valued Dependency (MVD):
- Occurs when one attribute in a table uniquely determines multiple independent sets of values.
- Example notation: A ↠ B ("A multi-determines B").
Example: Table NOT in 4NF
| StudentID | Hobby | Language |
|---|---|---|
| 101 | Chess | English |
| 101 | Chess | French |
| 101 | Painting | English |
| 101 | Painting | French |
Functional / Multi-Valued Dependencies:
StudentID ↠ Hobby(a student can have many hobbies).StudentID ↠ Language(a student can know many languages).
Problem:
- Unnecessary repetition → For each hobby, we repeat all languages.
Conversion to 4NF
Decompose into two independent relations:
StudentHobby
| StudentID | Hobby |
|---|---|
| 101 | Chess |
| 101 | Painting |
StudentLanguage
| StudentID | Hobby |
|---|---|
| 101 | Chess |
| 101 | Painting |
Fifth Normal Form (5NF) / Project-Join Normal Form (PJNF)
A relation is in 5NF if:
- It is already in 4NF.
- It has no join dependency (JD), except when implied by candidate keys.
Join Dependency:
- A table can be losslessly decomposed into smaller tables and then reconstructed using joins.
- If such decomposition is necessary to remove redundancy, the table is not in 5NF.
Example: Table NOT in 5NF
| Supplier | Part | Project |
|---|---|---|
| S1 | P1 | J1 |
| S1 | P2 | J1 |
| S1 | P1 | J2 |
| S1 | P2 | J2 |
Dependencies:
- Supplier supplies Parts.
- Supplier works on Projects.
- Parts are used in Projects.
Problem:
All three facts are independent, but stored together, leading to redundancy.
Conversion to 5NF
Decompose into three smaller relations:
SupplierPart
| Supplier | Part |
|---|---|
| S1 | P1 |
| S1 | P2 |
SupplierProject
| Supplier | Part |
|---|---|
| S1 | P1 |
| S1 | P2 |
PartProject
| Part | Project |
|---|---|
| P1 | J1 |
| P1 | J2 |
| P2 | J1 |
| P2 | J2 |
- When joined, we reconstruct the original table.
- Redundancy is removed.
Sixth Normal Form (6NF)
- A relation is in 6NF if:
- It is already in 5NF.
- It has no non-trivial join dependencies at all.
- 6NF is used in temporal databases (where data changes with time).
- Goal: Break tables down into the smallest possible structures (each fact stored in its own table).
Example: Temporal Database (Not in 6NF)
| Employee | Department | StartDate | EndDate |
|---|---|---|---|
| E1 | HR | 2020-01-01 | 2021-01-01 |
| E1 | Finance | 2021-01-02 | 2022-01-01 |
Problem:
Both Department and employment period are stored in the same table → mixing multiple time-dependent facts.
Conversion to 6NF
Split into multiple tables, each capturing a single time-dependent fact:
EmployeeDepartment
| Employee | Department | StartDate | EndDate |
|---|---|---|---|
| E1 | HR | 2020-01-01 | 2021-01-01 |
| E1 | Finance | 2021-01-02 | 2022-01-01 |
EmployeeContract
| Employee | StartDate | EndDate |
|---|---|---|
| E1 | 2020-01-01 | 2022-01-01 |
This allows better handling of historical (temporal) data.
Summary of Normalization
| Normal Form | Removes | Example Problem |
|---|---|---|
| 1NF | Repeating groups / non-atomic values | Multiple subjects in one cell |
| 2NF | Partial dependency | Non-prime depends on part of composite key |
| 3NF | Transitive dependency | DeptName depends on DeptID, not EmpID |
| BCNF | FD where determinant not a superkey | Room → Instructor |
| 4NF | Multi-valued dependency | Student has multiple Hobbies and Languages |
| 5NF | Join dependency | Supplier-Part-Project redundancy |
| 6NF | Temporal / trivial join dependency | Time-dependent data |