Skip to main content

Normalization

Functional Dependency

A Functional Dependency (FD) is a constraint that describes a relationship between attributes (columns) in a relation (table).

Formally:

For a relation R, an attribute (or a set of attributes) X is said to functionally determine another attribute (or set of attributes) Y if, for every pair of tuples in R, whenever the values of X are the same, the values of Y are also the same.

We write this as:

X → Y

(read: X functionally determines Y)

Example of Functional Dependency

Suppose we have a relation Student:

StudentIDNameDepartmentDeptHead
101AliceCSEDr. Rahman
102BobCSEDr. Rahman
103CharlieEEEDr. Hasan

Functional Dependencies:

  1. StudentID → Name, Department
  • A StudentID uniquely determines a single Name and Department.
  • If two rows have the same StudentID, they must have the same Name and Department.
  1. Department → DeptHead
  • A department has exactly one Head.
  • If two rows have the same Department, they must have the same DeptHead.

Types of Functional Dependencies

Trivial Functional Dependency

  • A functional dependency is trivial if the dependent is a subset of the determinant.
  • Example:
    • \{StudentID, Name\} → StudentID
    • Always true, because StudentID is already part of \{StudentID, Name\}.

Non-Trivial Functional Dependency

  • A dependency is non-trivial if the dependent is not a subset of the determinant.
  • Example:
    • StudentID → Name
    • Here, Name is not part of StudentID, so it is non-trivial.

Completely Non-Trivial

  • When X → Y and X ∩ Y = ∅ (they share no attributes).
  • Example:
    • StudentID → Department
    • StudentID and Department have no attributes in common.

Importance of Functional Dependencies in Normalization

Functional Dependencies are the foundation of Normalization.

  • They help detect redundancy.
  • They identify candidate keys.
  • They are used to decide the normal form of a relation.

Example Problem:

Consider this unnormalized table Employee:

EmpIDEmpNameDeptIDDeptNameDeptHead
1Alice10CSEDr. Rahman
2Bob10CSEDr. Rahman
3Charlie20EEEDr. Hasan

Functional Dependencies:

  1. EmpID → EmpName, DeptID (An employee ID determines a unique name and department).

  2. DeptID → DeptName, DeptHead (Each department ID determines one department name and head).

Problem: Redundancy

  • DeptName and DeptHead are repeated for every employee in the same department.
  • If the head of DeptID 10 changes, multiple rows must be updated.

Solution: Normalization using FDs

  • Split into two relations:

Employee(EmpID, EmpName, DeptID) Department(DeptID, DeptName, DeptHead)

Now redundancy is removed, thanks to identifying FDs.

Summary of Functional Dependency

ConceptMeaningExample
Functional Dependency (FD)X determines Y (X → Y)StudentID → Name
Trivial FDDependent ⊆ Determinant{EmpID, Name} → EmpID
Non-Trivial FDDependent not subset of determinantEmpID → DeptID
Role in NormalizationDetect redundancy & guide decompositionDeptID → DeptHead

1st Normal Form (1NF)

First Normal Form (1NF) is the basic level of normalization in relational database design. It deals with the structure of data inside a table and ensures that the relation follows the principles of a relational model.

A relation is in 1NF if it satisfies the following rules:

  1. Atomic values only – Each cell must contain a single indivisible value (no lists, sets, repeating groups).

  2. No repeating groups/arrays – Each column should represent one attribute, not multiple.

  3. Unique rows (tuples) – Each record must be unique, identified by a primary key.

Why is 1NF important?

  • Ensures the database structure is simple and consistent.
  • Eliminates multi-valued attributes (like storing multiple phone numbers in one column).
  • Sets the foundation for higher normal forms (2NF, 3NF, BCNF).

Example: Table NOT in 1NF

StudentIDNameSubjectsPhone Numbers
101AliceMath, Physics111-2222, 333-4444
102BobChemistry555-6666
103CharlieMath, Chemistry777-8888, 999-0000

Problems:

  1. Subjects column contains multiple values (Math, Physics).
  2. Phone Numbers column contains multiple values (111-2222, 333-4444).
  3. The table violates atomicity (not atomic values).

Conversion to 1NF

We split multi-valued attributes into separate rows.

StudentIDNameSubjectPhone
101AliceMath111-2222
101AlicePhysics333-4444
102BobChemistry555-6666
103CharlieMath777-8888
103CharlieChemistry999-0000
  • Each cell now holds a single value (atomic).
  • No repeating groups → “Subjects” split into rows.
  • Primary Key can be \{StudentID, Subject, Phone\} or separate surrogate key.

Key Points about 1NF

Rule of 1NFExplanationExample
Atomic valuesNo multi-valued attributesOne phone number per row
No repeating groupsNo lists or arrays in a cellMath, Physics must be split
Unique rowsEach row must be distinguishable by a key\{StudentID, Subject\}

2nd Normal Form (2NF)

A relation is in 2NF if:

  1. It is already in 1NF.
  2. It has no partial dependency → meaning no non-prime attribute depends on part of a composite primary key.

Key Terms

  • Prime attribute → An attribute that is part of a candidate key.
  • Non-prime attribute → An attribute that is not part of any candidate key.
  • Partial dependency → When a non-prime attribute depends only on part (not all) of a composite key.

In short: 2NF removes partial dependency.

Example: Table in 1NF but not in 2NF

StudentIDCourseIDStudentNameCourseNameInstructor
101C1AliceDBMSDr. Rahman
102C1BobDBMSDr. Rahman
101C2AliceNetworksDr. Hasan
103C2CharlieNetworksDr. Hasan

Candidate Key:

  • Composite key = {StudentID, CourseID} (because each student can take multiple courses).

Functional Dependencies:

  1. \{StudentID, CourseID\} → StudentName, CourseName, Instructor
  2. StudentID → StudentName (partial dependency problem)
  3. CourseID → CourseName, Instructor (partial dependency problem)

Problem:

  • StudentName depends only on StudentID.
  • CourseName and Instructor depend only on CourseID.
  • This creates redundancy:
    • CourseName and Instructor repeat for every student enrolled in that course.
    • If Dr. Rahman changes department, we must update multiple rows.

Conversion to 2NF

We remove partial dependencies by decomposing into smaller relations.

Decomposed Tables:

Student Table

StudentIDStudentName
101Alice
102Bob
103Charlie

Course Table

CourseIDCourseNameInstructor
C1DBMSDr. Rahman
C2NetworksDr. Hasan

Enrollment Table

StudentIDCourseID
101C1
102C1
101C2
103C2

Explanation of Fix

  • Now:
    • StudentID → StudentName is stored in Student table.
    • CourseID → CourseName, Instructor is stored in Course table.
    • Enrollment table just links students and courses.
  • No partial dependency remains, because:
    • In Enrollment, the only dependency is {StudentID, CourseID} (the whole composite key).

Key Points about 2NF

Rule of 2NFExplanationExample
Must be in 1NFAtomic values, no repeating groupsAlready achieved in Enrollment table
No partial dependencyNon-prime attributes cannot depend on part of composite keyStudentName depends only on StudentID → moved to Student table
GoalRemove redundancy caused by composite keysAvoid repeating Instructor name for each student

3rd Normal Form (3NF)

A relation is in 3NF if:

  1. It is already in 2NF.
  2. It has no transitive dependency → meaning, no non-prime attribute depends on another non-prime attribute.

In other words:

  • Every non-prime attribute must depend only on the candidate key and nothing else.
  • The rule: For every functional dependency X → Y, either:
    • X is a superkey, OR
    • Y is a prime attribute (part of a candidate key).

Key Term: Transitive Dependency

A transitive dependency occurs when:

A → B and B → C ⇒ then A → C (indirect dependency).

Example:

  • If StudentID → DeptID and DeptID → DeptName,
  • Then StudentID → DeptName is a transitive dependency.

Table in 2NF but not in 3NF

EmpIDEmpNameDeptIDDeptNameDeptLocation
1Alice10HRDhaka
2Bob10HRDhaka
3Charlie20ITChittagong
4David30FinanceKhulna

Candidate Key:

  • EmpID (each employee has a unique ID).

Functional Dependencies:

  1. EmpID → EmpName, DeptID, DeptName, DeptLocation
  2. DeptID → DeptName, DeptLocation

Problem:

  • DeptName and DeptLocation depend on DeptID, not directly on EmpID.
  • This is a transitive dependency:
    • EmpID → DeptID
    • DeptID → DeptName, DeptLocation
    • ⇒ EmpID → DeptName, DeptLocation (transitive).

Redundancy Issues:

  • DeptName & DeptLocation repeat for every employee in the same department.
  • If DeptLocation changes, many rows need updating (update anomaly).

Conversion to 3NF

We remove transitive dependencies by splitting the table into smaller relations.

Decomposed Tables:

Employee Table

EmpIDEmpNameDeptID
1Alice10
2Bob10
3Charlie20
4David30

Department Table

DeptIDDeptNameDeptLocation
10HRDhaka
20ITChittagong
30FinanceKhulna

Explanation of Fix

  • Now:
    • In Employee, non-prime attributes (EmpName) depend only on the key EmpID.
    • In Department, non-prime attributes (DeptName, DeptLocation) depend only on the key DeptID.
  • All transitive dependencies are removed.

Key Points about 3NF

Rule of 3NFExplanationExample
Must be in 2NFNo partial dependenciesAchieved
No transitive dependencyNon-prime attributes must depend only on keyDeptName & DeptLocation moved to Department table
GoalEliminate redundancy & anomaliesAvoid repeating Dept details for each employee

Boyce-Codd Normal Form (BCNF)

BCNF is an advanced version of Third Normal Form (3NF).

A relation is in BCNF if:

For every functional dependency (X → Y):

  • X must be a superkey (a key that can uniquely identify a row).

Difference from 3NF:

  • In 3NF, a functional dependency is allowed if Y is a prime attribute (part of a candidate key).
  • In BCNF, this exception is not allowed.

So:

BCNF = A stricter form of 3NF.

Why do we need BCNF?

Even if a table is in 3NF, it may still have anomalies if a non-trivial FD exists where determinant (X) is not a superkey.

BCNF eliminates:

  • Update anomalies
  • Insertion anomalies
  • Deletion anomalies

Table in 3NF but not in BCNF

CourseIDInstructorRoom
C1Dr. SmithR101
C2Dr. SmithR102
C3Dr. JonesR101

Candidate Keys:

  • \{CourseID, Instructor\} (a course + instructor pair is unique).

Functional Dependencies:

  1. CourseID → Room
  2. Room → Instructor

Check 3NF:

  • Yes, table is in 3NF because all non-prime attributes depend on keys.

Problem (Why not BCNF?):

  • Room → Instructor violates BCNF.
    • Here, Room is not a superkey.
    • But it determines Instructor.
  • This causes anomalies:
    • If Dr. Smith moves to another room, multiple rows must be updated.

Conversion to BCNF

We decompose into two relations:

Course Table

CourseIDRoom
C1R101
C2R102
C3R101

RoomInstructor Table

RoomInstructor
R101Dr. Smith
R102Dr. Smith
R101Dr. Jones
  • In Course, CourseID is the key → No violation.
  • In RoomInstructor, \{Room, Instructor\} is the key → No violation.

Key Points about BCNF

Normal FormRuleExample of Violation
1NFAtomic values, no repeating groupsMulti-valued subjects
2NFNo partial dependencyNon-prime depends on part of composite key
3NFNo transitive dependencyNon-prime depends on another non-prime
BCNFFor every FD X → Y, X must be a superkeyRoom → Instructor when Room is not a key

Fourth Normal Form (4NF)

A relation is in 4NF if:

  1. It is already in BCNF.
  2. It has no multi-valued dependency (MVD), except when it is implied by a candidate key.

Multi-Valued Dependency (MVD):

  • Occurs when one attribute in a table uniquely determines multiple independent sets of values.
  • Example notation: A ↠ B ("A multi-determines B").

Example: Table NOT in 4NF

StudentIDHobbyLanguage
101ChessEnglish
101ChessFrench
101PaintingEnglish
101PaintingFrench

Functional / Multi-Valued Dependencies:

  • StudentID ↠ Hobby (a student can have many hobbies).
  • StudentID ↠ Language (a student can know many languages).

Problem:

  • Unnecessary repetition → For each hobby, we repeat all languages.

Conversion to 4NF

Decompose into two independent relations:

StudentHobby

StudentIDHobby
101Chess
101Painting

StudentLanguage

StudentIDHobby
101Chess
101Painting

Fifth Normal Form (5NF) / Project-Join Normal Form (PJNF)

A relation is in 5NF if:

  1. It is already in 4NF.
  2. It has no join dependency (JD), except when implied by candidate keys.

Join Dependency:

  • A table can be losslessly decomposed into smaller tables and then reconstructed using joins.
  • If such decomposition is necessary to remove redundancy, the table is not in 5NF.

Example: Table NOT in 5NF

SupplierPartProject
S1P1J1
S1P2J1
S1P1J2
S1P2J2

Dependencies:

  • Supplier supplies Parts.
  • Supplier works on Projects.
  • Parts are used in Projects.

Problem:

All three facts are independent, but stored together, leading to redundancy.

Conversion to 5NF

Decompose into three smaller relations:

SupplierPart

SupplierPart
S1P1
S1P2

SupplierProject

SupplierPart
S1P1
S1P2

PartProject

PartProject
P1J1
P1J2
P2J1
P2J2
  • When joined, we reconstruct the original table.
  • Redundancy is removed.

Sixth Normal Form (6NF)

  • A relation is in 6NF if:
    1. It is already in 5NF.
    2. It has no non-trivial join dependencies at all.
  • 6NF is used in temporal databases (where data changes with time).
  • Goal: Break tables down into the smallest possible structures (each fact stored in its own table).

Example: Temporal Database (Not in 6NF)

EmployeeDepartmentStartDateEndDate
E1HR2020-01-012021-01-01
E1Finance2021-01-022022-01-01

Problem:

Both Department and employment period are stored in the same table → mixing multiple time-dependent facts.

Conversion to 6NF

Split into multiple tables, each capturing a single time-dependent fact:

EmployeeDepartment

EmployeeDepartmentStartDateEndDate
E1HR2020-01-012021-01-01
E1Finance2021-01-022022-01-01

EmployeeContract

EmployeeStartDateEndDate
E12020-01-012022-01-01

This allows better handling of historical (temporal) data.

Summary of Normalization

Normal FormRemovesExample Problem
1NFRepeating groups / non-atomic valuesMultiple subjects in one cell
2NFPartial dependencyNon-prime depends on part of composite key
3NFTransitive dependencyDeptName depends on DeptID, not EmpID
BCNFFD where determinant not a superkeyRoom → Instructor
4NFMulti-valued dependencyStudent has multiple Hobbies and Languages
5NFJoin dependencySupplier-Part-Project redundancy
6NFTemporal / trivial join dependencyTime-dependent data