Normalization is a set of rules established to aid in the design of tables that are meant to be connected through relationships.
Benefits of normalizing your database include:
- Avoiding repetitive entries
- Reducing required storage space
- Preventing the need to restructure existing tables to accommodate new data
- Increasing the speed and flexibility of queries, sorts, and summaries
Note: In an interview you will usually be expected to explain at most three normal forms, and that is also what is used in practice. You can actually normalize a database up to fifth normal form. But believe this book, explaining three normal forms will put you in decent shape during an interview.
The three normal forms are as follows:
First Normal Form
For a table to be in first normal form, data must be broken up into the smallest units possible. In addition to breaking data up into the smallest meaningful values, tables in first normal form should not contain repeating groups of fields.
In the above example, city1 and city2 are repeating. In order for these tables to be in first normal form, you have to modify the table structure as follows. Also note that the Customer Name is now broken down into first name and last name (in first normal form, data should be broken down to the smallest unit).
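The original table is not reproduced here, so the following is only a minimal sketch of the idea using Python's built-in sqlite3 module. The table names (customer_flat, customer, customer_city), the customer_id column, and the sample row are all assumptions made for illustration; only city1, city2, and the name split come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Not in 1NF: full name in one field, plus a repeating city1/city2 group.
    CREATE TABLE customer_flat (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT,
        city1         TEXT,
        city2         TEXT
    );
    INSERT INTO customer_flat VALUES (1, 'Asha Rao', 'Mumbai', 'Pune');

    -- 1NF: name split into smaller units, one row per customer/city pair.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        first_name  TEXT,
        last_name   TEXT
    );
    CREATE TABLE customer_city (
        customer_id INTEGER REFERENCES customer(customer_id),
        city        TEXT,
        PRIMARY KEY (customer_id, city)
    );
""")

# Move each flat row into the 1NF structure (hypothetical sample data).
for cid, name, c1, c2 in conn.execute("SELECT * FROM customer_flat").fetchall():
    first, last = name.split(" ", 1)
    conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (cid, first, last))
    for city in (c1, c2):
        if city is not None:
            conn.execute("INSERT INTO customer_city VALUES (?, ?)", (cid, city))

cities = [row[0] for row in conn.execute(
    "SELECT city FROM customer_city WHERE customer_id = 1 ORDER BY city")]
print(cities)  # ['Mumbai', 'Pune']
```

A third city for the same customer is now just one more row in customer_city; no city3 column has to be added.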
Second Normal Form
The second normal form states that each field in a table with a multiple-field (composite) primary key must be directly related to the entire primary key. In other words, each non-key field should be a fact about all the fields in the primary key.
In the above customer table, city is not linked to any primary key field.
That takes our database to second normal form.
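To make the "fact about the whole key" rule concrete, here is a small sketch in Python with sqlite3. It does not use the book's customer table; the order_line_flat, product, and order_line tables and their data are hypothetical, chosen because they have an obvious composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 2NF: product_name depends only on product_id,
    -- which is just one part of the (order_id, product_id) key.
    CREATE TABLE order_line_flat (
        order_id     INTEGER,
        product_id   INTEGER,
        product_name TEXT,
        qty          INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO order_line_flat VALUES (1, 10, 'Pen', 5), (2, 10, 'Pen', 3);

    -- 2NF: the partially dependent field moves to its own table.
    CREATE TABLE product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE order_line (
        order_id   INTEGER,
        product_id INTEGER REFERENCES product(product_id),
        qty        INTEGER,
        PRIMARY KEY (order_id, product_id)
    );
    INSERT INTO product
        SELECT DISTINCT product_id, product_name FROM order_line_flat;
    INSERT INTO order_line
        SELECT order_id, product_id, qty FROM order_line_flat;
""")

# The product name is now stored once, however many orders mention it.
product_rows = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(product_rows)  # 1
```

In the flat table, renaming the product means touching every order line that mentions it; after the split it is a single-row update.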
Third Normal Form
A non-key field should not depend on another non-key field. The field Total is dependent on Unit price and Qty.
So the Total field is now removed and is instead calculated as Unit price * Qty.
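As a rough sketch of what "removed and calculated" can look like, the Python/sqlite3 snippet below stores only the unit price and quantity and derives the total at query time. The order_item table, its columns, and the sample figures are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 3NF: no stored Total column; it would depend on two non-key fields.
    CREATE TABLE order_item (
        item_id    INTEGER PRIMARY KEY,
        unit_price REAL,
        qty        INTEGER
    );
    INSERT INTO order_item VALUES (1, 2.5, 4), (2, 10.0, 3);
""")

# The total is computed whenever it is needed.
totals = conn.execute(
    "SELECT item_id, unit_price * qty AS total FROM order_item ORDER BY item_id"
).fetchall()
print(totals)  # [(1, 10.0), (2, 30.0)]
```

Because the total is never stored, it cannot fall out of step when the price or quantity changes.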
What is Denormalization?
Denormalization is the process of putting one fact in numerous places (it is the reverse of normalization). Only one valid reason exists for denormalizing a relational design: to enhance performance. The price you pay for that performance is increased redundancy in the database.
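The trade-off can be seen in a small sketch. Below, a hypothetical orders table carries its own copy of the customer's city so that reads need no join; the table and column names are assumptions, and Python's sqlite3 module is used only to keep the example runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, city TEXT);

    -- Denormalized: customer_city repeats a fact already held in customer,
    -- so reading an order does not need a join.
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER REFERENCES customer(customer_id),
        customer_city TEXT
    );
    INSERT INTO customer VALUES (1, 'Mumbai');
    INSERT INTO orders VALUES (100, 1, 'Mumbai');

    -- The customer moves; the copied value is not updated automatically.
    UPDATE customer SET city = 'Pune' WHERE customer_id = 1;
""")

stale = conn.execute("""
    SELECT o.customer_city, c.city
    FROM orders o JOIN customer c USING (customer_id)
""").fetchone()
print(stale)  # ('Mumbai', 'Pune')
```

The two copies now disagree, which is exactly the redundancy cost: every duplicated fact has to be kept in sync by the application, a trigger, or a batch job.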
Can you Explain Fourth Normal Form?
Note: Whenever the interviewer tries to go beyond third normal form, there can be two reasons: ego, or to fail you. Three normal forms are really enough; in practice, anything more than that is an overdose.
A table in fourth normal form should not contain two or more independent multi-valued facts about an entity, and it should also satisfy third normal form.
So let us try to see what multi-valued facts are. If there are two or more many-to-many relationships in one entity and they tend to come together in one place, they are termed “multi-valued facts”.
In the above table, you can see that there are two many-to-many relationships, Supplier/Product and Supplier/Location (or in short, multi-valued facts). In order for the above example to satisfy fourth normal form, the two many-to-many relationships should go in different tables.
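A minimal sketch of that split, using Python's sqlite3 module. The supplier_flat, supplier_product, and supplier_location table names and the sample rows are hypothetical; only the Supplier/Product and Supplier/Location relationships come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Violates 4NF: products and locations are independent facts about a
    -- supplier, so every product ends up paired with every location.
    CREATE TABLE supplier_flat (supplier TEXT, product TEXT, location TEXT);
    INSERT INTO supplier_flat VALUES
        ('S1', 'Pen',    'Mumbai'),
        ('S1', 'Pen',    'Pune'),
        ('S1', 'Pencil', 'Mumbai'),
        ('S1', 'Pencil', 'Pune');

    -- 4NF: each many-to-many relationship gets its own table.
    CREATE TABLE supplier_product (
        supplier TEXT, product TEXT, PRIMARY KEY (supplier, product)
    );
    CREATE TABLE supplier_location (
        supplier TEXT, location TEXT, PRIMARY KEY (supplier, location)
    );
    INSERT INTO supplier_product
        SELECT DISTINCT supplier, product FROM supplier_flat;
    INSERT INTO supplier_location
        SELECT DISTINCT supplier, location FROM supplier_flat;
""")

# Joining the two small tables gives back the original combined rows.
rebuilt = conn.execute("""
    SELECT p.supplier, p.product, l.location
    FROM supplier_product p JOIN supplier_location l USING (supplier)
    ORDER BY 1, 2, 3
""").fetchall()
original = conn.execute(
    "SELECT * FROM supplier_flat ORDER BY 1, 2, 3").fetchall()
print(rebuilt == original)  # True
```

Four combined rows shrink to two rows in each table, and adding a new location for the supplier becomes one insert instead of one per product.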
Can you Explain Fifth Normal Form?
Note: UUUHHH... if you get this question, then after joining the company do ask the interviewer whether he himself really uses it.
Fifth normal form deals with reconstructing information from smaller pieces of information. These smaller pieces of information can be maintained with less redundancy.
Example: Dealers sell Products, which can be manufactured by various Companies. In order to sell a Product, a Dealer should be registered with the Company. So these three entities have a mutual relationship among them.
The above table shows some sample data. If you observe closely, a single record is built from several smaller pieces of information. For instance, JM Associate can sell Sweets under the following two conditions:
- JM Associate should be an authorized dealer of Cadbury
- Sweets should be manufactured by the Cadbury company
These two smaller bits of information form one record of the above table. So in order for the above information to be in fifth normal form, all the smaller pieces of information should be kept in three different places. Below is the complete fifth normal form of the database.
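Since the figure is not reproduced here, the following Python/sqlite3 sketch shows one plausible shape for those three places. The table names (dealer_company, company_product, dealer_product) are assumptions; the JM Associate, Cadbury, and Sweets values come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One small table per pairwise fact.
    CREATE TABLE dealer_company  (dealer TEXT, company TEXT);
    CREATE TABLE company_product (company TEXT, product TEXT);
    CREATE TABLE dealer_product  (dealer TEXT, product TEXT);

    INSERT INTO dealer_company  VALUES ('JM Associate', 'Cadbury');
    INSERT INTO company_product VALUES ('Cadbury', 'Sweets');
    INSERT INTO dealer_product  VALUES ('JM Associate', 'Sweets');
""")

# The full dealer/product/company record is reconstructed by joining
# the three smaller pieces of information.
rows = conn.execute("""
    SELECT dc.dealer, cp.product, dc.company
    FROM dealer_company dc
    JOIN company_product cp ON cp.company = dc.company
    JOIN dealer_product dp ON dp.dealer = dc.dealer
                          AND dp.product = cp.product
""").fetchall()
print(rows)  # [('JM Associate', 'Sweets', 'Cadbury')]
```

The combined record appears only when all three smaller facts are present; remove any one of them and the join returns nothing.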
What is the Difference between Fourth and Fifth normal form?
Note: There is a huge similarity between Fourth and Fifth normal form, i.e. they address the problem of “Multi-Valued facts”.
In fifth normal form the multi-valued facts are interlinked, while in fourth normal form they are independent. For instance, in the above two questions Supplier/Product and Supplier/Location are not linked, while in fifth normal form Dealer/Product/Company are completely linked.
Have you Heard about Sixth Normal Form?
Note: Arrrrggghhh, yes, there is a sixth normal form as well. But note, guys, you can skip this one. Just in case you want to impress the interviewer...
If you want a relational system that works in conjunction with time (temporal data), you use sixth normal form. At this moment SQL Server does not support it directly.