First Normal Form (1NF)

Definition

Each row contains one value, of the appropriate type, for each column, and there are no duplicate rows.

This is a simple rewording of the formal definition:

Let relation r have attributes A1, …, An, of types T1, …, Tn, respectively. Then r is in first normal form (1NF) if and only if, for all tuples t appearing in r, the value of attribute Ai in t is of type Ti (i= 1, …, n).”
Date, C.J. Database Design & Relational Theory. Sebastopol: O,Reilly Media, Inc., 2012. Print. (Emphasis theirs)

One Value of the Appropriate Type

An attribute’s value must match the type defined in the header. Consider the table below:

Figure 1 – dbo.Customer

1NF figure 1 - Customer
The values are not appropriate for the columns. Name has something other than a name in it, PhoneNumber has things that are not phone numbers, and so on. In order for this table to be in 1NF, CustomerId must contain an identifier (whether it’s defined as an integer, a guid, or otherwise), Name must contain a customer’s name, PhoneNumber must only contain phone numbers, etc.

Email is of particular interest. It does have only email addresses in it, but the attribute is meant to contain one, and only one, value of type email. Row 3 has two values of type email in the attribute. This is sometimes called a ‘repeating group’, or a multivalued attribute. An attribute must only contain one value.

In a database system we usually only have the primitive data types to work with, that is integers, character arrays, floating point numbers, etc. If the database supported array types, then Email could be type email[] (array of emails), and the value for attribute Email in row 3 would be valid, and would meet the requirement for 1NF.

Such a construct may look like figure 2, where table Projects has attribute Employees (array of type employee):

Figure 2 – dbo.Project

1NF figure 2 - Project
(Example created by mashing data into LINQPad to prove a point, http://www.linqpad.net/)

What about Nulls?

NULL is the absence of a value, which means that the presence of a NULL violates the type of the column, and the table is not in 1NF. One could argue that if your column definition is “int null” then it is type “nullable int”, but because database systems cannot treat NULL the same as the data type that was defined as nullable then it may not be a valid claim. Again, this is a debatable point!

There are some cases in practical design where allowing NULL values may make sense, while still pursuing further normalization. For this reason the exclusion of NULL values is not usually cited as a rule for 1NF, and I do not include it either. C. J. Date, however (provider of the formal definition), would insist that there must be no nulls.

NULLs do come with their own set of issues and I would advise against permitting them almost always.

No Duplicate Rows

A table is a set, and is meant to represent a collection of real-world entities or events. There should not be two instances of a single real-world entity or event. For example, the set of whole numbers less than 5, T.

T = {0,1,2,3,4}

If I was to claim T = {0,1,2,2,3,4} then the problem is apparent. The number two in this set does not truly exist twice. This set has a duplicate value and is not a proper set.

Say that we have a table of friends:

Figure 3 – dbo.Friend

1NF figure 3 - Friend
We see here what appears to be a duplicate row, or a record that cannot be uniquely identified by any set of its attributes. Either we counted Jacob twice, or we did not capture enough information to be able to uniquely identify two Jacobs. Either way, the Jacob rows must be dealt with before the table is in 1NF.

Misconceptions

You may see 1NF defined as ‘a key has been identified’ or even ‘the table has a primary key’. But really the requirement for a proper set is to have no duplicate values. Identifying a key is something that we do to determine if a table is in 1NF, and a key constraint is something that we use to enforce 1NF. But a table can have no PK and still be in 1NF. In fact, a table must be in 1NF before we can apply a PK constraint. (Unless your DB allows you to apply a PK with a ‘no check’ hint, in which case you can have a PK and your table is still not in 1NF!)

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s