Second Normal Form (2NF)

Second Normal Form (2NF) is where we start analyzing the relationships of columns within a table to each other.

In order to dig into this, let’s briefly review a couple of concepts:

Any field or combination of fields that uniquely identifies a row is a superkey. It has the following property:

Uniqueness: Every value of the superkey appears only once in the table.

In figure 1 below, we have a table showing where invoices are being sent out to customers. The system sends them in batches and records the action.

We have the following superkeys:
{BatchId, InvoiceId, ProcessDate, PastDue}
{BatchId, InvoiceId, ProcessDate}
{BatchId, InvoiceId, PastDue}
{BatchId, InvoiceId}

Figure 1 – dbo.BillingLog

2NF Billing - figure 1
Candidate Keys:
A candidate key is a superkey that is irreducible, sometimes called a ‘minimal superkey’. It has the following properties:

Uniqueness: Every value of the superkey appears only once in the table.
Irreducibility: No proper subset of the superkey has the uniqueness property.

Note: “Proper” subset: If X is a subset of Y, but X is not equal to Y (i.e. there exists at least one element of Y which is not an element of X), then X is a proper subset of Y.
{BatchId, InvoiceId} is a subset of {BatchId, InvoiceId}, but is not a ‘proper’ subset.

In figure 1, the superkey {BatchId, InvoiceId, ProcessDate} is unique for all rows. But it is reducible, as a subset of it {BatchId, InvoiceId} is also unique for all rows. So it is not a candidate key.

Candidate keys are what we all commonly refer to as the ‘key’ of a table, and are often defined in databases as a ‘primary’ key. A table can have more than one candidate key.

A subkey is a subset of a candidate key. Our table in figure 1 has one candidate key {BatchId, InvoiceId}, and the following subkeys:

{BatchId, InvoiceId} – (any set is a subset of itself)
{} – (an empty set of attributes is considered a subkey)

A subkey that isn’t a key is called a proper subkey. {BatchId} and {InvoiceId} are proper subkeys.

One more thing before we get to 2NF, because 2NF is all about…

Functional Dependencies:

“Attribute A determines attribute B (that is, B is functionally dependent on A) if all of the rows in the table that agree in value for attribute A also agree in value for attribute B.”
Peter Rob, Database Systems: Design, Implementation & Management, 7th Edition. Cengage Learning, 2006. Print.

C.J. Date has a better definition, as it makes the point that the attributes (referred to as A and B above, and X and Y below) can both be a single column or a set of columns.

“Let X and Y be subsets of the heading of relvar R; then the functional dependency (FD) XY holds in R if and only if, whenever two tuples of R agree on X, they also agree on Y. X and Y are the determinant and the dependent…”
Date, C.J. Database Design & Relational Theory. Sebastopol: O,Reilly Media, Inc., 2012. Print. (Emphasis theirs)

In figure 1, a batch being processed is a single event, and we can see that every instance of a particular BatchId has the same value for the batch’s process date. We can say that BatchId determines ProcessDate, or {BatchId}→{ProcessDate}.

An FD (Functional Dependency) is trivial if it cannot be violated. This only happens if the dependent is a subset of the determinant, for example:

{BatchId, InvoiceId}→{BatchId}     (Literally stated, if BatchId = x and InvoiceId = y, then BatchId must equal x. Yes, a trivial FD is stating the obvious. That’s why it’s trivial!)

{BatchId, InvoiceId}→{InvoiceId}

Functional Dependency Irreducibility:
A FD is irreducible if any proper subset of the determinant does not also determine the dependent.

In figure 1, FD {BatchId, InvoiceId}→{ProcessDate}. However, {BatchId}→{ProcessDate} is also true (a batch can only occur in a single day). So this FD is reducible.

FD {BatchId, InvoiceId}→{PastDue} is not reducible, as we can see that neither {BatchId}→{PastDue} nor {InvoiceId}→{PastDue} hold true as a functional dependency.

Now we can finally take a look at 2NF:


“Relvar R is in second normal form (2NF) if and only if, for every key K of R and every nonkey attribute A of R, the FD K→{A} (which holds in R, necessarily) is irreducible.”
Date, C.J. Database Design & Relational Theory. Sebastopol: O,Reilly Media, Inc., 2012. Print. (Emphasis theirs)

To paraphrase: A table is in 2NF if, for every candidate key and nonkey attribute, the functional dependency {candidate key}→{nonkey attribute} is irreducible.

We can test this with our table from figure 1. We have one candidate key {BatchId, InvoiceId}, and two nonkey attributes {ProcessDate} and {PastDue}. So for this table to be in 2NF these functional dependencies must be irreducible:

{BatchId, InvoiceId}→{ProcessDate}
{BatchId, InvoiceId}→{PastDue}

As we have already established, FD {BatchId, InvoiceId}→{ProcessDate} is in fact reducible, because FD {BatchId}→{ProcessDate} holds true. So this table is not in 2NF.

C. J. Date gives us a second equivalent definition:

“Relvar R is in second normal form (2NF) if and only if, for every nontrivial FD XY that holds in R, at least one of the following is true: (a) X is a superkey; (b) Y is a subkey; (c) X is not a subkey.”
Date, C.J. Database Design & Relational Theory. Sebastopol: O,Reilly Media, Inc., 2012. Print. (Emphasis theirs)

We can apply this same test to our example table. There are three nontrivial functional dependencies:

{BatchId, InvoiceId}→{ProcessDate}
{BatchId, InvoiceId}→{PastDue}

The first FD satisfies requirement (a) ‘X is a superkey’, so that passes. (A candidate key is also a superkey!)

The second FD also satisfies requirement (a), so that passes.

The third FD does not pass (a). But only one of the tests have to pass, so let’s check (b), ‘Y is a subkey’. No, {ProcessDate} is not a subkey. How about (c) ‘X is not a subkey’? No, {BatchId} is in fact a subkey. This FD has failed all three tests, indicating that this table is not in 2NF.

Other 2NF Definitions:
You may sometimes see 2NF defined as ‘there must be no partial dependencies on the primary key’. But this definition is not wholly satisfactory. It only examines dependencies on one key, while a table can have multiple candidate keys. It is only in database software that we must elect one of them to be the ‘primary’ one. You may have seen instances where one key was a ‘primary’ key and another was constrained by using a unique index.

In figure 2 below, we see a new version of the Billing table with a surrogate key. We will assume that {BillingLogId} has been marked as the primary key. In this case the primary key is only one field, so we know that any FD where {BillingLogId} is the determinant will be irreducible. However, is the table truly in 2NF?

Figure 2 – dbo.BillingLog with a surrogate key

2NF Billing - figure 2
The candidate key {BatchId, InvoiceId} is still there. In fact, FD {BatchId, InvoiceId}→{BillingLogId} holds just as true as FD {BillingLogId}→{BatchId, InvoiceId}.

We know that an invoice cannot (or should not) be sent in the same batch twice, and if there were two identical values for {BatchId, InvoiceId} then it would indicate either a system error or that something or someone has manipulated our data into an invalid state.

This table has two keys, and both must be considered in order to judge whether it has redundancy or not. We know that our testing has already revealed that {ProcessDate} has some redundancy, and that something should be done about it. If we had evaluated this table based only on primary key {BillingLogId}, then this fact would have been missed.

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s