In this video we are going to talk about first normal form or 1NF. There are two rules for 1NF. Here they are
Let’s go through each one by one.
For the first rule, suppose we want to create a database storing the following information about the characters in the video game Detroit Become Human
The character Kara is played by Valorie Curry, whose hometown was Orange County, CA and was born on Feb. 12, 1986.
The character Connor is played by Bryan Dechart, whose hometown was St. Lake City, UT and was born on March 17, 1987.
The character Markus is played by Jesse Williams, whose hometown was Chicago, IL and was born on August 5, 1981.
The character North is played by Minka Kelly, whose hometown was Los Angeles, CA and was born on June 24, 1980.
The character Hank Anderson is played by Clancy Brown, whose hometown was Urbana, OH and was born on January 5, 1959.
Consider doing the following:
CREATE TABLE characters (
info text
);
INSERT INTO characters (info) VALUES
('The character Kara is played by Valorie Curry, whose hometown was Orange County, CA and was born on Feb. 12, 1986.'),
('The character Connor is played by Bryan Dechart, whose hometown was St. Lake City, UT and was born on March 17, 1987.');
or some variation of that scheme such as
INSERT INTO characters (info) VALUES
('Kara, Valorie Curry, Orange County, CA, Feb. 12, 1986.'),
('Connor, Bryan Dechart, St. Lake City, UT, March 17, 1987.');
In both these cases all the data is jammed into one column. Note that we can divide the information in that column into recognizable parts. There is the part that is the character name, the part that is the actor name, and so on. So this representation violates Data in a column should not be able to be broken down further into parts and is not in First Normal Form.
Consider a representation like the following:
role | actor | hometown | born |
---|---|---|---|
Kara | Valorie Curr | Orange County, CA | 1986-02-12 |
Connor | Bryan Dechart | St. Lake City, UT | 1987-03-17 |
Markus | Jesse Williams | Chicago, IL | 1981-08-05 |
North | Minka Kelly | Los Angeles, CA | 1980-06-24 |
Hank Anderson | Clancy Brown | Urbana, OH | 1959-01-05 |
This looks better. For each column we should ask whether the data in that column could be broken down further, with the caveat of ‘for our current purposes’. Maybe for our purposes it would be better to divide the actor column into firstname and lastname. And perhaps the hometown column could be divided into hometown and home state. This results in
role | firstname | lastname | hometown | homestate | born |
---|---|---|---|---|---|
Kara | Valorie | Curr | Orange County | CA | 1986-02-12 |
Connor | Bryan | Dechart | St. Lake City | UT | 1987-03-17 |
Markus | Jesse | Williams | Chicago | IL | 1981-08-05 |
North | Minka | Kelly | Los Angeles | CA | 1980-06-24 |
Hank Anderson | Clancy | Brown | Urbana | OH | 1959-01-05 |
That first rule Data in a column should not be able to be broken down further into parts is a bit of a wishy washy “it depends” rule. If we might want to search by the homestates of actors, this last representation would be better. Don’t be compulsive about applying this rule. For example, looking at a slightly different table:
firstname | lastname | address | city | state |
---|---|---|---|---|
Ann | Mulkern | 1290 Washington Ave | Virginia Beach | VA |
Ben | Rhodes | 107 Canyon Vista | Reston | VA |
Matt | Carriker | 227 FM 207 | San Antonio | TX |
we might be tempted to divide that address into the number and street name
firstname | lastname | street_number | street_name | city | state |
---|---|---|---|---|---|
Ann | Mulkern | 1290 | Washington Ave | Virginia Beach | VA |
Ben | Rhodes | 107 | Canyon Vista | Reston | VA |
Matt | Carriker | 227 | FM 207 | San Antonio | TX |
but for most applications this doesn’t matter. At some point you will have to say good enough. And dividing further would be silly. it’s subjective
Now the hard and fast rule about atomic data.
An example will clarify this.
Here is some data we want to represent in a database table
Ann Mulkern works out of our Detroit MI office and knows Python and Javascript
Ben Rhodes works out of our Detroit MI office and knows Golang and Java
Clara Wieck works out of our Reston VA office and knows Python and R.
Samantha Fish works out of our Reston VA office and knows Python, Javascript, and Golang.
One way to represent this is
firstname | lastname | city | state | languages |
---|---|---|---|---|
Ann | Mulkern | Boston | MA | Python Javascript |
Ben | Rhodes | Boston | MA | Golang Java |
Clara | Wieck | Reston | VA | Python R |
Samantha | Fish | Reston | VA | Python, Javascript, Golang |
However, this violates condition 1 A row cannot have several values of the same type in one column
since Python and Javascript are the same type (programming languages) and there are two values of the same type in the languages column.
Here is a slightly different way to represent the information:
firstname | lastname | city | state | language1 | language2 |
---|---|---|---|---|---|
Ann | Mulkern | Boston | MA | Python | Javascript |
Ben | Rhodes | Boston | MA | Golang | Java |
Clara | Wieck | Reston | VA | Python | R |
Samantha | Fish | Reston | VA | Python | Javascript |
This violates condition 2 multiple columns cannot have the same type of data since columns language1
and language2
contain the same type of information, namely programming languages the people know.
Later on we will learn how to design databases that have multiple tables and handle this beautifully. For now we can make this table meet the atomic rule by doing:
firstname | lastname | city | state | language |
---|---|---|---|---|
Ann | Mulkern | Boston | MA | Python |
Ann | Mulkern | Boston | MA | Javascaript |
Ben | Rhodes | Boston | MA | Golang |
Ben | Rhodes | Boston | MA | Java |
Clara | Wieck | Reston | VA | Python |
Clara | Wieck | Reston | VA | R |
Samantha | Fish | Reston | VA | Python |
Samantha | Fish | Reston | VA | Javascript |
Samantha | Fish | Reston | VA | Golang |
This table may look a little wonky but it meets the atomic criteria.
In this database we don’t have a column that contains a unique identifier – meaning that makes each row unique. Later we will learn other options for primary keys but for now we will make an id column.
id | firstname | lastname | city | state | language |
---|---|---|---|---|---|
1 | Ann | Mulkern | Boston | MA | Python |
2 | Ann | Mulkern | Boston | MA | Javascaript |
3 | Ben | Rhodes | Boston | MA | Golang |
4 | Ben | Rhodes | Boston | MA | Java |
5 | Clara | Wieck | Reston | VA | Python |
6 | Clara | Wieck | Reston | VA | R |
7 | Samantha | Fish | Reston | VA | Python |
8 | Samantha | Fish | Reston | VA | Javascript |
9 | Samantha | Fish | Reston | VA | Golang |
Now we have a table in 1NF or first normal form.
there are two ways to declare a primary key in SQL
The first is on the line itself
CREATE DATABASE library;
\c library
CREATE TABLE books_new
(
id SERIAL PRIMARY KEY,
author TEXT NOT NULL,
title TEXT NOT NULL,
acquired DATE
);
the second is at the end
CREATE TABLE users
(
id SERIAL,
name text,
PRIMARY KEY(id)
);
You can see the results by using \d to describe the tables.
so far we