Backend Developer Fundamentals

First Normal Form

In this video we are going to talk about first normal form or 1NF. There are two rules for 1NF. Here they are

In This Video: Rules for First Normal Form

Each row of data must contain atomic values
Each row of data must have a unique identifier called the primary key

Let’s go through each one by one.

The two rules for Atomic Values

Data in a column should not be able to be broken down further into parts (for your current purposes)
Do not have multiple values of same type in a single row
1. You can’t have several values of the same type in a single column
2. You can’t have multiple columns with the same type of thing.

For the first rule, suppose we want to create a database storing the following information about the characters in the video game Detroit Become Human

The character Kara is played by Valorie Curry, whose hometown was Orange County, CA and was born on Feb. 12, 1986.
The character Connor is played by Bryan Dechart, whose hometown was St. Lake City, UT  and was born on March 17, 1987.
The character Markus is played by Jesse Williams, whose hometown was Chicago, IL and was born on August 5, 1981.
The character North is played by Minka Kelly, whose hometown was Los Angeles, CA and was born on June 24, 1980.
The character Hank Anderson is played by Clancy Brown, whose hometown was Urbana, OH and was born on January 5, 1959.

Data in a column should not be able to be broken down further into parts

Consider doing the following:

CREATE TABLE characters (
	info text
	);


INSERT INTO characters (info) VALUES 
  ('The character Kara is played by Valorie Curry, whose hometown was Orange County, CA and was born on Feb. 12, 1986.'),
  ('The character Connor is played by Bryan Dechart, whose hometown was St. Lake City, UT  and was born on March 17, 1987.');

or some variation of that scheme such as

INSERT INTO characters (info) VALUES 
  ('Kara, Valorie Curry,  Orange County, CA,  Feb. 12, 1986.'),
  ('Connor, Bryan Dechart, St. Lake City, UT, March 17, 1987.');

In both these cases all the data is jammed into one column. Note that we can divide the information in that column into recognizable parts. There is the part that is the character name, the part that is the actor name, and so on. So this representation violates Data in a column should not be able to be broken down further into parts and is not in First Normal Form.

Consider a representation like the following:

role	actor	hometown	born
Kara	Valorie Curr	Orange County, CA	1986-02-12
Connor	Bryan Dechart	St. Lake City, UT	1987-03-17
Markus	Jesse Williams	Chicago, IL	1981-08-05
North	Minka Kelly	Los Angeles, CA	1980-06-24
Hank Anderson	Clancy Brown	Urbana, OH	1959-01-05

This looks better. For each column we should ask whether the data in that column could be broken down further, with the caveat of ‘for our current purposes’. Maybe for our purposes it would be better to divide the actor column into firstname and lastname. And perhaps the hometown column could be divided into hometown and home state. This results in

role	firstname	lastname	hometown	homestate	born
Kara	Valorie	Curr	Orange County	CA	1986-02-12
Connor	Bryan	Dechart	St. Lake City	UT	1987-03-17
Markus	Jesse	Williams	Chicago	IL	1981-08-05
North	Minka	Kelly	Los Angeles	CA	1980-06-24
Hank Anderson	Clancy	Brown	Urbana	OH	1959-01-05

That first rule Data in a column should not be able to be broken down further into parts is a bit of a wishy washy “it depends” rule. If we might want to search by the homestates of actors, this last representation would be better. Don’t be compulsive about applying this rule. For example, looking at a slightly different table:

firstname	lastname	address	city	state
Ann	Mulkern	1290 Washington Ave	Virginia Beach	VA
Ben	Rhodes	107 Canyon Vista	Reston	VA
Matt	Carriker	227 FM 207	San Antonio	TX

we might be tempted to divide that address into the number and street name

firstname	lastname	street_number	street_name	city	state
Ann	Mulkern	1290	Washington Ave	Virginia Beach	VA
Ben	Rhodes	107	Canyon Vista	Reston	VA
Matt	Carriker	227	FM 207	San Antonio	TX

but for most applications this doesn’t matter. At some point you will have to say good enough. And dividing further would be silly. it’s subjective

Now the hard and fast rule about atomic data.

Do not have multiple values of same type in a single row

you can’t have several values of the same type in a single column
you can’t have multiple columns with the same type of thing.

An example will clarify this.

Here is some data we want to represent in a database table

Ann Mulkern works out of our Detroit MI office and knows Python and Javascript
Ben Rhodes works out of our Detroit MI office and knows Golang and Java
Clara Wieck works out of our Reston VA office and knows Python and R.
Samantha Fish works out of our Reston VA office and knows Python, Javascript, and Golang.

One way to represent this is

firstname	lastname	city	state	languages
Ann	Mulkern	Boston	MA	Python Javascript
Ben	Rhodes	Boston	MA	Golang Java
Clara	Wieck	Reston	VA	Python R
Samantha	Fish	Reston	VA	Python, Javascript, Golang

However, this violates condition 1 A row cannot have several values of the same type in one column
since Python and Javascript are the same type (programming languages) and there are two values of the same type in the languages column.

Option 2

Here is a slightly different way to represent the information:

firstname	lastname	city	state	language1	language2
Ann	Mulkern	Boston	MA	Python	Javascript
Ben	Rhodes	Boston	MA	Golang	Java
Clara	Wieck	Reston	VA	Python	R
Samantha	Fish	Reston	VA	Python	Javascript

This violates condition 2 multiple columns cannot have the same type of data since columns language1 and language2 contain the same type of information, namely programming languages the people know.

Later on we will learn how to design databases that have multiple tables and handle this beautifully. For now we can make this table meet the atomic rule by doing:

firstname	lastname	city	state	language
Ann	Mulkern	Boston	MA	Python
Ann	Mulkern	Boston	MA	Javascaript
Ben	Rhodes	Boston	MA	Golang
Ben	Rhodes	Boston	MA	Java
Clara	Wieck	Reston	VA	Python
Clara	Wieck	Reston	VA	R
Samantha	Fish	Reston	VA	Python
Samantha	Fish	Reston	VA	Javascript
Samantha	Fish	Reston	VA	Golang

This table may look a little wonky but it meets the atomic criteria.

first normal form reminder

Each row of data must contain atomic values
Each row of data must have a unique identifier called the primary key

In this database we don’t have a column that contains a unique identifier – meaning that makes each row unique. Later we will learn other options for primary keys but for now we will make an id column.

id	firstname	lastname	city	state	language
1	Ann	Mulkern	Boston	MA	Python
2	Ann	Mulkern	Boston	MA	Javascaript
3	Ben	Rhodes	Boston	MA	Golang
4	Ben	Rhodes	Boston	MA	Java
5	Clara	Wieck	Reston	VA	Python
6	Clara	Wieck	Reston	VA	R
7	Samantha	Fish	Reston	VA	Python
8	Samantha	Fish	Reston	VA	Javascript
9	Samantha	Fish	Reston	VA	Golang

Now we have a table in 1NF or first normal form.

declaring a primary key

there are two ways to declare a primary key in SQL

The first is on the line itself

CREATE DATABASE library;
\c library

CREATE TABLE books_new
(
  id SERIAL PRIMARY KEY,
  author TEXT NOT NULL,
  title  TEXT NOT NULL,
  acquired DATE
);

the second is at the end

CREATE TABLE users
(
  id SERIAL,
  name text,
  PRIMARY KEY(id)
);

You can see the results by using \d to describe the tables.

summary

so far we

learned how to install postgresql on our machine
learned how to initially create a password for the default user postgres
learned how to create a database
learned how to show all databases and to delete dbs.
learned how to create and delete tables.
learned the basic data types (text, integer, numeric, date)
learned how to insert data into a table.
learned about NULL
learned about SERIAL
learned about first normal form or 1NF.
learned how to declare a column to be the primary key – must be unique
learned how to see the structure of a database table \d
learned how to see the entries in a table using SELECT.

Previous section:

Postgres Getting Started

Next section:

Querying A Db