Normalization

       The purpose of normalization is to avoid redundancy and inconsistency. Redundancy means that I repeat myself: the same fact (a relation between things) is stored in more than one place. If, for example, I want to save in my database that the player named Jean is a man, it is practical to save it in only one place. If I have one giant table keeping track of all players, chat rooms and so on, I probably have to write more than once that Jean is a man. This is redundancy, and I want to avoid it. Inconsistency means that I try to repeat myself but don't succeed. Maybe I made a typo, and somewhere it says that Jean is a woman.
       Another problem with a database that isn't normalized is that I might delete data I wanted to keep. If Jean isn't chatting anymore and I remove his rows from the giant table, I might delete his password and gender along with them. And that's not fair, just because the man doesn't want to chat 24/7.
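
       The three problems above can be tried out in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table and column names (chat_log, player, chat) and the sample rows are my own assumptions for illustration, not part of any real system.

```python
import sqlite3

# --- One giant table (hypothetical layout): player facts repeated per chat row ---
wide = sqlite3.connect(":memory:")
wide.execute(
    "CREATE TABLE chat_log (player TEXT, gender TEXT, password TEXT, room TEXT)"
)
wide.executemany(
    "INSERT INTO chat_log VALUES (?, ?, ?, ?)",
    [
        ("Jean", "man", "secret", "Lobby"),    # Jean's gender is stored here...
        ("Jean", "woman", "secret", "Games"),  # ...and again here, with a typo
    ],
)

# Inconsistency: the same player now has two different genders.
genders = [
    row[0]
    for row in wide.execute(
        "SELECT DISTINCT gender FROM chat_log WHERE player = 'Jean' ORDER BY gender"
    )
]

# Deleting Jean's chat rows also deletes his password and gender.
wide.execute("DELETE FROM chat_log WHERE player = 'Jean'")
rows_left = wide.execute("SELECT COUNT(*) FROM chat_log").fetchone()[0]

# --- Split into two tables: each fact about Jean is stored once ---
norm = sqlite3.connect(":memory:")
norm.execute("CREATE TABLE player (name TEXT PRIMARY KEY, gender TEXT, password TEXT)")
norm.execute("CREATE TABLE chat (player TEXT, room TEXT)")
norm.execute("INSERT INTO player VALUES ('Jean', 'man', 'secret')")
norm.executemany(
    "INSERT INTO chat VALUES (?, ?)", [("Jean", "Lobby"), ("Jean", "Games")]
)

# Jean stops chatting: only the chat rows go away, the player row stays.
norm.execute("DELETE FROM chat WHERE player = 'Jean'")
kept = norm.execute("SELECT name, gender FROM player").fetchone()

print(genders, rows_left, kept)
```

       In the giant table, Jean ends up with two genders and then disappears completely. In the split version his gender is written once and survives when his chat rows are deleted.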
       Before I begin normalizing, I need some concepts in place. A primary key is a column where all the values are unique, so that each value identifies exactly one row. A phone company would probably have a table where the primary key is the phone number. CPR (the Central Person Register) probably has a table with CPR numbers as the primary key. A primary key can also consist of more than one column - then it's a composite primary key. E.g. a table that keeps track of which players chat in which rooms would have a primary key composed of the IDs for players and chat rooms. Later, when we cut tables up, there will typically also be a foreign key: a column whose values must match primary key values in another table. E.g. the table with the composite key above will also have a foreign key to the player table. With a foreign key we can also accomplish something else: referential integrity. If I have a table with all the registered players, I can make sure that there isn't suddenly a non-existent player chatting.
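
       To see the key concepts in action, here is a minimal sketch using Python's built-in sqlite3 module. The tables player, room and player_in_room, their columns, and the helper try_insert are names I made up for the example. Note that SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on; other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is enabled on the connection.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY,          -- primary key: one value per row
    name      TEXT NOT NULL
);
CREATE TABLE room (
    room_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE player_in_room (
    player_id INTEGER NOT NULL REFERENCES player(player_id),  -- foreign key
    room_id   INTEGER NOT NULL REFERENCES room(room_id),      -- foreign key
    PRIMARY KEY (player_id, room_id)        -- composite primary key
);
""")

conn.execute("INSERT INTO player VALUES (1, 'Jean')")
conn.execute("INSERT INTO room VALUES (10, 'Lobby')")
conn.execute("INSERT INTO room VALUES (11, 'Games')")
conn.execute("INSERT INTO player_in_room VALUES (1, 10)")


def try_insert(player_id, room_id):
    """Hypothetical helper: report whether the database accepts the row."""
    try:
        conn.execute(
            "INSERT INTO player_in_room VALUES (?, ?)", (player_id, room_id)
        )
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


second_room = try_insert(1, 11)  # same player, new room: a new key combination
duplicate = try_insert(1, 10)    # same (player, room) pair again
ghost = try_insert(99, 10)       # player 99 is not in the player table

print(second_room, duplicate, ghost)
```

       The duplicate pair is stopped by the composite primary key, and the non-existent player is stopped by the foreign key. That second check is the referential integrity.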
       And then there's the normalization itself - see the rules elsewhere.
       Once I'm this far, the next step is to write a data dictionary. Here I register the name of every table and, for every table, the name of every column. Apart from the name, I register the data type, whether the field may be empty, and whether it's a primary and/or foreign key. The data type can be integer, string, date and so on. An interesting data type is autonumber, where the database makes sure that every new row gets the next number.
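
       A data dictionary can also be read back out of an existing database. The sketch below assumes SQLite through Python's sqlite3 module: PRAGMA table_info and PRAGMA foreign_key_list are SQLite's own commands, while the tables and the data_dictionary function are my own example names. AUTOINCREMENT plays the role of autonumber here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE player (
    player_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- autonumber
    name      TEXT NOT NULL,
    gender    TEXT
);
CREATE TABLE player_in_room (
    player_id INTEGER NOT NULL REFERENCES player(player_id),
    room_id   INTEGER NOT NULL,
    PRIMARY KEY (player_id, room_id)
);
""")

# Autonumber: I leave player_id out and the database fills in the next number.
conn.execute("INSERT INTO player (name) VALUES ('Jean')")
conn.execute("INSERT INTO player (name) VALUES ('Marie')")
ids = [row[0] for row in conn.execute("SELECT player_id FROM player ORDER BY player_id")]


def data_dictionary(conn, table):
    """Hypothetical helper: one entry per column of a (trusted) table name."""
    # foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
    fk_columns = {row[3] for row in conn.execute(f"PRAGMA foreign_key_list({table})")}
    entries = []
    # table_info rows: (cid, name, type, notnull, dflt_value, pk)
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        f"PRAGMA table_info({table})"
    ):
        entries.append({
            "table": table,
            "column": name,
            "type": col_type,
            # Primary key columns are treated as required in this sketch.
            "may_be_empty": not notnull and not pk,
            "primary_key": bool(pk),
            "foreign_key": name in fk_columns,
        })
    return entries


dictionary = data_dictionary(conn, "player") + data_dictionary(conn, "player_in_room")
for entry in dictionary:
    print(entry)
```

       Normally I would write the data dictionary before creating the tables. Reading it back like this is mostly a way to check that the database matches what I planned.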

Concept last updated: 06/05 2004.

Relations

Normalization optimizes Database

Normalization is a part of Good programming

Normalization also is The 3 rules of normalization

Other sources

Databaseteori; Alf (obl.) - 1.2
Normalisering, eksempler (obl.) - Dataordbog