Normalisation - data modeling for relational Databases

(Benjamin Kampmann) #1

At one day or the other you’ll run into having to model your Data. Data is an important topic in Computer Science as it is ultimately what most operations are about and done to. Modelling the data correctly therefore has been a topic of discussion for a long time.

There are a lot of database models out there. Most commonly known ones right now are the Document Model (as it done by many NoSQL-Database) and the Relational Model as it is implemented by most SQL-Databases. If you are working with one of the big Web-Frameworks like Django and Rails, then you are already running on an SQL-Database though they both abstract it away using their own Object-Relational-Mappers (Making SQL-table rows into objects). But the underlying technique is an SQL Database.

The idea of relational Databases is to multiple tables each one have a single source of truth and through joins (the combining of two rows) to create the full set of information needed at a given point in time. A classic example of that would be the Person and their Address. Each being there own table. Now if a second person entry is created instead living at the same address instead of creating a second address it is just pointing to the existing one, too. This has the benefit of a small storage footprint and simplifies certain activities, like when the street gets renamed it only has to change that one entry.

The common method to figure out what should be stored where is called “normalisation” and most relational database expect a normalised data pool for best performance. The Wikipedia article, though a little theoretical, is a really good resource for learning and understanding this model.

After going through the links in this article and the Wikipedia article, try a real world example of normalising a dataset. For example take your music collection, pick five albums of three artists and write down all of their songs into a table with title, artist, album name, length, track number and so forth. Now normalise it using the technique you’ve just learned in the article.

You’ve just successfully normalised your first dataset!