Show Menu

Rules for a Better SQL Schema Cheat Sheet by

Better SQL coding
sql     database     programming     optimize     schema

Introd­uction

There are a lot of decisions to make when creating new tables and data wareho­uses. Some that seem incons­equ­ential at the time end up causing you and your users pain for the life of the database.

We've worked with thousands of people and their databases and, after countless hours of reading and writing queries, we've seen almost everyt­hing. Here are our top 10 rules for creating pain-free schemas.

1. Only Use Lowercase Letters & Unders­cores

Don't use dots, spaces, or dashes in database, schema, table, or column names. Dots are for identi­fying objects, usually in the databa­se.s­ch­ema.ta­ble.column pattern.
Having dots in names of objects will cause confusion. Likewise, using spaces in object names will force you to add a bunch of otherwise unnece­ssary quotes to your query:

select "user name" from events
-- vs
select user_name from events

Queries are harder to write if you use capital letters in table or column names. If everything is lowercase, no one has to remember if the users table is Users or users.
And when you eventually change databases or replicate your tables into a warehouse, you won't need to remember which database is case-s­ens­itive, as only some are.

2. Use Simple, Descri­ptive Column Names

If the users table needs a foreign key to the packages table, name the key packag­e_id. Avoid short and cryptic names like pkg_fk; others won't know what that means. Descri­ptive names make it easier for others to understand the schema, which is vital to mainta­ining efficiency as the team grows.
Don't use ambiguous names for polymo­rphic data. If you find yourself creating columns with an item_type or item_­value pattern, you're likely better off using more columns with specific names like photo­_co­unt, view_­count, transa­cti­on_­price.
This way, the contents of a column are always known from the schema, and are not dependent on other values in the row.

select sum(i­tem­_value) as photo_­count
from items
where item_type = 'Photo Count'
-- vs
select sum(p­hot­o_c­ount) from items

Don't prefix column names with the name of the containing table. It's generally unhelpful to have the users table contain columns like user_­bir­thday, user_c­rea­ted_at, user_n­ame.
Avoid using reserved keywords like column, tag, and user as column names. You'll have to use extra quotes in your queries and forgetting to do so will get you very confusing error messages. The database can wildly misund­erstand the query if a keyword shows up where a column name should be.

3. Use Simple, Descri­ptive Table Names

If the table name is made of up of multiple words, use unders­cores to separate the words. It's much easier to read packag­e_d­eli­veries than packag­ede­liv­eries.
And whenever possible, use one word instead of two: deliveries is even easier to read.

select * from packag­ede­liv­eries
-- vs
select * from deliveries

Don't prefix tables to imply a schema. If you need the table grouped into a scope, put those tables into a schema. Having tables with names like store_­items, store_­tra­nsa­ctions, store_­cou­pons, like prefixed column names, is generally not worth the extra typing.
We recommend using pluralized names for tables (e.g. packages), and plural­izing both words in the name of a join table (e.g. packag­es_­users). Singular table names are more likely to accide­ntally collide with reserved keywords and are generally less readable in queries.

SQL Optimi­zation

 

4. Have an Integer Primary Key

Even if you're using UUIDs or it doesn't make sense (e.g. for join tables), add the standard id column with an auto-i­ncr­eme­nting integer sequence. This kind of key makes certain analyses much easier, like selecting only the first row of a group.

And if an import job ever duplicates data, this key will be a life-saver because you'll be able to delete specific rows:

delete from my_table
where id in (select ...) as duplic­ate­d_ids

Avoid multi-­column primary keys. They can be difficult to reason about when trying to write efficient queries, and very difficult to change. Use an integer primary key, a multi-­column unique constr­aint, and several single­-column indexes instead.

5. Be Consistent with Foreign Keys

There are many styles for naming primary and foreign keys. Our recomm­end­ation, and the most popular, is to have a primary key called id for any table foo, and have all foreign keys be named foo_id.

Another popular style uses globally unique key names, where the foo table has a primary key called foo_id and all foreign keys are also called foo_id. This can get confusing or have name collisions if you use abbrev­iations (e.g. uid for the users table), so don't abbrev­iate.

Whatever style you choose, stick to it. Don't use uid in some places and user_id or users_fk in others.

select *
from packages
join users on users.u­ser_id = packag­es.uid
-- vs
select *
from packages
join users on users.id = packag­es.u­ser_id

-- or

select *
from packages
join users using (user_id)

And be careful with foreign keys that don't obviously match up to a table. A column named owner_id might be a foreign key to the users table, or it might not. Name the column user_id or, if necessary, owner_­use­r_id.

6. Store Datetimes as Datetimes

Don't store Unix timestamps or strings as dates: convert them to datetimes instead. While SQL's date math functions aren't the greatest, doing it yourself on timestamps is even harder. Using SQL date functions requires every query to involve a conversion from the timestamp to a datetime:

select date(f­rom­_un­ixt­ime­(cr­eat­ed_at))
from packages
-- vs
select date(c­rea­ted_at)
from packages

Don't store the year, month, and day in separate columns. This will make every time series query much harder to write, and will prevent most novice SQL users from being able to use the date inform­ation in this table.

select date(c­rea­ted­_year || '-'
|| create­d_month || '-'
|| create­d_day)
-- vs
select date(c­rea­ted_at)

7. UTC, Always UTC

Using a timezone other than UTC will cause endless problems. Great tools (including Periscope) have all the functi­onality you need you convert the data from UTC to your current timezone. In Periscope, it's as easy as adding :pst to convert to from UTC to Pacific Time:

select [creat­ed_­at:­pst], email_­address
from users

The database's time zone should be UTC, and all datetime columns should be types that strip time zones (e.g. timestamp without time zone).

If your database's time zone is not UTC, or you have a mix of UTC and non-UTC datetimes in your database, time series analysis will be a lot harder.

8. Have One Source of Truth

There should only ever be one source of truth for a piece of data. Views and rollups should be labeled as such. This way consumers of that data will know there is a difference between the data they are using and the raw truth.

select *
from daily_­usa­ge_­rollup

Leaving legacy columns around like user_id, user_i­d_old, user_id_v2 can become an endless source of confusion. Be sure to drop abandoned tables and unused columns during regular mainte­nance.

9. Prefer Tall Tables without JSON Columns

You don't want to have super-wide tables. If there's more than a few dozen columns and some of them are named sequen­tially (e.g. answer1, answer2, answer3), you're going to have a bad time later.

Pivot the table into a schema that doesn't have duplicated columns - this schema shape will be a lot easier to query. For example, getting the number of completed answers for a survey:

select
sum(
(case when answer1 is not null
then 1 else 0 end) +
(case when answer2 is not null
then 1 else 0 end) +
(case when answer3 is not null
then 1 else 0 end)
) as num_an­swers
from surveys
where id = 123
-- vs
select count(­res­ponse)
from answers
where survey_id = 123

For analysis queries, extracting data from JSON columns can greatly degrade a query's perfor­mance. While there are a lot of great reasons to use JSON columns in produc­tion, there aren't for analysis. Aggres­sively schematize JSON columns into the simpler data types to make analysis a lot easier and faster.

10. Don't Over-N­orm­alize

Dates, zip codes, and countries don't need their own tables with foreign key lookups. If you do that, every query ends up with a handful of the same joins. It creates a lot of duplicated SQL and a lot of extra work for the database.

select
dates.d,
count(1)
from users
join dates on users.c­re­ate­d_d­ate_id = dates.id
group by 1
-- vs
select
date(c­rea­ted­_at),
count(1)
from users
group by 1

Tables are for first class objects that have a lot of their own data. Everything else can be additional columns on a more important object.

Download the Rules for a Better SQL Schema Cheat Sheet

3 Pages
//media.cheatography.com/storage/thumb/davidpol_rules-for-a-better-sql-schema.750.jpg

PDF (recommended)

Alternative Downloads

Share This Cheat Sheet!

 

Comments

No comments yet. Add yours below!

Add a Comment

Your Comment

Please enter your name.

    Please enter your email address

      Please enter your Comment.

          Related Cheat Sheets

          SQL Server Cheat Sheet
          Essential MySQL Cheat Sheet

          More Cheat Sheets by Davidpol

          Nonverbal Pain Indicators (CNPI) Cheat Sheet