Database Design Practices
- Written by Janeve George
This document lists guidelines for designing database schema. Major key concerns to be kept in mind are:
- Data Integrity
These might have side-effects on each other, for example, avoiding usage of functions and stored procedures increases portability but restricts performance. The DB Designer must make the best of all possible choices to make a very reliable and robust database for an application.
Design for Performance
The database schema should be designed to perform the best. The turnaround time for queries should be minimum. Few steps that we could take in improving the database performance are:
- Store data in the format it is expected to be used by the end-user. By doing so, we could directly work on the data retrieved and there is rarely any need to convert the format of the data.
- Avoid the usage of NULLs wherever possible. 'NULL' is not a value that we store in the database column, it means that we do not know the value. Having 'NULL' values basically violates relational database concepts. It does not provide any relevant information + it adds to the processing of determining is the column is NULLable + other weird issues. Sometimes it might also increase the IO overhead. Another issue is whenever you execute a query with 'IS NOT NULL' or 'IS NULL' where clause, the database performs a sequential scan on the table because 'NULL' cannot be indexed.. ouch... It would be better if we use some explicit value like 'NA' or 'NULL' (as literal string ofcourse), etc.
- Use 'vharchar' instead of 'char' if a column needs to be nullable or if the size of the value is variable. By doing so, we could save on space hence IO time.
- Every base tables should have a primary key. Primary keys improves the performance (if there is huge number of records). You may use a self incrementing primary key if necessary, but it is advisable to use natural keys wherever possible.
- Design database for minimal user contention. Almost all applications that uses these databases are multi-threaded. Wiping out user contention will be impossible. but we have to try and minimize the contention as much as possible through proper database design. This would improve the performance considerably.
- Use indexes wherever possible. Indexes are the best and most commonly used way to improve the performance of select queries. The drawback of indexes is that it turn out to be heavy for modify queries like Insert, Update and Delete. I would recommend to use minimal indexes for transaction tables. In case you are creating multi-key index, it is best to have the most unique column to come first.
- Archive records in transactional tables often and in small chunks. All records that have served the purpose should be archived so that it gives better indexing and reduces IO wait.
Design for Portability
The database schema should be portable. There are numerous database vendors in the market. At some point in time we might require to migrate the data to a different database platform. The schema should be designed in such a way that the migration would not become a pain at any time in the future if necessary. Few steps that we could take are:
- Environment and future expansion must be kept in mind. As an extension to develop a scalable application, we should have a scalable platform that supports it. The database schema should be scalable.
- Use data types supported by all databases. Some database might provide some special data types. It is better to avoid these data types which might be specific to a particular database vendor. Most common data types are: integer, char, char varying, float, decimal.
- Use size and lengths of entities such that it is supported by all databases. It is best if we use the entity names and data type size that is supported by all database.
- Max. database name Length = 8
- Max. table name length = 30
- Max. column name length = 30
- Max. index name length = 30
- Max. char size limit = 254
- Max. varchar size = 4000
- Max. columns per table = 1000
- While designing keep in mind the most commonly used database available in the market. Some of the most popular database used are:
- MS SQL Server
- Using vendor specific features and data types should be documented properly.
Design for Data Integrity
Data integrity is an important aspect that database designers ponder over.
- Strive for a 'Star' schema design when optimizing for retrievals.
- You should add standard columns to all tables. These columns may be omitted in performance sensitive tables to minimize overhead and performance impact. Following columns supports data integrity of the database tables.
- updated_by(optional), updated_at
- created_by(optional), created_at
- lock_version (for optimistic locking)
- Always design to third normal form. De-normalize a table only as a last resort.
- Every base tables should have a primary key. It not only helps in performance, but also in data integrity.
- Use Foreign key for referential Integrity.
- Index both sides of foreign key
- Use integer keys as foreign keys for faster retrieval
- Foreign keys may slow down transactions
- Minimize the number of columns that form a foreign key
- Use History and Audit Tables.
- For Master tables plan on optional audit or history tables, but not both per master table
- History Table - History tables record row changes. The current row is inserted into the history table before making any update on the row of the master table. History tables will have exactly identical columns as that of master table.
- Audit Table - Audit tables record columns changes. Each column changed in master creates one audit record. Columns in audit table are: column_name, before_value, after_value, updated_at, updated_by.
- Do not use these for tables that are updated very frequently like transaction tables.
Design for Readability
The database schema should document itself. We have to provide appropriate nomenclatures for the entities so that they convey what the represent through their names. Some naming conventions are:
- Names should be readable, not cryptic
- e.g: students, teachers, etc
- Minimize use of abbreviations or use standardized abbreviations
- Separate words with '_'
- Use only lower case for table names.
- Do not use any special character
- Do not use names that are used as special keywords in any of major RDBMS. See attachment DBReservedWords.xls.
- Table Naming Convention
- Pluralize - use domains, not domain
- Suffix history tables with 'history' and audit tables with 'audit'. Example history of the table called application_preferences should be stored in application_preferences_history.
- Column Naming Convention
- Do not use different names for columns which are very similar. If you are using state in a particular table say country, you should not use any other alterations, like province, anywhere in the database.
- Use 'id' as the name of primary key on all tables
- Use <tablename>_id for foreign key columns