The point of this post is not to establish credibility as an authority on how SQL developers should name the tables and columns they create. My aim is just to get the point across that I have good habits, that’s all. There are many articles on the web which advocate naming conventions of various types with sound reasoning, but I would not count this blog post as one of them.
Business Intelligence developers often use two different naming conventions, depending on the use. The conventions for each are completely different. Here are the two types:
- Naming Conventions for SQL data
- Naming Conventions for Analytics
SQL Data Naming Conventions
The following are some conventions that I use in SQL data, not counting SQL data warehouse databases. I will cover those within the Analytics section. The main point of having a convention is to be consistent. Granted, saying it like that is redundant, but the point to make here is not so much the specifics that I list out. Those can be up for debate. Just be consistent.
- If an existing convention is in place and it was consistently used, it’s best to continue using it (even if it goes against your usual practice).
- Do not use spaces in names. The use of capital letters can take the place of needing spaces.
- Used mixed case, avoiding all caps unless there is a good reason.
- SQL objects are named in Pascal Case, what people often call Camel Case. Examples: PascalCase and camelCase. Pascal Case capitalizes the first word, and Camel Case does not.
- SQL Views are prefaced by a lower case “v” so they can be recognized as views (they are not tables) where views are used. A variation on this is to use “v” and an underscore. I try to limit uses of underscores.
- Stored Procedures are prefaced by a lower case “sp” so they are recognized as being stored procedures when appearing in code.
- I prefer that tables are named as the singular of the record contained in them, for example: Employee, City, Product. This is different in analytics databases such as SSAS Tabular.
- Tables used for data transformation which receive imported, raw data (untransformed) should mirror the data source’s column names, even if this breaks naming convention standards. If the source file has a column called “TRANS_DATE” then the import table should have that same named column. Change the name during the transformation.
- Primary Keys, normally integers, will be named by appending “ID” to the name, such as EmployeeID. Plus I will usually name this column the same as it’s table name (or close to it), such as Table: Invoice and Primary Key Column: InvoiceID.
- Foreign Keys, used as columns in tables (where they aren’t the Primary Key) must be named exactly the same as they are named in the table where they are the Primary Key. This makes writing JOIN criteria much easier — such as TableA.EmployeeID = TableB.EmployeeID
- Don’t use spaces in table names, even though you can. When you use spaces you have to use brackets to reference the tables without error in code, such as this: [Frustratingly Named Table] instead of: WellNamedTable. This convention is also different in the context of analytics.
I mentioned earlier that there are some additional conventions for SQL data warehouses. In particular, I go against my earlier stated goals of avoiding all caps and also the use of underscores. A traditional data warehouse (Kimball method) has tables for facts and dimensions.
- I name a dimension table with the prefix “DIM” followed by an underscore. Example: DIM_Customer
- I name a fact table with the prefix “FACT” followed by an underscore. Example: FACT_InvoiceLineItem
- Another commonly seen practice is to use a lower-case “d” or “f” to denote dimensions and facts, such as dCustomer or fInvoiceLineItem. If that is already in place, I would continue to use it, but if I am building a new solution, I would not choose that naming convention unless it is required in the specs.
The conventions regarding facts and dims are very standard, a practice used by many data warehouse developers. I also, then, extend that style for other tables in a data warehouse such as the following:
- IMPORT_CustomerList would hold a flat-file import of a customer list
- TEMP_Customer1 would hold the first transformation of customer data after import if a TEMP table is needed
- STAGE_DIM_Customer would hold the last transformation of customer data prior to building the final dimension table, DIM_Customer
- The same goes for all dimensions and facts, too. The last table prior to building the production-used dimension or fact is named with the prefix “STAGE” followed by an underscore.
The use of underscores is called “snake case” but what I describe is not quite that. Snake case would have a tables called “fact_invoice_line_item.” However, I use the underscore only to bump off of the DW entity type, and then I resort back to Pascal Case. It all sounds kind of complicated written out like this, but it’s actually easy to follow in practice.
One more thing about the SQL DW naming, if the users are granted permission to the SQL DW database (if there is no additional layer such as SSAS Tabular), then I use a schema called “etl” and I put all the non-production tables in it. The advantage of this is that you can deny permissions to see the etl tables and the users are left only to see the actual DIM and FACT tables, as they should.
Analytics Naming Conventions
I’ll define “analytics” as meaning data models that are either published by a server such as SQL Server Analysis Services or modeled within a tool such as Power BI. In this environment, the conventions listed above do not apply. There is a more important guiding principle: name things in a way that it makes sense on a report (or can be understood by an analyst/report designer).
- Use spaces. Do not mash-up names with Pascal Case and/or underscores. Example: Invoice Line Items
- Use plurals. Do not hold to the long-time database standard of naming something in the singular. Example: Customers
- Hidden objects can use the previously outlined conventions as a hint that they are hidden from the report user. Hidden objects might be the scalar, column-based fact which has not yet been defined as a measure. Example: LineItemExtendedAmount (as hidden, scalar value), as a measure it might be Line Item Amount or Sale Amount.
- Derived measures should be named first with the name of the base measure, and then the specificity of it’s derived value type. For example, you may have a base measure called “Hospitalizations” and also derived measures from that, such as Hospitalizations YTD, Hospitalizations Annualized, Hospitalizations L7, etc.
- Don’t use “DIM” or “FACT” as a prefix in the analytics data model, unless that is already in place in a deployed solution and you prefer to leave that alone.
SSIS Package Names
Here is one more thing on this topic — the name of SSIS packages. If I am working in a solution comprised of many SSIS packages, I like to name the packages with a number so I can sort them in the order they are executed.
This is not necessary, but I find that it saves me time when I am debugging. It’s easy to see the order of events. I usually start with numbers like 100, 200, 300, etc. Then later I add in incremental steps such as 110, 120, 130, and maybe even 111, 112, etc.
TLDR; Just Be Consistent
This article went into more detail than is needed for a non-technical explanation. Plus, the “best” conventions are open to debate. Personally, I have found that the most useful sources of info on naming conventions are published with a discussion format, where people present different ideas and discuss the merit of each. Here is a good discussion on Stack Overflow:
Rest assured, I value consistency as being the most important quality, and I don’t present any idea as universally correct. I’m happy to discuss these details with anyone who cares to question the rules I’ve adopted.