Data Warehouse Consulting
Thank you for your interest in the data warehouse consulting work that I do. Data warehousing sometimes means different things to people, so I’ll clarify what it means to me, but first I should tell you what services I can provide as a data warehouse consultant, and the roles I can assume.
My Roles as a DW Consultant
- Assessment: determine if you need and can afford to build a DW now, or determine a smaller data preparation solution for reporting.
- Design: provide a database design, including a design for the ETL process that will populate and refresh the data warehouse.
- Build: do the actual development work that results in the solution that will build the data warehouse and keep it refreshed with your latest data.
- Validation: compare the results in the data warehouse with other existing reports data or values from the applications.
I can do all of that, or just some of it. I am also willing to work with your existing developers and advisors to bring it all together. Before I go any further, let’s make sure we are on the same page. Here are some explanations of data warehousing to clear up any misconceptions.
Data Warehouse Definitions
Definition of Data Warehouse: Very simply, a data warehouse is a single source of data that is used for reports and dashboards. It often contains data that comes from different places. For this reason, the main effort to create a data warehouse is developing the integration, the “ETL” jobs, to get the data and write it into a model that is optimized for reports and dashboards.
Definition of ETL: It stands for extract, transform, and load. Sometimes it’s called “ELT” because the load may happen before the transformation. Personally, that distinction is not important to me, but if you’ve come across those terms you should know they’re both fundamentally the same. ETL/ELT represents the recurring jobs that flow your data from existing sources to the data warehouse. ETL solutions can be developed using another acronym, SSIS (SQL Server Integration Services).
Definition of Data Mart: A data mart is a subset of a larger data warehouse. Some companies build an enterprise-wide data warehouse (EDW), and from that they can spin off smaller data marts for departments, or to accommodate multiple, differing technical requirements.
There are two main types of modeling a data warehouse, and they are often identified by the names of their inventors, Kimball and Inmon. We can skip the theory, but you have to pick one, and a popular choice is the Kimball method, often called, dimensional modeling. When someone recommends a star schema this is the model they are talking about.
In a dimensional model, data is directed into one of two groups:
Facts are quantitative information, such as the value of a sale, or an amount that is due. Typically, facts are staged into fact tables in SQL and the fact records relate to dimensions.
Dimensions describe the entities that the facts belong to, such as a customer’s name, or a store location. You might have a Customer Dimension and each record in that table will be a unique customer.
When designing a data warehouse, you need determine the granularity of your facts (aka “the grain”). This can be challenging, because you might prefer to have what is called a “transactional grain” (every transaction), but one of the other sources of data only has snapshots, like a daily or weekly summary.
It’s strongly preferred to use only one grain in a data mart. If multiple grains are unavoidable, one way to deal with it is to dedicate different data marts to different grains. This is the type of topic that needs to be discussed during the design of a data warehouse. I can facilitate this kind of discussion, warn you about consequences, and help you make informed decisions.
Data Warehouse Plans and Actions
If you already have a Data Warehouse I am able to add to it, make changes, extend it into a Tabular Model, a Power BI data model, or connect to it and develop dashboards and reports. If you don’t have a Data Warehouse, I can provide some or all of the work that it takes to build one.
Before building anything, though, you need a plan. I can do this for you, too. Here is some of what we would need to assess in order to complete a plan:
- Describe the desired reports and KPI
- Identify the data contained on those reports
- Determine the best sources of data (and log possible complications)
- Clarify all technical requirements and constraints
- Prioritize the work, possibly dividing it into multiple releases if appropriate
- Agree upon a schedule
- Is there an existing DW? If so, are the values that it contains going to be used for validation? What if it is wrong? Is the objective, first, to match legacy DW results and then fix problems in later phases?
It is a good practice to break up the work into the smallest (reasonable) tasks, and then estimate the effort for each of those tasks. By doing so, you are likely to come up with accurate milestone dates. The following image is an excerpt from a project plan formulated like that. I estimated the time to complete every task.
2. Data Warehouse Design
If you do not already have a formal, written document to guide the development of the data warehouse it is absolutely necessary to develop a design. As a matter of fact, you cannot estimate the timeline, and henceforth, cannot estimate the cost of the project until the design is completed.
This creates a somewhat awkward scenario, needing to commit to an estimate for design services without knowing what a follow-on estimate for the build will be. It helps to discuss this and identify best-case and worst-case estimates. More importantly, it demands that you are working with someone who is trustworthy.
The best way to mitigate risk is to settle on larger, overall plan to build the data warehouse in phases. Each phase increases the capabilities. Done properly, a phased rollout does not lengthen the timeline or the cost.
3. Build and Implement
The build of the data warehouse is a completely hands-on effort. Most of the effort is the creation of the ETL processes that will integrate existing sources of data with the data warehouse.
There is a lot of testing that happens mid-stream during the build phase, however, when the entire process is complete, it is necessary to conduct a formal validation. This is a collaborative effort. In my opinion, this is best guided by the development side, but the actual validation is completed by the client.
Conclusion: Data Warehouse Consulting
So Many Nuances, You Need Someone You Can Trust
It’s nice to spell this all out in a simple web page, but there are so many nuances to each phase of these projects. That provides a good enough reason to break the large, overall implementation into smaller, phased releases. Also, it makes communication and trustworthiness essential. I believe that I am a good fit for your data warehouse project, and I encourage you to start the conversation. Please send me an email, and we can schedule a meeting to talk about what you want to achieve. Thank you.