Wednesday, July 13, 2011

Take the time now for gains later.

Regardless of which data warehouse paradigm you follow or have heard of, Kimball or Inmon. We should all agree that the data warehouse is often a requirement for business. Different people want different things and they all want it from your data. The data warehouse is not a new concept and yet they are over looked at times. A warehouse is never complete, it is an evolving entity that adjusts with the requirements it is given. It is up to us to make sure that the access to enterprise data in an accurate and timely manner is easy and the standard. MySQL can handle a data warehouse perfectly.

MySQL databases are designed in numerous ways, some good some bad. A warehouse can take that data and organize it for the best use of others. What concerns or issues do you often hear when it comes to gathering data from your database? It is easy for all of your developers to query and get the same data? How many ways does your company slice and dice data? Who is your target audience for all of this data? Who wants just the important data? How well do you keep your results accurate?

If these are questions you can relate too but you do have not a warehouse then it is time to consider one. They can be considered to be, to big of a project for the current resources, they never work , poor data quality, always has data inconsistencies. These all are common complaints but those are not complaints of the warehouse but more of the execution behind it.

Use the tools available to you to keep the data updated. ETLs (extract , transform, load) are a requirement of course with data warehouse tables, use them to your advantage. Do not get lazy or let others get lazy. Fix data at the source and re-execute the etl. Do not let “quick fixes” that are not done at the source level. Remember, you do not have to back up the warehouse like you do your source, you can always rerun the ETLs.

Be ready to have eager users try to access your data before your ready to release. Be prepared for complaints. Keeping it accurate and easy to access is the key overall. Have it serve a need first, do not solve everything at once but make an impact with good quality fast data. As I said before a warehouses work is never done, you will continue to add data and available slices of data. Take the time to execute correctly now to have long lasting accurate data.

I am a very big supporter of data normalization but a warehouse also has a place and you can have big performance gains from using it.

There is a lot of resources on the net to help you with this: