Eat The Lasagna? No. Everybody Teaches Love? No, but not a bad idea. Engineers That Last? That’s not it either, but the mention of engineers is getting closer.
Wikipedia says data extraction is where data is extracted from homogeneous or heterogeneous data sources; data transformation is where the data is transformed for storing in the proper format or structure for the purposes of querying and analysis; and data loading is where the data is loaded into the final target database, more specifically, an operational data store, data mart, or data warehouse. That’s a mouthful. Let’s simplify this for those who do not spend their free time reading Data Warehouse design manuals.
ETL is one of those technical buzzwords that gets thrown around often. Once the acronym is broken down into its pieces and parts, it really is simple to understand. ETL stands for Extract, Transform, Load. It is commonly associated with a Data Warehouse and describes how data moves from external sources into the Data Warehouse.
The “E” refers to how the data is extracted, or retrieved, from a variety of potential sources. The most common sources are databases and flat files. (Flat files would encompass text files and Excel spreadsheets.) All of the standard ETL tools have connectors that let you easily connect to all of the common flavors of databases as well as any type of flat file. As a result of all of this flexibility, you can easily read a plethora of source data. If it exists, and you would like it to be in your Data Warehouse, it is possible.
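As a rough sketch of the “E” step, here is how a flat-file extract might look in plain Python. The sample CSV content and column names are hypothetical stand-ins for a real source file:

```python
import csv
import io

# Hypothetical flat-file source, standing in for a real customers.csv export.
SAMPLE_CSV = """customer_id,email,signup_date
101,alice@example.com,2023-01-15
102,bob@example.com,2023-02-20
"""

def extract_rows(csv_text):
    """Read a CSV source into a list of dicts, one per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return list(reader)

rows = extract_rows(SAMPLE_CSV)
print(rows[0]["email"])  # alice@example.com
```

A real ETL tool hides this plumbing behind a connector, but under the hood the idea is the same: read each source record into a common in-memory shape that the transform step can work on.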
The “T” alludes to how you would like the extracted data to be modified, enhanced, or changed prior to it reaching its final resting place. This is where all of a company’s business rules would be implemented, putting “intelligence” into the data and making it accurate. This step removes the need to do manual conversions of the data later, which eliminates errors. There are a variety of ways that data can be transformed on the way in. Below are some simple examples:
- You want to verify that an email address is properly formed
- You want to be certain that a piece of data is numeric
- You want to join two pieces of data together
- You want to validate that a field has a certain value
- You want to ensure that a date has a consistent format
- You want to derive/manufacture a new column based on business rules
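A few of the checks above can be sketched in plain Python. These are minimal, illustrative versions of the kind of rules an ETL tool would apply, not the tool’s actual API:

```python
import re
from datetime import datetime

def is_valid_email(value):
    # Basic shape check: something@something.tld
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None

def is_numeric(value):
    # Verify that a piece of data can be treated as a number.
    try:
        float(value)
        return True
    except ValueError:
        return False

def normalize_date(value):
    # Accept a couple of common input formats, emit a consistent ISO date.
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {value!r}")

print(is_valid_email("alice@example.com"))   # True
print(is_numeric("3.14"))                    # True
print(normalize_date("01/15/2023"))          # 2023-01-15
```

Each rule runs once, in one place, as the data flows in, which is exactly why the “T” step eliminates the manual cleanup that would otherwise happen downstream.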
In other words, the ways that data can be transformed are endless. There is extreme power and flexibility in the “T”. We at MacLaurin Group have 20+ years of experience coding ETL, and some of the more interesting transformations involve derived attributes called indicators. These can be used to capture some of the more challenging institutional events (e.g., the date a customer was acquired, or the first time a customer responded). Because these important event markers are interpreted and retained during the ETL process itself, reporting on them stays consistent.
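To make the indicator idea concrete, here is a minimal sketch of deriving a “date acquired” attribute from raw order records. The record shape and field names are assumptions for illustration:

```python
# Hypothetical order records, as they might arrive from the extract step.
orders = [
    {"customer_id": 101, "order_date": "2023-03-05"},
    {"customer_id": 101, "order_date": "2023-01-15"},
    {"customer_id": 102, "order_date": "2023-02-20"},
]

def derive_acquired_date(records):
    """Derive a 'date acquired' indicator: the earliest order date per customer."""
    acquired = {}
    for rec in records:
        cid = rec["customer_id"]
        # ISO dates compare correctly as strings, so min() logic is simple.
        if cid not in acquired or rec["order_date"] < acquired[cid]:
            acquired[cid] = rec["order_date"]
    return acquired

print(derive_acquired_date(orders))
# {101: '2023-01-15', 102: '2023-02-20'}
```

Once computed here, the indicator travels with the record into the warehouse, so every downstream report interprets “date acquired” the same way.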
The “L” is where the manipulated data is written to a database. The data is now available for business consumption. With the right analytics, it becomes actionable. Any data driven organization can now make game changing decisions.
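As a sketch of the “L” step, here is transformed data being written to a database. An in-memory SQLite database stands in for the real warehouse target, and the table layout is hypothetical:

```python
import sqlite3

# Transformed rows ready for loading.
rows = [
    (101, "alice@example.com", "2023-01-15"),
    (102, "bob@example.com", "2023-02-20"),
]

# An in-memory SQLite database stands in for the warehouse target.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER, email TEXT, acquired_date TEXT)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count)  # 2
```

In production the target would be a warehouse table rather than SQLite, but the shape is the same: write the cleaned rows, commit, and the data is ready for business consumption.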
In conclusion, ETL is nothing to be intimidated by. It is simply reading data, manipulating data, and writing data. The ultimate goal is for the data to be pristine for the decision makers. Any solid Data Warehouse developer should be familiar with at least one ETL tool, such as Informatica or SQL Server Integration Services, and be able to easily implement the rules that the business desires.