ETL (Extract, Transform, Load), Data Conversion, Data Migration: What’s the Difference?
If you ask multiple people the question in this article’s title, you will get different answers.
This is nothing new in Quality Assurance and Testing: ask multiple people for their definition of Test Case vs. Test Script, or Test Plan vs. Test Strategy, and you will get different definitions.
So, let’s define each using their names:
ETL is the process of Extracting data from one system (source), Transforming it to the needs of the system it will be loaded into (target), and then Loading the data into the target system.
This is usually done when a legacy system is being replaced with a new system or when loading a data warehouse or data mart.
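As a minimal sketch of that flow, assuming a hypothetical legacy `customers` table and a target `clients` table (the table names, columns, and transform rule are illustrative, not from any particular project):

```python
import sqlite3

# Two in-memory databases stand in for the legacy source and the new target.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE customers (first TEXT, last TEXT)")
source.executemany("INSERT INTO customers VALUES (?, ?)",
                   [("Ada", "Lovelace"), ("Alan", "Turing")])

target.execute("CREATE TABLE clients (full_name TEXT)")

# Extract: pull the rows out of the source system.
rows = source.execute("SELECT first, last FROM customers").fetchall()

# Transform: reshape the data to the target's single full_name column.
transformed = [(f"{last}, {first}",) for first, last in rows]

# Load: write the transformed rows into the target system.
target.executemany("INSERT INTO clients (full_name) VALUES (?)", transformed)
target.commit()

print(target.execute("SELECT full_name FROM clients ORDER BY full_name").fetchall())
# [('Lovelace, Ada',), ('Turing, Alan',)]
```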
Data Conversion is the process of taking the data in the current system and converting (transforming) its values, format, attributes, etc., while keeping the data in the same system.
This is usually done when a current system is being upgraded with a new version or release. Think of it as the “T” in “ETL”.
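A minimal sketch of a conversion, assuming hypothetical order dates stored as MM/DD/YYYY strings that a new release expects in ISO format; the data never leaves the system:

```python
import sqlite3

# One database: the data is converted in place, the "T" of ETL only.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
db.executemany("INSERT INTO orders VALUES (?, ?)",
               [(1, "01/31/2023"), (2, "12/05/2024")])

# Convert each MM/DD/YYYY string to YYYY-MM-DD within the same table.
for row_id, old in db.execute("SELECT id, order_date FROM orders").fetchall():
    month, day, year = old.split("/")
    db.execute("UPDATE orders SET order_date = ? WHERE id = ?",
               (f"{year}-{month}-{day}", row_id))
db.commit()

print(db.execute("SELECT order_date FROM orders ORDER BY id").fetchall())
# [('2023-01-31',), ('2024-12-05',)]
```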
Data Migration is the process of taking data from one system (Extract) and loading it into a new system (Load), but no real data transformation takes place.
This is usually done when moving from one set of hardware to another. Think of it as the “E” and the “L” in “ETL”.
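A minimal sketch of a migration, assuming a hypothetical `products` table moved verbatim between two databases standing in for the old and new hardware:

```python
import sqlite3

# Two in-memory databases stand in for the old and new hardware.
old_system = sqlite3.connect(":memory:")
new_system = sqlite3.connect(":memory:")

old_system.execute("CREATE TABLE products (sku TEXT, price REAL)")
old_system.executemany("INSERT INTO products VALUES (?, ?)",
                       [("A-100", 9.99), ("B-200", 24.50)])

# Extract from the old system...
rows = old_system.execute("SELECT sku, price FROM products").fetchall()

# ...and load into an identical schema on the new system.
# No transform step: the "E" and "L" of ETL with no "T".
new_system.execute("CREATE TABLE products (sku TEXT, price REAL)")
new_system.executemany("INSERT INTO products VALUES (?, ?)", rows)
new_system.commit()
```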
So, if you put Data Migration and Data Conversion together, it spells “ETL”.
Now that we have the basic definitions down for each, let’s talk high-level about how to test each. With any type of software project, the testers need to review the requirements and interview business, technical, and project resources to figure out what testing is needed. The same is true with ETL, Data Conversion, and Data Migration projects.
There is no mystery to these types of projects. They are software projects that need requirements. The only difference is that they are back-end projects that require knowledge of the database, SQL, and the operating system they run on. Testers do not need to be DBAs or system admins, but they need to know enough to get the testing completed. Other than this, they are regular software testing projects.
In order to test, you need three sets of requirements:
- Extraction Requirement so you know what source(s) and data you will be working with
- Transform Requirement so you know how the data is transformed from source to target system
- Load Requirement (a.k.a. Mapping Requirement) so you know where and how the transformed data gets loaded in the target system
Just like any other software project – if you do not have requirements, how do you know what the system is supposed to do, and how do you test to prove the system meets the requirements?
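One way to sketch such a proof is a reconciliation check derived from the three requirements. The layout here is an assumption for illustration: the extraction requirement names the source rows, the transform requirement says names become "Last, First", and the load/mapping requirement says they land in a single target column.

```python
def verify_etl(source_rows, target_rows):
    """Return a list of mismatches between expected and actual target rows.

    source_rows: (first, last) tuples from the source system.
    target_rows: (full_name,) tuples pulled from the target system.
    """
    # Apply the documented transform rule to the source data.
    expected = sorted(f"{last}, {first}" for first, last in source_rows)
    actual = sorted(name for (name,) in target_rows)

    failures = []
    if len(expected) != len(actual):
        failures.append(f"row count: expected {len(expected)}, got {len(actual)}")
    failures.extend(f"missing in target: {e}" for e in expected if e not in actual)
    return failures

# A clean run reports no mismatches.
print(verify_etl([("Ada", "Lovelace"), ("Alan", "Turing")],
                 [("Lovelace, Ada",), ("Turing, Alan",)]))
# []
```

An empty result means the target matches what the requirements say it should contain; anything else is a defect to investigate.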
The last piece in all of this, which is sometimes overlooked until the 800-pound gorilla wakes up and says, "Hey, what about me?" is the quality of the source data. If the quality of the source data is low, it needs to be improved. There is no way around this. To prove my point: if there is source data that is missing, how do you extract, transform, and load it, if it does not exist? The older the legacy system(s), the more likely that data quality issues exist. Pushing low-quality data through an ETL process only gives you low-quality data at the target system, at best. Low-quality data in the target system only gives you a low-quality target system, and that is not the goal, is it?
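A minimal sketch of profiling source data for missing values before the project starts; the column names and rows here are made up:

```python
def profile_missing(rows, columns):
    """Count None/empty values per column in a list of row tuples."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col, value in zip(columns, row):
            if value is None or value == "":
                counts[col] += 1
    return counts

# Made-up legacy rows with the gaps typical of an old system.
legacy_rows = [
    ("Ada", "Lovelace", "1815-12-10"),
    ("Alan", None, "1912-06-23"),
    ("", "Hopper", None),
]
print(profile_missing(legacy_rows, ["first", "last", "dob"]))
# {'first': 1, 'last': 1, 'dob': 1}
```

Even a simple count like this tells the business how much cleanup is needed before the extract ever runs.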
Even though determining and improving the quality of the source data is more of a data-improvement plan/task, the business must execute it at the beginning of the project. Without it, you really do not know how good or bad the source data is. Without that knowledge, testing becomes much harder, costlier, and less efficient.
Reach out to Olenick to start a conversation about how we can support your project’s ETL needs.