ETL, Data Conversion, and Migration: The “L” in ETL
Now that we’ve determined how to test the translation of the data from the source system to the target system, it is now time to discuss testing for the loading of the data into the Target system, the “L” in ETL.
As with any IT project, you need the approved Load requirements. These tell you how the data is mapped from the source system to the target system, schemas, tables, columns, and directory structures. These need to be very detailed and list out each columns’ mapping from source to target for each record for each table. They also explain where the data contained in the files are loaded. Some files from the source to the target may remain files, while others may have their data incorporated into the database. You need the Load requirements, so you can prove the load processes are functioning according to these requirements.
The load process is where the source data is moved (migrated) to the target system. Below are common types of data load entities that need to be tested:
- Row Counts
- Data has been loaded into the correct database schemas, tables, and columns
- Files have been loaded into the correct directory structures
- All Triggers have been enabled
- All Indexes have been enabled
- All Constraints have been enabled
- The business selects some data that they will follow throughout the process to ensure it is what they expect
- Any other unique or special test cases that need to be validate for your particular project
This testing can and should be executed in conjunction with testing the transformation processes. While you are verifying that the data was transformed correctly, verify that is was loaded correctly. Also, verify row counts between the source and target systems. Have the business verify the data they selected to follow is correct. Testing that the data was loaded to the correct areas in the target system should not be challenging at all.
Now: you may be asking why I have triggers, indexes, and constraints listed.
Triggers – If your target application uses triggers in its day-to-day processing, you most likely will need to disable these when the ETL processes are running. This is because these triggers may fire when the master or transaction data is being loaded, and now your system will have duplicate records or start throwing errors. This is because you are loading the data that the triggers should have created. If you encounter this situation, you will need to document each trigger you are disabling. Then, once the ETL processes have completed, the triggers will need to be enabled and must be verified that they are set correctly.
Indexes – If your target application uses indexes, you should disable these while the ETL processes are running. his is because the database may try to re-index a table after every record loaded. This will slow the ETL process down to a crawl, so turning these off will allow for optimum performance and turning them back on once the ETL processes have completed will allow the table(s) to be indexed. Verify the indexes are set correctly once you enable them.
Constraints – From my previous experience, I find disabling constraints while the ETL processes are running allows it to complete without hitting constraint issues. In other words, it is better to let the ETL processes complete and then turn on the constraints and hit an issue, than to have the ETL processes hit a constraint issue while in the middle of processing and then stop or throw errors. This allows you to fix the constraint issue(s) while still able to test other things because the ETL processes completed. Verify the constraints are set correctly once you enable them.
At this point, you now have the test cases that allow you to test the load requirements against the “L” in the ETL processes and have the ability to repeat this testing at will.
Subscribe to Olenick Expertise to receive more informational series content, right to your inbox.