ETL, Data Conversion, and Migration: The “E” in ETL

Data Migration Diagram

Now that we understand that Data Migration/ ETL Testing is “all about the data,” the quality of the data is being investigated, and plans are being developed on who, what, when and how long it will take to clean and scrub the source data (if needed), it is now time to discuss the test approach of the extraction process, the “E” in ETL.

 

I am an advocate for using a copy of source production data for this testing. What better way to test the extraction process – and for that matter the entire ETL process – than testing on a copy of source production data? And yes, masking all the personal identification information (PII data) before the data is copied to the source test system is needed and any NDAs signed, if needed. I am also an advocate of using the full load of data. This gives you all the twists, turns, and “gotchas” that production can throw at you and gives you an early indication of the performance of the extraction process.

 

Just a word of caution on masking PII data: this masking needs to keep the transformation rules of this data in mind while it is being masked. A tester should be involved in this process to ensure all test data is set up to test the transformation rules in this masked data. You do not want to find out when you run this in Production that the transformation rules do not work because the PII masking removed all the complicated data, and a lot of the transformation rules were never tested.

 

As with any IT project, you need the approved Extraction requirements. These tell you what data needs to be extracted, how much, how far back in time, and from with source systems (i.e. old data bases, current production systems, access data bases, excel files, etc). You need the extraction requirements, so that you can prove the extraction processes are functioning according to these requirements.

 

Extraction processes usually take the data from the source systems and load it into a staging system that will then be processed. Usually the staging database structure is the same as the source database structure, and the staging file system structure is the same as the source file system structure. Remember the extraction process is doing just that: extracting the needed data and placing it where the next processes know where to access it.

 

Before you begin testing the extraction process, verify all the staging database structures match the source database structures, and the staging file system structure matches the source file system structure.

 

Here are a few key extraction tests that should be included, but of course with every project there are unique requirements:

  • Verify all needed tables were extracted and loaded into the staging database(s)
  • Verify all master records required were extracted and loaded into the staging database(s)
  • Verify all transactional records required were extracted and loaded into the staging database(s)
  • If there is an extraction cut-off time (i.e. loading data warehouses nightly), make sure the records around this cut off time are in either this run or the last run, but not both
  • Number of files ported over from the source to the staging file system
  • Test the project’s unique extraction requirements
  • And many more…

 

Like I said, the main thing ETL, Data Conversion, and Data Migration have in common is “Data.” Whether the data is from old databases on old Mainframe or Client Server applications, Access databases, Excel files, or anywhere else data hides, at this point you now have the exact, high-quality data you are required to have extracted and it is now ready to be pushed through the Transform and Load processes.

Ken Stawarz Headshot

Ken Stawarz