ETL, Data Conversion, and Migration: The “T” in ETL
Now that we have a plan to get the data to a high-quality level, and we know what data we need to work with, it is now time to determine how to test the translation of the data from the source system to the target system: the “T” in ETL.
Usually the Translate and Load processes are run together and tested as one process, but for this blog I am breaking them up into two separate test processes.
As with any IT project, you need the approved Translation requirements; these tell you how the data will be translated from the source system to the target system. The Translation requirements need to be very detailed and list out each column’s translation for each record for each table, along with explanation of how the files are translated. Some files from the source to the target may remain files, while others may be incorporated into the database – the translation requirements need to tell you all of this. You need the translation requirements, so you can prove the translation processes are functioning according to these requirements.
The translation process is where the source data is run through the data translation (conversion) rules to transform it into a format and structure needed by the target system. Below are common types of translations of data that need to be tested:
- All sequence numbers in the target database are set to what they were in the source database
- Columns that are just a straight copy with no translation/conversion rules
- New columns in the target database that are set with default values, including null
- Source columns that are not part of the target database
- Translation rules on a column in the source database that has a related column in the target database
- Translation rules on many source columns that are loaded into one target column
- Translation rules on one source column that is split into many target columns
- Column formats/attributes
- Column name
- Data type
- Data length
- Null/Not null
- Primary key
- Foreign key
- Default value
- File translations
- Any unique translation requirements for your particular project
There is no easy way to test these. You need to look at each source and target column and understand the translation, if any, that is needed on that column. Next, you need to determine how to test it. Look at all the data in that column, and then move on to the next column. Repeat the process until have covered every column in every record in every table. Same thing with files: you need to determine how to test each one.
The best way to test the translation rules is with a mix of tools like SQL code, Excel files, Excel Macros, file comparison tools, and other tools created by the test engineers. Since this testing will be done over and over, tools make this as efficient and effective as possible. Also, do not forget to test that your tools are working correctly – each project is different, so each project needs its own set of customized tools. The data that is just a straight copy is the low hanging fruit, and you should get this verified and out of the way using tools. This will allow you to spend the bulk of the time verifying the translations rules. It is also a good idea to spot check a few rows in each table just as an added verification.
Using tools allows you to get as close as possible to verifying every column in every record in every table; this is optimal as it reduces risk.
At this point, you now have the test cases and test tools to allow you to test the translation requirements against the “T” in the ETL process and are able to repeat this testing at will.