Data Lineage for Standard ETL process based on Relational Storage

Gopinath Rajee, Data Engineer

All,

Most of the ETL processes that run at my company are built on relational data stores such as SQL Server, DB2, and Azure SQL, which essentially means that each of the E-T-L steps is done via SQL/T-SQL and stored procedures (SPs).

How do the data lineage features offered by data cataloging tools capture lineage when most of the processing is done by stored procedures that are native to the database engine (the SQL Server engine, the DB2 engine, etc.)?

I understand that these data lineage features work well for big-data-native data sources based on Parquet, JSON, Avro, etc., since processing engines such as Spark and Hadoop consume these data sources natively.

What has the experience of others been, and how did they work around the problem? I feel this question is important since quite a lot of data cataloging vendors are claiming to offer a data lineage feature.

Thanks,

grajee