‘The root cause of all problems is incomplete Knowledge.’
‘Lookup’ by dictionary means ‘The action of systematic electronic information retrieval.’ Fundamentally, this is the heart of any ETL process, i.e. Extracting Data in a Systematic Way.
Related: Common Business Intelligence Interview Questions
A Data Warehouse typically contains Dimensions (Objects) and Facts (Measures). A fact table is a central table in a star schema of a data warehouse and quantitative information for analysis. But more importantly, it’s denormalized.
Denormalization is a strategy that is used on a previously-normalized database to improve performance. In Datawarehousing terms, denormalization is the process of optimizing the operational data to improve the read performance of a database, at the cost of losing some write performance, by adding redundant copies of the data or by grouping data.
Denormalization can be achieved in two ways viz. using joins in SQL or an ETL tool counterpart to it. That’s where SSIS Lookup Transform comes in. Does that mean that we can conveniently replace joins with Lookups? There aren’t easy answers.
Lookups come with three modes of comparison viz. Full Cache, Partial Cache, No Cache. Hence, a Full Cache mode is the way out unless you are not ready for an Exceptional Case! Yes, an Exceptional Case of Case-Sensitive Comparison.
Usually, SQL Joins perform Case-Insensitive Comparison. Let’s take an Example.
The Illustration contains a Dimension DimGeo containing a GeoId and Geography. There’s a Staging table called as stgFactSales with ID and Geography. After Denormalization using Joins, we get FactSales.
Note that DimGeo contains India in CamelCase whereas stgFactSales holds it in Upper Case. After Denormalization, we get the GeoKey as ‘1’ for India.
Now let’s go to another illustration with Lookups.
After a lookup on Geography between DimGeo and stgFactSales, we get the following results.
It can be figured out clearly that GeoKey was populated as NULL for India.
One might be tempted to ask why? A genuine question indeed!
The reason is simple. In Full Cache, the comparison is carried out in SSIS Engine, while in partial and no cache the control is transferred to SQL Server Engine.
You can’t assert that whether this feature is a boon or a bane. It’s typically the call of a Business Requirement. But interchanging the two approaches blindly could lead to Undesirable Outcomes.
Comments are closed.