Recently I had to compare lots of sets of data. Determining if there are any differences at all, or if there are rows that exist in one but not the other, is easy. But I needed to compare every column, calculate match percentages, and report every difference.
I was able to take what would have been a lot of tedious query writing and create a standard way to do it. All you have to do is a little bit of prep on your two data sets, and these queries will compare, quantify, and report all mismatches.
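To give a rough idea of what a column-by-column comparison can look like, here is a minimal sketch. It is not the set of queries from the full post. It assumes two hypothetical tables, `set_a` and `set_b`, that share a key column `id`, and it runs against an in-memory SQLite database so it is self-contained. The table names, column names, and sample rows are all made up for illustration, and the null-safe `IS` / `IS NOT` comparison is SQLite syntax that other databases spell differently.

```python
import sqlite3

# Hypothetical sample data: two data sets keyed on id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE set_a (id INTEGER, name TEXT, amount INTEGER);
    CREATE TABLE set_b (id INTEGER, name TEXT, amount INTEGER);
    INSERT INTO set_a VALUES (1, 'ann', 10), (2, 'bob', 20), (3, 'cy', 30), (4, 'dee', 40);
    INSERT INTO set_b VALUES (1, 'ann', 10), (2, 'bob', 25), (3, 'cyd', 30), (4, 'Dee', 40);
""")


def match_percentages(conn, columns):
    """Return, per column, the percent of key-matched rows whose values agree."""
    # "IS" is SQLite's null-safe equality, so NULL vs NULL counts as a match.
    parts = ", ".join(
        f"100.0 * SUM(CASE WHEN a.{c} IS b.{c} THEN 1 ELSE 0 END) / COUNT(*) AS {c}"
        for c in columns
    )
    row = conn.execute(
        f"SELECT {parts} FROM set_a a JOIN set_b b ON a.id = b.id"
    ).fetchone()
    return dict(zip(columns, row))


def mismatches(conn, columns):
    """Return (column, id, value_in_a, value_in_b) for every differing value."""
    found = []
    for c in columns:
        rows = conn.execute(
            f"SELECT a.id, a.{c}, b.{c} FROM set_a a JOIN set_b b ON a.id = b.id "
            f"WHERE a.{c} IS NOT b.{c} ORDER BY a.id"
        ).fetchall()
        found.extend((c, *r) for r in rows)
    return found


print(match_percentages(conn, ["name", "amount"]))
for diff in mismatches(conn, ["name", "amount"]):
    print(diff)
```

The inner join only covers rows whose key exists in both sets. Rows that exist in one set but not the other would need a separate check, which is the easier problem mentioned above.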
Spreading values across rows (or across subsets of rows within a set) was once cumbersome and difficult, but window functions, which have been around for a while now, have made it much easier. Here are some examples you can take and tweak to fit your needs.
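As one possible reading of "spreading", the sketch below divides an order-level shipping charge evenly across that order's lines with `COUNT(*) OVER (PARTITION BY ...)`. The `order_lines` table, its columns, and the sample rows are hypothetical, and this is not necessarily one of the examples from the full post. It uses an in-memory SQLite database to stay self-contained, which assumes SQLite 3.25 or later for window function support.

```python
import sqlite3

# Hypothetical sample data: the order-level shipping charge is repeated on each line.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (order_id INTEGER, item TEXT, order_shipping REAL);
    INSERT INTO order_lines VALUES
        (1, 'a', 9.0), (1, 'b', 9.0), (1, 'c', 9.0),
        (2, 'x', 5.0), (2, 'y', 5.0);
""")

# The window function counts the lines in each order without collapsing the rows,
# so every line can take an equal share of its own order's shipping charge.
shares = conn.execute("""
    SELECT order_id,
           item,
           order_shipping / COUNT(*) OVER (PARTITION BY order_id) AS shipping_share
    FROM order_lines
    ORDER BY order_id, item
""").fetchall()

for row in shares:
    print(row)
```

Before window functions, the same result typically needed a self-join or a correlated subquery against a grouped copy of the table.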
We tie data to dimensions using matching rules that are either deterministic or probabilistic.
Deterministic matching is what we use to find an ‘exact’ match.
One of the first things a newcomer to Netezza learns is that it does not enforce primary key or unique constraints.