Recently I had to compare lots of sets of data. Determining if there are any differences at all, or if there are rows that exist in one but not the other, is easy. But I needed to compare every column, calculate match percentages, and report every difference.
I was able to take what would have been a lot of tedious query writing and create a standard way to do it. All you have to do is a little bit of prep on your two data sets, and these queries will compare, quantify, and report all mismatches. (more…)
Every solution I have seen required some control flow statements (such as looping) and often creation of a sproc or function. These are usually acceptable, but it really only requires a very simple SQL statement. (more…)
A primary key is simply a candidate key labelled as the ‘primary’ (see more in A Key to Keys), and they are enforced by the same mechanism in Sql Server – the unique index. We can query the system tables to get a quick list of these.
I have a series of joins that insert a large number of rows into a table. As you can imagine, I want this process to perform as well as it can, and in Netezza that means finding the best distribution and utilizing zone maps where possible.
To optimize these joins I would tweak the tables involved, run the process, and kill it if it took too long. I had an idea of how long it should take in an ideal scenario, so anything much longer than that I halted and adjusted.
But the more I tweaked, the slower it ran. It just became worse and worse. So finally I reverted back to the original structure… and it was slower than ever! What happened? (more…)
People are in your data. They’re looking at it, even changing it. Should they be stopped? Did they break something? If so, who broke it? When? These are common questions that come up (especially after someone breaks something), and DBAs are often called upon to solve the problem. Here are three great ways to start identifying your data gremlins: (more…)
In case you ever need to, it is possible to groom a table without taking a backup first. (more…)
Spreading values across rows (or subsets of rows within a set of rows) was once cumbersome and difficult, but the invention of window functions (which have been around for a while now) have made it much easier. Here are some examples you can take and tweak to fit your needs. (more…)
The Unbearable Lightness of Data
Have you ever tried explaining your job to family or friends and seen their eyes glaze over? Will Thrash explains it better than we did. In the first 15 minutes or so of this video he talks about how he first got interested in data at a young age, and why it still holds his interest to this day. You can tell he gets why we do what we do, and why we all love it. (more…)
Over my time I’ve seen many unique technical solutions in databases. And now I can share some of the worst ones with you!
Disclaimer: I may have been the architect of some of these. Sorry DBA team! (more…)
A reference for all the different types of keys used in logical and physical database design.
These tables will be used to illustrate all the different keys: (more…)