Recently I had to compare lots of sets of data. Determining if there are any differences at all, or if there are rows that exist in one but not the other, is easy. But I needed to compare every column, calculate match percentages, and report every difference.
I was able to take what would have been a lot of tedious query writing and create a standard way to do it. All you have to do is a little bit of prep on your two data sets, and these queries will compare, quantify, and report all mismatches. (more…)
Every solution I have seen required some control flow statements (such as looping) and often creation of a sproc or function. These are usually acceptable, but it really only requires a very simple SQL statement. (more…)
Welcome to a new series: Discerning Design Patterns! These are how-to tutorials on common developer tasks but with the voice of (painful) experience. Newcomers to IT will benefit greatly, and if you’re a veteran, please give it a quick read and hop down to the comments to share your own lessons and experiences.
The Table Reload Pattern
Intent: Replace all the rows of a table with new rows in a way that will handle failure.
This task is often handed to junior developers, and rightly so: it is simple to understand, it can be done with many technologies so they can become familiar with your team’s language or software of choice, and it gets them familiar with your systems and data.
The goal is to create a periodic process that copies data from system A and puts it in system B. A new developer’s first attempt will be something like this:
- Delete all rows from system A.
- Read all rows from system B.
- Write all rows to system A.
An experienced developer will spot the flaw — probably because they experienced it before! (we all learn the hard way sometimes) But you can be forgiven if you don’t see it. This design can run flawlessly for years before disaster strikes, and it’s not always difficult to recover from. (more…)
‘Master Data’ is not as easy to define as other data sciences. Relational theory is mathematical in nature. Dimensional design is based on ease-of-use and taking advantage of system architectures. Master Data doesn’t have the same foundation. (more…)
A primary key is simply a candidate key labelled as the ‘primary’ (see more in A Key to Keys), and they are enforced by the same mechanism in Sql Server – the unique index. We can query the system tables to get a quick list of these.
I have a series of joins that insert a large number of rows into a table. As you can imagine, I want this process to perform as well as it can, and in Netezza that means finding the best distribution and utilizing zone maps where possible.
To optimize these joins I would tweak the tables involved, run the process, and kill it if it took too long. I had an idea of how long it should take in an ideal scenario, so anything much longer than that I halted and adjusted.
But the more I tweaked, the slower it ran. It just became worse and worse. So finally I reverted back to the original structure… and it was slower than ever! What happened? (more…)
Should a table always have a surrogate key? Is it really best practice, or should we stick to natural keys? Let’s dig in to see if there’s one clear answer. (more…)
There are no indexes in Netezza. This is an intentional design choice. Indexes are effective in an OLTP system, where transactions affect a single row or two, but not in an OLAP system, where transactions affect large sets of data. Netezza is designed to be a fully OLAP-focused system, in both hardware and software, and indexes did not meet that goal. (more…)
Want to view deleted data? See which rows were added in a transaction? These extra fields will allow you to do so. (more…)
People are in your data. They’re looking at it, even changing it. Should they be stopped? Did they break something? If so, who broke it? When? These are common questions that come up (especially after someone breaks something), and DBAs are often called upon to solve the problem. Here are three great ways to start identifying your data gremlins: (more…)