Inside Netezza are specialized processing units called SPUs, which have their own sets of disks, each containing a portion of your data. The SPUs work in parallel to return the answer you need. But if one of them has more data than the others, then we will be waiting on just that one to finish while the others are already done and sitting idle. The workload is unbalanced because the data is skewed. To conquer big data, we need to divide work evenly. (more…)
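To see why one overloaded SPU sets the pace for the whole query, here is a minimal Python sketch. It assumes integer distribution keys and uses `key % num_spus` as a stand-in for Netezza's real hash function, which is not modeled here. It also assumes every SPU scans at the same fixed rate. The function names are hypothetical.

```python
def spu_loads(keys, num_spus):
    """Count the rows that land on each SPU.

    Assumption: each row goes to SPU (key % num_spus), a simple stand-in
    for the appliance's actual hash distribution.
    """
    loads = [0] * num_spus
    for key in keys:
        loads[key % num_spus] += 1
    return loads


def query_time(loads, rows_per_second=1000):
    """All SPUs scan their own slice in parallel, so the query
    finishes only when the busiest SPU finishes."""
    return max(loads) / rows_per_second


def skew_ratio(loads):
    """Busiest SPU divided by the average SPU; 1.0 means perfectly balanced."""
    average = sum(loads) / len(loads)
    return max(loads) / average
```

With four SPUs, 10,000 distinct keys spread evenly as 2,500 rows each. But if 7,000 of those rows share key 0 and the rest are keys 1 through 3,000, SPU 0 ends up with 7,750 rows. The query then takes about 3.1 times as long as the balanced case even though the total amount of data is identical.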
An Intro to Netezza
In any database, whether it’s SQL Server, Oracle, Sybase, DB2, Access (does Access count?), or anything else, the primary bottleneck is disk IO. Even if you have an SSD, the disk is the slowest part. In those databases we use indexes to alleviate this. Their query engines use an index to find exactly where on disk to start and stop reading, so they don’t have to scan entire tables to find every answer.
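The difference between scanning a whole table and using an index to jump to the start and stop points can be sketched in a few lines of Python. This is a simplified model and not how any particular engine stores its indexes. A sorted list plays the role of the index, `bisect` finds the boundaries, and "rows read" ignores the handful of probes the binary search itself makes. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def full_scan(rows, lo, hi):
    """No index: read every row and keep those in [lo, hi].
    Returns (matches, rows_read)."""
    matches = [r for r in rows if lo <= r <= hi]
    return matches, len(rows)


def index_range_scan(sorted_keys, lo, hi):
    """Sorted index: binary-search the first and last positions in range,
    then read only that slice. Returns (matches, rows_read)."""
    start = bisect_left(sorted_keys, lo)
    stop = bisect_right(sorted_keys, hi)
    return sorted_keys[start:stop], stop - start
```

Both functions return the same ten matches for keys 500 through 509 in a 100,000-row table. The full scan reads all 100,000 rows, while the index range scan reads only the ten rows between its start and stop positions.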
Netezza has an entirely different solution to the IO problem (in fact, it has no indexes). It cuts the cost of disk reads by doing the work in parallel. Simply put, it’s a brute force solution to the problem using specialized hardware, but that’s just the start. (more…)
Netezza Code Generation Failure
ERROR: 256 : Code generation failure
In my experience this error means “I can’t figure out how to execute your query” or “The plan I created to execute your query failed.” (more…)
Netezza: Groom a Table Without a Backup
In case you ever need to, it is possible to groom a table without taking a backup first. (more…)
Numeric and Weighted Distribution Queries
Spreading values across rows (or across subsets of rows within a set) was once cumbersome and difficult, but window functions, which have been around for a while now, have made it much easier. Here are some examples you can take and tweak to fit your needs. (more…)
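The full post's SQL isn't reproduced here, but the running-sum idea that window functions make easy can be sketched in Python. The function below is hypothetical and assumes integer weights. It splits an integer total in proportion to each row's weight. Each row receives the difference between the cumulative share up to and including itself and the cumulative share before it, much like a query built on `SUM(weight) OVER (ORDER BY ...)` and `SUM(weight) OVER ()`. Because the differences telescope, the pieces always add back up to the exact total.

```python
from itertools import accumulate


def weighted_distribution(total, weights):
    """Split an integer total across rows in proportion to integer weights.

    grand   ~ SUM(weight) OVER ()
    running ~ SUM(weight) OVER (ORDER BY row order)
    Each row gets floor(total * running / grand) minus the previous row's
    floor, so the shares sum to exactly `total` with no rounding leftovers.
    """
    grand = sum(weights)
    if grand <= 0:
        raise ValueError("weights must sum to a positive number")
    shares = []
    previous = 0
    for cumulative in accumulate(weights):
        current = total * cumulative // grand
        shares.append(current - previous)
        previous = current
    return shares
```

With equal weights this acts as a plain numeric distribution. For example, `weighted_distribution(10, [1, 1, 1, 1])` gives `[2, 3, 2, 3]`, and `weighted_distribution(100, [1, 1, 1])` gives `[33, 33, 34]`. Using floor division on cumulative values, rather than rounding each row independently, is what keeps the total exact.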
Make Fuzzy Data Safe by Modeling Your Matching Rules
We tie data to dimensions using matching rules that are either deterministic or probabilistic.
Deterministic matching is what we use to find an ‘exact’ match. (more…)
5 Great Articles on Data – Oct. 2015
The Unbearable Lightness of Data
Have you ever tried explaining your job to family or friends and watched their eyes glaze over? Will Thrash explains it better than most of us can. In the first 15 minutes or so of this video, he talks about how he first got interested in data at a young age and why it still holds his interest today. You can tell he gets why we do what we do, and why we all love it. (more…)
How to Enforce Primary Keys in Netezza
One of the first things a newcomer to Netezza learns is that it does not enforce primary key or unique constraints. (more…)
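Because the database won't reject duplicate keys for you, the check has to happen somewhere in your load process. The full post's approach isn't shown here. The Python sketch below is a generic, hypothetical pre-load check that models rows as dictionaries. It rejects a batch if any key repeats within the batch, which is the equivalent of `GROUP BY key HAVING COUNT(*) > 1`. It also rejects the batch if any key already exists in the target table.

```python
from collections import Counter


def key_of(row, key_cols):
    """Build a tuple key from the primary-key columns of a row."""
    return tuple(row[c] for c in key_cols)


def find_duplicate_keys(rows, key_cols):
    """Keys that appear more than once, like
    SELECT key, COUNT(*) FROM t GROUP BY key HAVING COUNT(*) > 1."""
    counts = Counter(key_of(row, key_cols) for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def safe_insert(existing, incoming, key_cols):
    """Append `incoming` to `existing` only if no key collides,
    either within the batch or with rows already loaded."""
    dupes = find_duplicate_keys(incoming, key_cols)
    if dupes:
        raise ValueError(f"duplicate keys in batch: {dupes}")
    existing_keys = {key_of(row, key_cols) for row in existing}
    clashes = sorted(
        k for k in (key_of(row, key_cols) for row in incoming) if k in existing_keys
    )
    if clashes:
        raise ValueError(f"keys already present: {clashes}")
    existing.extend(incoming)
    return existing
```

Checking the batch against itself before checking it against the table catches both kinds of violation that a real primary-key constraint would have stopped.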
The Entity-Attribute-Value Model
The entity-attribute-value model is useful for situations where attributes are dynamically added to or removed from an entity. It is normally composed of three tables: (more…)
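The three tables aren't named in this excerpt. The Python sketch below assumes the common split into an entity table, an attribute table, and a value table. All class and field names are hypothetical. The `pivot` function shows how the narrow EAV rows are turned back into one conventional "wide" row per entity. Adding a new attribute takes only a new `Attribute` row, with no schema change.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    entity_id: int
    name: str


@dataclass
class Attribute:
    attribute_id: int
    name: str


@dataclass
class Value:
    entity_id: int
    attribute_id: int
    value: str


def pivot(entities, attributes, values):
    """Rebuild one dict per entity, keyed by attribute name.
    Entities without a value for an attribute simply lack that key."""
    attr_names = {a.attribute_id: a.name for a in attributes}
    rows = {e.entity_id: {"name": e.name} for e in entities}
    for v in values:
        rows[v.entity_id][attr_names[v.attribute_id]] = v.value
    return rows
```

For example, with entities `Widget` (1) and `Gadget` (2), attributes `color` (10) and `weight` (11), and values `(1, 10, "red")`, `(1, 11, "2kg")`, and `(2, 10, "blue")`, the pivot yields a Widget row with both a color and a weight and a Gadget row with only a color.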
Top 10 (or 11) Data Mistakes
Over the years I’ve seen many unique technical solutions in databases. And now I can share some of the worst ones with you!
Disclaimer: I may have been the architect of some of these. Sorry, DBA team! (more…)