Best Practice

Discerning Design Pattern #1: The Table Reload Pattern

Welcome to a new series: Discerning Design Patterns! These are how-to tutorials on common developer tasks but with the voice of (painful) experience. Newcomers to IT will benefit greatly, and if you’re a veteran, please give it a quick read and hop down to the comments to share your own lessons and experiences.

The Table Reload Pattern

Intent: Replace all the rows of a table with new rows in a way that will handle failure.

This task is often handed to junior developers, and rightly so: it is simple to understand, it can be done with many technologies so they can become familiar with your team’s language or software of choice, and it gets them familiar with your systems and data.

The goal is to create a periodic process that copies data from system A and puts it in system B. A new developer’s first attempt will be something like this:

  1. Delete all rows from system B.
  2. Read all rows from system A.
  3. Write all rows to system B.

An experienced developer will spot the flaw, probably because they have experienced it before! (We all learn the hard way sometimes.) But you can be forgiven if you don’t see it. This design can run flawlessly for years before disaster strikes, and it’s not always difficult to recover from. (more…)
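The excerpt stops before naming the flaw, so the following is only one plausible reading, based on the stated intent of handling failure: if the read fails after the delete has already been committed, the target is left empty. Here is a minimal sketch of the naive delete-read-write design using Python's built-in sqlite3 module. The `customers` table, the `naive_reload` function, and the `failing_source` reader are hypothetical names invented for illustration, not part of the original post.

```python
import sqlite3


def naive_reload(target, read_source_rows):
    """Naive reload: delete the target rows first, then read, then write."""
    # Step 1: empty the target table and commit right away.
    target.execute("DELETE FROM customers")
    target.commit()
    # Step 2: read from the source. If this raises, the delete is already permanent.
    rows = read_source_rows()
    # Step 3: write the new rows to the target.
    target.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    target.commit()


def failing_source():
    """Hypothetical source reader that simulates an outage."""
    raise ConnectionError("source system unavailable")


# Set up an in-memory target table holding yesterday's data.
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
target.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace")],
)
target.commit()

try:
    naive_reload(target, failing_source)
except ConnectionError:
    pass  # The job failed, but the damage is already done.

remaining = target.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(remaining)  # prints 0: the old rows are gone and nothing replaced them
```

On every run where the source is reachable, this code behaves correctly, which may be why such a design can go unnoticed for a long time.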

Netezza Distributions: Divide and Conquer Big Data

Inside Netezza are specialized processing units called SPUs, which have their own sets of disks, each containing a portion of your data. The SPUs work in parallel to return the answer you need. But if one of them has more data than the others, then we will be waiting on just that one to finish while the others are already done and sitting idle. The workload is unbalanced because the data is skewed. To conquer big data, we need to divide work evenly. (more…)
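To make the skew idea concrete, here is a toy model in Python. It is a sketch under loud assumptions: rows are assigned to a processing unit by hashing a key and taking the remainder, which is a simplification chosen for illustration and not a description of Netezza's actual internals. The function names, the four-unit setup, and the sample keys are all hypothetical.

```python
import zlib
from collections import Counter


def rows_per_unit(keys, unit_count):
    """Count how many rows land on each unit under a toy hash-modulo scheme."""
    counts = Counter(zlib.crc32(str(k).encode()) % unit_count for k in keys)
    return [counts.get(unit, 0) for unit in range(unit_count)]


def skew_ratio(per_unit):
    """Busiest unit's row count divided by the average; 1.0 is perfectly even."""
    return max(per_unit) / (sum(per_unit) / len(per_unit))


UNIT_COUNT = 4  # assumed number of processing units for the toy model

# A key with many distinct values, such as a unique id.
unique_keys = range(10_000)
# A key with few distinct values, such as a country code dominated by one value.
low_cardinality_keys = ["US"] * 9_000 + ["CA"] * 1_000

even = rows_per_unit(unique_keys, UNIT_COUNT)
skewed = rows_per_unit(low_cardinality_keys, UNIT_COUNT)

print("unique key:         ", even, "skew ratio", round(skew_ratio(even), 2))
print("low-cardinality key:", skewed, "skew ratio", round(skew_ratio(skewed), 2))
```

In this model, every row with the same key value lands on the same unit, so the 9,000 "US" rows pile onto a single unit while the rest sit nearly idle. If query time is governed by the busiest unit, a higher skew ratio means more waiting.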