I have a series of joins that insert a large number of rows into a table. As you can imagine, I want this process to perform as well as it can, and in Netezza that means finding the best distribution and utilizing zone maps where possible.
To optimize these joins I would tweak the tables involved, run the process, and kill it if it took too long. I had an idea of how long it should take in an ideal scenario, so anything that ran much longer than that, I halted and adjusted.
But the more I tweaked, the slower it ran. It just became worse and worse. So finally I reverted to the original structure… and it was slower than ever! What happened?
Meanwhile, inside Netezza…
Netezza handles concurrency with the createxid and deletexid fields. Each row holds the ID of the transaction that created it in createxid. Any query that runs is limited to rows whose createxid is less than or equal to the ID of the most recently committed transaction. This keeps you from selecting rows that aren’t yet committed.
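These transaction IDs are exposed as hidden system columns, so you can inspect them yourself by naming them explicitly in the select list. A quick sketch (my_table and my_column are placeholders, not tables from my actual process):

```sql
-- createxid and deletexid are hidden system columns on every
-- Netezza table; they only appear when you list them explicitly.
SELECT createxid, deletexid, my_column
FROM my_table
LIMIT 10;
```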
Every delete, update, and truncate in Netezza is a soft delete. The rows are still in the table, and they will remain until the table is groomed or dropped. The ID of the deleting transaction is stored in the deletexid field, and every query ignores rows where this value is not zero (unless you set the ‘show deleted’ setting).
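If I remember the session setting correctly, you can see how much of a table is soft-deleted rows by enabling it and counting rows with a non-zero deletexid. A sketch, assuming the setting is named show_deleted_records and using my_table as a placeholder:

```sql
-- Allow this session to see logically deleted rows.
SET show_deleted_records = TRUE;

-- Deleted rows carry the ID of the deleting transaction in
-- deletexid; live rows have deletexid = 0.
SELECT COUNT(*) AS deleted_rows
FROM my_table
WHERE deletexid <> 0;
```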
When you cancel an insert, where do the rows go?
You may see my problem now. I was inserting rows into a table, but when I aborted the transaction the rows were not ‘rolled back’, they were marked as deleted. So even though I was cancelling the insert statement, the table was continually growing, throwing off my performance tests.
If you aren’t familiar with Aginity, the red section is deleted rows, and the blue section is available rows. Oh, you can’t see the blue section? That’s because it is utterly eclipsed by the scale of the deleted rows from my cancelled inserts. 99.8% of my table is deleted rows.
If you are doing testing, or if you have a process that fails while writing data, you may want to check the affected tables to see if they should be groomed. In most cases the overhead is negligible, but if your process writes large numbers of rows, or if it fails repeatedly, your table can grow far beyond what you expect.
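Reclaiming the space is a matter of grooming the table, something along these lines (my_table is a placeholder):

```sql
-- Physically remove logically deleted rows and reclaim their space.
GROOM TABLE my_table RECORDS ALL;
```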
Now, I need to go re-run some tests…