People are in your data. They’re looking at it, even changing it. Should they be stopped? Did they break something? If so, who broke it? When? These are common questions that come up (especially after someone breaks something), and DBAs are often called upon to solve the problem. Here are three great ways to start identifying your data gremlins:
Audit data with triggers
Triggers are the typical go-to for auditing. They are easy to set up and have little risk to implement, unless they are added to a table with a lot of activity. Plus, there are many examples online of audit triggers.
In most cases an application has a service account that it uses to connect to the database. This means that your triggers have no visibility to ‘who’ is changing the data. Unless you are using impersonation for database access you will not be able to identify individuals responsible for changes.
Triggers can act on inserts, updates, and deletes. They cannot act on reads, there is no ‘on select’ trigger. Triggers will not be able to record what data is being viewed.
- Very reliable
- Easy to set up
- Adds overhead to every write operation
- Can’t track read operations
- In most cases can’t track ‘who’ changed a record
Audit data with application logic
In my opinion this is an underutilized option. Auditing requirements are often ignored or put-off until a later phase of development, but actually should be factored into the application design. Adding this functionality after initial development usually means a high cost of re-development, which drives developers to look to DBAs for an answer.
If it is addressed in the application design, it is by far the most flexible and powerful option. The application can easily handle copying changes to an audit table (the programmers might even find a way to control this in the data layer so that it happens automatically), or the application could utilize soft-deletes instead of actually changing rows.
The application can capture who is making changes, a difficult or impossible task with other techniques.
Data reads can be audited. You could track who is looking at what reports, tables, or even individual rows, as well as whatever query parameters they applied through the application.
The biggest drawback is that the whole system can be circumvented by a direct query. If developers have write access in your production environment (no one does that, right?) the data could change at any time with no indication that it ever happened. This could also happen if developers add new code that does not implement the auditing feature. Special attention should be used in code reviews to make sure all new development utilizes the data layer properly.
- Can capture user login and record ‘who’ did it
- Can record what data was selected as well as edited
- Could use a ‘soft delete’ technique
- Can audit activity at different levels (report, table, or row level)
- Requires more development time than other techniques
- Audit logic could be bypassed with a direct query
- A software bug could cause auditing to be mis-recorded or not captured
Audit data with CDC
CDC (change data capture) is a process that reads the transaction logs to find data changes and records the changes in a separate table or database. While this has a small overhead on server resources, it has no impact on the tables themselves. There are no locks, reads, or anything to block or slow down transactions.
CDC is usually used to replicate data to another system (such as the data warehouse), so it’s a bit overkill for simply auditing a table. However, in my experience it is reliable, fast, and accurate (at least the SQL Server version is, I haven’t used any others). No changes can slip through the cracks. Because it’s using the transaction log, no one can ‘cheat’ their way around it unless they have access to disable it.
Unfortunately, it is the hardest to set up. If you have SQL Server it is built-in (if you have the necessary version, I think it’s only in Enterprise and up), if not you will need to get a 3rd party tool. You will also need to create a process to pull the CDC data off the server and store it somewhere, which will require development (some 3rd party tools may include this feature).
I almost left CDC off of this list because the cons seemed to greatly outweigh the pros. However, there are two niche situations where it might be useful: if your table has a high transactional requirement (like thousands of records inserting/changing every second all day long) then triggers may add too much overhead, or if you need to add auditing to a vendor’s application database and you can’t add triggers or modify any code.
- Users can’t ‘cheat’ around it
- Does not create any locks or blocking on the table
- Could be used on 3rd party software
- Can’t track who made changes
- Can’t track read operations
- Have to create a process to archive the CDC data
- Only available under certain licensing models or as a 3rd party application