ApexSQL Audit fault tolerant auditing – part I

What is fault tolerant auditing and why it is important

One of the characteristics a SQL Server auditing solution must have is to be reliable. It means that it is a solution you can trust, that doesn’t provide a false sense of security while in fact it is not auditing the parameters set, has information leak, data loss, or in some situations it’s not running at all

ApexSQL Audit is designed to be fault tolerant and handle the situations which can bring to auditing breakdowns

Network failures

ApexSQL Audit is a centralized auditing solution for SQL Servers, with centralized storage of audited data and the first issue it has to deal with is to provide fault tolerant auditing when it comes to a network outage. As a distributed instance can run independently, a loss of connection to the central instance doesn’t affect auditing – the events on a distributed instance are captured in SQL traces as usual, therefore continued auditing is provided. The only issue is transfer of the audited records to the central repository

When auditing distributed instances, SQL trace files created on a remote SQL Server instance are transferred to the central instance once they reach the size of 1 MB. If the network connection between distributed and central instance is lost, these files cannot be transferred. ApexSQL Audit stores them locally as long as the central instance is not available. Once the connection to the central repository is reestablished, the SQL trace files are transferred to the central instance where they are further processed and audited records are stored in the central repository database

So, even when the network is down, auditing of distributed instances is not affected. The only issue a network outage causes is a delay in transfer of audited records and its storing into a central repository

SQL Server instance is not running

Another potential problem is when a SQL Server instance, either on the central or distributed instance stops unexpectedly. ApexSQL Audit instances are Windows services; they are independent of SQL Server service and are designed to automatically reestablish connections to monitored SQL Server instances, if connections are lost

If the SQL Server instance on the central instance stops, ApexSQL Audit will continue to audit the events on a distributed instance seamlessly. The SQL traces will be transferred to the central instance and once the central SQL Server instance is running again, the processed records will be inserted into the central repository database

If the SQL Server instance on a distributed instance stops, there will be no new events on the SQL Server instance, SQL traces will not be created and auditing will be stopped as expected, but will continue as soon as the SQL Server instance is started again. New events will be added into the last SQL trace

In the next article, we’ll describe some additional scenarios and how these are handled by ApexSQL Audit

See also:

ApexSQL Audit fault tolerant auditing – part II

August 12, 2013