Nowadays, high availability (HA) and disaster recovery (DR) solutions are built with different techniques like failover clustering, log shipping, mirroring, and replication to achieve maximum application availability and data protection for SQL Server instances and databases within and across data centers. These techniques have their imitations and concerns which are solved by the integrated solution AlwaysOn in SQL Server 2012 (Enterprise Edition). This article describes how AlwaysOn can be used in a real-life scenario wherein an ETL proces can executed without influencing the availability and consistency of data, and performance. The assumption is made that you already have basic knowledge about AlwaysOn, so we won't go into detail too much.
The company "Always Online" has a website that communicates via a webservice, which is used as the interface to the data warehouse database containing huge amounts of data.
The current version of the website has difficulties with handling the increasing number of users and data volumes, which negatively influences the performance tremendously. Every night, the data warehouse is not available for three hours, because during this time window the data in the database is updated by an ETL process. This availability is not acceptable, because also customers in different time zones access the website.
The database part of the solution has to be rebuilt to maximize the availability and minimize data loss in case of failover. To make this possible the company is going to use AlwaysOn in SQL Server 2012.
To solve the availability issues the following solution has been setup. The webservice connects with a listener that directs the communication to the primary or secondary server node. Each node has a SQL Server instance that contains a data warehouse database in an Availability Group and a snapshot of the data warehouse database that's not part of the Availability Group. The ETL process is running on the primary node and updates the data in the data warehouse database of the primary replica. The updated data of the primary replica is automatically synchronized with the secondary replica. The secondary replica offloads the primary replica and is used for reporting and maintenance tasks like backup.
AlwaysOn Availability Group
An Availability Group consists of availability replicas; one primary replica and up to four active, secondary replicas. All secondary replicas are considered "active" as they can all be used for read-only operations. Each replica can host multiple availability databases, which can be stored on different storage media. Data synchronization between the replica and its secondaries can be done in two modes, namely synchronous and a-synchronous. With synchronous-commit-mode a transaction is committed on all replicas to guarantee transactional consistency (introduces transaction latency, no data loss, automatic or manual failover, usable for replicas that exist in the same data center). With asynchronous-commit-mode the primary replica is not waiting for a transaction to be committed on the secondary replicas (better performance, more chance of data loss, manual failover, well suited for disaster recovery solutions). Secondary replicas are available for read-only operations, and its connection mode setting tells how a client can connect to this secondary replicata (not, read-intent connections, all connections).
The following flow describes the steps to make it possible to have high availability when the ETL process runs:
- The ETL process is not running
De webservice uses the listener to connect to the data warehouse database on the primary replica. It's completely transparant for the webservice if the primary node failed over to the secondary node. The connectionstring that's used by the webservice must connect to the virtual server name of the listener (don't use the node server names directly).
The following connection string connects to the data warehouse database "MS_DWH" via the virtual server name "MS_LISTENER" of the listener.
Data Source=MS_LISTENER;Initial Catalog=MS_DWH;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
- Before the ETL process is started
Before the ETL Process is started, snapshots of the data warehouse database are created on both replicas and the connection string in the web.config of the webservice is changed to look at the secondary replica on the second node. We are taking a risk here when the secondary replica fails; the choice being made here is alligned with the requirements.
The following connection string connects to the snapshot of the data warehouse database "MS_SNAP_DWH" directly via the server name "MS_NODE2" of the second node (bypassing the listener).
Data Source=MS_NODE2;Initial Catalog=MS_SNAP_DWH; Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
- The ETL process is running
While the data is updated in the data warehouse database on the primary replica, the data is synchronized immediately to the secondary replica. This synchronization has impact on the performance of the ETL process and will require more time to complete.
- The ETL process is completed
When the ETL Process has completed, the connection string in the web.config of the webservice will be changed back to point to the listener and the snapshots will be removed.
- Suspend / Resume
In our solution snapshots are used to connect in the case the ETL process is running, because you don't want that users see inconsistent data while the data on the primary replica gets updated. A nicer solution would be to switch the webservice to the secondary replica via the listener (using ApplicationIntent=ReadOnly) before the ETL processs is started and suspending (ALTER DATABASE [MS_DWH] SET HADR SUSPEND) the data synchronization. When the ETL process is done the webservice can be switched back to the primary replica (don't use ApplicationIntent=ReadOnly) and data movement can be resumed (ALTER DATABASE [MS_DWH] SET HADR RESUME). Unfortunately this solution is not working when data is suspended. Existing connections to one of the replicas will keep working, new connections will fail, showing the following exception message:
Unable to access the 'MS_DWH' database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.
- Webservice & stateless
A webservice is stateless, so for every call a new connection is made. Under the hood, an existing physical connection will be reused, but because you don't have control over the application pool recycling and connection pooling you must always assume that every connection is new.
When you connect to the listener without the use of the ApplicationIntent=ReadOnly parameter in the connectionstring, the connection will be directed to the primary replica, otherwise to the secondary replica.
- .NET Framework version
The parameter "ApplicationIntent" in a connectionstring is a new feature and supported by the SqlClient class in the .NET Framework 4.0 and higher.
AlwaysOn is a very useful, new feature of SQL Server 2012 to implement high availability and disaster recovery solutions. Very useful in the scenario described above, but it's a pity that it's not possible to connect to a replica in an availability group when data is being suspended. That's why snapshots had to be used, which makes the solution more difficult to implement and maintain.