Monday 20 July 2015

Data Warehouse Automation: Selecting the right tool


The (Enterprise) Data Warehouse environments we use nowadays have become large, complex, and mature. Many companies are experiencing that it takes a lot of time to add new requirements (time to value). By using a Data Warehouse Automation (DWA) tool, companies can automate the building of a Data Warehouse and reduce costs: doing more in less time. To find out whether this is possible, you should do a Proof of Concept (PoC) with the different Data Warehouse Automation tools that are available on the market and determine which one best suits your needs. This article describes the steps you should take to select the right tool for your company.

Data Warehouse Automation tool

A Data Warehouse Automation tool is an integrated development environment (IDE) that automates the development of an (Enterprise) Data Warehouse using standards and best practices (patterns). It automates (repetitive) development tasks and simplifies the management and maintenance of a Data Warehouse. By using a Data Warehouse Automation tool you can drive the costs down (less effort, fewer resources), speed up development, and respond faster to new or changing requirements (time to value) in an agile way.


Before starting the selection of a Data Warehouse Automation tool, take some time to create a list of requirements that should be satisfied. Think of requirements in different categories (Development, Support, Documentation, Integration, Vendor, General) and make sure you can score them (for example using MoSCoW) so you can easily compare tools.

Here is a small selection of possible requirements:
  • Using and enforcing custom naming conventions
  • Version control (check-in, shelve, compare, merge, policies, comments/annotations)
  • Real-time monitoring of processes.
  • Query logs (history, trends, start/stop date, duration, job status, task/step status, record counts (select, update, insert, delete, error), (system/business) errors/exceptions).
  • Easy to learn (a developer is quickly up-and-running).
  • The User Interface is intuitive.
  • Ability to automate deployments and roll-backs.
  • Supports multiple ETL integration methods (full load, incremental load (Id, Date, Checksum, Hashes, Change Data Capture (CDC), combined methods / multiple fields, custom parameters)).
  • Supports multiple dimension types (Type 1 to Type N).
  • Supports multiple fact types (Accumulating, Snapshot, and Transactional).
  • Supports multiple sources and targets.
  • Ability to import metadata from other tools (e.g. PowerDesigner).
  • There's a lineage from front to back on object and data level.
  • Ability to generate AS Multidimensional cubes and AS Tabulars.

Proof of Concept tool selection

There are several Data Warehouse Automation tools available on the market, like Kalido, Quipu (open source), Attunity Compose (formerly known as BI-Ready), TimeXtender, and WhereScape 3D/RED. It’s very time-consuming and costly to do a Proof of Concept with all of these tools, so you should define a short list of criteria a Data Warehouse Automation tool has to satisfy, and eventually select one or two tools that you want to "PoC".

Here is a selection of those criteria:
  • Data-driven and agile development 
  • Multi-user 
  • Multi-environment (DTAP)
  • Versioning
  • Generation of documentation
  • Comparison of objects (between versions and environments)
  • Maturity
  • Clear roadmap
  • Licensing
  • Customer references
  • Support for particular sources and targets

Proof of Concept

The goal of the Proof of Concept is to check which Data Warehouse Automation tool fits best in your organization by going through a list of requirements. At a high level, the tool must be able to build a Data Warehouse with simple or complex ETL much faster than it’s done nowadays with the existing ETL tools (like SSIS or Informatica). If you have the possibility, you could take existing projects/solutions and build them again with a Data Warehouse Automation tool, so that you can easily compare the old and new way of building. To get a good feeling for the user experience and the way of working with a tool, I would advise having your own colleagues operate the tool.

Fitting in current architecture

Most companies that start using a Data Warehouse Automation tool already have an existing BI environment where the new tool has to be fitted in. There are a couple of important points to consider here:
  • Does the tool require a separate server with particular hardware and software requirements?
  • Does the tool integrate with the currently used sources and targets, schedulers, ETL processes, and reports for monitoring / maintenance purposes?
  • Is the tool able to reuse databases and tables (dimensions, facts) or does it have its own repository?

Moving forward

When you have chosen your Data Warehouse Automation tool, it’s not just a matter of buying the tool and starting to use it.

There are a couple things you need to take into account:
  • Are you going to migrate all existing projects or are you only automating new projects?
    - Budget is needed to migrate existing projects.
    - Existing projects and new projects will co-exist for a while, which introduces extra complexity (integration, maintenance).
  • What kind of developer profiles are needed for working with the tool?
    - Less monkey-work is needed.
    - More focus on modeling and business logic.
  • How does the way of working change (change management) and how do we handle that?
    - Clearly describe and present a plan to the developers, so it's clear what they can expect and how this influences their way of working.
    - There will always be resistance when introducing new tools (the developers are familiar with the currently used tools and have to step out of their comfort zone); you won't be able to eliminate that resistance, but you should be aware of it and minimize it.
    - You need to create commitment and train the developers.


Using a Data Warehouse Automation tool is not just buying a tool and generating your Data Warehouse; you need to take a lot of things into account: does it satisfy your requirements, does it fit in the current architecture, what kind of people are needed, does it fit your budget, how does the way of working change, how is it supported in case of issues, etc. When selecting a tool, always make sure that you have requirements and that you figure out which tool satisfies them best by doing a Proof of Concept.

Tuesday 19 May 2015

Data Quality Services (DQS) - SSIS Cleansing Task – Reference to DQS Knowledge Base identifier

You can use the cleansing functionality of Data Quality Services (DQS) in SSIS by using the “DQS Cleansing” task. This task uses the selected DQS Knowledge Base to cleanse data. Under the hood, the task points to the numerical identifier of the selected DQS Knowledge Base. In a DTAP (Development, Test, Acceptance, and Production) environment this identifier can differ between environments, and when a DQS Knowledge Base is redeployed or recreated it gets a new identifier. When this identifier is not the same as the hard-coded one in the SSIS packages, the package will fail. This article gives you more background information about this issue and how to work around it.

Storage of DQS Knowledge Base

When you create a new DQS Knowledge Base or import an existing one, a new numerical identifier is created specifically for this knowledge base.
This identifier is stored in the table [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE], in which project-related records are stored as well.

You can execute a T-SQL query along the following lines to look at these values and their descriptions:

SELECT  groupDQS.[ID] AS [groupId],
        groupDQS.[NAME] AS [GroupName],
        memberDQS.[KEY] AS [MemberId],
        memberDQS.[VALUE] AS [MemberValue]
FROM    [DQS_MAIN].[dbo].[A_CODE_GROUP] AS groupDQS  -- assumed companion table of A_CODE_MEMBER; verify against your DQS_MAIN schema
        INNER JOIN [DQS_MAIN].[dbo].[A_CODE_MEMBER] AS memberDQS
            ON memberDQS.[CODE_GROUP_ID] = groupDQS.[ID]  -- assumed join column name; verify the actual column name
WHERE   groupDQS.[NAME] = ''  -- fill in the relevant code group name


Import DQS Knowledge Base

A DQS Knowledge Base can be imported throughout the DTAP environment via the Data Quality Client or via the command line.

Export / import functionality in the Data Quality Client

An existing DQS Knowledge Base can be exported to a file (*.dqs) by selecting “Export Knowledge Base”. 

A new DQS Knowledge Base can be imported from a file (*.dqs) via the “New Knowledge Base” button on the main screen. You can’t overwrite an existing knowledge base.

Export / import functionality via command line

The following command exports all DQS Knowledge Bases (it’s not possible to export one specific knowledge base) from the "MDM" instance to the file "MyKnowledgeBases.dqsb".
DQSInstaller -exportkbs c:\MyKnowledgeBases.dqsb -instance MDM
The following command imports all DQS Knowledge Bases (it’s not possible to import one specific knowledge base) from the file "MyKnowledgeBases.dqsb" into the "MDM" instance.
DQSInstaller -importkbs c:\MyKnowledgeBases.dqsb -instance MDM

SSIS Cleansing Task

The SSIS Cleansing Task references a DQS Knowledge Base by using an identifier. Looking at the XML code behind the SSIS package, you can see that the DQS Cleansing task refers to the hard-coded value "2000000" (oddly, the property is named "Knowledgebasename" while it contains an identifier value).

You can’t simply deploy a working SSIS package throughout the DTAP environment, because Development may use a different identifier for the DQS Knowledge Base than Acceptance, which makes automated deployments impossible.


Several work-arounds are mentioned on the internet, but none of them makes it possible to deploy your SSIS package (without any adjustments) throughout your DTAP environment. Follow these steps to make this possible:

  • Reset the auto-increment column [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE].[ID] by executing a T-SQL reseed script (a sketch is shown after these steps).
    After executing this script, a newly created DQS Knowledge Base will get the identifier 2000000.
  • Create a new DQS Knowledge Base by importing one via the export/import functionality in the Data Quality client or via the command line. If you need to overwrite an existing one, you first need to delete that one via the Data Quality client tool.
  • Deploy your SSIS package which points to the DQS Knowledge Base with the identifier 2000000.
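
As a reference, here is a minimal sketch of such a reseed script. It assumes that [ID] is a regular identity column in your DQS version and that modifying DQS system tables is acceptable in your environment (do this at your own risk):

USE [DQS_MAIN];
GO
-- Reseed the identity of the knowledge base table; with existing rows, the next
-- generated identifier will be the reseed value plus the increment (i.e. 2000000).
DBCC CHECKIDENT ('dbo.A_KNOWLEDGEBASE', RESEED, 1999999);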

This issue with the hard-coded identifier is really a bug in the SSIS Cleansing task that should be solved by Microsoft. If the identifier were replaced by the DQS Knowledge Base name, no work-arounds would be needed and you could deploy your SSIS packages throughout the DTAP environment without any adjustments.


Querying Active Directory Using T-SQL

When you want to use Active Directory information, like users and user groups, in your BI environment to implement row-level security (based on department), you need to query Active Directory. Querying Active Directory can be done easily using T-SQL, as explained in this article.

Linked Server

To query Active Directory from T-SQL you need to set up a linked server that uses the ADsDSOObject (Microsoft OLE DB Provider for Microsoft Directory Services) provider. The following T-SQL code creates the linked server “ADSI” (Provider = OLE DB Provider for Microsoft Directory Services, Data Source = machielsesoftware.local:389, Product = Active Directory Services). Port number 389 is used for LDAP traffic.

EXEC master.dbo.sp_addlinkedserver @server = N'ADSI', @srvproduct=N'Active Directory Services', @provider=N'ADsDSOObject', @datasrc=N'machielsesoftware.local:389'

T-SQL query

The following T-SQL query creates a dynamic query that gets the following user information:
  • UserId
  • Username
  • Department
  • Country
  • Notes
  • LastLogon

-- Variables
DECLARE @finalQuery AS VARCHAR(2048);
DECLARE @queryActiveDirectory AS VARCHAR(1024);
DECLARE @activeDirectory AS VARCHAR(64);

SET @activeDirectory = 'LDAP://machielsesoftware.local:389';

-- Active Directory query
SET @queryActiveDirectory =
   '''SELECT sAMAccountName, name, department, co, notes, lastlogon ' +
   '  FROM   ''''' + @activeDirectory + ''''' ' +
   '  WHERE  objectCategory=''''person'''' AND ' +
   '         objectClass = ''''user''''''';

-- Final query
SET @finalQuery =
  'SELECT sAMAccountName AS [UserId], ' +
  '       name AS [Username], ' +
  '       department AS [Department], ' +
  '       co AS [Country], ' +
  '       notes AS [Notes], ' +
  '       CASE WHEN ISNULL(CAST(lastlogon AS BIGINT), 0) > 0 ' +
  '            THEN DATEADD(MINUTE, (ISNULL(CAST(lastlogon AS BIGINT), 0) / 600000000) -
                    157258080 + DATEDIFF(MINUTE,GETUTCDATE(),GETDATE()), 0) ' +
  '            ELSE 0 END AS [LastLogon] ' +
  'FROM OPENQUERY( ADSI, ' + @queryActiveDirectory + ') ' +
  'ORDER BY [UserId];';

EXEC (@finalQuery);

Converting Active Directory Date/Time values

Date/Time values (and also other attributes) from Active Directory are stored as large numbers (data type Integer8: 8 bytes / 64 bits). To convert a date/time value from Active Directory to a readable date/time value, you need to know how it’s stored.
Here are some facts and calculations:
  • The date/time value in Active Directory represents the number of 100-nanosecond intervals that have elapsed since 1601-01-01.
  • 1 nanosecond = 10^-9 (1 / 1.000.000.000) second.
  • 100 nanoseconds = 10^-7 (1 / 10.000.000) second.
  • 1 second = 10^7 intervals of 100 nanoseconds.
  • 1 minute = 10^7 x 60 = 600.000.000 intervals of 100 nanoseconds.
  • The base date in SQL Server (datetime value 0) equals 1900-01-01.
  • Difference in minutes between 1601-01-01 and 1900-01-01 = 157.258.080 minutes.
  • You need to correct the date/time value for your local time zone by determining the difference between the Greenwich Mean Time (GMT, independent of time zones) and your local time zone. The GMT date/time can be fetched by calling the function GETUTCDATE().

Conversion calculation:
  • MinutesAfterMinimumSqlDate = (DateTimeActiveDirectory / 600.000.000) - 157.258.080
  • MinutesBetweenGmtAndLocal = DATEDIFF(MINUTE, GETUTCDATE(), GETDATE())
  • Result = 1900-01-01 00:00 + (MinutesAfterMinimumSqlDate + MinutesBetweenGmtAndLocal)
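
A small worked example of this calculation in T-SQL (the lastlogon value below is an assumed sample, not a real one):

DECLARE @lastlogon BIGINT = 133500000000000000;  -- assumed sample Integer8 value from Active Directory

SELECT DATEADD(MINUTE,
               (@lastlogon / 600000000) - 157258080           -- minutes elapsed since 1900-01-01 (UTC)
               + DATEDIFF(MINUTE, GETUTCDATE(), GETDATE()),   -- correction for the local time zone
               0) AS [LastLogonLocal];                        -- 0 represents 1900-01-01 00:00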


The provider that’s used to query Active Directory has a couple of restrictions, which you need to take into account:
  • A query via the provider returns a maximum of 1000 objects.
  • It’s not possible to get the contents of the attribute “Description” as it is a multi-value field. 


It’s pretty easy to query Active Directory using T-SQL, but the provider has some restrictions that you need to take into account. If you want to query Active Directory without the restrictions mentioned above, then you should use the classes in the System.DirectoryServices namespace of the .NET Framework, for example in an SSIS package or a SQL Server CLR User-Defined Function (UDF).


Monday 11 August 2014

Hybrid Query (Combining T-SQL and MDX)

This article describes how to query data from an Analysis Services Multidimensional cube or Tabular model from within T-SQL code. Believe it or not, there are situations where you have to do this!


To get data from a Multidimensional cube or a Tabular model you need to talk "MDX" (or "DAX"). By using OPENQUERY, you can query them via a linked server from within your T-SQL code.

Hybrid query

The T-SQL code below shows an example of how you can query a Multidimensional cube by using dynamic T-SQL:
  • To query a Multidimensional cube or Tabular model you need to create a linked server first (in the code below this linked server is called "MULTIDIMENSIONAL").
  • The MDX query returns data where the column names contain square brackets. You can rename those column names by surrounding them with double quotes and giving them an alias ("[Measures].[Turnover]" AS [Turnover]).
  • You can load the data in a temporary table (in this example a table variable is created) so that you can use the results further in your code/process.
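
As a reference, here is a minimal sketch of such a hybrid query. The linked server name MULTIDIMENSIONAL and the double-quote aliasing come from the points above; the cube name [Sales], the [Date].[Calendar Year] hierarchy, and the column types are assumptions for illustration:

DECLARE @Turnover TABLE
(
    [CalendarYear] NVARCHAR(50),
    [Turnover]     NVARCHAR(50)   -- cast further (e.g. to DECIMAL) in your own code if needed
);

INSERT INTO @Turnover ([CalendarYear], [Turnover])
SELECT  CAST("[Date].[Calendar Year].[Calendar Year].[MEMBER_CAPTION]" AS NVARCHAR(50)) AS [CalendarYear],
        CAST("[Measures].[Turnover]" AS NVARCHAR(50)) AS [Turnover]
FROM    OPENQUERY(MULTIDIMENSIONAL,
            'SELECT [Measures].[Turnover] ON COLUMNS,
                    [Date].[Calendar Year].[Calendar Year].MEMBERS ON ROWS
             FROM   [Sales]');

SELECT * FROM @Turnover;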


Saturday 22 March 2014

PASS Nederland (2014-03-18)

This week I attended the PASS evening at the HP office in Amstelveen where they invited Marco Russo to give two sessions about "DAX Patterns" and "DAX from the Field: Real-World Case Studies", which I'd like to share.

DAX Patterns

When building a tabular model, people face problems for which a solution already exists. Marco Russo and his colleagues have created a lot of solutions in Tabular and Power Pivot and transformed those into patterns (a general, reusable solution to a commonly occurring problem) so they can easily be reused in other projects where the tabular model or Power Pivot is used. These patterns are collected and published on a dedicated website. A couple of examples he showcased were "Cumulative Balance (without using snapshots)", "Custom calendar", and "Actual Vs. Budgets".

PDW (Parallel Data Warehouse)

Between the sessions of Marco Russo, a short session was presented about HP's PDW appliance with SQL Server 2012 and how it is used at a big customer in Africa. This customer has a Data Warehouse of approximately 3 TB, had difficulties finishing the ETL process on time, and also had big performance issues when multiple people were executing ad hoc queries. By using HP's PDW appliance all those issues were solved:
  • The Data Warehouse runs on the appliance.
  • The performance is great (you have to experience it to believe it).
  • The appliance has been made highly available (they simply bought two appliances and placed them in different locations).
People are saying that an appliance is a cube-killer. Is that so? Well, not really...
  • Analysis Services works smoothly with SQL Server 2012 PDW and the combination with a Multidimensional cube with ROLAP is perfect as the query response from PDW is superfast.
  • When a lot of ad hoc queries are executed, an appliance is really helpful. When queries become used more widely throughout the company it's recommended to build those in a cube or tabular model.
DAX from the Field: Real-World Case Studies

Building a tabular model is easy but when it comes to performance there are a couple of things you have to take into account.
  • Reduce the number of distinct values for columns (less memory needed, increases scan speeds).
  • You can check which parts of the tabular model need attention by using the Excel workbook that is provided on his website.
  • How do you determine the hardware you need? The best road to follow is to first develop your model and fill it incrementally (in increments of 10%) with real production data. This way you can create a good prediction of the memory usage. The problem is that it normally takes a while to order hardware, so it’s not always practical to wait until the tabular model is developed and filled.
  • When choosing hardware you need to take into account the following:
    - CPU (the fastest you can buy, largest L2 cache, lowest number of cores (fewer than 8))
    - Memory (the fastest you can buy)
    - Disk / other hardware (don’t waste your money here)
It seems that the latest Intel Xeon processors have a large L2 cache which is shared by all cores. You can switch off cores so more cache is available for the remaining cores.

Thursday 18 July 2013

Generating and printing SSRS reports automatically via SSIS 2012


If you want to schedule the generation and printing of SSRS reports automatically, where each report uses different parameter values and data, you can use the Data Driven Subscription functionality of Reporting Services. But if you don’t have the Enterprise or Business Intelligence edition of SQL Server 2012, you don’t have this functionality. As an alternative, you can use a combination of SSIS and SSRS to implement such a solution, which is described in this article.


The company “ARGP Office” is a distributor of office equipment, which is sold by customers (shops) all over the world. All these customers receive a quarterly letter (and specification) about their turnover (and a bonus). It depends on the customer and the quarter (Q1, Q2, Q3, or Q4) whether they receive a letter and a specification or only a letter. Also, the contract between the company and the customer defines whether they receive a bonus quarterly, yearly, or not at all.

The letters and specifications are generated and printed at the end of every quarter, which is done by an automatically scheduled ETL process. This process extracts data from different source systems, transforms the data in the right format, and loads it in a reporting database. Configuration settings and the loaded data from the reporting database are used to generate and eventually print the letters and specifications, which are sent to the customers by mail.


In the solution I am going to explain, I will focus on the report generation and printing part of the ETL process. The configuration settings stored in the reporting database tell the ETL process which reports have to be generated (letters and specifications) and for which customers. Using a .NET assembly, the letters and specifications are generated as PDF files. This assembly uses the web service methods of Reporting Services to generate those files from defined reports. When the PDF files have been generated, they are printed using a batch file that calls Ghostscript.

  • Generating reports
    The actual generation of reports is done by a SSIS package that calls the .NET assembly’s methods from within a Script Task. The SSIS package passes the parameters to those methods so they are able to generate the right letters and specifications by making use of the web services of Reporting Services.

    Adding the .NET assembly to the GAC
    To be able to call the methods of a .NET assembly (ReportGenerator.dll) from a Script Task in SSIS you have to install this assembly in the Global Assembly Cache (GAC) on the production machine. The easiest way to install the assembly in the GAC is by copying it to the directory c:\windows\assembly. When the assembly has been copied you have to reference it within the Script Task by adding a reference (Project > Add Reference…) and importing the namespace in your script (Imports ReportGenerator).

    The following code snippet (VB.NET) shows how the “Generate” method of the Generator object is called with parameters. Important parameters here are the URL of the report to be generated, the parameters that have to be passed to the report, the location where the report has to be saved, and the format of the generated file (in this case “PDF”).

    .NET assembly configuration settings
    When a .NET assembly is called by an SSIS package and uses configuration settings, these settings have to be stored in the configuration file of the executing process. When running a package in debug mode (BIDS / SSDT), it’s executed by “DtsDebugHost.exe”, which has the related configuration file “DtsDebugHost.exe.config” in which you can place the configuration settings to be used during debugging. When running a package in production, it’s executed by “DTExec.exe”, which has the related configuration file “DTExec.exe.config” in which you can place the configuration settings to be used in production. These files can be found in the directory C:\Program Files\Microsoft SQL Server\90\DTS\Binn.

    Calling webservice methods
    The Generator method of the Generator class calls several methods of the SSRS web services to load a report with the right parameters and render it to a file. To make use of the web services you need to add two web references to your project:
    - http://hostname/reportserver/reportexecution2005.asmx
    - http://hostname/reportserver/reportservice2010.asmx

    The following code snippet (C#.NET) shows how to use the LoadReport() method to load the report, SetExecutionParameters() to set the parameters of the loaded report, and Render() to render the loaded report to a file. For the simplicity of the example, the declaration of variables and error handling is not shown.

  • Printing reports
    The printing of the generated reports is done by a batch script that calls a Ghostscript, which is an open source tool for printing purposes.

    The following snippet of code (a Windows command batch file) takes two parameters as input (filename and printer) and calls Ghostscript to print the generated PDF report.

    The references section of this article shows a link to the website where you can download Ghostscript.

With knowledge of SSIS and .NET it’s easy to implement a Data Driven Subscription solution in case you don’t have the expensive Enterprise or Business Intelligence editions of SQL Server.



Thursday 22 November 2012

SQL Server 2012 - High Availability - AlwaysOn in Real-life


Nowadays, high availability (HA) and disaster recovery (DR) solutions are built with different techniques like failover clustering, log shipping, mirroring, and replication to achieve maximum application availability and data protection for SQL Server instances and databases within and across data centers. These techniques have their limitations and concerns, which are solved by the integrated solution AlwaysOn in SQL Server 2012 (Enterprise Edition). This article describes how AlwaysOn can be used in a real-life scenario in which an ETL process can be executed without influencing the availability and consistency of data, or the performance. The assumption is made that you already have basic knowledge of AlwaysOn, so we won't go into too much detail.


The company "Always Online" has a website that communicates via a webservice, which is used as the interface to the data warehouse database containing huge amounts of data.

The current version of the website has difficulties handling the increasing number of users and data volumes, which negatively influences the performance. Every night, the data warehouse is not available for three hours, because during this time window the data in the database is updated by an ETL process. This availability is not acceptable, because customers in different time zones also access the website.

The database part of the solution has to be rebuilt to maximize the availability and minimize data loss in case of failover. To make this possible the company is going to use AlwaysOn in SQL Server 2012.


To solve the availability issues the following solution has been setup. The webservice connects with a listener that directs the communication to the primary or secondary server node. Each node has a SQL Server instance that contains a data warehouse database in an Availability Group and a snapshot of the data warehouse database that's not part of the Availability Group. The ETL process is running on the primary node and updates the data in the data warehouse database of the primary replica. The updated data of the primary replica is automatically synchronized with the secondary replica. The secondary replica offloads the primary replica and is used for reporting and maintenance tasks like backup.

AlwaysOn Availability Group

An Availability Group consists of availability replicas: one primary replica and up to four active secondary replicas. All secondary replicas are considered "active" as they can all be used for read-only operations. Each replica can host multiple availability databases, which can be stored on different storage media. Data synchronization between the primary replica and its secondaries can be done in two modes: synchronous and asynchronous. With synchronous-commit mode, a transaction is committed on all replicas to guarantee transactional consistency (introduces transaction latency, no data loss, automatic or manual failover, usable for replicas that exist in the same data center). With asynchronous-commit mode, the primary replica does not wait for a transaction to be committed on the secondary replicas (better performance, more chance of data loss, manual failover, well suited for disaster recovery solutions). Secondary replicas are available for read-only operations, and the connection mode setting determines how a client can connect to a secondary replica (no connections, read-intent connections only, all connections).
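
As an illustration of these settings (not the exact setup from this article), here is a minimal T-SQL sketch; the Availability Group name [MS_AG] is an assumed name, and MS_NODE2 is the second node used later in this article:

-- Let the secondary replica commit synchronously (no data loss, automatic or manual failover)
ALTER AVAILABILITY GROUP [MS_AG]
MODIFY REPLICA ON N'MS_NODE2'
WITH (AVAILABILITY_MODE = SYNCHRONOUS_COMMIT);

-- Allow all read-only connections on the secondary replica
ALTER AVAILABILITY GROUP [MS_AG]
MODIFY REPLICA ON N'MS_NODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY));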


The following flow describes the steps to make it possible to have high availability when the ETL process runs:

  1. The ETL process is not running
    The webservice uses the listener to connect to the data warehouse database on the primary replica. It's completely transparent to the webservice whether the primary node has failed over to the secondary node. The connection string that's used by the webservice must connect to the virtual server name of the listener (don't use the node server names directly).

    The following connection string connects to the data warehouse database "MS_DWH" via the virtual server name "MS_LISTENER" of the listener.

    Data Source=MS_LISTENER;Initial Catalog=MS_DWH;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
  2. Before the ETL process is started
    Before the ETL process is started, snapshots of the data warehouse database are created on both replicas (see the sketch after these steps) and the connection string in the web.config of the webservice is changed to point to the snapshot on the secondary replica on the second node. We are taking a risk here if the secondary replica fails; the choice made here is aligned with the requirements.

    The following connection string connects to the snapshot of the data warehouse database "MS_SNAP_DWH" directly via the server name "MS_NODE2" of the second node (bypassing the listener).

    Data Source=MS_NODE2;Initial Catalog=MS_SNAP_DWH; Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
  3. The ETL process is running
    While the data is updated in the data warehouse database on the primary replica, the data is immediately synchronized to the secondary replica. This synchronization has an impact on the performance of the ETL process, which will require more time to complete.
  4. The ETL process is completed
    When the ETL process has completed, the connection string in the web.config of the webservice is changed back to point to the listener and the snapshots are removed.
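
In step 2 above, snapshots of the data warehouse database are created. As a reference, here is a minimal sketch of how such a snapshot can be created and removed again on a replica (the database names MS_DWH and MS_SNAP_DWH come from this article; the logical file name and the snapshot file path are assumptions):

CREATE DATABASE MS_SNAP_DWH
ON (NAME = MS_DWH_Data,                          -- logical data file name of MS_DWH (assumed)
    FILENAME = 'D:\Snapshots\MS_SNAP_DWH.ss')    -- sparse file that holds the snapshot pages (assumed path)
AS SNAPSHOT OF MS_DWH;

DROP DATABASE MS_SNAP_DWH;                       -- removed again after the ETL process has completed
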
  • Suspend / Resume
    In our solution, snapshots are used to connect while the ETL process is running, because you don't want users to see inconsistent data while the data on the primary replica is being updated. A nicer solution would be to switch the webservice to the secondary replica via the listener (using ApplicationIntent=ReadOnly) before the ETL process is started and to suspend the data synchronization (ALTER DATABASE [MS_DWH] SET HADR SUSPEND). When the ETL process is done, the webservice can be switched back to the primary replica (without ApplicationIntent=ReadOnly) and data movement can be resumed (ALTER DATABASE [MS_DWH] SET HADR RESUME). Unfortunately this does not work while data movement is suspended: existing connections to one of the replicas will keep working, but new connections will fail with the following exception message:
    Unable to access the 'MS_DWH' database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.
  • Webservice & stateless
    A webservice is stateless, so for every call a new connection is made. Under the hood, an existing physical connection will be reused, but because you don't have control over the application pool recycling and connection pooling you must always assume that every connection is new.
  • ApplicationIntent
    When you connect to the listener without the ApplicationIntent=ReadOnly parameter in the connection string, the connection is directed to the primary replica; with the parameter, it is directed to the secondary replica (an example connection string is shown after this list).

  • .NET Framework version
    The "ApplicationIntent" parameter in a connection string is a new feature and is supported by the SqlClient class in the .NET Framework 4.0 and higher.
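
As an example (based on the names used earlier in this article), a connection string that is directed to the readable secondary replica via the listener could look like this:

Data Source=MS_LISTENER;Initial Catalog=MS_DWH;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;ApplicationIntent=ReadOnly;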

AlwaysOn is a very useful new feature of SQL Server 2012 for implementing high availability and disaster recovery solutions. It is very useful in the scenario described above, but it's a pity that it's not possible to connect to a secondary replica in an availability group while data movement is suspended. That's why snapshots had to be used, which makes the solution more difficult to implement and maintain.