donderdag 22 november 2012

SQL Server 2012 - High Availability - AlwaysOn in Real-life

Introduction

Nowadays, high availability (HA) and disaster recovery (DR) solutions are built with different techniques like failover clustering, log shipping, mirroring, and replication to achieve maximum application availability and data protection for SQL Server instances and databases within and across data centers. These techniques have their imitations and concerns which are solved by the integrated solution AlwaysOn in SQL Server 2012 (Enterprise Edition). This article describes how AlwaysOn can be used in a real-life scenario wherein an ETL proces can executed without influencing the availability and consistency of data, and performance. The assumption is made that you already have basic knowledge about AlwaysOn, so we won't go into detail too much.

Scenario

The company "Always Online" has a website that communicates via a webservice, which is used as the interface to the data warehouse database containing huge amounts of data.

 
The current version of the website has difficulties with handling the increasing number of users and data volumes, which negatively influences the performance tremendously. Every night, the data warehouse is not available for three hours, because during this time window the data in the database is updated by an ETL process. This availability is not acceptable, because also customers  in different time zones access the website.

The database part of the solution has to be rebuilt to maximize the availability and minimize data loss in case of failover. To make this possible the company is going to use AlwaysOn in SQL Server 2012.

Solution

To solve the availability issues the following solution has been setup. The webservice connects with a listener that directs the communication to the primary or secondary server node. Each node has a SQL Server instance that contains a data warehouse database in an Availability Group and a snapshot of the data warehouse database that's not part of the Availability Group. The ETL process is running on the primary node and updates the data in the data warehouse database of the primary replica. The updated data of the primary replica is automatically synchronized with the secondary replica. The secondary replica offloads the primary replica and is used for reporting and maintenance tasks like backup.


AlwaysOn Availability Group

An Availability Group consists of availability replicas; one primary replica and up to four active, secondary replicas. All secondary replicas are considered "active" as they can all be used for read-only operations. Each replica can host multiple availability databases, which can be stored on different storage media. Data synchronization between the replica and its secondaries can be done in two modes, namely synchronous and a-synchronous.  With synchronous-commit-mode a transaction is committed on all replicas to guarantee transactional consistency (introduces transaction latency, no data loss, automatic or manual failover, usable for replicas that exist in the same data center). With asynchronous-commit-mode the primary replica is not waiting for a transaction to be committed on the secondary replicas (better performance, more chance of data loss, manual failover, well suited for disaster recovery solutions). Secondary replicas are available for read-only operations, and its connection mode setting tells how a client can connect to this secondary replicata (not, read-intent connections, all connections).

Flow

The following flow describes the steps to make it possible to have high availability when the ETL process runs:

  1. The ETL process is not running
    De webservice uses the listener to connect to the data warehouse database on the primary replica. It's completely transparant for the webservice if the primary node failed over to the secondary node. The connectionstring that's used by the webservice must connect to the virtual server name of the listener (don't use the node server names directly).

    The following connection string connects to the data warehouse database "MS_DWH" via the virtual server name "MS_LISTENER" of the listener.

    Data Source=MS_LISTENER;Initial Catalog=MS_DWH;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
  2. Before the ETL process is started
    Before the ETL Process is started, snapshots of the data warehouse database are created on both replicas and the connection string in the web.config of the webservice is changed to look at the secondary replica on the second node. We are taking a risk here when the secondary replica fails; the choice being made here is alligned with the requirements.

    The following connection string connects to the snapshot of the data warehouse database "MS_SNAP_DWH" directly via the server name "MS_NODE2" of the second node (bypassing the listener).

    Data Source=MS_NODE2;Initial Catalog=MS_SNAP_DWH; Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;
  3. The ETL process is running
    While the data is updated in the data warehouse database on the primary replica, the data is synchronized immediately to the secondary replica. This synchronization has impact on the performance of the ETL process and will require more time to complete.
  4. The ETL process is completed
    When the ETL Process has completed, the connection string in the web.config of the webservice will be changed back to point to the listener and the snapshots will be removed.
Remarks
  • Suspend / Resume
    In our solution snapshots are used to connect in the case the ETL process is running, because you don't want that users see inconsistent data while the data on the primary replica gets updated. A nicer solution would be to switch the webservice to the secondary replica via the listener (using ApplicationIntent=ReadOnly) before the ETL processs is started and suspending (ALTER DATABASE [MS_DWH] SET HADR SUSPEND) the data synchronization.  When the ETL process is done the webservice can be switched back to the primary replica (don't use ApplicationIntent=ReadOnly) and data movement can be resumed (ALTER DATABASE [MS_DWH] SET HADR RESUME). Unfortunately this solution is not working when data is suspended. Existing connections to one of the replicas will keep working, new connections will fail, showing the following exception message:
    Unable to access the 'MS_DWH' database because no online secondary replicas are enabled for read-only access. Check the availability group configuration to verify that at least one secondary replica is configured for read-only access. Wait for an enabled replica to come online, and retry your read-only operation.
  • Webservice & stateless
    A webservice is stateless, so for every call a new connection is made. Under the hood, an existing physical connection will be reused, but because you don't have control over the application pool recycling and connection pooling you must always assume that every connection is new.
  • ApplicationIntent
    When you connect to the listener without the use of the ApplicationIntent=ReadOnly parameter in the connectionstring, the connection will be directed to the primary replica, otherwise to the secondary replica.


  • .NET Framework version
    The parameter "ApplicationIntent" in a connectionstring is a new feature and supported by the SqlClient class in the .NET Framework 4.0 and higher.
Conclusion

AlwaysOn is a very useful, new feature of SQL Server 2012 to implement high availability and disaster recovery solutions. Very useful in the scenario described above, but it's a pity that it's not possible to connect to a replica in an availability group when data is being suspended. That's why snapshots had to be used, which makes the solution more difficult to implement and maintain.

References

maandag 29 oktober 2012

Change Data Capture: process a schema change (DML)

Introduction

CDC (Change Data Capture) captures data changes caused by DML (Data Manipulation Language) statements like inserts, updates, and deletions in so called "change tables". These change tables have exactly the same schema as the source tables, enriched with several meta data columns. This blog article describes a way to handle the situation where the schema of a source table changes without loosing the captured change data in the change tables.

Context

In the table "SoccerStatistics" the wins, draws, and losses per team in the Premier League are maintained on a weekly basis.

 
The table is enabled for CDC; all inserts, updates, and deletions are stored in the change table "SoccerStatistics_CT". Using SQL Server Integration Services (SSIS), only the changes are loaded into the Data Warehouse.
 

Halfway the season, a new requirement came up to add the extra column "GoalDifference" (goals for - goals against) without losing historical data.
 
Steps
 
The following steps have to be followed to add a column to a CDC enabled table:
 
  1. Backup change data
    It's only possible to process a schema change on a source table when it's disabled for CDC. When a table is disabled for CDC the related change table is deleted, including its data, which is an unwanted situation. That's why you first have to create a temporary table having exactly the same schema as the change table, extended with the extra column "StartLSN". Populate this temporary table with the data of the change table, and fill the column "StartLSN" with the value "start_lsn" that can be found in the table cdc.change_tables.

    The fastest way to create a temporary table with the same schema as the change table is by using the construction "SELECT * INTO", like this is done in the following T-SQL code:

    SELECT * INTO [dbo].[SoccerStatistics_Temp] FROM [cdc].[dbo_SoccerStatistics_CT]
    Getting and updating the "StartLsn"column can be done with the following T-SQL code:
    SELECT @StartLsn = [start_lsn] FROM [cdc].[change_tables] WHERE [capture_instance] = 'dbo_SoccerStatistics';
    UPDATE [dbo].[SoccerStatistics_Temp] SET [StartLSN] = sys.fn_varbintohexstr(@StartLsn)


  2. Disable CDC
    Now the change data has been saved to the temporary table, the table can be disabled for CDC.
  3. Process schema changeRun a T-SQL script to add the column "GoalDifference" to the source table and the temporary table.
  4. Enable CDC
    Enable the source table for CDC; the created change table will also contain the newly added column "GoalDifference".
  5. Restore change data
    Copy the change data, stored in the temporary table, to the change table.
  6. Update cdc.change_tables
    When a table is enabled for CDC, an entry is added to the table cdc.change_tables, which is used by the Table-Valued Functions (TVFs). The column "start_lsn" will point to the first logged DML in the change table. Because we populated the  change table in a different way, the "start_lsn" won't be in sync, and the Table-Valued Functions will return unexpected results or will fail. To solve this, the "start_lsn" has to be updated by the value that can be found in the column "StartLSN" in the temporary table. This will make sure that the Table-Valued Functions will work as expected.

    Updating the "start_lsn" value is shown in the following T-SQL code:
    DECLARE @startLsn AS BINARY(10);
    SELECT TOP 1 @startLsn = [StartLSN] FROM [dbo].[SoccerStatistics_Temp]
    UPDATE [cdc].[change_tables] SET [start_lsn] = @startLsn WHERE [capture_instance] = 'dbo_SoccerStatistics';

  7. Remove temporary table
    The change table has been populated with the historical data, so the temporary table can be removed.
Conclusion
 
Although some tweaking is needed it's fairly simple to process schema changes on CDC enabled tables without losing your historical data.




 



dinsdag 10 januari 2012

SQL Server: Change Data Capture

Introduction

CDC (Change Data Capture) captures data changes caused by DML (Data Manipulation Language) statements like inserts, updates, and deletions and makes it possible to archive data changes without any additional programming. These captured changed are stored in change tables that mirror the column structure of the original, tracked, user-created source tables with additional columns that include extra metadata. With CDC no changes are required to the existing source table schemas. Several database objects are provided (stored procedures, functions, and jobs) to handle the captured changes and integrate with external systems like a DWH (Data Warehouse).

Overview

All data changes caused by inserts, updates, and deletions are saved in the transaction log file _log.ldf, which is used as a source for the change data capture process. This change data capture process is scheduled by the SQL Server Agent that executes the capture job cdc._capture. The capture job scans the transaction log every 5 seconds, processing a maximum of 1000 transactions (these numbers are based on default settings, which can be changed for performance optimalisation purposes). Take into account that there’s latency between the time that a change is committed to the source table and the time that the change appears within its associated change table as the capture process extracts data from the transaction log. Archiving changes causes huge amounts of data and therefore a cleanup job cdc._cleanup exists that removes change data from the change tables that is older than 3 days.



The CDC functionality is only supported by the Enterprise and Developer editions of SQL Server 2008 R2.

Setup

For using CDC functionality you have to go through the following steps.
  • Enable CDC for database
    Before changes to any individual table within a database can be tracked change data capture must be explicitly enabled for the database, which you can do by executing the following T-SQL script:

    EXEC sys.sp_cdc_enable_db

    After executing this T-SQL script the following database objects are created:

    Object Description
    Schema Cdc
    Tables Several tables are created using the schema “cdc”
    Stored procedures Several stored procedures are created prefixed by “sp_cdc_” and “sp_MScdc_”.

    You can check which databases are enabled for CDC by querying the table “sys.databases” in the master database.

    SELECT [name], database_id, is_cdc_enabled
    FROM sys.databases
    WHERE [name] = 'DatabaseName'

  • Enable CDC for table(s)
    To enable a table for CDC, you have to execute the following T-SQL script (this example enables CDC for the table “schemaName.TableName”):

    EXEC sys.sp_cdc_enable_table @source_schema = N'schemaName', @source_name = N'TableName', @role_name = 'RoleName'

    After executing this T-SQL script the following database objects are created:

    Object Description
    Table [cdc].[dbo_HcmVolume_CT]
    Table-valued functions (TVF) The following table-valued functions are created:
    - [cdc].[fn_cdc_get_all_changes_<schemaName>_<TableName>]
    - [cdc].[fn_cdc_get_net_changes_<schemaName>_<TableName>]
    Stored procedures The following stored procedures are created:
    - [cdc].[sp_batchinsert_<ChangeTableID>]
    - [cdc].[sp_insdel_<ChangeTableID>]
    - [cdc].[sp_upd_<ChangeTableID>]
    Jobs The following jobs are created and started immediately (only happens once when the first table is enabled for CDC):
    - [cdc].[DatabaseName_capture]
    - [cdc].[DatabaseName_HcmVolume]

    For every tracked table a change table is created and two related table-valued functions. These three database objects together are called a capture instance.

    You can check which tables are enabled for CDC and which columns are tracked by querying the tables “cdc.change_tables” and “cdc.captured_columns”.

Change table schema


Default a change table has the same column structure as the original, tracked table, enriched with five additional metadata columns:

Name Description
__$start_lsn Identifies the commit log sequence number (LSN) that was assigned to the change. You can use this value to query the table [cdc].[lsn_time_mapping] to get date/time information of the change. Changes committed in the same transaction share the same commit LSN value.
__$end_lsn NULL (not used)
__$seqval Used for ordering multiple changes in one transaction.
__$operation Contains a value which corresponds to the executed DML operation:
• 1 = Delete
• 2 = Insert
• 3 = Update (before)
• 4 = Update (after)
__$update_mask Contains a bit mask value that tells which field values have been changed (only useful for updates). In the case of inserts or deletions all bits are set, in the case of updates only those bits are set that correspond to changed columns.

For every insert or delete operation a single row is added to the change table. For an update operation two rows are added (one with the value before the update and one with the values after the update).

Querying for changes

As a user you might want to know how a value changes over time and what DML operation has been executed. CDC provides several useful stored procedures and table-valued functions (TVF) to get insight in that. The following workflow explains how this is done.
  1. Determine time windowDetermine the “from” date/time and the “to” date/time.
  2. Map date/time to LSNMap the “from” date/time and “to” date/time to log sequence numbers (LSN).
  3. Get changesGet changes that occurred in the defined time window. You can query for net / all changes:
    Net: the resultset reflects the final contents of records
    All: the resultset contains all changes executed on records

In the following T-SQL code example, data changes are queried in a particular time window (from now till 24 hours back):

DECLARE @dateTimeFrom DATETIME;
DECLARE @dateTimeTo DATETIME;
DECLARE @lsnFrom BINARY(10);
DECLARE @lsnTo BINARY(10);
SET @dateTimeFrom = GETDATE() – 1;
SET @dateTimeTo = GETDATE();
SELECT @lsnFrom = sys.fn_cdc_map_time_to_lsn('smallest greater than', @dateTimeFrom);
SELECT @lsnTo = sys.fn_cdc_map_time_to_lsn('largest less than or equal', @dateTimeTo);
SELECT * FROM cdc.fn_cdc_get_all_changes_<schemaName>_<tableName>(@lsnFrom,@lsnTo,'all');

Considerations

When implementing CDC take the following into account:
  • CDC does not track who actually executed the change on the data, so you have to add a metadata column with user information to the tracked tables.
  • When using CDC the transaction log will become very big (approximately 250% of the original size), because also the inserted records in the CDC change tables will be written to the transaction log.
  • When enabling a table for CDC, make sure you define a different file group name on a different physical disk.
  • It’s important to only CDC the needed columns: narrow tables perform much better than wide tables.
  • When latency is too high you can change the following parameters of the capture job:
    - Increase maxscans
    - Increase maxtrans
    - Decrease pollinginterval
  • There’s no limitation on the number of tracked columns.

References


Conclusion


Change data capture is a great out-of-the-box functionality to track changes without using any custom programming and touching the database model (except when you want to have information about who modified the data). With the available stored procedures and functions it's very easy to setup CDC and handle change data.

maandag 3 oktober 2011

Book review: Microsoft SQL Server 2008 R2 Master Data Services

Because of my experience and knowledge about Master Data Services I have been asked to review the book Microsoft SQL Server 2008 R2 Master Data Services. Looking at the authors’ names (Ian who wrote the preface is the founder of Master Data Services, and Jeremy is very active on the Master Data Services community) that worked on this book I got the feeling that this had to be an interesting book to read (even with my background), and while reading this book this was definitely confirmed.

The book guides you, step-by-step, through the world of Master Data Services and can be used as an installation guide with a lot of in-depth, background information. This book differentiates itself with other books about Master Data Services on especially the integration part with SharePoint (workflows) and BizTalk, which is described very thoroughly. Also the description of what’s going on under the hood and the API is very detailed, which proves the knowledge and experience of the authors.

I can really recommend this book to everyone who is working or planning to work with Master Data Services and also this book will still be valid (except some minor points, like the GUI and entity based staging) for the next release of Master Data Services (code name “Denali”).

woensdag 10 augustus 2011

Silverlight: PivotViewer Control

Introduction

The Silverlight PivotViewer control is a very powerfull control that visualizes a lot of data at once. The user is able to filter the data and get insight in this data very easily. The data used for the viewer consists of a collection of images and xml data. The creation of this image collection and xml data is described in this article.

Overview

A PivotViewer shows items that have related images and characteristics. Images are stored in the so called Deep Zoom format and have different size versions so that users are able to zoom those images. The characteristics (facets) of all items are stored in an XML file and are categorized in facet categories.



Creation of image collection

For creating a Deep Zoom image collection, two classes are important. Microsoft.DeepZoomTools.CollectionCreator is used to create a collection of images in the Deep Zoom format and uses images of the type Microsoft.DeepZoomTools.Image as input. The following code shows how to create a collection of images.

//Declare variables
CollectionCreator
collectionCreator = new CollectionCreator();
List<Image>
images = new List<Image>();

//Add image to collection
images.Add(new Microsoft.DeepZoomTools.Image(imagePath));
//Specify item collection properties
collectionCreator.TileSize = 256;
collectionCreator.TileFormat = Microsoft.DeepZoomTools.ImageFormat.Png;
collectionCreator.ImageQuality = 1.0;
collectionCreator.TileOverlap = 0;
//Create collection
collectionCreator.Create(images, destinationDirectory);

Creation of Xml data

The XML data is saved in an XML file and contains item characteristics (facets) and facet categories. The following code shows how to create the XML file where only one facet category, item and facet is added.

XmlDocument xmlDocument = new XmlDocument();           
bool created = true;

string xml = "";

xml += "<?xml version=\"1.0\" encoding=\"utf-8\"?>";           
xml += "<Collection xmlns:p=\"http://schemas.microsoft.com/livelabs/pivot/collection/2009\" SchemaVersion=\"1.0\" Name=\"" + collectionName + "\" xmlns=\"http://schemas.microsoft.com/collection/metadata/2009\">";


//Categories
xml += "<FacetCategories>";
xml += "<FacetCategory Name=\"" + facetName + "\" Type=\"" + facetType + "\"/>";
xml += "</FacetCategories>";


//Itemsxml += "<Items ImgBase=\"" + imageCollectionPath + ".xml\">";
xml += "<Item Id=\"0\" Img=\"#0\" Name=\"" + itemName + "\">";
xml += "<Description>" + itemDescription + "</Description>";
xml += "<Facets>";
xml += "<Facet Name=\"" + facetName + "\"><" + facetType + " Value=\"" + facetValue + "\" />";
xml += "<Facet>";
xml += "</Facets>";
xml += "</Item>";
xml += "</Items>";
xml += "</Collection>";


xmlDocument.LoadXml(xml);
xmlDocument.Save(destinationDirectory + collectionName + ".cxml");

The following code shows the generated XML file.

<?xml version="1.0" encoding="utf-8"?>
<
Collection xmlns:p="http://schemas.microsoft.com/livelabs/pivot/collection/2009" SchemaVersion="1.0" Name="collectionName" xmlns="http://schemas.microsoft.com/collection/metadata/2009">
       <
FacetCategories>
              <
FacetCategory Name="facetName" Type="facetType" />
       </
FacetCategories>
       <
Items ImgBase="imageCollectionPath.xml">
              <
Item Id="0" Img="#0" Name="itemName">
                     <
Description>itemDescription</Description>
                     <
Facets>
                            <
Facet Name="attributeName">
                                  <
facetType Value="facetValue" />
                            </
Facet>
                     </
Facets>
              </
Item>   
       </Items>
</
Collection>

The PivotViewer control loads the XML file to get all the information it needs to show the items.

Conclusion

With only a small amount of code it’s pretty easy and straightforward to create the image collection and XML file as input for the PivotViewer control.

References

donderdag 4 augustus 2011

Master Data Services – Denali’s Master Data Services on top of SQL Server 2008 R2

Introduction

Master Data Services ships with SQL Server 2008 R2, which installs the web based Master Data Manager where business users can maintain their master data. One of the major complaints that business users have about the Master Data Manager, is that it performs very bad and that the user interface is not very user-friendly. Microsoft has listened to these complaints and improved the user interface in the next version of SQL Server (codename = “Denali”). The performance is now much better and the user interface is more user-friendly wherefore Silverlight is used.

When a company just rolled out SQL Server 2008 R2 on their servers, it’s unlikely going to happen that they will update immediately to Denali when it’s released. So, a company will have to use the cumbersome Master Data Manager until they upgrade to Denali. As told in the presentation Managing Master Data with MDS and Microsoft Excel it’s possible to keep your SQL Server 2008 R2 instance and install Denali’s Master Data Services on top of it. You now have the ability to make use of the new Master Data Manager and also the Excel add-in. This article describes how this can be done.


Installation

For the installation I made use of Oracle’s VirtualBox wherein I used Windows Server 2008 R2 – Service Pack 1 as operating system. The following steps describe briefly how I installed a working system. 

  • Install SQL Server 2008 R2
  • Install SQL Server 2008 R2 – Master Data Services
    The setup is provided separately on the installation DVD, which can be found in the directory “\MasterDataServices\x64\1033_ENU”. When using default directories, Master Data Services will be installed in the directory “C:\Program Files\Microsoft SQL Server\Master Data Services”.
  • Install Hotfix
    Install the hotfix Cumulative update package 8 for SQL Server 2008 R2 and upgrade existing MDS databases. At the end of the hotfix installation the Master Data Services Configuration Manager is automatically started where you have the possibility to upgrade “old” Master Data Services databases to the new ones by selecting the a database and pushing the “Upgrade Database…” button.



    When you don’t install the hotfix and try to upgrade to Denali’s Master Data Services, you will get the following error (“The upgrade from the version of the Database is not supported or the SQL scripts for the upgrade is missing”):

  • Install .NET Framework 4.0
    Although the installation of Denali’s Master Data Services describe that you need .NET Framework 3.5 as a pre-requisite, you have to install .NET Framework 4.0 to install Master Data Services successfully. If you don’t install .NET Framework 4.0, you will get the following error during installation (“Error writing to file: Microsoft.MasterDataServices.Common.dll. Verify that you have access to that directory.”), which doesn’t really point you to the pre-requisite that you have to install the right .NET Framework:

  • Install Denali - Master Data Services
    With Denali, Master Data Services has now become part of the setup. Go through the setup and only select Master Data Services to install. After the installation you have to start Denali’s Master Data Services Configuration Manager and upgrade the Master Data Services databases that also have been by the hotfix. Denali's Master Data Services is installed in a different directory, namely "C:\Program Files\Microsoft SQL Server\110\Master Data Services".
Now you are ready to use Denali’s Master Data Manager and also the Excel add-in to work with your master data, with only installing Denali’s Master Data Services!

Conclusion

Denali’s Master Data Services works perfectly on top of SQL Server 2008 R2 and I would really recommend everyone to think about this option for those who are using Master Data Services.

I would like to thank Suzanne Selhorn (Microsoft) for her support to make this work.

References

 

vrijdag 8 juli 2011

Master Data Services – Shortcomings, improvements and future release of SQL Server (Denali)

Introduction

The first version of Master Data Services shipped with SQL Server 2008 R2. We all know that a first version of a Microsoft product has some shortcomings and needs some improvements to get more mature, which is also true for this version of Master Data Services. In this article I will focus on what the shortcomings are and what can be improved to make Master Data Services a more mature product and interesting to use for a (enterprise) Master Data Management implementation. This is described from different user perspectives. I will also describe several improvements that are already available in the upcoming release of SQL Server (codename “Denali”).

Data steward perspective

A data steward primarily uses the functional area “Explorer” to maintain data.

The following bullets describe some points that need improvement.

  • User-friendly and performanceWhen working with Master Data Services the first thing you will notice is that the user interface of the Master Data Manager is not very user-friendly and performs very slow. Because of this, the Master Data Manager is not well suited for business users to maintain their data. Fortunately, you can use the Master Data Services WCF API to create your own user interface, but this will take some time and will influence your Return of Investment (ROI). The company Profisee created a tool called “Master Data Maestro”, which has a user friendly and good performing user interface on top of Master Data Services and could be an alternative for creating your own user interface.
  • Hourglass
    You don’t always see the “hourglass” functionality when you perform a particular action. Users will tend to click again, push the back button or refresh, because they think nothing happens.
  • Creation of new members
    When creating a new member, you first populate the system attributes “Code” and “Name” and then you are able to populate the other attributes. It would be nice to be able to populate more attributes at once, because this saves several mouse-clicks and makes the creation of a new member more user-friendly.
  • Filtering domain-based attributes
    An entity could have one or more domain-based attributes that reference other entities. When a referenced entity contains a lot of members, the selection of the domain-based attribute value will take a while, because you need a lot of time to search the value. It would be nice to filter the list of domain-based values, related to a selection you made earlier in another attribute. A workaround to do this is to navigate through a derived hierarchy and add a leaf member.

Administrator perspective

An administrator primarily uses the functional areas “Version Management”, “Integration Management”, “System Administrator”, and “User and Group Permissions” to maintain versions of the data, integration of external systems, adjustment of model(s), and setting permissions.

The following bullets describe some points that need improvement.
  • Renaming attribute group
    When renaming an attribute group, all attributes that belong to this group need to be added again manually. It would be nice that these attributes are still related to the renamed attribute group.
  • Non applied member permissions
    When changing member permissions are when they are not applied, the effective permission equals “99” or contains a value you didn’t expect (example: you set the permission of a member on group level on “Deny”, but effective it shows “Update”). It would be better to show a state that the permission is not applied yet to make it clearer.
  • State of staging process
    When staging a batch of data you have to refresh the page several times to check if the batch has been loaded. If would be better to have some progress bar functionality to have feedback about the process’ progress.

Developer perspective

A developer is needed to interface Master Data Services with the outside world and extend the business rule functionality with custom (SharePoint) workflows. The following bullets describe some points that need improvement.
  • Staging with SQL Server Integration Services (SSIS)
    Loading member data into the staging tables of Master Data Services is a relatively complex and time-consuming task to build in SSIS. It would be nice to have a Master Data Services staging task to load members of an entity into the staging area (entity-based staging).
  • Sending notification to e-mail address defined in attribute
    Within Master Data Services you can relate an e-mail address to a business rule, but in some cases you want to send a notification to an e-mail address that’s defined in an attribute (for example to send an e-mail to the requestor of a member that it has been reviewed). Right now the only way to implement this behavior is in a custom workflow.
  • List of possible conditions and actions (business rules)
    With business rules you can only define “simple” conditions and actions, and for more complex business rules you need to implement this in a custom workflow (for example the generation of a date/time value or formatting a string). Extending the list of conditions and actions with more frequently used actions, would make the life of the developer much easier.
  • Business rules and handling attributes of a referenced entity
    Using the values of referenced attributes (via domain-based attribute) in conditions and actions can cause unpredictable behavior. When a referenced attribute is renamed, this won’t be updated in the business rule and causes an error that eventually directs you to the auto-generated stored procedure (*ProcessRules) that describes the business rule. Using a referenced attribute value in a condition seems to work, but when using it as part of an action this doesn’t work. For example, populating an attribute with an attribute value from a referenced entity fails.

SQL Server Denali

Microsoft continuously listens to the feedback of customers and already provided several small patches/updates. In the next version of SQL Server (codename “Denali”) a lot of improvements are implemented. Here’s a list of the most important improvements:
  • The user interface has been rebuilt in Silverlight and is now user-friendly.
  • The performance is much better.
  • You can now use the Master Data Services Excel plug-in (Excel 2007 and Excel 2010) to update batches of master data.
  • Within SSIS pivoting is not needed anymore and you are able to stage the members of an entity on an “entity-based” way. It’s possible to use the enhanced user interface and the Excel plug-in on SQL Server 2008 R2, but to make this work you need the updated Master Data Services WCP API that ships with Denali. 
Conclusion

Although it’s the first version of Master Data Services it is usable within Master Data Management projects. When the next version is improved on points I mentioned in this article, and I can say that Denali already looks promising, Master Data Services is what you want to use in your Master Data Management projects.

References