You can use the cleansing functionality of Data Quality Services (DQS) in SSIS by using the “DQS Cleansing” task. This cleansing task used the selected DQS Knowledge Base to cleanse data. Under the hood this task points to the numerical identifier of the selected DQS Knowledge Base. In a DTAP (Development, Test, Acceptance, and Production) environment it could happen that this identifier is different throughout the DTAP environment and also when a DQS Knowledge Base is redeployed / recreated it gets a new identifier. When this identifier is not the same as the hard-coded one in the SSIS packages, the package will fail. This article gives you more background information about this issue and how to work around it.
Storage of DQS Knowledge Base
When you create a new DQS Knowledge Base or import an existing one, a new numerical identifier will be created specific for this knowledge base.This identifier is stored in the table [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE] where also project related records are stored. The following table shows a couple of important columns of this table:
You can execute the following T-SQL query to get all above mentioned values with their descriptions:
SELECT groupDQS.[ID] AS [groupId],
groupDQS.[NAME] AS [GroupName],
memberDQS.[KEY] AS [MemberId],
memberDQS.[VALUE] AS [MemberValue]
FROM [DQS_MAIN].[dbo].[A_CODE_MEMBER] AS memberDQS
RIGHT OUTER JOIN [DQS_MAIN].[dbo].[A_CODE_GROUP] AS groupDQS ON groupDQS.[ID] = memberDQS.[GROUP_ID]WHERE groupDQS.[NAME] = '
When you create a DQS Knowledge Base, a new numerical identifier will be created, which is also used to create a knowledge base specific schema (KnowledgeBaseXXXXXXX) that is used to group several knowledge base specific tables (B_ASSOCIATION_RULE, B_ASSOCIATION_RULE_CONSTRAINT, B_ASSOCIATION_RULE_CONSTRAINT_DOMAIN, B_COMPOSITE_DOMAIN_VALUE_RELATION, B_DATA_SERVICE_FIELD, B_DATA_SERVICE_RELATION, B_DATA_SERVICE_SCHEMA, B_DATA_SERVICE_SCHEMA_PROPERTY, B_DOMAIN_TERM_BASED_RELATION, B_DYNAMIC_OBJECTS, B_INDEX_LEXICON, B_INDEX_LEXICON_EXTENSION, B_INDEX_LEXICON_EXTENSION_RELATION, B_MATCHING_RULE, B_MATCHING_RULE_DOMAIN_ELEMENT, B_PATTERN, B_REFERENCE_DATA_MAPPING, B_REFERENCE_DATA_MAPPING_CHILD_DOMAINS, B_REFERENCE_DATA_MAPPING_PARAMETERS, B_RULE).
Import DQS Knowlege Base
A DQS Knowledge Base can be imported throughout the DTAP environment via the Data Quality Client or via the command line.
Export / import functionality in the Data Quality Client
An existing DQS Knowledge Base can be exported to a file (*.dqs) by selecting “Export Knowledge Base”.
A new DQS Knowledge Base can be imported from a file (*.dqs) via the “New Knowledge Base” button on the main screen. You can’t overwrite an existing knowledge base.
Export / import functionality via command line
The following command exports all DQS databases (it’s not possible to export one specific DQS database) from the “MDM” instance to the file “MyKnowledgeBases.dqsb”.
DQSInstaller -exportkbs c:\MyKnowledgeBases.dqsb –instance MDM
The following command imports all DQS databases (it’s not possible to import one specific DQS database) from the file “MyKnowledgeBases.dqsb” into the “MDM” instance.
DQSInstaller -importkbs c:\MyKnowledgeBases.dqsb –instance MDM
SSIS Cleansing Task
The SSIS Cleansing Task references a DQS Knowledge Base by using an identifier. The following image shows a piece of the XML code behind the SSIS package. It shows that the DQS Cleansing task refers to the hard-coded knowledge base name “2000000” (strange the property is named “Knowledgebasename” while it contains an identifier value).
You can’t deploy a working SSIS package throughout the DTAP environment as it could be that Development is using another identifier for the DQS Knowledge Base than Acceptance, which makes it impossible to execute automated deployments.
On the internet several work-arounds are mentioned but none of them makes it possible to deploy your SSIS package (without any adjustments) throughout your DTAP environment. Follow the following steps to make this possible:
- Reset the auto-increment column [DQS_MAIN].[dbo].[A_KNOWLEDGEBASE].[ID] by executing the following T-SQL script:
DBCC CHECKIDENT ('[DQS_MAIN].[dbo].[A_KNOWLEDGEBASE]', RESEED, 1999999);
After executing this script, a newly created DQS Knowledge Base will get the identifier 2000000.
- Create a new DQS Knowledge Base by importing one via the export/import functionality in the Data Quality client or via the command line. If you need to overwrite an existing one, you first need to delete that one via the Data Quality client tool.
- Deploy your SSIS package which points to the DQS Knowledge Base with the identifier 2000000.
This issue with the hard-coded identifier is really a bug in the SSIS Cleansing task which should be solved by Microsoft. When the identifier is replaced by a DQS Knowledge Base name, no work-arounds are needed and it’s then possible to deploy your SSIS packages throughout the DTAP environment without any adjustments.