Saturday 10 December 2011

Using Conditional Split data Transfer in SSIS 2008

1. Introduction

This article uses the Integration Services Conditional Split Data Transformation element to filter and transfer data from a set of flat text files to SQL Server database table. The concept can be easily extended to apply to any other source or destination such as Microsoft Excel. This scenario is useful in creating denormalized database tables in a reporting and analysis situation.

2. Data Load Task

Before we start writing any ETL (Extract, Transform and Load) task, we should first layout a rough sketch of the task and sub tasks involved in the package. In this simple tutorial, we are planning to write an ETL package that does the following:
  • Read a set of text files from a folder, each with a common format
  • Separate the records in the files into two groups
  • Load each group into designated destination table
In this tutorial, we have a set of files that include flower plant names and their types, i.e., Seasonal or Perennial. The files are from different sources but have the same format with a header and data rows. The data elements are comma separated values. The attached compressed file includes a top folder SSISSolution. This folder has two subfolders, i.e., the DataFiles folder containing the two sample data text files and SSISConditionalSplit Visual Studio 2008 solution folder. When you unzip the folder on your local machine, you can open the solution using Visual Studio 2008. However, you will have to update the package for all the connection information based on your SQL Server 2008 installation.

3. Project Setup

Open Visual Studio 2008 IDE and setup an SSIS project as shown in the diagram below. After you name the solution, click OK to create the project. The package will be named as Package.dtsx by default. You may rename the package if you wish.

4. Add Foreach Loop Container on the Control Flow Panel

Select and drag a Foreach Loop Container control flow element from the toolbox on the left panel of the IDE to the control flow design surface. Rename the element to something meaningful such as "Loop Over Nursery Flower Source". While the control is selected on the design surface, double click on it to open its properties dialog. The dialog is shown in the figure below:
 Next, create a variable to hold the file names iterated in the Foreach Loop Container. Select the "Variable Mappings" on the properties dialog of the loop container and then select the "New Variable" item from the combo drop down as shown in the figure below:
Now, in the new variable dialog shown below, specify the variable container scope as Package and name the variable as vSourceFile, namespace as User and type as String and click ok on both dialogs to return to the control flow surface.


5. Create Flat File Source Connection Manager

In this example, the source of our data are a set of text files in a folder and the destinations are SQL Server 2008 tables. Right click on the connection managers pane on the IDE and choose the "New Flat File Connection...". The Flat File Connection Manager dialog opens as shown in the figure below:
 Specify the entries as shown.
  • Connection Manager Name
  • File Name
  • Header row delimiter as Comma{,}
  • Select the check box to use the column names in the first row
All other entries you can leave as default. Note that even if connection will execute within the loop container to use a different file in each loop, you can still specify one file. As you will see below, you need to configure the expression of the flat file connection manager to point to the Foreach Container Loop variable we have already created.
Now, you need to create an expression for the connection string of the Flat File Connection Manager to point to the file variable of the Foreach Loop Container. Select the Flat File Connection Manager you just created and then go the properties pane in the solution explorer and click the "..." button across the expression entry on the properties tree view. This will open the Property Expression Editor dialog as shown below. Select the "Connection String" entry in the left combo box.
 Now, click the "..." button for the Expression Builder dialog as shown below. Select the file variable and specify the expression as shown in the dialog and click Ok to close the dialog. This will bind the variable to the connection manager.


6. Create SQL Server Database Destination Connection Manager

Right click the connection manager pane and select "New ADO.Net Connection" entry. On the dialog that opens, click the "New.." button to open the Connection Manager dialog and make selections as shown in the dialog below. Note that the selections of server name and type of connection you specify depends upon the SQL server 2008 you have setup on your local computer.


7. Add a Data Flow Task to the Foreach Loop Container

Drag a Data Flow Task from the Control Flow Items pane to the surface of the Foreach Loop Container. Note that this Data Flow Task is a control flow element and will control the execution of the data flow tasks configured on the Data Flow panel. You may also rename the Data Flow Task to something meaningful like "Split Text Data". The control flow surface with the connection manager will look something as shown in the dialog below:

8. Configure Data Flow

Now, double click the data Flow Task on the loop container. This will select the Data Flow tab panel on the designer and display the "Data Flow Sources", "Data Flow Transformations" and "Data Flow Destinations" on the tool box pane on the left of the designer. Drag a Flat File Source item from the tool box and place on the Data Flow surface. Double click the Flat File Source to open the properties dialog. Select the Flat File Connection manager you have created in this tutorial in the combo box of the dialog.
Now, drag a Conditional Split element from the Data Flow Transformations panel to the Data Flow surface and connect the Flat File Source to the conditional split by dragging the green arrow from the Flat File Source to the Conditional Split. Double click the Conditional Split element to open the dialog as shown below:
 On this editor, you will create condition for splitting the data to send to the source. Because the Conditional Split is linked to the source that has the connection to the Flat File Source, you will be able to create condition based on the data. We will create the following two conditions. The fact that we can specify multiple conditions is a powerful visual and computational feature of SSIS. This is extremely useful in reporting and analysis situations.
  • Pick up all perennial flowers
  • Pick up all seasonal flowers
In the dialog shown above, expand the column names node and drag PlantType column to the table rows below and name the condition and output name as shown in the dialog. This completes the Conditional Split specification. Next we will add the destinations for the two filtered datasets.
Drag two ADO NET destinations to the Data Flow panel. Rename the destination to something meaningful. Drag the output green arrows from the Conditional Split to the destinations. As you drag the green arrow to the destination, you will be prompted with the following dialog to specify the filter condition for the destination. The dialog is shown below:


Select the condition you have specified before and click ok. Now, you need to select the destination table. Double click the destination dialog to open the editor. Choose the ADO NET connection manager from the drop down and and the destination table as shown in the dialog below. Optionally, you can go to the Mappings pane on the editor and setup the mappings between the filtered source columns and destination table columns. You can do this by dragging the source column to the mappping table on the lower panel of the editor dialog. Both the views of this destination dialog editor are shown in the figures below:


 The final layout of the data flow pane should look similar to the one shown in the figure below:
 Now, you can run the package in the Visual Studio environment and verify that the two destination tables are populated with the filtered record sets.

9. Conclusion and Discussion

This simple tutorial demonstrates the usage of conditional split transfomation in SSIS. The sample can be extended to any source and destination with modifications to connections and split conditions. This feature of the SQL Server 2008 provides powerful mechanism for visually designing ETL packages for reporting, analysis and data warehousing scenarios without writing a single line of code.
In subsequent articles in this column of Project, I will cover other advanced SSIS data transformation tools and their usage.


Thanks for reading .. more to come :)

No comments:

Post a Comment