Synchronous and Asynchronous Processing
The synchronous nature of a transformation refers to its row-processing behavior and whether it must first receive all rows from the input before it is able to process and begin returning rows in its output. A truly synchronous transformation processes a row as it flows through its input, performs the transformation, and immediately outputs the row. Synchronous transformations manipulate data that already exists in a buffer faster than a transformation that must copy records into a new buffer.
An asynchronous transformation must wait for all rows or a group of them before it can perform the transformation. This requires management of data in buffer space and blocks the flow of rows to subsequent components in the data flow pipeline. Some transformations must process only groups of rows (for example, all rows matching a value or having a sorted column value) and might exhibit partially synchronous behavior by releasing a group of processed rows for output after transforming the row group.
Sunday, July 13, 2008
Optimizing SSIS Packages - Step 3
Execution Trees:
Interrelated components are clustered in a package to form execution trees. Upon careful examination, you will find that some of these source adapters, transformations, and destination adapters might have no dependencies at all. Other components might have fewer linear dependencies than originally thought. Generally, each source adapter will be assigned a separate thread of execution. Destination adapters and asynchronous transformations will also be assigned separate threads. Multithreaded operations have the advantage of using the computer's processor(s) more efficiently with instructions executed in parallel.
Separate memory buffers enable threads to operate on data independently and in parallel. Use the multicast transform to generate multiple buffers. This is one of the easiest and most efficient ways to allocate additional memory buffers and to cast multiple execution threads to use the server's physical resources efficiently to improve performance.
Interrelated components are clustered in a package to form execution trees. Upon careful examination, you will find that some of these source adapters, transformations, and destination adapters might have no dependencies at all. Other components might have fewer linear dependencies than originally thought. Generally, each source adapter will be assigned a separate thread of execution. Destination adapters and asynchronous transformations will also be assigned separate threads. Multithreaded operations have the advantage of using the computer's processor(s) more efficiently with instructions executed in parallel.
Separate memory buffers enable threads to operate on data independently and in parallel. Use the multicast transform to generate multiple buffers. This is one of the easiest and most efficient ways to allocate additional memory buffers and to cast multiple execution threads to use the server's physical resources efficiently to improve performance.
Optimizing SSIS Packages - Step 2
Memory Buffer Architecture:
A fundamental understanding of the memory management architecture in SSIS will help you make intelligent choices about how to assist SSIS to move data more efficiently through the pipeline. Some transformations require data to be copied or moved from one buffer to another while others do not. Because some transformations depend on the output of others, this can create blocks and bottlenecks in the process. Some operations require that a preceding transformation completely finish its work before it can begin to process any rows. This is said to be a blocking transformation. Other transformations must begin their duty, but rows can begin to flow to another transformation before the entire process has completed. In this case, the operation of the first transformation only partially blocks the second. Finally, there can be data flows and transformations in the package that are not at all dependent on each other and do not block other transformations in the data flow.
Buffer Usage:
Behind the scenes, the Data Pipeline engine uses a buffer-oriented architecture to load and manipulate data sets in memory efficiently. The benefit of this in-memory processing is that you do not need to copy and stage data physically at each step of the data integration. Rather, the Data Pipeline engine manipulates data as it is transferred from source to destination.
A fundamental understanding of the memory management architecture in SSIS will help you make intelligent choices about how to assist SSIS to move data more efficiently through the pipeline. Some transformations require data to be copied or moved from one buffer to another while others do not. Because some transformations depend on the output of others, this can create blocks and bottlenecks in the process. Some operations require that a preceding transformation completely finish its work before it can begin to process any rows. This is said to be a blocking transformation. Other transformations must begin their duty, but rows can begin to flow to another transformation before the entire process has completed. In this case, the operation of the first transformation only partially blocks the second. Finally, there can be data flows and transformations in the package that are not at all dependent on each other and do not block other transformations in the data flow.
Buffer Usage:
Behind the scenes, the Data Pipeline engine uses a buffer-oriented architecture to load and manipulate data sets in memory efficiently. The benefit of this in-memory processing is that you do not need to copy and stage data physically at each step of the data integration. Rather, the Data Pipeline engine manipulates data as it is transferred from source to destination.
Optimizing SSIS Packages - Step1
SSIS Engine Overview:
As you peel back the layers of the Integration Services processing architecture, there are two separate subcomponents that can have a significant impact on performance and package efficiency. These are the SSIS Runtime engine and Data Pipeline engine. It is important to familiarize yourself with the purpose and behavior of these engines to make performance-tuning decisions.
Runtime Engine:
The Runtime engine is a highly parallel workflow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks. For the most part, the performance of the Runtime engine is most heavily influenced by conditions external to SSIS such as the number of available threads, the network bandwidth, and the interaction with external systems such as database servers, File Transfer Protocol (FTP) servers, or e-mail servers. When SSIS runs an Execute SQL task, for example, it sends a call to the target database and then waits for a response from the database server before it continues. In this scenario, the performance of the Execute SQL task is more dependent on the performance of the query execution than on the SSIS Runtime engine.
As you peel back the layers of the Integration Services processing architecture, there are two separate subcomponents that can have a significant impact on performance and package efficiency. These are the SSIS Runtime engine and Data Pipeline engine. It is important to familiarize yourself with the purpose and behavior of these engines to make performance-tuning decisions.
Runtime Engine:
The Runtime engine is a highly parallel workflow engine that coordinates the execution of tasks or units of work within SSIS and manages the engine threads that carry out those tasks. For the most part, the performance of the Runtime engine is most heavily influenced by conditions external to SSIS such as the number of available threads, the network bandwidth, and the interaction with external systems such as database servers, File Transfer Protocol (FTP) servers, or e-mail servers. When SSIS runs an Execute SQL task, for example, it sends a call to the target database and then waits for a response from the database server before it continues. In this scenario, the performance of the Execute SQL task is more dependent on the performance of the query execution than on the SSIS Runtime engine.
Optimizing SSIS Packages
These are the main steps to tune the SSIS Packages:
1.Generally understand the SSIS engine components and processes.
2.Review the memory buffer architecture.
3.Monitor package execution trees and their impact on package performance.
4.Manage buffer properties.
5.Manage task and component parallelism.
6.Understanding performance management essentials.
7.Use iterative optimization techniques to simplify and tune a package design.
Each step will be described breifly in the next coming posts,please check.
1.Generally understand the SSIS engine components and processes.
2.Review the memory buffer architecture.
3.Monitor package execution trees and their impact on package performance.
4.Manage buffer properties.
5.Manage task and component parallelism.
6.Understanding performance management essentials.
7.Use iterative optimization techniques to simplify and tune a package design.
Each step will be described breifly in the next coming posts,please check.
Steps to Manage SSIS Packages
1.Create an OnPreExecute event handler:
In BIDS, on the Event Handlers tab, in the Executable drop-down list, select NewProducts, and then click OK. In the Event Handler drop-down list, select OnPreExecute. In the Event Handlers design environment, click the blue text that reads "Click here to create an OnPreExecute event handler for executable NewProducts."
2.Add a task to an event handler:
In BIDS, while on the Event Handlers tab, click Toolbox, and then drag the event handler onto the design environment.
3.Configure a task:
In BIDS, double-click the task to open its editor, and then make the configuration changes.
4.Map SSIS variables to SQL statement parameters:
In BIDS, on the Control Flow tab, double-click the SQL task to open its editor. In the Execute SQL Task Editor, click Parameter Mapping, click Add, and then configure the variable.
5.Create a file system task and a connection manager:
In BIDS, while on the Event Handlers tab, click Toolbox, and then drag File System Task onto the designer. Double-click the task and, in the File System Task Editor, type the name, destination connection, and source connection.
6.Set connection manager settings:
In BIDS, in the Connection Managers pane, right-click the connection manager, and then click Properties. In the Properties list, expand Expressions, and then click the ellipses button. In the Property Expressions Editor, change the settings.
7.Prevent events from escalating to a container:
In BIDS, while on the Event Handlers tab, on the SSIS menu, click Variables. In the Variables list, in the Value column drop-down list, change the value.
8.Change error count properties:
In BIDS, on the Control Flow tab, right-click on the design surface, and then click Properties. In the Properties list, locate MaximumErrorCount, and then change the value.
In BIDS, on the Event Handlers tab, in the Executable drop-down list, select NewProducts, and then click OK. In the Event Handler drop-down list, select OnPreExecute. In the Event Handlers design environment, click the blue text that reads "Click here to create an OnPreExecute event handler for executable NewProducts."
2.Add a task to an event handler:
In BIDS, while on the Event Handlers tab, click Toolbox, and then drag the event handler onto the design environment.
3.Configure a task:
In BIDS, double-click the task to open its editor, and then make the configuration changes.
4.Map SSIS variables to SQL statement parameters:
In BIDS, on the Control Flow tab, double-click the SQL task to open its editor. In the Execute SQL Task Editor, click Parameter Mapping, click Add, and then configure the variable.
5.Create a file system task and a connection manager:
In BIDS, while on the Event Handlers tab, click Toolbox, and then drag File System Task onto the designer. Double-click the task and, in the File System Task Editor, type the name, destination connection, and source connection.
6.Set connection manager settings:
In BIDS, in the Connection Managers pane, right-click the connection manager, and then click Properties. In the Properties list, expand Expressions, and then click the ellipses button. In the Property Expressions Editor, change the settings.
7.Prevent events from escalating to a container:
In BIDS, while on the Event Handlers tab, on the SSIS menu, click Variables. In the Variables list, in the Value column drop-down list, change the value.
8.Change error count properties:
In BIDS, on the Control Flow tab, right-click on the design surface, and then click Properties. In the Properties list, locate MaximumErrorCount, and then change the value.
Maintaining Data Consistency with Transactions
Transactions are used to maintain data integrity by ensuring that a database remains in a consistent state even if a package fails. They accomplish this by binding the database actions that tasks perform into units, which either succeed or fail together. Because all the database actions that are part of a transaction are either committed or rolled back together, you can ensure that data remains in a consistent state. For example, if you have a package with multiple data flow tasks, each updating and inserting data into a different database table, and the package fails, all of the database update actions performed by the data flow tasks are rolled back together, thereby ensuring a consistent state. If all the data flow tasks succeed, the changes performed are committed together.
Configuring Transactions
Transactions can be enabled for all SSIS container types, including tasks, containers, and packages. You configure transactions by using the container's TransactionOption property, which is set in the SSIS design environment.
Configuring Transactions
Transactions can be enabled for all SSIS container types, including tasks, containers, and packages. You configure transactions by using the container's TransactionOption property, which is set in the SSIS design environment.
Subscribe to:
Posts (Atom)