title | description | services | documentationcenter | ms.author | author | manager | ms.reviewer | ms.service | ms.workload | ms.topic | ms.custom | ms.date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Copy data from SAP Business Warehouse via Open Hub | Learn how to copy data from SAP Business Warehouse (BW) via Open Hub to supported sink data stores by using a copy activity in an Azure Data Factory pipeline. | data-factory | | jingwang | linda33wj | shwang | douglasl | data-factory | data-services | conceptual | seo-lt-2019 | 06/12/2020 |
This article outlines how to use the Copy Activity in Azure Data Factory to copy data from an SAP Business Warehouse (BW) via Open Hub. It builds on the copy activity overview article that presents a general overview of copy activity.
Tip
To learn about ADF's overall support for SAP data integration scenarios, see the SAP data integration using Azure Data Factory whitepaper, which gives a detailed introduction to each SAP connector, a comparison, and guidance.
This SAP Business Warehouse via Open Hub connector is supported for the following activities: the copy activity and the Lookup activity.
You can copy data from SAP Business Warehouse via Open Hub to any supported sink data store. For a list of data stores that are supported as sources/sinks by the copy activity, see the Supported data stores table.
Specifically, this SAP Business Warehouse Open Hub connector supports:
- SAP Business Warehouse version 7.01 or higher (in a recent SAP Support Package Stack released after the year 2015). SAP BW/4HANA is not supported by this connector.
- Copying data via an Open Hub Destination local table, which can be backed by a DSO, InfoCube, MultiProvider, DataSource, and so on.
- Copying data using basic authentication.
- Connecting to an SAP application server or SAP message server.
- Retrieving data via RFC.
SAP BW Open Hub Service is an efficient way to extract data from SAP BW. The following diagram shows one of the typical flows customers have in their SAP system, in which case data flows from SAP ECC -> PSA -> DSO -> Cube.
SAP BW Open Hub Destination (OHD) defines the target to which the SAP data is relayed. Any object supported by the SAP Data Transfer Process (DTP) can be used as an open hub data source, for example, DSO, InfoCube, or DataSource. The Open Hub Destination type, which determines where the relayed data is stored, can be a database table (local or remote) or a flat file. This SAP BW Open Hub connector supports copying data from an OHD local table in BW. If you are using other types, you can connect directly to the database or file system by using other connectors.
The ADF SAP BW Open Hub connector offers two optional properties, excludeLastRequest and baseRequestId, which can be used to handle delta load from Open Hub.
- excludeLastRequest: Whether to exclude the records of the last request. Default value is true.
- baseRequestId: The ID of request for delta loading. Once it is set, only data with requestId larger than the value of this property will be retrieved.
Overall, the extraction from SAP InfoProviders to Azure Data Factory (ADF) consists of two steps:
- SAP BW Data Transfer Process (DTP): This step copies the data from an SAP BW InfoProvider to an SAP BW Open Hub table.
- ADF data copy: In this step, the Open Hub table is read by the ADF connector.
In the first step, a DTP is executed. Each execution creates a new SAP request ID. The request ID is stored in the Open Hub table and is then used by the ADF connector to identify the delta. The two steps run asynchronously: the DTP is triggered by SAP, and the ADF data copy is triggered through ADF.
By default, ADF does not read the latest delta from the Open Hub table (the excludeLastRequest option is true). As a result, the data in ADF is not 100% up to date with the data in the Open Hub table (the last delta is missing). In return, this procedure ensures that no rows are lost because of the asynchronous extraction. It works even when ADF reads the Open Hub table while the DTP is still writing into the same table.
You typically store the maximum request ID copied by the last ADF run in a staging data store (such as Azure Blob in the above diagram). That way, the same request is not read a second time by ADF in the subsequent run. Note that the data is not automatically deleted from the Open Hub table.
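As a minimal sketch of this pattern, the copy activity source below passes the previously copied maximum request ID to baseRequestId, so that only requests with a larger ID are retrieved. The pipeline parameter name lastRequestId is hypothetical; it stands in for whatever mechanism you use to read the stored value back from the staging data store.

```json
"source": {
    "type": "SapOpenHubSource",
    "excludeLastRequest": true,
    "baseRequestId": {
        "value": "@pipeline().parameters.lastRequestId",
        "type": "Expression"
    }
}
```

With excludeLastRequest left at true, the most recent request is skipped in this run and picked up by a later run, once a newer request exists in the table.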
For proper delta handling, the same Open Hub table must not contain request IDs from different DTPs. Therefore, you must not create more than one DTP for each Open Hub Destination (OHD). If you need both full and delta extraction from the same InfoProvider, create two OHDs for that InfoProvider.
To use this SAP Business Warehouse Open Hub connector, you need to:
- Set up a Self-hosted Integration Runtime with version 3.13 or above. See the Self-hosted Integration Runtime article for details.
- Download the 64-bit SAP .NET Connector 3.0 from SAP's website, and install it on the Self-hosted IR machine. When installing, in the optional setup steps window, make sure you select the Install Assemblies to GAC option as shown in the following image.
- The SAP user used in the Data Factory BW connector needs the following permissions:
  - Authorization for RFC and SAP BW.
  - Permissions to the "Execute" activity of authorization object "S_SDSAUTH".
- Create an SAP Open Hub Destination of type Database Table with the "Technical Key" option checked. It is also recommended, although not required, to leave Deleting Data from Table unchecked. Use the DTP (execute it directly or integrate it into an existing process chain) to land data from the source object you have chosen (such as a cube) in the Open Hub Destination table.
Tip
For a walkthrough of using SAP BW Open Hub connector, see Load data from SAP Business Warehouse (BW) by using Azure Data Factory.
The following sections provide details about properties that are used to define Data Factory entities specific to SAP Business Warehouse Open Hub connector.
The following properties are supported for SAP Business Warehouse Open Hub linked service:
Property | Description | Required |
---|---|---|
type | The type property must be set to: SapOpenHub | Yes |
server | Name of the server on which the SAP BW instance resides. | Yes |
systemNumber | System number of the SAP BW system. Allowed value: two-digit decimal number represented as a string. | Yes |
messageServer | The host name of the SAP message server. Use to connect to an SAP message server. | No |
messageServerService | The service name or port number of the message server. Use to connect to an SAP message server. | No |
systemId | The ID of the SAP system where the table is located. Use to connect to an SAP message server. | No |
logonGroup | The logon group for the SAP system. Use to connect to an SAP message server. | No |
clientId | Client ID of the client in the SAP BW system. Allowed value: three-digit decimal number represented as a string. | Yes |
language | Language that the SAP system uses. | No (default value is EN) |
userName | Name of the user who has access to the SAP server. | Yes |
password | Password for the user. Mark this field as a SecureString to store it securely in Data Factory, or reference a secret stored in Azure Key Vault. | Yes |
connectVia | The Integration Runtime to be used to connect to the data store. A Self-hosted Integration Runtime is required as mentioned in Prerequisites. | Yes |
Example:
{
    "name": "SapBwOpenHubLinkedService",
    "properties": {
        "type": "SapOpenHub",
        "typeProperties": {
            "server": "<server name>",
            "systemNumber": "<system number>",
            "clientId": "<client id>",
            "userName": "<SAP user>",
            "password": {
                "type": "SecureString",
                "value": "<Password for SAP user>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
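The password description above also allows referencing a secret stored in Azure Key Vault. The following is a sketch of that variant, assuming you have already created an Azure Key Vault linked service in the same data factory; the linked service name and the secret name are placeholders.

```json
{
    "name": "SapBwOpenHubLinkedService",
    "properties": {
        "type": "SapOpenHub",
        "typeProperties": {
            "server": "<server name>",
            "systemNumber": "<system number>",
            "clientId": "<client id>",
            "userName": "<SAP user>",
            "password": {
                "type": "AzureKeyVaultSecret",
                "store": {
                    "referenceName": "<Azure Key Vault linked service name>",
                    "type": "LinkedServiceReference"
                },
                "secretName": "<name of the secret holding the SAP password>"
            }
        },
        "connectVia": {
            "referenceName": "<name of Integration Runtime>",
            "type": "IntegrationRuntimeReference"
        }
    }
}
```

This keeps the SAP password out of the linked service definition; Data Factory resolves the secret when the linked service is used.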
For a full list of sections and properties available for defining datasets, see the Datasets article. This section provides a list of properties supported by the SAP BW Open Hub dataset.
To copy data from SAP BW Open Hub, set the type property of the dataset to SapOpenHubTable. The following properties are supported.
Property | Description | Required |
---|---|---|
type | The type property must be set to SapOpenHubTable. | Yes |
openHubDestinationName | The name of the Open Hub Destination to copy data from. | Yes |
If you previously set excludeLastRequest and baseRequestId in the dataset, that configuration is still supported as-is, but you are encouraged to use the new model in the activity source going forward.
Example:
{
    "name": "SAPBWOpenHubDataset",
    "properties": {
        "type": "SapOpenHubTable",
        "typeProperties": {
            "openHubDestinationName": "<open hub destination name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<SAP BW Open Hub linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}
For a full list of sections and properties available for defining activities, see the Pipelines article. This section provides a list of properties supported by SAP BW Open Hub source.
To copy data from SAP BW Open Hub, the following properties are supported in the copy activity source section:
Property | Description | Required |
---|---|---|
type | The type property of the copy activity source must be set to SapOpenHubSource. | Yes |
excludeLastRequest | Whether to exclude the records of the last request. | No (default is true) |
baseRequestId | The ID of request for delta loading. Once it is set, only data with requestId larger than the value of this property will be retrieved. | No |
Tip
If your Open Hub table contains only the data generated by a single request ID (for example, you always do a full load and overwrite the existing data in the table, or you ran the DTP only once for a test), remember to uncheck the "excludeLastRequest" option in order to copy the data out.
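In pipeline JSON, the case described in the tip above corresponds to setting excludeLastRequest to false in the copy activity source. A minimal sketch of that source section:

```json
"source": {
    "type": "SapOpenHubSource",
    "excludeLastRequest": false
}
```

Because the only request in the table is also the last one, leaving the property at its default of true would copy no rows.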
To speed up the data loading, you can set parallelCopies on the copy activity to load data from SAP BW Open Hub in parallel. For example, if you set parallelCopies to four, Data Factory concurrently executes four RFC calls, and each RFC call retrieves a portion of data from your SAP BW Open Hub table, partitioned by the DTP request ID and package ID. This applies when the number of unique DTP request ID + package ID combinations is bigger than the value of parallelCopies. When copying data into a file-based data store, it's also recommended to write to a folder as multiple files (specify only the folder name), in which case the performance is better than writing to a single file.
Example:
"activities":[
    {
        "name": "CopyFromSAPBWOpenHub",
        "type": "Copy",
        "inputs": [
            {
                "referenceName": "<SAP BW Open Hub input dataset name>",
                "type": "DatasetReference"
            }
        ],
        "outputs": [
            {
                "referenceName": "<output dataset name>",
                "type": "DatasetReference"
            }
        ],
        "typeProperties": {
            "source": {
                "type": "SapOpenHubSource",
                "excludeLastRequest": true
            },
            "sink": {
                "type": "<sink type>"
            },
            "parallelCopies": 4
        }
    }
]
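The recommendation to write to a folder as multiple files can be illustrated with a sink dataset that specifies a folder but no file name. The sketch below assumes a delimited-text sink on Azure Blob storage, which this article does not prescribe; the dataset name, linked service name, container, and folder are all placeholders.

```json
{
    "name": "<output dataset name>",
    "properties": {
        "type": "DelimitedText",
        "linkedServiceName": {
            "referenceName": "<Azure Blob Storage linked service name>",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "<container name>",
                "folderPath": "<folder name>"
            },
            "columnDelimiter": ",",
            "firstRowAsHeader": true
        }
    }
}
```

Because the location has no fileName, the copy activity can write several files into the folder instead of a single file.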
When copying data from SAP BW Open Hub, the following mappings are used from SAP BW data types to Azure Data Factory interim data types. See Schema and data type mappings to learn about how copy activity maps the source schema and data type to the sink.
SAP ABAP Type | Data Factory interim data type |
---|---|
C (String) | String |
I (Integer) | Int32 |
F (Float) | Double |
D (Date) | String |
T (Time) | String |
P (BCD Packed, Currency, Decimal, Qty) | Decimal |
N (Numc) | String |
X (Binary and Raw) | String |
To learn details about the properties, check Lookup activity.
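As a rough sketch, a Lookup activity that reads from SAP BW Open Hub could reuse the same source type and dataset as the copy activity. The activity name is hypothetical, and the values chosen for excludeLastRequest and firstRowOnly are illustrative only.

```json
{
    "name": "LookupFromSAPBWOpenHub",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "SapOpenHubSource",
            "excludeLastRequest": false
        },
        "dataset": {
            "referenceName": "<SAP BW Open Hub input dataset name>",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}
```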
Symptoms: If you are running SAP BW on HANA and observe that only a subset of the data (1 million rows) is copied by the ADF copy activity, the possible cause is that the "SAP HANA Execution" option is enabled in your DTP, in which case ADF can retrieve only the first batch of data.
Resolution: Disable the "SAP HANA Execution" option in the DTP, reprocess the data, and then run the copy activity again.
For a list of data stores supported as sources and sinks by the copy activity in Azure Data Factory, see supported data stores.