diff --git a/inlong-sdk/dirty-data-sdk/README.md b/inlong-sdk/dirty-data-sdk/README.md
new file mode 100644
index 0000000000..e5800db856
--- /dev/null
+++ b/inlong-sdk/dirty-data-sdk/README.md
@@ -0,0 +1,47 @@
+## Overview
+
+This SDK collects dirty data and stores it in a designated storage location.
+
+## Features
+
+### Independent SDK
+
+The SDK does not depend on platform-specific libraries (such as Flink), so it can be used by the
+Agent, DataProxy, and Sort modules.
+
+### Extensible data storage options
+
+Dirty data can be stored in different storage locations (currently, only sending to DataProxy is
+supported).
+
+## Usage
+
+### Create a DirtyDataCollector object
+
+```java
+    Map<String, String> configMap = new ConcurrentHashMap<>();
+    configMap.put(DIRTY_COLLECT_ENABLE, "true");
+    configMap.put(DIRTY_SIDE_OUTPUT_IGNORE_ERRORS, "true");
+    configMap.put(DIRTY_SIDE_OUTPUT_CONNECTOR, "inlong");
+    configMap.put(DIRTY_SIDE_OUTPUT_LABELS, "key1=value1&key2=value2");
+    configMap.put(DIRTY_SIDE_OUTPUT_LOG_TAG, "DirtyData");
+    Configure config = new Configure(configMap);
+
+    DirtyDataCollector collector = new DirtyDataCollector();
+    collector.open(config);
+```
+
+### Collect dirty data
+
+```java
+    // Dirty data is often data that failed to parse,
+    // so it is passed as a raw byte[].
+    byte[] dirtyData = "xxxxxxxxxyyyyyyyyyyyyyy".getBytes(StandardCharsets.UTF_8);
+    // The dirty type labels the kind of error, such as missing fields, type errors, or unknown errors.
+    String dirtyType = "Undefined";
+    // Details of the error can be passed here.
+    Throwable error = new Throwable();
+    collector.invoke(dirtyData, dirtyType, error);
+```
+
+|
\ No newline at end of file
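
The `DIRTY_SIDE_OUTPUT_LABELS` value in the README above is written as `key=value` pairs joined by `&`. The diff does not show how the SDK reads that string, so the following is only a minimal sketch of the format itself. `DirtyLabelParser` is a hypothetical helper name, not part of the SDK, and the choice to skip malformed entries is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of the dirty-data SDK.
// It only illustrates the "key1=value1&key2=value2" label format.
final class DirtyLabelParser {

    private DirtyLabelParser() {
    }

    // Splits a label string into an insertion-ordered map of key to value.
    static Map<String, String> parse(String labels) {
        Map<String, String> result = new LinkedHashMap<>();
        if (labels == null || labels.trim().isEmpty()) {
            return result;
        }
        for (String pair : labels.split("&")) {
            int idx = pair.indexOf('=');
            if (idx <= 0) {
                // Assumption: entries without '=' or without a key are skipped.
                continue;
            }
            String key = pair.substring(0, idx).trim();
            if (key.isEmpty()) {
                continue;
            }
            result.put(key, pair.substring(idx + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        // Uses the same sample value as the README snippet.
        System.out.println(parse("key1=value1&key2=value2"));
    }
}
```

Validating the label string this way before placing it in `configMap` may help catch a malformed entry early, though whether the SDK tolerates or rejects such entries is not stated in the diff.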