From 81e93af201f84079ea45afcff6def6431489d4b5 Mon Sep 17 00:00:00 2001
From: wenweihuang
Date: Thu, 17 Oct 2024 15:33:29 +0800
Subject: [PATCH] [INLONG-11352][SDK] Add readme

---
 inlong-sdk/dirty-data-sdk/README.md | 47 +++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)
 create mode 100644 inlong-sdk/dirty-data-sdk/README.md

diff --git a/inlong-sdk/dirty-data-sdk/README.md b/inlong-sdk/dirty-data-sdk/README.md
new file mode 100644
index 0000000000..e5800db856
--- /dev/null
+++ b/inlong-sdk/dirty-data-sdk/README.md
@@ -0,0 +1,47 @@
+## Overview
+
+This SDK collects dirty data and stores it in a designated storage location.
+
+## Features
+
+### Independent SDK
+
+The SDK does not depend on platform-specific libraries (such as Flink), so it can be used by the
+Agent, DataProxy, and Sort modules.
+
+### Extensible data storage options
+
+Dirty data can be written to different storage backends (currently only sending to DataProxy is
+supported).
+
+## Usage
+
+### Create DirtyDataCollector object
+
+```java
+Map<String, String> configMap = new ConcurrentHashMap<>();
+configMap.put(DIRTY_COLLECT_ENABLE, "true");
+configMap.put(DIRTY_SIDE_OUTPUT_IGNORE_ERRORS, "true");
+configMap.put(DIRTY_SIDE_OUTPUT_CONNECTOR, "inlong");
+configMap.put(DIRTY_SIDE_OUTPUT_LABELS, "key1=value1&key2=value2");
+configMap.put(DIRTY_SIDE_OUTPUT_LOG_TAG, "DirtyData");
+Configure config = new Configure(configMap);
+
+DirtyDataCollector collector = new DirtyDataCollector();
+collector.open(config);
+```
+
+### Collect dirty data
+
+```java
+// Dirty data is usually data that could not be parsed correctly,
+// so it is passed as a raw byte[].
+byte[] dirtyData = "xxxxxxxxxyyyyyyyyyyyyyy".getBytes(StandardCharsets.UTF_8);
+// The dirty type labels the kind of error, such as missing fields, type errors, or unknown errors.
+String dirtyType = "Undefined";
+// The Throwable carries the details of the error.
+Throwable error = new Throwable();
+collector.invoke(dirtyData, dirtyType, error);
+```
+
+ |
\ No newline at end of file
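
As a rough illustration of where the `invoke` call would typically sit, the sketch below parses a batch of records and hands every record that fails to parse to a sink, as raw bytes together with a type label and the cause. `DirtyDataExample`, `DirtySink`, `parseAll`, and the `"TypeError"` label are hypothetical names introduced here so the example compiles without the SDK; `DirtySink` only mirrors the three-argument `invoke` call from the README and is not part of the SDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class DirtyDataExample {

    /** Hypothetical stand-in mirroring the three-argument invoke call shown in the README. */
    public interface DirtySink {
        void invoke(byte[] dirtyData, String dirtyType, Throwable error);
    }

    /** Parses each record as a long; records that fail to parse are sent to the sink. */
    public static List<Long> parseAll(List<byte[]> records, DirtySink sink) {
        List<Long> parsed = new ArrayList<>();
        for (byte[] record : records) {
            try {
                String text = new String(record, StandardCharsets.UTF_8).trim();
                parsed.add(Long.parseLong(text));
            } catch (NumberFormatException e) {
                // Pass the untouched bytes, a type label (assumed value), and the cause.
                sink.invoke(record, "TypeError", e);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        // In real use the sink would delegate to an opened DirtyDataCollector;
        // here it only records the dirty types it receives.
        List<String> dirtyTypes = new ArrayList<>();
        DirtySink sink = (data, type, error) -> dirtyTypes.add(type);

        List<byte[]> records = new ArrayList<>();
        records.add("42".getBytes(StandardCharsets.UTF_8));
        records.add("xxxxxxxxxyyyyyyyyyyyyyy".getBytes(StandardCharsets.UTF_8));

        List<Long> parsed = parseAll(records, sink);
        System.out.println(parsed + " " + dirtyTypes); // prints "[42] [TypeError]"
    }
}
```

Keeping the sink behind a small interface is only a convenience for this sketch: it lets the parsing loop be exercised without a running DataProxy, and a real caller could pass `collector::invoke` if the SDK method has a matching signature.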