docs: add architecture diagram
tutunannan committed Dec 10, 2024
1 parent ddae2b2 commit 49904d3
</div>

Kubedoop Data Platform is a modular open-source data platform that provides Kubernetes-native deployment
and management of popular open source data apps like Apache Kafka, Apache Doris, Trino and Apache Spark.
All data apps work together seamlessly, and can be added or removed in no time.
Based on Kubernetes, it allows you to deploy, scale and manage data infrastructure in any environment, on-premises or in the cloud.

You can build these environments declaratively, and we don’t stop at the tool level: users can also interact with the platform itself through the same "as Code" approach.

All this makes Kubedoop Data Platform an ideal tool for scenarios including modern Data Warehouses, Data Lakehouses, Event Streaming, Machine Learning or Data Meshes.
Use it to create unique and enterprise-class data architectures.

## Features

* **Roles and role groups**: Different processes have different tasks that they need to fulfill, which in turn have different configuration settings that only apply to that task.
For example coordination, storage, logging and processing tasks require different amounts of threads, memory and storage capacity.
A running product instance is made up of multiple pieces of software called roles, and each role can be further subdivided into role groups.
* **Service discovery ConfigMap**: This ConfigMap has the same name as the product instance and contains information about how to connect to the instance.
It is used by other Operators to connect products together and can also be used by you, the user, to connect external software to Kubedoop-operated software.
* **Product image selection**: Uses the publicly available images from the image registry hosted by Kubedoop.
Custom Docker registries can be used to fetch images from a local registry rather than from the internet.
Custom images allow you to provide self-hosted or user-created images (e.g. user-extended Kubedoop images).
The authentication mechanism needs to be configured only in the AuthenticationClass.
* **Logging**: Logging is important for observability of the platform. Kubedoop provides human-readable plaintext logs for each running container,
as well as aggregated and persisted logs with identical structure across the whole platform. Log levels can be set for individual modules and configuration
is identical across all products, but custom logging configuration files can be supplied as well.
* **S3 resources**: Uses S3Connection and S3Bucket objects to configure access to S3 storage. An S3Connection object contains information such as the host name of the S3 server, its port,
TLS parameters and access credentials. An S3Bucket contains the name of the bucket and a reference to an S3Connection,
the connection to the server where the bucket is located. An S3Connection can be referenced by multiple buckets.
* **TLS server verification**: A TLS section is part of Kubedoop CRDs and describes how to connect to a TLS enabled system like LDAP or S3.
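
The roles, role groups and S3 bullets above all describe pieces of a declarative manifest. The following sketch shows how they could fit together in practice; the apiVersion, kind and field names here are hypothetical illustrations of the structure described above, not the actual Kubedoop CRD schema, so consult each operator's CRD reference for the real field names:

```yaml
# Illustrative only: apiVersion, kinds and field names are hypothetical.
apiVersion: example.kubedoop.dev/v1alpha1
kind: KafkaCluster
metadata:
  name: my-kafka            # the discovery ConfigMap gets this same name
spec:
  brokers:                  # a role: one kind of process with its own settings
    config:
      resources:
        memory: 2Gi         # per-role default configuration
    roleGroups:
      default:
        replicas: 3         # replicas of the role, grouped together
      high-memory:
        replicas: 1
        config:
          resources:
            memory: 8Gi     # role-group override of the role default
---
apiVersion: example.kubedoop.dev/v1alpha1
kind: S3Connection
metadata:
  name: minio
spec:
  host: minio.example.svc   # host name of the S3 server
  port: 9000                # TLS parameters and credentials would go here too
---
apiVersion: example.kubedoop.dev/v1alpha1
kind: S3Bucket
metadata:
  name: warehouse
spec:
  bucketName: warehouse
  connection:
    reference: minio        # many buckets can share one S3Connection
```

The key structural idea is the layering: settings on the role apply to every replica, while a role group can override them for a subset of replicas, and shared infrastructure such as an S3Connection is defined once and referenced by name.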

## Architecture

Kubedoop Data Platform uses a loosely-coupled architecture that was designed with openness and flexibility in mind.
Its core is a collection of Kubernetes Operators and custom resources which are designed to work together.

![Architecture](docs/assets/kubedoop-architecture.drawio.png)


## Getting Started


| Product | Version |
|-------------------------------|-------------------------------------------------------|
| [Apache Airflow](https://github.com/zncdatadev/airflow-operator) | 2.9.3 (LTS), 2.10.2 (experimental) |
| [Apache DolphinScheduler](https://github.com/zncdatadev/dolphinscheduler-operator) | 3.2.2 (LTS) |
| [Apache Doris](https://github.com/zncdatadev/doris-operator) | 2.1.7 (LTS) |
| [Apache Hadoop HDFS](https://github.com/zncdatadev/hdfs-operator) | 3.3.4, 3.3.6, 3.4.0 (LTS) |
| [Apache HBase](https://github.com/zncdatadev/hbase-operator) | 2.4.18 (LTS), 2.6.0 (experimental) |
| [Apache Hive](https://github.com/zncdatadev/hive-operator) | 3.1.3 (LTS), 4.0.0 (experimental) |
| [Apache Kafka](https://github.com/zncdatadev/kafka-operator) | 3.7.1 (LTS), 3.8.0 |
| [Apache Kyuubi](https://github.com/zncdatadev/kyuubi-operator) | 1.9.3 (LTS) |
| [Apache Nifi](https://github.com/zncdatadev/nifi-operator) | 1.27.0 (LTS), 2.0.0 (experimental) |
| [Apache Spark](https://github.com/zncdatadev/spark-k8s-operator) | 3.5.1, 3.5.2 (LTS) |
| [Apache Superset](https://github.com/zncdatadev/superset-operator) | 3.1.3, 4.0.2 (LTS) |
| [Trino](https://github.com/zncdatadev/trino-operator) | 451 (LTS), 455 |
| [Apache Zookeeper](https://github.com/zncdatadev/zookeeper-operator) | 3.8.4, 3.9.2 (LTS) |

## Supported Platforms

We develop and test our operators on the following platforms:
* Kubernetes 1.26+
* K3s
* MicroK8s
* Alibaba ACK
* Tencent TKE


## Contributing
Binary file added docs/assets/kubedoop-architecture.drawio.png
