This lab demonstrates how Apache Pinot can be leveraged for real-time log analytics. Logs generated by applications, systems, and services contain valuable insights that can help monitor performance, identify anomalies, and troubleshoot issues. By using Apache Pinot, a distributed real-time analytics engine, you can query and analyze large volumes of logs in milliseconds, gaining actionable insights quickly. This lab provides a hands-on introduction to using Apache Pinot for log analytics, guiding you through data ingestion, schema creation, and running queries to visualize log data.
The objectives of this lab are to:
- Introduce participants to the basics of using Apache Pinot for real-time log analytics.
- Demonstrate how to ingest and query log data in Apache Pinot.
- Show how to create and define schemas for effective log data organization.
- Illustrate the steps for running real-time analytical queries on log data.
- Provide insights into using real-time analytics to monitor and troubleshoot applications.
By the end of this lab, participants will be able to:
- Set up and configure Apache Pinot for log analytics.
- Ingest log data from a source into Apache Pinot in real-time.
- Define schemas tailored to log data analysis for optimized querying.
- Run queries to extract meaningful insights from logs with minimal latency.
- Apply real-time analytics principles to monitor systems, detect anomalies, and improve overall system performance.
Start the environment:
docker-compose up -d
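The docker-compose.yml bundled with the lab is authoritative; the sketch below only illustrates the kind of services it is assumed to define (ZooKeeper and the Pinot broker/server are omitted for brevity, and image tags are illustrative). The container names workshop-kafka and pinot-controller match the commands used later in this lab.

```yaml
# Illustrative sketch only -- use the compose file shipped with the lab.
services:
  kafka:
    image: bitnami/kafka:latest
    container_name: workshop-kafka
    ports:
      - "9092:9092"
  pinot-controller:
    image: apachepinot/pinot:latest
    container_name: pinot-controller
    command: "StartController -zkAddress zookeeper:2181"
    ports:
      - "9000:9000"
  superset:
    image: apache/superset:latest
    ports:
      - "8088:8088"
```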
Run the following command to create the logs topic in the Kafka container:
docker exec workshop-kafka /opt/bitnami/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic logs
Run the following command inside the Pinot Controller container to create the table:
./bin/pinot-admin.sh AddTable -schemaFile /scripts/schema.json -tableConfigFile /scripts/table.json -controllerHost pinot-controller -exec
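The actual /scripts/schema.json shipped with the lab is authoritative; the following is only a hedged sketch of a Pinot schema covering the log fields this demo uses (field names and types are assumptions based on the sample log shown later in this lab):

```json
{
  "schemaName": "logs",
  "dimensionFieldSpecs": [
    {"name": "ip", "dataType": "STRING"},
    {"name": "method", "dataType": "STRING"},
    {"name": "uri", "dataType": "STRING"},
    {"name": "protocol", "dataType": "STRING"},
    {"name": "version", "dataType": "STRING"},
    {"name": "domain", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "response-code", "dataType": "INT"},
    {"name": "time", "dataType": "INT"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "STRING",
      "format": "SIMPLE_DATE_FORMAT|yyyy-MM-dd HH:mm:ss",
      "granularity": "1:SECONDS"
    }
  ]
}
```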
Run the following commands to set up Superset:
This step creates the Superset admin account and only needs to be run once per container.
docker ps # to get the container id
docker exec -it <containerid> superset fab create-admin --username admin --firstname Superset --lastname Admin --email [email protected] --password admin
docker exec -it <containerid> superset db upgrade
docker exec -it <containerid> superset init
Run the Python app:
pip install --upgrade pip
pip install --upgrade kafka-python
python ./scripts/generate_and_post_log.py
In this demo app, we simulate an observability use case. A simple Python script generates synthetic HTTPS response logs and posts them to a Kafka topic. Apache Pinot then ingests these messages and allows real-time querying of the data. Finally, we use Superset to build dashboards that display real-time counts of errors, error types, and so on.
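The bundled generate_and_post_log.py is authoritative; the sketch below only illustrates the same idea, assuming the kafka-python producer and the log fields shown below (field values and the one-message-per-second loop follow this lab's description):

```python
import json
import random
import time
from datetime import datetime

# Methods and response codes used by this demo (see the lists below).
METHODS = ["CONNECT", "DELETE", "GET", "HEAD", "OPTIONS",
           "PATCH", "POST", "PUT", "TRACE"]
RESPONSE_CODES = [100, 200, 300, 400, 500]


def generate_log():
    """Build one simulated HTTPS response log record."""
    return {
        "ip": "192.168.{}.{}".format(random.randint(0, 255), random.randint(0, 255)),
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "method": random.choice(METHODS),
        "uri": "/",
        "protocol": "HTTPS",
        "version": "1.2",
        "response-code": random.choice(RESPONSE_CODES),
        "time": random.randint(1, 10000),  # simulated response time
        "domain": "www.test.com",
    }


def run(bootstrap_servers="localhost:9092", topic="logs"):
    """Post one log per second to the Kafka topic (needs a running broker)."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    while True:
        producer.send(topic, generate_log())
        time.sleep(1)


# run()  # uncomment to start streaming (requires the workshop-kafka broker to be up)
```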
Here's an example of the simplified log format we will be using:
{
  "ip": "192.168.198.92",
  "timestamp": "2024-09-08 10:45:23",
  "method": "GET",
  "uri": "/",
  "protocol": "HTTPS",
  "version": "1.2",
  "response-code": 200,
  "time": 6394,
  "domain": "www.test.com"
}
Here's a list of methods we will be using:
- CONNECT
- DELETE
- GET
- HEAD
- OPTIONS
- PATCH
- POST
- PUT
- TRACE
Here's a list of response codes we will be using:
- 100
- 200
- 300
- 400
- 500
Note that we are keeping the list brief for demo purposes; a real HTTP request can return many more response codes.
The Python script creates one message per second and sends it to the Kafka topic logs.
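Once data is flowing, Pinot tables can be queried with SQL. As a hedged sketch, here is one way to count responses per status code from Python using the pinotdb client; the table name logs and broker port 8099 are assumptions, so adjust them to match the lab's table config:

```python
# Hypothetical query helper; assumes the pinotdb client (pip install pinotdb)
# and a Pinot broker reachable on localhost:8099.
ERROR_COUNT_SQL = """
SELECT "response-code", COUNT(*) AS hits
FROM logs
GROUP BY "response-code"
ORDER BY hits DESC
"""


def fetch_error_counts(host="localhost", port=8099):
    """Run the aggregation above and return (response-code, hits) rows."""
    from pinotdb import connect  # pip install pinotdb
    conn = connect(host=host, port=port, path="/query/sql", scheme="http")
    cur = conn.cursor()
    cur.execute(ERROR_COUNT_SQL)
    return cur.fetchall()


# fetch_error_counts()  # requires the stack from docker-compose up and data flowing
```

Superset can chart the same aggregation directly once Pinot is registered as a database.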
To tear down the environment, run the following Docker command:
docker-compose down