Sample app using Apache Pinot for Log Analytics

Introduction

This lab demonstrates how Apache Pinot can be leveraged for real-time log analytics. Logs generated by applications, systems, and services contain valuable insights that can help monitor performance, identify anomalies, and troubleshoot issues. By using Apache Pinot, a distributed real-time analytics engine, you can query and analyze large volumes of logs in milliseconds, gaining actionable insights quickly. This lab provides a hands-on introduction to using Apache Pinot for log analytics, guiding you through data ingestion, schema creation, and running queries to visualize log data.

Objective

  • Introduce participants to the basics of using Apache Pinot for real-time log analytics.
  • Demonstrate how to ingest and query log data in Apache Pinot.
  • Show how to create and define schemas for effective log data organization.
  • Illustrate the steps for running real-time analytical queries on log data.
  • Provide insights into using real-time analytics to monitor and troubleshoot applications.

Desired Outcomes

By the end of this lab, participants will be able to:

  • Set up and configure Apache Pinot for log analytics.
  • Ingest log data from a source into Apache Pinot in real-time.
  • Define schemas tailored to log data analysis for optimized querying.
  • Run queries to extract meaningful insights from logs with minimal latency.
  • Apply real-time analytics principles to monitor systems, detect anomalies, and improve overall system performance.

Installation

Run the environment

docker-compose up -d

Run the following command to create the topic in the Kafka container:

docker exec workshop-kafka /opt/bitnami/kafka/bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 1 --partitions 1 --topic logs

Run the following command in the Pinot Controller container to create the tables:

./bin/pinot-admin.sh AddTable -schemaFile /scripts/schema.json -tableConfigFile /scripts/table.json -controllerHost pinot-controller -exec
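The actual schema ships in /scripts/schema.json in this repo. For orientation, a minimal Pinot schema matching the sample log format shown later might look like the sketch below; the field names and types here are assumptions based on that sample, not a copy of the repo's file (the exact dateTimeFieldSpecs format string in particular may differ by Pinot version):

```json
{
  "schemaName": "logs",
  "dimensionFieldSpecs": [
    {"name": "ip", "dataType": "STRING"},
    {"name": "method", "dataType": "STRING"},
    {"name": "uri", "dataType": "STRING"},
    {"name": "protocol", "dataType": "STRING"},
    {"name": "version", "dataType": "STRING"},
    {"name": "domain", "dataType": "STRING"}
  ],
  "metricFieldSpecs": [
    {"name": "response-code", "dataType": "INT"},
    {"name": "time", "dataType": "LONG"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "timestamp",
      "dataType": "STRING",
      "format": "SIMPLE_DATE_FORMAT|yyyy-MM-dd HH:mm:ss",
      "granularity": "1:SECONDS"
    }
  ]
}
```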

Run the following commands to set up Superset:

This step creates the Superset admin account and initializes the database. It only needs to be run once per container.

docker ps # to get the container id
docker exec -it <containerid> superset fab create-admin --username admin --firstname Superset --lastname Admin --email [email protected] --password admin
docker exec -it <containerid> superset db upgrade
docker exec -it <containerid> superset init

Run the Python app:

pip install --upgrade pip
pip install kafka-python --upgrade
python ./scripts/generate_and_post_log.py

Details

Architecture

In this demo app, we simulate an observability use case. A simple Python script generates logs that emulate HTTPS response logs and posts them to a Kafka topic. Apache Pinot then ingests these messages and allows real-time querying of the data. We use Superset to create dashboards that display real-time counts of errors, error types, etc.

Log Analytics Architecture (diagram)
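As an example of the kind of real-time query backing the Superset dashboards, an error-count-by-response-code tile could be driven by a Pinot SQL query like the sketch below. The table and column names assume the sample log format and the logs table created above; the hyphenated column is quoted per Pinot's SQL identifier rules:

```sql
SELECT "response-code" AS code, COUNT(*) AS hits
FROM logs
WHERE "response-code" >= 400
GROUP BY "response-code"
ORDER BY hits DESC
```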

Sample log

Here's an example of the simplified log format we will be using:

{
    'ip': '192.168.198.92',
    'timestamp': '2024-09-08 10:45:23',
    'method': 'GET',
    'uri':'/',
    'protocol':'HTTPS',
    'version':'1.2',
    'response-code': 200,
    'time': 6394,
    'domain': 'www.test.com'
}

Here's a list of methods we will be using:

  • CONNECT
  • DELETE
  • GET
  • HEAD
  • OPTIONS
  • PATCH
  • POST
  • PUT
  • TRACE

Here's a list of response codes we will be using:

  • 100
  • 200
  • 300
  • 400
  • 500

Note that we are keeping the list brief for demo purposes; an HTTP request can of course return many more response codes.

The Python script creates one message per second and sends it to the Kafka topic logs.
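The actual generator lives in scripts/generate_and_post_log.py; the sketch below shows the general shape of what it does, using the sample log format, methods, and response codes listed above. Field values and ranges (the IP octets, the response-time range) are illustrative assumptions, not the repo's exact logic:

```python
import json
import random
import time
from datetime import datetime

METHODS = ["CONNECT", "DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT", "TRACE"]
RESPONSE_CODES = [100, 200, 300, 400, 500]

def generate_log():
    """Build one simulated HTTPS response log entry matching the sample format."""
    return {
        "ip": ".".join(str(random.randint(1, 254)) for _ in range(4)),
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "method": random.choice(METHODS),
        "uri": "/",
        "protocol": "HTTPS",
        "version": "1.2",
        "response-code": random.choice(RESPONSE_CODES),
        "time": random.randint(1, 10000),  # simulated response time in ms
        "domain": "www.test.com",
    }

def post_logs(bootstrap="localhost:9092", topic="logs", count=None):
    """Post generated logs to Kafka, one per second; count=None runs forever."""
    from kafka import KafkaProducer  # pip install kafka-python
    producer = KafkaProducer(
        bootstrap_servers=bootstrap,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    sent = 0
    while count is None or sent < count:
        producer.send(topic, generate_log())
        sent += 1
        time.sleep(1)
```

Calling post_logs() with Kafka running (e.g. after docker-compose up -d) feeds the logs topic that Pinot consumes from.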

Teardown

To tear down the environment, run the following Docker command:

docker-compose down