PDF Question-Answering System with Vector Search

A production-ready system for answering questions about PDF documents, using OpenAI models for answer generation and Qdrant vector search for retrieval. Built with security and multi-tenancy in mind.

📖 Read the detailed guide on Medium

Features

🚀 Intelligent PDF document processing and analysis
🔍 Semantic search using vector embeddings
💬 AI-powered question answering using OpenAI
🔒 Multi-tenant architecture with team isolation
🎯 Production-ready with Docker support

Quick Start

Prerequisites

Python 3.9+
Docker and Docker Compose
OpenAI API access (Azure OpenAI or the standard OpenAI API)
Qdrant vector database

Installation

Clone the repository:

git clone https://github.com/doganarif/pdf-gpt-vectordb-qa.git
cd pdf-gpt-vectordb-qa

Create and configure your environment file:

cp .env.example .env
# Edit .env with your credentials

Start the services:

docker compose up --build

Usage

Upload a PDF document:

curl -X POST http://localhost:8000/upload \
  -F "file=@/path/to/document.pdf" \
  -F "team_id=your_team_id" \
  -F "document_id=doc123"

Ask questions about the document:

curl -X POST http://localhost:8000/answer \
  -H "Content-Type: application/json" \
  -d '{
    "team_id": "your_team_id",
    "question": "What is the main topic of the document?"
  }'
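The same question can be sent from Python. The sketch below uses only the standard library and assumes the service listens on http://localhost:8000, as in the curl example. The helper names `build_answer_request` and `ask` are hypothetical, and the reply is assumed to be a JSON object, which this README does not specify.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: same host and port as the curl example


def build_answer_request(team_id: str, question: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /answer endpoint."""
    payload = json.dumps({"team_id": team_id, "question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/answer",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(team_id: str, question: str) -> dict:
    """Send the question and decode the reply (assumed to be a JSON object)."""
    request = build_answer_request(team_id, question)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `ask("your_team_id", "What is the main topic of the document?")` requires the services to be running; building the request separately makes it possible to inspect the payload without a live server.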

Architecture

The system consists of four components:

PDF Processing: Extracts text from PDFs and splits it into chunks
Vector Search: Performs semantic document search with Qdrant
Answer Generation: Uses OpenAI models to generate answers
API Layer: Provides RESTful endpoints with security measures
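To illustrate the chunking step in PDF Processing, here is a minimal sketch of fixed-size chunking with overlap. It is not taken from the project's source: the function name `chunk_text` and the default sizes are assumptions, and the actual implementation may split on sentences or tokens instead of characters.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that text near a
    boundary keeps some of its surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Each chunk would then be embedded and stored in Qdrant alongside its team and document identifiers, so that retrieval can be restricted to one team.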

Configuration

Key environment variables:

# OpenAI/Azure Configuration
AZURE_OPENAI_ENDPOINT=your_endpoint
AZURE_OPENAI_API_KEY=your_key
AZURE_DEPLOYMENT_NAME=your_deployment

# Qdrant Configuration
QDRANT_HOST=qdrant
QDRANT_PORT=6333

# Application Configuration
DEBUG=false
ENABLE_HTTPS=true
MAX_UPLOAD_SIZE_MB=16
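A minimal sketch of how an application might read these variables at startup. The `Settings` class and `load_settings` function are hypothetical names, and the defaults simply mirror the example values above; the project's real loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings such as "true" or "1"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    azure_openai_endpoint: str
    azure_openai_api_key: str
    azure_deployment_name: str
    qdrant_host: str
    qdrant_port: int
    debug: bool
    enable_https: bool
    max_upload_size_mb: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables."""
    return Settings(
        # Credentials have no sensible default: a missing key raises KeyError.
        azure_openai_endpoint=env["AZURE_OPENAI_ENDPOINT"],
        azure_openai_api_key=env["AZURE_OPENAI_API_KEY"],
        azure_deployment_name=env["AZURE_DEPLOYMENT_NAME"],
        # Assumed defaults, copied from the example values in this README.
        qdrant_host=env.get("QDRANT_HOST", "qdrant"),
        qdrant_port=int(env.get("QDRANT_PORT", "6333")),
        debug=_as_bool(env.get("DEBUG", "false")),
        enable_https=_as_bool(env.get("ENABLE_HTTPS", "true")),
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "16")),
    )
```

Failing fast on missing credentials surfaces a misconfigured `.env` file at startup instead of on the first request.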

API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/upload` | POST | Upload and process a PDF document |
| `/answer` | POST | Get answers to questions about documents |
| `/documents` | GET | List available documents for a team |
| `/health` | GET | Check system health status |

Security Features

Team-based isolation for multi-tenant setups
Rate limiting per team
Secure file handling
Authorization middleware
Input validation
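As an illustration of per-team rate limiting, the sketch below implements a sliding-window limiter keyed by `team_id`. This is one possible approach, not the project's actual middleware: the class name `TeamRateLimiter` and its parameters are assumptions, and the state lives in process memory, so it would not be shared across multiple workers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class TeamRateLimiter:
    """Allow at most max_requests per window_seconds for each team."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, team_id: str) -> bool:
        """Record a request for team_id and report whether it is within the limit."""
        now = self._clock()
        hits = self._hits[team_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

An API handler would call `limiter.allow(team_id)` before doing any work and return HTTP 429 when it yields `False`. Because each team has its own queue, one team exhausting its quota does not affect the others.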

Development

Create a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Deployment

The system is containerized and can be deployed using Docker Compose:

# Production deployment (the repository ships a single docker-compose.yml)
docker compose up -d

For production deployments, consider:

Configuring proper authentication
Implementing backup strategies
Setting up monitoring and logging

Acknowledgments

OpenAI for their powerful language models
Qdrant team for the vector database
All contributors and supporters

Built with ❤️ by doganarif
