Searchlight

Searchlight is a text-processing API for PDFs, written in Python. It searches a document for a specified word, highlights each match in the PDF, counts the unique words, and integrates with MongoDB and an AWS S3 bucket.

Features

- Word Search: Search for specific words in a PDF.
- Unique Words Count: Count the number of unique words in a PDF.
- Highlighting: Highlight the search word in the PDF.
- MongoDB Integration: Store data and results in MongoDB.
- AWS S3 Integration: Upload and retrieve PDFs from an AWS S3 bucket.
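
To make the word search and unique words count concrete, here is a minimal sketch that works on text already extracted from a PDF. It is not Searchlight's actual implementation: the helper names (`tokenize`, `count_keyword`, `unique_word_count`) and the tokenization rule are assumptions chosen for illustration, and PDF text extraction is left out.

```python
import re
from collections import Counter

# Assumed tokenization rule: runs of letters, digits, and apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return WORD_RE.findall(text.lower())


def count_keyword(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of keyword."""
    return Counter(tokenize(text))[keyword.lower()]


def unique_word_count(text: str) -> int:
    """Count distinct words, ignoring case."""
    return len(set(tokenize(text)))


sample = "Searchlight highlights words. Words are counted; words are found."
print(count_keyword(sample, "words"))  # 3
print(unique_word_count(sample))       # 6
```

Matching on whole tokens rather than substrings means a search for "word" does not count "words"; whether the API behaves this way is not stated here.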

Installation

Clone the repository

git clone https://github.com/tratum/Searchlight.git

Navigate to the project directory
```
cd Searchlight
```

Create and activate a virtual environment

python -m venv .venv
source .venv/bin/activate  # On Windows, use `.venv\Scripts\activate`

Install the required dependencies
```
pip install -r requirements.txt
```

Configuration

Create a .env file in the root directory
```
touch .env
```

Open the .env file and configure your MongoDB and AWS S3 settings:

ATLAS_URI=your_mongodb_uri
DB_NAME=your_db_name
COLLECTION_NAME=your_collection_name
RAW_COLLECTION_NAME=your_collection_name
USER_COLLECTION_NAME=tbl_users
AWS_ACCESS_KEY='your_aws_access_key'
AWS_SECRET_KEY='your_aws_secret_access_key'
BUCKET_NAME='your_s3_bucket_name'
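
Before starting the server, it can help to check that every setting listed above is present. The sketch below is a standard-library stand-in, not how Searchlight itself loads its configuration: `parse_env_text` and `missing_settings` are hypothetical helpers, and the MongoDB URI in the example is a placeholder.

```python
# Setting names taken from the .env template above.
REQUIRED_SETTINGS = (
    "ATLAS_URI",
    "DB_NAME",
    "COLLECTION_NAME",
    "RAW_COLLECTION_NAME",
    "USER_COLLECTION_NAME",
    "AWS_ACCESS_KEY",
    "AWS_SECRET_KEY",
    "BUCKET_NAME",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and comments, stripping quotes."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


def missing_settings(settings: dict[str, str]) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not settings.get(name)]


# Placeholder values for illustration only.
example = "ATLAS_URI=mongodb://localhost:27017\nDB_NAME='searchlight'\n"
print(missing_settings(parse_env_text(example)))
```

To check a real file, pass its contents instead, for example `parse_env_text(open(".env").read())`.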

Usage

Start the API Server
```
python -m uvicorn main:app --reload
```
Use the following endpoint to upload a PDF and process its text:
```
http://127.0.0.1:8000/searchlight/upload
```
Required parameters:
- keyword: The word to search for and highlight in the PDF.
- pdf: The PDF file to process.

Example

Here is an example of how to use the API with cURL:

  curl -X POST "http://127.0.0.1:8000/searchlight/upload" -F "keyword=example" -F "pdf=@/path/to/your/document.pdf"
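
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the cURL call above: a multipart form with a `keyword` field and a `pdf` file. The function names `build_multipart` and `upload_pdf` are hypothetical, and because the response format is not documented here, the raw response body is returned as bytes.

```python
import os
import urllib.request
import uuid

UPLOAD_URL = "http://127.0.0.1:8000/searchlight/upload"


def build_multipart(keyword: str, pdf_bytes: bytes, filename: str) -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data upload."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="keyword"\r\n'
        "\r\n"
        f"{keyword}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="pdf"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(keyword: str, pdf_path: str, url: str = UPLOAD_URL) -> bytes:
    """POST the PDF and keyword to the upload endpoint; return the raw response."""
    with open(pdf_path, "rb") as fh:
        pdf_bytes = fh.read()
    body, content_type = build_multipart(keyword, pdf_bytes, os.path.basename(pdf_path))
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read()
```

With the server running, `upload_pdf("example", "/path/to/your/document.pdf")` sends the same request as the cURL command.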

Contribution

Contributions are welcome! Please open an issue or submit a pull request for any changes or improvements.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

This project is built with FastAPI.
