🔍 Ambar: Document Search System

Ambar is an open-source document search and management system with automated crawling, OCR, tagging and instant full-text search.

Ambar offers a new way to manage your documents out of the box:

Ingest documents from any source
Find documents and images instantly with Google-like search
Manage your documents with tags, hide irrelevant search results
Auto-tagging and named entity recognition

Features

Search

Tutorial: Mastering Ambar Search Queries

Fuzzy Search (John~3)
Phrase Search ("John Smith")
Search By Author (author:John)
Search By File Path (filename:*.txt)
Search By Date (when: yesterday, today, lastweek, etc.)
Search By Size (size>1M)
Search By Tags (tags:ocr)
Search As You Type
Supported language analyzers: English ambar_en, Russian ambar_ru, German ambar_de, Italian ambar_it, Polish ambar_pl, Chinese ambar_cn, CJK ambar_cjk
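Each query form above is plain text, so queries can also be assembled in a script, for example to generate search links. The Python sketch below is hypothetical: `fuzzy`, `phrase` and `build_query` are illustrative names, not part of Ambar. It assumes clauses are separated by single spaces and that date filters are written without a space (`when:yesterday`); this README does not say how Ambar combines several clauses.

```python
# Field filters written as name:value, mirroring the syntax listed above.
ALLOWED_FIELDS = {"author", "filename", "when", "tags"}


def fuzzy(term: str, fuzziness: int) -> str:
    """Fuzzy clause, e.g. John~3."""
    return f"{term}~{fuzziness}"


def phrase(text: str) -> str:
    """Exact-phrase clause, e.g. "John Smith"."""
    return f'"{text}"'


def build_query(*clauses: str, **fields: str) -> str:
    """Join free-text clauses and field:value filters with spaces.

    Size filters use a comparison operator rather than a colon, so pass
    them as plain clauses, e.g. "size>1M".
    """
    parts = list(clauses)
    for name, value in fields.items():
        if name not in ALLOWED_FIELDS:
            raise ValueError(f"unsupported field: {name}")
        parts.append(f"{name}:{value}")
    return " ".join(parts)


# Phrase + size filter + author and tag filters in one query string.
print(build_query(phrase("John Smith"), "size>1M", author="John", tags="ocr"))
# "John Smith" size>1M author:John tags:ocr
```

Keyword arguments keep their order, so filters appear in the query in the order you pass them.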

Crawling

SMB Crawling
FTP/FTPS Crawling
Mail Crawling
Dropbox Crawling
Scheduled Crawling (Cron schedule syntax)
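Crawl schedules use cron syntax. For readers new to it, the Python sketch below shows how a standard five-field expression (minute, hour, day of month, month, day of week) is evaluated. It illustrates classic cron semantics and is not Ambar's scheduler: `expand_field` and `cron_matches` are our own names, and we assume Ambar uses the five-field form (some schedulers add a seconds field). Shortcuts such as `@daily` and month or weekday names are not handled.

```python
from datetime import datetime

# Bounds of the five standard cron fields:
# minute, hour, day of month, month, day of week (0 and 7 are both Sunday).
FIELD_BOUNDS = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 7)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as '*/15', '1-5' or '0,30' into its values."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if step < 1 or not low <= start <= end <= high:
            raise ValueError(f"invalid cron field: {field!r}")
        values.update(range(start, end + 1, step))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` (minute precision) matches the expression."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected: minute hour day-of-month month day-of-week")
    minutes, hours, days, months, weekdays = (
        expand_field(f, low, high) for f, (low, high) in zip(fields, FIELD_BOUNDS)
    )
    weekdays = {d % 7 for d in weekdays}       # fold 7 (Sunday) onto 0
    cron_weekday = (moment.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    day_ok = moment.day in days
    weekday_ok = cron_weekday in weekdays
    # Classic cron rule: if both day fields are restricted, either one may match.
    if not fields[2].startswith("*") and not fields[4].startswith("*"):
        date_ok = day_ok or weekday_ok
    else:
        date_ok = day_ok and weekday_ok
    return (moment.minute in minutes and moment.hour in hours
            and moment.month in months and date_ok)


# Nightly crawl at 02:00: matches 2024-01-01 02:00 but not 02:01.
print(cron_matches("0 2 * * *", datetime(2024, 1, 1, 2, 0)))  # True
print(cron_matches("0 2 * * *", datetime(2024, 1, 1, 2, 1)))  # False
```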

Content Extraction

Extract content from large files (>30M)
ZIP archives
MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
OCR over images
Email messages with attachments
Adobe PDF (with OCR)
OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
OpenOffice documents
RTF, Plaintext
HTML / XHTML
Multithreaded processing (EE only)

General

File Tagging (including auto-tagging)
Named Entities
Hiding Irrelevant Search Results
File Preview
Web UI
REST API
Multiple user accounts (EE only)

Editions

There are two editions available: Community and Enterprise. Enterprise Edition is a full-featured document search and management system that can handle terabytes of data.

Community Edition is a scaled-down, single-user version of Enterprise Edition with a limited number of pipelines and crawlers, though it preserves the full functionality. You are welcome to use Ambar Community Edition for both personal and commercial purposes, at no cost.

Installation

Installation is straightforward. Turn on your Linux machine and follow our step-by-step installation guide.

Docker images can be found on Docker Hub.

How it Works

Under the Hood
REST API Documentation
Management Script
The source code (Frontend, Crawler, ElasticSearch, Rabbit, Mongo, Installer) is freely available under Fair Source License 1.

FAQ

Is it open-source?

Yes, almost every Ambar module is published on GitHub under Fair Source License 1.

Is it free?

Yes, Community Edition is forever free. We will NOT charge you a penny to use it.

Does it perform OCR?

Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract, which we tuned for the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.

Which languages are supported for OCR?

Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please create a new issue and we'll add it ASAP.
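Tesseract itself names its language models with three-letter codes (eng, rus, ita, deu, fra, spa, pol, nld), and its command line accepts several languages joined with `+`, e.g. `tesseract scan.png out -l eng+deu`. The Python sketch below maps the abbreviations listed above to those codes; note that "Pl" corresponds to Tesseract's `pol`. The mapping table and the `tesseract_lang_arg` helper are ours for illustration, since how Ambar passes languages to Tesseract internally is not described here.

```python
# Abbreviations as written in this README -> Tesseract language codes.
README_TO_TESSERACT = {
    "Eng": "eng", "Rus": "rus", "Ita": "ita", "Deu": "deu",
    "Fra": "fra", "Spa": "spa", "Pl": "pol", "Nld": "nld",
}


def tesseract_lang_arg(languages: list[str]) -> str:
    """Build the value for Tesseract's -l option, e.g. 'eng+rus'."""
    try:
        codes = [README_TO_TESSERACT[lang] for lang in languages]
    except KeyError as missing:
        raise ValueError(f"language not in the supported list: {missing}") from None
    return "+".join(codes)


print(tesseract_lang_arg(["Eng", "Pl"]))  # eng+pol
```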

Does it support tagging?

Yes!

What about searching in PDF?

Yes, it can search through any PDF, even a badly encoded one or one that contains scans. We did our best to make search over any kind of PDF document smooth.

I'm missing the XXX language analyzer. Can you add it?

Yes, please create an issue on GitHub.

Are you going to add UI localizations?

For now there are two options, Russian and English; to switch between them, change uiLang in your config.json. If you want to add your own localization, please contact us at [email protected].
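Switching the UI language comes down to editing one key in config.json. A minimal Python sketch, assuming `uiLang` is a top-level key whose values are `"en"` and `"ru"`; the README names neither the nesting nor the exact values, so check your own config.json first. `set_ui_lang` is a hypothetical helper, not part of Ambar.

```python
import json
from pathlib import Path


def set_ui_lang(config_path: str, lang: str) -> None:
    """Set uiLang in config.json, leaving every other setting untouched."""
    # ASSUMPTION: uiLang is a top-level key accepting "en" or "ru".
    if lang not in {"en", "ru"}:
        raise ValueError("only English and Russian UI localizations exist")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["uiLang"] = lang
    path.write_text(json.dumps(config, indent=4, ensure_ascii=False) + "\n",
                    encoding="utf-8")
```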

What is the maximum file size it can handle?

It's limited by the amount of RAM on your machine, typically 500MB. That's an awesome result, as typical document management systems only process files of up to 30MB.

What is the difference between Ambar CE and Ambar EE?

Basically, Ambar CE is a scaled-down Ambar EE. See the comparison on our landing page.

Can anyone else see my documents?

Nope, check our Privacy Policy.

I have a problem. What should I do?

Submit an issue on GitHub.

Change Log

Contributors

Privacy Policy

License

Fair Source 1 License v0.9
