Ambar is an open-source document search and management system with automated crawling, OCR, tagging and instant full-text search.
Ambar defines the new way to manage your documents out of the box:
- Ingest documents from any source
- Find documents and images instantly with Google-like search
- Manage your documents with tags, hide irrelevant search results
- Auto tagging & named entity recognition
Tutorial: Mastering Ambar Search Queries
- Fuzzy Search (John~3)
- Phrase Search ("John Smith")
- Search By Author (author:John)
- Search By File Path (filename:*.txt)
- Search By Date (when: yesterday, today, lastweek, etc)
- Search By Size (size>1M)
- Search By Tags (tags:ocr)
- Search As You Type
- Supported language analyzers: English (ambar_en), Russian (ambar_ru), German (ambar_de), Italian (ambar_it), Polish (ambar_pl), Chinese (ambar_cn), CJK (ambar_cjk)
- SMB Crawling
- FTP/FTPS Crawling
- Mail Crawling
- Dropbox Crawling
- Scheduled Crawling (Cron schedule syntax)
- Extract content from large files (>30M)
- ZIP archives
- MS Office documents (Word, Excel, Powerpoint, Visio, Publisher)
- OCR over images
- Email messages with attachments
- Adobe PDF (with OCR)
- OCR languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld
- OpenOffice documents
- RTF, Plaintext
- HTML / XHTML
- Multithread processing (Only EE)
- Files Tagging (Auto tagging as well)
- Named Entities
- Hiding Irrelevant Search Results
- Files Preview
- Web UI
- REST API
- Multiple user accounts (Only EE)
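The search forms listed above (fuzzy, phrase, author, filename, date, size, tags) can be combined into a single query string. The following is only a minimal sketch of how such a string might be assembled in Python. The helper names `fuzzy` and `build_query` are hypothetical and not part of Ambar, and two details are assumptions rather than documented behavior: that clauses are separated by spaces, and that the date clause is written without a space after `when:`.

```python
def fuzzy(term, distance):
    """Return a fuzzy-search term such as John~3."""
    return f"{term}~{distance}"


def build_query(terms=(), phrase=None, author=None, filename=None,
                when=None, min_size=None, tag=None):
    """Assemble a query string from the clause forms in the feature list.

    Hypothetical helper: joining clauses with a single space is an
    assumption, not documented Ambar behavior.
    """
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')          # phrase search: "John Smith"
    if author:
        parts.append(f"author:{author}")     # search by author
    if filename:
        parts.append(f"filename:{filename}") # search by file path
    if when:
        parts.append(f"when:{when}")         # assumed form: when:yesterday
    if min_size:
        parts.append(f"size>{min_size}")     # search by size: size>1M
    if tag:
        parts.append(f"tags:{tag}")          # search by tag: tags:ocr
    return " ".join(parts)


if __name__ == "__main__":
    # Files by John, larger than 1M, on which OCR was performed
    print(build_query(terms=[fuzzy("John", 3)], author="John",
                      min_size="1M", tag="ocr"))
```

The resulting string is meant to be pasted into the search box of the Web UI; how it would be passed to the REST API is not covered here.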
There are two editions available: Community and Enterprise. Enterprise Edition is a full-featured document search and management system that can handle terabytes of data.
Community Edition is a scaled-down, single-user version of Enterprise Edition with a limited number of pipelines and crawlers, while preserving the full functionality. You are welcome to use Ambar Community Edition for both personal and commercial purposes, at no cost.
Installation is straightforward. Turn on your Linux machine and follow our step-by-step installation guide.
Docker images can be found on Docker Hub
- Under the Hood
- REST API Documentation
- Management Script
- The source code is freely available under the Fair Source License 1 (Frontend, Crawler, ElasticSearch, Rabbit, Mongo, Installer)
Yes, almost every Ambar module is published on GitHub under the Fair Source License 1.
Yes, Community Edition is forever free. We will NOT charge you a penny to use it.
Yes, it performs OCR on images (jpg, tiff, bmp, etc.) and PDFs. OCR is performed by the well-known open-source library Tesseract. We tuned it to achieve the best performance and quality on scanned documents. You can easily find all files on which OCR was performed with the tags:ocr query.
Supported languages: Eng, Rus, Ita, Deu, Fra, Spa, Pl, Nld. If your language is missing, please create a new issue and we'll add it ASAP.
Yes!
Yes, it can search through any PDF, even badly encoded ones or those with scans inside. We did our best to make search over any kind of PDF document smooth.
Yes, please create an issue on GitHub.
For now there are two options: Russian and English. To switch, change uiLang in your config.json. If you want to add your own localization, please contact us at [email protected].
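As an illustration of the uiLang change described above, here is a minimal Python sketch that rewrites the setting in place. It assumes that config.json is plain JSON and that uiLang is a top-level key. The function name `set_ui_language` is hypothetical, and the language codes Ambar accepts for Russian and English are not stated here, so the value is left as a parameter.

```python
import json
from pathlib import Path


def set_ui_language(config_path, lang_code):
    """Set the top-level uiLang key in a JSON config file.

    Assumptions: the file is valid JSON and uiLang lives at the top
    level. The accepted values for lang_code are not documented here.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["uiLang"] = lang_code  # all other settings stay untouched
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Editing the file by hand works just as well; the sketch only shows that a single key needs to change.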
It's limited by the amount of RAM on your machine, typically 500MB. It's an awesome result, as typical document management systems offer a 30MB maximum file size for processing.
Basically, Ambar CE is a downscaled Ambar EE. Check the comparison on our landing page.
Nope, check our Privacy Policy.
Submit an issue