PanoOCR

PanoOCR is a Python library for performing Optical Character Recognition (OCR) on equirectangular panorama images. It automatically handles the conversion between flat and spherical coordinates, making it ideal for OCR tasks involving 360° panoramic content.

Demo

This demo uses the built-in preview tool with the test results in the /assets folder.

panoocr-demo.mp4

The test image was taken by the author and is copyright-free. Feel free to use it as you wish.

Features

- Support for multiple OCR engines:
  - macOCR (macOS native OCR)
  - PaddleOCR (with optional V4 server model)
  - EasyOCR
  - Florence2
  - TrOCR
- Automatic perspective generation from equirectangular panoramas
- Spherical coordinate conversion
- Duplication detection and removal across perspectives
- Multi-language support (depending on OCR engine)
- Interactive preview tool for visualization

Installation

Clone the repository:
```
git clone [repository-url]
cd panoocr
```
Install dependencies:
```
pip install -r requirements.txt
```

Install the engine-specific requirements for the OCR engines you plan to use:

# For PaddleOCR
pip install -r requirements-paddle.txt

# For EasyOCR
pip install -r requirements-easyocr.txt

# For macOCR (macOS native OCR, a.k.a. the Apple Vision framework)
pip install -r requirements-mac.txt

# For Florence2
pip install -r requirements-huggingface.txt

Usage

Basic Usage

Basic usage is demonstrated in run_panoocr.py.

To run the script, simply execute:

python run_panoocr.py --ocr-engine [ocr-engine-name] --image-path [path-to-panorama-image]

The result is saved in the same folder as the original image, under the same filename but with the extension replaced by .[ocr-engine-name].json
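As a rough illustration of the naming rule above, here is a minimal sketch of a command-line entry point that accepts the same two flags and derives the result path. It is not the code in run_panoocr.py: `build_parser` and `result_path` are hypothetical helpers, and the engine name `easyocr` is only an assumed example value.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    # Mirrors the two flags shown in the command above; the real script may define more.
    parser = argparse.ArgumentParser(description="Sketch of a PanoOCR-style CLI")
    parser.add_argument("--ocr-engine", required=True, help="name of the OCR engine")
    parser.add_argument("--image-path", required=True, type=Path, help="panorama image")
    return parser


def result_path(image_path: Path, ocr_engine: str) -> Path:
    # Swap the image extension for ".<engine>.json", keeping the same folder.
    return image_path.with_suffix(f".{ocr_engine}.json")


# Example with an explicit argument list ("easyocr" is an assumed engine name).
args = build_parser().parse_args(
    ["--ocr-engine", "easyocr", "--image-path", "photos/pano.jpg"]
)
print(result_path(args.image_path, args.ocr_engine))
```

Because the engine name is part of the extension, results from different engines for the same image can sit side by side without overwriting each other.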

Core Components

The core building blocks of panoocr are:

- OCREngine: Represents an OCR engine.
- SphereOCRDuplicationDetectionEngine: Represents a duplication detection engine.
- PanoramaImage: Represents an equirectangular panorama image.
- PerspectiveMetadata: Represents a perspective view of the panorama.

For the two engines, OCREngine and SphereOCRDuplicationDetectionEngine, you can bring your own implementations via inheritance. Their APIs are defined in engine.py and duplication_detection.py respectively.
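To give a feel for what a duplication detection step has to do, the sketch below removes repeated detections of the same text that land close together on the sphere. This is one simple approach, not the algorithm in duplication_detection.py: the `Detection` record, the `deduplicate` function, and the 2° threshold are all assumptions made for illustration, and the library's real base-class API may look quite different.

```python
import math
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical result record; the library's own result type may differ.
    text: str
    yaw: float         # degrees, centre of the text on the sphere
    pitch: float       # degrees
    confidence: float


def angular_distance(a: Detection, b: Detection) -> float:
    """Great-circle angle in degrees between two detections."""
    p1, p2 = math.radians(a.pitch), math.radians(b.pitch)
    dyaw = math.radians(a.yaw - b.yaw)
    cos_angle = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dyaw)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def deduplicate(detections: list[Detection], max_angle: float = 2.0) -> list[Detection]:
    """Keep the highest-confidence detection among same-text results that lie close together."""
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        if not any(k.text == det.text and angular_distance(k, det) <= max_angle for k in kept):
            kept.append(det)
    return kept
```

Comparing positions on the sphere rather than in pixel space is what lets detections from different perspectives be matched at all, including across the 0°/360° yaw seam.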

By default, run_panoocr.py uses po.DEFAULT_IMAGE_PERSPECTIVES, a set of 16 perspectives with the following settings:

- pixel_width: 2048
- pixel_height: 2048
- 45° horizontal field of view
- 0° yaw offset
- 0° pitch offset
- 22.5° yaw interval
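The arithmetic behind these defaults can be sketched as follows. The `Perspective` dataclass is a local stand-in rather than the library's PerspectiveMetadata, `ring_of_perspectives` is a hypothetical helper, and the sketch assumes the first view sits at 0° yaw with the rest following at multiples of the yaw interval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perspective:
    # Local stand-in holding the settings listed above; not the library's class.
    pixel_width: int
    pixel_height: int
    horizontal_fov: float  # degrees
    yaw_offset: float      # degrees
    pitch_offset: float    # degrees


def ring_of_perspectives(count: int = 16, fov: float = 45.0, size: int = 2048) -> list[Perspective]:
    """Spread `count` views evenly around the horizon."""
    yaw_interval = 360.0 / count  # 22.5 degrees for 16 views
    return [
        Perspective(size, size, fov, yaw_offset=i * yaw_interval, pitch_offset=0.0)
        for i in range(count)
    ]


views = ring_of_perspectives()
# How far each view's field of view extends into its neighbour's.
overlap = views[0].horizontal_fov - (views[1].yaw_offset - views[0].yaw_offset)
print(len(views), views[-1].yaw_offset, overlap)  # 16 337.5 22.5
```

With a 45° field of view and a 22.5° step, each view shares half of its width with the next one, so text near a view's edge also appears closer to the centre of a neighbouring view. The same overlap means a piece of text is usually detected more than once, which is presumably why a duplication detection step is needed.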

You can also specify different perspective settings by constructing PerspectiveMetadata objects directly:

perspective = po.PerspectiveMetadata(
  pixel_width=1024,
  pixel_height=512,
  horizontal_fov=45,
  vertical_fov=45,
  yaw_offset=0,
  pitch_offset=0,
)
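To show how the fields above relate to spherical coordinates, here is a minimal sketch that maps a pixel in a perspective image to a yaw and pitch on the sphere. It assumes an ideal pinhole camera, a top-left pixel origin, yaw increasing to the right, and pitch increasing upward. The function name `pixel_to_yaw_pitch` is hypothetical, and the library's own conversion may use different conventions.

```python
import math


def pixel_to_yaw_pitch(u, v, width, height, horizontal_fov, vertical_fov,
                       yaw_offset=0.0, pitch_offset=0.0):
    """Map a perspective-image pixel to (yaw, pitch) in degrees on the sphere."""
    # Ray through the pixel in camera space: x right, y up, z forward.
    x = (2.0 * u / width - 1.0) * math.tan(math.radians(horizontal_fov) / 2.0)
    y = (1.0 - 2.0 * v / height) * math.tan(math.radians(vertical_fov) / 2.0)
    z = 1.0

    # Tilt the camera up by pitch_offset (rotation about the x axis).
    p = math.radians(pitch_offset)
    y, z = y * math.cos(p) + z * math.sin(p), -y * math.sin(p) + z * math.cos(p)

    # Turn the camera right by yaw_offset (rotation about the vertical axis).
    t = math.radians(yaw_offset)
    x, z = x * math.cos(t) + z * math.sin(t), -x * math.sin(t) + z * math.cos(t)

    yaw = math.degrees(math.atan2(x, z))                    # in (-180, 180]
    pitch = math.degrees(math.atan2(y, math.hypot(x, z)))   # in [-90, 90]
    return yaw, pitch
```

Under these assumptions, the image centre maps to exactly (yaw_offset, pitch_offset), and the left and right edges at mid-height lie half the horizontal field of view away from the centre.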

Interactive Preview Tool

I also built a web-based interactive preview tool that lets you visualize the OCR results on the panorama image. It's located in preview/index.html. To run it, start an HTTP server in the preview folder:

cd preview && python -m http.server

Then open your browser and navigate to http://localhost:8000. You should see the preview tool.

Simply drag and drop the JSON result file and your panorama image onto the interface, and you should see the OCR results overlaid on the panorama image.
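If you would rather start the server from a script, for example from the repository root without changing directory, the standard library offers an equivalent. This is a minimal sketch; `make_preview_server` is a hypothetical helper and not part of PanoOCR.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_preview_server(directory: str = "preview", port: int = 8000) -> ThreadingHTTPServer:
    # Serve files from `directory` regardless of the current working directory.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("localhost", port), handler)
```

Calling `make_preview_server().serve_forever()` then serves the preview folder at http://localhost:8000 until interrupted.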
