[REQ] Allow injection of config parameters/files for tesseract #33

Open
mlieberman85 opened this issue Apr 17, 2021 · 3 comments

Comments

@mlieberman85

Unless I'm missing something, there doesn't seem to be a way to pass either config parameters (e.g. --psm 3 --oem 0) or a config file through the high-level Leptess API.

@ccouzens
Collaborator

You're right. It's not there today.

It looks like PSM (PageSegMode) would correspond to https://tesseract-ocr.github.io/tessapi/5.x/a00008.html#a7393e8cb70161c588eff1dbb5e97e4d5

And OEM (OCR engine mode) would need to use a different init function https://tesseract-ocr.github.io/tessapi/5.x/a00008.html#a75e22aabb144f06f07741188df3cc41a
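
To make the request concrete, here is a minimal sketch of how the two CLI flags could be turned into typed values before being handed to the page-segmentation setter and the init function. `OcrEngineMode`, `CliOptions`, and `parse_cli_options` are hypothetical names for illustration only; none of them exist in leptess, and the sketch does not call Tesseract. The numeric values follow the ones the `tesseract` command line documents for `--oem` (0 to 3) and `--psm` (0 to 13).

```rust
/// OCR engine modes, numbered as the `--oem` CLI flag numbers them.
/// Hypothetical type for illustration; not part of leptess.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OcrEngineMode {
    TesseractOnly = 0,
    LstmOnly = 1,
    TesseractLstmCombined = 2,
    Default = 3,
}

impl OcrEngineMode {
    /// Map the number given to `--oem` onto a variant.
    pub fn from_cli(n: u32) -> Option<Self> {
        match n {
            0 => Some(Self::TesseractOnly),
            1 => Some(Self::LstmOnly),
            2 => Some(Self::TesseractLstmCombined),
            3 => Some(Self::Default),
            _ => None,
        }
    }
}

/// Options collected from CLI-style flags (hypothetical).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct CliOptions {
    /// Page segmentation mode, kept as the raw number (0..=13).
    pub psm: Option<u32>,
    pub oem: Option<OcrEngineMode>,
}

/// Parse `--psm N` and `--oem N` pairs; anything else is an error.
pub fn parse_cli_options(args: &[&str]) -> Result<CliOptions, String> {
    let mut opts = CliOptions::default();
    let mut it = args.iter();
    while let Some(&flag) = it.next() {
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        let n: u32 = value
            .parse()
            .map_err(|_| format!("invalid number for {}: {}", flag, value))?;
        match flag {
            "--psm" => {
                if n > 13 {
                    return Err(format!("psm out of range: {}", n));
                }
                opts.psm = Some(n);
            }
            "--oem" => {
                let mode = OcrEngineMode::from_cli(n)
                    .ok_or_else(|| format!("oem out of range: {}", n))?;
                opts.oem = Some(mode);
            }
            other => return Err(format!("unknown flag {}", other)),
        }
    }
    Ok(opts)
}
```

Calling `parse_cli_options(&["--psm", "3", "--oem", "0"])` yields a PSM of 3 and `OcrEngineMode::TesseractOnly`; a real binding would then pass those on to the two C++ entry points linked above.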

I don't know if there is a C function to use a config file.

I'll work on these, but I'm planning to do some refactoring first (#34).

@mlieberman85
Author

Thanks! I am not very experienced in Rust, especially with unsafe, but if you pointed me in the right direction I could take a crack at it after you've started your refactor.

From my reading of the C++, the "configs" parameter in the init functions actually refers to config files, i.e. config files under TESSDATA_PREFIX. I traced "configs" to: https://github.com/tesseract-ocr/tesseract/blob/2dfa38a0728b30485a7137d140724f014dc6b5d6/src/ccmain/tessedit.cpp#L365-L380

The API also does have: https://tesseract-ocr.github.io/tessapi/5.x/a00008.html#a19e00633eb5ea36356fa02b3f3b694a3

I couldn't find a way to pass in other config values, such as tessedit_char_whitelist, the way you can on the command line.
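
For reference, the command line takes such values as `-c name=value` pairs. Below is a rough sketch of how a wrapper might collect them into name/value pairs before forwarding them to whatever setter the underlying API exposes. `parse_variable` and `collect_variables` are hypothetical helpers, not part of leptess or tesseract-rs, and the sketch deliberately stops short of calling Tesseract.

```rust
/// Split one `name=value` argument into its two halves.
/// Hypothetical helper for illustration only.
pub fn parse_variable(arg: &str) -> Result<(String, String), String> {
    let (name, value) = arg
        .split_once('=')
        .ok_or_else(|| format!("expected name=value, got {:?}", arg))?;
    let name = name.trim();
    if name.is_empty() {
        return Err(format!("empty variable name in {:?}", arg));
    }
    // The value is kept verbatim: whitelist strings may contain spaces.
    Ok((name.to_string(), value.to_string()))
}

/// Collect every `-c name=value` pair from a CLI-style argument list,
/// ignoring all other arguments.
pub fn collect_variables(args: &[&str]) -> Result<Vec<(String, String)>, String> {
    let mut vars = Vec::new();
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        if arg == "-c" {
            let pair = it
                .next()
                .ok_or_else(|| "missing name=value after -c".to_string())?;
            vars.push(parse_variable(pair)?);
        }
    }
    Ok(vars)
}
```

With this, `collect_variables(&["-c", "tessedit_char_whitelist=0123456789"])` returns a single pair that a binding could apply after initialization.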

@mlieberman85
Author

For the time being I worked around it using https://github.com/antimatter15/tesseract-rs:

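use std::ffi::CString;

// OEM_TESSERACT_ONLY is the engine mode that --oem 0 selects on the command line.
let mut tba = TessBaseAPI::new();
tba.init_4(None, Some(&CString::new("eng")?), tesseract_sys::TessOcrEngineMode_OEM_TESSERACT_ONLY)?;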

ccouzens added a commit to ccouzens/tesseract-plumbing that referenced this issue Apr 18, 2021
This is a requirement of both tesseract-rs and leptess.

houqp/leptess#33
antimatter15/tesseract-rs#25