Multi-page support (TIFF) #43
Comments
Hey, I might be able to look at this, but it wouldn't be until next weekend. I think this might be possible today using `set_image_from_mem` and the image crate, but I haven't tried it. Some notes for myself:
Thanks a ton for the reply. Looking at the linked image crate / Leptonica does provide the required functionality already, right? My armchair idea - and I would be willing to help where I can - is therefore that
In this case there would be no need for another crate, and it would probably avoid re-reading (and potentially copying) the image(s)?
My comment about TIFF and Windows is because of this documentation: https://tpgit.github.io/Leptonica/leptprotos_8h.html#a027a927dc3438192e3bdae8c219d7f6a

> On windows, this will only read tiff formatted files from memory. For other formats, it requires fmemopen(3). Attempts to read those formats will fail at runtime.

Whilst it won't resolve the issue, this is my first step at tackling #43. The next step will be to add and use the Leptonica methods that support TIFF from disk and `PixA`.
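Going only by the quoted note, a caller could check whether a buffer is a TIFF before attempting an in-memory read on Windows. The following is a minimal sketch using only the standard library; `tiff_byte_order` and `memory_read_expected_to_work` are hypothetical helpers and not part of leptess or Leptonica. It recognises the classic TIFF header only (`II` or `MM` followed by the magic number 42) and does not cover BigTIFF.

```rust
/// Byte order declared in a classic TIFF header.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum TiffByteOrder {
    LittleEndian,
    BigEndian,
}

/// Hypothetical helper: returns the byte order if `data` starts with a
/// classic TIFF header, or `None` if it is too short or not a TIFF.
pub fn tiff_byte_order(data: &[u8]) -> Option<TiffByteOrder> {
    match data.get(..4)? {
        // "II" followed by 42 stored little-endian.
        [b'I', b'I', 0x2A, 0x00] => Some(TiffByteOrder::LittleEndian),
        // "MM" followed by 42 stored big-endian.
        [b'M', b'M', 0x00, 0x2A] => Some(TiffByteOrder::BigEndian),
        _ => None,
    }
}

/// Hypothetical helper encoding the quoted note: on Windows only TIFF data
/// is expected to be readable from memory. On other platforms this sketch
/// assumes fmemopen(3) is available and does not inspect the data.
pub fn memory_read_expected_to_work(data: &[u8], on_windows: bool) -> bool {
    !on_windows || tiff_byte_order(data).is_some()
}
```

In real code the `on_windows` flag could be supplied as `cfg!(windows)`; it is a parameter here so that both branches can be exercised on any platform.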
Hi, I haven't forgotten about this. I'm going to try to get to this step tonight.
I'm somewhat suspicious that calling `pixaDestroy` afterwards doesn't change a lot according to valgrind:

```
valgrind --leak-check=yes --error-exitcode=1 --trace-children=yes cargo test read_multipage_tiff_test 2>&1
```

I believe I'm doing the right thing even if the tooling doesn't confirm it. houqp/leptess#43

Evince (GNOME PDF viewer) says the TIFF file has 3 pages. GIMP (and Leptonica) say that the TIFF file has 2 pages. I created it with 2, so I believe this is correct.

I noticed an off-by-one error in `Pixa::get_pix`, which I also corrected in `Boxa::get`. If an array has n elements, we can't access the nth element: valid indices run from 0 to n-1. It looks like Leptonica follows the C string convention of adding a null element at the end, so the array has the space, but we can't dereference it.
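The off-by-one fix described above can be illustrated with a small model. This is a sketch under stated assumptions, not the actual leptess code: `PageArray` is a hypothetical stand-in that mimics the layout the comment guesses at (n real elements plus one empty trailing slot), and `get` shows the `index < n` check that keeps callers away from that slot.

```rust
/// Hypothetical stand-in for a Leptonica-style array. The backing storage
/// holds `count` real entries plus one empty trailing slot, which mirrors
/// the layout guessed at in the comment above.
pub struct PageArray {
    slots: Vec<Option<String>>,
    count: usize,
}

impl PageArray {
    pub fn new(pages: Vec<String>) -> Self {
        let count = pages.len();
        let mut slots: Vec<Option<String>> = pages.into_iter().map(Some).collect();
        slots.push(None); // the extra slot exists but holds nothing usable
        PageArray { slots, count }
    }

    /// Number of real elements (excludes the trailing slot).
    pub fn len(&self) -> usize {
        self.count
    }

    /// Number of slots in the backing storage (includes the trailing slot).
    pub fn capacity_slots(&self) -> usize {
        self.slots.len()
    }

    /// Bounds-checked access: valid indices are 0..count. The check must be
    /// `index >= count`, not `index > count`, otherwise `index == count`
    /// would reach the empty trailing slot.
    pub fn get(&self, index: usize) -> Option<&str> {
        if index >= self.count {
            return None;
        }
        self.slots[index].as_deref()
    }
}
```

With two pages, `get(0)` and `get(1)` succeed while `get(2)` returns `None`, even though the backing storage has three slots.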
You may be interested in this PR. GitHub won't let me assign you as a reviewer.
Hey.

Most OCR work I've seen so far uses (b/w, CCITT compressed) multi-page documents. I'd like to make these work with leptess, but it seems (unless I'm missing something?) that there's only support for `Pix` (not: `PixA`), nor a mapping for direct TIFF I/O (say `pixaReadMultipageTiff` from Leptonica). The high level wrapper (leptess:LepTess) also doesn't expose a method to directly `set_image` a `Pix`, but that would be the most trivial thing to change.

In other words: I was hoping for a Rust (leptess) workflow that allows

- `PixA`
- `Pix`
- and collecting the recognition results

Is that something you'd be willing to support? Am I missing a way how this would work today already? I could offer to look into this, but I admit that I'm a Rust beginner at this point in time.
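The shape of the requested workflow (take a sequence of pages, recognise each one, collect the results) could be sketched as follows. Everything here is hypothetical and uses only the standard library: the `Recognizer` trait, `recognize_all`, and the toy `EchoEngine` are illustrative names and are not part of leptess. A real version would use a page type such as `Pix` and an engine backed by Tesseract.

```rust
/// Hypothetical abstraction over an OCR engine; not part of leptess.
pub trait Recognizer {
    /// The page type the engine accepts (a real one might be a `Pix`).
    type Page;

    /// Recognise one page and return its text.
    fn recognize(&mut self, page: &Self::Page) -> Result<String, String>;
}

/// Runs the engine over every page in order and collects the text of each.
/// Stops at the first page that fails and returns its error.
pub fn recognize_all<R: Recognizer>(
    engine: &mut R,
    pages: &[R::Page],
) -> Result<Vec<String>, String> {
    pages.iter().map(|page| engine.recognize(page)).collect()
}

/// Toy engine used only to exercise the sketch: pages are plain strings.
pub struct EchoEngine {
    pub calls: usize,
}

impl Recognizer for EchoEngine {
    type Page = String;

    fn recognize(&mut self, page: &String) -> Result<String, String> {
        self.calls += 1;
        if page.is_empty() {
            return Err("empty page".to_string());
        }
        Ok(format!("text of {}", page))
    }
}
```

Collecting into `Result<Vec<String>, String>` means a multi-page document either yields one string per page or a single error, which keeps the caller's handling simple.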