Pause while adding/exporting papers #37

Evildoor · 2017-07-21T14:41:42Z

The current procedure for adding new papers is roughly as follows:

1. Create a directory for a paper.
2. Extract a page into txt.
3. Repeat 2 until all pages are processed.
4. Perform finishing steps (create metadata.json, extract 1st page into xml, etc.).
5. Repeat 1-4 until all papers are added.

During this procedure the manager more or less "hangs", only changing the status bar to display the paper being processed. This means that to "pause" and "continue" the procedure (for example, mining a big corpus of papers over several days), one needs to:

1. Remember the name of the current paper.
2. Terminate the PDFAnalyzer process.
3. Delete the current paper's folder. All the previous ones will be fully functional, but the pages already processed in the current one will be lost.
4. Start the PDFAnalyzer again.
5. Add the papers again, starting from the current one (or from the beginning, receiving messages about the already existing papers).
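Until proper pausing exists, steps 1 and 3 of this workaround could be automated. Below is a minimal sketch, assuming that a paper directory without metadata.json is always an interrupted one (metadata.json is only created in the finishing steps). The function names and the `papers_dir` parameter are hypothetical and are not part of PDFAnalyzer.

```python
import os
import shutil


def find_incomplete_papers(papers_dir):
    """Return sorted names of paper directories that lack metadata.json.

    Assumption: metadata.json is written last, so its absence means
    that the paper was interrupted mid-processing.
    """
    incomplete = []
    for name in sorted(os.listdir(papers_dir)):
        path = os.path.join(papers_dir, name)
        has_metadata = os.path.isfile(os.path.join(path, "metadata.json"))
        if os.path.isdir(path) and not has_metadata:
            incomplete.append(name)
    return incomplete


def remove_incomplete_papers(papers_dir):
    """Delete every incomplete paper directory and return the removed names."""
    removed = find_incomplete_papers(papers_dir)
    for name in removed:
        shutil.rmtree(os.path.join(papers_dir, name))
    return removed
```

Calling `remove_incomplete_papers()` before re-adding the papers would leave only fully processed directories, so there is no need to remember which paper was being processed.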

This is somewhat inconvenient, so it would be handy to:

1. Implement proper pausing and resuming of the extraction procedure without process termination.
2. Implement resuming from the middle of a paper. One should keep in mind that some parameters (number of pages, rotated pages, etc.) are not stored anywhere until metadata.json is created.
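One possible shape for both points is sketched below. It is not the existing PDFAnalyzer code: the `PausableExtractor` class, the `progress.json` file name and the `extract_page` callable are all hypothetical. The idea is to block between pages on a `threading.Event` for pausing, and to write the parameters that are otherwise lost (here only the number of pages and the next page to process) into a small per-paper file after every page, so that a restarted process can continue from the middle of a paper.

```python
import json
import os
import threading


class PausableExtractor:
    """Hypothetical page-by-page worker that can be paused and resumed."""

    def __init__(self, paper_dir, num_pages, extract_page):
        self.paper_dir = paper_dir
        self.num_pages = num_pages
        self.extract_page = extract_page  # callable(paper_dir, page_number)
        # Assumed file name; holds state that metadata.json does not yet hold.
        self.progress_file = os.path.join(paper_dir, "progress.json")
        self._running = threading.Event()
        self._running.set()  # start in the "running" state

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def _load_progress(self):
        # Reuse saved state if a previous run was interrupted.
        if os.path.isfile(self.progress_file):
            with open(self.progress_file) as f:
                return json.load(f)
        return {"num_pages": self.num_pages, "next_page": 1}

    def _save_progress(self, progress):
        # Write to a temporary file first so a crash cannot leave
        # a half-written progress file behind.
        tmp = self.progress_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(progress, f)
        os.replace(tmp, self.progress_file)

    def run(self):
        progress = self._load_progress()
        for page in range(progress["next_page"], progress["num_pages"] + 1):
            self._running.wait()  # blocks here while paused
            self.extract_page(self.paper_dir, page)
            progress["next_page"] = page + 1
            self._save_progress(progress)
        # The finishing steps (metadata.json etc.) would follow here.
        return progress
```

In this sketch `run()` would have to be executed in a worker thread, so that the manager stays responsive and can call `pause()` and `resume()`. Other parameters, such as rotated pages, could be added to the same dictionary.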

Update: exporting works differently, but suffers from pretty much the same problems.

Evildoor added Future plans PDF Analyzer labels Jul 21, 2017

Evildoor changed the title ~~Pause while adding papers~~ Pause while adding/exporting papers Jan 25, 2018
