Pause while adding/exporting papers #37

Open
2 tasks
Evildoor opened this issue Jul 21, 2017 · 0 comments

Evildoor commented Jul 21, 2017

The current procedure for adding new papers is roughly as follows:

  1. Create a directory for a paper.
  2. Extract a page into txt.
  3. Repeat step 2 until all pages are processed.
  4. Perform the finishing steps (create metadata.json, extract the 1st page into xml, etc).
  5. Repeat 1-4 until all papers are added.

During this procedure the manager more or less "hangs", only updating the status bar to show the paper being processed. This means that to "pause" and "continue" the procedure (for example, when mining a big corpus of papers over several days), one needs to:

  1. Remember the name of the current paper.
  2. Terminate the PDFAnalyzer process.
  3. Delete the current paper's folder. All the previous papers remain fully functional, but the pages already processed for the current one are lost.
  4. Start the PDFAnalyzer again.
  5. Add the papers again, starting from the current one (or from the beginning to receive messages about existing papers).
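
As a stopgap for step 1 of this workaround, the interrupted paper could be found automatically instead of remembered. The sketch below is not PDFAnalyzer code: it assumes each paper lives in its own subfolder of a single corpus directory, and relies only on the fact that metadata.json is created during the finishing steps, so a folder without it was never completed. The function name `find_incomplete_papers` and the `corpus_dir` parameter are hypothetical.

```python
from pathlib import Path


def find_incomplete_papers(corpus_dir):
    """Return paper folders that have no metadata.json yet, sorted by name.

    Assumption: one subfolder per paper, with metadata.json written
    directly inside it only once the paper is fully processed.
    """
    root = Path(corpus_dir)
    return sorted(
        folder
        for folder in root.iterdir()
        if folder.is_dir() and not (folder / "metadata.json").exists()
    )
```

Any folder this returns would be a candidate for deletion and re-processing in steps 3 and 5.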

This is somewhat inconvenient, so it would be handy to:

  • Implement proper pausing and resuming of the extraction procedure without terminating the process.
  • Implement resuming from the middle of a paper. One should keep in mind that some parameters (number of pages, rotated pages, etc) are not stored anywhere until metadata.json is created.
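
One possible shape for both points, as a rough sketch rather than a proposal for the actual code: run the extraction in a worker thread that checks a `threading.Event` before every page, and write a small per-paper progress file after every page so that the parameters which normally only end up in metadata.json survive an interruption. Everything named here is an assumption made for illustration: the `PausableExtractor` class, the `progress.json` file name, its fields, and the `extract_page(paper_dir, page)` callable standing in for the real page extraction.

```python
import json
import threading
from pathlib import Path


class PausableExtractor:
    """Extracts pages one by one; can be paused and resumed between pages."""

    def __init__(self, extract_page):
        # Hypothetical callable(paper_dir, page_number) doing the real work.
        self._extract_page = extract_page
        self._running = threading.Event()
        self._running.set()  # start in the "not paused" state

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def process_paper(self, paper_dir, num_pages):
        paper_dir = Path(paper_dir)
        paper_dir.mkdir(parents=True, exist_ok=True)
        progress_file = paper_dir / "progress.json"  # hypothetical file name

        # Reuse saved state if the paper was interrupted earlier.
        state = {"num_pages": num_pages, "next_page": 1}
        if progress_file.exists():
            state = json.loads(progress_file.read_text())

        for page in range(state["next_page"], state["num_pages"] + 1):
            self._running.wait()  # blocks here while paused
            self._extract_page(paper_dir, page)
            state["next_page"] = page + 1
            progress_file.write_text(json.dumps(state))
        return state
```

With this layout the manager would call `pause()` and `resume()` from its UI thread while `process_paper` runs in a worker thread, and a paper interrupted by process termination would continue from `next_page` on the next run instead of being deleted. Other per-paper parameters, such as rotated pages, could be stored in the same file.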

Update: exporting works differently, but suffers from pretty much the same problems.

@Evildoor Evildoor changed the title Pause while adding papers Pause while adding/exporting papers Jan 25, 2018