pip install -r requirements.txt
If you run into pdf2image-related errors on Windows, see issue_markdown for the solution.
Download the research paper PDFs from the latest n days, and generate the corresponding markdown and preview images.
python arxiv_spider.py --category cs.CL --root_dir /your/papers/dir/path --days n
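As a rough illustration of what "the latest n days" can mean, here is a minimal sketch of a date-window filter. The function name `within_last_n_days` and the sample data are hypothetical; how arxiv_spider.py actually selects papers is not documented here and may differ (for example in whether the boundary day is included).

```python
from datetime import date, timedelta

def within_last_n_days(published: date, days: int, today: date) -> bool:
    """Return True if `published` is no more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    return cutoff <= published <= today

# Hypothetical sample data: (paper id, publication date)
papers = [
    ("paper-a", date(2024, 1, 10)),
    ("paper-b", date(2024, 1, 7)),
    ("paper-c", date(2024, 1, 1)),
]
today = date(2024, 1, 10)

# Keep only papers published within the last 3 days (boundary day included)
recent = [pid for pid, published in papers if within_last_n_days(published, 3, today)]
print(recent)
```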
To overwrite previously generated markdown files, add --overwrite:
python arxiv_spider.py --category cs.CL --root_dir /your/papers/dir/path --days n --overwrite
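The usual pattern behind a flag like --overwrite is "skip files that already exist unless told otherwise". The sketch below shows that pattern under the assumption that one markdown file is written per paper; `write_markdown` is a hypothetical helper, not a function from this repo.

```python
import tempfile
from pathlib import Path

def write_markdown(path: Path, text: str, overwrite: bool = False) -> bool:
    """Write `text` to `path`; skip an existing file unless `overwrite` is set.

    Returns True if the file was written, False if it was skipped.
    """
    if path.exists() and not overwrite:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return True

with tempfile.TemporaryDirectory() as tmp:
    md = Path(tmp) / "paper.md"
    first = write_markdown(md, "v1")                   # new file: written
    second = write_markdown(md, "v2")                  # exists: skipped
    third = write_markdown(md, "v3", overwrite=True)   # exists: replaced
    final_text = md.read_text(encoding="utf-8")
```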
If disk space is limited and you only want a quick preview of each paper's main idea, you can keep just the first eight pages of the paper.
Just use --keep_eight_pages
to keep the first eight pages.
python arxiv_spider.py --category cs.CL --root_dir /your/papers/dir/path --days n --keep_eight_pages
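For reference, the flags used in the commands above could be parsed with argparse roughly as follows. This is only a sketch: the flag names come from the commands in this README, but the defaults, help texts, and the `build_parser` helper are assumptions and may not match arxiv_spider.py.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in this README (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Download recent arXiv papers.")
    parser.add_argument("--category", default="cs.CL",
                        help="arXiv category, e.g. cs.CL")
    parser.add_argument("--root_dir", required=True,
                        help="directory where papers are stored")
    parser.add_argument("--days", type=int, default=1,
                        help="how many days back to look")
    parser.add_argument("--overwrite", action="store_true",
                        help="regenerate markdown files that already exist")
    parser.add_argument("--keep_eight_pages", action="store_true",
                        help="keep only the first eight pages of each PDF")
    return parser

# Parse an explicit argument list instead of sys.argv, for illustration
args = build_parser().parse_args([
    "--category", "cs.CL",
    "--root_dir", "/your/papers/dir/path",
    "--days", "3",
    "--keep_eight_pages",
])
print(args.category, args.days, args.overwrite, args.keep_eight_pages)
```

Boolean switches such as --overwrite and --keep_eight_pages map naturally to `action="store_true"`, so they are off unless passed.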
- Highlight target conferences and authors.
- Highlight custom keyword list.
- Add a command-line argument for VPN use.
- Customize the specific areas of interest, such as cs.CL and cs.CV.
- Maintain an index database of all papers for quick search and reference.
- Demo videos.
This repo also supports downloading ACL Anthology papers.
Step 1: download all papers. The default is the ACL 2023 papers; you can specify your own ACL URL.
python acl_spider.py
Step 2: filter the papers by your keywords and get the abstract of each matching paper.
python filter_papers.py --keywords keyword1 keyword2 keyword3
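To make the filtering step concrete, here is a minimal sketch of keyword matching over titles and abstracts. It assumes a paper matches when any keyword appears (case-insensitively) in its title or abstract; filter_papers.py may use different rules. The `matches_keywords` function and the sample records are hypothetical.

```python
from typing import Dict, List

def matches_keywords(paper: Dict[str, str], keywords: List[str]) -> bool:
    """Return True if any keyword occurs in the title or abstract (case-insensitive)."""
    text = (paper["title"] + " " + paper["abstract"]).lower()
    return any(keyword.lower() in text for keyword in keywords)

# Hypothetical sample records
papers = [
    {"title": "Retrieval for Question Answering", "abstract": "We study dense retrieval."},
    {"title": "Parsing with Transformers", "abstract": "A syntactic parser."},
    {"title": "Summarization Survey", "abstract": "Covers retrieval-augmented methods."},
]

selected = [p["title"] for p in papers if matches_keywords(p, ["retrieval", "dialogue"])]
print(selected)
```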