ArXiv Paper Downloader

This code downloads papers from ArXiv in parallel, given a set of search parameters. The papers are saved as PDF files. The script uses the ArXiv API to query the database and is written in Python 3.
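To illustrate what downloading in parallel can look like, here is a minimal sketch using a thread pool from the standard library. It is not the code in arXiv_pdf_downloader.py; the helper names (fetch_bytes, save_pdf, download_all), the worker count, and the file-naming rule are all assumptions made for this example.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch_bytes(url):
    # Hypothetical helper: return the raw bytes behind a URL.
    with urlopen(url, timeout=30) as response:
        return response.read()


def save_pdf(url, folder, fetch=fetch_bytes):
    # Assumption: the last path component of the URL is used as the file name.
    name = url.rstrip("/").split("/")[-1]
    if not name.endswith(".pdf"):
        name += ".pdf"
    path = os.path.join(folder, name)
    with open(path, "wb") as handle:
        handle.write(fetch(url))
    return path


def download_all(urls, folder="fetched_pdf", workers=2, fetch=fetch_bytes):
    # Create the target folder if it does not exist yet.
    os.makedirs(folder, exist_ok=True)
    # A small pool keeps the number of simultaneous requests low.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns the saved paths in the same order as the input URLs.
        return list(pool.map(lambda url: save_pdf(url, folder, fetch), urls))
```

The fetch argument is injectable so the function can be exercised without network access, for example by passing a function that returns fixed bytes.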

Disclaimer

Downloading a large number of papers is prohibited; please see the ArXiv API rules. In addition, using a third-party scraper (such as the Scrapy library) to download papers from ArXiv is strictly prohibited; please see the ArXiv site rules.

Installation

Create a virtualenv - How to create virtualenv
Activate the virtualenv - 'source path/to/bin/activate'
Run 'pip install -r requirements.txt'

Running

After creating the virtual environment, clone this repo and run:

python arXiv_pdf_downloader.py --search <searchQuery> --search_by <searchBy> --sort_by <sortBy> --max_result <maxResult> --folder_name <folderName>

Query parameters

The query parameters are defined below.

searchQuery: Search query, could be any string (required)
searchBy: Field to search by, choose one from (optional)
- 'ti': Title
- 'au': Author
- 'abs': Abstract
- 'co': Comment
- 'jr': Journal Reference
- 'cat': Subject Category
- 'rn': Report Number
- 'all': All of the above (default)
sortBy: Parameter to use for sorting, choose one from (optional)
- 'relevance' (default)
- 'lastUpdatedDate'
- 'submittedDate'
maxResult: Number of papers requested. Fewer papers than requested may be returned if not enough papers match the query (optional, default=1).
folderName: Name of the folder where the downloaded papers will be stored (optional, default='fetched_pdf').
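The search parameters above correspond closely to fields of the public ArXiv API query interface. As a rough sketch of how they could be combined into a request URL, assuming the standard export.arxiv.org query endpoint: the function name build_query_url and the fixed descending sort order are choices made for this example, and the script itself may build its request differently.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"
SEARCH_FIELDS = {"ti", "au", "abs", "co", "jr", "cat", "rn", "all"}
SORT_KEYS = {"relevance", "lastUpdatedDate", "submittedDate"}


def build_query_url(search, search_by="all", sort_by="relevance", max_result=1):
    # Hypothetical helper: reject values outside the documented choices.
    if search_by not in SEARCH_FIELDS:
        raise ValueError(f"unknown searchBy value: {search_by!r}")
    if sort_by not in SORT_KEYS:
        raise ValueError(f"unknown sortBy value: {sort_by!r}")
    params = {
        # The API expects the field prefix and the query joined by a colon.
        "search_query": f"{search_by}:{search}",
        "start": 0,
        "max_results": max_result,
        "sortBy": sort_by,
        # Assumption for this sketch: always sort in descending order.
        "sortOrder": "descending",
    }
    # urlencode percent-encodes the colon and turns spaces into '+'.
    return f"{API_URL}?{urlencode(params)}"


url = build_query_url("nlp transformer", max_result=5)
```

Validating the values before building the URL means a typo in searchBy or sortBy fails early instead of producing an empty result set.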

Example of valid parameters and format:

searchQuery: 'nlp transformer'
searchBy: 'all'
sortBy: 'relevance'
maxResult: 5
folderName: 'arXiv papers'
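For reference, the command-line flags and defaults listed in this README could be declared with argparse roughly as follows. This is a sketch only: the flag names and defaults are taken from the text above, but build_parser is a hypothetical name and the actual parser in arXiv_pdf_downloader.py may differ.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the flags documented in this README.
    parser = argparse.ArgumentParser(description="Download papers from ArXiv as PDF files.")
    parser.add_argument("--search", required=True, help="search query (any string)")
    parser.add_argument(
        "--search_by",
        default="all",
        choices=["ti", "au", "abs", "co", "jr", "cat", "rn", "all"],
        help="field to search by",
    )
    parser.add_argument(
        "--sort_by",
        default="relevance",
        choices=["relevance", "lastUpdatedDate", "submittedDate"],
        help="sort key",
    )
    parser.add_argument("--max_result", type=int, default=1, help="number of papers requested")
    parser.add_argument("--folder_name", default="fetched_pdf", help="output folder")
    return parser


# Parse the example values shown above; omitted flags fall back to their defaults.
args = build_parser().parse_args(
    ["--search", "nlp transformer", "--max_result", "5", "--folder_name", "arXiv papers"]
)
```

On a shell command line, values that contain spaces, such as the search query and the folder name in this example, need to be quoted.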

Unit Testing

Run the tests with the command:

python test_pdf_downloader.py

ndenStanford/arXiv_downloader
