Webcorp-crawler

This is a set of simple tools for postprocessing results acquired, for instance, from a web corpus. The tools are especially relevant when:

- the concordance results provided by the corpus's interface cannot easily be fetched as JSON, CSV, etc.
- the corpus doesn't contain all the kinds of annotations you would like to have, and you want to add them to individual concordance results afterwards.
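To make the two use cases above concrete, here is a minimal sketch that uses only the Python standard library. It does not use or depict the API of this package: the `hits` data, the field names (`left`, `keyword`, `right`, `pos_guess`) and the `annotate` helper are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import json

# Hypothetical concordance hits; the field names are illustrative only
# and are not taken from webcorpcrawler.
hits = [
    {"left": "She has", "keyword": "run", "right": "the lab since 2010."},
    {"left": "They went for a", "keyword": "run", "right": "before breakfast."},
]


def annotate(hit):
    """Return a copy of the hit with one extra, user-defined annotation."""
    annotated = dict(hit)
    # Toy rule (an assumption for this sketch): guess "noun" when the
    # keyword is directly preceded by an article, otherwise "verb".
    words = hit["left"].split()
    last_word = words[-1].lower() if words else ""
    annotated["pos_guess"] = "noun" if last_word in {"a", "an", "the"} else "verb"
    return annotated


annotated_hits = [annotate(h) for h in hits]

# Serialize the annotated results as JSON ...
json_text = json.dumps(annotated_hits, ensure_ascii=False, indent=2)

# ... and as CSV, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["left", "keyword", "right", "pos_guess"])
writer.writeheader()
writer.writerows(annotated_hits)
csv_text = buffer.getvalue()
```

In practice the hits would come from the corpus interface rather than a hard-coded list, and the annotation rule would be whatever your analysis requires.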

Installation

Via pip:

pip3 install git+https://github.com/tunicorpora/webcorpcrawler

Alternatively, clone the repository manually, cd into the directory, and run pip3 install -e . (note the dot at the end of the command).

Examples

See the docs folder. (direct link)
