A script for harvesting metadata from Wikimedia Commons for use in Europeana
Given a (set of) categories on Commons, along with templates and matching patterns for external links, specified in a json file (see the examples in the projects folder), the script queries the Commons API for metadata about the images and then inspects the templates used and the external links on each file page. The resulting information is output to an xml file per the Europeana specifications.
Additionally, the data is output (along with a few unused fields) as a csv to allow for easier analysis/post-processing, together with an analysis of the categories used and a logfile detailing potential problems in the data.
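As an illustration of the kind of Commons API call involved (a sketch using the requests library rather than the script's own WikiApi wrapper; the category name is a placeholder), the file pages in a category can be listed through the MediaWiki API:

```python
# Illustration only: list the file pages in a Commons category via the
# MediaWiki API. 'Category:Example' is a placeholder, not a real project.
import requests

resp = requests.get(
    'https://commons.wikimedia.org/w/api.php',
    params={
        'action': 'query',
        'list': 'categorymembers',
        'cmtitle': 'Category:Example',
        'cmtype': 'file',    # restrict to file pages
        'cmlimit': 'max',
        'format': 'json',
    },
)
for member in resp.json()['query']['categorymembers']:
    print(member['title'])
```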
For convenient/frequent use, put your Wikimedia Commons username/password into config.py as the variables user/password (as unicode strings). If config.py is not present, getpass is imported and used to prompt for the username and password.
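A minimal sketch of that fallback (the variable names user/password follow the text; the prompt strings are assumptions):

```python
# Sketch of the credential fallback described above; the prompt wording
# is illustrative, not the script's exact text.
try:
    import config
    user = config.user
    password = config.password
except ImportError:
    # config.py is absent, so prompt interactively instead
    import getpass
    user = getpass.getpass('Wikimedia Commons username: ')
    password = getpass.getpass('Wikimedia Commons password: ')
```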
Usage: python Europeana.py filename option
where:
    filename (required): the (unicode) string relative pathname to the json file for the project
    option (optional): if set to:
        verbose: toggles on verbose mode with additional output to the terminal
        test: toggles on testing (a verbose and limited run)
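A rough sketch of how that command line could be interpreted (Europeana.py's actual argument handling may differ):

```python
# Illustrative handling of the usage described above.
import sys

def parse_args(argv):
    if len(argv) < 2:
        sys.exit('Usage: python Europeana.py filename [verbose|test]')
    filename = argv[1]
    option = argv[2] if len(argv) > 2 else None
    verbose = option in ('verbose', 'test')  # test also implies verbose
    testing = option == 'test'               # a verbose and limited run
    return filename, verbose, testing

if __name__ == '__main__':
    filename, verbose, testing = parse_args(sys.argv)
```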
Requires WikiApi.
WikiApi is based on PyCJWiki Version 1.31, (C) Smallman12q, released under the GPL (see http://www.gnu.org/licenses/).