This repository contains the results of the manuscript clustering.
Using the JSON file export.json, produced by the script extractor_json.py, we cluster all the entries in order to find out exactly how many manuscripts have been sold more than once.
The entries are clustered based on the following traits:
- author
- format
- number of pages
- price
- date
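The clustering itself is performed by reconciliator_all.py. As an illustration only, the sketch below shows one generic way to cluster entries on these traits. Each pair of entries by the same author is scored by counting matching traits, pairs whose score reaches a threshold are linked, and the connected components are taken as manuscripts. A cluster with more than one entry then corresponds to a manuscript sold several times. The field names, the price tolerance, the threshold, and the assumption that "date" is the manuscript's own date (and therefore stays the same across sales) are all hypothetical. This is not necessarily the method used by reconciliator_all.py.

```python
from itertools import combinations

# Hypothetical entries: keys and values are illustrative assumptions,
# not the actual structure of export.json.
ENTRIES = {
    "e1": {"author": "Voltaire", "format": 4, "pages": 2, "price": 100.0, "date": "1760-05-02"},
    "e2": {"author": "Voltaire", "format": 4, "pages": 2, "price": 130.0, "date": "1760-05-02"},
    "e3": {"author": "Voltaire", "format": 8, "pages": 1, "price": 40.0, "date": "1771"},
    "e4": {"author": "Rousseau", "format": 4, "pages": 2, "price": 100.0, "date": "1760-05-02"},
}

THRESHOLD = 3          # hypothetical: minimum number of matching traits
PRICE_TOLERANCE = 0.5  # hypothetical: prices within 50% count as a match


def score(a: dict, b: dict) -> int:
    """Count matching traits; entries by different authors never match."""
    if a["author"] != b["author"]:
        return 0
    matches = sum(a[t] == b[t] for t in ("format", "pages", "date"))
    low, high = sorted((a["price"], b["price"]))
    if high > 0 and (high - low) / high <= PRICE_TOLERANCE:
        matches += 1
    return matches


def cluster(entries: dict[str, dict]) -> list[set[str]]:
    """Link entries whose score reaches THRESHOLD, then return connected components."""
    parent = {key: key for key in entries}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(entries, 2):
        if score(entries[a], entries[b]) >= THRESHOLD:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for key in entries:
        groups.setdefault(find(key), set()).add(key)
    return list(groups.values())


if __name__ == "__main__":
    clusters = cluster(ENTRIES)
    resold = [c for c in clusters if len(c) > 1]
    print(f"{len(clusters)} manuscripts, {len(resold)} sold more than once")
```

With these sample entries, e1 and e2 fall into the same cluster, while e3 (different format and pages) and e4 (different author) remain alone. Comparing every pair is quadratic; grouping entries by author first would keep the number of comparisons manageable on a large export.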
If you want to cluster all the entries of export.json, try this:
* git clone https://github.com/katabase/soldMss.git
* cd soldMss
* python3 -m venv my_env
* source my_env/bin/activate
* pip install -r requirements.txt
* cd scripts
* python3 reconciliator_all.py ../export.json
Note that the output file of this clustering is available here.
You can then run some data analyses from the scripts folder:
- on prices: python3 price.py
- on authors: python3 author.py
- on the number of sales of each manuscript: python3 mss_list.py
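Once the entries are clustered, counting the sales of each manuscript amounts to measuring the size of each cluster. The following is a minimal sketch of that count, not the code of mss_list.py. It assumes a hypothetical output layout that maps each cluster id to the list of entry ids it contains; the actual output of reconciliator_all.py may be structured differently.

```python
import json
from collections import Counter

# Hypothetical clustering output: each key is a manuscript (cluster) and each
# value lists the ids of the catalogue entries describing it.
SAMPLE_OUTPUT = """
{
  "1": ["entry_1", "entry_4"],
  "2": ["entry_2"],
  "3": ["entry_3", "entry_5", "entry_6"]
}
"""


def sales_distribution(clusters: dict[str, list[str]]) -> Counter:
    """Map a number of sales to the number of manuscripts sold that many times."""
    return Counter(len(entry_ids) for entry_ids in clusters.values())


if __name__ == "__main__":
    distribution = sales_distribution(json.loads(SAMPLE_OUTPUT))
    for sales, count in sorted(distribution.items()):
        print(f"{count} manuscript(s) sold {sales} time(s)")
    resold = sum(count for sales, count in distribution.items() if sales > 1)
    print(f"{resold} manuscript(s) sold more than once")
```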
All the results will be in the output folder.
You can test the scripts with:
python3 test.py
- The scripts were created by Alexandre Bartz and Matthias Gille Levenson with the help of Simon Gabay.
Alexandre Bartz, Simon Gabay, Matthias Gille Levenson, Ljudmila Petkovic and Lucie Rondeau du Noyer, Manuscript sale catalogues : clustering, Neuchâtel: Université de Neuchâtel, 2020, https://github.com/katabase/soldMss.
This work is licensed under a Creative Commons Attribution 4.0 International Licence.