This workshop focuses on using Python programming tools for collaborative development, web data extraction, and information retrieval from documents. Participants will learn to:
- Use GitHub for collaborative projects.
- Develop effective Web Scraping tools.
- Retrieve and process relevant information from text documents.
You can find the workshop syllabus here
- sessions: Contains the workshop sessions. It includes the following subfolders:
  1. Web Scraping: Includes Jupyter notebooks for classes.
  2. Text Information Retrieval: Includes Jupyter notebooks for classes.
  - data: Stores data used in the workshop.
- proposals: Folder where students upload their proposals for applying the tools learned (web scraping and text information retrieval). Each student should create a folder with their name, following the format branch_{name}.
The link for the synchronous sessions is here
- Meeting ID: 952 2258 5367
- Access code: 636549
You can log in with your PUCP account.
The YouTube playlist with the session recordings is here
If you choose to work locally, the Text Information Retrieval section requires installing and managing multiple dependencies. For those who prefer a cloud-based approach for this section, Colab-adapted scripts are available. You can access them here. To use these scripts, download the folder, unzip it, and upload it to your Google Drive. Alternatively, if you decide to work locally, follow the instructions provided in the notebooks for session 3 and session 4.
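The Colab steps above (unzip the downloaded folder, then make your Drive visible to the notebook) can be sketched in Python as follows. This is only an illustrative sketch, not part of the workshop materials: the helper names `unzip_scripts` and `mount_drive_if_colab` and any archive or folder names you pass in are hypothetical. The only Colab-specific call is `google.colab.drive.mount`, which is available solely inside a Colab runtime.

```python
import zipfile
from pathlib import Path


def unzip_scripts(archive_path, target_dir):
    """Extract a downloaded .zip archive into target_dir.

    Returns the sorted list of extracted paths. Both arguments are
    placeholders: use whatever name the downloaded folder actually has.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        names = archive.namelist()
    return sorted(target / name for name in names)


def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab.

    Returns True if Drive was mounted, False when the code is not
    running in Colab (the google.colab package is missing there).
    """
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True
```

For example, `unzip_scripts("colab_scripts.zip", "colab_scripts")` (hypothetical names) would unpack the archive on your machine before you upload the resulting folder to Google Drive. Calling `mount_drive_if_colab()` in the first cell of a Colab notebook then makes the uploaded folder reachable under the mount point.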