Author: Teresa Davison
- email: [email protected]
- date: Apr 27, 2024
Description: Investigation into colexification of concepts within language families or geographical areas based on semantic field and ontological category, using machine learning algorithms such as Naive Bayes, Random Forest classification, SVC, and K-means clustering.
Data: Data was sourced from the SQL database underlying the Database of Cross-Linguistic Colexifications (CLICS3).
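
For readers new to the term, a language colexifies two concepts when it uses the same word form for both. The sketch below is a minimal illustration of that idea, not the project's actual feature pipeline: the toy rows, the `colexified_pairs` and `feature_row` helpers, and the binary-vector representation are all assumptions made for this example, and none of the data is taken from CLICS3.

```python
from collections import defaultdict
from itertools import combinations

# Toy rows (language, concept, form); illustrative only, not taken from CLICS3.
ROWS = [
    ("Russian", "HAND", "ruka"),
    ("Russian", "ARM", "ruka"),
    ("Russian", "FOOT", "noga"),
    ("Russian", "LEG", "noga"),
    ("German", "HAND", "Hand"),
    ("German", "ARM", "Arm"),
    ("German", "FOOT", "Fuß"),
    ("German", "LEG", "Bein"),
]


def colexified_pairs(rows):
    """Map each language to the set of concept pairs that share one form."""
    concepts_by_form = defaultdict(set)
    for language, concept, form in rows:
        concepts_by_form[(language, form)].add(concept)

    pairs = defaultdict(set)
    for (language, _form), concepts in concepts_by_form.items():
        # Any two concepts expressed by the same form are colexified.
        for a, b in combinations(sorted(concepts), 2):
            pairs[language].add((a, b))
    return dict(pairs)


def feature_row(language, pairs, all_pairs):
    """Hypothetical binary vector: 1 if the language colexifies the pair."""
    return [int(p in pairs.get(language, set())) for p in all_pairs]


pairs = colexified_pairs(ROWS)
all_pairs = sorted(set().union(*pairs.values()))
features = {
    lang: feature_row(lang, pairs, all_pairs) for lang in ("Russian", "German")
}
```

Binary vectors of this kind are one plausible input for classifiers and clustering methods like those named in the Description. The feature dataframes actually used in the project are documented in the notebooks.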
Directory:
- final_report.md: Final synthesis of the project process as well as results and analysis.
- notebooks: folder of Jupyter notebooks showing the progress of the project at certain intervals.
  - progress report 1: initial data exploration.
  - progress report 2: create feature dataframes for a few languages to test the process.
  - progress report 3: sample languages and create features in order to build models.
  - final progress report: create visualizations for the final presentation and report, and extract top features for some models.
- progress_report.md: includes summaries of what was accomplished in each of the notebook files.
- project_plan.md: original motivation for project and plan for analysis.
- LICENSE.md: licensing information for project.
- LING1340FinalPresentation.pdf: PDF version of the final presentation slides.
- data-samples: folder of sampled data for each of the relevant dataframes in CSV and pkl format, plus CSV feature dataframes created for Russian, Tamil, and German during the development phase.
- figures: folder of figures, graphs, and images used in the final report.
Guest book: Please follow this link to visit my guestbook!