This repository contains the code for our work presented at the 26th Euromicro Conference Series on Digital System Design (DSD) in Durres, Albania, in September 2023: "Novel Approach for AI-based Risk Calculator Development using Transfer Learning Suitable for Embedded Systems". This work presents a methodology for the preliminary design of a Machine Learning (ML)-based risk calculator from medical tabular databases, combining the knowledge of different clinically validated cardiovascular risk calculators through Transfer Learning (TL). The goal is a more personalized noncommunicable disease (NCD) risk estimation than current regression-based approaches provide. This work is part of the WARIFA European Project, whose main objective is to develop an AI-based application for the prevention and management of chronic conditions, such as Diabetes Mellitus or Cardiovascular Diseases (CVD), by providing personalized recommendations based on the subject and the variables collected from them. In addition, a preliminary high-level performance profiling was carried out to estimate the feasibility of implementing this ML-based calculator on a microcontroller.
The contents of the scripts are described below:
Framingham_utils.py and Steno_utils.py: data curation and preparation of the datasets.
exploratory_data_analysis.py: exploratory data analysis.
model_evaluation.py: model evaluation functions for the selected ML models.
train_utils.py: functions to train the models.
profiling.py: profiling functions extracted from this example.
constants.py: directory names, file names, dataset names, and the numerical and categorical features. This file MUST BE CHANGED to use your own paths, files, etc.
steno2fram.ipynb and fram2steno.ipynb: the Python notebooks that contain the framework itself. The former takes the Steno database as reference, and the latter takes the Framingham dataset.
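To give a rough idea of what high-level profiling of a model call can look like, the sketch below measures mean wall-clock time and peak Python heap allocation using only the standard library. It is not the code in profiling.py: `profile_call` and `dummy_predict` are hypothetical names, and the linear score is only a stand-in for a trained model.

```python
import time
import tracemalloc


def profile_call(func, *args, repeats=10, **kwargs):
    """Return (result, mean_seconds, peak_bytes) for func(*args, **kwargs)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    # Timing pass: tracemalloc is off here so it does not slow the call down.
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(*args, **kwargs)
    mean_seconds = (time.perf_counter() - start) / repeats

    # Memory pass: a single traced call gives the peak Python allocation.
    tracemalloc.start()
    func(*args, **kwargs)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return result, mean_seconds, peak_bytes


def dummy_predict(features):
    # Hypothetical stand-in for a model's predict(): a fixed linear score.
    weights = [0.2, 0.5, 0.3]
    score = sum(w * x for w, x in zip(weights, features))
    return 1 if score > 0.5 else 0


if __name__ == "__main__":
    label, seconds, peak = profile_call(dummy_predict, [1.0, 1.0, 1.0])
    print(f"label={label} mean_time={seconds:.2e}s peak_mem={peak}B")
```

Figures obtained this way on a desktop interpreter are only a coarse indication; they do not translate directly to a microcontroller.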
Please cite our paper if this framework somehow helped you in your research and/or development work, or if you used this piece of code:
A. J. Rodríguez-Almeida, H. Fabelo, C. Soguero-Ruiz, R. M. Sanchez-Hernandez, A. M. Wägner and G. M. Callico, "Novel Approach for AI-Based Risk Calculator Development Using Transfer Learning Suitable for Embedded Systems," 2023 26th Euromicro Conference on Digital System Design (DSD), Golem, Albania, 2023, pp. 103-110, doi: 10.1109/DSD60849.2023.00024.
Both datasets are available upon request from their authors (see references [5] and [6] in the paper for the availability of Steno and Framingham, respectively).
This code was developed with Python 3.8.13, with ipykernel installed to run the framework using Jupyter Notebooks, so your development environment must support Jupyter Notebooks.
After changing the paths, file names, etc. in constants.py to match your own setup, you just have to run one of the .ipynb files in the development environment you use.
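As an illustration of the kind of edits this step involves, here is a minimal sketch of a constants-style module. Every name and value in it (`DATA_DIR`, `STENO_FILE`, `FRAMINGHAM_FILE`, the feature lists, `missing_files`) is a hypothetical placeholder, not the actual content of constants.py; use the names defined in the repository's file.

```python
from pathlib import Path

# Hypothetical placeholders: replace with your own directory and file names.
DATA_DIR = Path("/path/to/your/data")
STENO_FILE = "steno.csv"
FRAMINGHAM_FILE = "framingham.csv"

# Hypothetical feature lists: the real ones depend on the datasets you hold.
NUMERICAL_FEATURES = ["age", "systolic_bp"]
CATEGORICAL_FEATURES = ["sex", "smoker"]


def dataset_path(filename):
    """Build the full path of a dataset file inside DATA_DIR."""
    return DATA_DIR / filename


def missing_files():
    """List configured dataset files that do not exist on disk."""
    candidates = [dataset_path(STENO_FILE), dataset_path(FRAMINGHAM_FILE)]
    return [path for path in candidates if not path.exists()]


if __name__ == "__main__":
    for path in missing_files():
        print(f"Missing dataset file: {path}")
```

Running a check like `missing_files()` before opening a notebook can catch a wrong path early, instead of failing partway through the data preparation.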
The execution of each .ipynb file generates the EDA and results folders. The EDA folder stores the histograms of the different continuous variables of the datasets, to visually demonstrate the heterogeneity of both datasets. The results folder holds an Excel file with the pre-TL and post-TL results, including the classification confusion matrices. Please refer to our paper for a more detailed analysis of the obtained results.
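For readers unfamiliar with confusion matrices, the sketch below shows how one is built from true and predicted labels in plain Python. It is not the repository's evaluation code: the function name and the toy labels are assumptions made for illustration and are not results from the paper.

```python
def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return a matrix where rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


# Toy labels for illustration only (0 = low risk, 1 = high risk).
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 1, 0]

toy_matrix = confusion_matrix(y_true, y_pred)
# toy_matrix == [[2, 1], [1, 2]]
```

Comparing such a matrix computed before and after TL shows where the classification errors move, which a single accuracy value would hide.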
For any other questions related to the code or the proposed framework itself, you can open an issue on this repository or contact me via email.