AI-READI/dataset-documentation-paper-code

Code: Dataset Documentation for AI Paper

About

This is the code associated with our paper, in which we analyzed various dataset documentation approaches that can help with the responsible development of AI models. See this inventory for all related resources, including the paper.

Standards followed

The overall code is structured according to the FAIR-BioRS guidelines. The Python code in the various Jupyter notebooks follows the PEP8 guidelines. All the dependencies are documented in the environment.yml file.

Using the Jupyter notebooks

Prerequisites

We recommend using Anaconda to create and manage your development environment and JupyterLab to run the notebooks. All subsequent instructions assume that you are using Anaconda (Python 3 version) and JupyterLab.

Clone repo

Clone the repository, or download it as a ZIP file and extract it.

cd into the code folder

Open the Anaconda Prompt (Windows) or your system's command-line interface, then navigate to the code folder:

$ cd dataset-documentation-paper-code

Set up conda env

$ conda env create -f environment.yml

Set up kernel for JupyterLab

$ conda activate dataset-documentation-env
$ conda install ipykernel
$ ipython kernel install --user --name=dataset-documentation
$ conda deactivate

Set up env vars

The required environment variables are listed in the table below, along with information on how to obtain them.

| Suggested name | Value or instructions for obtaining it | Purpose |
| --- | --- | --- |
| GITHUB_ACCESS_TOKEN | https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api | Required to run the GitHub search code in real-world-usage.ipynb |
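The token must be visible to the process that runs the notebook, for example by setting it in the shell before launching JupyterLab (`export GITHUB_ACCESS_TOKEN=<your token>` on macOS/Linux). As a minimal sketch, assuming the notebook reads the token from the environment (the actual client code in real-world-usage.ipynb may differ), the token can be used to authenticate a request to GitHub's code search endpoint. The helper `build_github_search_request` and the example query are hypothetical; the headers follow GitHub's REST API authentication documentation.

```python
import os
import urllib.parse
import urllib.request
from typing import Optional

# GitHub REST API endpoint for code search.
SEARCH_CODE_URL = "https://api.github.com/search/code"


def build_github_search_request(
    query: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build (but do not send) an authenticated GitHub code-search request.

    Hypothetical helper: falls back to the GITHUB_ACCESS_TOKEN environment
    variable when no token is passed explicitly.
    """
    if token is None:
        token = os.environ.get("GITHUB_ACCESS_TOKEN")
    if not token:
        raise RuntimeError("GITHUB_ACCESS_TOKEN is not set.")
    # URL-encode the search query, e.g. "datasheet" -> "q=datasheet".
    url = SEARCH_CODE_URL + "?" + urllib.parse.urlencode({"q": query})
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "X-GitHub-Api-Version": "2022-11-28",
    }
    return urllib.request.Request(url, headers=headers)


# Example with a hypothetical query and a dummy token (nothing is sent).
request = build_github_search_request("datasheet", token="dummy-token")
print(request.full_url)  # https://api.github.com/search/code?q=datasheet
```

Passing the returned request to `urllib.request.urlopen` would send it; building it separately lets the token handling be checked without network access.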

Launch JupyterLab

Launch JupyterLab and open the Jupyter notebook of interest. Make sure to change the kernel to the one created above, called "dataset-documentation" (e.g., see here). If you are editing a notebook, we recommend using the JupyterLab code formatter with the Black and isort formatters to help comply with PEP8.
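Because a notebook runs in whichever kernel is selected, a first cell can confirm that the kernel is backed by the conda environment created earlier. This is a minimal sketch under the assumption that the environment was created in conda's default `envs` directory (so the last component of `sys.prefix` is the environment name); `running_in_env` is a hypothetical helper, not part of the repository.

```python
import sys
from pathlib import Path

# Conda environment name used in the setup steps above.
EXPECTED_ENV = "dataset-documentation-env"


def running_in_env(env_name: str = EXPECTED_ENV, prefix: str = sys.prefix) -> bool:
    """Return True if the interpreter prefix is the named conda environment.

    Assumes the environment lives in the default <conda root>/envs/<name>
    location, so the last path component of the prefix is its name.
    """
    return Path(prefix).name == env_name


# Warn (rather than fail) if the notebook is running in another environment.
if not running_in_env():
    print(f"Warning: kernel is not using {EXPECTED_ENV} (sys.prefix={sys.prefix})")
```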

Inputs/outputs

The Jupyter notebook uses files from the dataset associated with the paper (see here). Download the dataset and add it to the inputs folder, naming the downloaded dataset folder 'dataset'.
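A check like the following could run at the top of a notebook so that a missing or misnamed dataset folder fails early with a clear message. It is a minimal sketch that assumes the inputs folder sits at the repository root and that the notebook's working directory is that root; `find_dataset_dir` is a hypothetical helper.

```python
from pathlib import Path


def find_dataset_dir(repo_root: Path = Path(".")) -> Path:
    """Return <repo_root>/inputs/dataset, or raise if it has not been added."""
    dataset_dir = repo_root / "inputs" / "dataset"
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"Dataset folder not found at {dataset_dir}. Download the dataset, "
            "place it in the inputs folder and name the folder 'dataset'."
        )
    return dataset_dir
```

If a notebook runs from a subfolder instead, passing `Path("..")` as `repo_root` would point the check one level up.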

Outputs of the code include plots and tables that are displayed in the notebook and also saved as files. The saved plot files are included in the outputs folder.
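The notebooks' own saving code is not reproduced here. As a hedged illustration of the pattern, the sketch below writes a table as CSV into the outputs folder, creating the folder if needed; the helper `save_table`, the file name, and the CSV format are assumptions rather than what the notebooks necessarily do.

```python
import csv
from pathlib import Path
from typing import Iterable, Sequence


def save_table(
    header: Sequence[str],
    rows: Iterable[Sequence[object]],
    filename: str,
    outputs_dir: Path = Path("outputs"),
) -> Path:
    """Write a table to <outputs_dir>/<filename> as CSV and return the path."""
    outputs_dir.mkdir(parents=True, exist_ok=True)  # create outputs/ if absent
    path = outputs_dir / filename
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)  # column names first
        writer.writerows(rows)  # then one line per row
    return path
```

Plots could be written to the same folder in a similar way, for example with matplotlib's `Figure.savefig` if the notebooks use matplotlib.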

License

This work is licensed under the MIT License. See LICENSE for more information.

Feedback and contribution

Use GitHub issues to submit feedback or make suggestions. You can also fork the repository and submit a pull request with suggestions.

How to cite

If you use this code, please cite the related paper (it will be listed here when available) and also cite this repository as:

Simpkins, Kyongmi, Patel, Bhavesh. Code: Dataset Documentation for AI Paper [Software]. Zenodo. https://doi.org/10.5281/zenodo.14583673
