Code: Dataset Documentation for AI Paper

About

This is the code associated with our paper where we analyzed various dataset documentation approaches that can help with the responsible development of AI models. See this inventory for all related resources, including the paper.

Standards followed

The overall code is structured according to the FAIR-BioRS guidelines. The Python code in the various Jupyter notebooks follows the PEP8 guidelines. All the dependencies are documented in the environment.yml file.

Using the Jupyter notebooks

Prerequisites

We recommend using Anaconda to create and manage your development environment and using JupyterLab to run the notebook. All the subsequent instructions are provided assuming you are using Anaconda (Python 3 version) and JupyterLab.

Clone repo

Clone the repo or download as a zip and extract.

cd into the code folder

Open Anaconda Prompt (Windows) or your system's command-line interface, then navigate to the code folder:

$ cd dataset-documentation-paper-code

Setup conda env

$ conda env create -f environment.yml

Setup kernel for JupyterLab

$ conda activate dataset-documentation-env
$ conda install ipykernel
$ ipython kernel install --user --name=dataset-documentation
$ conda deactivate

Setup env vars

The required environment variables are listed in the table below, along with information on how to obtain them.

Suggested name	Value or instructions for obtaining it	Purpose
GITHUB_ACCESS_TOKEN	https://docs.github.com/en/rest/authentication/authenticating-to-the-rest-api	Required to run the GitHub search code in real-world-usage.ipynb
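
As an illustration of how the token might be consumed, here is a minimal sketch that reads GITHUB_ACCESS_TOKEN from the environment and builds request headers for the GitHub REST API. The helper name build_github_headers is hypothetical and is not taken from real-world-usage.ipynb; the notebook may read the variable differently.

```python
import os


def build_github_headers(env=None):
    """Build GitHub REST API headers from GITHUB_ACCESS_TOKEN.

    Hypothetical helper for illustration only; `env` defaults to os.environ
    and can be replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    token = env.get("GITHUB_ACCESS_TOKEN", "").strip()
    if not token:
        # Fail early with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(
            "GITHUB_ACCESS_TOKEN is not set; set it before running the GitHub search code."
        )
    return {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    }
```

Set the variable in the shell that launches JupyterLab so that the notebook kernel inherits it.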

Launch JupyterLab

Launch JupyterLab and navigate to the Jupyter notebook of interest. Make sure to change the kernel to the one created above, called "dataset-documentation" (e.g., see here). If you are editing the notebook, we recommend using the JupyterLab code formatter along with the Black and isort formatters to facilitate compliance with PEP8.

Inputs/outputs

The Jupyter notebooks make use of files in the dataset associated with the paper (see here). You will need to download the dataset and add it to the inputs folder (name the dataset folder 'dataset' after downloading it).
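
Before running a notebook, it can help to confirm that the dataset is where the code expects it. The following is a minimal sketch, assuming the notebooks are run from the repository root and look for the data under inputs/dataset; the function name find_dataset_dir is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_dataset_dir(repo_root="."):
    """Return the path to inputs/dataset, or raise if it is missing.

    Hypothetical helper for illustration; the folder layout follows the
    instructions in this README.
    """
    dataset_dir = Path(repo_root) / "inputs" / "dataset"
    if not dataset_dir.is_dir():
        raise FileNotFoundError(
            f"Expected the dataset at {dataset_dir}; download it and name the folder 'dataset'."
        )
    return dataset_dir
```

Calling find_dataset_dir() in the first notebook cell surfaces a missing or misnamed folder before any analysis starts.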

Outputs of the code include plots and tables displayed in the notebook but also saved as files. These saved plot files are included in the outputs folder.

License

This work is licensed under MIT. See LICENSE for more information.

Feedback and contribution

Use GitHub issues to submit feedback or make suggestions. You can also fork the repository and submit a pull request with your suggestions.

How to cite

If you use this code, please cite the related paper (it will be listed here when available) and also cite this repository as:

Simpkins, Kyongmi, Patel, Bhavesh. Code: Dataset Documentation for AI Paper [Software]. Zenodo. https://doi.org/10.5281/zenodo.14583673

