Update docs (#40)

fcogidi authored Jan 22, 2025
1 parent ed2fd59 commit 0893db7
Showing 8 changed files with 2,546 additions and 1,703 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/docs_build.yml
@@ -31,5 +31,5 @@ jobs:
python3 -m pip install --upgrade pip && python3 -m pip install poetry
poetry env use '3.10'
source $(poetry env info --path)/bin/activate
-poetry install --with docs,test
+poetry install --with docs,test,dev,peft
cd docs && rm -rf source/reference/api && make html
60 changes: 30 additions & 30 deletions README.md
@@ -1,23 +1,30 @@
# mmlearn

[![code checks](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/code_checks.yml)
[![integration tests](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml/badge.svg)](https://github.com/VectorInstitute/mmlearn/actions/workflows/integration_tests.yml)
[![license](https://img.shields.io/github/license/VectorInstitute/mmlearn.svg)](https://github.com/VectorInstitute/mmlearn/blob/main/LICENSE)

-This project aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
+*mmlearn* aims at enabling the evaluation of existing multimodal representation learning methods, as well as facilitating
experimentation and research for new techniques.

## Quick Start

### Installation

#### Prerequisites

The library requires Python 3.10 or later. We recommend using a virtual environment to manage dependencies. You can create
a virtual environment using the following command:

```bash
python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
```

#### Installing binaries

To install the pre-built binaries, run:

```bash
python3 -m pip install mmlearn
```
@@ -73,13 +80,15 @@ Uses the <a href=https://huggingface.co/docs/peft/index>PEFT</a> library to enab
</table>

For example, to install the library with the `vision` and `audio` extras, run:

```bash
python3 -m pip install mmlearn[vision,audio]
```

</details>

#### Building from source

To install the library from source, run:

```bash
@@ -89,6 +98,7 @@ python3 -m pip install -e .
```

### Running Experiments

We use [Hydra](https://hydra.cc/docs/intro/) and [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/) to manage configurations
in the library.

@@ -97,9 +107,11 @@ have an `__init__.py` file to make it a Python package and an `experiment` folder
This format allows the use of `.yaml` configuration files as well as Python modules (using [structured configs](https://hydra.cc/docs/tutorials/structured_config/intro/) or [hydra-zen](https://mit-ll-responsible-ai.github.io/hydra-zen/)) to define the experiment configurations.
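
For instance, a user project might be laid out as follows (a sketch; all names here are hypothetical):

```
my_project/
├── __init__.py
└── configs/
    ├── __init__.py            # makes the config directory an importable package
    └── experiment/
        └── my_experiment.yaml # a `.yaml` experiment configuration
```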

To run an experiment, use the following command:

```bash
mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
```

Hydra will compose the experiment configuration from all the configurations in the specified directory as well as all the
configurations in the `mmlearn` package. *Note the dot-separated path to the directory containing the experiment configuration
files.*
@@ -109,23 +121,38 @@ One can add a path to `hydra.searchpath` either as a package (`pkg://path.to.config.directory`) or
Hence, please refrain from using the `file://` notation.

Hydra also allows for overriding configuration parameters from the command line. To see the available options and other information, run:

```bash
mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> --help
```
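
For example, a single parameter override might look like this (a hypothetical sketch; the available keys depend on your experiment configuration):

```bash
mmlearn_run 'hydra.searchpath=[pkg://path.to.config.directory]' \
    +experiment=<name_of_experiment_yaml_file> \
    task.optimizer.lr=1e-4  # hypothetical override key
```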

By default, the `mmlearn_run` command will run the experiment locally. To run the experiment on a SLURM cluster, we use
the [submitit launcher](https://hydra.cc/docs/plugins/submitit_launcher/) plugin built into Hydra. The following is an example
of how to run an experiment on a SLURM cluster:

```bash
-mmlearn_run --multirun hydra.launcher.mem_gb=32 hydra.launcher.qos=your_qos hydra.launcher.partition=your_partition hydra.launcher.gres=gpu:4 hydra.launcher.cpus_per_task=8 hydra.launcher.tasks_per_node=4 hydra.launcher.nodes=1 hydra.launcher.stderr_to_stdout=true hydra.launcher.timeout_min=60 '+hydra.launcher.additional_parameters={export: ALL}' 'hydra.searchpath=[pkg://path.to.config.directory]' +experiment=<name_of_experiment_yaml_file> experiment=your_experiment_name
+mmlearn_run --multirun \
+    hydra.launcher.mem_per_cpu=5G \
+    hydra.launcher.qos=your_qos \
+    hydra.launcher.partition=your_partition \
+    hydra.launcher.gres=gpu:4 \
+    hydra.launcher.cpus_per_task=8 \
+    hydra.launcher.tasks_per_node=4 \
+    hydra.launcher.nodes=1 \
+    hydra.launcher.stderr_to_stdout=true \
+    hydra.launcher.timeout_min=720 \
+    'hydra.searchpath=[pkg://path.to.my_project.configs]' \
+    +experiment=my_experiment \
+    experiment_name=my_experiment_name
```

This will submit a job to the SLURM cluster with the specified resources.

**Note**: After the job is submitted, it is okay to cancel the program with `Ctrl+C`. The job will continue running on
the cluster. You can also add `&` at the end of the command to run it in the background.


## Summary of Implemented Methods

<table>
<tr>
<th style="text-align: left; width: 250px"> Pretraining Methods </th>
@@ -181,33 +208,6 @@ Binary and multi-class classification tasks are supported.
</tr>
</table>

## Components
### Datasets
Every dataset object must return an instance of `Example` with one or more keys/attributes corresponding to a modality name
as specified in the `Modalities` registry. The `Example` object must also include an `example_index` attribute/key, which
is used, in addition to the dataset index, to uniquely identify the example.
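
For illustration, a minimal map-style dataset might look like the following sketch (the import paths and modality key are assumptions based on the description above, not verbatim library API):

```python
import torch
from torch.utils.data import Dataset

from mmlearn.datasets.core import Modalities  # import path assumed
from mmlearn.datasets.core.example import Example  # import path assumed


class ToyImageTextDataset(Dataset):
    """A hypothetical dataset returning paired image/text examples."""

    def __len__(self) -> int:
        return 100

    def __getitem__(self, idx: int) -> Example:
        return Example(
            {
                # keys must correspond to modality names in the `Modalities` registry
                Modalities.RGB.name: torch.rand(3, 224, 224),
                Modalities.TEXT.name: "a toy caption",
                "example_index": idx,  # required to uniquely identify the example
            }
        )
```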

<details>
<summary><b>CombinedDataset</b></summary>

The `CombinedDataset` object is used to combine multiple datasets into one. It accepts an iterable of `torch.utils.data.Dataset`
and/or `torch.utils.data.IterableDataset` objects and returns an `Example` object from one of the datasets, given an index.
Conceptually, the `CombinedDataset` object is a concatenation of the datasets in the input iterable, so the given index
can be mapped to a specific dataset based on the size of the datasets. As iterable-style datasets do not support random access,
the examples from these datasets are returned in order as they are iterated over.

The `CombinedDataset` object also adds a `dataset_index` attribute to the `Example` object, corresponding to the index of
the dataset in the input iterable. Every example returned by the `CombinedDataset` also has an `example_ids` attribute, which
is an instance of `Example` containing the same keys/attributes as the original example, except for `example_index` and
`dataset_index`; the value under each key is a tensor built from the `dataset_index` and `example_index`.
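
A short usage sketch (the import path is an assumption; the datasets are the toy dataset sketched above):

```python
from mmlearn.datasets.core import CombinedDataset  # import path assumed

# e.g., two instances of the toy dataset from the Datasets section
combined = CombinedDataset([ToyImageTextDataset(), ToyImageTextDataset()])

example = combined[150]       # maps to index 50 of the second dataset
print(example.dataset_index)  # 1 -> index of the source dataset in the input iterable
```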
</details>

### Dataloading
When dealing with multiple datasets with different modalities, the default `collate_fn` of `torch.utils.data.DataLoader`
may not work, as it assumes that all examples have the same keys/attributes. In that case, the `collate_example_list`
function can be used as the `collate_fn` argument of `torch.utils.data.DataLoader`. This function takes a list of `Example`
objects and returns a dictionary of tensors, with all the keys/attributes of the `Example` objects.
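
For example (a sketch; the import path is assumed):

```python
from torch.utils.data import DataLoader

from mmlearn.datasets.core import collate_example_list  # import path assumed

loader = DataLoader(
    combined,                         # e.g., the CombinedDataset from above
    batch_size=32,
    collate_fn=collate_example_list,  # handles Examples with differing keys
)
batch = next(iter(loader))            # a dictionary of tensors
```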

## Contributing

If you are interested in contributing to the library, please see [CONTRIBUTING.MD](CONTRIBUTING.MD). This file contains
20 changes: 11 additions & 9 deletions docs/source/conf.py
@@ -31,8 +31,14 @@
"sphinx_copybutton",
"sphinx_design",
"sphinxcontrib.apidoc",
"myst_parser",
]
add_module_names = False
+apidoc_module_dir = "../../mmlearn"
+apidoc_output_dir = "reference/api"
+apidoc_excluded_paths = ["tests"]
+apidoc_separate_modules = True
+apidoc_module_first = True
autoclass_content = "class"
autodoc_default_options = {
"members": True,
@@ -47,13 +53,6 @@
autosummary_generate = True
copybutton_prompt_text = r">>> |\.\.\. "
copybutton_prompt_is_regexp = True
-napoleon_google_docstring = False
-napoleon_numpy_docstring = True
-napoleon_include_init_with_doc = True
-napoleon_attr_annotations = True
-set_type_checking_flag = True

intersphinx_mapping = {
"python": ("https://docs.python.org/3.10/", None),
"numpy": ("http://docs.scipy.org/doc/numpy/", None),
@@ -67,9 +66,12 @@
"torchmetrics": ("https://lightning.ai/docs/torchmetrics/stable/", None),
"Pillow": ("https://pillow.readthedocs.io/en/latest/", None),
"transformers": ("https://huggingface.co/docs/transformers/en/", None),
"peft": ("https://huggingface.co/docs/peft/en/", None),
}

+napoleon_google_docstring = False
+napoleon_numpy_docstring = True
+napoleon_include_init_with_doc = True
+napoleon_attr_annotations = True
+set_type_checking_flag = True
templates_path = ["_templates"]

# -- Options for HTML output -------------------------------------------------
2 changes: 2 additions & 0 deletions docs/source/contributing.rst
@@ -0,0 +1,2 @@
+.. include:: ../../CONTRIBUTING.md
+   :parser: myst_parser.sphinx_
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -12,5 +12,6 @@ Contents
:maxdepth: 2

installation
-getting_started
+user_guide
+contributing
api