Released by @gmastrapas on 25 Nov 11:31 · commit 44d2ed6

Release Note Finetuner 0.6.7

This release contains 4 new features.

🆕 Features

Add support for cross-modal evaluation in the EvaluationCallback (#615)

In previous versions of Finetuner, when using the EvaluationCallback to calculate information retrieval (IR) metrics, you could only use a single model to encode both the query and the index data.
This meant that when training multiple models at the same time, as in CLIP fine-tuning, only one of the encoders could be used for evaluation.
It is now possible to do cross-modal evaluation, where one model encodes the query data and a second model encodes the index data.
This is useful in multi-modal tasks like text-to-image search.

To perform cross-modal evaluation, all you need to do is specify the model and index_model arguments in the EvaluationCallback, like so:

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    train_data=train_data,
    eval_data=eval_data,
    loss='CLIPLoss',
    callbacks=[
        EvaluationCallback(
            query_data=query_data,
            index_data=index_data,
            model='clip-text',         # encodes the query data
            index_model='clip-vision'  # encodes the index data
        )
    ]
)

See the EvaluationCallback section of the Finetuner documentation for details on using this callback.
See also the sections Text-to-Image Search via CLIP and Multilingual Text-to-Image search with MultilingualCLIP for concrete examples of cross-modal evaluation.
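In a text-to-image setting, the query data would typically consist of texts and the index data of images. The snippet below is a minimal sketch of how such datasets could be prepared with docarray (the Document/DocumentArray API Finetuner uses at this version); the texts, file paths and finetuner_label tag values are illustrative assumptions, not taken from this release.

from docarray import Document, DocumentArray

# Hypothetical text queries; the finetuner_label tag marks which
# index items count as relevant matches when computing IR metrics.
query_data = DocumentArray([
    Document(text='a photo of a black dog', tags={'finetuner_label': 'dog'}),
    Document(text='a photo of a striped cat', tags={'finetuner_label': 'cat'}),
])

# Hypothetical image index, referenced by local URIs and labelled the same way.
index_data = DocumentArray([
    Document(uri='images/dog-001.jpg', tags={'finetuner_label': 'dog'}),
    Document(uri='images/cat-001.jpg', tags={'finetuner_label': 'cat'}),
])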

Add support for Multilingual CLIP (#611)

Finetuner now supports a Multilingual CLIP model from the OpenCLIP project.
Multilingual CLIP models are trained on large text and image datasets in multiple languages, using the CLIP contrastive learning approach.

They are a good fit for text-to-image applications where the texts are written in languages other than English.

The currently supported Multilingual CLIP model - xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k - uses a ViT-B/32 image encoder and an XLM-RoBERTa base text encoder.

You can find details on how to fine-tune this specific model in the Multilingual Text-to-Image search with MultilingualCLIP section of the documentation.

import finetuner
run = finetuner.fit(
    model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
    train_data=train_data,
    eval_data=eval_data,
    epochs=5,
    learning_rate=1e-6,
    loss='CLIPLoss',
    device='cuda',
)
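The fit call above assumes that train_data and eval_data already contain text-image pairs. As a rough sketch only (the chunk layout follows the general CLIP data-preparation pattern in the Finetuner documentation, and the German caption and image URI are made-up examples), such pairs can be built with docarray like this:

from docarray import Document, DocumentArray

# Hypothetical multilingual training pair: one Document whose two chunks
# hold the matching text and image, tagged with their modality.
train_data = DocumentArray([
    Document(chunks=[
        Document(text='ein Foto von einem Hund am Strand', modality='text'),
        Document(uri='images/dog-beach.jpg', modality='image'),
    ]),
])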

Filter models by task in finetuner.describe_models() (#610)

The finetuner.describe_models() function, which provides an overview of supported model backbones, now accepts an optional task argument that filters the models by task.

To display all models, simply omit the argument.

import finetuner
finetuner.describe_models()

To filter based on task, you need to provide a valid task name. For example:

finetuner.describe_models(task='image-to-image')

or

finetuner.describe_models(task='text-to-image')

Currently valid task names are text-to-text, text-to-image and image-to-image.

Configure the num_items_per_class argument in finetuner.fit() (#614)

The finetuner.fit() method now includes a new argument num_items_per_class that sets the number of items per label included in each batch.
This gives you finer control over how batches are constructed. If not set, the argument defaults to 4, which matches the behavior of previous Finetuner versions.

You can easily set this when calling finetuner.fit():

import finetuner
run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    eval_data=eval_data,
    batch_size=128,
    num_items_per_class=8,
)

⚠️ The batch size needs to be a multiple of the number of items per class, in other words batch_size % num_items_per_class == 0 must hold.
Otherwise, Finetuner cannot respect the given num_items_per_class and throws an error.
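If you want to catch a mismatch before submitting a run, a trivial local check like the one below is enough; it is purely illustrative and not part of the Finetuner API.

batch_size = 128
num_items_per_class = 8

# 128 % 8 == 0, so each batch contains 128 / 8 = 16 distinct classes.
if batch_size % num_items_per_class != 0:
    raise ValueError('batch_size must be a multiple of num_items_per_class')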

🤟 Contributors

We would like to thank all contributors to this release: