Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(deps): update dependency sentence-transformers to v3 #140

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

renovate[bot]
Copy link
Contributor

@renovate renovate bot commented May 28, 2024

This PR contains the following updates:

Package Change Age Adoption Passing Confidence
sentence-transformers ^2.2 -> ^3.0.0 age adoption passing confidence

Release Notes

UKPLab/sentence-transformers (sentence-transformers)

v3.3.1: - Patch private model loading without environment variable

Compare Source

This patch release fixes a small issue with loading private models from Hugging Face using the token argument.

Install this version with

##### Training + Inference
pip install sentence-transformers[train]==3.3.1
##### Inference only, use one of:
pip install sentence-transformers==3.3.1
pip install sentence-transformers[onnx-gpu]==3.3.1
pip install sentence-transformers[onnx]==3.3.1
pip install sentence-transformers[openvino]==3.3.1
Details

If you're loading model under this scenario:

  • Your model is hosted on Hugging Face.
  • Your model is private.
  • You haven't set the HF_TOKEN environment variable via huggingface-cli login or some other approach.
  • You're passing the token argument to SentenceTransformer to load the model.

Then you may have encountered a crash in v3.3.0. This should be resolved now.

All Changes

Full Changelog: UKPLab/sentence-transformers@v3.3.0...v3.3.1

v3.3.0: - Massive CPU speedup with OpenVINO int8 quantization; Training with Prompts for stronger models; NanoBEIR IR evaluation; PEFT compatibility; Transformers v4.46.0 compatibility

Compare Source

4x speedup for CPU with OpenVINO int8 static quantization, training with prompts for a free performance boost, convenient evaluation on NanoBEIR: a subset of a strong Information Retrieval benchmark, PEFT compatibility by easily adding/loading adapters, Transformers v4.46.0 compatibility, and Python 3.8 deprecation.

Install this version with:

##### Training + Inference
pip install sentence-transformers[train]==3.3.0

##### Inference only, use one of:
pip install sentence-transformers==3.3.0
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0
OpenVINO int8 static quantization (https://github.com/UKPLab/sentence-transformers/pull/3025)

We introduce int8 static quantization using OpenVINO, a highly performant solution that outperforms all other current backends by a mile, at a minimal loss in performance. Here are the updated benchmarks:

Quantizing directly to the Hugging Face Hub
from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

##### 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

##### 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
##### as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)

You can immediately use the model, even before it's merged, by using the revision argument:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}"
)

And once it's merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)
Quantizing locally

You can also quantize a model and save it locally:

from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig() # <- You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")

And after quantizing, you can load it like so:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)

All original Sentence Transformer models already have these new openvino_model_qint8_quantized.xml files, so you can load them without exporting directly! I would recommend making pull requests for other models on Hugging Face that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

Training with Prompts (https://github.com/UKPLab/sentence-transformers/pull/2964)

Many modern embedding models are trained with “instructions” or “prompts” following the INSTRUCTOR paper. These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the mixedbread-ai/mxbai-embed-large-v1 model was trained with Represent this sentence for searching relevant passages: as the prompt for all queries. This prompt is stored in the model configuration under the prompt name "query", so users can specify that prompt_name in model.encode:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")

##### or
##### query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)

##### => tensor([[0.7594, 0.7560, 0.4674]])

Various papers (INSTRUCTOR, BGE) show that including prompts or instructions both during training and inference results in stronger performance. As of this release, it's now possible to easily train with prompts in Sentence Transformers with just one extra training argument: prompts. There are 4 accepted formats for it:

  1. str: A single prompt to use for all columns in all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts="text: ",
        ...,
    )
  2. Dict[str, str]: A dictionary mapping column names to prompts, applied to all datasets. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "query": "query: ",
            "answer": "document: ",
        },
        ...,
    )
  3. Dict[str, str]: A dictionary mapping dataset names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": "Represent this text for semantic similarity search: ",
            "nq": "Represent this text for retrieval: ",
        },
        ...,
    )
  4. Dict[str, Dict[str, str]]: A dictionary mapping dataset names to dictionaries mapping column names to prompts. This should only be used if your training/evaluation/test datasets are a DatasetDict or a dictionary of Dataset. For example:
    args = SentenceTransformerTrainingArguments(
        ...,
        prompts={
            "stsb": {
                "sentence1": "sts: ",
                "sentence2": "sts: ",
            },
            "nq": {
                "query": "query: ",
                "document": "document: ",
            },
        },
        ...,
    )

I've trained models with and without prompts for 2 base models: mpnet-base and bert-base-uncased:

For both base models, the model with prompts consistently outperformed the baseline model. After training, the models with prompts resulted in a 0.66% and 0.90% relative improvement on NDCG@10 at no extra cost.

mpnet-base tests bert-base-uncased tests
mpnet_base_nq_nanobeir bert_base_nq_nanobeir
NanoBEIR Evaluator integration (https://github.com/UKPLab/sentence-transformers/pull/2966)

This update introduced a new simple NanoBEIREvaluator, evaluating your model against NanoBEIR: a collection of subsets of the 13 BEIR datasets. BEIR corresponds to the retrieval tab of MTEB, and is commonly seen as a valuable indicator of general-purpose information retrieval performance.

With the NanoBEIREvaluator, you can easily evaluate your models on a much faster benchmark that should give similar insights in performance as BEIR. You can use it like so:

from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers import SentenceTransformer
import logging

##### Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

##### 1. Load a model
model = SentenceTransformer("all-mpnet-base-v2", backend="onnx")

##### 2. Initialize the evaluator
evaluator = NanoBEIREvaluator()

##### 3. Call the evaluator to get a dictionary of metric names to values
results = evaluator(model)
"""
NanoBEIR Evaluation of the model on ['climatefever', 'dbpedia', 'fever', 'fiqa2018', 'hotpotqa', 'msmarco', 'nfcorpus', 'nq', 'quoraretrieval', 'scidocs', 'arguana', 'scifact', 'touche2020'] dataset:
Evaluating NanoClimateFEVER
Information Retrieval Evaluation of the model on the NanoClimateFEVER dataset:
Queries: 50
Corpus: 3408

Score-Function: cosine
Accuracy@1: 24.00%
Accuracy@3: 36.00%
Accuracy@5: 44.00%
Accuracy@10: 66.00%
Precision@1: 24.00%
Precision@3: 14.00%
Precision@5: 10.40%
Precision@10: 9.00%
Recall@1: 9.50%
Recall@3: 17.33%
Recall@5: 22.90%
Recall@10: 36.07%
MRR@10: 0.3311
NDCG@10: 0.2618
MAP@100: 0.1982
Evaluating NanoDBPedia
Information Retrieval Evaluation of the model on the NanoDBPedia dataset:
Queries: 50
Corpus: 6045

Score-Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 88.00%
Accuracy@5: 88.00%
Accuracy@10: 88.00%
Precision@1: 66.00%
Precision@3: 58.00%
Precision@5: 52.00%
Precision@10: 43.60%
Recall@1: 6.87%
Recall@3: 14.70%
Recall@5: 20.30%
Recall@10: 27.62%
MRR@10: 0.7533
NDCG@10: 0.5384
MAP@100: 0.3796
Evaluating NanoFEVER
Information Retrieval Evaluation of the model on the NanoFEVER dataset:
Queries: 50
Corpus: 4996

... (truncated for brevity)

Aggregated for Score Function: cosine
Accuracy@1: 52.87%
Accuracy@3: 71.35%
Accuracy@5: 78.45%
Accuracy@10: 85.07%
Precision@1: 52.87%
Recall@1: 30.28%
Precision@3: 33.78%
Recall@3: 47.93%
Precision@5: 26.23%
Recall@5: 55.04%
Precision@10: 18.07%
Recall@10: 62.54%
MRR@10: 0.6334
NDCG@10: 0.5758
"""

##### 4. Print the results
print(evaluator.primary_metric)

##### => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])

##### => 0.5758124378869705
Advanced Usage

You can also specify a subset of datasets, and you can specify query and/or corpus prompts, if your model uses them. For example:

import logging
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

##### Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

model = SentenceTransformer('intfloat/multilingual-e5-large-instruct')

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\\nQuery: "
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoQuoraRetrieval
Information Retrieval Evaluation of the model on the NanoQuoraRetrieval dataset:
Queries: 50
Corpus: 5046

Score-Function: cosine
Accuracy@1: 92.00%
Accuracy@3: 98.00%
Accuracy@5: 100.00%
Accuracy@10: 100.00%
Precision@1: 92.00%
Precision@3: 40.67%
Precision@5: 26.00%
Precision@10: 14.00%
Recall@1: 81.73%
Recall@3: 94.20%
Recall@5: 97.93%
Recall@10: 100.00%
MRR@10: 0.9540
NDCG@10: 0.9597
MAP@100: 0.9395

Evaluating NanoMSMARCO
Information Retrieval Evaluation of the model on the NanoMSMARCO dataset:
Queries: 50
Corpus: 5043

Score-Function: cosine
Accuracy@1: 40.00%
Accuracy@3: 74.00%
Accuracy@5: 78.00%
Accuracy@10: 88.00%
Precision@1: 40.00%
Precision@3: 24.67%
Precision@5: 15.60%
Precision@10: 8.80%
Recall@1: 40.00%
Recall@3: 74.00%
Recall@5: 78.00%
Recall@10: 88.00%
MRR@10: 0.5849
NDCG@10: 0.6572
MAP@100: 0.5892
Average Queries: 50.0
Average Corpus: 5044.5

Aggregated for Score Function: cosine
Accuracy@1: 66.00%
Accuracy@3: 86.00%
Accuracy@5: 89.00%
Accuracy@10: 94.00%
Precision@1: 66.00%
Recall@1: 60.87%
Precision@3: 32.67%
Recall@3: 84.10%
Precision@5: 20.80%
Recall@5: 87.97%
Precision@10: 11.40%
Recall@10: 94.00%
MRR@10: 0.7694
NDCG@10: 0.8085
'''
print(evaluator.primary_metric)

##### => "NanoBEIR_mean_cosine_ndcg@10"
print(results[evaluator.primary_metric])

##### => 0.8084508771660436
PEFT compatibility (https://github.com/UKPLab/sentence-transformers/pull/3000, https://github.com/UKPLab/sentence-transformers/pull/2980, https://github.com/UKPLab/sentence-transformers/pull/3046)

Sentence Transformers has been integrated much more closely with PEFT. Notably, we introduce new methods:

These methods allow you to add new PEFT adapters or load pretrained ones, for example:

Adding a adapter
from sentence_transformers import SentenceTransformer

##### 1. Load a model to finetune with 2. (Optional) model card data
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="all-MiniLM-L6-v2 adapter finetuned on GooAQ pairs",
    ),
)

##### 2. Create a LoRA adapter for the model & add it
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

##### Proceed as usual... See https://sbert.net/docs/sentence_transformer/training_overview.html
Loading a pretrained adapter

Given sentence-transformers-testing/stsb-bert-tiny-lora as a small adapter model (the adapter_model.safetensors file is only 33.8kB!) on top of sentence-transformers-testing/stsb-bert-tiny-safetensors, you can either load this adapter directly:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)

##### (2, 128)

Or you can load the original model and load the adapter into it:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")
model.load_adapter("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)

##### (2, 128)
Transformers v4.46.0 compatibility (https://github.com/UKPLab/sentence-transformers/pull/3026, https://github.com/UKPLab/sentence-transformers/pull/3035, https://github.com/UKPLab/sentence-transformers/pull/3037, https://github.com/UKPLab/sentence-transformers/pull/3038)

The recent transformers v4.46.0 update introduced a few changes that were incompatible with Sentence Transformers. For example:

  • Use "processing_class" argument instead of "tokenizers"
  • Add a num_items_in_batch argument to the compute_loss method in the Trainer
  • Adding a ValueError if eval_dataset is None while eval_strategy is not "no" (this should be possible in Sentence Transformers, as we accept evaluating with just an evaluator as well)

These issues and deprecation warnings have been resolved.

Drop Python 3.8 support (https://github.com/UKPLab/sentence-transformers/pull/3033)

Given that Python 3.8 has now reached it's end of life, Sentence Transformers will no longer support it.

All Changes
New Contributors
Special Thanks

Big thanks to @​ArthurCamara for leading the work on both 1) training with prompts and 2) NanoBEIR.

Full Changelog: UKPLab/sentence-transformers@v3.2.1...v3.3.0

v3.2.1: - Patch CLIP loading, small ONNX fix, compatibility with other libraries

Compare Source

This patch release fixes some small bugs, such as related to loading CLIP models, automatic model card generation issues, and ensuring compatibility with third party libraries.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==3.2.1

### Inference only, use one of:
pip install sentence-transformers==3.2.1
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1

Fixing Loading non-Transformer models

In v3.2.0, a non-Transformer based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #​3007.

Throw error if StaticEmbedding-based model is finetuned with incompatible losses

The following losses are not compatible with StaticEmbedding-based models:

  • CachedGISTEmbedLoss
  • CachedMultipleNegativesRankingLoss
  • CachedMultipleNegativesSymmetricRankingLoss
  • DenoisingAutoEncoderLoss
  • GISTEmbedLoss

An error is now thrown when one of these are used with a StaticEmbedding-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq.
Note: to get good performance, you must use much higher learning rates than otherwise. In my experiments, 2e-1 worked well.

Patch ONNX model when the model uses output_hidden_states

For example, this script used to fail, but passes now:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)

All changes

New Contributors

Full Changelog: UKPLab/sentence-transformers@v3.2.0...v3.2.1

v3.2.0: - ONNX and OpenVINO backends offering 2-3x speedup; Static Embeddings offering 50x-500x speedups at ~10-20% performance cost

Compare Source

This release introduces 2 new efficient computing backends for SentenceTransformer models: ONNX and OpenVINO + optimization & quantization, allowing for speedups up to 2x-3x; static embeddings via Model2Vec allowing for lightning-fast models (i.e., 50x-500x speedups) at a ~10%-20% performance cost; and various small improvements and fixes.

Install this version with

### Training + Inference
pip install sentence-transformers[train]==3.2.0

### Inference only, use one of:
pip install sentence-transformers==3.2.0
pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0

Faster ONNX and OpenVINO Backends for SentenceTransformer (#​2712)

Introducing a new backend keyword argument to the SentenceTransformer initialization, allowing values of "torch" (default), "onnx", and "openvino".
These come with new installations:

pip install sentence-transformers[onnx-gpu]

### or ONNX for CPU only:
pip install sentence-transformers[onnx]

### or
pip install sentence-transformers[openvino]

It's as simple as:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)

If you specify a backend and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to model.push_to_hub or model.save_pretrained into the same model repository or directory to avoid having to re-export the model every time.

All keyword arguments passed via model_kwargs will be passed on to ORTModel.from_pretrained or OVBaseModel.from_pretrained. The most useful arguments are:

  • provider: (Only if backend="onnx") ONNX Runtime provider to use for loading the model, e.g. "CPUExecutionProvider" . See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest provider (E.g. "CUDAExecutionProvider") will be used.
  • file_name: The name of the ONNX file to load. If not specified, will default to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and "openvino_model.xml" and "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
  • export: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
	backend="onnx",
	model_kwargs={
		"file_name": "model_O3.onnx",
		"provider": "CPUExecutionProvider",
	}
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
Benchmarks

We ran benchmarks for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. Here are the findings:

These findings resulted in these recommendations:
image

For GPU, you can expect 2x speedup with fp16 at no cost, and for CPU you can expect ~2.5x speedup at a cost of 0.4% accuracy.

ONNX Optimization and Quantization

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

export_optimized_onnx_model: This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options here. This function accepts:

  • model A SentenceTransformer model loaded with backend="onnx".
  • optimization_config: "O1", "O2", "O3", or "O4" from 🤗 Optimum or a custom OptimizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether the push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the optimization_config string or "optimized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
	model=onnx_model,
	optimization_config="O4",
	model_name_or_path="BAAI/bge-large-en-v1.5",
	push_to_hub=True,
	create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_O4.onnx"},
)
Quantization

export_dynamic_quantized_onnx_model: This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts

  • model A SentenceTransformer model loaded with backend="onnx".
  • quantization_config: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from AutoQuantizationConfig, or an QuantizationConfig instance.
  • model_name_or_path: The directory or model repository where the optimized model will be saved.
  • push_to_hub: Whether the push the exported model to the hub with model_name_or_path as the repository name. If False, the model will be saved in the directory specified with model_name_or_path.
  • create_pr: If push_to_hub, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
  • file_suffix: The suffix to add to the optimized model file name. Will use the quantization_config string or e.g. "int8_quantized" if not set.

The usage is like this:

from sentence_transformers import SentenceTransformer, export_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_quantized_onnx_model(
	model=onnx_model,
	quantization_config="avx512",
	model_name_or_path="BAAI/bge-large-en-v1.5",
	push_to_hub=True,
	create_pr=True,
)

After which you can load the model with:

from sentence_transformers import SentenceTransformer

pull_request_nr = 2 # TODO: Update this to the number of your pull request
model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
   revision=f"refs/pr/{pull_request_nr}"
)

or when it gets merged:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
   "BAAI/bge-large-en-v1.5",
   backend="onnx",
   model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)

Lightning-Fast Static Embeddings via Model2Vec (#​2961)

If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to GLoVe or Word2vec, i.e. they're bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks.

However, these Static Embeddings are created in different ways. For example:

  1. Distillation via the Model2Vec technique. This projects allows you to distill any Sentence Transformer model into Static Embeddings. For example, distilling BAAI/bge-base-en-v1.5 resulted in a Static Embeddings Sentence Transformer model that reaches 87.5% of the performance of all-MiniLM-L6-v2 on MTEB (+ PEARL & WordSim) and 97.4% of the performance of all-MiniLM-L6-v2 on various classification benchmarks.
    You can initialize Static Embeddings via Model2Vec in two ways:

note: pip install model2vec is needed, but not for inference

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model

static_embedding = StaticEmbedding.from_model2vec("minishlab/M2V_multilingual_output")
model = SentenceTransformer(modules=[static_embedding])

Encode some texts

queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

Compute similarities

scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8170, 0.3843],
        [0.3929, 0.5818]])
"""
```
* [`from_distillation`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_distillation): You can use the name of any Sentence Transformer model alongside some parameters (See [this docs](https://redirect.github.com/MinishLab/model2vec#distilling-a-model2vec-model) for more information) to perform the distillation yourself, without needing any dataset. On my device, this takes ~4s on a GPU and ~2 minutes on a CPU:
```python

note: pip install model2vec is needed, but not for inference

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

Initialize a Sentence Transformer model with a static embedding by distilling via model2vec

static_embedding = StaticEmbedding.from_distillation(
    "mixedbread-ai/mxbai-embed-large-v1",
    device="cuda",
    pca_dims=256,
    apply_zipf=True,
)
model = SentenceTransformer(modules=[static_embedding])

Encode some texts

queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

Compute similarities

scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8430, 0.3271],
        [0.3213, 0.5861]])
"""
```
  1. Random initialization: Although this initialization needs finetuning, finetuning a Sentence Transformers model backed by StaticEmbedding is extremely fast. For example, I was able to finetune tomaarsen/static-bert-uncased-gooaq with MatryoshkaLoss & MultipleNegativesRankingLoss on the entire (3 million pairs) gooaq dataset in just 7 minutes. This model reaches a NDCG@10 of 79.33 on a hold-out set of 10k samples from gooaq, whereas e.g. BAAI/bge-base-en-v1.5 reaches 85.01 NDCG@10. In short, only 6.6% less performance for a model that's about 500x faster.
    That's not a typo: I can compute embeddings for about 14000 stsb sentences from per second on CPU, compared to about ~24 with BAAI/bge-base-en-v1.5, a.k.a. 625x faster.

[!NOTE]
You can save_pretrained and load these models like any other Sentence Transformer models, the StaticEmbedding initialization is only necessary when you're creating a new model.

  • Creation:
    from sentence_transformers import SentenceTransformer
    from sentence_transformers.models import StaticEmbedding
    
    # Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model
    static_embedding = StaticEmbedding.from_distillation(
        "mixedbread-ai/mxbai-embed-large-v1",
        device="cuda",
        pca_dims=256,
        apply_zipf=True,
    )
    model = SentenceTransformer(modules=[static_embedding])
    model.save_pretrained("static-mxbai-embed-large-v1")
    # or
    # model.push_to_hub("tomaarsen/static-mxbai-embed-large-v1")
  • Inference:
    from sentence_transformers import SentenceTransformer
    
    # Initialize a Sentence Transformer model with a static embedding
    model = SentenceTransformer("static-mxbai-embed-large-v1")
    
    model.encode([...])

Small changes

  • The InformationRetrievalEvaluator now accepts query_prompt, query_prompt_name, corpus_prompt, and corpus_prompt_name arguments, useful if your model requires specific prompts for queries and/or documents for the best performance. (#​2951)
  • The mine_hard_negatives function now accepts anchor_column_name and positive_column_name for specifying which dataset columns will be used. If not specified, the first two columns are used, respectively. Additionally, the min_score parameter is added, ensuring that all mined negatives have a similarity score of at least min_score according to the chosen SentenceTransformer or CrossEncoder model. (#​2977)
  • If you're using multiple evaluators during training via SequentialEvaluator, e.g. multiple evaluators for different Matryoshka dimensions, then the order is now preserved in the training logs in the model card. Previously, they were sorted by name, resulting in weird orderings (e.g. "gooaq-1024", "gooaq-128", "gooaq-256", "gooaq-32", "gooaq-512", "gooaq-64") (#​2963)
  • CachedGISTEmbedLoss has been improved to support multiple negatives per sample, i.e. the loss now accepts data in the (anchor, positive, negative_1, …, negative_n) format. It is the third loss to support this format (see docs):

image

All changes

New Contributors

Special thanks to @​echarlaix for making the new backends possible due to some last-minute changes in optimum and optimum-intel.

Full Changelog: UKPLab/sentence-transformers@v3.1.1...v3.2.0

v3.1.1: - Patch hard negative mining & remove numpy&lt;2 restriction

Compare Source

This patch release fixes hard negatives mining for models that don't automatically normalize their embeddings and it lifts the numpy<2 restriction that was previously required.

Install this version with

##### Full installation:
pip install sentence-transformers[train]==3.1.1

##### Inference only:
pip install sentence-transformers==3.1.1
Hard Negatives Mining Patch (#​2944)

The mine_hard_negatives utility introduced in the previous release would fail if use_faiss=True & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with all Sentence Transformer models:

from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

##### Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

##### Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

##### Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 79/79 [0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/recap-utr/nlp-service).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNy4zNzcuOCIsInVwZGF0ZWRJblZlciI6IjM5LjE5LjAiLCJ0YXJnZXRCcmFuY2giOiJtYWluIiwibGFiZWxzIjpbXX0=-->

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant