
Implemented BLEU score, wrote unit tests and documentation for it. #1006

Open
wants to merge 1 commit into base: main
Conversation

kadamrahul18

BLEU Metric Implementation:

  1. Added a new BLEU class under sdks/python/src/opik/evaluation/metrics/heuristics/bleu.py.
  2. Implemented the BLEU algorithm to calculate scores based on n-gram precision between the generated text and a reference text.
  3. Included methods for handling both single sentences and corpus-level scoring.
  4. Implemented smoothing techniques (methods 0, 1, 2, 3 from the Chen & Cherry paper) to address zero n-gram matches.
  5. Added configuration options for n-gram order, smoothing method, and weights (a usage sketch follows this list).
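
To make the intended API concrete, here is a rough usage sketch. The parameter and method names below (n_grams, smoothing_method, weights, score) are illustrative guesses rather than the exact signatures in this PR; see bleu.py for the real interface.

from opik.evaluation.metrics.heuristics.bleu import BLEU

# Hypothetical configuration; names are illustrative, not the PR's exact API.
bleu = BLEU(
    n_grams=4,                          # maximum n-gram order (assumed parameter name)
    smoothing_method=1,                 # Chen & Cherry smoothing method (assumed)
    weights=[0.25, 0.25, 0.25, 0.25],   # uniform n-gram weights (assumed)
)

# Sentence-level scoring: candidate output against a single reference.
result = bleu.score(output="the cat sat on the mat",
                    reference="the cat is on the mat")
print(result)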

Unit Tests:

  1. Added comprehensive unit tests in sdks/python/tests/unit/evaluation/metrics/test_heuristics.py to validate the BLEU metric's behavior in various scenarios (a representative test sketch follows this list):
  • Exact match, partial match, and no match cases.
  • Empty candidate and reference strings.
  • Different smoothing methods.
  • Corpus-level scoring.
  • Edge cases and error handling.
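
As an illustration, an exact-match test could look roughly like this; the test name, the assumed score() method, and the result's value attribute are illustrative, and the actual assertions live in test_heuristics.py.

import pytest

from opik.evaluation.metrics.heuristics.bleu import BLEU

def test_bleu_exact_match():
    # Hypothetical sketch: identical candidate and reference should score 1.0.
    bleu = BLEU()
    result = bleu.score(output="the cat sat on the mat",
                        reference="the cat sat on the mat")
    assert result.value == pytest.approx(1.0)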

Integration with Evaluation Framework:

  1. Added the BLEU class to the __all__ list in sdks/python/src/opik/evaluation/metrics/heuristics/__init__.py to make it discoverable by the evaluate function, as sketched below.
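
The export would look roughly like this (existing entries elided; shown only to illustrate the change):

# sdks/python/src/opik/evaluation/metrics/heuristics/__init__.py (sketch)
from .bleu import BLEU

__all__ = [
    # ... existing heuristic metrics ...
    "BLEU",
]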

Documentation:

  1. Added a new documentation page for the BLEU metric (bleu.md) in the evaluation/metrics section of the documentation, detailing its purpose and usage.

Testing:

  1. Thorough unit tests have been included to cover different aspects of the BLEU metric implementation, including edge cases and different smoothing methods.
  2. All tests in the Python SDK, including the new tests for the BLEU metric, pass successfully when running pytest tests/ from the sdks/python directory.
  3. pre-commit run --all-files has been executed successfully from the sdks/python directory, ensuring code style and formatting consistency.

Request for Review:

Please review the following aspects of this pull request:

  1. Correctness of the BLEU metric implementation, including n-gram precision, brevity penalty, and smoothing (the standard definition is reproduced after this list for reference).
  2. Clarity and completeness of the unit tests.
  3. Thoroughness of the documentation.
  4. Adherence to Opik's coding standards and best practices.
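
For reference, the standard BLEU definition from Papineni et al. (2002) that the implementation is expected to match:

$$
\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big(\sum_{n=1}^{N} w_n \log p_n\Big),
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } c > r \\
e^{\,1 - r/c} & \text{if } c \le r
\end{cases}
$$

where p_n is the modified n-gram precision, w_n the n-gram weights (uniform by default, w_n = 1/N), c the candidate length, and r the effective reference length.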

Any feedback or suggestions for improvement are greatly appreciated.

@kadamrahul18 kadamrahul18 requested review from a team as code owners January 9, 2025 04:11
@alexkuzmik
Collaborator

Hi @kadamrahul18!
I can see that the code is based on the nltk library implementation, which is one of the most popular libraries when it comes to BLEU score calculation.
I would prefer to just use NLTK directly. This is likely not the last heuristic metric we will add, and I don't think populating the code base with non-trivial mathematical calculations is the right approach when stable, specialized tools for that already exist.
What I suggest is something like this:

try:
    import nltk  # we won't add nltk as a package dependency, but we can add it to a separate requirements file for unit tests
except ImportError:
    nltk = None

...

class BLEU:
    def __init__(self, ...):
        if nltk is None:
            raise ImportError(
                "`nltk` library is required for BLEU score calculation, "
                "please install it via `pip install nltk`"
            )

Under the hood, the metric implementation can use nltk.translate.bleu_score.sentence_bleu or nltk.translate.bleu_score.corpus_bleu.
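
A minimal sketch of what that wrapping could look like, assuming whitespace tokenization and method1 smoothing purely for illustration:

from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

def bleu_sentence_score(output: str, reference: str) -> float:
    # nltk expects tokenized input: a hypothesis token list and a list of
    # tokenized references. str.split is used here only for brevity.
    hypothesis = output.split()
    references = [reference.split()]
    smoothing = SmoothingFunction().method1  # Chen & Cherry smoothing method 1
    return sentence_bleu(references, hypothesis, smoothing_function=smoothing)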

That way we'll be able to have a stable implementation and avoid a big chunk of mathematical code (which is almost always hard to read and easy to break :) )
