Squashed merge of develop for Release 4.1.

Squashed commit of the following:

commit fd962ee
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 30 11:16:17 2025 -0800

    Update READMEs, setup.py, and LICENSE for Release 4.1.

commit 9f0cf12
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 30 10:51:07 2025 -0800

    Update tests to know about new symbol counting changes.

commit 08e6185
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 30 10:39:39 2025 -0800

    Tests: add SER output to command line tests.

commit 2188808
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 30 10:14:47 2025 -0800

    AnnExtra symbol count has len(content) now and AnnExtra symbol error count includes Levenshtein distance of content. AnnStaffGroups are sorted now, and instead of comparing all the part indices, we compare lowest and highest.

commit 7dc9ef0
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 28 16:45:47 2025 -0800

    Make sure metadata item value ends up being a string.

commit 3f78626
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 28 16:37:47 2025 -0800

    More symbol count (notation_size) and symbol error count (cost) changes. Trying to make them match each other better, and make more sense.

commit a58acf3
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 28 12:33:35 2025 -0800

    Release Notes again.

commit a757ad7
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 28 12:32:04 2025 -0800

    Stop assuming that the two different extras are both either a Spanner or not. They could be one of each.

commit 3e08d22
Author: Greg Chapman <[email protected]>
Date:   Mon Jan 27 15:57:25 2025 -0800

    ReleaseNotes update.

commit 1904528
Author: Greg Chapman <[email protected]>
Date:   Sun Jan 26 12:10:21 2025 -0800

    Update ReleaseNotes 4.1

commit b060304
Author: Greg Chapman <[email protected]>
Date:   Sat Jan 25 14:01:25 2025 -0800

    musicdiff text output expected results have changed a little due to symbol counting (notation_size and cost) changes.

commit b618d74
Author: Greg Chapman <[email protected]>
Date:   Fri Jan 24 14:31:06 2025 -0800

    AnnLyric.notation_size: identifiers are only worth 1, not len(identifier). Print SER even if cost == 0. Handle numSymbolsInGroundTruth being 0 without dividing by 0.

commit a6f76e9
Author: Greg Chapman <[email protected]>
Date:   Fri Jan 24 14:27:30 2025 -0800

    New release notes for v4.1.0

commit d05187d
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 23 15:18:44 2025 -0800

    Another lyrics and extras adjustment (lower the costs).

commit 09e92e1
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 23 15:03:42 2025 -0800

    For extras and lyrics, notation_size does not include offset/duration, and diff cost is incremented by only 1 for differences in each of those fields.

commit 80a630a
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 23 12:33:15 2025 -0800

    Better notation_size and comparison cost for extras and lyrics.

commit 45b54e6
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 21 14:14:03 2025 -0800

    Ignore SenzaMisuraTimeSignature (since it is displayed as no timesig at all).

commit 1d08dd9
Author: Greg Chapman <[email protected]>
Date:   Tue Jan 21 11:37:49 2025 -0800

    Refactor SER output into Visualization, and return a dict[str, str]. To print it as text, we convert to JSON and print that.

commit a176105
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 16 09:03:42 2025 -0800

    Compute SER = symbolic errors/num symbols in ground truth (i.e. file2).

commit 067a96d
Author: Greg Chapman <[email protected]>
Date:   Thu Jan 16 08:56:17 2025 -0800

    Add to cost any syntax errors fixed by converter21 parse code. Some lint, too.

commit bfdcf32
Author: Greg Chapman <[email protected]>
Date:   Mon Dec 2 17:19:53 2024 -0800

    New output format "ser" that prints num errors/max num syms of the two scores.

commit 16d2603
Author: Greg Chapman <[email protected]>
Date:   Mon Dec 2 12:23:37 2024 -0800

    Always return cost in symbol errors from diff() and from musicdiff command.

commit 31b31e7
Author: Greg Chapman <[email protected]>
Date:   Sun Dec 1 21:46:53 2024 -0800

    First cut at fixing Humdrum syntax errors.

commit e54c259
Author: Greg Chapman <[email protected]>
Date:   Sun Dec 1 19:49:01 2024 -0800

    Back out that AnnStaffGroup cost change; I don't like the results, and I wasn't convinced to begin with.

commit eafb018
Author: Greg Chapman <[email protected]>
Date:   Sun Dec 1 19:40:38 2024 -0800

    More notation_size tweaks: AnnMeasure should include lyric sizes, and AnnStaffGroup should add 1 for each enclosed part/staff.

commit 8f903d9
Author: Greg Chapman <[email protected]>
Date:   Sun Dec 1 19:29:09 2024 -0800

    Fix comment typo.

commit e6922a7
Author: Greg Chapman <[email protected]>
Date:   Sun Dec 1 19:22:43 2024 -0800

    Don't precompute notation_size, cache it if it is ever computed. Many objects never are asked their notation size, especially if the scores are very similar, so don't pay the price unless you have to (but only pay it once).

commit 352bc95
Author: Greg Chapman <[email protected]>
Date:   Wed Nov 27 17:49:58 2024 -0800

    First cut at comparing different number of parts.

gregchapman-dev · Jan 30, 2025 · f7eefb8
1 parent 0ba675f
commit f7eefb8
Showing 27 changed files with 567 additions and 195 deletions.
diff --git a/.pylintrc b/.pylintrc
@@ -325,6 +325,7 @@ exclude-protected=_asdict,_fields,_replace,_source,_make
 
 # Maximum number of arguments for function / method
 max-args=5
+max-positional-arguments=10
 
 # maximum boolean expressions in a line (too-many-boolean-expressions)
 max-bool-expr=10

diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,6 @@
 
 The MIT License (MIT)
-Copyright (c) 2022-2024 Francesco Foscarin, Greg Chapman
+Copyright (c) 2022-2025 Francesco Foscarin, Greg Chapman
 
 Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
 

diff --git a/README.md b/README.md
@@ -7,7 +7,7 @@ musicdiff is derived from: [music-score-diff](https://github.com/fosfrancesco/mu
     by [Francesco Foscarin](https://github.com/fosfrancesco).
 
 ## Setup
-Depends on [music21](https://pypi.org/project/music21) (version 9.1+),  [numpy](https://pypi.org/project/numpy), and [converter21](https://pypi.org/project/converter21) (version 3.2+). You also will need to configure music21 (instructions [here](https://web.mit.edu/music21/doc/usersGuide/usersGuide_01_installing.html)) to display a musical score (e.g. with MuseScore).  Requires Python 3.10+.
+Depends on [music21](https://pypi.org/project/music21) (version 9.1+),  [numpy](https://pypi.org/project/numpy), and [converter21](https://pypi.org/project/converter21) (version 3.3+). You also will need to configure music21 (instructions [here](https://web.mit.edu/music21/doc/usersGuide/usersGuide_01_installing.html)) to display a musical score (e.g. with MuseScore).  Requires Python 3.10+.
 
 ## Usage
 On the command line:
@@ -26,9 +26,10 @@ On the command line:
                     default this is ignored).
       -x/--exclude  one or more named details to exclude from comparison.  Can be any of the
                     named details accepted by -i/--include.
-      -o/--output   one or both of two output formats: text (or t) or visual (or v); the default
-                    is visual). visual (or v) requests production of marked-up score PDFs; text
-                    (or t) requests production of diff-like text output.
+      -o/--output   one or more of three output formats: text (or t), visual (or v), or ser (or s)
+                    (the default is visual). visual (or v) requests production of marked-up score
+                    PDFs; text (or t) requests production of diff-like text output; ser (or s)
+                    requests JSON text output containing Symbolic Error Ratio information.
 
       file1         first music score file to compare (any format music21 or converter21 can parse)
       file2         second music score file to compare (any format music21 or converter21 can parse)

diff --git a/ReleaseNotes_4.1.0.txt b/ReleaseNotes_4.1.0.txt
@@ -0,0 +1,34 @@
+Changes since 4.0.0:
+    Add new output option that prints JSON containing the symbolic error rate (SER =
+        numSymbolErrors / numSymbolsInGroundTruth) to stdout (the JSON actually
+        contains all three numbers).  Ground truth is assumed to be the second file.
+        If numSymbolsInGroundTruth == 0, SER will be numSymbolErrors, to avoid divide
+        by zero.
+    Add new API Visualization.get_ser_output() that returns a dict containing the
+        symbolic error rate.
+    In support of SER, notation_sizes (a.k.a. symbol counts) and diff costs (a.k.a.
+        symbolic error counts) have been reviewed and updated:
+        AnnNote.notation_size(): add 1 symbol for slash on grace note
+        AnnExtra.notation_size(): 1 symbol for the text, add 1 symbol if there is any
+            style specified
+        AnnExtra diff error count: text diff is 1 symbol error, offset diff is 1 symbol
+            error, duration diff is 1 symbol error, style diff is 1 symbol error
+        AnnLyric.notation_size(): use len(text) as symbol count instead of 1;
+            add 1 symbol if there's a verse number;
+            add 1 symbol if there's a verse identifier different from the number;
+            add 1 symbol if styled
+        AnnLyric diff cost: text diff symbol error count is the Levenshtein distance,
+            verse number diff is 1 symbol error,  verse identifier diff is 1 symbol
+            error, offset diff is 1 symbol error, style diff is 1 symbol error
+        AnnMeasure.notation_size(): not just notes' symbols and extras' symbols, add in
+            the lyrics' symbols
+        AnnScore.notation_size(): not just parts' symbols, add in staff_groups' symbols
+            and metadata_items' symbols
+    Add support for comparing scores that have different numbers of parts (this previously
+        caused a failure). The existing parts are assumed to line up by index (as before,
+        score1 part 0 is compared with score2 part 0), and then we generate edits that
+        either delete the extra parts in score1, or add the extra parts in score2. The
+        number of symbol errors for those edits is simply the notation_size of (the
+        number of symbols in) the added or deleted parts.
+    Several smallish bugfixes.
+
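
The lyric rules in the release notes above can be sketched as follows. This is a minimal illustration, not musicdiff's implementation: the `Lyric` dataclass, `lyric_size`, and `lyric_diff_cost` are hypothetical stand-ins for `AnnLyric`, and treating verse number 0 as "no verse number" and style as a simple boolean are assumptions made only to keep the example short.

```python
from dataclasses import dataclass


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,                 # delete ch_a
                current[j - 1] + 1,              # insert ch_b
                previous[j - 1] + substitution,  # substitute or match
            ))
        previous = current
    return previous[-1]


@dataclass
class Lyric:
    """Hypothetical stand-in for an annotated lyric (not musicdiff's AnnLyric)."""
    text: str
    number: int = 0          # assumption: 0 means "no verse number"
    identifier: str = ""     # assumption: "" means "no verse identifier"
    offset: float = 0.0
    styled: bool = False     # assumption: style reduced to a boolean


def lyric_size(lyric: Lyric) -> int:
    """Symbol count: len(text), plus 1 each for number, distinct identifier, style."""
    size = len(lyric.text)
    if lyric.number:
        size += 1
    if lyric.identifier and lyric.identifier != str(lyric.number):
        size += 1
    if lyric.styled:
        size += 1
    return size


def lyric_diff_cost(a: Lyric, b: Lyric) -> int:
    """Symbol errors: Levenshtein distance on text, 1 per other differing field."""
    cost = levenshtein(a.text, b.text)
    cost += int(a.number != b.number)
    cost += int(a.identifier != b.identifier)
    cost += int(a.offset != b.offset)
    cost += int(a.styled != b.styled)
    return cost
```

Under these assumptions, a text difference scales with how much of the text is wrong, while each of the other fields contributes at most one symbol error.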
diff --git a/musicdiff/__init__.py b/musicdiff/__init__.py
@@ -14,6 +14,7 @@
 
 import sys
 import os
+import json
 import typing as t
 from pathlib import Path
 
@@ -52,6 +53,8 @@ def diff(
     force_parse: bool = True,
     visualize_diffs: bool = True,
     print_text_output: bool = False,
+    print_ser_output: bool = False,
+    fix_first_file_syntax: bool = False,
     detail: DetailLevel | int = DetailLevel.Default
 ) -> int | None:
     '''
@@ -77,6 +80,16 @@ def diff(
         visualize_diffs (bool): Whether or not to render diffs as marked up PDFs. If False,
             the only result of the call will be the return value (the number of differences).
             (default is True)
+        print_text_output (bool): Whether or not to print diffs in diff-like text to stdout.
+            (default is False)
+        print_ser_output (bool): Whether or not to print the symbolic error rate (SER),
+            which is computed as the number of symbol errors divided by the number of
+            symbols in the ground truth (assumed to be the second score).
+            (default is False)
+        fix_first_file_syntax (bool): Whether to attempt to fix syntax errors in the first
+            file (and add the number of such fixes to the returned number of edits/cost in
+            symbol errors).
+            (default is False)
         detail (DetailLevel | int): What level of detail to use during the diff.
             Can be DecoratedNotesAndRests, OtherObjects, AllObjects, Default (currently
             AllObjects), or any combination (with | or &~) of those or NotesAndRests,
@@ -85,8 +98,9 @@ def diff(
             Style, Metadata, or Voicing.
 
     Returns:
-        int | None: The number of differences found (0 means the scores were identical,
-            None means the diff failed)
+        int | None: The total cost of the edits, i.e. the number of individual symbols
+            that must be added or deleted. (0 means that the scores were identical, and
+            None means that one or more of the input files failed to parse.)
     '''
     # Use the Humdrum/MEI importers from converter21 in place of the ones in music21...
     # Comment out this line to go back to music21's built-in Humdrum/MEI importers.
@@ -130,7 +144,11 @@ def diff(
         if not badArg1:
             # pylint: disable=broad-except
             try:
-                sc = m21.converter.parse(score1, forceSource=force_parse)
+                sc = m21.converter.parse(
+                    score1,
+                    forceSource=force_parse,
+                    acceptSyntaxErrors=fix_first_file_syntax
+                )
                 if t.TYPE_CHECKING:
                     assert isinstance(sc, m21.stream.Score)
                 score1 = sc
@@ -176,11 +194,10 @@ def diff(
     annotated_score2: AnnScore = AnnScore(score2, detail)
 
     diff_list: list
-    _cost: int
-    diff_list, _cost = Comparison.annotated_scores_diff(annotated_score1, annotated_score2)
+    cost: int
+    diff_list, cost = Comparison.annotated_scores_diff(annotated_score1, annotated_score2)
 
-    numDiffs: int = len(diff_list)
-    if numDiffs != 0:
+    if cost != 0:
         if visualize_diffs:
             # you can change these three colors as you like...
             # Visualization.INSERTED_COLOR = 'red'
@@ -194,10 +211,21 @@ def diff(
             # 'score1 ' and 'score2 ', respectively, so you can see which is which.
             Visualization.show_diffs(score1, score2, out_path1, out_path2)
 
-        if print_text_output:
-            text_output: str = Visualization.get_text_output(
-                score1, score2, diff_list, score1Name=score1Name, score2Name=score2Name
-            )
+    if print_ser_output:
+        ser_output: dict = Visualization.get_ser_output(
+            cost, annotated_score2
+        )
+        jsonStr: str = json.dumps(ser_output, indent=4)
+        print(jsonStr)
+
+    if print_text_output:
+        text_output: str = Visualization.get_text_output(
+            score1, score2, diff_list, score1Name=score1Name, score2Name=score2Name
+        )
+        if text_output:
+            if print_ser_output and print_text_output:
+                # put a blank line between them
+                print('')
             print(text_output)
 
-    return numDiffs
+    return cost
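
The SER computation described in the release notes (symbol errors divided by symbols in the ground truth, with a fallback when the ground truth is empty) might look roughly like the sketch below. The function name `ser_summary` and the dictionary keys are assumptions for illustration; the actual structure returned by `Visualization.get_ser_output()` is not shown in this diff.

```python
import json


def ser_summary(num_symbol_errors: int, num_symbols_in_ground_truth: int) -> dict:
    """Return the three SER-related numbers as a dict (key names are hypothetical)."""
    if num_symbols_in_ground_truth == 0:
        # Per the release notes: avoid dividing by zero by reporting the
        # raw error count as the SER.
        ser = float(num_symbol_errors)
    else:
        ser = num_symbol_errors / num_symbols_in_ground_truth
    return {
        "numSymbolErrors": num_symbol_errors,
        "numSymbolsInGroundTruth": num_symbols_in_ground_truth,
        "SER": ser,
    }


# Mirrors the print_ser_output branch above: serialize the dict as indented JSON.
print(json.dumps(ser_summary(3, 120), indent=4))
```

Returning a plain dict and leaving serialization to the caller keeps the numbers usable from the API as well as from the command line.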
diff --git a/musicdiff/__main__.py b/musicdiff/__main__.py
@@ -106,10 +106,23 @@
         "--output",
         default=["visual"],
         nargs="*",
-        choices=["visual", "v", "text", "t"],
+        choices=["visual", "v", "text", "t", "ser", "s"],
         help="'visual'/'v' is marked up scores, rendered to PDFs;"
-        + " 'text'/'t' is diff-like, written to stdout."
-        + " Either, both, or neither can be requested."
+        + " 'text'/'t' is diff-like, written to stdout;"
+        + " 'ser'/'s' is the symbolic error rate (symbol errors/symbols in ground truth),"
+        + " written to stdout."
+        + " Any, all, or none of these can be requested."
+    )
+
+    parser.add_argument(
+        "--fix_first_file_syntax",
+        action='store_true',
+        help="If set, syntax errors in the first input file will be fixed"
+        + " (if possible) so the diff can continue. The number of fixes will be"
+        + " added to the returned cost in symbol errors. Note that errors"
+        + " in the second file (assumed to be the ground truth) are never"
+        + " corrected.  Note also that this currently only works for Humdrum"
+        + " **kern files."
     )
 
     args = parser.parse_args()
@@ -222,16 +235,20 @@
 
     visualize_diffs: bool = "visual" in args.output or "v" in args.output
     print_text_output: bool = "text" in args.output or "t" in args.output
+    print_ser_output: bool = "ser" in args.output or "s" in args.output
+    fix_first_file_syntax: bool = args.fix_first_file_syntax is True
 
-    numDiffs: int | None = diff(
+    cost: int | None = diff(
         args.file1,
         args.file2,
         detail=detail,
         visualize_diffs=visualize_diffs,
-        print_text_output=print_text_output
+        print_text_output=print_text_output,
+        print_ser_output=print_ser_output,
+        fix_first_file_syntax=fix_first_file_syntax,
     )
 
-    if numDiffs is None:
+    if cost is None:
         print('musicdiff failed.', file=sys.stderr)
-    elif numDiffs == 0:
+    elif cost == 0:
         print(f'Scores in {args.file1} and {args.file2} are identical.', file=sys.stderr)
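
The way the `--output` values map onto the three boolean flags can be tried in isolation with a cut-down parser. This is a sketch only: `build_parser` and `output_flags` are hypothetical helpers, the program name is made up, and all other musicdiff options are omitted.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Cut-down parser with only the options relevant to this change."""
    parser = argparse.ArgumentParser(prog="musicdiff-sketch")
    parser.add_argument("file1")
    parser.add_argument("file2")
    parser.add_argument(
        "-o", "--output",
        default=["visual"],
        nargs="*",
        choices=["visual", "v", "text", "t", "ser", "s"],
    )
    parser.add_argument("--fix_first_file_syntax", action="store_true")
    return parser


def output_flags(args: argparse.Namespace) -> dict:
    """Translate the requested output names (long or short) into booleans."""
    requested = set(args.output)
    return {
        "visualize_diffs": bool(requested & {"visual", "v"}),
        "print_text_output": bool(requested & {"text", "t"}),
        "print_ser_output": bool(requested & {"ser", "s"}),
        "fix_first_file_syntax": args.fix_first_file_syntax,
    }


# Request SER and text output, but no marked-up PDFs.
args = build_parser().parse_args(["a.krn", "b.krn", "-o", "s", "text"])
flags = output_flags(args)
print(flags)
```

In this sketch the file names are given before `-o`, because an option declared with `nargs="*"` consumes every following non-option argument.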