Allow filtering samples by compound expressions including multiple scorers #1073

andrei-apollo · 2025-01-03T22:12:52Z

This is a reincarnation of #911

This PR contains:

What is the current behavior?

Samples can be filtered based on simple conditions including one scorer.

What is the new behavior?

Samples can be filtered by compound expressions like result == "C" and steps <= 10. Additionally, samples can be filtered based on input and target texts.

Expression parsing via filtrex. Supports arithmetic, basic math functions, Python-style boolean operations, chained comparisons.
Filter input via CodeMirror. Supports syntax highlighting and autocompletion.
The filter expression can include any scorer, not just the selected one.
Clicking on a score adds it to the filter. Moreover, for simple categorical scores the UI will automatically suggest expressions like result == "C".
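
To make the intended semantics concrete, here is a minimal sketch of how a sample summary could be exposed to a filter expression. It is not the code from this PR: the `SampleSummary` shape, `buildScope`, `filterSamples` and the case-insensitive matching are all assumptions, and the expression is written as a plain TypeScript predicate rather than parsed (in the PR the parsing is delegated to filtrex).

```typescript
// Hypothetical shape of a sample summary; the field names are assumptions.
interface SampleSummary {
  input: string;
  target: string;
  scores: Record<string, string | number>;
}

// Variables and helper predicates that a filter expression could refer to.
interface FilterScope {
  vars: Record<string, string | number>;
  input_contains: (text: string) => boolean;
  target_contains: (text: string) => boolean;
}

function buildScope(sample: SampleSummary): FilterScope {
  // Case-insensitive substring match (an assumption, not confirmed by the PR).
  const contains = (haystack: string, needle: string): boolean =>
    haystack.toLowerCase().includes(needle.toLowerCase());
  return {
    vars: { ...sample.scores },
    input_contains: (text) => contains(sample.input, text),
    target_contains: (text) => contains(sample.target, text),
  };
}

// Hand-written stand-in for the compiled form of: result == "C" and steps <= 10
const resultCorrectAndShort = (scope: FilterScope): boolean =>
  scope.vars["result"] === "C" && Number(scope.vars["steps"]) <= 10;

// Keeps the samples for which the predicate holds.
function filterSamples(
  samples: SampleSummary[],
  predicate: (scope: FilterScope) => boolean,
): SampleSummary[] {
  return samples.filter((sample) => predicate(buildScope(sample)));
}
```

Because the scope is built from all scores of a sample, the predicate can combine scores from several scorers, which is the point of the compound expressions.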

Auxiliary changes:

Merged scorer and score selectors. Did this to keep tool panel width in check now that the filter field is wider.
Moved filter and scorer list to the right so that they are nicely aligned.
Made scorer list collision-proof. Now if two scorers define scores with the same name, the scorers panel will use dot notation to disambiguate, e.g. score.foo vs other_score.foo.
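
The disambiguation rule in the last item can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the `ScorerScores` shape and the `scoreLabels` function are hypothetical names.

```typescript
// Maps each scorer to the score names it defines; the shape is an assumption.
type ScorerScores = Record<string, string[]>;

// Returns one label per (scorer, score) pair. A bare score name is used when
// it is unique; otherwise the label is qualified as "scorer.score".
function scoreLabels(scorers: ScorerScores): string[] {
  // Count how many scorers define each score name.
  const counts = new Map<string, number>();
  for (const names of Object.values(scorers)) {
    for (const name of names) {
      counts.set(name, (counts.get(name) ?? 0) + 1);
    }
  }
  const labels: string[] = [];
  for (const [scorer, names] of Object.entries(scorers)) {
    for (const name of names) {
      const isAmbiguous = (counts.get(name) ?? 0) > 1;
      labels.push(isAmbiguous ? `${scorer}.${name}` : name);
    }
  }
  return labels;
}
```

With `{ score: ["foo", "bar"], other_score: ["foo"] }` only `foo` collides, so it becomes `score.foo` and `other_score.foo` while `bar` keeps its short name.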

Does this PR introduce a breaking change?

No.

Other information:

Next steps:

I would like to also allow filtering by sample metadata and full-text search over the transcript. This is now easy to do from the UI perspective, but this would require loading the entire samples, not just the summaries.
Consider if the tool order could be improved. I find it a little confusing that the filter is to the right of the scorer selector, yet does not depend on it. Not sure how best to fix this, because I want to keep the filter aligned with the scorer list.

This PR is a work in progress. In particular, I remember that @dragonstyle suggested applying the filter only on Enter. This hasn't been done yet. I'm also still figuring out some corner cases with different score types. Still, @dragonstyle, if you could take a look at the current state I would appreciate your feedback. Do you think this is moving in the right direction?

dragonstyle · 2025-01-06T15:32:39Z

This is looking really great to me, and I love that we'll be able to support much more robust filtering! I believe that the right autocomplete experience can make this nearly as easy to use as the simple selector. I have some suggestions to get there:

When the control is focused (and empty), we should show autocomplete suggestions for the first 'segment' (the user is still free to type, but will typically see the scorer names as suggestions, for example). Each time the user completes a 'segment' of the expression, I think we should automatically prompt for the next segment (e.g. once I have a scorer name, we should suggest the various "==", "<", etc. Once that is selected, if the scorer is categorical, we could suggest values). This will make the simple case of filtering by scorer just about as simple as it is now. It will also make learning very discoverable, as users will see options each step of the way. (Sometimes showing autocomplete for the next 'segment' may not be possible, but when we can do so reasonably, I think we should.)
Clicking the scorer name is a pretty obscure affordance and I don't think we should rely on it (I'd like to see this removed and just solve for discovery using the filter input itself).
If you agree we can remove the scorer link affordance, I think we could move the filter to the left side of the scorer selector- I don't think its proximity to the scorer list is important in this case.
As you noted, filtering as each key is pressed is very disruptive (since all samples will always just 'disappear' until the expression is complete). I very much think we need to use enter, a wait / debounce, background evaluation to ensure it is a 'complete expression' or some other affordance to 'accept' the filter rather than filtering immediately upon key down.
I would make the (i) icon a (?) icon.
For advanced features (awesome!) like input_contains, I think we should autocomplete to the function with the cursor in between the parens if possible (e.g. input_contains(|))
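
The segment-by-segment prompting suggested in the first item could look roughly like this. It is a deliberately simplified sketch: it only handles clauses of the form "name operator value" joined by and/or, it ignores function calls such as input_contains, and `ScorerInfo` and `suggestNext` are hypothetical names rather than anything in the viewer code.

```typescript
// Hypothetical description of a scorer; `categories` is set only for
// categorical scorers.
interface ScorerInfo {
  name: string;
  categories?: string[];
}

const COMPARISON_OPS = ["==", "!=", "<", "<=", ">", ">="];
const BOOLEAN_OPS = ["and", "or"];

// Suggests completions for the next segment, given the segments typed so far.
// Segments are assumed to cycle through: name, operator, value, connective.
function suggestNext(segments: string[], scorers: ScorerInfo[]): string[] {
  const position = segments.length % 4;
  if (position === 0) {
    // Start of a clause: offer the scorer names.
    return scorers.map((s) => s.name);
  }
  if (position === 1) {
    // After a name: offer comparison operators.
    return COMPARISON_OPS;
  }
  if (position === 2) {
    // After an operator: offer quoted values if the scorer is categorical.
    const variable = segments[segments.length - 2];
    const scorer = scorers.find((s) => s.name === variable);
    return (scorer?.categories ?? []).map((c) => JSON.stringify(c));
  }
  // After a value: offer connectives to start another clause.
  return BOOLEAN_OPS;
}
```

Returning an empty list for non-categorical scorers leaves the user free to type a number, which matches the idea that prompting happens only when it can be done reasonably.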

Don't mean to flood with feedback, but this is definitely getting there and I'm looking forward to merging it!!

andrei-apollo · 2025-01-09T17:36:50Z

Thanks for the feedback! Implemented all suggestions, please take a look.

dragonstyle · 2025-01-09T18:10:00Z

This is looking amazing! A few suggestions that are hopefully just minor tweaks...

Can we delay showing the expression error until the user has pressed Enter or in some way indicated that they are done? I think we're showing the red error wrapper too aggressively (and the red squiggles seem like plenty unless the user actually runs an expression which results in an error).
Currently, selecting an entry adds the selected text, but then the user needs to press space between steps to get the next suggestion. It would be sweet if, when the user selects options from the menu, we just offered the next step directly. Do you think it's possible to enable the equivalent of this:

focus

select choice (press enter)

select equal (press enter)

select "I" (arrow, then enter)
Question - could we make the delete case just apply immediately, since we 'know' that is a complete expression? Or is this too inconsistent?
For long expressions, I am seeing scroll bars (using Safari), which are disruptive...

dragonstyle · 2025-01-09T18:11:33Z

(These check failures are not related to this PR and are related to ruff dependency version changes. They are now fixed on main so if you rebase against main they should go away - sorry!)


andrei-apollo · 2025-01-09T19:25:01Z

Good idea! That error message was annoying. Done.
Done.
Feels rather inconsistent to me, to be honest.
Hmm. Weird. For me it works fine in Safari as well:

What version do you have? Does the scrollbar always look like this or only sometimes?

dragonstyle · 2025-01-09T19:26:38Z

What version do you have? Does the scrollbar always look like this or only sometimes?

Version 18.1.1 (20619.2.8.11.12).

I only see it once I make the expression long and scroll with the mouse...

dragonstyle · 2025-01-09T19:49:39Z

One other question - rather than show the green feedback treatment once an expression is complete, maybe we should just apply the expression at that point since we know it will work?

It would still result in some changes to filtering in cases where the filters didn't narrow the set (e.g. or), but I think that would be worth it to get an even smoother experience.

andrei-apollo · 2025-01-09T22:21:08Z

My Safari version is slightly different (Version 18.2 (20620.1.16.11.8)), don't know if it's related or not. To be honest, debugging this kind of failure without being able to reproduce it would be quite hard. I decided to just remove the scroll bar. It's quite unusual for single-line text inputs to have scroll bars anyway.

andrei-apollo · 2025-01-09T22:24:09Z

Agreed. Applying the filter expression immediately but only when it's valid seems like a good approach. Changed the behavior and added color-coding, which is hopefully noticeable enough to make the current state clear, but not so much as to be distracting.
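
As an illustration of "apply immediately, but only when valid", here is a minimal sketch. The names `createLiveFilter`, `Predicate` and `FilterStatus` are hypothetical, `compile` is a caller-supplied function assumed to throw on an invalid or incomplete expression, and treating an empty input as "no filter" is my assumption for the sketch.

```typescript
type Predicate<T> = (item: T) => boolean;
type FilterStatus = "empty" | "valid" | "invalid";

// Holds the most recently applied filter. `compile` is supplied by the caller
// and is expected to throw when the expression cannot be compiled.
function createLiveFilter<T>(compile: (expression: string) => Predicate<T>) {
  let applied: Predicate<T> = () => true;
  return {
    // Called on every edit; only a valid expression replaces the applied filter.
    update(expression: string): FilterStatus {
      if (expression.trim() === "") {
        applied = () => true; // assumption: empty input clears the filter
        return "empty";
      }
      try {
        applied = compile(expression);
        return "valid";
      } catch {
        // Keep the previously applied filter so samples do not disappear
        // while the user is still typing.
        return "invalid";
      }
    },
    // Filters items with the last successfully compiled expression.
    apply(items: T[]): T[] {
      return items.filter((item) => applied(item));
    },
  };
}
```

The returned status is what the color-coding could be driven by, while the sample list keeps showing the result of the last valid expression.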

dragonstyle · 2025-01-09T22:34:32Z

This is a great improvement over our current filtering.

Nits:

I personally find the green outline maybe a bit overkill (perhaps we could just reflect the error or incomplete states and treat success states as just having no feedback). That said, I can definitely live with this approach if you feel strongly that the green is needed or helpful.
I noticed that the popup options can sometimes be a bit aggressive. I'm not sure what the rule to filter this would be (or if there is a consistent rule to be applied), but I notice that for an expression like choice == "I" or input_contains("parallel"), if I go back to edit the 'or' to 'and', it will pop up choices after I complete the 'and'.
One tiny thing that might be a side effect of the clickable scorers - I think the duration can go back to the far right and the scorers in the middle now that they don't need proximity to the filter.

I haven't looked closely at the code itself - LMK if you think that is ready to go and I can take a look (or just ping me whenever you think it's good to go).

andrei-apollo force-pushed the main branch from c768a95 to 86736cc Compare January 3, 2025 22:19

jjallaire requested a review from dragonstyle January 4, 2025 00:47

andrei-apollo closed this Jan 9, 2025

andrei-apollo force-pushed the main branch from c879e4e to faf3b7d Compare January 9, 2025 16:27

andrei-apollo reopened this Jan 9, 2025

andrei-apollo marked this pull request as ready for review January 9, 2025 17:32

andrei-apollo added 10 commits January 9, 2025 19:12

Combine scorer and score selectors

3525aac

Filter expressions for samples

20902d8

Uses filtrex to support compound expressions that allow to filter samples by multiple scores at a time.

Add input_contains and target_contains samples filter predicates

a4b26f6

Use CodeMirror for sample filter input

55d905e

Revert score clickability

d3b9454

Smarter filter autocompletion and better score selector

780200f

More robust logic to ensure single line

d4c6042

Apply filter on Enter

90989a3

Automatically insert space after completion when filter expression is…

d7fb76a

… likely to continue

Show filter error text only on Enter

05a4221

andrei-apollo force-pushed the main branch from 3617f14 to 05a4221 Compare January 9, 2025 19:18

andrei-apollo added 3 commits January 9, 2025 22:17

Don't insert space after completing booleans in filter expression

d7511e1

Disable scroll bar on the filter input

cce0268

Apply valid filter expression immediately

cf485a4
