Allow filtering samples by compound expressions including multiple scorers #1073

Open: wants to merge 13 commits into base: main

andrei-apollo
Contributor

@andrei-apollo andrei-apollo commented Jan 3, 2025

This is a reincarnation of #911

This PR contains:

  • New features
  • Changes to dev-tools e.g. CI config / github tooling
  • Docs
  • Bug fixes
  • Code refactor

What is the current behavior?

Samples can be filtered based on simple conditions including one scorer.

What is the new behavior?

Samples can be filtered by compound expressions like result == "C" and steps <= 10. Additionally, samples can be filtered based on input and target texts.

  • Expression parsing via filtrex. Supports arithmetic, basic math functions, Python-style boolean operations, chained comparisons.
  • Filter input via CodeMirror. Supports syntax highlighting and autocompletion.
  • The filter expression can include any scorer, not just the selected one.
  • Clicking on a score adds it to the filter. Moreover, for simple categorical scores the UI will automatically suggest expressions like result == "C".

Auxiliary changes:

  • Merged scorer and score selectors. Did this to keep tool panel width in check now that the filter field is wider.
  • Moved filter and scorer list to the right so that they are nicely aligned.
  • Made scorer list collision-proof. Now if two scorers define scores with the same name, the scorers panel will use dot notation to disambiguate, e.g. score.foo vs other_score.foo.
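
The dot-notation rule from the last item can be sketched roughly as follows. This is only an illustration, not the code in this PR: the `ScorerInfo` shape and the `scoreVariableNames` helper are hypothetical names, and the sketch assumes a score name is qualified with its scorer only when more than one scorer defines it.

```typescript
// Hypothetical shape: a scorer and the names of the scores it produces.
interface ScorerInfo {
  scorer: string;
  scores: string[];
}

// Returns one filter variable name per (scorer, score) pair.
// A score name defined by a single scorer is used bare; a name shared by
// several scorers is qualified as "<scorer>.<score>".
function scoreVariableNames(scorers: ScorerInfo[]): string[] {
  // Count how many times each score name occurs across all scorers.
  // Object.create(null) avoids clashes with inherited keys like "constructor".
  const counts: Record<string, number> = Object.create(null);
  for (const info of scorers) {
    for (const name of info.scores) {
      counts[name] = (counts[name] ?? 0) + 1;
    }
  }

  const result: string[] = [];
  for (const info of scorers) {
    for (const name of info.scores) {
      const shared = (counts[name] ?? 0) > 1;
      result.push(shared ? `${info.scorer}.${name}` : name);
    }
  }
  return result;
}

const exampleNames = scoreVariableNames([
  { scorer: "score", scores: ["foo", "bar"] },
  { scorer: "other_score", scores: ["foo"] },
]);
console.log(exampleNames); // ["score.foo", "bar", "other_score.foo"]
```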

Does this PR introduce a breaking change?

No.

Other information:

Next steps:

  • I would also like to allow filtering by sample metadata and full-text search over the transcript. This is now easy to do from the UI perspective, but it would require loading entire samples, not just the summaries.
  • Consider if the tool order could be improved. I find it a little confusing that the filter is to the right of the scorer selector, yet does not depend on it. Not sure how best to fix this, because I want to keep the filter aligned with the scorer list.

This PR is a work in progress. In particular, I remember that @dragonstyle suggested applying the filter only on Enter. This hasn't been done yet. I'm also still figuring out some corner cases with different score types. Still, @dragonstyle, if you could take a look at the current state, I would appreciate your feedback. Do you think this is moving in the right direction?

@dragonstyle
Collaborator

This is looking really great to me, and I love that we'll be able to support much more robust filtering! I believe that the right autocomplete experience can make this nearly as easy to use as the simple selector. I have some suggestions to get there:

  1. When the control is focused (and empty), we should show autocomplete suggestions for the first 'segment' (the user is still free to type, but will typically see the scorer names as suggestions, for example). Each time the user completes a 'segment' of the expression, I think we should automatically prompt for the next segment (e.g. once I have a scorer name, we should suggest the various operators such as "==", "<", etc.; once one is selected, if the scorer is categorical, we could suggest values). This will make the simple case of filtering by scorer just about as simple as it is now. It will also make the syntax very discoverable, as users will see options each step of the way. (Sometimes showing autocomplete for the next 'segment' won't be possible, but when we can do so reasonably, I think we should.)

  2. Clicking the scorer name is a pretty obscure affordance, and I don't think we should rely on it (I'd like to see this removed and just solve for discovery using the filter input itself).

  3. If you agree we can remove the scorer link affordance, I think we could move the filter to the left side of the scorer selector; I don't think its proximity to the scorer list is important in this case.

  4. As you noted, filtering as each key is pressed is very disruptive (since all samples will always just 'disappear' until the expression is complete). I very much think we need to use enter, a wait / debounce, background evaluation to ensure it is a 'complete expression' or some other affordance to 'accept' the filter rather than filtering immediately upon key down.

  5. I would make the (i) icon a (?) icon.

  6. For advanced features (awesome!) like input_contains, I think we should autocomplete to the function with the cursor between the parens if possible (e.g. input_contains(|)).

Don't mean to flood with feedback, but this is definitely getting there and I'm looking forward to merging it!!
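
To make suggestions 1 and 6 concrete, here is a minimal sketch of segment-by-segment prompting and of function completion with the cursor between the parentheses. It is a simplified model under stated assumptions, not the CodeMirror integration in this PR: the `Segment` kinds, `nextSegment`, `suggestionsFor`, and `completeFunction` are hypothetical names, and the sketch only covers simple clauses of the form variable, operator, value, connective.

```typescript
// Hypothetical segment kinds of a simple filter clause.
type Segment = "variable" | "operator" | "value" | "connective";

// Which kind of segment to prompt for after the one just completed.
const FOLLOWED_BY: Record<Segment, Segment> = {
  variable: "operator",
  operator: "value",
  value: "connective",
  connective: "variable",
};

// An empty, focused input (previous === null) starts with a variable.
function nextSegment(previous: Segment | null): Segment {
  return previous === null ? "variable" : FOLLOWED_BY[previous];
}

// Suggestions for a segment, given scorer names and, for a categorical
// scorer, its known categories (quoted as string literals).
function suggestionsFor(segment: Segment, scorers: string[], categories: string[]): string[] {
  const options: Record<Segment, string[]> = {
    variable: scorers,
    operator: ["==", "!=", "<", "<=", ">", ">="],
    value: categories.map((c) => JSON.stringify(c)),
    connective: ["and", "or"],
  };
  return options[segment];
}

// Completion for a function name: insert "name()" and place the cursor
// between the parentheses (offset is relative to the inserted text).
interface Completion {
  text: string;
  cursorOffset: number;
}

function completeFunction(name: string): Completion {
  const text = `${name}()`;
  return { text, cursorOffset: text.length - 1 };
}

// Walk through: focus, pick "choice", pick "==", then see category values.
const first = nextSegment(null);
console.log(first, suggestionsFor(first, ["choice"], ["C", "I"]));
const second = nextSegment(first);
console.log(second, suggestionsFor(second, ["choice"], ["C", "I"]));
const third = nextSegment(second);
console.log(third, suggestionsFor(third, ["choice"], ["C", "I"]));
console.log(completeFunction("input_contains"));
```

In a real editor the previous segment would have to be derived from the parsed text before the cursor, which is where this simple model is most likely to fall short.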

@andrei-apollo andrei-apollo reopened this Jan 9, 2025
@andrei-apollo andrei-apollo marked this pull request as ready for review January 9, 2025 17:32
@andrei-apollo
Contributor Author

Thanks for the feedback! Implemented all suggestions, please take a look.

@dragonstyle
Collaborator

This is looking amazing! A few suggestions that are hopefully just minor tweaks...

  1. Can we delay showing the expression error until the user has pressed enter or in some way indicated that they are done? I think we're showing the red error wrapper too aggressively (and the red squiggles seem like plenty unless the user actually runs an expression which results in an error).

  2. Currently, selecting an entry adds the selected text, but then the user needs to press space between steps to get the next suggestion. It would be sweet if, when the user selects an option from the menu, we just offered the next step directly. Do you think it's possible to enable the equivalent of this:

    focus
    Screenshot 2025-01-09 at 1 00 28 PM

    select choice (press enter)
    Screenshot 2025-01-09 at 1 00 43 PM

    select equal (press enter)
    Screenshot 2025-01-09 at 1 01 05 PM

    select "I" (arrow, then enter)
    Screenshot 2025-01-09 at 1 07 10 PM

  3. Question: could we make the delete case just apply immediately, since we 'know' that is a complete expression? Or is this too inconsistent?

  4. For long expressions, I am seeing scroll bars (using Safari), which are disruptive...

    Screenshot 2025-01-09 at 12 56 46 PM

@dragonstyle
Collaborator

dragonstyle commented Jan 9, 2025

(These check failures are not related to this PR and are related to ruff dependency version changes. They are now fixed on main so if you rebase against main they should go away - sorry!)

@andrei-apollo
Contributor Author

  1. Good idea! That error message was annoying. Done.
  2. Done.
  3. Feels rather inconsistent to me, to be honest.
  4. Hmm. Weird. For me it works fine in Safari as well:
    image
    What version do you have? Does the scrollbar always look like this or only sometimes?

@dragonstyle
Collaborator

What version do you have? Does the scrollbar always look like this or only sometimes?

Version 18.1.1 (20619.2.8.11.12).

I only see it once I make the expression long and scroll with the mouse...

@dragonstyle
Collaborator

One other question - rather than show the green feedback treatment once an expression is complete, maybe we should just apply the expression at that point since we know it will work?

It would still result in some changes to filtering in cases where the filters didn't narrow the set (e.g. or), but I think that would be worth it to get an even smoother experience.

@andrei-apollo
Contributor Author

My Safari version is slightly different (Version 18.2 (20620.1.16.11.8)), don't know if it's related or not. To be honest, debugging this kind of failure without being able to reproduce it would be quite hard. I decided to just remove the scroll bar. It's quite unusual for single-line text inputs to have scroll bars anyway.

@andrei-apollo
Contributor Author

Agreed. Applying the filter expression immediately but only when it's valid seems like a good approach. Changed the behavior and added color-coding, which is hopefully noticeable enough to make the current state clear, but not so much as to be distracting.
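
A minimal sketch of the "apply immediately, but only when valid" behavior, for illustration only. It assumes a compile step that throws on incomplete or invalid input; `LiveFilter`, `Compile`, and `toyCompile` are hypothetical names, and `toyCompile` is a stand-in that only understands `steps <= N`, not the real expression parser. Treating an empty input as "clear the filter" is also an assumption of this sketch.

```typescript
type Sample = Record<string, unknown>;
type Predicate = (sample: Sample) => boolean;
// Assumed contract: returns a predicate, or throws if the expression is
// incomplete or invalid.
type Compile = (expression: string) => Predicate;
// Status that the UI could map to color-coding.
type FilterStatus = "empty" | "valid" | "invalid";

class LiveFilter {
  private readonly compile: Compile;
  private applied: Predicate = () => true;
  status: FilterStatus = "empty";

  constructor(compile: Compile) {
    this.compile = compile;
  }

  // Called on every edit. A valid expression is applied right away;
  // an invalid one only changes the status and keeps the previous filter.
  update(expression: string): FilterStatus {
    if (expression.trim() === "") {
      this.applied = () => true;
      this.status = "empty";
      return this.status;
    }
    try {
      this.applied = this.compile(expression);
      this.status = "valid";
    } catch {
      this.status = "invalid";
    }
    return this.status;
  }

  matches(sample: Sample): boolean {
    return this.applied(sample);
  }
}

// Toy stand-in for the real parser: accepts only "steps <= N".
const toyCompile: Compile = (expression) => {
  const match = /^steps <= (\d+)$/.exec(expression.trim());
  if (!match) throw new Error("incomplete or invalid expression");
  const limit = Number(match[1]);
  return (sample) => {
    const steps = sample["steps"];
    return typeof steps === "number" && steps <= limit;
  };
};

const liveFilter = new LiveFilter(toyCompile);
console.log(liveFilter.update("steps <= 10")); // "valid"
console.log(liveFilter.update("steps <="));    // "invalid", previous filter stays
console.log(liveFilter.matches({ steps: 20 })); // false
```

Keeping the last valid predicate while the user is mid-edit is what prevents all samples from disappearing on every keystroke.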

@dragonstyle
Collaborator

This is a great improvement over our current filtering.

Nits:

  • I personally find the green outline maybe a bit overkill (perhaps we could just reflect the error or incomplete states and treat success states as just having no feedback). That said, I can definitely live with this approach if you feel strongly that the green is helpful.

  • I noticed that the popup options can sometimes be a bit aggressive. I'm not sure what the rule to filter this would be (or if there is a consistent rule to be applied), but I notice that for an expression like choice == "I" or input_contains("parallel"), if I go back to edit the 'or' to 'and', it will pop up choices after I complete the 'and'.

  • One tiny thing that might be a side effect of the clickable scorers: I think the duration can go back to the far right and the scorers to the middle, now that they don't need proximity to the filter.

I haven't looked closely at the code itself. LMK if you think it is ready to go and I can take a look (or just ping me whenever you think it's good to go).


2 participants