Levenshtein Automaton #5

Open
wants to merge 5 commits into
base: main

@Toflar (Owner) commented Oct 24, 2023

Implements a Levenshtein automaton, which speeds up trie searches by aborting early on prefixes that can never match the string. For example, if the string is "mysuperduperexample" and you do not need the exact distance from "foobar" to "mysuperduperexample" but only whether it is within a max distance of 2, you can abort after "foo": at that point, three edits are already required, so nothing starting with "foo" can match "mysuperduperexample" anymore.
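To illustrate the early abort, here is a minimal sketch in Python. It is not the code from this PR: the class and function names (`LevenshteinAutomaton`, `first_dead_prefix`, `build_trie`, `search`) are hypothetical, and the automaton is simulated row by row with the classic edit-distance table rather than precompiled into states. The idea is that the state after each character holds the distance from the consumed prefix to every prefix of the target; once the smallest value in that state exceeds the max distance, the whole trie branch can be skipped.

```python
class LevenshteinAutomaton:
    """Row-based sketch: the state is one row of the edit-distance table."""

    def __init__(self, target: str, max_distance: int):
        self.target = target
        self.max_distance = max_distance

    def start(self) -> list[int]:
        # Distance from the empty input to each prefix of the target.
        return list(range(len(self.target) + 1))

    def step(self, state: list[int], char: str) -> list[int]:
        # Consume one input character and compute the next row.
        new_state = [state[0] + 1]
        for i, target_char in enumerate(self.target):
            cost = 0 if target_char == char else 1
            new_state.append(min(
                new_state[i] + 1,   # target character missing from the input
                state[i] + cost,    # match or replacement
                state[i + 1] + 1,   # extra character in the input
            ))
        return new_state

    def is_match(self, state: list[int]) -> bool:
        # The consumed input as a whole is within the max distance.
        return state[-1] <= self.max_distance

    def can_match(self, state: list[int]) -> bool:
        # Some extension of the consumed prefix could still match.
        return min(state) <= self.max_distance


def first_dead_prefix(automaton: LevenshteinAutomaton, word: str):
    """Return the shortest prefix of word that can never match, or None."""
    state = automaton.start()
    for i, char in enumerate(word):
        state = automaton.step(state, char)
        if not automaton.can_match(state):
            return word[:i + 1]
    return None


def build_trie(words):
    """Build a dict-of-dicts trie; the empty-string key marks a word end."""
    root = {}
    for word in words:
        node = root
        for char in word:
            node = node.setdefault(char, {})
        node[""] = True
    return root


def search(trie, automaton: LevenshteinAutomaton):
    """Collect all words in the trie within the automaton's max distance."""
    matches = []
    stack = [(trie, "", automaton.start())]
    while stack:
        node, prefix, state = stack.pop()
        if "" in node and automaton.is_match(state):
            matches.append(prefix)
        for char, child in node.items():
            if char == "":
                continue
            next_state = automaton.step(state, char)
            if automaton.can_match(next_state):  # early abort otherwise
                stack.append((child, prefix + char, next_state))
    return sorted(matches)
```

With the target "mysuperduperexample" and a max distance of 2, `first_dead_prefix` reports "foo" for the input "foobar", so a trie search never descends past that node.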

What would be really nice to have on top:

  • Different costs for insertion, replacement and deletion
  • Damerau-Levenshtein support (an additional, configurable cost for transpositions)
  • Configurable cost for a mistake on the very first character of a string. It's common in search engines to count a typo in the first character as 2 typos.
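As a rough illustration of the first and third wishes, the following Python sketch computes a weighted distance with separate costs and a first-character factor. It is only an assumption about how such options could look, not a proposal for this PR's API: `Costs`, `first_char_factor`, and `weighted_distance` are hypothetical names, and it uses a plain edit-distance table instead of an automaton. An edit counts as a first-character mistake here when it consumes the first character of either string. Transpositions are not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Costs:
    # All names and defaults here are hypothetical.
    insertion: int = 1
    deletion: int = 1
    replacement: int = 1
    first_char_factor: int = 2  # multiplier for edits on a first character


def weighted_distance(word: str, query: str, costs: Costs = Costs()) -> int:
    """Edit distance from word to query with configurable costs."""

    def cost(base: int, touches_first: bool) -> int:
        return base * costs.first_char_factor if touches_first else base

    # Row for the empty word: only insertions of query characters.
    previous = [0]
    for j in range(1, len(query) + 1):
        previous.append(previous[j - 1] + cost(costs.insertion, j == 1))

    for i in range(1, len(word) + 1):
        current = [previous[0] + cost(costs.deletion, i == 1)]
        for j in range(1, len(query) + 1):
            same = word[i - 1] == query[j - 1]
            replace = 0 if same else cost(costs.replacement, i == 1 or j == 1)
            current.append(min(
                previous[j] + cost(costs.deletion, i == 1),      # drop word[i-1]
                current[j - 1] + cost(costs.insertion, j == 1),  # add query[j-1]
                previous[j - 1] + replace,                       # match or replace
            ))
        previous = current
    return previous[-1]
```

With the defaults, a typo in the first character ("bat" vs. "cat") costs 2, while the same typo at the end ("cab" vs. "cat") costs 1. Setting `first_char_factor=1` restores the plain Levenshtein distance.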

@Toflar Toflar marked this pull request as ready for review October 24, 2023 15:37