fuzzy-neon: Collection of fuzzy string matching algorithms written in rust
Created using neon.
Requires Node
and Cargo
to install.
Node
In order to install fuzzy-neon you may need to first install cargo-cp-artifact
:
$ npm i cargo-cp-artifcat
Installing the Module
$ npm i fuzzy-neon
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
- Hamming distance
hamming(str1, str2)
- Levenshtein (recursive impl)
levenshtein(str1, str2)
- Wagner-Fischer (dynamic programming impl of levenshtein)
wagnerFischer(str1, str2)
- Damerau-Levenshtein
damerauLevenshtein(str1, str2)
- Jaro-Winkler Distance
jaroWinkler(str1, str2)
- Longest Common Subsequence
lcs(str1, str2)
- n-gram based distance metric
ngram(str1, str2, ngramSize)
- Jensen-Shannon Vector Distance
jensonshannonVector(str1, str2)
Computes number of positions between two strings where characters differ. Expanded to allow strings of different lengths Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1
The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation use Wagner-Fischer Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.levenshtein('nick', 'nit');
console.log(score);
// 2
Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.wagnerFischer('nick', 'nit');
console.log(score);
// 2
Extension of the Levenshtein distance metric which allows for adjacent character transpositions Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.damerauLevenshtein('taco', 'atco');
console.log(score);
// 1
Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see here for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jaroWinkler('nice', 'nick');
console.log(score);
// 0.11549999999999994
n-gram based string distance metric implemented based on the work from this paper. Extremely good at integrating context into producing the metric score, O(n^2) complexity. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram
console.log(score);
// 0.16666666666666666
Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in this paper. O(n) complexity. Example
const fuzzy = require('fuzzy-neon');
let score = fuzzy.jensonshannonVector('hi their', 'hi there');
console.log(score);
// 0.36638698518165513