Skip to content

Node module containing an array of string matching algorithms written in rust, binded with Neon-bindings

Notifications You must be signed in to change notification settings

rupert648/fuzzy-neon

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

fuzzy-neon

fuzzy-neon: Collection of fuzzy string matching algorithms written in rust

Created using neon.

Installation

Requires Node and Cargo to install.

Node In order to install fuzzy-neon you may need to first install cargo-cp-artifact:

$ npm i cargo-cp-artifcat

Installing the Module

$ npm i fuzzy-neon

Usage

const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1

Available String Matching Metrics

  • Hamming distance hamming(str1, str2)
  • Levenshtein (recursive impl) levenshtein(str1, str2)
  • Wagner-Fischer (dynamic programming impl of levenshtein) wagnerFischer(str1, str2)
  • Damerau-Levenshtein damerauLevenshtein(str1, str2)
  • Jaro-Winkler Distance jaroWinkler(str1, str2)
  • Longest Common Subsequence lcs(str1, str2)
  • n-gram based distance metric ngram(str1, str2, ngramSize)
  • Jensen-Shannon Vector Distance jensonshannonVector(str1, str2)

Algorithm Explanation/Useful Links

Hamming

Computes number of positions between two strings where characters differ. Expanded to allow strings of different lengths Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.hamming('nick', 'nice');
console.log(score);
// 1

Levenshtein

The Levenshtein distance between two strings is the minimum number of single-character edits (insertions, deletions or substitutions) to convert one word to the other. For efficient implementation use Wagner-Fischer Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.levenshtein('nick', 'nit');
console.log(score);
// 2

Wagner-Fischer

Implementation of Levenshtein using a faster, dynamic programming implementation - interesting to compare performance between the two. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.wagnerFischer('nick', 'nit');
console.log(score);
// 2

Damerau-Levenshtein

Extension of the Levenshtein distance metric which allows for adjacent character transpositions Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.damerauLevenshtein('taco', 'atco');
console.log(score);
// 1

Jaro Winkler Distance

Extension of the Jaro distance between two strings; the Jaro distance uses the relative probability of each character in a string to calculate an edit score between two strings (see here for formula details). Winkler extended this to boost the scores of words which shared prefixes of some length. Returns a normalised score between 0 and 1. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.jaroWinkler('nice', 'nick');
console.log(score);
// 0.11549999999999994

N-gram Based distance metric

n-gram based string distance metric implemented based on the work from this paper. Extremely good at integrating context into producing the metric score, O(n^2) complexity. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.ngram('rupert', 'robert', 2); // last arg is size of ngram
console.log(score);
// 0.16666666666666666

Jensen Shannon Distance

Computes the relatively probability of events in the string (events being either single characters or bi-grams) porducing a probabilty vector. Then computes the Jensen Shannon distance metric over the two probability vectors. Produced from the work by Richard Connor et al in this paper. O(n) complexity. Example

const fuzzy = require('fuzzy-neon');
let score = fuzzy.jensonshannonVector('hi their', 'hi there');
console.log(score);
// 0.36638698518165513

About

Node module containing an array of string matching algorithms written in rust, binded with Neon-bindings

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages