kodealltag-spelling-corrector

Preprocessing

Text is first split by a simple regex pretokenizer: "[\w|�]+|[^\w\s]+"
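The pretokenizer step can be tried out with Python's re module. The sketch below assumes the pattern is applied with re.findall. The helper name pretokenize is hypothetical, and the actual call in BCRPipeline.py may differ. U+FFFD (the replacement character left by failed decoding) is written as the escape \ufffd. Including it in the word class keeps damaged words such as "f\ufffdr" together as a single token for the next step.

```python
import re

# Pattern from the README. Inside a character class "|" is a literal pipe,
# so [\w|\ufffd] matches word characters, "|" and U+FFFD.
PRETOKENIZER = re.compile(r"[\w|\ufffd]+|[^\w\s]+")


def pretokenize(text: str) -> list[str]:
    """Hypothetical helper: return word-like runs and punctuation runs.

    Whitespace is never matched by either alternative, so it is dropped.
    """
    return PRETOKENIZER.findall(text)


print(pretokenize("Vielen Dank f\ufffdr Ihre Nachricht!"))
```

Because Python 3 str patterns treat \w as Unicode-aware, umlauts and ß stay inside word tokens. Runs of punctuation such as "?!" come back as one token.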
The tokens are then processed by BadCharReplacer, a modified SpellingCorrection that ranks candidates by edit distance and by word frequency from the Google Books unigram corpus.
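The sketch below shows the general idea of edit-distance-plus-frequency replacement applied to bad characters. It is not the repository's BadCharReplacer. It assumes that each U+FFFD is substituted by one character from a hard-coded candidate alphabet (ASCII letters, umlauts and ß), which is one edit per bad character. Among the candidates found in the frequency table, the most frequent one is kept. The names CANDIDATE_CHARS, candidates and replace_bad_chars are hypothetical. The toy counts stand in for Google Books unigram frequencies.

```python
# Hypothetical candidate alphabet; the real BadCharReplacer may use another set.
CANDIDATE_CHARS = "abcdefghijklmnopqrstuvwxyzäöüßABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÜ"
BAD_CHAR = "\ufffd"  # Unicode replacement character left by decoding errors


def candidates(token: str) -> set[str]:
    """Expand every bad character in `token` into each candidate character.

    Each substitution is one edit, so a token with k bad characters
    yields len(CANDIDATE_CHARS) ** k candidates.
    """
    results = {""}
    for ch in token:
        if ch == BAD_CHAR:
            results = {prefix + c for prefix in results for c in CANDIDATE_CHARS}
        else:
            results = {prefix + ch for prefix in results}
    return results


def replace_bad_chars(token: str, unigram_counts: dict[str, int]) -> str:
    """Return the most frequent known candidate, or the token unchanged."""
    if BAD_CHAR not in token:
        return token
    known = [c for c in candidates(token) if c in unigram_counts]
    if not known:
        return token
    # sorted() makes tie-breaking deterministic (alphabetically first wins).
    return max(sorted(known), key=lambda c: unigram_counts[c])


toy_counts = {"Größe": 120, "Grüße": 300}  # toy numbers, not real corpus data
print(replace_bad_chars("Gr\ufffdße", toy_counts))  # prints "Grüße"
```

The candidate set grows exponentially with the number of bad characters in a token. A practical implementation would therefore need to cap k or prune candidates against the vocabulary as it goes.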

Files:
pickled_dicts/german
.gitignore
BCRPipeline.py
BadCharReplacer.py
LICENSE
README.md
Windows-1252.txt
WordDictObject.py
kodealltag.yml
testingNotebook.ipynb