Fix #26 (modernize) #29

Open
wants to merge 9 commits into
base: develop
25 changes: 25 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,25 @@
+on: [push, pull_request]
+name: Test
+jobs:
+  test:
+    strategy:
+      matrix:
+        go-version: [1.16.x, 1.18.x, 1.20.x]
+        os: [ubuntu-latest, macos-latest, windows-latest]
+    runs-on: ${{ matrix.os }}
+    steps:
+      - name: Setup
+        uses: actions/setup-go@v3
+        with:
+          go-version: ${{ matrix.go-version }}
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Test
+        run: go test ./...
+      - name: Update Coverage
+        uses: ncruces/go-coverage-report@main
+        if: |
+          matrix.os == 'ubuntu-latest' &&
+          matrix.go-version == '1.20.x' &&
+          github.event_name == 'push'
+        continue-on-error: true
54 changes: 0 additions & 54 deletions Godeps/Godeps.json

This file was deleted.

5 changes: 0 additions & 5 deletions Godeps/Readme

This file was deleted.

14 changes: 8 additions & 6 deletions README.md
@@ -3,8 +3,10 @@
 Unfortunately it is not possible for me to continue maintaining this library at the moment.
 Please feel free to make pull requests and I will do my best to merge them.

-[![wercker status](https://app.wercker.com/status/9e2a695f35c1cf5e1cac46035e4ca7a6/s/master "wercker status")](https://app.wercker.com/project/byKey/9e2a695f35c1cf5e1cac46035e4ca7a6)
-[![Coverage Status](https://img.shields.io/coveralls/chrisport/go-lang-detector.svg)](https://coveralls.io/r/chrisport/go-lang-detector?branch=master)
+[![Go Reference](https://pkg.go.dev/badge/github.com/chrisport/go-lang-detector.svg)](https://pkg.go.dev/github.com/chrisport/go-lang-detector)
+[![BuildStatus](https://github.com/chrisport/go-lang-detector/actions/workflows/test.yml/badge.svg)](https://github.com/chrisport/go-lang-detector/actions/workflows/test.yml)
+[![Coverage](https://github.com/chrisport/go-lang-detector/wiki/coverage.svg)](https://raw.githack.com/wiki/chrisport/go-lang-detector/coverage.html)
+[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

 Breaking changes in v0.2: please see chapter "Migration" below.
 Previous version is available under Release v0.1: https://github.com/chrisport/go-lang-detector/releases/tag/v0.1
@@ -13,10 +15,10 @@ Previous version is available under Release v0.1: https://github.com/chrisport/g

 This golang library provides functionality to analyze and recognize language based on text.

-The implementation is based on the following paper:
-N-Gram-Based Text Categorization
-William B. Cavnar and John M. Trenkle
-Environmental Research Institute of Michigan P.O. Box 134001
+The implementation is based on the following paper:
+N-Gram-Based Text Categorization
+William B. Cavnar and John M. Trenkle
+Environmental Research Institute of Michigan P.O. Box 134001
 Ann Arbor MI 48113-4001

 ### Detection by Language profile
11 changes: 11 additions & 0 deletions go.mod
@@ -0,0 +1,11 @@
+module github.com/chrisport/go-lang-detector
+
+go 1.20
+
+require github.com/smartystreets/goconvey v1.7.2
+
+require (
+	github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1 // indirect
+	github.com/jtolds/gls v4.20.0+incompatible // indirect
+	github.com/smartystreets/assertions v1.2.0 // indirect
+)
13 changes: 13 additions & 0 deletions go.sum
@@ -0,0 +1,13 @@
+github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1 h1:EGx4pi6eqNxGaHF6qqu48+N2wcFQ5qg5FXgOdqsJ5d8=
+github.com/gopherjs/gopherjs v0.0.0-20181017120253-0766667cb4d1/go.mod h1:wJfORRmW1u3UXTncJ5qlYoELFm8eSnnEO6hX4iZ3EWY=
+github.com/jtolds/gls v4.20.0+incompatible h1:xdiiI2gbIgH/gLH7ADydsJ1uDOEzR8yvV7C0MuV77Wo=
+github.com/jtolds/gls v4.20.0+incompatible/go.mod h1:QJZ7F/aHp+rZTRtaJ1ow/lLfFfVYBRgL+9YlvaHOwJU=
+github.com/smartystreets/assertions v1.2.0 h1:42S6lae5dvLc7BrLu/0ugRtcFVjoJNMC/N3yZFZkDFs=
+github.com/smartystreets/assertions v1.2.0/go.mod h1:tcbTF8ujkAEcZ8TElKY+i30BzYlVhC/LOxJk7iOWnoo=
+github.com/smartystreets/goconvey v1.7.2 h1:9RBaZCeXEQ3UselpuwUQHltGVXvdwm6cv1hgR6gDIPg=
+github.com/smartystreets/goconvey v1.7.2/go.mod h1:Vw0tHAZW6lzCRk3xgdin6fKYcG+G3Pg9vgXWeJpQFMM=
+golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
+golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
+golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY=
+golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
+golang.org/x/tools v0.0.0-20190328211700-ab21143f2384/go.mod h1:LCzVGOaR6xXOjkQ3onu1FJEFr0SW1gC7cKk1uF8kGRs=
72 changes: 37 additions & 35 deletions langdet/analyzing.go
@@ -1,6 +1,7 @@
 package langdet

 import (
+	"bufio"
 	"bytes"
 	"sort"
 	"strings"
@@ -47,11 +48,13 @@ func CreateRankLookupMap(input map[string]int) map[string]int {
 // CreateOccurenceMap creates a map[token]occurrence from a given text and up to a given gram depth
 // gramDepth=1 means only 1-letter tokens are created, gramDepth=2 means 1- and 2-letters token are created, etc.
 func CreateOccurenceMap(text string, gramDepth int) map[string]int {
-	text = cleanText(text)
-	tokens := strings.Split(text, " ")
+	text = cleaner.Replace(text)
+	scanner := bufio.NewScanner(strings.NewReader(text))
+	scanner.Split(bufio.ScanWords)
+
 	result := make(map[string]int)
-	for _, token := range tokens {
-		analyseToken(result, token, gramDepth)
+	for scanner.Scan() {
+		analyseToken(result, scanner.Text(), gramDepth)
 	}
 	return result
 }
@@ -73,7 +76,7 @@ func generateNthGrams(resultMap map[string]int, text string, n int) {
 	text = padding + text + padding
 	upperBound := utf8.RuneCountInString(text) - (n - 1)
 	for p := 0; p < upperBound; p++ {
-		currentToken := text[p: p+n]
+		currentToken := text[p : p+n]
 		resultMap[currentToken] += 1
 	}
 }
@@ -88,33 +91,32 @@ func createPadding(length int) string {
 	return buffer.String()
 }

-// cleanText removes newlines, special characters and numbers from a input text
-func cleanText(text string) string {
-	text = strings.Replace(text, "\n", " ", -1)
-	text = strings.Replace(text, ",", " ", -1)
-	text = strings.Replace(text, "#", " ", -1)
-	text = strings.Replace(text, "/", " ", -1)
-	text = strings.Replace(text, "\\", " ", -1)
-	text = strings.Replace(text, ".", " ", -1)
-	text = strings.Replace(text, "!", " ", -1)
-	text = strings.Replace(text, "?", " ", -1)
-	text = strings.Replace(text, ":", " ", -1)
-	text = strings.Replace(text, ";", " ", -1)
-	text = strings.Replace(text, "-", " ", -1)
-	text = strings.Replace(text, "'", " ", -1)
-	text = strings.Replace(text, "\"", " ", -1)
-	text = strings.Replace(text, "_", " ", -1)
-	text = strings.Replace(text, "*", " ", -1)
-	text = strings.Replace(text, "1", "", -1)
-	text = strings.Replace(text, "2", "", -1)
-	text = strings.Replace(text, "3", "", -1)
-	text = strings.Replace(text, "4", "", -1)
-	text = strings.Replace(text, "5", "", -1)
-	text = strings.Replace(text, "6", "", -1)
-	text = strings.Replace(text, "7", "", -1)
-	text = strings.Replace(text, "8", "", -1)
-	text = strings.Replace(text, "9", "", -1)
-	text = strings.Replace(text, "0", "", -1)
-	text = strings.Replace(text, "  ", " ", -1)
-	return text
-}
+// cleaner removes newlines, special characters and numbers from an input text
+var cleaner = strings.NewReplacer(
+	"\n", " ",
+	",", " ",
+	"#", " ",
+	"/", " ",
+	`\`, " ",
+	".", " ",
+	"!", " ",
+	"?", " ",
+	":", " ",
+	";", " ",
+	"-", " ",
+	"'", " ",
+	`"`, " ",
+	"_", " ",
+	"*", " ",
+	"1", "",
+	"2", "",
+	"3", "",
+	"4", "",
+	"5", "",
+	"6", "",
+	"7", "",
+	"8", "",
+	"9", "",
+	"0", "",
+	"  ", " ",
+)
235 changes: 3 additions & 232 deletions langdet/internal/default_languages.go

Large diffs are not rendered by default.

File renamed without changes.
17 changes: 7 additions & 10 deletions langdet/langdetdef/languages.go
@@ -1,23 +1,20 @@
 package langdetdef

 import (
+	"encoding/json"
+	"fmt"
 	"unicode"

 	"github.com/chrisport/go-lang-detector/langdet"
 	"github.com/chrisport/go-lang-detector/langdet/internal"
-	"log"
-	"encoding/json"
 )

 func init() {
-	def, err := internal.Asset("default_languages.json")
-	if err != nil {
-		log.Println("Could not initialize default languages")
-	}
-
 	lan := []langdet.Language{}

-	//TODO handle error case?
-	_ = json.Unmarshal(def, &lan)
+	if err := json.Unmarshal(internal.DefaultLanguageDefs, &lan); err != nil {
+		panic(fmt.Sprintf("unable to initialize default languages - corrupt embedded asset: %v", err))
+	}

 	for i := range lan {
 		switch lan[i].Name {
@@ -58,7 +55,7 @@ func DefaultLanguages() []langdet.LanguageComparator {
 func NewWithDefaultLanguages() langdet.Detector {
 	return langdet.Detector{Languages: DefaultLanguages(),
 		MinimumConfidence: langdet.DefaultMinimumConfidence,
-		NDepth: langdet.DEFAULT_NDEPTH}
+		NDepth:            langdet.DEFAULT_NDEPTH}
 }

 var defaultLanguages = make(map[string]langdet.LanguageComparator)
2 changes: 1 addition & 1 deletion langdet/models.go
@@ -55,7 +55,7 @@ type DetectionResult struct {
 	Confidence int
 }

-//ResByConf represents an array of DetectionResult and can be sorted by Confidence.
+// ResByConf represents an array of DetectionResult and can be sorted by Confidence.
 type ResByConf []DetectionResult

 func (a ResByConf) Len() int { return len(a) }
7 changes: 0 additions & 7 deletions makefile

This file was deleted.

24 changes: 0 additions & 24 deletions vendor/github.com/gopherjs/gopherjs/LICENSE

This file was deleted.
