Error in ngram(text, n = 4) : input 'str' has nwords=3 and n=4; must have nwords >= n #10

arthur0421 · 2022-12-28T09:26:02Z

library(ngram)
text <- scan("ca10.txt", what = "char", sep = "\n") # ca10.txt is a file in the Brown corpus
text <- tolower(text)
text <- gsub("[^a-z- ]", "", text, perl = TRUE) # keep only lowercase letters, hyphens and spaces
quad <- get.phrasetable(ngram(text, n = 4))

This last line fails with the error in the title. I don't understand why it says nwords=3, which is obviously untrue for the file as a whole. Is it because one line in the file contains only three tokens? How can I work around this? (BTW, I'm using R 3.6.3 on Linux Mint 19.3.)
ca10.txt

heckendorfc · 2022-12-28T20:12:57Z

I think you're right. As a workaround, you could pass text[-50] to exclude that 3-word line from your input.
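
If more than one line might be too short, hard-coding the index gets fragile. Here is a minimal sketch of a more general filter, assuming ngram() checks every element of the input vector separately (which is what the error suggests). The toy `text` vector and the names `words_per_line`, `too_short` and `text_ok` are made up for illustration; the counting is done in base R and may differ slightly from how ngram itself tokenizes.

```r
# Sketch: drop every line with fewer than n words before calling ngram().
n <- 4

# Toy input standing in for the cleaned lines of ca10.txt (hypothetical data).
text <- c("the quick brown fox jumps", "only three words", "", "a b c d")

# Count space-separated tokens per line with base R.
words_per_line <- lengths(strsplit(trimws(text), " +"))

# Indices of the lines that would trigger the error (useful for inspection).
too_short <- which(words_per_line < n)

# Keep only lines long enough to yield at least one n-gram.
text_ok <- text[words_per_line >= n]

# Only call into the package if it is installed.
if (requireNamespace("ngram", quietly = TRUE)) {
  quad <- ngram::get.phrasetable(ngram::ngram(text_ok, n = n))
  print(quad)
}
```

With the real file you would skip the toy `text` assignment and run the filter on the vector produced by scan(), tolower() and gsub(). Note that this drops the short lines entirely, so any 4-grams that would span a line break are not counted either way.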
