Error in ngram(text, n = 4) : input 'str' has nwords=3 and n=4; must have nwords >= n #10

Open
arthur0421 opened this issue Dec 28, 2022 · 1 comment


arthur0421 commented Dec 28, 2022

library(ngram)
text <- scan("ca10.txt", what = "char", sep = "\n") # ca10.txt is a file in the Brown corpus; one element per line
text <- tolower(text)
text <- gsub("[^a-z- ]", "", text, perl = TRUE) # keep only lowercase letters, hyphens and spaces
quad <- get.phrasetable(ngram(text, n = 4))

This last line throws the error in the title. I don't understand why it reports nwords=3, which is obviously untrue for the file as a whole. My guess is that one line in the file contains only three tokens. How can I work around this issue? (BTW, I work with R 3.6.3 on Linux Mint 19.3.)
ca10.txt

@heckendorfc
Collaborator

I think you're right. To bypass, you could pass text[-50] to exclude that 3-word line from your input.
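
If you would rather not hard-code the line index, here is a rough sketch of a more general filter. It assumes, as the error suggests, that ngram() checks each element of the character vector separately. The toy `text` vector and the names `words_per_line`, `short_lines` and `keep` are made up for illustration; only ngram() and get.phrasetable() come from the package.

```r
library(ngram)  # assumes the ngram package is installed

# Toy stand-in for the cleaned corpus lines (hypothetical data);
# the second element has only 3 words, like the offending line in ca10.txt.
text <- c("the jury said it was fair",
          "only three words",
          "the city executive committee met today")
n <- 4

# Count whitespace-separated tokens per line with base R.
# Empty strings split to character(0), so they count as 0 words.
words_per_line <- lengths(strsplit(trimws(text), " +"))

# Indices of the lines that are too short to contain an n-gram.
short_lines <- which(words_per_line < n)

# Keep only lines with at least n words, then build the phrase table.
keep <- words_per_line >= n
quad <- get.phrasetable(ngram(text[keep], n = n))
```

Another option, if 4-grams that span line boundaries are acceptable for your purposes, might be to collapse everything into one string first, e.g. ngram(paste(text, collapse = " "), n = 4). That changes which n-grams are counted, so it is not equivalent to the filter above.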
