Releases: oduwsdl/sumgram
sumgram-v1.0.1
Major updates:
- Support for reading text from STDIN:
$ cat path/to/collection/of/text/files/*.txt | sumgram -
- Sumgram uses an English stopwords list by default (switch off with --no-default-stopwords). To include additional stopwords, use --add-stopwords, which accepts either:
  - a list of stopwords given directly on the command line:
    $ sumgram --add-stopwords stopword1 stopword2 -t 10 path/to/collection/of/text/files/
  - a text file of stopwords (1 stopword per line):
    $ sumgram --add-stopwords my_stopwords_file.txt -t 10 path/to/collection/of/text/files/
- Extracting/processing text from URLs:
$ sumgram "http://example.com/news/article-1.html" "http://example.com/news/article-1.html"
  To change the default news article boilerplate removal method (boilerpy3.ArticleExtractor), set --boilerplate-rm-method to one of the following: 'boilerpy3.DefaultExtractor', 'boilerpy3.ArticleSentencesExtractor', 'boilerpy3.LargestContentExtractor', 'boilerpy3.CanolaExtractor', 'boilerpy3.KeepEverythingExtractor', 'boilerpy3.NumWordsRulesExtractor', or 'nltk' (regular expression for stripping all HTML tags).
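The 'nltk' option above is described as a regular expression that strips all HTML tags. As a rough illustration of that idea (not sumgram's or NLTK's actual code), a minimal Python sketch might look like the following. The helper name strip_html_tags and both patterns are assumptions made for this example.

```python
import re
from html import unescape

# Hypothetical patterns for illustration only; the expression sumgram
# actually uses for its 'nltk' method may differ.
_TAG_RE = re.compile(r"<[^>]+>")   # anything that looks like an HTML tag
_WS_RE = re.compile(r"\s+")        # runs of whitespace


def strip_html_tags(html):
    """Replace tags with spaces, decode entities, and collapse whitespace."""
    text = _TAG_RE.sub(" ", html)
    text = unescape(text)
    return _WS_RE.sub(" ", text).strip()


sample = "<div><h1>Title</h1><p>Hello &amp; welcome</p></div>"
print(strip_html_tags(sample))
```

A tag-stripping pattern like this keeps every piece of text on the page, including navigation menus and the contents of script or style elements, which is the main reason to prefer one of the boilerpy3 extractors for news articles.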
sumgram-v0.0.19
Minor update to address the issue "Unable to install numpy 1.17.0 with setup script"
sumgram-v0.0.18
Minor changes
- Added -v/--version command-line option
- Made regex the default --sentence-tokenizer
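For readers unfamiliar with regex-based sentence tokenization, the sketch below shows the general technique in Python. It is only an assumption-laden illustration: the function name regex_sentence_tokenize and the splitting pattern are hypothetical, and the pattern sumgram ships with may be different.

```python
import re
from typing import List

# Hypothetical pattern: split on whitespace that follows ., ! or ?
# This is NOT taken from sumgram's source.
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def regex_sentence_tokenize(text: str) -> List[str]:
    """Split text into sentences at whitespace after terminal punctuation."""
    return [s for s in _SENTENCE_BOUNDARY.split(text.strip()) if s]


sentences = regex_sentence_tokenize(
    "Storms hit the coast. Thousands lost power! Is help coming?"
)
for sentence in sentences:
    print(sentence)
```

A pattern this simple is fast and has no model dependencies, but it will also split after abbreviations such as "Dr." or "U.S.", so results on real text are approximate.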