Splitting code base in multiple packages #228
What is the goal? If all you want to do is load tfidf, you can do that (see https://github.com/NaturalNode/natural/blob/master/lib/natural/index.js). If the goal is to avoid downloading everything, that would probably be pretty tricky short of moving all the code into separate npm packages, which doesn't really seem great to me.
@kkoch986 that is exactly what I was thinking. I was thinking about writing up an interactive tutorial in the browser and I felt downloading the whole library is overkill. I am not sure about this being a great idea, yet. Thanks for considering this.
No problem, so you are using browserify? I'm really interested in an online interactive tutorial; actually @chrisumbel and I have been working on plans for naturalnode.com and I think that could be a pretty cool feature. I'm not super familiar with how browserify and friends work, but I imagine you could download the package locally and just include the things you need directly. i.e. rather than include let me know if that makes any sense or if you have any ideas on how it would work better.
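A minimal sketch of "include the things you need directly", for illustration only. The deep path `natural/lib/natural/tfidf/tfidf` is an assumption based on the repository layout linked above; it is not a documented entry point, and it may differ or be missing depending on the installed version. The helper name `loadTfIdfOnly` is hypothetical.

```javascript
// Load only the tf-idf module instead of the package root.
// ASSUMPTION: 'natural/lib/natural/tfidf/tfidf' mirrors the repository
// layout; it is not a documented entry point and may change between versions.
function loadTfIdfOnly() {
  try {
    // A literal string keeps the require statically analyzable for bundlers.
    return require('natural/lib/natural/tfidf/tfidf');
  } catch (err) {
    // Swallow only "module not found"; rethrow anything else.
    if (err && err.code === 'MODULE_NOT_FOUND') return null;
    throw err;
  }
}

const TfIdfOnly = loadTfIdfOnly();
console.log(TfIdfOnly === null
  ? 'deep path not available in this install'
  : 'loaded tf-idf without requiring the package root');
```

The require string is kept literal on purpose: bundlers such as browserify generally cannot follow a require whose argument is computed at runtime, so a variable path would not end up in the bundle.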
Ok, a quick fix for now could be I think in browserify, you still want to do Please do ping me for naturalnode.com
I would love to see this modularized for similar reasons as @nicola. Doing interactive NLP in browsers opens up a lot of options that you just can't do on the server. For example, you can interactively train Bayes classifiers on a subset of text to build intuition for a project you're working on (my current need). Or build a linguistic and machine learning teaching tool for a wider audience. I know there are already GH issues for running this in the browser (#253, #25, #189) with some Browserify hacks, but this is another gentle lobby for reducing the barrier to getting it running in a browser by modularizing. To start though, I tried the suggestion of
// Doesn't work: Cannot find module 'natural/BayesClassifier'
// var BayesClassifier = require('natural/BayesClassifier')
// var classifier = new BayesClassifier()
// Doesn't work: Cannot find module 'natural/tfidf'
// var tfidf = require('natural/tfidf')
// Works fine
var natural = require('natural')
var classifier = new natural.BayesClassifier()
classifier.addDocument('i am long qqqq', 'buy')
classifier.addDocument('buy the q\'s', 'buy')
classifier.addDocument('short gold', 'sell')
classifier.addDocument('sell gold', 'sell')
classifier.train()
console.log('classifier: ', classifier.classify('i am short silver'))
console.log('classifier: ', classifier.classify('i am long copper'))
// output: classifier: sell
// output: classifier: buy
I traced the exports from here https://github.com/NaturalNode/natural/blob/master/lib/natural/index.js As far as full-fledged modularizing goes, there are some popular examples in the wild to look at. D3 did it by splitting it up into their own separate repos (as @kkoch986 suggested): https://github.com/d3 Lodash is all in one repo: https://github.com/lodash/lodash/ This is really nicely done. This allows you to do:
var merge = require('lodash/merge')
// or in ES6:
import {merge} from 'lodash'
// Now you can use merge instead of _.merge if that's all you want in your project
var ab = merge({a:1},{b:2})
I think the ideal browser setup would be to be able to import just the component you need, similar to the Unfortunately, at the moment I have almost no bandwidth to contribute to this effort. This is just a suggestion and I realize this is a ton of work. But I do think getting it working well in a browser and doing a couple of browser-based demos would give it a lot of visibility. Either way, it's still a great project. Thank you.
For those coming to this who want a workaround using webpack, I have it working based off these threads: http://stackoverflow.com/a/27275791
var webpack = require('webpack')
var ignore = new webpack.IgnorePlugin(new RegExp('^(lapack|WNdb)$'))
module.exports = {
// lots of other options go here
plugins: [ignore],
node: {
fs: "empty" // loads an empty 'fs' module. Should work as long as you don't call functions that depend on 'fs'
}
}
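A small self-contained check of what that ignore pattern covers, as a sketch rather than a statement about webpack internals. It assumes the plugin tests the regular expression against the request string as written in `require(...)`; the sample requests are made up for illustration. Note also that the config above uses an older webpack API: webpack 5 expects `new webpack.IgnorePlugin({ resourceRegExp: ... })` and replaces `node: { fs: "empty" }` with `resolve: { fallback: { fs: false } }`.

```javascript
// Same pattern as in the config above, written as a regex literal.
const ignorePattern = /^(lapack|WNdb)$/;

// Hypothetical request strings, as they might appear inside require(...).
const requests = ['lapack', 'WNdb', 'WNdb/dict', './lapack', 'natural'];

// Because of the ^ and $ anchors, only exact matches are ignored.
const ignored = requests.filter((request) => ignorePattern.test(request));
console.log(ignored); // [ 'lapack', 'WNdb' ]
```

Because the pattern is anchored at both ends, a request such as `WNdb/dict` would not be ignored; widening the pattern would be needed if the library required sub-paths of those packages.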
@jefffriesen all interesting points, I'm also interested in the language stuff you mentioned (I mentioned it briefly in #159). I don't think splitting it into different repos is the best option, but I'd definitely be open to moving towards the lodash approach. I unfortunately don't have much bandwidth at the moment either. I still have a somewhat big project to wrap up which might make the browserify thing a little less simple; I think the basic problem is that as we get into more complicated algorithms, external files become more and more important. As a matter of fact, the code I'm working on now will introduce a corpus-type interface (think nltk http://www.nltk.org/data.html). So I think maybe we need to plan a way to divide the browser-compatible code from the code that is not, but I don't think deprecating/removing will be it. I would be happy to use this issue to figure out a solid plan for reorganizing things to make it more browser-friendly while keeping it flexible and continuing to move towards getting more algorithms implemented. For starters, I think I'll merge #159 into this issue; I think both problems can be solved at once.
Makes sense to merge #159 into this thread. Just a clarification:
Do you mean external large files become more and more important? That is definitely true. I think external files are important for both server and browser environments though. (of course the browser can only handle smaller files). I'm not just talking about file uploads through the UI (although that is possible for public tools). I'm loading CSVs as the app starts up. The value that I'm seeing by doing this is that I can build interactive tools and visualizations to get a feel for the data and the algorithms. Once I'm confident in the approach, then I can point it to really big sets of files server-side. I think there are two problems to solve for the browser:
I successfully loaded natural into webpack to use in the front-end. I just copied index.js and commented out the exports that require 'fs'. Here is the code. Hopefully it helps others with this.
Separate packages would be nice since it would mean that we also don’t need to download giant e.g.
If the goal is better tree-shaking this is also a good alternative #608 (it won't help the npm package size but at least we can have smaller bundles in the browser/lambda)
Some time ago I created an
I think this addresses the issue.
This is a few years after the fact, but it would be nice if the above information was in the documentation. I'm just using natural for the distance calculators right now and only importing those saves A LOT of space.
I see what you mean. I will update the documentation for the use of separate modules.
Awesome, thank you! Also a side note: Using this technique to bring the distance measurement functions in to a SvelteKit app hosted on Vercel. Built using Vite. Thank you for all your hard work!
I think this could be useful, so that one could just import tfidf if that's the only bit needed. In other words, modularization the node way.