This repository has been archived by the owner on Nov 10, 2017. It is now read-only.

stdout maxBuffer exceeded #23

Open
luckydonald opened this issue Oct 12, 2016 · 8 comments

Comments

@luckydonald

luckydonald commented Oct 12, 2016

stdout maxBuffer exceeded
Sorry to bother you,
but I have no idea if that is coming from the proxy or the Readability.js lib,
or if I can raise that buffer limit somehow.
The website I tried: http://www.equestriadaily.com/2016/10/music-intersekt-twilight-says-bass-house.html

What can I do?

@n1k0
Owner

n1k0 commented Oct 12, 2016

We need to allow specifying a maxBuffer option to execFile.

Would you want to work on a patch?
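
For readers who have not used Node before, here is a minimal sketch of what passing that option might look like. Node's `child_process.execFile` accepts a `maxBuffer` option that caps how many bytes of stdout or stderr are buffered; when the cap is exceeded, the child is terminated and the callback receives an error. The helper name `runCommand` is hypothetical and is not code from readable-proxy.

```javascript
const { execFile } = require('child_process');

// Hypothetical helper: run a command and collect its stdout, with a
// caller-supplied cap on how many bytes may be buffered.
function runCommand(command, args, maxBuffer) {
  return new Promise(function (resolve, reject) {
    execFile(command, args, { maxBuffer: maxBuffer }, function (err, stdout) {
      if (err) {
        // When the cap is hit, Node terminates the child and reports an
        // error (the exact message and code vary between Node versions).
        return reject(err);
      }
      resolve(stdout);
    });
  });
}
```

As a usage example, `runCommand(process.execPath, ['-e', 'console.log("x".repeat(1e6))'], 1024)` should reject because the child prints about 1 MB, while the same call with `5 * 1024 * 1024` should resolve with the output.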

@luckydonald
Author

luckydonald commented Oct 19, 2016

I am actually a python guy and have never touched node before.

So the solution would be to just raise that value?
Is there a way to have that set dynamically or even remove any limit?

From searching, I figured you may mean this line: scrape.js:19

Why do we need to spawn a child process in the first place?

@luckydonald
Author

luckydonald commented Oct 19, 2016

The question still stands: Why do we need to spawn a child process in the first place?

Edit:

  • figured it opens phantom-scrape.js
    • Why can't we import that file traditionally (require)?
  • phantom-scrape.js:
    • gets the url from system.args[1]
    • readabilityPath (the firefox js) is system.args[2]
    • and a user agent at system.args[3]
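
The argument order listed above can be sketched from the Node side as follows. The function name `buildPhantomArgs` and the example values are hypothetical; only the positions (`system.args[1]` to `system.args[3]`) come from the list. In PhantomJS, `system.args[0]` is the script name, which is why the script path goes first.

```javascript
// Hypothetical helper: build the argument list for
//   phantomjs <script> <url> <readabilityPath> <userAgent>
// so that inside the script system.args[1..3] line up as listed above.
function buildPhantomArgs(scriptPath, url, readabilityPath, userAgent) {
  return [scriptPath, url, readabilityPath, userAgent];
}

const args = buildPhantomArgs(
  'phantom-scrape.js',
  'http://example.com/',
  '/app/lib/Readability.js',
  'Mozilla/5.0 (illustrative user agent)'
);
console.log(args[1]); // the URL, seen by the script as system.args[1]
```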

@luckydonald
Author

luckydonald commented Oct 19, 2016

To give a bit of context, I am trying to run it in a Docker container to use the API.

This is the Dockerfile:

FROM node:latest


RUN apt-get update && apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN mkdir -p /app/proxy
RUN mkdir -p /app/lib
WORKDIR /app/proxy


ENV READABILITY_LIB_PATH /app/lib/Readability.js
ENV PORT 80
EXPOSE 80

RUN git clone https://github.com/n1k0/readable-proxy /app/proxy
RUN git clone https://github.com/mozilla/readability /app/lib

RUN npm install

CMD ["npm", "start"]

Is the subprocess stuff a leftover from this initially being a CLI application, which was never changed to be importable by other projects?

@luckydonald
Author

luckydonald commented Oct 24, 2016

Maybe @n1k0, can you tell me,
why do we need to spawn a child process in the first place?
Is that because it was a CLI app before, so kinda legacy code?
Can we maybe just import (include) it directly instead of calling it via shell?
See complete question above
Kinda looking forward to getting this API working :D

@n1k0
Owner

n1k0 commented Oct 24, 2016

why do we need to spawn a child process in the first place?

Because we need to run a phantomjs script, which isn't based on node but on QtWebKit, and therefore can't share the same JS execution runtime & event loop as the CLI node script.
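
As a rough illustration of that separation, the sketch below uses a second Node process as a stand-in for PhantomJS (an assumption made only so the example runs without PhantomJS installed). A child process has its own runtime, so it sees nothing the parent defined. The only result the parent gets back is what the child writes to stdout, which is why the stdout buffer limit matters here.

```javascript
const { execFileSync } = require('child_process');

// A value that exists only in the parent's runtime.
global.parentOnly = 'defined in parent';

// Stand-in child: a separate Node process instead of PhantomJS.
const childReport = execFileSync(
  process.execPath,
  ['-e', 'console.log(typeof global.parentOnly)'],
  { encoding: 'utf8' }
).trim();

console.log(childReport); // prints "undefined": the child never saw parentOnly
```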

@luckydonald
Author

luckydonald commented Oct 24, 2016

The readable HTML should be smaller than the original page?
So we could just use the length of the website + 1024 as maxBuffer?
Would that work?

@luckydonald
Author

So, something like this. I have no idea how to put that into the program, because I have never worked with that async approach...

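var http = require('http');

// Resolves with the total byte length of the page at `url` (defined elsewhere).
var length = new Promise(function(fulfill, reject) {
  http.get(url, function(res) {
    var total = 0;
    // 'data' fires once per chunk, so sum the chunk sizes and
    // fulfill only when the whole response has arrived.
    res.on('data', function(d) {
      total += d.length;
    });
    res.on('end', function() {
      fulfill(total);
    });
  }).on('error', reject);
});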
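
One way the measured size could be fed into the child process call is sketched below. It rests on the assumption made in the comment above, that the readable output is no larger than the original page plus some slack; that assumption is not verified here and may not hold if the output is wrapped in extra data. The names `maxBufferFor` and `scrapeWithSizedBuffer` are hypothetical and do not come from readable-proxy.

```javascript
const { execFile } = require('child_process');

const SLACK_BYTES = 1024; // margin suggested in the comment above

// Hypothetical helper: derive a maxBuffer value from the page size in bytes.
function maxBufferFor(pageBytes) {
  return pageBytes + SLACK_BYTES;
}

// Hypothetical helper: wait for the page size, then run the command with a
// buffer sized to match. `lengthPromise` resolves to a byte count.
function scrapeWithSizedBuffer(lengthPromise, command, args) {
  return lengthPromise.then(function (pageBytes) {
    return new Promise(function (resolve, reject) {
      const options = { maxBuffer: maxBufferFor(pageBytes) };
      execFile(command, args, options, function (err, stdout) {
        if (err) return reject(err);
        resolve(stdout);
      });
    });
  });
}
```

The trade-off is an extra HTTP request per page just to learn its size, so a fixed but configurable limit may be the simpler choice.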

manabu added a commit to manabu/readable-proxy that referenced this issue Feb 21, 2017
default maxBuffer is 200K byte
MAX_BUFFER=5242880 npm start
take 5MB for maxBuffer

related to n1k0#23
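
A minimal sketch of how a `MAX_BUFFER` environment variable like the one in the commit message above might be read. The 200 KB fallback mirrors the default stated in that message; `resolveMaxBuffer` is a hypothetical name and this is not necessarily how the referenced commit implements it.

```javascript
const DEFAULT_MAX_BUFFER = 200 * 1024; // 200 KB, the default mentioned above

// Hypothetical helper: read MAX_BUFFER from an env-like object, falling back
// to the default when the variable is missing or not a positive integer.
function resolveMaxBuffer(env) {
  const parsed = parseInt(env.MAX_BUFFER, 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : DEFAULT_MAX_BUFFER;
}

console.log(resolveMaxBuffer({ MAX_BUFFER: '5242880' })); // 5242880 (5 MB)
console.log(resolveMaxBuffer({}));                        // 204800
```

In a server, the result would be passed as `resolveMaxBuffer(process.env)` into the `maxBuffer` option of `execFile`.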