-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented first version of the asReader interface on IString #273
Conversation
The tests are very good because many random values will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice to have progress on this! Thanks. I think this demonstrates exactly what is needed.
The reader for indented values requires rethinking without the use of the IntStream. The reader should at least be able to stream entire lines, up too and including the newline character. But larger buffers are preferable. One typical use case here is large json objects serialized as strings with indentation. So if we can cram several lines into buffers up to 8k that will surely help with server response times.
With respect to tests; I'm pretty sure the random string generator does not play with indent yet, or with deep concatenation of long strings. To test this better we could either generate some examples ourselves, or extend the default random generator for strings..
@Override | ||
public int read(char[] cbuf, int off, int len) throws IOException { | ||
int result = currentReader.read(cbuf, off, len); | ||
if (result == -1 && !readingRight) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use some comments to explain that read is allowed to return fewer chars than Len.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah okay, I didn't want to repeat the manual of read
, but indeed, at least 1 char is enough.
@@ -1380,6 +1438,72 @@ public void write(Writer w) throws IOException { | |||
assert indents.isEmpty(); | |||
} | |||
|
|||
@Override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is correct, although I see we need to work hard with the surrogate pairs. Maybe If we would always read until len - 1
unless the last character is a newline, and the int iterator always provides codepoints, there are no further special cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But before we merge we should find an implemention that can say least stream line by line. I suspect that the current experiment with the int iterator will slow down the stream to the point that it doesn't make sense to stream anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we need to do work with surrogate pairs since every int the iterator returns can be either 1 or 2 chars. I initially had some code that would write 2 chars if there was room, but that just lead to duplication of logic with the queedLowSurrogate
field. So I rewrote that to be a bit less wordy.
To make indented write easier i distributed indentedWrite methods over all implementation classes. Possibly an indentedRead method would help as well in this case. For larger buffers than 1. |
@@ -1146,6 +1163,47 @@ public void write(Writer w) throws IOException { | |||
right.write(w); | |||
} | |||
|
|||
@Override | |||
public Reader asReader() { | |||
return new Reader() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we allocate a reader at every level in the tree and every node in every level. For the writer this design didn't hold up because of the enormous amounts of concat nodes in some applications. Typically we have at least as many concat nodes as lines in the output.
So let's have a look at this and also use the benchmarks in the main method so we can fine-tune it. Typically a recursion is ok for the concat nodes and maximally log(n) allocations (a spine of reader objects that moves over the tree).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that we have to keep state, the writer was easier as it's push, while the reader is pull. I tried to do it in a similar way as the writer, but could not in a reasonable timeframe figure it out. We need co-routintes ;)
This algorithm is what we ended up with to avoid allocation of iterator or stream objects for every node in the balanced binary tree:
If we turn this into an |
@jurgenvinju the new implementation uses the exact same style as the int iterators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fast and useful. Hope we can apply it in the webservers to improve visuals and other features like salix diffs.
This implements the addition discussed in #71
I've added some tests, hope they are enough.