ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create #1662
Thank you for making a PR, @yebukong.
@@ -770,18 +770,18 @@ void readRanges(FSDataInputStream file, boolean allocateDirect, double extraByte
     }

     static ChunkReader create(BufferChunk from, BufferChunk to) {
-      long f = Integer.MAX_VALUE;
-      long e = Integer.MIN_VALUE;
+      long f = Long.MAX_VALUE;
Recommendation: This code already existed, so please feel free to ignore this.
Since we are touching the code, can we rename the variables for easier reading?
I couldn't understand the meaning of f, e, cf, ce 😢
Great, let me try.
Actually, I disagree with @mystic-lama's comment.
We don't want to mix a bug fix and a style fix (or refactoring) in the same PR, because it increases the review complexity.
If we want to rename these variables, it should be done before or after this PR.
Please don't change the existing variable names in this PR.
Ack, @dongjoon-hyun has a point. @yebukong, we can do that in a follow-up PR or maybe leave it as is :)
Apologies for the noise.
Thank you for reporting a bug and making a PR, @yebukong.
-      reqBytes += ef - cf;
-      return new ChunkReader(from, to, (int) (e - f), reqBytes);
+      reqBytes += currentEnd - currentStart;
+      if (reqBytes >= Integer.MAX_VALUE) {
/**
 * The maximum size of array to allocate, value being the same as {@link java.util.Hashtable},
 * given the fact that the stripe size could be increased to larger value by configuring "orc.stripe.size".
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
I think it might be more accurate to extract and reuse this `MAX_ARRAY_SIZE` constant; the Java maximum array length is not actually `Integer.MAX_VALUE`.
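To illustrate the suggestion, here is a minimal sketch of a shared limit plus a narrowing check. The class name `ArraySizeGuard` and the method `checkedArraySize` are hypothetical and not part of ORC; only the value `Integer.MAX_VALUE - 8` comes from the snippet quoted above.

```java
public final class ArraySizeGuard {
  // Hypothetical shared constant, using the value quoted above
  // (Integer.MAX_VALUE - 8, the same limit java.util.Hashtable uses).
  public static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  private ArraySizeGuard() {}

  /**
   * Narrows a byte count held in a long to an int that is safe to use as an
   * array length, or throws if it is negative or above MAX_ARRAY_SIZE.
   */
  public static int checkedArraySize(long requestedBytes) {
    if (requestedBytes < 0 || requestedBytes > MAX_ARRAY_SIZE) {
      throw new IllegalArgumentException(
          "Requested " + requestedBytes + " bytes, limit is " + MAX_ARRAY_SIZE);
    }
    return (int) requestedBytes;
  }

  public static void main(String[] args) {
    // A small request passes through unchanged.
    System.out.println(checkedArraySize(8192L));
    // Integer.MAX_VALUE itself is above the limit and is rejected.
    try {
      checkedArraySize(Integer.MAX_VALUE);
    } catch (IllegalArgumentException ex) {
      System.out.println("rejected: " + ex.getMessage());
    }
  }
}
```

Checking against `MAX_ARRAY_SIZE` rather than `Integer.MAX_VALUE` rejects the few sizes that fit in an `int` but that some JVMs still cannot allocate as a single array.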
Should I define a `MAX_ARRAY_SIZE` constant in `RecordReaderUtils`? I noticed it is marked as `private` in `StringHashTableDictionary`.
I tend to extract it into a utility class (e.g. IOUtils) and reuse the same constant.
I think we should remove `MAX_ARRAY_SIZE` from `StringHashTableDictionary.java` in this PR; the same concept should be kept in a single source (IOUtils).
I agree "the same concept should be kept in a single source".
Should we include the changes to remove the `MAX_ARRAY_SIZE` definition from `StringHashTableDictionary.java` in this PR?
@guiyanakuang Considering that this PR has been closed, is it still necessary to continue the discussion on the issues mentioned above?
I've already made a renaming PR, #1664. @dongjoon-hyun PTAL.
To @yebukong, I replied on your new PR about why we cannot proceed with that renaming PR first.
I have pushed the latest code modifications. @dongjoon-hyun @guiyanakuang @mystic-lama PTAL.
@@ -30,6 +30,11 @@ public final class IOUtils {

   public static final int DEFAULT_BUFFER_SIZE = 8192;

+  /**
+   * The maximum size of array to allocate, value being the same as {@link java.util.Hashtable}
Can we add information about what purpose this maximum array size serves? Please add it only if the specific context is known; otherwise we can leave it as is.
Because the comment for `java.util.Hashtable#MAX_ARRAY_SIZE` is detailed, I did not add excessive description.
+1 Changes look good to me.
+1, LGTM.
…Reader#create

- I have adjusted the calculation of `reqBytes` and `readBytes` ranges in the `org.apache.orc.impl.RecordReaderUtils.ChunkReader#create` method from `Integer` to `Long`, and added validation for `Integer` overflow
- Adds more test cases

This PR aims to fix [ORC-1528](https://issues.apache.org/jira/browse/ORC-1528)

org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt

Closes #1662 from yebukong/main.

Lead-authored-by: yebukong <[email protected]>
Co-authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e554765)
Signed-off-by: Dongjoon Hyun <[email protected]>
Merged to main/1.9/1.8.

### What changes were proposed in this pull request?
rename variables in RecordReaderUtils.ChunkReader#create

### Why are the changes needed?
for easy reading see [PR 1662](#1662 (comment))

### How was this patch tested?

Closes #1664 from yebukong/ORC-1528_rename_variables.

Authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

You're welcome, it's my pleasure. @dongjoon-hyun
What changes were proposed in this pull request?
- I have adjusted the calculation of `reqBytes` and `readBytes` ranges in the `org.apache.orc.impl.RecordReaderUtils.ChunkReader#create` method from `Integer` to `Long`, and added validation for `Integer` overflow
- Adds more test cases
Why are the changes needed?
This PR aims to fix ORC-1528
How was this patch tested?
org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt
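
The following is a minimal sketch of why the sentinel change matters. It is not the ORC implementation: the class `SentinelOverflowDemo` and its methods are hypothetical, and the offsets are made-up values chosen to lie past `Integer.MAX_VALUE`. It assumes the start of a range is found by taking a running minimum seeded with a sentinel, as the diff in this PR suggests.

```java
public final class SentinelOverflowDemo {
  private SentinelOverflowDemo() {}

  /** Smallest start offset, computed as a running minimum seeded with the sentinel. */
  public static long minOffset(long sentinel, long[] offsets) {
    long first = sentinel;
    for (long offset : offsets) {
      first = Math.min(first, offset);
    }
    return first;
  }

  /** Validates that a long byte count fits in an int before narrowing it. */
  public static int toIntChecked(long bytes) {
    if (bytes < 0 || bytes >= Integer.MAX_VALUE) {
      throw new IllegalArgumentException("Byte count out of int range: " + bytes);
    }
    return (int) bytes;
  }

  public static void main(String[] args) {
    // Two hypothetical chunks located past the 2 GiB mark of a large file.
    long[] offsets = {3_000_000_000L, 3_000_001_000L};
    long end = 3_000_002_000L;

    // Seeded with Integer.MAX_VALUE, the minimum never reaches the real offsets.
    long wrongStart = minOffset(Integer.MAX_VALUE, offsets);
    // Seeded with Long.MAX_VALUE, the minimum is the true first offset.
    long rightStart = minOffset(Long.MAX_VALUE, offsets);

    System.out.println("span with int sentinel:  " + (end - wrongStart));
    System.out.println("span with long sentinel: " + toIntChecked(end - rightStart));
  }
}
```

With the `Integer.MAX_VALUE` seed the computed span is far larger than the 2000 bytes the chunks actually cover, because every real offset is greater than the seed. The explicit range check before the `(int)` cast turns any remaining oversized value into an error instead of a silently truncated length.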