Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create #1662

Closed
wants to merge 8 commits into from

Conversation

yebukong
Copy link
Contributor

@yebukong yebukong commented Nov 24, 2023

What changes were proposed in this pull request?

  • I have adjusted the calculation of reqBytes and readBytes ranges in the org.apache.orc.impl.RecordReaderUtils.ChunkReader#create method from Integer to Long, and added validation for Integer overflow

  • Adds more test cases

Why are the changes needed?

This PR aims to fix ORC-1528

How was this patch tested?

org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt

@github-actions github-actions bot added the JAVA label Nov 24, 2023
@guiyanakuang
Copy link
Member

Thank you for making a PR, @yebukong

@yebukong yebukong changed the title [ORC-1528]Fix readBytes potential overflow in RecordReaderUtils#create ORC-1528: Fix readBytes potential overflow in RecordReaderUtils#create Nov 24, 2023
@@ -770,18 +770,18 @@ void readRanges(FSDataInputStream file, boolean allocateDirect, double extraByte
}

static ChunkReader create(BufferChunk from, BufferChunk to) {
long f = Integer.MAX_VALUE;
long e = Integer.MIN_VALUE;
long f = Long.MAX_VALUE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommendation: This is already existing so please feel free to ignore.
Since we are touching the code, Can we rename the variables for easy reading?
Couldn't understand the meaning of f, e, cf, ce 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, let me try.

Copy link
Member

@dongjoon-hyun dongjoon-hyun Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm opposite to @mystic-lama 's comment.

We don't want to mix a bug fix and a style fix (or refactorying) in the same PR. It's because it increases the review complexity indeed.

If we want to rename this, it should be done before or after this PR.

Please don't change the existing variable name in this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, @dongjoon-hyun has a point. @yebukong we can do that in follow up PR or may be leave it as is :)
Apologies for the noise.

@dongjoon-hyun dongjoon-hyun added this to the 1.8.7 milestone Nov 25, 2023
@@ -770,18 +770,18 @@ void readRanges(FSDataInputStream file, boolean allocateDirect, double extraByte
}

static ChunkReader create(BufferChunk from, BufferChunk to) {
long f = Integer.MAX_VALUE;
long e = Integer.MIN_VALUE;
long f = Long.MAX_VALUE;
Copy link
Member

@dongjoon-hyun dongjoon-hyun Nov 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm opposite to @mystic-lama 's comment.

We don't want to mix a bug fix and a style fix (or refactorying) in the same PR. It's because it increases the review complexity indeed.

If we want to rename this, it should be done before or after this PR.

Please don't change the existing variable name in this PR.

@dongjoon-hyun
Copy link
Member

Thank you for reporting a bug and making a PR, @yebukong .
Please make a renaming PR separately. I can merge that renaming PR first before this main PR.

reqBytes += ef - cf;
return new ChunkReader(from, to, (int) (e - f), reqBytes);
reqBytes += currentEnd - currentStart;
if (reqBytes >= Integer.MAX_VALUE) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**
* The maximum size of array to allocate, value being the same as {@link java.util.Hashtable},
* given the fact that the stripe size could be increased to larger value by configuring "orc.stripe.size".
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

I think it might be more accurate to extract and reuse this MAX_ARRAY_SIZE constant, the java maximum array length is not actually Integer.MAX_VALUE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I define a MAX_ARRAY_SIZE constant in RecordReaderUtils ? I noticed it is marked as private in StringHashTableDictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to extract it into a utility class (e.g. IOUtils) and reuse the same constant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should remove MAX_ARRAY_SIZE from StringHashTableDictionary.java in this pr, the same concept should be kept in a single source (IOUtils).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree "the same concept should be kept in a single source".

Should we include the changes to remove the MAX_ARRAY_SIZE definition from StringHashTableDictionary.java in this PR?

Copy link
Contributor Author

@yebukong yebukong Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guiyanakuang Considering that this PR has been closed, is it still necessary to continue the discussion on the issues mentioned above?

@yebukong
Copy link
Contributor Author

yebukong commented Nov 25, 2023

I've already maked a renaming PR, #1664. @dongjoon-hyun PTAL.
After the completion of renaming PR, i will push the changes to the main PR.

@yebukong yebukong changed the title ORC-1528: Fix readBytes potential overflow in RecordReaderUtils#create ORC-1528: Fix readBytes potential overflow in RecordReaderUtils.ChunkReader#create Nov 25, 2023
@dongjoon-hyun
Copy link
Member

I've already maked a renaming PR, #1664. @dongjoon-hyun PTAL. After the completion of renaming PR, i will push the changes to the main PR.

To @yebukong , I repled on your new PR about why we cannot proceed that renaming PR first.

@dongjoon-hyun
Copy link
Member

In short,

  • ORC-1528 should land at main/branch-1.9/branch-1.8 branches as a bug fix.
  • ORC-1530 should land at only main branch as an improvement or task PR

@yebukong
Copy link
Contributor Author

I have pushed the latest code modifications. @dongjoon-hyun @guiyanakuang @mystic-lama PTAL.

@@ -30,6 +30,11 @@ public final class IOUtils {

public static final int DEFAULT_BUFFER_SIZE = 8192;

/**
* The maximum size of array to allocate, value being the same as {@link java.util.Hashtable}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add info like max array to allocate for what purpose? Please only add if the specific context is known, else we leave it as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/openjdk/jdk/blob/jdk-17%2B35/src/java.base/share/classes/java/util/Hashtable.java#L391-L397

Because the comment for java.util.Hashtable#MAX_ARRAY_SIZE is detailed, I did not add excessive description.

Copy link
Contributor

@paliwalashish paliwalashish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 Changes look good to me.

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

dongjoon-hyun pushed a commit that referenced this pull request Nov 27, 2023
…Reader#create

- I have adjusted the calculation of `reqBytes` and `readBytes` ranges in the `org.apache.orc.impl.RecordReaderUtils.ChunkReader#create` method from `Integer` to `Long`, and added validation for `Integer` overflow

- Adds more test cases

This PR aims to fix [ORC-1528](https://issues.apache.org/jira/browse/ORC-1528)

org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt

Closes #1662 from yebukong/main.

Lead-authored-by: yebukong <[email protected]>
Co-authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e554765)
Signed-off-by: Dongjoon Hyun <[email protected]>
dongjoon-hyun pushed a commit that referenced this pull request Nov 27, 2023
…Reader#create

- I have adjusted the calculation of `reqBytes` and `readBytes` ranges in the `org.apache.orc.impl.RecordReaderUtils.ChunkReader#create` method from `Integer` to `Long`, and added validation for `Integer` overflow

- Adds more test cases

This PR aims to fix [ORC-1528](https://issues.apache.org/jira/browse/ORC-1528)

org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt

Closes #1662 from yebukong/main.

Lead-authored-by: yebukong <[email protected]>
Co-authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit e554765)
Signed-off-by: Dongjoon Hyun <[email protected]>
@dongjoon-hyun
Copy link
Member

Merged to main/1.9/1.8.

@dongjoon-hyun
Copy link
Member

Welcome to the Apache ORC community, @yebukong .
I added you to the Apache ORC contributor group and assigned ORC-1528 to you.
Thank you again.

dongjoon-hyun pushed a commit that referenced this pull request Nov 28, 2023
### What changes were proposed in this pull request?

rename variables in RecordReaderUtils.ChunkReader#create

### Why are the changes needed?

for easy reading

see [PR 1662](#1662 (comment))

### How was this patch tested?

Closes #1664 from yebukong/ORC-1528_rename_variables.

Authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@yebukong
Copy link
Contributor Author

yebukong commented Nov 29, 2023

You're welcome, it's my pleasure. @dongjoon-hyun

cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
…Reader#create

### What changes were proposed in this pull request?

- I have adjusted the calculation of `reqBytes` and `readBytes` ranges in the `org.apache.orc.impl.RecordReaderUtils.ChunkReader#create` method from `Integer` to `Long`, and added validation for `Integer` overflow

- Adds more test cases

### Why are the changes needed?

This PR aims to fix [ORC-1528](https://issues.apache.org/jira/browse/ORC-1528)

### How was this patch tested?

org.apache.orc.impl.TestRecordReaderUtils#testBufferChunkOffsetExceedsMaxInt

Closes apache#1662 from yebukong/main.

Lead-authored-by: yebukong <[email protected]>
Co-authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
### What changes were proposed in this pull request?

rename variables in RecordReaderUtils.ChunkReader#create

### Why are the changes needed?

for easy reading

see [PR 1662](apache#1662 (comment))

### How was this patch tested?

Closes apache#1664 from yebukong/ORC-1528_rename_variables.

Authored-by: yebukong <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants