Optimize Piece's memory usage #35
base: master
Conversation
…entHash instead of separate byte buffer in every Piece object
…ecessary data duplication can be avoided.
@@ -418,6 +418,12 @@ public void save(OutputStream output) throws IOException {
    return md.digest();
}

public static byte[] hash(ByteBuffer data) throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-1");
    md.update(data);
This will create the byte array internally
Only if it's a ByteBuffer that isn't backed by an accessible byte array, and in that case it only creates a 4K temporary buffer. That is a more memory-friendly solution than allocating a byte array of at least 64K to hold a torrent block just for hashing. I could run the torrent client more efficiently this way; YMMV.
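To illustrate the point being argued here, the following is a minimal sketch (class and method names are illustrative, not from the patch) showing that `MessageDigest.update(ByteBuffer)` accepts both array-backed and direct buffers and produces the same digest, without the caller allocating a piece-sized byte array:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class BufferHashDemo {
    // Hash the remaining bytes of a ByteBuffer without first copying
    // them into a caller-allocated byte array.
    static byte[] sha1(ByteBuffer data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(data); // consumes the buffer from position to limit
        return md.digest();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] piece = new byte[64 * 1024]; // stand-in for a 64K torrent block
        Arrays.fill(piece, (byte) 42);

        ByteBuffer heap = ByteBuffer.wrap(piece);                    // array-backed
        ByteBuffer direct = ByteBuffer.allocateDirect(piece.length); // not array-backed
        direct.put(piece);
        direct.flip();

        // Both paths yield the same digest; the direct-buffer path is
        // drained internally in small chunks rather than one big copy.
        System.out.println(Arrays.equals(sha1(heap), sha1(direct))); // true
    }
}
```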
The buffer being passed is the data read from the "output" file (this._read(...)), which is a direct buffer, i.e. not array-backed, no? The piecesHashes one is array-backed.
I couldn't understand the rest of your answer, sorry. Reading the source of MessageDigest, I can see this optimization is there, reusing the internal byte array tempArray, but here md is created and discarded just the same. I think the solution lies more along the lines of #184: having a "resource store" where you pick a digest up, use it, and relinquish ownership...
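The "resource store" idea mentioned here could be sketched with a per-thread digest, which avoids creating and discarding a MessageDigest for every piece. This is an assumption about what #184 intends, not its actual implementation, and the class name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: reuse one SHA-1 MessageDigest per thread instead
// of instantiating one per hashed piece.
public class DigestStore {
    private static final ThreadLocal<MessageDigest> SHA1 =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("SHA-1");
            } catch (NoSuchAlgorithmException e) {
                // Every JCA-compliant JRE is required to provide SHA-1.
                throw new IllegalStateException(e);
            }
        });

    public static byte[] sha1(ByteBuffer data) {
        MessageDigest md = SHA1.get();
        md.reset();          // clear any state left by a previous caller
        md.update(data);
        return md.digest();  // digest() also resets for the next use
    }
}
```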
If you look here: https://github.com/mpetazzoni/ttorrent/pull/35/files#diff-454b627720e29935f25dab9a660670b1L166 you can see a big memory allocation. I don't know how big it is, whether 64K or 512K (I implemented this 3.5 years ago), but I'm fairly certain that a single torrent piece can be pretty big, and it is much simpler to pass the ByteBuffer directly to the MessageDigest class than to allocate a new byte array, copy everything into it, and then calculate the hash.
Sorry for not being clearer; English is not my native language :(
Hey, I think I can sum up the byte duplication:
- each piece hash is duplicated out of piecesHashes -> this one you got rid of by indexing into piecesHashes with the offset;
- each file piece is read into a ByteBuffer and then copied into a byte array -> this copy you got rid of by passing the ByteBuffer (which wraps a byte array internally) to the MessageDigest method;

I think what remains is limiting the number of ByteBuffer instances created by file reading, like I tried in #195. I'll try to limit that and then profile it; with that done I think this patch is complete.
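The two savings summed up above could look roughly like this when combined: hash the piece's ByteBuffer directly, then compare against the shared hash buffer by offset. The field and constant names (torrentHash, PIECE_HASH_SIZE) follow the diff in this PR; the rest is an illustrative sketch, not the patch itself:

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: validate a piece against the shared piecesHashes buffer by
// offset, instead of keeping a 20-byte hash copy in every Piece object.
public class PieceCheck {
    static final int PIECE_HASH_SIZE = 20; // SHA-1 digest length

    static boolean isValid(ByteBuffer torrentHash, int pieceIndex, ByteBuffer pieceData)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(pieceData);      // no intermediate byte[] copy of the piece
        byte[] computed = md.digest();

        int offset = pieceIndex * PIECE_HASH_SIZE;
        for (int i = 0; i < PIECE_HASH_SIZE; i++) {
            if (torrentHash.get(offset + i) != computed[i]) {
                return false;
            }
        }
        return true;
    }
}
```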
Yes, that happened :)
int torrentHashPosition = getIndex() * Torrent.PIECE_HASH_SIZE;
for (int i = 0; i < Torrent.PIECE_HASH_SIZE; i++) {
    byte value = this.torrentHash.get(torrentHashPosition + i);
Fetching byte by byte from a direct buffer is faster than just fetching the whole thing at once?
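One alternative to the per-byte loop above would be a single bulk read into a small array, via a duplicate() view so the shared buffer's position is untouched. A minimal sketch, assuming torrentHash is the shared piecesHashes ByteBuffer (the helper name is hypothetical):

```java
import java.nio.ByteBuffer;

public class BulkGetDemo {
    static final int PIECE_HASH_SIZE = 20; // SHA-1 digest length

    // Bulk-read one piece's hash out of the shared buffer. duplicate()
    // gives an independent position/limit, so concurrent readers of the
    // shared buffer are unaffected.
    static byte[] pieceHash(ByteBuffer torrentHash, int pieceIndex) {
        byte[] out = new byte[PIECE_HASH_SIZE];
        ByteBuffer view = torrentHash.duplicate();
        view.position(pieceIndex * PIECE_HASH_SIZE);
        view.get(out); // single bulk transfer instead of 20 get() calls
        return out;
    }
}
```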
Remove a couple of unnecessary fields from the Piece class: