
Optimize Piece's memory usage #35

Open

wants to merge 2 commits into master

Conversation

gzsombor

Remove a couple of unnecessary fields from the Piece class (a sketch of the derived replacements follows the list):

  • offset -> can be calculated as index * pieceLength
  • seeder -> torrent.isSeeder()
  • bucket -> torrent.getBucket()
  • hash -> equivalent to reading torrent.getPiecesHashes() at offset index * Torrent.PIECE_HASH_SIZE, so we don't have to duplicate this information.
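
A minimal sketch of what the slimmed-down class could look like, assuming accessors named after the bullet points above (getPieceLength(), isSeeder(), getPiecesHashes()); the bucket field follows the same delegation pattern. This is illustrative, not the actual PR diff:

    import java.nio.ByteBuffer;

    public class Piece {
        private final Torrent torrent;  // shared state lives on the torrent, not per piece
        private final int index;

        public Piece(Torrent torrent, int index) {
            this.torrent = torrent;
            this.index = index;
        }

        // replaces the stored 'offset' field
        public long getOffset() {
            return (long) this.index * this.torrent.getPieceLength();
        }

        // replaces the stored 'seeder' flag
        public boolean isSeeder() {
            return this.torrent.isSeeder();
        }

        // replaces the per-piece copy of the 20-byte SHA-1 hash
        public ByteBuffer getHash() {
            ByteBuffer hashes = this.torrent.getPiecesHashes().duplicate();
            hashes.position(this.index * Torrent.PIECE_HASH_SIZE);
            hashes.limit(hashes.position() + Torrent.PIECE_HASH_SIZE);
            return hashes.slice();  // zero-copy view into the shared buffer
        }
    }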

…entHash instead of separate byte buffer in every Piece object
@@ -418,6 +418,12 @@ public void save(OutputStream output) throws IOException {
        return md.digest();
    }

    public static byte[] hash(ByteBuffer data) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(data);
        return md.digest();
    }

This will create the byte array internally.

Author

Only if it's a ByteBuffer that isn't backed by an accessible byte array, and even then it only creates a 4K buffer. That is a more memory-friendly solution than allocating a byte array of at least 64K just to accommodate a torrent block for hashing. I could run the torrent client more efficiently this way, YMMV.
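
For contrast, a sketch of the two approaches being discussed, inside a hash method that receives the piece payload as a ByteBuffer named data:

    // Copy-based approach: materializes the whole piece as a byte[]
    // before hashing, which allocates 64K or more per call.
    byte[] copy = new byte[data.remaining()];
    data.get(copy);
    md.update(copy);

    // Buffer-based approach: hand the buffer straight to the digest.
    // For a direct (non-array-backed) buffer, the JDK feeds it through
    // a small internal scratch array instead of a piece-sized allocation.
    md.update(data);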


The buffer being passed is the data read from the "output" file (this._read(...)), which is a direct buffer, i.e. not array-backed, no? The piecesHashes one is array-backed.

I couldn't understand the rest of your answer, sorry. Upon reading the source of MessageDigest I can see this optimization is there, reusing the internal byte array tempArray, but here md is created and discarded just the same. I think the solution lies more along the lines of #184: having a "resource store" where you pick it up, use it, and relinquish ownership...
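
A hypothetical sketch of that "resource store" idea (the names here are mine, not taken from #184): keep one MessageDigest per thread and reuse it, instead of creating and discarding an instance on every call:

    private static final ThreadLocal<MessageDigest> SHA1 =
        ThreadLocal.withInitial(() -> {
            try {
                return MessageDigest.getInstance("SHA-1");
            } catch (NoSuchAlgorithmException e) {
                throw new IllegalStateException(e);
            }
        });

    public static byte[] hash(ByteBuffer data) {
        MessageDigest md = SHA1.get();  // pick it up from the per-thread store
        md.reset();                     // drop any stale state before use
        md.update(data);
        return md.digest();             // digest() resets it, ready for reuse
    }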

Author

If you look here: https://github.com/mpetazzoni/ttorrent/pull/35/files#diff-454b627720e29935f25dab9a660670b1L166 you can see a big memory allocation. I don't know how big it is, whether 64K or 512K (I implemented this 3.5 years ago), but I'm fairly certain that one torrent piece can be pretty big, and it is much simpler to pass the ByteBuffer directly to the MessageDigest class than to allocate a new byte array, copy everything into it, and then calculate the hash.

Sorry for not being clear; English is not my native language :(

@zanella Nov 16, 2016

Hey, I think I can sum up the byte duplication:

  • each piece hash is duplicated from piecesHashes -> this one you got rid of by reading piecesHashes at the right offset;
  • each file piece is read into a ByteBuffer and then copied into a byte array -> this copy you got rid of by passing the ByteBuffer directly to the MessageDigest method;

I think what remains is limiting the number of existing ByteBuffers from file reading, like I tried in #195. I'll try to limit it and then profile it; with this done I think this patch is complete.
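
A rough sketch of what limiting the buffers could look like (purely illustrative, not the code from #195): a fixed-size pool that file-reading code borrows from and returns to, so the number of live direct buffers stays bounded:

    import java.nio.ByteBuffer;
    import java.util.concurrent.ArrayBlockingQueue;

    final class BufferPool {
        private final ArrayBlockingQueue<ByteBuffer> free;

        BufferPool(int buffers, int capacity) {
            this.free = new ArrayBlockingQueue<>(buffers);
            for (int i = 0; i < buffers; i++) {
                this.free.add(ByteBuffer.allocateDirect(capacity));
            }
        }

        ByteBuffer acquire() throws InterruptedException {
            ByteBuffer buf = this.free.take();  // blocks while all buffers are in use
            buf.clear();
            return buf;
        }

        void release(ByteBuffer buf) {
            this.free.offer(buf);  // hand ownership back to the pool
        }
    }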

Author

Yes, that happened :)


    int torrentHashPosition = getIndex() * Torrent.PIECE_HASH_SIZE;
    for (int i = 0; i < Torrent.PIECE_HASH_SIZE; i++) {
        byte value = this.torrentHash.get(torrentHashPosition + i);

Is fetching byte by byte from a direct buffer faster than just fetching the whole thing at once?
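
What the reviewer appears to be suggesting, as a sketch: a single bulk get() into a small array instead of PIECE_HASH_SIZE one-byte reads:

    byte[] expected = new byte[Torrent.PIECE_HASH_SIZE];
    ByteBuffer view = this.torrentHash.duplicate();  // leaves the shared buffer's position untouched
    view.position(getIndex() * Torrent.PIECE_HASH_SIZE);
    view.get(expected);  // one bulk read instead of a per-byte loop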
