Optimize Piece's memory usage #35

Open · wants to merge 2 commits into master
src/main/java/com/turn/ttorrent/client/Piece.java (26 additions, 24 deletions)
@@ -17,12 +17,10 @@
 import com.turn.ttorrent.common.Torrent;
 import com.turn.ttorrent.client.peer.SharingPeer;
-import com.turn.ttorrent.client.storage.TorrentByteStorage;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.security.NoSuchAlgorithmException;
-import java.util.Arrays;
 import java.util.concurrent.Callable;
 
 import org.slf4j.Logger;
@@ -52,12 +50,9 @@ public class Piece implements Comparable<Piece> {
     private static final Logger logger =
         LoggerFactory.getLogger(Piece.class);
 
-    private final TorrentByteStorage bucket;
+    private final SharedTorrent torrent;
     private final int index;
-    private final long offset;
     private final long length;
-    private final byte[] hash;
-    private final boolean seeder;
 
     private volatile boolean valid;
     private int seen;
@@ -66,22 +61,15 @@ public class Piece implements Comparable<Piece> {
     /**
      * Initialize a new piece in the byte bucket.
      *
-     * @param bucket The underlying byte storage bucket.
+     * @param torrent The parent torrent.
      * @param index This piece index in the torrent.
-     * @param offset This piece offset, in bytes, in the storage.
     * @param length This piece length, in bytes.
-     * @param hash This piece 20-byte SHA1 hash sum.
-     * @param seeder Whether we're seeding this torrent or not (disables piece
-     * validation).
      */
-    public Piece(TorrentByteStorage bucket, int index, long offset,
-        long length, byte[] hash, boolean seeder) {
-        this.bucket = bucket;
+    public Piece(SharedTorrent torrent, int index, long length) {
+        this.torrent = torrent;
         this.index = index;
-        this.offset = offset;
         this.length = length;
-        this.hash = hash;
-        this.seeder = seeder;
 
         // Piece is considered invalid until first check.
         this.valid = false;
@@ -150,7 +138,7 @@ public void noLongerAt(SharingPeer peer) {
      * meta-info.
      */
     public synchronized boolean validate() throws IOException {
-        if (this.seeder) {
+        if (this.torrent.isSeeder()) {
             logger.trace("Skipping validation of {} (seeder mode).", this);
             this.valid = true;
             return true;
@@ -160,18 +148,28 @@ public synchronized boolean validate() throws IOException {
         this.valid = false;
 
         try {
-            // TODO: remove cast to int when large ByteBuffer support is
-            // implemented in Java.
             ByteBuffer buffer = this._read(0, this.length);
-            byte[] data = new byte[(int)this.length];
-            buffer.get(data);
-            this.valid = Arrays.equals(Torrent.hash(data), this.hash);
+            this.valid = checkHash(buffer);
         } catch (NoSuchAlgorithmException nsae) {
             logger.error("{}", nsae);
         }
 
         return this.isValid();
     }
 
+    protected boolean checkHash(ByteBuffer data) throws NoSuchAlgorithmException {
+        byte[] calculatedHash = Torrent.hash(data);
+
+        int torrentHashPosition = getIndex() * Torrent.PIECE_HASH_SIZE;
+        ByteBuffer torrentHash = this.torrent.getPiecesHashes();
+        for (int i = 0; i < Torrent.PIECE_HASH_SIZE; i++) {
+            byte value = torrentHash.get(torrentHashPosition + i);
+            if (value != calculatedHash[i]) {
+                return false;
+            }
+        }
+        return true;
+    }
+
     /**
      * Internal piece data read function.
@@ -200,7 +198,7 @@ private ByteBuffer _read(long offset, long length) throws IOException {
         // TODO: remove cast to int when large ByteBuffer support is
         // implemented in Java.
         ByteBuffer buffer = ByteBuffer.allocate((int)length);
-        int bytes = this.bucket.read(buffer, this.offset + offset);
+        int bytes = this.torrent.getBucket().read(buffer, this.getBucketOffset() + offset);
         buffer.rewind();
         buffer.limit(bytes >= 0 ? bytes : 0);
         return buffer;
@@ -262,11 +260,15 @@ public synchronized void record(ByteBuffer block, int offset)
         if (block.remaining() + offset == this.length) {
             this.data.rewind();
             logger.trace("Recording {}...", this);
-            this.bucket.write(this.data, this.offset);
+            this.torrent.getBucket().write(this.data, this.getBucketOffset());
             this.data = null;
         }
     }
 
+    long getBucketOffset() {
+        return ((long)this.index) * this.torrent.getPieceLength();
+    }
+
     /**
      * Return a human-readable representation of this piece.
     */
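Taken together, the new validation path reads the piece into a ByteBuffer, hashes it without an intermediate byte[] copy, and compares the digest against the right 20-byte slot of the torrent's shared pieces-hashes buffer. The following is a minimal standalone sketch of that flow; it borrows the names from the diff (PIECE_HASH_SIZE, checkHash) but is otherwise our own illustration, not code from the PR:

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class PieceCheckSketch {

    static final int PIECE_HASH_SIZE = 20; // SHA-1 digest length, as in Torrent

    // Mirrors Piece.checkHash(): digest the piece straight from its ByteBuffer,
    // then compare against the 20-byte slot for this piece index inside the
    // torrent's concatenated hashes buffer, byte by byte, with no allocation.
    static boolean checkHash(ByteBuffer piece, ByteBuffer allHashes, int index)
            throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(piece);                 // consumes the buffer; no byte[] copy
        byte[] calculated = md.digest();  // 20 bytes

        int base = index * PIECE_HASH_SIZE;
        for (int i = 0; i < PIECE_HASH_SIZE; i++) {
            // Absolute get(int) leaves the shared buffer's position untouched.
            if (allHashes.get(base + i) != calculated[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] data = "hello, piece".getBytes();
        byte[] hashes = MessageDigest.getInstance("SHA-1").digest(data);
        System.out.println(checkHash(ByteBuffer.wrap(data),
                ByteBuffer.wrap(hashes), 0)); // true: single-piece "torrent"
    }
}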
src/main/java/com/turn/ttorrent/client/SharedTorrent.java (22 additions, 4 deletions)
@@ -321,8 +321,6 @@ public synchronized void init() throws InterruptedException, IOException {
         logger.info("Analyzing local data for {} with {} threads ({} pieces)...",
             new Object[] { this.getName(), threads, nPieces });
         for (int idx=0; idx<nPieces; idx++) {
-            byte[] hash = new byte[Torrent.PIECE_HASH_SIZE];
-            this.piecesHashes.get(hash);
 
             // The last piece may be shorter than the torrent's global piece
             // length. Let's make sure we get the right piece length in any
@@ -332,8 +330,7 @@ public synchronized void init() throws InterruptedException, IOException {
                 this.bucket.size() - off,
                 this.pieceLength);
 
-            this.pieces[idx] = new Piece(this.bucket, idx, off, len, hash,
-                this.isSeeder());
+            this.pieces[idx] = new Piece(this, idx, len);
 
             Callable<Piece> hasher = new Piece.CallableHasher(this.pieces[idx]);
             results.add(executor.submit(hasher));
@@ -835,4 +832,25 @@ public synchronized void handlePeerDisconnected(SharingPeer peer) {
     @Override
     public synchronized void handleIOException(SharingPeer peer,
         IOException ioe) { /* Do nothing */ }
 
+    /**
+     * For accessing from Piece.
+     * @return The underlying byte storage bucket.
+     */
+    TorrentByteStorage getBucket() {
+        return bucket;
+    }
+
+    /**
+     * For accessing from Piece.
+     * @return This torrent's piece length, in bytes.
+     */
+    int getPieceLength() {
+        return pieceLength;
+    }
+
+    ByteBuffer getPiecesHashes() {
+        return piecesHashes;
+    }
 }
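To put rough numbers on the saving (our estimate, not a figure from the PR): a 4 GiB torrent with 256 KiB pieces has 16,384 pieces. Under the old scheme each Piece held its own 20-byte hash array (20 bytes of payload plus roughly 16 bytes of array header and an 8-byte reference on a typical 64-bit JVM), an 8-byte offset, a seeder flag, and a bucket reference, on the order of 50 to 60 bytes of overhead per piece, or close to a megabyte across the torrent. The new scheme keeps the single shared piecesHashes buffer and derives the storage offset on demand in getBucketOffset().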
src/main/java/com/turn/ttorrent/common/Torrent.java (6 additions, 0 deletions)

@@ -418,6 +418,12 @@ public static byte[] hash(byte[] data) throws NoSuchAlgorithmException {
         return md.digest();
     }
 
+    public static byte[] hash(ByteBuffer data) throws NoSuchAlgorithmException {
+        MessageDigest md = MessageDigest.getInstance("SHA-1");
+        md.update(data);
+        return md.digest();
+    }
+
     /**
      * Convert a byte string to a string containing an hexadecimal
      * representation of the original data.
Review thread, attached to the md.update(data); line in the hunk above:

Reviewer: This will create the byte array internally.

Author: Only if it's a ByteBuffer that isn't backed by an accessible byte array, and even then it only creates a 4K buffer, which is a more memory-friendly solution than allocating a byte array of at least 64K to accommodate a torrent block just for hashing. I could run the torrent client more efficiently this way; YMMV.
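The JDK behavior under discussion can be exercised with a small standalone program. This is an illustrative sketch of ours, not code from the PR: MessageDigest.update(ByteBuffer) digests an array-backed buffer in place, and feeds a non-array-backed (direct) buffer to the digest in bounded chunks rather than materializing a full-size copy.

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DigestBufferDemo {
    public static void main(String[] args) throws NoSuchAlgorithmException {
        byte[] block = new byte[64 * 1024]; // e.g. one 64K torrent block

        // Heap buffer: hasArray() is true, so update() reads the backing
        // array directly without copying it.
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        md.update(ByteBuffer.wrap(block));
        byte[] h1 = md.digest();

        // Direct buffer: not array-backed; the digest consumes it through a
        // small internal temp array instead of one full-size byte[] copy.
        ByteBuffer direct = ByteBuffer.allocateDirect(block.length);
        direct.put(block);
        direct.flip();
        md.update(direct); // digest() above reset md for reuse
        byte[] h2 = md.digest();

        System.out.println(MessageDigest.isEqual(h1, h2)); // true
    }
}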

Reviewer: The buffer being passed is the data read from the "output" file (this._read(...)), which is a direct buffer, i.e. not array-backed, no? The piecesHashes one is array-backed.

I couldn't understand the rest of your answer, sorry. Upon reading the source of MessageDigest I can see this optimization is there, reusing the internal tempArray byte array, but here md is created and discarded just as well. I think the solution lies more along the lines of #184: having a "resource store" where you pick a buffer up, use it, and relinquish ownership...

Author: If you look here: https://github.com/mpetazzoni/ttorrent/pull/35/files#diff-454b627720e29935f25dab9a660670b1L166 you can see a big memory allocation. I don't know how big it is, whether 64K or 512K (I implemented this 3.5 years ago), but I'm fairly certain that one torrent piece can be pretty big, and it is much simpler to pass the ByteBuffer directly to the MessageDigest class than to allocate a new byte array, copy everything into it, and then calculate the hash.

Sorry for not being too clear, English is not my native language :(

zanella (Nov 16, 2016): Hey, I think I can sum up the byte duplication:

  • each piece hash was duplicated out of piecesHashes -> this one you got rid of by indexing into piecesHashes with the offset;
  • each file piece was read into a ByteBuffer and then copied into a byte array -> this one you got rid of by passing the ByteBuffer (which has a byte array internally) to the MessageDigest method.

I think what remains is limiting the number of ByteBuffer instances created by file reading, like I tried in #195. I'll try to limit it and then profile it; with that done I think this patch is complete.

Author: Yes, that happened :)
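For completeness, the "resource store" idea referenced in this thread (#184) and the buffer-limiting attempt (#195) could take a shape like the following sketch. This is purely illustrative; the BufferPool class and its names are hypothetical and appear in neither PR.

import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical fixed-size buffer store: callers block when every buffer is
// checked out, which bounds peak memory regardless of hashing parallelism.
final class BufferPool {
    private final BlockingQueue<ByteBuffer> pool;

    BufferPool(int buffers, int capacity) {
        this.pool = new ArrayBlockingQueue<ByteBuffer>(buffers);
        for (int i = 0; i < buffers; i++) {
            this.pool.add(ByteBuffer.allocate(capacity));
        }
    }

    ByteBuffer acquire() throws InterruptedException {
        ByteBuffer buffer = pool.take(); // blocks until a buffer is returned
        buffer.clear();
        return buffer;
    }

    void release(ByteBuffer buffer) {
        pool.offer(buffer); // hand ownership back to the store
    }
}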
