The `NBTi/oStream`s only provide a boolean flag to tell whether the stream is compressed or not. In reality, there are three possible compressions: none, gzip or zlib. The last one is heavily used to store Chunk NBT data in Minecraft. This commit introduces three constants for the compression. They happen to match the ID convention used by Minecraft. This way, no additional case checking is required to handle the stream compression when reading region files. The old methods using a boolean are marked as deprecated from now on, since they are ambiguous about which compression algorithm is meant.
I also stumbled across an InputStream not being closed, and ended up fixing a lot of little warnings like this. Sorry for "hiding" these unrelated changes in a commit, I hope it is okay.
This utility class helps with reading Minecraft Region/Anvil files quickly and easily. (NOT TESTED YET)
Removed a lambda/streams expression
This tag was introduced with MC 1.12.
Getting the palette data of that long array is a real pain to get right so this may be useful to others too
And more documentation
This is a state machine NIGHTMARE! RegionFile#loadAllChunks() ick! The only methods should be: Making RegionChunk immutable-ish might help, but whatever works best. Edit: And don't bother caching RegionChunks at all. Let the user do that with a wrapper if they really need that. If someone is dumb enough to load the same chunk 1000 times, the OS caches hot files anyway.
@xcube16 You're totally right. I changed those things a few months ago, but forgot to push them to the branch. Sadly, a few other changes have sneaked in and I don't really remember the details, I hope it's still ok. But thanks for reading through my code in the first place :)
@piegamesde You're welcome. :)
It looks like the last commit to the develop branch was 2 years ago. I hope a maintainer is still active on this project.
* Each instance of this class represents a Minecraft chunk within a Region file. The data is represented as a binary blob in form of a
* {@link ByteBuffer}. Each object is self-contained and not linked to any physical file or other object. Each object is immutable and
* should be treated as such. The data is set in the constructor</br>
* The data must have a length of a multiple of 4096 bytes (to be able to write it to disk more easily). The four first bytes specify the
The internal format of a region file should not be exposed to the user. Things like sector alignment, etc.
The blob has a format of
[length | 4 bytes, big endian][compression_flag | byte][data | length - 1 bytes]
'length' is data's length + 1, 'length' == 0 is a special case (a non-existing chunk, don't ask me why Mojang has this when normally the offset into the region file is 0 when there is a non-existing chunk)
'compression_flag' == 1 means GZIP(Input/Output)Stream is used for compression (unused by Minecraft to save chunks, but I would support it anyway),
'compression_flag' == 2 means Inflater(Input/Output)Stream is used for compression
A user will not realize this and try to use NBT(Input/Output)Stream on it directly and get a big surprise!
I would instead expose the uncompressed data (I know NBT(Input/Output)Stream supports compression with GZIP(Input/Output)Stream, but let's not force the user to deal with special cases)
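To make the suggestion concrete, here is a minimal sketch of decoding such a blob into uncompressed bytes that could then be handed to an NBT reader without any compression option. `ChunkBlobDecoder` and `decode` are hypothetical names used only for illustration and are not part of this PR or the library; only flags 1 and 2 from the description above are handled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper (not in the library): turns a raw chunk blob into uncompressed bytes. */
final class ChunkBlobDecoder {

    /**
     * Decodes [length | 4 bytes, big endian][compression_flag | byte][data | length - 1 bytes].
     * Returns null for the 'length' == 0 special case (non-existing chunk).
     */
    static byte[] decode(ByteBuffer blob) throws IOException {
        // Work on a duplicate so the caller's position is left untouched.
        ByteBuffer buf = blob.duplicate().order(ByteOrder.BIG_ENDIAN);
        int length = buf.getInt();
        if (length == 0) {
            return null;
        }
        int compression = buf.get() & 0xFF;
        byte[] data = new byte[length - 1]; // trailing sector padding is ignored
        buf.get(data);

        InputStream in = new ByteArrayInputStream(data);
        switch (compression) {
            case 1: in = new GZIPInputStream(in); break;     // gzip
            case 2: in = new InflaterInputStream(in); break; // zlib
            default: throw new IOException("Unsupported compression flag: " + compression);
        }
        try (InputStream src = in) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = src.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

With something like this inside the region file abstraction, the caller only ever sees plain NBT bytes and never has to know which flag was stored.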
@@ -77,7 +78,7 @@
raf.readFully(data);
ByteArrayInputStream in = new ByteArrayInputStream(data);
InflaterInputStream iis = new InflaterInputStream(in);
See. SimpleRegionFileReader does not force NBTInputStream to implement a compression method that only it (region files) ever use. :)
* Flag indicating that the given data will be compressed with the ZLIB compression algorithm. This is the default compression method used to compress the nbt
* data of the chunks in Minecraft Region/Anvil files, but only if its compression method is {@code 2} (see the respective format documentation), which is default for all newer versions.
*/
public static final int ZLIB_COMPRESSION = 2;
You did all this work and added all this extra code just to support a compression method that only region files use? o.O
I would just decompress the data in your region file abstraction and feed uncompressed data to NBTInputStream
There are three ways chunk data can be compressed. And if the NBTInputStream provides syntactic sugar to unwrap one compression, I thought that I should add the other one as well (the first time I tried to load region files I didn't know they had custom compression so I thought there was a bug or so, quite painful).
I do agree that this was kind of a poor design decision (the library should not handle compression/endianness at all), but I wanted this change to be backwards-compatible.
One could just require the user to instantiate EndianSwitchableInputStream and make NBTInputStream's constructor require a DataInput instead of InputStream. I would still have EndianSwitchableInputStream in the library. The current helper constructors that support endianness internally are not that bad though... If it's not broken, don't fix it.
You could create a CompressedInputStream class that supports all three compression types. Wait why would you do that? The only methods common to all three types are already in InputStream. See where this is going?
*/
@Test
public void testRead() throws IOException, URISyntaxException {
RegionFile file = new RegionFile(Paths.get(getClass().getResource("/r.1.3.mca").toURI()));
Do we really need to include large binary test files in this project? There is a LOT of unknown data in a minecraft region file. This test does not check the integrity of any of that data, all it does is go "yay! It did not crash!".
The best way to test would be to use a custom region file (or, even better, generate one in the test!) with a known and verifiable NBT structure, empty chunks, the works! Then you can test the NBT when you read it and make sure nothing is corrupted.
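A rough sketch of what "generate one in the test" could look like. It assumes the commonly documented Anvil layout, which is not spelled out in this thread: a 4 KiB location table of 1024 big-endian entries (3-byte sector offset, 1-byte sector count), a 4 KiB timestamp table, then chunk blobs in 4096-byte sectors. `TinyRegionFixture` is a hypothetical name, and the payload is a stand-in for serialized NBT that a real test would produce with the library's own writer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical test fixture: builds and reads a one-chunk region image in memory. */
final class TinyRegionFixture {
    static final int SECTOR = 4096;

    /** Builds a region image holding exactly one zlib-compressed chunk at local (x, z). */
    static byte[] build(int x, int z, byte[] payload) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        try (DeflaterOutputStream dos = new DeflaterOutputStream(compressed)) {
            dos.write(payload);
        }
        byte[] data = compressed.toByteArray();
        int blobLength = 4 + 1 + data.length; // length field + compression flag + data
        int sectors = (blobLength + SECTOR - 1) / SECTOR;

        // Two header sectors followed by the chunk; ByteBuffer is big endian by default.
        ByteBuffer file = ByteBuffer.allocate((2 + sectors) * SECTOR);
        int index = (x & 31) + (z & 31) * 32;
        file.putInt(index * 4, (2 << 8) | sectors); // chunk starts at sector 2
        file.position(2 * SECTOR);
        file.putInt(data.length + 1).put((byte) 2).put(data);
        return file.array();
    }

    /** Reads the chunk at local (x, z) back; returns null if its location entry is 0. */
    static byte[] read(byte[] region, int x, int z) throws IOException {
        ByteBuffer file = ByteBuffer.wrap(region);
        int location = file.getInt(((x & 31) + (z & 31) * 32) * 4);
        if (location == 0) {
            return null; // chunk does not exist
        }
        file.position((location >>> 8) * SECTOR);
        int length = file.getInt();
        int flag = file.get();
        if (flag != 2) {
            throw new IOException("fixture only writes zlib, got flag " + flag);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InflaterInputStream in = new InflaterInputStream(
                new ByteArrayInputStream(region, file.position(), length - 1))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

A test built this way can assert on the exact bytes (or tags) it wrote, and on empty chunks, instead of only checking that reading a large binary file does not crash.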
* The chunk to set. Use {@code null} to remove it.
* @author piegames
*/
public synchronized void setChunk(int x, int z, Chunk chunk) {
Annotate chunk with @Nullable. Or even better, add a clearChunk(x, z) method
*
* @author piegames
*/
public synchronized void writeChanges() throws IOException {
I would just write chunks directly in setChunk(), and remove this method.
Some of this code can be copied and pasted into setChunk(), but if there is anything we can do to make it a little easier to read that would be nice.
I initially thought of this, but if there are frequent changes (or say I modify each chunk a bit), there will be a lot of redundant disk writes and pointless position calculations.
Let the user handle that. If there are not frequent changes you are just creating a cache invalidation hell for the user for no reason.
If the user wants to cache chunks, they know the best way to do that.
/*not final*/ Chunk chunk = RegionFile#getChunk(x, z);
// Let's make some changes to the chunk
chunk = makeSomeChanges(chunk);
// Hmmm... another frequent change
chunk = anotherFrequentChange(chunk);
// Now let's save all the changes at once
RegionFile#setChunk(x, z, chunk)
If you are not convinced yet, think about this...
What happens if you want to cache chunks on the world level instead of at the region level (like what Minecraft does internally)?
Caching the chunks would not be so bad if Chunk was immutable, as it would hide the caching from the user. But it would still be a lot of useless (in my opinion) code that would just be there for the off chance that someone spams getChunk(). And even if they do, it still might not be a problem (most modern OSes cache hot files anyway).
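For illustration, a user-side write-back cache could live in its own small wrapper along these lines. `ChunkStore` and `CachingChunkStore` are hypothetical names invented for this sketch; the only assumption about the region file abstraction is that it offers `getChunk(x, z)` and `setChunk(x, z, chunk)` as discussed above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical minimal view of a region file: read and write single chunks. */
interface ChunkStore<C> {
    C getChunk(int x, int z);
    void setChunk(int x, int z, C chunk);
}

/** Hypothetical user-side wrapper: buffers writes and pushes them to the backing store on flush(). */
final class CachingChunkStore<C> implements ChunkStore<C> {
    private final ChunkStore<C> backing;
    private final Map<Long, C> cache = new HashMap<>();
    private final Set<Long> dirty = new HashSet<>();

    CachingChunkStore(ChunkStore<C> backing) {
        this.backing = backing;
    }

    // Packs both coordinates into a single map key.
    private static long key(int x, int z) {
        return ((long) x << 32) | (z & 0xFFFFFFFFL);
    }

    @Override
    public C getChunk(int x, int z) {
        long k = key(x, z);
        if (!cache.containsKey(k)) {
            cache.put(k, backing.getChunk(x, z)); // null (missing chunk) is cached too
        }
        return cache.get(k);
    }

    @Override
    public void setChunk(int x, int z, C chunk) {
        long k = key(x, z);
        cache.put(k, chunk);
        dirty.add(k); // no disk access yet
    }

    /** Writes every modified chunk to the backing store exactly once. */
    public void flush() {
        for (long k : dirty) {
            backing.setChunk((int) (k >> 32), (int) k, cache.get(k));
        }
        dirty.clear();
    }
}
```

The same wrapper shape would work at the world level, which is the point: the region file class itself stays free of cache state.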
@@ -35,6 +35,7 @@
import com.flowpowered.nbt.Tag;
import com.flowpowered.nbt.stream.NBTInputStream;

@Deprecated
We can also delete most of the code in this class! :D
Just make it wrap and/or use the new one.
Bah, this code is so ugly I do not really want to touch this :D
It does not even handle missing chunks or exceptions and has no documentation.
All the more reason to get rid of it, I would think. An easy one line of code could replace almost that entire file. Doesn't it make you feel good when you remove lots of code?
Maybe I just like seeing code get deleted. :P
As of now this PR adds 1530 new lines of code and only removes 46.
@@ -50,6 +50,11 @@ public ByteArrayTag(String name, byte[] value) {
return value;
}

@Override
public void setValue(byte[] value) {
Why did you make every Tag mutable!? D': cries
I don't care what @kashike says, this is one of my favorite NBT libraries.
Immutable Tags are one reason for that.
If everyone really wants this.... cries some more
Maybe it should be in its own PR?
Have you ever tried to change the position of an entity in chunk with the NBT data being immutable?
> Maybe it should be in its own PR?
IIRC this was, but I merged it onto my dev branch and I later built the API changes on top of it; sorry.
Let me explain the intended workflow with these classes, this might help you understand my design decisions:
This sounds a lot like what SimpleRegionFileReader.readFile(File) does (reads the entire file into a list, albeit with limited metadata for each chunk).
SimpleRegionFileReader.readFile(File).stream()...
All we would need to do is add SimpleRegionFileReader.writeFile(List<Tag>) (we might want to add an immutable region chunk metadata class to encapsulate Tag, providing the key (x, y, z) and timestamp). This simple approach of just upgrading SimpleRegionFileReader would be easy to do and far less new code. When writing back to disk it would also on average create a more compact region file.

But does that really match what a region file is and what it is used for? From my point of view, a region file is more like a random access mutable key-value database that uses a file for backing. Think about an NBT editor or another tool that wants to access and modify individual chunks. This is also how Minecraft uses region files. Iteration is not a region file's primary nature by design, but can also be implemented with not much extra code, even with the optimization I suggest below. (should be fun)

There is not much point in having a RegionFile object (as you can see, SimpleRegionFileReader does not) unless you are caching chunks in it (bad! let the user do that! :P), or using it like a database. This is likely where my misunderstanding of your design goals happened.
Often I find a good design is fast anyway, so let's do that last.
Once I start loading chunks lazily, things start to get way more complicated. It will at least bring back in the state of the open |
I don't see how it is complicated unless you are caching. Can you give me an example?
I am confused, the only implementation that does not keep an open FileChannel is SimpleRegionFileReader.
What? Region files were designed for random access both when reading and writing chunks. Whenever you write a chunk, see if it fits in the old location, and if it is too big, search for a space with enough consecutive empty sectors to put it in. The only state a region file abstraction should expose to the user is the region file itself. Caching should be in its own abstraction layer.
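The placement rule described here (reuse the old location if the chunk still fits, otherwise find a run of free sectors) can be sketched as a small allocator. This is only an illustration under stated assumptions: `SectorAllocator` and `place` are hypothetical names, sectors are tracked in an in-memory `BitSet`, and the first two sectors are assumed to be reserved for the header.

```java
import java.util.BitSet;

/** Hypothetical sector bookkeeping for a region file; a set bit means the sector is in use. */
final class SectorAllocator {
    private final BitSet used = new BitSet();

    SectorAllocator() {
        used.set(0, 2); // assumption: sectors 0 and 1 hold the region header
    }

    /**
     * Returns the first sector of a run of 'needed' sectors (needed >= 1).
     * Reuses the old run (oldOffset, oldCount) when it is large enough;
     * pass oldCount == 0 for a chunk that was not stored before.
     */
    int place(int oldOffset, int oldCount, int needed) {
        if (oldCount >= needed) {
            // Fits in the old location: release the tail that is no longer needed.
            used.clear(oldOffset + needed, oldOffset + oldCount);
            return oldOffset;
        }
        if (oldCount > 0) {
            used.clear(oldOffset, oldOffset + oldCount); // too big: give up the old run
        }
        // First-fit search for 'needed' consecutive free sectors.
        int start = used.nextClearBit(0);
        while (true) {
            int nextUsed = used.nextSetBit(start);
            if (nextUsed == -1 || nextUsed - start >= needed) {
                break; // open-ended space at the end of the file, or a large enough gap
            }
            start = used.nextClearBit(nextUsed);
        }
        used.set(start, start + needed);
        return start;
    }
}
```

Because the old run is released before searching, a grown chunk can also land in a gap that overlaps its previous position. A real implementation would additionally have to update the location table and pad the written blob to the sector size.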
Removed all of the chunk caching
This PR adds a new RegionFile class that makes it easy to access NBT data of chunks within a Minecraft Region/Anvil file. Also added some tests. This pull request makes use of the changes in #12, and the test won't work until #10 is merged since 1.13 NBT data is tested too.