Add Region file reading support #13

piegamesde · 2018-08-17T19:05:15Z

This PR adds a new RegionFile class that makes it easy to access NBT data of chunks within a Minecraft Region/Anvil file. Also added some tests.

This pull requests makes use of the changes in #12 and the test won't work until #10 is merged since 1.13 NBT data is tested too.

The `NBTi/oStream`s only provide a boolean flag to tell if the stream is compressed or not. In reality, there are three possible compressions: none, gzip or zlib. The latter one is heavily used to store Chunk NBT data in Minecraft. This commit introduces three constants for the compression. They happen to be matching the ID convention used by Minecraft. This way, no additional case checking is required to handle the stream compression when reading region files. The old methods using a boolean are marked as deprecated from now on, since they are ambiguous about which compression algorithm is meant.

I also stumbled across an InputStream not being closed, and ended up fixing a lot of little warnings like this. Sorry for "hiding" these unrelated changes in a commit, I hope it is okay.

This utility class helps with reading Minecraft Region/Anvil files quickly and easily. (NOT TESTED YET)

Removed a lambda/streams expression

This tag was introduced with MC 1.12.

coveralls · 2018-08-17T19:07:42Z

Coverage increased (+17.8%) to 31.481% when pulling 5a9b4e9 on piegamesde:RegionFile into 7a1b6d9 on flow:develop.

Getting the palette data of that long array is a real pain to get right so this may be useful to others too

And more documentation

xcube16 · 2019-02-15T01:15:43Z

https://github.com/flow/nbt/blob/0dde724db4e363dd6db04d2491436167dbdc68e2/src/main/java/com/flowpowered/nbt/regionfile/RegionFile.java

This is a state machine NIGHTMARE!

RegionFile#loadAllChunks() ick!
RegionFile#unloadAllChunks() ick!!
At first glance I thought RegionChunk#unload() and RegionFile#unloadAllChunks() saved the chunks,
but on closer inspection, someone is trying to re-invent the garbage collector!

The only methods should be:
RegionFile#getChunk(x, z) that reads and returns an Optional<RegionChunk>
maybe some kind of iterator functionality to iterate over the RegionChunks
and when writing chunks is supported (optional):
RegionFile#setChunk(x, z, RegionChunk)
RegionFile#clearChunk(x, z)

Making RegionChunk immutable-ish might help, but whatever works best.

Edit: And don't bother caching RegionChunk's at all. Let the user do that with a wrapper if they really need that. If someone is dumb enough to load the same chunk 1000 times the OS caches hot files anyway.

piegamesde · 2019-02-15T13:51:10Z

@xcube16 You're totally right. I changed those things a few months ago, but forgot to push them to the branch. Sadly, a few other changes have sneaked in and I don't really remember the details, I hope it's still ok.

But thanks for reading through my code in the first place :)

xcube16 · 2019-02-15T16:59:12Z

@piegamesde Your welcome. :)
I came here to see if my own bug had been fixed by other PRs.

Now that I notice it, it looks like there is already a RegionFile reading implementation at https://github.com/flow/nbt/blob/develop/src/main/java/com/flowpowered/nbt/regionfile/SimpleRegionFileReader.java
But it could use some work.

It looks like the last commit to the develop branch was 2 years ago. I hope a maintainer is still active on this project.

xcube16 · 2019-02-16T18:34:29Z

src/main/java/com/flowpowered/nbt/regionfile/Chunk.java

+ * Each instance of this class represents a Minecraft chunk within a Region file. The data is represented as a binary blob in form of a
+ * {@link ByteBuffer}. Each object is self-contained and not linked to any physical file or other object. Each object is immutable and
+ * should be treated as such. The data is set in the constructor</br>
+ * The data must have a length of a multiple of 4096 bytes (to be able to write it to disk more easily). The four first bytes specify the


The internal format of a region file should not be exposed to the user. Things like sector alignment, etc.

The blob has a format of
[length | 4 bytes, big endian][compression_flag | byte][data | length - 1 bytes]
'length' is data's length + 1, 'length' == 0 is a special case (a non existing chunk, don't ask my why mojang has this when normally the offset into the region file is 0 when there is a non existing chunk)
'compression_flag' == 1 means GZIP(Input/Output)Stream is used for compression (unused by minecraft to save chunks, but I would support it anyway),
'compression_flag' == 2 means Inflater(Input/Output)Stream is used for compression

A user will not realize this and try to use NBT(Input/Output)Stream on it directly and get a big surprize!
I would instead expose the uncompressed data (I know NBT(Input/Output)Stream supports compression with GZIP(Input/Output)Stream, but lets not force the user to deal with special cases)

src/main/java/com/flowpowered/nbt/regionfile/Chunk.java

xcube16 · 2019-02-16T19:14:14Z

src/main/java/com/flowpowered/nbt/regionfile/SimpleRegionFileReader.java

@@ -77,7 +78,7 @@
                raf.readFully(data);
                ByteArrayInputStream in = new ByteArrayInputStream(data);
                InflaterInputStream iis = new InflaterInputStream(in);


See. SimpleRegionFileReader does not force NBTInputStream to implement a compression method that only it (region files) ever use. :)

xcube16 · 2019-02-16T19:16:03Z

src/main/java/com/flowpowered/nbt/stream/NBTInputStream.java

+	 * Flag indicating that the given data will be compressed with the ZLIB compression algorithm. This is the default compression method used to compress the nbt
+	 * data of the chunks in Minecraft Region/Anvil files, but only if its compression method is {@code 2} (see the respective format documentation), which is default for all newer versions.
+	 */
+	public static final int ZLIB_COMPRESSION = 2;


You did all this work and added all this extra code just to support a compression method that only region files use? o.O
I would just decompress the data in your region file abstraction and feed uncompressed data to NBTInputStream

There are three ways chunk data can be compressed. And if the NBTInputStream provides syntactic sugar to unwrap one compression, I thought that I should add the other one as well (the first time I tried to load region files I didn't know they had custom compression so I thought there was a bug or so, quite painful).
I do agree that this was kind of a poor design decision (the library should not handle compression/endianness at all), but I wanted this change to be backwards-compatible.

One could just require the user to instantiate EndianSwitchableInputStream and make NBTInputStream's constructor require a DataInput instead of InputStream. I would still have EndianSwitchableInputStream in the library. The current helper constructors that support endianness internally are not that bad though... If its not broken don't fix it.

You could create a CompressedInputStream class that supports all three compression types. Wait why would you do that? The only methods common to all three types are already in InputStream. See where this is going?

xcube16 · 2019-02-16T19:24:41Z

src/test/java/com/flowpowered/nbt/RegionFileTest.java

+	 */
+	@Test
+	public void testRead() throws IOException, URISyntaxException {
+		RegionFile file = new RegionFile(Paths.get(getClass().getResource("/r.1.3.mca").toURI()));


Do we really need to include large binary test files in this project? There is a LOT of unknown data in a minecraft region file. This test does not check the integrity of any of that data, all it does is go "yay! It did not crash!".

The best way to test would be to use a custom region file (or, even better! generate one in the test!) with a known and verifiable NBT structure, empty chunks, the works! Than you can test the NBT when you read it and make shore thing is corrupted.

xcube16 · 2019-02-16T19:45:50Z

src/main/java/com/flowpowered/nbt/regionfile/RegionFile.java

+	 *            The chunk to set. Use {@code null} to remove it.
+	 * @author piegames
+	 */
+	public synchronized void setChunk(int x, int z, Chunk chunk) {


Annotate chunk with @Nullable. Or even better, add a clearChunk(x, z) method

src/main/java/com/flowpowered/nbt/regionfile/RegionFile.java

xcube16 · 2019-02-16T19:50:29Z

src/main/java/com/flowpowered/nbt/regionfile/RegionFile.java

+	 * 
+	 * @author piegames
+	 */
+	public synchronized void writeChanges() throws IOException {


I would just write chunks directly in setChunk(), and remove this method.
Some of this code can be copy n' pasted into setChunk() but if there is anything we can do to make it a little easer to read that would be nice.

I initially thought of this, but if there are frequent changes (or say I modify each chunk a bit), there will be a lot of redundant disk writes and pointless position calculations.

Let the user handle that. If there are not frequent changes you are just creating a cache invalidation hell for the user for no reason.
If the user wants to cache chunks, they know the best way to do that.

/*not final*/ Chunk cachedChunk = RegionFile#getChunk(x, z);

Lets make some changes to to the chunk

chunk = makeSomeChanges(chunk);

hmmm... another frequent change

chunk = anotherFrequentChange(chunk);

Now lets save all the changes at once

RegionFile#setChunk(x, z, chunk)

If you are not convinced yet, think about this...
What happens if you want to cache chunks on the world level instead of at the region level (like what Minecraft does internally)?

Caching the chunks would not be so bad if Chunk was immutable as it would hide the caching form the user. But it would still be a lot of useless (in my opinion) code that would just be there for the off change that someone spams getChunk(). And even if they do, it still might not be a problem (most modern OS's cache hot files anyway).

xcube16 · 2019-02-16T19:51:44Z

src/main/java/com/flowpowered/nbt/regionfile/SimpleRegionFileReader.java

@@ -35,6 +35,7 @@
 import com.flowpowered.nbt.Tag;
 import com.flowpowered.nbt.stream.NBTInputStream;

+@Deprecated


We can also delete most of the code in this class! :D
Just make it wrap and or use the new one.

Bah, this code is so ugly I do not really want to touch this :D
It does not even handle missing chunks or exceptions and has no documentation.

All the more reason to get rid of it I would think. An easy one line of code could replace almost that entire file. Doesn't it make make you feel good when you remove lots code?

Maybe I just like seeing code get deleted. :P

As of now this PR adds 1530 new lines of code and only removes 46.

xcube16 · 2019-02-16T19:58:55Z

src/main/java/com/flowpowered/nbt/ByteArrayTag.java

@@ -50,6 +50,11 @@ public ByteArrayTag(String name, byte[] value) {
        return value;
    }

+    @Override
+    public void setValue(byte[] value) {


Why did you make every Tag mutable!? D': cries
I don't care what @kashike says, this is one of my favorite NBT libraries.
Immutable Tags are one reason for that.

If everyone really wants this.... cries some more
Maybe it should be in its own PR?

Have you ever tried to change the position of an entity in chunk with the NBT data being immutable?

Maybe it should be in its own PR?

IIRC this was, but I merged it onto my dev branch and I later build the API changes on top of it; sorry.

piegamesde · 2019-02-17T19:18:02Z

Let me explain the intended workflow with this classes, this might help you understand my design decisions:

Load the entire file from disk (and parse the header). In most of the cases, I want to access the whole file anyway, so let's make it fast at least. Also, once the file is loaded, I don't care about its state on disk.
Iterate over all (or some) chunks and process them.
- Processing starts by converting this to an NBT Tag which gets parsed as needed (this is the probably most expensive part)
- Analyse or modify it
- (Optionally) create a new chunk object with the modified data (this will serialize it back to a buffer; expensive)
(Optionally) write the changed file back to disk.

xcube16 · 2019-02-18T15:22:14Z

* Load the entire file from disk (and parse the header). In most of the cases, I want to access the whole file anyway, so let's make it fast at least. Also, once the file is loaded, I don't care about its state on disk.

This sounds a lot like what SimpleRegionFileReader.readFile(File) does (reads entire file into a list, all be it with limited metadata for each chunk).

* Iterate over all (or some) chunks and process them.
  
  * Processing starts by converting this to an NBT Tag which gets parsed as needed (this is the probably most expensive part)
  * Analyse or modify it
  * (Optionally) create a new chunk object with the modified data (this will serialize it back to a buffer; expensive)

SimpleRegionFileReader.readFile(File).stream()...

* (Optionally) write the changed file back to disk.

All we would need to do is add SimpleRegionFileReader.writeFile(List<Tag>) (we might want to add an immutable region chunk metadata class to encapsulate Tag providing the key (x, y, z) and timestamp)

This simple approach of just upgrading SimpleRegionFileReader would be easy to do and far less new code. When writing back to disk it would also on average create a more compact region file.

But does that really match what a region file is and what it is used for? From my point of view, a region file is more like a random access mutable key-value database that uses a file for backing.

Think about an NBT editor or another tool that wants to access and modify individual chunks. This is also how Minecraft uses region files. Iteration is not a region file's primary nature by design, but can also be implemented with not much extra code, even with the optimization I suggest below. (should be fun)

There is not much point in having a RegionFile object (as you can see, SimpleRegionFileReader does not) unless you are caching chunks in it (bad! let the user do that! :P), or using it like a database. This is likely where my misunderstanding of your design goals happened.

let's make it fast at least

Often I find a good design is fast anyway so lets do that last.
But now that I think about it, the fastest way to iterate the chunks and or load them all into memory would be to read them consecutively with regard to there location on disk. Your implementation, and not even SimpleRegionFileReader does that.

piegamesde · 2019-03-15T17:23:58Z

Once I start loading chunks lazily, things start to get way more complicated. It will at least bring back in the state of the open FileChannel but that's okay for me. The implementation will at least need to keep all modified chunks in RAM since there is no way around collecting them into one single write.

xcube16 · 2019-03-18T03:28:28Z

Once I start loading chunks lazily, things start to get way more complicated.

I don't see how it is complicated unless you are caching. Can you give me an example?

It will at least bring back in the state of the open FileChannel but that's okay for me.

I am confused, the only implementation that does not keep an open FileChannel is SimpleRegionFileReader.

The implementation will at least need to keep all modified chunks in RAM since there is no way around collecting them into one single write.

What? Region files where designed for random access both when reading and writing chunks. Whenever you write a chunk, see if it fits in the old location, and if it is too big, search for a space with enough consecutive empty sectors to put it in.
If you don't want to implement this, replace the setChunk() method with a setChunks() method that requires all chunks upfront (no random access, but at least it would be better than caching).

The only state a region file abstraction should expose to the user is the region file its self. Caching should be in its own abstraction layer.

Removed all of the chunk caching

sainttx and others added 9 commits July 28, 2017 17:42

Allow tags be mutated

109a674

Merge branch 'develop' of github.com:sainttx/nbt into develop

14d50f2

Removed any internal usage of the now deprecated methods

3601525

I also stumbled across an InputStream not being closed, and ended up fixing a lot of little warnings like this. Sorry for "hiding" these unrelated changes in a commit, I hope it is okay.

Added RegionFile class

f118c1a

This utility class helps with reading Minecraft Region/Anvil files quickly and easily. (NOT TESTED YET)

Made it Java 1.7 compatible

6c05f19

Removed a lambda/streams expression

Fix typo

7c0aafd

Support the long array tag

f0438cd

This tag was introduced with MC 1.12.

Added some tests

40c4f23

piegamesde added 7 commits August 23, 2018 09:56

Chunk data extractor

0dde724

Getting the palette data of that long array is a real pain to get right so this may be useful to others too

Merge branch 'RegionFile' into develop

6314abb

Allow to write region files

dbe4ea2

Fixes and improvements

10e0ffe

Merge branch 'develop' into RegionFile

713b339

Allow to mutate LongArrayTag

18cf9f4

API changes

5a9b4e9

And more documentation

xcube16 reviewed Feb 16, 2019

View reviewed changes

piegamesde added 5 commits March 18, 2019 11:53

API changes (again)

b871a74

Removed all of the chunk caching

Fixed RegionFile tests

bbee1ee

Cleanup SimpleRegionFileReader

ef87596

Optimized NBT streams a bit

a2ede34

Update to Java 8

45b6231

piegamesde closed this Aug 26, 2019

piegamesde deleted the RegionFile branch August 26, 2019 19:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Region file reading support #13

Add Region file reading support #13

piegamesde commented Aug 17, 2018

coveralls commented Aug 17, 2018 •

edited

Loading

xcube16 commented Feb 15, 2019 •

edited

Loading

piegamesde commented Feb 15, 2019 •

edited

Loading

xcube16 commented Feb 15, 2019 •

edited

Loading

xcube16 Feb 16, 2019

xcube16 Feb 16, 2019

xcube16 Feb 16, 2019

piegamesde Feb 17, 2019

xcube16 Feb 28, 2019

xcube16 Feb 16, 2019

xcube16 Feb 16, 2019

xcube16 Feb 16, 2019

piegamesde Feb 17, 2019

xcube16 Feb 28, 2019

xcube16 Feb 16, 2019

piegamesde Feb 17, 2019

xcube16 Feb 28, 2019

xcube16 Feb 16, 2019

piegamesde Feb 17, 2019

piegamesde commented Feb 17, 2019

xcube16 commented Feb 18, 2019

piegamesde commented Mar 15, 2019

xcube16 commented Mar 18, 2019 •

edited

Loading

Add Region file reading support #13

Add Region file reading support #13

Conversation

piegamesde commented Aug 17, 2018

coveralls commented Aug 17, 2018 • edited Loading

xcube16 commented Feb 15, 2019 • edited Loading

piegamesde commented Feb 15, 2019 • edited Loading

xcube16 commented Feb 15, 2019 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

piegamesde commented Feb 17, 2019

xcube16 commented Feb 18, 2019

piegamesde commented Mar 15, 2019

xcube16 commented Mar 18, 2019 • edited Loading

coveralls commented Aug 17, 2018 •

edited

Loading

xcube16 commented Feb 15, 2019 •

edited

Loading

piegamesde commented Feb 15, 2019 •

edited

Loading

xcube16 commented Feb 15, 2019 •

edited

Loading

xcube16 commented Mar 18, 2019 •

edited

Loading