Skip to content
This repository has been archived by the owner on Jun 26, 2019. It is now read-only.

Add Region file reading support #13

Closed
wants to merge 21 commits into from

Conversation

piegamesde
Copy link

This PR adds a new RegionFile class that makes it easy to access NBT data of chunks within a Minecraft Region/Anvil file. Also added some tests.

This pull requests makes use of the changes in #12 and the test won't work until #10 is merged since 1.13 NBT data is tested too.

sainttx and others added 9 commits July 28, 2017 17:42
The `NBTi/oStream`s only provide a boolean flag to tell if the stream is compressed or not. In reality, there are three possible compressions: none, gzip or zlib. The latter one is heavily used to store Chunk NBT data in Minecraft.

This commit introduces three constants for the compression. They happen to be matching the ID convention used by Minecraft. This way, no additional case checking is required to handle the stream compression when reading region files.

The old methods using a boolean are marked as deprecated from now on, since they are ambiguous about which compression algorithm is meant.
I also stumbled across an InputStream not being closed, and ended up fixing a lot of little warnings like this. Sorry for "hiding" these unrelated changes in a commit, I hope it is okay.
This utility class helps with reading Minecraft Region/Anvil files quickly and easily. (NOT TESTED YET)
Removed a lambda/streams expression
This tag was introduced with MC 1.12.
@coveralls
Copy link

coveralls commented Aug 17, 2018

Coverage Status

Coverage increased (+17.8%) to 31.481% when pulling 5a9b4e9 on piegamesde:RegionFile into 7a1b6d9 on flow:develop.

@xcube16
Copy link

xcube16 commented Feb 15, 2019

https://github.com/flow/nbt/blob/0dde724db4e363dd6db04d2491436167dbdc68e2/src/main/java/com/flowpowered/nbt/regionfile/RegionFile.java

This is a state machine NIGHTMARE!

RegionFile#loadAllChunks() ick!
RegionFile#unloadAllChunks() ick!!
At first glance I thought RegionChunk#unload() and RegionFile#unloadAllChunks() saved the chunks,
but on closer inspection, someone is trying to re-invent the garbage collector!

The only methods should be:
RegionFile#getChunk(x, z) that reads and returns an Optional<RegionChunk>
maybe some kind of iterator functionality to iterate over the RegionChunks
and when writing chunks is supported (optional):
RegionFile#setChunk(x, z, RegionChunk)
RegionFile#clearChunk(x, z)

Making RegionChunk immutable-ish might help, but whatever works best.

Edit: And don't bother caching RegionChunk's at all. Let the user do that with a wrapper if they really need that. If someone is dumb enough to load the same chunk 1000 times the OS caches hot files anyway.

@piegamesde
Copy link
Author

piegamesde commented Feb 15, 2019

@xcube16 You're totally right. I changed those things a few months ago, but forgot to push them to the branch. Sadly, a few other changes have sneaked in and I don't really remember the details, I hope it's still ok.

But thanks for reading through my code in the first place :)

@xcube16
Copy link

xcube16 commented Feb 15, 2019

@piegamesde Your welcome. :)
I came here to see if my own bug had been fixed by other PRs.

Now that I notice it, it looks like there is already a RegionFile reading implementation at https://github.com/flow/nbt/blob/develop/src/main/java/com/flowpowered/nbt/regionfile/SimpleRegionFileReader.java
But it could use some work.

It looks like the last commit to the develop branch was 2 years ago. I hope a maintainer is still active on this project.

* Each instance of this class represents a Minecraft chunk within a Region file. The data is represented as a binary blob in form of a
* {@link ByteBuffer}. Each object is self-contained and not linked to any physical file or other object. Each object is immutable and
* should be treated as such. The data is set in the constructor</br>
* The data must have a length of a multiple of 4096 bytes (to be able to write it to disk more easily). The four first bytes specify the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The internal format of a region file should not be exposed to the user. Things like sector alignment, etc.

The blob has a format of
[length | 4 bytes, big endian][compression_flag | byte][data | length - 1 bytes]
'length' is data's length + 1, 'length' == 0 is a special case (a non existing chunk, don't ask my why mojang has this when normally the offset into the region file is 0 when there is a non existing chunk)
'compression_flag' == 1 means GZIP(Input/Output)Stream is used for compression (unused by minecraft to save chunks, but I would support it anyway),
'compression_flag' == 2 means Inflater(Input/Output)Stream is used for compression

A user will not realize this and try to use NBT(Input/Output)Stream on it directly and get a big surprize!
I would instead expose the uncompressed data (I know NBT(Input/Output)Stream supports compression with GZIP(Input/Output)Stream, but lets not force the user to deal with special cases)

src/main/java/com/flowpowered/nbt/regionfile/Chunk.java Outdated Show resolved Hide resolved
@@ -77,7 +78,7 @@
raf.readFully(data);
ByteArrayInputStream in = new ByteArrayInputStream(data);
InflaterInputStream iis = new InflaterInputStream(in);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See. SimpleRegionFileReader does not force NBTInputStream to implement a compression method that only it (region files) ever use. :)

* Flag indicating that the given data will be compressed with the ZLIB compression algorithm. This is the default compression method used to compress the nbt
* data of the chunks in Minecraft Region/Anvil files, but only if its compression method is {@code 2} (see the respective format documentation), which is default for all newer versions.
*/
public static final int ZLIB_COMPRESSION = 2;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did all this work and added all this extra code just to support a compression method that only region files use? o.O
I would just decompress the data in your region file abstraction and feed uncompressed data to NBTInputStream

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three ways chunk data can be compressed. And if the NBTInputStream provides syntactic sugar to unwrap one compression, I thought that I should add the other one as well (the first time I tried to load region files I didn't know they had custom compression so I thought there was a bug or so, quite painful).
I do agree that this was kind of a poor design decision (the library should not handle compression/endianness at all), but I wanted this change to be backwards-compatible.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could just require the user to instantiate EndianSwitchableInputStream and make NBTInputStream's constructor require a DataInput instead of InputStream. I would still have EndianSwitchableInputStream in the library. The current helper constructors that support endianness internally are not that bad though... If its not broken don't fix it.

You could create a CompressedInputStream class that supports all three compression types. Wait why would you do that? The only methods common to all three types are already in InputStream. See where this is going?

*/
@Test
public void testRead() throws IOException, URISyntaxException {
RegionFile file = new RegionFile(Paths.get(getClass().getResource("/r.1.3.mca").toURI()));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need to include large binary test files in this project? There is a LOT of unknown data in a minecraft region file. This test does not check the integrity of any of that data, all it does is go "yay! It did not crash!".

The best way to test would be to use a custom region file (or, even better! generate one in the test!) with a known and verifiable NBT structure, empty chunks, the works! Than you can test the NBT when you read it and make shore thing is corrupted.

* The chunk to set. Use {@code null} to remove it.
* @author piegames
*/
public synchronized void setChunk(int x, int z, Chunk chunk) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Annotate chunk with @Nullable. Or even better, add a clearChunk(x, z) method

*
* @author piegames
*/
public synchronized void writeChanges() throws IOException {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just write chunks directly in setChunk(), and remove this method.
Some of this code can be copy n' pasted into setChunk() but if there is anything we can do to make it a little easer to read that would be nice.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I initially thought of this, but if there are frequent changes (or say I modify each chunk a bit), there will be a lot of redundant disk writes and pointless position calculations.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let the user handle that. If there are not frequent changes you are just creating a cache invalidation hell for the user for no reason.
If the user wants to cache chunks, they know the best way to do that.

/*not final*/ Chunk cachedChunk = RegionFile#getChunk(x, z);

Lets make some changes to to the chunk

chunk = makeSomeChanges(chunk);

hmmm... another frequent change

chunk = anotherFrequentChange(chunk);

Now lets save all the changes at once

RegionFile#setChunk(x, z, chunk)

If you are not convinced yet, think about this...
What happens if you want to cache chunks on the world level instead of at the region level (like what Minecraft does internally)?

Caching the chunks would not be so bad if Chunk was immutable as it would hide the caching form the user. But it would still be a lot of useless (in my opinion) code that would just be there for the off change that someone spams getChunk(). And even if they do, it still might not be a problem (most modern OS's cache hot files anyway).

@@ -35,6 +35,7 @@
import com.flowpowered.nbt.Tag;
import com.flowpowered.nbt.stream.NBTInputStream;

@Deprecated
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also delete most of the code in this class! :D
Just make it wrap and or use the new one.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bah, this code is so ugly I do not really want to touch this :D
It does not even handle missing chunks or exceptions and has no documentation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the more reason to get rid of it I would think. An easy one line of code could replace almost that entire file. Doesn't it make make you feel good when you remove lots code?

Maybe I just like seeing code get deleted. :P

As of now this PR adds 1530 new lines of code and only removes 46.

@@ -50,6 +50,11 @@ public ByteArrayTag(String name, byte[] value) {
return value;
}

@Override
public void setValue(byte[] value) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make every Tag mutable!? D': cries
I don't care what @kashike says, this is one of my favorite NBT libraries.
Immutable Tags are one reason for that.

If everyone really wants this.... cries some more
Maybe it should be in its own PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you ever tried to change the position of an entity in chunk with the NBT data being immutable?

Maybe it should be in its own PR?

IIRC this was, but I merged it onto my dev branch and I later build the API changes on top of it; sorry.

@piegamesde
Copy link
Author

Let me explain the intended workflow with this classes, this might help you understand my design decisions:

  • Load the entire file from disk (and parse the header). In most of the cases, I want to access the whole file anyway, so let's make it fast at least. Also, once the file is loaded, I don't care about its state on disk.
  • Iterate over all (or some) chunks and process them.
    • Processing starts by converting this to an NBT Tag which gets parsed as needed (this is the probably most expensive part)
    • Analyse or modify it
    • (Optionally) create a new chunk object with the modified data (this will serialize it back to a buffer; expensive)
  • (Optionally) write the changed file back to disk.

@xcube16
Copy link

xcube16 commented Feb 18, 2019

* Load the entire file from disk (and parse the header). In most of the cases, I want to access the whole file anyway, so let's make it fast at least. Also, once the file is loaded, I don't care about its state on disk.

This sounds a lot like what SimpleRegionFileReader.readFile(File) does (reads entire file into a list, all be it with limited metadata for each chunk).

* Iterate over all (or some) chunks and process them.
  
  * Processing starts by converting this to an NBT Tag which gets parsed as needed (this is the probably most expensive part)
  * Analyse or modify it
  * (Optionally) create a new chunk object with the modified data (this will serialize it back to a buffer; expensive)

SimpleRegionFileReader.readFile(File).stream()...

* (Optionally) write the changed file back to disk.

All we would need to do is add SimpleRegionFileReader.writeFile(List<Tag>) (we might want to add an immutable region chunk metadata class to encapsulate Tag providing the key (x, y, z) and timestamp)

This simple approach of just upgrading SimpleRegionFileReader would be easy to do and far less new code. When writing back to disk it would also on average create a more compact region file.

But does that really match what a region file is and what it is used for? From my point of view, a region file is more like a random access mutable key-value database that uses a file for backing.

Think about an NBT editor or another tool that wants to access and modify individual chunks. This is also how Minecraft uses region files. Iteration is not a region file's primary nature by design, but can also be implemented with not much extra code, even with the optimization I suggest below. (should be fun)

There is not much point in having a RegionFile object (as you can see, SimpleRegionFileReader does not) unless you are caching chunks in it (bad! let the user do that! :P), or using it like a database. This is likely where my misunderstanding of your design goals happened.

let's make it fast at least

Often I find a good design is fast anyway so lets do that last.
But now that I think about it, the fastest way to iterate the chunks and or load them all into memory would be to read them consecutively with regard to there location on disk. Your implementation, and not even SimpleRegionFileReader does that.

@piegamesde
Copy link
Author

Once I start loading chunks lazily, things start to get way more complicated. It will at least bring back in the state of the open FileChannel but that's okay for me. The implementation will at least need to keep all modified chunks in RAM since there is no way around collecting them into one single write.

@xcube16
Copy link

xcube16 commented Mar 18, 2019

Once I start loading chunks lazily, things start to get way more complicated.

I don't see how it is complicated unless you are caching. Can you give me an example?

It will at least bring back in the state of the open FileChannel but that's okay for me.

I am confused, the only implementation that does not keep an open FileChannel is SimpleRegionFileReader.

The implementation will at least need to keep all modified chunks in RAM since there is no way around collecting them into one single write.

What? Region files where designed for random access both when reading and writing chunks. Whenever you write a chunk, see if it fits in the old location, and if it is too big, search for a space with enough consecutive empty sectors to put it in.
If you don't want to implement this, replace the setChunk() method with a setChunks() method that requires all chunks upfront (no random access, but at least it would be better than caching).

The only state a region file abstraction should expose to the user is the region file its self. Caching should be in its own abstraction layer.

@piegamesde piegamesde closed this Aug 26, 2019
@piegamesde piegamesde deleted the RegionFile branch August 26, 2019 19:31
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants