New sprite compressor #5627

Open
wants to merge 150 commits into
base: upcoming

hedara90
Collaborator

@hedara90 hedara90 commented Oct 31, 2024

Description

Improved compression algorithm for 4bpp images on the GBA.
It uses a modified LZ compression scheme together with an entropy coding method called tabled Asymmetric Numeral Systems (tANS).

Note: It's recommended that secondary tilesets use either .lz or .fastSmol to avoid stutters when crossing map boundaries.

People who collaborated with me in this PR

A lot of people.
@DizzyEggg Improved performance massively for the instruction decoding.
@mrgriffin, SBird and @tertu-m, who have done even more optimization and answered my questions about the GBA hardware whenever I had them.

Feature(s) this PR does NOT handle:

A config option to switch automatically; currently images must be manually set to .4bpp.smol.
Automatically scaling back compression mode to a faster variant if some threshold is exceeded.

Things to note in the release changelog:

  • LZDecompressVram and LZDecompressWram have been deprecated. All calls to decompress LZ compressed data should use the wrapper functions DecompressDataWithHeaderVram or DecompressDataWithHeaderWram.
  • A new sprite compression format has been introduced. To use it, replace files with .4bpp.lz with .4bpp.smol or .4bpp.fastSmol.
  • .smol is a compression format utilizing entropy encoding in the form of tabled Asymmetric Numeral Systems (tANS) and a modified LZ style RLE/Dictionary encoding scheme specialized for the sprites used in Pokemon games. This is approximately 25% smaller than the default LZ77 compression.
  • .fastSmol skips the entropy encoding and is therefore slightly larger than the default LZ77 compression, but decoding sprites is faster than the default LZ77 compression.
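
To illustrate why a single header-aware wrapper can replace direct LZ calls, here is a minimal sketch of dispatching on the first word of the compressed data. Only the LZ77 header layout (compression type 1 in bits 4-7, decompressed size in bits 8-31) is the standard GBA BIOS format. The function names below are hypothetical, and how the real wrappers in this PR identify .smol data is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* GBA BIOS LZ77 header word: bits 4-7 hold the compression type (1 = LZ77),
 * bits 8-31 hold the decompressed size in bytes. */
#define BIOS_COMPRESSION_TYPE_LZ77 1u

/* Hypothetical helper: true if the header word announces BIOS-style LZ77. */
bool IsLz77Header(uint32_t header)
{
    return ((header >> 4) & 0xFu) == BIOS_COMPRESSION_TYPE_LZ77;
}

/* Hypothetical helper: decompressed size stored in an LZ77 header word. */
uint32_t Lz77DecompressedSize(uint32_t header)
{
    return header >> 8;
}

/* Hypothetical dispatch: a wrapper would call the old LZ routine for LZ77
 * data and hand everything else to the new decoder. Returns 0 for LZ77 and
 * 1 for "other", standing in for the two code paths. */
int ChooseDecoder(uint32_t header)
{
    return IsLz77Header(header) ? 0 : 1;
}
```

With this shape, call sites never need to know which format a given file was built with; switching a file from .4bpp.lz to another format changes only the data, not the caller.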

Discord contact info

hedara

@hedara90 hedara90 added the new-feature Adds a feature label Oct 31, 2024
@Bassoonian Bassoonian added this to the 1.11 milestone Oct 31, 2024
@mrgriffin
Collaborator

mrgriffin commented Nov 7, 2024

I haven't looked into the performance yet at all, but I was wondering if you've done any comparisons between large LZ loads and large smol loads? Off the top of my head, I'd expect warps and connections to be quite heavy? A warp loads up to 32kB of tiles (primary + secondary), and a connection loads up to 16kB of tiles by default (secondary with NUM_TILES_IN_PRIMARY = 512). For connections in particular I'd be concerned about the load taking more than a frame and causing a hitch. There may be some UIs which also load large amounts of compressed data, the Frontier Pass or PokéNav region map maybe?
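
For reference, the 32kB and 16kB figures follow from the 4bpp tile size. The sketch below works through the arithmetic; NUM_TILES_TOTAL = 1024 is an assumption inferred from the 32kB figure, not a value quoted in this thread.

```c
#include <stdint.h>

/* An 8x8 tile at 4 bits per pixel occupies 8 * 8 * 4 / 8 = 32 bytes. */
#define TILE_SIZE_4BPP_BYTES (8u * 8u * 4u / 8u)

#define NUM_TILES_IN_PRIMARY 512u   /* default named in the comment above */
#define NUM_TILES_TOTAL      1024u  /* assumption: primary + secondary */

/* Uncompressed size in bytes of a block of 4bpp tiles. */
uint32_t TileBytes(uint32_t numTiles)
{
    return numTiles * TILE_SIZE_4BPP_BYTES;
}

/* Worst-case bytes loaded on a warp (primary + secondary tilesets). */
uint32_t WarpLoadBytes(void)
{
    return TileBytes(NUM_TILES_TOTAL);
}

/* Worst-case bytes loaded on a connection (secondary tileset only). */
uint32_t ConnectionLoadBytes(void)
{
    return TileBytes(NUM_TILES_TOTAL - NUM_TILES_IN_PRIMARY);
}
```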

I'm not sure if there's any source-available and/or open source games out there with graphics that are more complex and detailed than vanilla, but it would be good to benchmark against those if possible. Tilesets and battle backgrounds are the first things that come to mind for potentially-significant differences in fidelity between GF and the community.

EDIT: Just to be clear, the exact cycle counts don't matter except insofar as they affect the frame counts. If a frame has (e.g.) 100k cycles that go unused then smol taking 99k cycles is free, and 101k cycles drops a frame.

@hedara90
Collaborator Author

hedara90 commented Nov 7, 2024

NOTE: THIS DATA IS EXTREMELY OUT OF DATE, DECODING TIMES HAVE BEEN MASSIVELY REDUCED
It takes quite a while for large images.
No tANS

[WARN] GBA Debug:	Mode: 0
[WARN] GBA Debug:	Bitsteam size: 0
[WARN] GBA Debug:	tANS table build time: 67
[WARN] GBA Debug:	LO decoding time: 40
[WARN] GBA Debug:	Sym decoding time: 38
[WARN] GBA Debug:	Unencoded copy time: 127357
[WARN] GBA Debug:	Instruction decoding time: 535601
[WARN] GBA Debug:	Total time: 663103

With tANS

[WARN] GBA Debug:	Mode: 5
[WARN] GBA Debug:	Bitsteam size: 1883
[WARN] GBA Debug:	tANS table build time: 17702
[WARN] GBA Debug:	LO decoding time: 383654
[WARN] GBA Debug:	Sym decoding time: 2061221
[WARN] GBA Debug:	Unencoded copy time: 75
[WARN] GBA Debug:	Instruction decoding time: 369126
[WARN] GBA Debug:	Total time: 2831778

Most of the time is spent doing symbol decoding, which makes sense, because it's a lot of symbols.
For comparison, LZ took 454313 cycles for the same image.
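
As a quick sanity check on the logs above, the sketch below sums the per-stage cycle counts and expresses one value as a percentage of another. The struct and function names are made up for illustration; the numbers come straight from the (outdated) logs.

```c
#include <stdint.h>

/* Per-stage cycle counts as printed in the debug log (names are illustrative). */
struct DecodeTimings
{
    uint32_t tableBuild;
    uint32_t loDecode;
    uint32_t symDecode;
    uint32_t unencodedCopy;
    uint32_t instrDecode;
};

/* Sum of all stages; should match the "Total time" line of the log. */
uint32_t TotalCycles(const struct DecodeTimings *t)
{
    return t->tableBuild + t->loDecode + t->symDecode
         + t->unencodedCopy + t->instrDecode;
}

/* part as a whole-number percentage of whole (truncated). */
uint32_t PercentOf(uint32_t part, uint32_t whole)
{
    return (uint32_t)(((uint64_t)part * 100u) / whole);
}
```

With the tANS log, the stages sum to 2831778 cycles, symbol decoding accounts for about 72% of that, and the total is roughly 623% of the 454313 cycles LZ needed. The non-tANS total of 663103 cycles is roughly 145% of the LZ figure.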

@mrgriffin
Collaborator

mrgriffin commented Nov 7, 2024

It takes quite a while for large images.

Thanks for the numbers! Which image is that you're using?

For reference a frame has 280896 cycles, so:

  • LZ decode is taking 1.62 frames, i.e. 2 whole frames, ~33ms.
  • Non-tANS decode is taking 2.36 frames, i.e. 3 whole frames, ~50ms.
  • tANS decode is taking 10.08 frames, i.e. 11 whole frames, ~183ms.
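
The conversion behind these figures can be sketched as follows. It assumes the millisecond values are whole frames at a nominal 60 Hz (the GBA's real refresh rate is slightly lower), and the function names are illustrative only.

```c
#include <stdint.h>

#define CYCLES_PER_FRAME 280896u  /* CPU cycles in one GBA frame */

/* Frame count in hundredths of a frame, rounded to nearest. */
uint32_t FramesTimes100(uint32_t cycles)
{
    uint64_t scaled = (uint64_t)cycles * 100u;
    return (uint32_t)((scaled + CYCLES_PER_FRAME / 2u) / CYCLES_PER_FRAME);
}

/* Whole frames a decode occupies: a partly used frame still counts. */
uint32_t WholeFramesNeeded(uint32_t cycles)
{
    return (cycles + CYCLES_PER_FRAME - 1u) / CYCLES_PER_FRAME;
}

/* Milliseconds for a number of whole frames, assuming a nominal 60 Hz. */
uint32_t FramesToMs(uint32_t frames)
{
    return frames * 1000u / 60u;
}
```

For example, the LZ decode at 454313 cycles gives FramesTimes100 = 162, WholeFramesNeeded = 2 and FramesToMs(2) = 33.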

It's possible that things could take an extra frame due to the overhead of everything else that goes on. Note that, e.g. the v-blank handler is a tax on every frame, and on the OW for connections a fair chunk of cycles may be spent on preparing the OAM buffer.

I'm sure it'll be possible to speed up the decode implementation somewhat. I'm not confident that a 5x speed-up is available, but it's not unheard-of :)

I suppose even if it's not possible to match LZ speeds we can always have smol as opt-in for the files which have a noticeable performance impact. That way downstream users have a way to trade performance for space if they reach a point where they've used up all 32MB.

@DizzyEggg
Collaborator

I know this file has over 10k additions, but given that it has 139 commits, it may be a better idea to squash and merge it. Thoughts?

@hedara90
Collaborator Author

Basically all of the commits are in 3 files, test/compression/smol.c, include/decompress.h and src/decompress.c.
Those are files that nobody should touch.
I think the only thing I did outside of those files was replace the calls to the decompression functions with the new wrappers for compressed data with a header.
So squash it, for the sake of the commit history.

@hedara90
Collaborator Author

Conflicts resolved

Comment on lines +878 to +882
// Inject OW decompression here
if (OW_GFX_COMPRESS && sprite->sheetSpan)
{
    imageValue = (imageValue + 1) << sprite->sheetSpan;
}
Collaborator Author


Comments are left here for future reference for a potential overworld spritesheet decompressor

Comment on lines +940 to +943
{
    // Inject OW frame switcher here
    imageValue = (imageValue + 1) << sprite->sheetSpan;
}
Collaborator Author


Comments are left here for future reference for a potential overworld spritesheet decompressor

Labels
new-feature Adds a feature type: big feature A feature with big diffs and / or high impact / subjectivity / pervasiveness
6 participants