Difference between revisions of "Saxman compression"
From Sega Retro
(Creating page) |
m |
||
Line 1: | Line 1: | ||
− | '''Saxman compression''' is the name given to two nearly identical compression formats used in the ''[[Sonic the Hedgehog 2]]'' game for the [[Sega Genesis|Sega Genesis/Mega Drive]]. Its name was given in honor of the person who cracked the format, [[sonic:Saxman|saxman]]. It is a variation of the [[LZSS]] algorithm. | + | '''Saxman compression''' is the name given to two nearly identical compression formats used in the ''[[Sonic the Hedgehog 2]]'' game for the [[Sega Genesis|Sega Genesis/Mega Drive]]. Its name was given in honor of the person who cracked the format, [[sonic:User:Saxman|saxman]]. It is a variation of the [[LZSS]] algorithm. |
− | The two variants of Saxman compression used in [[Sonic the Hedgehog 2]] are: | + | The two variants of Saxman compression used in ''[[Sonic the Hedgehog 2]]'' are: |
* Saxman/z80: used to compress music. This version includes compressed data size as a (Big endian) 2-byte header at the start of the file. | * Saxman/z80: used to compress music. This version includes compressed data size as a (Big endian) 2-byte header at the start of the file. | ||
* Saxman/68k: used to compress the sound driver. This version receives the compressed data size as a parameter for the decompressor. | * Saxman/68k: used to compress the sound driver. This version receives the compressed data size as a parameter for the decompressor. |
Revision as of 16:17, 2 June 2014
Saxman compression is the name given to two nearly identical compression formats used in the Sonic the Hedgehog 2 game for the Sega Genesis/Mega Drive. Its name was given in honor of the person who cracked the format, saxman. It is a variation of the LZSS algorithm.
The two variants of Saxman compression used in Sonic the Hedgehog 2 are:
- Saxman/z80: used to compress music. This version includes compressed data size as a (Big endian) 2-byte header at the start of the file.
- Saxman/68k: used to compress the sound driver. This version receives the compressed data size as a parameter for the decompressor.
Contents
Compression Theory
Basic Format
The Saxman compression scheme is a basic LZSS compression, but it has the additional capability of having a zero fill without having encountered any zeroes before.
The compressed data follows the following format:
AA BB BB .. AA BB BB ..
The A field is referred to as the description field. This is always 1 byte in length, starting at the beginning of the compressed data, and is followed by the B field, which is referred to as the data field. This goes on for as long as is necessary to cover all the data described in the description field. After that, the pattern repeats until the end of compression sequence is encountered.
Description Field
The description field is made up of 8 bits in little endian bit (not byte) order, which means that the least significant bit is read first; in other words, each byte is read "backwards." So for example, if the description field is:
FE
then the description field, in bits, in the correct order for interpretation, will be:
[0111 1111]
Now, going from left to right, each bit is interpreted in the following way:
- If the bit is a 1, it indicates uncompressed data.
- If the bit is a 0, it indicates either a dictionary match or a zero fill.
When the last bit from a description field is read, the corresponding command is processed normally. After it has finished being processed, if the file has not ended yet, then a new descriptor field will be read from the file. If the end of file is reached (as counted by number of bytes already read, compared to the size of compression data) and there are still description field bits left, they are ignored.
Uncompressed Data
If a 1 is read in the description field, the corresponding byte in the data field is already uncompressed and simply copied to the destination as-is.
Dictionary Match/Zero Fill
Dictionary matches and zero fills are both indicated by a 0 in the description field. The next 2 bytes will then form the offset and length: the format of these bytes is:
[LLLL LLLL] [HHHH CCCC]
The CCCC bits, plus 3, is the number of bytes to be copied (for dictionary matches) or the number of zeros to write (for zero fills). The remaining bits are used to form the position to copy from. Unlike what happens in normal LZSS or in Kosinski, the offset is not completely relative in Saxman compression; instead, the position to copy from is computed based on the following pseudo-code:
source = ((%HHHHLLLLLLLL + $12) & $FFF) | (destination & $F000) ; Combine everything into an address if source >= destination then: ; If source is overshooting destination... source = (source - $1000) & $FFFF ; ... then we move source back $1000 bytes
where "destination" is the relative address where data will be copied to (based on the start of the decoding buffer), and "source" is where data will be copied from in a dictionary match. The "if...then" is there to ensure that we are dealing with a $1000-byte sliding window: dictionary matches that "overshoot" the destination are actually matches in the end of the previous $1000-byte block of the destination. There is one exception: when "destination" is less than $1000 bytes, "overshooting" the destination is actually a zero fill.
Note: the "if...then" is actually present only on the Saxman/68k decoder; the Saxman/z80 decoder has only $800 bytes in its decompression buffer, so the issue it deals with never happens without overwriting important data in the sound driver.
Examples
Zero Fill
The following Saxman/z80-compressed stream:
03 00 00 00 FF
produces the following output:
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The first two bytes are the compressed data length, in Big endian order: that is, the compressed data is 3 bytes. The next byte is a description field: [0000 0000]. Reading the lowest bit first, we get a zero, indicating either a dictionary match or a zero fill; the next two bytes are the source address and length, and come out to a source position of $0F12 (computed as outlined above) and a length of $12. These flag a zero fill, since the source ($0F12) is after the destination ($0000), so we write $12 zeroes. When reading a new bit, we notice that all 3 bytes in the compressed data have been read, so decompression terminates.
Decompression code
An annotated version of the Saxman/z80 decompression code is provided below for reference. The code is taken from the Sonic the Hedgehog 2 sound driver.
;zsub_1271 zSaxmanDec: exx ld bc,0 ld de,0 exx ld de,zMusicData ld c,(hl) inc hl ld b,(hl) ; bc = (hl) i.e. "size of song" inc hl ld (zGetNextByte+1),hl ; modify inst. @ zGetNextByte -- set to beginning of decompression stream inc bc ld (zDecEndOrGetByte+1),bc ; modify inst. @ zDecEndOrGetByte -- set to length of song, +1 ;zloc_1288 zSaxmanReadLoop: exx ; shadow reg set srl c ; c >> 1 (active control byte) srl b ; b >> 1 (just a mask that lets us know when we need to reload) bit 0,b ; test next bit of 'b' jr nz,+ ; if it's set, we still have bits left in 'c'; jump to '+' ; If you get here, we're out of bits in 'c'! call zDecEndOrGetByte ; get next byte -> 'a' ld c,a ; a -> 'c' ld b,0FFh ; b = FFh (8 new bits in 'c') + bit 0,c ; test next bit of 'c' exx ; normal reg set jr z,+ ; if bit not set, it's a compression bit; jump to '+' ; If you get here, there's a non-compressed byte call zDecEndOrGetByte ; get next byte -> 'a' ld (de),a ; store it directly to the target memory address inc de ; de++ exx ; shadow reg set inc de ; Also increase shadow-side 'de'... relative pointer only, does not point to output Z80_RAM exx ; normal reg set jr zSaxmanReadLoop ; loop back around... + call zDecEndOrGetByte ; get next byte -> 'a' ld c,a ; a -> 'c' (low byte of target address) call zDecEndOrGetByte ; get next byte -> 'a' ld b,a ; a -> 'b' (high byte of target address + count) and 0Fh ; keep only lower four bits... add a,3 ; add 3 (minimum 3 bytes are to be read in this mode) push af ; save 'a'... ld a,b ; b -> 'a' (low byte of target address) rlca rlca rlca rlca and 0Fh ; basically (b >> 4) & 0xF (upper four bits now exclusively as lower four bits) ld b,a ; a -> 'b' (only upper four bits of value make up part of the address) ld a,c add a,12h ld c,a adc a,b sub c and 0Fh ld b,a ; bc += 12h pop af ; restore 'a' (byte count to read; no less than 3) exx ; shadow reg set push de ; keep current 'de' (relative pointer) value... ld l,a ; how many bytes we will read -> 'hl' ld h,0 add hl,de ; add current relative pointer... ex de,hl ; effectively, de += a exx ; normal reg set pop hl ; shadow 'de' -> 'hl' (relative pointer, prior to all bytes read, relative) or a ; Clear carry sbc hl,bc ; hl -= bc jr nc,+ ; if result positive, jump to '+' ex de,hl ; current output pointer -> 'hl' ld b,a ; how many bytes to load -> 'b' - ld (hl),0 ; fill in zeroes that many times inc hl djnz - ex de,hl ; output pointer updated jr zSaxmanReadLoop ; loop back around... + ld hl,zMusicData ; point at beginning of decompression point add hl,bc ; move ahead however many bytes ld c,a ld b,0 ldir jr zSaxmanReadLoop ; End of function zSaxmanDec ; ||||||||||||||| S U B R O U T I N E ||||||||||||||||||||||||||||||||||||||| ; This is an ugly countdown to zero implemented in repeatedly modifying code!! ; But basically, it starts at the full length of the song +1 (so it can decrement) ; and waits until 'hl' decrements to zero ;zsub_12E8 zDecEndOrGetByte: ld hl,0 ; "self-modified code" -- starts at full length of song +1, waits until it gets to 1... dec hl ; ... where this will be zero ld (zDecEndOrGetByte+1),hl ; "self-modifying code" -- update the count in case it's not zero ld a,h or l jr z,+ ; If 'h' and 'l' both equal zero, we quit!! ;zloc_12F3 zGetNextByte: ld hl,0 ; "self-modified code" -- get address of next compressed byte ld a,(hl) ; put it into -> 'a' inc hl ; next byte... ld (zGetNextByte+1),hl ; change inst @ zGetNextByte so it loads next compressed byte ret ; still going... + pop hl ; throws away return address to this function call so that next 'ret' exits decompressor (we're done!) ret ; Exit decompressor ; End of function zDecEndOrGetByte