Difference between revisions of "Nemesis compression"

From Sega Retro

old>Hivebrain
(35 intermediate revisions by 15 users not shown)
Line 1: Line 1:
'''Nemesis compression''' is the method created by [[Sega]] to reduce the amount of space that graphics data takes up in a [[Sega Megadrive]] [[ROM]]. A decompression algorithm takes the compressed data, unpacks it and transfers it to the [[VRAM]]. The graphics can then be displayed on the screen. The Nemesis format is specifically designed for Megadrive graphics, as it relies on the length of the data being divisible by $20 (hex), which is the exact number of bytes in a single [[8x8 tile]]. Although Nemesis is also used for 16x16 [[block mappings]] in Sonic CD, because the data conforms to the $20 rule.
+
'''Nemesis compression''' is the method created by [[Sega]] to reduce the amount of space that graphics data takes up in a [[Sega Mega Drive]] [[ROM]]. A decompression algorithm takes the compressed data, unpacks it and transfers it to the [[VRAM]]. The graphics can then be displayed on the screen. The Nemesis format is specifically designed for Mega Drive graphics, as it relies on the length of the data being divisible by $20 (hex), which is the exact number of bytes in a single [[8x8 tile]]. However, Nemesis compression is also used for 16x16 [[block mappings]] in Sonic CD, because the data conforms to the $20 rule.
  
Nemesis compression is named after [[Nemesis]], the hacker who first created a program to decompress it. The format is used in numerous Megadrive games, especially [[Sonic the Hedgehog]] games.
+
Nemesis compression is named after Nemesis, the hacker who first created a program to decompress it. The format is used in numerous Mega Drive games, especially [[Sonic the Hedgehog]] games.
  
 
==Compression Theory==
 
==Compression Theory==
The Nemesis compression format uses a traditional entropy encoding method to compress data. The source data is analysed, and a table is built of the most commonly occurring 16-bit sequences in that data. This table shows which data sequences will reduce the size of the data the most if they are compressed. In the Nemesis compression format, the table is implemented in the form of a binary tree. The most commonly occurring values are replaced with codes, which represent entries in the tree. Each code is a series of bits which represent which branch of the tree to follow. The most commonly occurring values will be encoded to require less branches to reach, and hence can be represented with very few bits. If a particular 16-bit sequence makes up 50% or more of the data, an entire root-level branch of the binary tree may be dedicated to that value, allowing a single bit to represent each occurance of that 16-bit sequence.
+
The Nemesis compression format uses both [[wikipedia:Run-length_encoding|run-length encoding]] and [[wikipedia:Entropy_encoding|entropy encoding]] to compress data. The source data is broken into nybbles which are then grouped into runs of 1 to 8 nybbles in length; there are multiple ways to do this grouping, which is called a ''parse''. Given a parse of the file into nybble runs, the file will then be entropy encoded based on the frequency of each nybble run in the file; the code of each nybble run is used to build a dictionary which is used when encoding and decoding the file.
  
In order to decompress archives compressed in this format, the binary tree must also be provided along with the compressed data stream. As a result, each Nemesis archive contains two sections, the first being the compressed binary tree, and the second being the compressed data itself. For performance reasons during decompression, the binary tree is flattened out into a simple table. This table is compressed using a form of run-length encoding. The higher the compression ratio for the data is, the simpler and more compressable the table will be, so archives which have the highest data compression ratio will also have the smallest compressed tree sizes.
+
The final Nemesis compressed file is then composed of a 2-byte header, a dictionary associating nybble runs to their respective codes, and a bitstream (the ''encoded bitstream'') where each nybble run in the parse is replaced with its corresponding code in the dictionary. The dictionary is present in the compressed file so that it is possible to decode it; the decoder used in Mega Drive [[ROM]]s converts the dictionary into a simple table, using the code read as an index.
  
The original compression program used by Sega to compress data into Nemesis archives most likely used the Shannon-Fano coding technique to build the binary tree. As a result, most Nemesis archives found within Megadrive roms are not optimal, and can be improved upon when more accurate methods, such as Huffman coding, are used to build the tree. As a result, compression tools for the Nemesis format have subsiquently been created which provide greater compression than the tools used by the original developers.
+
There are some restrictions to what forms of entropy encoding can be used: first, is limited to 8 bits for each encoded nybble run; this happens because the dictionary leaves only 8 bits for each code. Second, the set of all codes in the dictionary must form a [[wikipedia:prefix-free code|prefix-free code]]; this makes the encoded bitstream unambiguous in its decoding, but it reduces the number of codes that can be used when entropy-encoding the file. Third, a nibble run can be directly inlined using a special 6-bit code prefix that was specified in the format; because the entropy encoding must result in a prefix-free code, neither this inlining bit pattern, nor any of its prefixes, can be used as codes when building the dictionary.
  
The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provides high compression however, it also took a fair amount of time for the 68000 CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame.
+
As noted above, there are multiple ways to parse the input file into nybble runs; each such parse will give rise to different dictionaries because the different frequencies of the nybble runs will change the resulting entropy encoding. Given a parse, the [[wikipedia:Package-merge_algorithm|package-merge algorithm]] can be used to find an optimal dictionary; but since it is not possible to know the length of each dictionary entry beforehand, or how many entries are useful to have in the dictionary, the final file size can't be known during the parsing stage. Finding a parse that gives the smallest possible file size is difficult, particularly since the number of possible parses is staggering even for a small file. Nevertheless, more modern techniques, including the aforementioned package-merge algorithm, have generally improved compression ratios compared to what was found within Mega Drive [[ROM]]s.
  
==Usage in Megadrive Games==
+
The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provided high compression, it also took a fair amount of time for the [[68000]] CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame; some variants of the decoding routines can decode 6 or even 12 tiles in a frame (with a correspondingly larger decompression buffer), but they are much more rarely used.
The Nemesis compression format was widely used in many Megadrive games, both those produced by Sega, as well as a variety of third party developers. Additionally, the decompression routine for Nemesis archives embedded in roms that use the format is identical in the games in which it was used. It can be concluded from this that the Nemesis compression format was included as one of the standard libraries provided by Sega in their SDK for the console.
 
  
Although Nemesis archives were used for all the compressed art in Sonic 1, for Sonic 2, the main block of art used for level tiles in each stage was compressed using the [[Kosinski compression]] format. This was almost certanly done to decrease load times at the start of the level. Kosinski is an LZ77-based compression algorithm, which provides extremely fast decompression. While the Nemesis format was limited to only 3 tiles of decompressed data per frame, the Kosinski decompression routine may be able to decompress 100 tiles in the same time. This could eliminate up to 3 seconds worth of load times from the start of a level. Although Kosinski compression offers a lightning fast decompression rate however, the algorithm is not well suited to the large number of small matches that generally occur in most tiles. As a result, Kosinski compression generally produces a much lower compression ratio than the Nemesis format achieves for Megadrive art tiles. Additionally, while the Nemesis format can compress any data in 2 passes, the number of passes required with the Kosinski format increases as the size of the data increases. Kosinski can potentially require hundreds or even thousands of passes over the data to achieve its maximum compression ratios. As a result, the Kosinski format provides much slower compression times, which although insignificant today, may have significantly increased the time it would have taken developers to build test roms during development.
+
==Format Description==
 +
In order to understand Nemesis compression it is vital to know how the Mega Drive stores graphics. Graphics are stored in the form of 8x8 tiles known as patterns. Each pixel in a pattern is represented by a single nybble, that nybble being the [[Palette#Mega Drive Palette|palette]] index to use for that pixel. One pattern therefore takes up 8 * 8 = 64 nybbles or 32 bytes.
  
For Sonic 3 and Sonic & Knuckles, most of the art is compressed using Kosinski compression. Despite the lower compression ratio, the use of Kosinski allowed the developers to make large changes to the appearance of the level between acts, and even during the level, without any apparent seams, and without causing visible slowdowns. This was a key factor in creating some of the indoor/outdoor and other mid-level transitions used throughout these games, and allowed the developers to create maps which had much more variety within each level. The extensive use of Kosinski compression for art contributes to the larger ROM size for these later games, 2MB for each game, as opposed to 512KB for Sonic 1, and 1MB for Sonic 2.
+
Nemesis compression works by replacing the most common nybble runs with codes representing that run. A [[wikipedia:prefix-free code|prefix-free code]] system is used, meaning that no valid code can be a prefix of a longer valid code. For example, if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't (because it starts with 10). The length of each code is selected according to the frequency of the nybble run that code represents - the most commonly occurring runs will be represented using shorter codes.
  
==Sega Megadrive games which use Nemesis compression==
+
The Nemesis compressor can operate in two modes: normal mode, in which each row of pixels in a pattern is encoded as-is, and XOR mode, in which each row of pixels in a pattern is encoded in terms of its differences from the previous row. The first word in a Nemesis-compressed file gives the number of patterns in the uncompressed data (i.e. the size of the uncompressed data / $20). The sign bit of this word being set indicates that the file uses XOR mode instead of normal mode.
* ''Alien Storm''
+
 
* ''Bare Knuckle 2'' (beta only)
+
Following this word is a section which describes the codes used in the file. The lower nybble of the first byte gives the palette index. The upper nybble of the next byte gives the number of times that palette index is to be copied - 1, its lower nybble is the length of the code in bits, and the byte after that gives the actual code itself. The byte following this can be interpreted in three ways. If it's FF, it marks the end of the section. Otherwise, if its sign bit is set, its lower nybble gives the new palette index and the two bytes following it have the same format as described above. If its sign bit is clear, however, the palette index remains unchanged and the byte contains the copy count and code length with the actual code being stored in the next byte, as described above. The process repeats until the end of the section is encountered, and a code table is built up in RAM to be used in the next stage of the decompression.
* ''Castle of Illusion''
+
 
* ''Columns''
+
The next section of the file contains the actual compressed data, which is stored as a series of bits. The bit series 111111 has a special purpose: it indicates inline data, and the 7 bits following this series have the format XXXYYYY, where YYYY gives the palette index and XXX gives the copy count - 1. All other series represent a code, and the palette index and copy count for that code are fetched from the code table. When an entire row of a pattern (i.e. 8 pixels/4 bytes long) has been decompressed, it is written directly to the destination if in normal mode, or XORed by the previous row and then written to the destination if in XOR mode. The decompression terminates when the number of decompressed patterns matches the pattern count given in the first word of the file.
* ''[[Dr. Robotnik's Mean Bean Machine]]''
+
 
* ''ESWAT''
+
==Usage in Mega Drive Games==
* ''Fatal Labyrinth''
+
The Nemesis compression format was widely used in many Mega Drive games, both those produced by Sega, as well as a variety of third party developers. Additionally, the decompression routine for Nemesis archives embedded in ROMs that use the format is identical in the games in which it was used. It can be concluded from this that the Nemesis compression format was included as one of the standard libraries provided by Sega in their SDK for the console.
* ''[[Flicky (game)|Flicky]]''
+
 
* ''Forgotten Worlds''
+
Although Nemesis archives were used for all the compressed art in ''[[Sonic 1]]'', for ''[[Sonic 2]]'', the main block of art used for level tiles in each stage was compressed using the [[Kosinski compression]] format. This was almost certainly done to decrease load times at the start of the level. Kosinski is an LZ77-based compression algorithm, which provides extremely fast decompression. While the Nemesis format was limited to only 3 tiles of decompressed data per frame, the Kosinski decompression routine may be able to decompress 100 tiles in the same time. This could eliminate up to 3 seconds worth of load times from the start of a level. Although Kosinski compression offered a lightning fast decompression rate, the algorithm was not well suited to the large number of small matches that generally occur in most tiles. As a result, Kosinski compression generally produced a much lower compression ratio than the Nemesis format achieves for Mega Drive art tiles. Additionally, while the Nemesis format could compress any data in 2 passes, the number of passes required with the Kosinski format increased with the size of the data. Kosinski could potentially require hundreds or even thousands of passes over the data to achieve its maximum compression ratios. As a result, the Kosinski format provided much slower compression times, which although insignificant today, may have significantly increased the time it would have taken developers to build test ROMs during development.
* ''Ghostbusters''
+
 
* ''Ghouls 'n' Ghosts''
+
For ''[[Sonic 3]]'' and ''[[Sonic & Knuckles]]'', most of the art was compressed using Kosinski compression. Despite the lower compression ratio, the use of Kosinski allowed the developers to make large changes to the appearance of the level between acts, and even during the level, without any apparent seams, and without causing visible slowdowns. This was a key factor in creating some of the indoor/outdoor and other mid-level transitions used throughout these games, and allowed the developers to create maps which had much more variety within each level. The extensive use of Kosinski compression for art contributed to the larger ROM size for these later games: 2MB for each game, as opposed to 512KB for Sonic 1, and 1MB for Sonic 2.
 +
 
 +
==Sega Mega Drive games which use Nemesis compression==
 +
:''This list is vastly incomplete; please help expand it.
 +
{{multicol|
 +
* ''[[Alien Storm (Mega Drive)|Alien Storm]]''
 +
* ''[[Bare Knuckle 2]]'' (beta only)
 +
* ''[[Castle of Illusion]]''
 +
* ''[[Chaotix]]'' ([[32X]])
 +
* ''[[Columns]]''
 +
* ''[[Dr. Robotnik's Mean Bean Machine]]'' (only for some art — the rest is done using a custom compression introduced in ''[[Puyo Puyo]]'')
 +
* ''[[ESWAT: City Under Siege]]''
 +
* ''[[Fatal Labyrinth]]''
 +
* ''[[Flicky]]''
 +
* ''[[Forgotten Worlds]]''
 +
* ''[[Ghostbusters]]''
 +
* ''[[Ghouls 'n' Ghosts]]''
 
* ''[[Golden Axe]]''
 
* ''[[Golden Axe]]''
* ''Golden Axe 2''
+
* ''[[Golden Axe 2]]''
* ''Golden Axe 3''
+
* ''[[Golden Axe 3]]''
* ''[[Knuckles' Chaotix]]'' ([[32X]])
+
* ''[[Jewel Master]]'' (also contains uncompressed graphics)
* ''Mercs''
+
* ''[[Magical Taruruto-kun (Mega Drive)|Magical Taruruto-kun]]''
* ''Moonwalker''
+
* ''[[Mercs]]''
* ''Phantasy Star II'' ([[SegaNet]] text adventures)
+
* ''[[Moonwalker]]''
* ''PulseMan''
+
* ''[[MUSHA]]''
* ''Quackshot''
+
* ''[[Phantasy Star II]]'' (partially — some art use another simpler compression scheme, some is uncompressed)
* ''Revenge of Shinobi''
+
* [[Phantasy Star II Text Adventures|''Phantasy Star II'' Text Adventures]]
 +
* ''[[Phantasy Star III]]''
 +
* ''[[Phantasy Star IV]]''
 +
* ''[[Psy-O-Blade]]''
 +
* ''[[Pulseman]]''
 +
* ''[[Quackshot]]'' (graphics)
 +
* ''[[Revenge of Shinobi]]''
 
* ''[[Ristar]]''
 
* ''[[Ristar]]''
 
* [[Sega Mega Anser]]
 
* [[Sega Mega Anser]]
* [[Sega Mega CD]] BIOS
+
* [[Sega Mega-CD]] BIOS
* ''Shadow Dancer''
+
* ''[[Shadow Dancer: The Secret of Shinobi]]''
 
* ''[[Sonic & Knuckles]]''
 
* ''[[Sonic & Knuckles]]''
 
* ''[[Sonic Compilation]]'' (menu & games)
 
* ''[[Sonic Compilation]]'' (menu & games)
 
* ''[[Sonic Crackers]]''
 
* ''[[Sonic Crackers]]''
* ''[[Sonic Eraser]]''
+
* ''[[sonic:Sonic Eraser|Sonic Eraser]]''
 
* ''[[Sonic Jam]]'' ([[Saturn]])
 
* ''[[Sonic Jam]]'' ([[Saturn]])
 
* ''[[Sonic the Hedgehog (16-bit)|Sonic the Hedgehog]]''
 
* ''[[Sonic the Hedgehog (16-bit)|Sonic the Hedgehog]]''
 
* ''[[Sonic the Hedgehog 2 (16-bit)|Sonic the Hedgehog 2]]''
 
* ''[[Sonic the Hedgehog 2 (16-bit)|Sonic the Hedgehog 2]]''
 
* ''[[Sonic the Hedgehog 3]]''
 
* ''[[Sonic the Hedgehog 3]]''
* ''[[Sonic the Hedgehog CD]]'' ([[Mega CD]])
+
* ''[[Sonic the Hedgehog CD]]'' ([[Mega-CD]])
* ''Streets of Rage''
+
* ''[[Streets of Rage]]''
* ''Streets of Rage 3''
+
* ''[[Streets of Rage 3]]''
* ''Strider''
+
* ''[[Strider]]''
* ''Super Hang-On''
+
* ''[[Super Hang-On]]''
* ''Super Monaco GP''
+
* ''[[Super Monaco GP]]''
* ''World Cup Italia '90''
+
* ''[[Twin Cobra]]'' (also contains uncompressed graphics)
* ''World of Illusion''
+
* ''[[World Cup Italia '90]]''
 +
* ''[[World of Illusion]]''
 +
}}
 +
 
 +
== Decompression code ==
 +
 
 +
An annotated version of the Nemesis decompression code is provided below for reference. The code is taken from ''[[sonic:Sonic 3 & Knuckles|Sonic 3 & Knuckles]]'', but the same or similar code is used in all the games which use the format.
 +
 
 +
<pre>; ---------------------------------------------------------------------------
 +
; Nemesis decompression subroutine, decompresses art directly to VRAM
 +
; Inputs:
 +
; a0 = art address
 +
; a VDP command to write to the destination VRAM address must be issued
 +
; before calling this routine
 +
; ---------------------------------------------------------------------------
 +
 
 +
; =============== S U B R O U T I N E =======================================
 +
 
 +
 
 +
Nem_Decomp:
 +
movem.l d0-a1/a3-a5,-(sp)
 +
lea (Nem_PCD_WriteRowToVDP).l,a3
 +
lea (VDP_data_port).l,a4 ; write all rows to the VDP data port
 +
bra.s Nem_Decomp_Main
 +
; End of function Nem_Decomp
 +
 
 +
; ---------------------------------------------------------------------------
 +
; Nemesis decompression subroutine, decompresses art to RAM
 +
; Inputs:
 +
; a0 = art address
 +
; a4 = destination RAM address
 +
; ---------------------------------------------------------------------------
 +
 
 +
; =============== S U B R O U T I N E =======================================
 +
 
 +
 
 +
Nem_Decomp_To_RAM:
 +
movem.l d0-a1/a3-a5,-(sp)
 +
lea (Nem_PCD_WriteRowToRAM).l,a3
 +
; End of function Nem_Decomp_To_RAM
 +
 
 +
; ---------------------------------------------------------------------------
 +
; Main Nemesis decompression subroutine
 +
; ---------------------------------------------------------------------------
 +
 
 +
; =============== S U B R O U T I N E =======================================
 +
 
 +
 
 +
Nem_Decomp_Main:
 +
lea (Nem_code_table).w,a1
 +
move.w (a0)+,d2 ; get number of patterns
 +
lsl.w #1,d2
 +
bcc.s + ; branch if the sign bit isn't set
 +
adda.w #Nem_PCD_WriteRowToVDP_XOR-Nem_PCD_WriteRowToVDP,a3 ; otherwise the file uses XOR mode
 +
+
 +
lsl.w #2,d2 ; get number of 8-pixel rows in the uncompressed data
 +
movea.w d2,a5 ; and store it in a5 because there aren't any spare data registers
 +
moveq #8,d3 ; 8 pixels in a pattern row
 +
moveq #0,d2
 +
moveq #0,d4
 +
bsr.w Nem_Build_Code_Table
 +
move.b (a0)+,d5 ; get first byte of compressed data
 +
asl.w #8,d5 ; shift up by a byte
 +
move.b (a0)+,d5 ; get second byte of compressed data
 +
move.w #$10,d6 ; set initial shift value
 +
bsr.s Nem_Process_Compressed_Data
 +
movem.l (sp)+,d0-a1/a3-a5
 +
rts
 +
; End of function Nem_Decomp_Main
 +
 
 +
; ---------------------------------------------------------------------------
 +
; Part of the Nemesis decompressor, processes the actual compressed data
 +
; ---------------------------------------------------------------------------
 +
 
 +
; =============== S U B R O U T I N E =======================================
 +
 
 +
; PCD is used throughout this subroutine as an initialism for Process_Compressed_Data
 +
Nem_Process_Compressed_Data:
 +
move.w d6,d7
 +
subq.w #8,d7 ; get shift value
 +
move.w d5,d1
 +
lsr.w d7,d1 ; shift so that high bit of the code is in bit position 7
 +
cmpi.b #%11111100,d1 ; are the high 6 bits set?
 +
bcc.s Nem_PCD_InlineData ; if they are, it signifies inline data
 +
andi.w #$FF,d1
 +
add.w d1,d1
 +
move.b (a1,d1.w),d0 ; get the length of the code in bits
 +
ext.w d0
 +
sub.w d0,d6 ; subtract from shift value so that the next code is read next time around
 +
cmpi.w #9,d6 ; does a new byte need to be read?
 +
bcc.s + ; if not, branch
 +
addq.w #8,d6
 +
asl.w #8,d5
 +
move.b (a0)+,d5 ; read next byte
 +
+
 +
move.b 1(a1,d1.w),d1
 +
move.w d1,d0
 +
andi.w #$F,d1 ; get palette index for pixel
 +
andi.w #$F0,d0
 +
 
 +
Nem_PCD_GetRepeatCount:
 +
lsr.w #4,d0 ; get repeat count
 +
 
 +
Nem_PCD_WritePixel:
 +
lsl.l #4,d4 ; shift up by a nybble
 +
or.b d1,d4 ; write pixel
 +
subq.w #1,d3 ; has an entire 8-pixel row been written?
 +
bne.s Nem_PCD_WritePixel_Loop ; if not, loop
 +
jmp (a3) ; otherwise, write the row to its destination
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_NewRow:
 +
moveq #0,d4 ; reset row
 +
moveq #8,d3 ; reset nybble counter
 +
 
 +
Nem_PCD_WritePixel_Loop:
 +
dbf d0,Nem_PCD_WritePixel
 +
bra.s Nem_Process_Compressed_Data
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_InlineData:
 +
subq.w #6,d6 ; 6 bits needed to signal inline data
 +
cmpi.w #9,d6
 +
bcc.s +
 +
addq.w #8,d6
 +
asl.w #8,d5
 +
move.b (a0)+,d5
 +
+
 +
subq.w #7,d6 ; and 7 bits needed for the inline data itself
 +
move.w d5,d1
 +
lsr.w d6,d1 ; shift so that low bit of the code is in bit position 0
 +
move.w d1,d0
 +
andi.w #$F,d1 ; get palette index for pixel
 +
andi.w #$70,d0 ; high nybble is repeat count for pixel
 +
cmpi.w #9,d6
 +
bcc.s Nem_PCD_GetRepeatCount
 +
addq.w #8,d6
 +
asl.w #8,d5
 +
move.b (a0)+,d5
 +
bra.s Nem_PCD_GetRepeatCount
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_WriteRowToVDP:
 +
move.l d4,(a4) ; write 8-pixel row
 +
subq.w #1,a5
 +
move.w a5,d4 ; have all the 8-pixel rows been written?
 +
bne.s Nem_PCD_NewRow ; if not, branch
 +
rts ; otherwise the decompression is finished
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_WriteRowToVDP_XOR:
 +
eor.l d4,d2 ; XOR the previous row by the current row
 +
move.l d2,(a4) ; and write the result
 +
subq.w #1,a5
 +
move.w a5,d4
 +
bne.s Nem_PCD_NewRow
 +
rts
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_WriteRowToRAM:
 +
move.l d4,(a4)+
 +
subq.w #1,a5
 +
move.w a5,d4
 +
bne.s Nem_PCD_NewRow
 +
rts
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_PCD_WriteRowToRAM_XOR:
 +
eor.l d4,d2
 +
move.l d2,(a4)+
 +
subq.w #1,a5
 +
move.w a5,d4
 +
bne.s Nem_PCD_NewRow
 +
rts
 +
; End of function Nem_Process_Compressed_Data
 +
 
 +
; ---------------------------------------------------------------------------
 +
; Part of the Nemesis decompressor, builds the code table (in RAM)
 +
; ---------------------------------------------------------------------------
 +
 
 +
; =============== S U B R O U T I N E =======================================
 +
 
 +
; BCT is used throughout this subroutine as an initialism for Build_Code_Table
 +
Nem_Build_Code_Table:
 +
move.b (a0)+,d0 ; read first byte
 +
 
 +
Nem_BCT_ChkEnd:
 +
cmpi.b #$FF,d0 ; has the end of the code table description been reached?
 +
bne.s Nem_BCT_NewPalIndex ; if not, branch
 +
rts ; otherwise, this subroutine's work is done
 +
; ---------------------------------------------------------------------------
 +
 
 +
Nem_BCT_NewPalIndex:
 +
move.w d0,d7
 +
 
 +
Nem_BCT_Loop:
 +
move.b (a0)+,d0 ; read next byte
 +
cmpi.b #$80,d0 ; sign bit being set signifies a new palette index
 +
bcc.s Nem_BCT_ChkEnd ; a bmi could have been used instead of a compare and bcc
 +
move.b d0,d1
 +
andi.w #$F,d7 ; get palette index
 +
andi.w #$70,d1 ; get repeat count for palette index
 +
or.w d1,d7 ; combine the two
 +
andi.w #$F,d0 ; get the length of the code in bits
 +
move.b d0,d1
 +
lsl.w #8,d1
 +
or.w d1,d7 ; combine with palette index and repeat count to form code table entry
 +
moveq #8,d1
 +
sub.w d0,d1 ; is the code 8 bits long?
 +
bne.s Nem_BCT_ShortCode ; if not, a bit of extra processing is needed
 +
move.b (a0)+,d0 ; get code
 +
add.w d0,d0 ; each code gets a word-sized entry in the table
 +
move.w d7,(a1,d0.w) ; store the entry for the code
 +
bra.s Nem_BCT_Loop ; repeat
 +
; ---------------------------------------------------------------------------
 +
 
 +
; the Nemesis decompressor uses prefix-free codes (no valid code is a prefix of a longer code)
 +
; e.g. if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't
 +
; also, when the actual compressed data is processed the high bit of each code is in bit position 7
 +
; so the code needs to be bit-shifted appropriately over here before being used as a code table index
 +
; additionally, the code needs multiple entries in the table because no masking is done during compressed data processing
 +
; so if 11000 is a valid code then all indices of the form 11000XXX need to have the same entry
 +
Nem_BCT_ShortCode:
 +
move.b (a0)+,d0 ; get code
 +
lsl.w d1,d0 ; shift so that high bit is in bit position 7
 +
add.w d0,d0 ; get index into code table
 +
moveq #1,d5
 +
lsl.w d1,d5
 +
subq.w #1,d5 ; d5 = 2^d1 - 1
 +
 
 +
Nem_BCT_ShortCode_Loop:
 +
move.w d7,(a1,d0.w) ; store entry
 +
addq.w #2,d0 ; increment index
 +
dbf d5,Nem_BCT_ShortCode_Loop ; repeat for required number of entries
 +
bra.s Nem_BCT_Loop
 +
; End of function Nem_Build_Code_Table</pre>
  
[[Category:Data Formats]]
+
[[Category:Data compression]]

Revision as of 16:56, 14 June 2020

Nemesis compression is the method created by Sega to reduce the amount of space that graphics data takes up in a Sega Mega Drive ROM. A decompression algorithm takes the compressed data, unpacks it and transfers it to the VRAM. The graphics can then be displayed on the screen. The Nemesis format is specifically designed for Mega Drive graphics, as it relies on the length of the data being divisible by $20 (hex), which is the exact number of bytes in a single 8x8 tile. However, Nemesis compression is also used for 16x16 block mappings in Sonic CD, because the data conforms to the $20 rule.

Nemesis compression is named after Nemesis, the hacker who first created a program to decompress it. The format is used in numerous Mega Drive games, especially Sonic the Hedgehog games.

Compression Theory

The Nemesis compression format uses both run-length encoding and entropy encoding to compress data. The source data is broken into nybbles which are then grouped into runs of 1 to 8 nybbles in length; there are multiple ways to do this grouping, which is called a parse. Given a parse of the file into nybble runs, the file will then be entropy encoded based on the frequency of each nybble run in the file; the code of each nybble run is used to build a dictionary which is used when encoding and decoding the file.

The final Nemesis compressed file is then composed of a 2-byte header, a dictionary associating nybble runs to their respective codes, and a bitstream (the encoded bitstream) where each nybble run in the parse is replaced with its corresponding code in the dictionary. The dictionary is present in the compressed file so that it is possible to decode it; the decoder used in Mega Drive ROMs converts the dictionary into a simple table, using the code read as an index.

There are some restrictions to what forms of entropy encoding can be used: first, is limited to 8 bits for each encoded nybble run; this happens because the dictionary leaves only 8 bits for each code. Second, the set of all codes in the dictionary must form a prefix-free code; this makes the encoded bitstream unambiguous in its decoding, but it reduces the number of codes that can be used when entropy-encoding the file. Third, a nibble run can be directly inlined using a special 6-bit code prefix that was specified in the format; because the entropy encoding must result in a prefix-free code, neither this inlining bit pattern, nor any of its prefixes, can be used as codes when building the dictionary.

As noted above, there are multiple ways to parse the input file into nybble runs; each such parse will give rise to different dictionaries because the different frequencies of the nybble runs will change the resulting entropy encoding. Given a parse, the package-merge algorithm can be used to find an optimal dictionary; but since it is not possible to know the length of each dictionary entry beforehand, or how many entries are useful to have in the dictionary, the final file size can't be known during the parsing stage. Finding a parse that gives the smallest possible file size is difficult, particularly since the number of possible parses is staggering even for a small file. Nevertheless, more modern techniques, including the aforementioned package-merge algorithm, have generally improved compression ratios compared to what was found within Mega Drive ROMs.

The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provided high compression, it also took a fair amount of time for the 68000 CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame; some variants of the decoding routines can decode 6 or even 12 tiles in a frame (with a correspondingly larger decompression buffer), but they are much more rarely used.

Format Description

In order to understand Nemesis compression it is vital to know how the Mega Drive stores graphics. Graphics are stored in the form of 8x8 tiles known as patterns. Each pixel in a pattern is represented by a single nybble, that nybble being the palette index to use for that pixel. One pattern therefore takes up 8 * 8 = 64 nybbles or 32 bytes.

Nemesis compression works by replacing the most common nybble runs with codes representing that run. A prefix-free code system is used, meaning that no valid code can be a prefix of a longer valid code. For example, if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't (because it starts with 10). The length of each code is selected according to the frequency of the nybble run that code represents - the most commonly occurring runs will be represented using shorter codes.

The Nemesis compressor can operate in two modes: normal mode, in which each row of pixels in a pattern is encoded as-is, and XOR mode, in which each row of pixels in a pattern is encoded in terms of its differences from the previous row. The first word in a Nemesis-compressed file gives the number of patterns in the uncompressed data (i.e. the size of the uncompressed data / $20). The sign bit of this word being set indicates that the file uses XOR mode instead of normal mode.

Following this word is a section which describes the codes used in the file. The lower nybble of the first byte gives the palette index. The upper nybble of the next byte gives the number of times that palette index is to be copied - 1, its lower nybble is the length of the code in bits, and the byte after that gives the actual code itself. The byte following this can be interpreted in three ways. If it's FF, it marks the end of the section. Otherwise, if its sign bit is set, its lower nybble gives the new palette index and the two bytes following it have the same format as described above. If its sign bit is clear, however, the palette index remains unchanged and the byte contains the copy count and code length with the actual code being stored in the next byte, as described above. The process repeats until the end of the section is encountered, and a code table is built up in RAM to be used in the next stage of the decompression.

The next section of the file contains the actual compressed data, which is stored as a series of bits. The bit series 111111 has a special purpose: it indicates inline data, and the 7 bits following this series have the format XXXYYYY, where YYYY gives the palette index and XXX gives the copy count - 1. All other series represent a code, and the palette index and copy count for that code are fetched from the code table. When an entire row of a pattern (i.e. 8 pixels/4 bytes long) has been decompressed, it is written directly to the destination if in normal mode, or XORed by the previous row and then written to the destination if in XOR mode. The decompression terminates when the number of decompressed patterns matches the pattern count given in the first word of the file.

Usage in Mega Drive Games

The Nemesis compression format was widely used in many Mega Drive games, both those produced by Sega, as well as a variety of third party developers. Additionally, the decompression routine for Nemesis archives embedded in ROMs that use the format is identical in the games in which it was used. It can be concluded from this that the Nemesis compression format was included as one of the standard libraries provided by Sega in their SDK for the console.

Although Nemesis archives were used for all the compressed art in Sonic 1, for Sonic 2, the main block of art used for level tiles in each stage was compressed using the Kosinski compression format. This was almost certainly done to decrease load times at the start of the level. Kosinski is an LZ77-based compression algorithm, which provides extremely fast decompression. While the Nemesis format was limited to only 3 tiles of decompressed data per frame, the Kosinski decompression routine may be able to decompress 100 tiles in the same time. This could eliminate up to 3 seconds worth of load times from the start of a level. Although Kosinski compression offered a lightning fast decompression rate, the algorithm was not well suited to the large number of small matches that generally occur in most tiles. As a result, Kosinski compression generally produced a much lower compression ratio than the Nemesis format achieves for Mega Drive art tiles. Additionally, while the Nemesis format could compress any data in 2 passes, the number of passes required with the Kosinski format increased with the size of the data. Kosinski could potentially require hundreds or even thousands of passes over the data to achieve its maximum compression ratios. As a result, the Kosinski format provided much slower compression times, which although insignificant today, may have significantly increased the time it would have taken developers to build test ROMs during development.

For Sonic 3 and Sonic & Knuckles, most of the art was compressed using Kosinski compression. Despite the lower compression ratio, the use of Kosinski allowed the developers to make large changes to the appearance of the level between acts, and even during the level, without any apparent seams, and without causing visible slowdowns. This was a key factor in creating some of the indoor/outdoor and other mid-level transitions used throughout these games, and allowed the developers to create maps which had much more variety within each level. The extensive use of Kosinski compression for art contributed to the larger ROM size for these later games: 2MB for each game, as opposed to 512KB for Sonic 1, and 1MB for Sonic 2.

Sega Mega Drive games which use Nemesis compression

This list is vastly incomplete; please help expand it.

Decompression code

An annotated version of the Nemesis decompression code is provided below for reference. The code is taken from Sonic 3 & Knuckles, but the same or similar code is used in all the games which use the format.

; ---------------------------------------------------------------------------
; Nemesis decompression subroutine, decompresses art directly to VRAM
; Inputs:
; a0 = art address
; a VDP command to write to the destination VRAM address must be issued
; before calling this routine
; ---------------------------------------------------------------------------

; =============== S U B R O U T I N E =======================================


Nem_Decomp:
		movem.l	d0-a1/a3-a5,-(sp)
		lea	(Nem_PCD_WriteRowToVDP).l,a3
		lea	(VDP_data_port).l,a4	; write all rows to the VDP data port
		bra.s	Nem_Decomp_Main
; End of function Nem_Decomp

; ---------------------------------------------------------------------------
; Nemesis decompression subroutine, decompresses art to RAM
; Inputs:
; a0 = art address
; a4 = destination RAM address
; ---------------------------------------------------------------------------

; =============== S U B R O U T I N E =======================================


Nem_Decomp_To_RAM:
		movem.l	d0-a1/a3-a5,-(sp)
		lea	(Nem_PCD_WriteRowToRAM).l,a3
; End of function Nem_Decomp_To_RAM

; ---------------------------------------------------------------------------
; Main Nemesis decompression subroutine
; ---------------------------------------------------------------------------

; =============== S U B R O U T I N E =======================================


Nem_Decomp_Main:
		lea	(Nem_code_table).w,a1
		move.w	(a0)+,d2	; get number of patterns
		lsl.w	#1,d2
		bcc.s	+	; branch if the sign bit isn't set
		adda.w	#Nem_PCD_WriteRowToVDP_XOR-Nem_PCD_WriteRowToVDP,a3	; otherwise the file uses XOR mode
+
		lsl.w	#2,d2	; get number of 8-pixel rows in the uncompressed data
		movea.w	d2,a5	; and store it in a5 because there aren't any spare data registers
		moveq	#8,d3	; 8 pixels in a pattern row
		moveq	#0,d2
		moveq	#0,d4
		bsr.w	Nem_Build_Code_Table
		move.b	(a0)+,d5	; get first byte of compressed data
		asl.w	#8,d5	; shift up by a byte
		move.b	(a0)+,d5	; get second byte of compressed data
		move.w	#$10,d6	; set initial shift value
		bsr.s	Nem_Process_Compressed_Data
		movem.l	(sp)+,d0-a1/a3-a5
		rts
; End of function Nem_Decomp_Main

; ---------------------------------------------------------------------------
; Part of the Nemesis decompressor, processes the actual compressed data
; ---------------------------------------------------------------------------

; =============== S U B R O U T I N E =======================================

; PCD is used throughout this subroutine as an initialism for Process_Compressed_Data
Nem_Process_Compressed_Data:
		move.w	d6,d7
		subq.w	#8,d7	; get shift value
		move.w	d5,d1
		lsr.w	d7,d1	; shift so that high bit of the code is in bit position 7
		cmpi.b	#%11111100,d1	; are the high 6 bits set?
		bcc.s	Nem_PCD_InlineData	; if they are, it signifies inline data
		andi.w	#$FF,d1
		add.w	d1,d1
		move.b	(a1,d1.w),d0	; get the length of the code in bits
		ext.w	d0
		sub.w	d0,d6	; subtract from shift value so that the next code is read next time around
		cmpi.w	#9,d6	; does a new byte need to be read?
		bcc.s	+	; if not, branch
		addq.w	#8,d6
		asl.w	#8,d5
		move.b	(a0)+,d5	; read next byte
+
		move.b	1(a1,d1.w),d1
		move.w	d1,d0
		andi.w	#$F,d1	; get palette index for pixel
		andi.w	#$F0,d0

Nem_PCD_GetRepeatCount:
		lsr.w	#4,d0	; get repeat count

Nem_PCD_WritePixel:
		lsl.l	#4,d4	; shift up by a nybble
		or.b	d1,d4	; write pixel
		subq.w	#1,d3	; has an entire 8-pixel row been written?
		bne.s	Nem_PCD_WritePixel_Loop	; if not, loop
		jmp	(a3)	; otherwise, write the row to its destination
; ---------------------------------------------------------------------------

Nem_PCD_NewRow:
		moveq	#0,d4	; reset row
		moveq	#8,d3	; reset nybble counter

Nem_PCD_WritePixel_Loop:
		dbf	d0,Nem_PCD_WritePixel
		bra.s	Nem_Process_Compressed_Data
; ---------------------------------------------------------------------------

Nem_PCD_InlineData:
		subq.w	#6,d6	; 6 bits needed to signal inline data
		cmpi.w	#9,d6
		bcc.s	+
		addq.w	#8,d6
		asl.w	#8,d5
		move.b	(a0)+,d5
+
		subq.w	#7,d6	; and 7 bits needed for the inline data itself
		move.w	d5,d1
		lsr.w	d6,d1	; shift so that low bit of the code is in bit position 0
		move.w	d1,d0
		andi.w	#$F,d1	; get palette index for pixel
		andi.w	#$70,d0	; high nybble is repeat count for pixel
		cmpi.w	#9,d6
		bcc.s	Nem_PCD_GetRepeatCount
		addq.w	#8,d6
		asl.w	#8,d5
		move.b	(a0)+,d5
		bra.s	Nem_PCD_GetRepeatCount
; ---------------------------------------------------------------------------

Nem_PCD_WriteRowToVDP:
		move.l	d4,(a4)	; write 8-pixel row
		subq.w	#1,a5
		move.w	a5,d4	; have all the 8-pixel rows been written?
		bne.s	Nem_PCD_NewRow	; if not, branch
		rts	; otherwise the decompression is finished
; ---------------------------------------------------------------------------

Nem_PCD_WriteRowToVDP_XOR:
		eor.l	d4,d2	; XOR the previous row by the current row
		move.l	d2,(a4)	; and write the result
		subq.w	#1,a5
		move.w	a5,d4
		bne.s	Nem_PCD_NewRow
		rts
; ---------------------------------------------------------------------------

Nem_PCD_WriteRowToRAM:
		move.l	d4,(a4)+
		subq.w	#1,a5
		move.w	a5,d4
		bne.s	Nem_PCD_NewRow
		rts
; ---------------------------------------------------------------------------

Nem_PCD_WriteRowToRAM_XOR:
		eor.l	d4,d2
		move.l	d2,(a4)+
		subq.w	#1,a5
		move.w	a5,d4
		bne.s	Nem_PCD_NewRow
		rts
; End of function Nem_Process_Compressed_Data

; ---------------------------------------------------------------------------
; Part of the Nemesis decompressor, builds the code table (in RAM)
; ---------------------------------------------------------------------------

; =============== S U B R O U T I N E =======================================

; BCT is used throughout this subroutine as an initialism for Build_Code_Table
Nem_Build_Code_Table:
		move.b	(a0)+,d0	; read first byte

Nem_BCT_ChkEnd:
		cmpi.b	#$FF,d0	; has the end of the code table description been reached?
		bne.s	Nem_BCT_NewPalIndex	; if not, branch
		rts	; otherwise, this subroutine's work is done
; ---------------------------------------------------------------------------

Nem_BCT_NewPalIndex:
		move.w	d0,d7

Nem_BCT_Loop:
		move.b	(a0)+,d0	; read next byte
		cmpi.b	#$80,d0	; sign bit being set signifies a new palette index
		bcc.s	Nem_BCT_ChkEnd	; a bmi could have been used instead of a compare and bcc
		move.b	d0,d1
		andi.w	#$F,d7	; get palette index
		andi.w	#$70,d1	; get repeat count for palette index
		or.w	d1,d7	; combine the two
		andi.w	#$F,d0	; get the length of the code in bits
		move.b	d0,d1
		lsl.w	#8,d1
		or.w	d1,d7	; combine with palette index and repeat count to form code table entry
		moveq	#8,d1
		sub.w	d0,d1	; is the code 8 bits long?
		bne.s	Nem_BCT_ShortCode	; if not, a bit of extra processing is needed
		move.b	(a0)+,d0	; get code
		add.w	d0,d0	; each code gets a word-sized entry in the table
		move.w	d7,(a1,d0.w)	; store the entry for the code
		bra.s	Nem_BCT_Loop	; repeat
; ---------------------------------------------------------------------------

; the Nemesis decompressor uses prefix-free codes (no valid code is a prefix of a longer code)
; e.g. if 10 is a valid 2-bit code, 110 is a valid 3-bit code but 100 isn't
; also, when the actual compressed data is processed the high bit of each code is in bit position 7
; so the code needs to be bit-shifted appropriately over here before being used as a code table index
; additionally, the code needs multiple entries in the table because no masking is done during compressed data processing
; so if 11000 is a valid code then all indices of the form 11000XXX need to have the same entry
Nem_BCT_ShortCode:
		move.b	(a0)+,d0	; get code
		lsl.w	d1,d0	; shift so that high bit is in bit position 7
		add.w	d0,d0	; get index into code table
		moveq	#1,d5
		lsl.w	d1,d5
		subq.w	#1,d5	; d5 = 2^d1 - 1

Nem_BCT_ShortCode_Loop:
		move.w	d7,(a1,d0.w)	; store entry
		addq.w	#2,d0	; increment index
		dbf	d5,Nem_BCT_ShortCode_Loop	; repeat for required number of entries
		bra.s	Nem_BCT_Loop
; End of function Nem_Build_Code_Table