Difference between revisions of "Nemesis compression"

From Sega Retro

old>Nemesis
(Added "Compression Theory" and "Usage in Megadrive Games" sections)
old>Nemesis
Line 6: Line 6:
 
The Nemesis compression format uses a traditional entropy encoding method to compress data. The source data is analysed, and a table is built of the most commonly occurring 16-bit sequences in that data. This table shows which data sequences will reduce the size of the data the most if they are compressed. In the Nemesis compression format, the table is implemented in the form of a binary tree. The most commonly occurring values are replaced with codes, which represent entries in the tree. Each code is a series of bits which represent which branch of the tree to follow. The most commonly occurring values will be encoded to require less branches to reach, and hence can be represented with very few bits. If a particular 16-bit sequence makes up 50% or more of the data, an entire root-level branch of the binary tree may be dedicated to that value, allowing a single bit to represent each occurance of that 16-bit sequence.
 
The Nemesis compression format uses a traditional entropy encoding method to compress data. The source data is analysed, and a table is built of the most commonly occurring 16-bit sequences in that data. This table shows which data sequences will reduce the size of the data the most if they are compressed. In the Nemesis compression format, the table is implemented in the form of a binary tree. The most commonly occurring values are replaced with codes, which represent entries in the tree. Each code is a series of bits which represent which branch of the tree to follow. The most commonly occurring values will be encoded to require less branches to reach, and hence can be represented with very few bits. If a particular 16-bit sequence makes up 50% or more of the data, an entire root-level branch of the binary tree may be dedicated to that value, allowing a single bit to represent each occurance of that 16-bit sequence.
  
In order to decompress archives compressed in this format, the binary tree must also be provided along with the compressed data stream. As a result, each Nemesis archive contains two sections, the first being the compressed binary tree, and the second being the compressed data itself. For performance reasons during decompression, the binary tree is flattened out into a simple table. This table is compressed using a form of run-length encoding. The higher the compression ratio for the data is, the simpler and more compressable the table will be, so archives which have the highest data compression ratio will also have the smallest compressed dictionary sizes.
+
In order to decompress archives compressed in this format, the binary tree must also be provided along with the compressed data stream. As a result, each Nemesis archive contains two sections, the first being the compressed binary tree, and the second being the compressed data itself. For performance reasons during decompression, the binary tree is flattened out into a simple table. This table is compressed using a form of run-length encoding. The higher the compression ratio for the data is, the simpler and more compressable the table will be, so archives which have the highest data compression ratio will also have the smallest compressed tree sizes.
  
 
The original compression program used by Sega to compress data into Nemesis archives most likely used the Shannon-Fano coding technique to build the binary tree. As a result, most Nemesis archives found within Megadrive roms are not optimal, and can be improved upon when more accurate methods, such as Huffman coding, are used to build the tree. As a result, compression tools for the Nemesis format have subsiquently been created which provide greater compression than the tools used by the original developers.
 
The original compression program used by Sega to compress data into Nemesis archives most likely used the Shannon-Fano coding technique to build the binary tree. As a result, most Nemesis archives found within Megadrive roms are not optimal, and can be improved upon when more accurate methods, such as Huffman coding, are used to build the tree. As a result, compression tools for the Nemesis format have subsiquently been created which provide greater compression than the tools used by the original developers.
  
 
The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provides high compression however, it also took a fair amount of time for the 68000 CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame.
 
The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provides high compression however, it also took a fair amount of time for the 68000 CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame.
 
  
 
==Usage in Megadrive Games==
 
==Usage in Megadrive Games==

Revision as of 19:44, 7 October 2007

Nemesis compression is the method created by Sega to reduce the amount of space that graphics data takes up in a Sega Megadrive ROM. A decompression algorithm takes the compressed data, unpacks it and transfers it to the VRAM. The graphics can then be displayed on the screen. The Nemesis format is specifically designed for Megadrive graphics, as it relies on the length of the data being divisible by $20 (hex), which is the exact number of bytes in a single 8x8 tile. Although Nemesis is also used for 16x16 block mappings in Sonic CD, because the data conforms to the $20 rule.

Nemesis compression is named after Nemesis, the hacker who first discovered the format and created a program to decompress it. The format is used in numerous Megadrive games, especially Sonic the Hedgehog games.

Compression Theory

The Nemesis compression format uses a traditional entropy encoding method to compress data. The source data is analysed, and a table is built of the most commonly occurring 16-bit sequences in that data. This table shows which data sequences will reduce the size of the data the most if they are compressed. In the Nemesis compression format, the table is implemented in the form of a binary tree. The most commonly occurring values are replaced with codes, which represent entries in the tree. Each code is a series of bits which represent which branch of the tree to follow. The most commonly occurring values will be encoded to require less branches to reach, and hence can be represented with very few bits. If a particular 16-bit sequence makes up 50% or more of the data, an entire root-level branch of the binary tree may be dedicated to that value, allowing a single bit to represent each occurance of that 16-bit sequence.

In order to decompress archives compressed in this format, the binary tree must also be provided along with the compressed data stream. As a result, each Nemesis archive contains two sections, the first being the compressed binary tree, and the second being the compressed data itself. For performance reasons during decompression, the binary tree is flattened out into a simple table. This table is compressed using a form of run-length encoding. The higher the compression ratio for the data is, the simpler and more compressable the table will be, so archives which have the highest data compression ratio will also have the smallest compressed tree sizes.

The original compression program used by Sega to compress data into Nemesis archives most likely used the Shannon-Fano coding technique to build the binary tree. As a result, most Nemesis archives found within Megadrive roms are not optimal, and can be improved upon when more accurate methods, such as Huffman coding, are used to build the tree. As a result, compression tools for the Nemesis format have subsiquently been created which provide greater compression than the tools used by the original developers.

The high compression rate offered by the Nemesis compression format made it a very appropriate format for cartridge-based systems, where storage space was at a premium. Although the Nemesis format provides high compression however, it also took a fair amount of time for the 68000 CPU to decompress. To prevent noticeable stalls while loading new sets of tiles into the VRAM, the Nemesis decompression routine limited the console to decompressing a maximum of 3 tiles worth of pattern data each frame.

Usage in Megadrive Games

The Nemesis compression format was widely used in many Megadrive games, both those produced by Sega, as well as a variety of third party developers. Additionally, the decompression routine for Nemesis archives embedded in roms that use the format is identical in the games in which it was used. It can be concluded from this that the Nemesis compression format was included as one of the standard libraries provided by Sega in their SDK for the console.

Although Nemesis archives were used for all the compressed art in Sonic 1, for Sonic 2, the main block of art used for level tiles in each stage was compressed using the Kosinski compression format. This was almost certanly done to decrease load times at the start of the level. Kosinski is an LZ77-based compression algorithm, which provides extremely fast decompression. While the Nemesis format was limited to only 3 tiles of decompressed data per frame, the Kosinski decompression routine may be able to decompress 100 tiles in the same time. This could eliminate up to 3 seconds worth of load times from the start of a level. Although Kosinski compression offers a lightning fast decompression rate however, the algorithm is not well suited to the large number of small matches that generally occur in most tiles. As a result, Kosinski compression generally produces a much lower compression ratio than the Nemesis format achieves for Megadrive art tiles. Additionally, while the Nemesis format can compress any data in 2 passes, the number of passes required with the Kosinski format increases as the size of the data increases. Kosinski can potentially require hundreds or even thousands of passes over the data to achieve its maximum compression ratios. As a result, the Kozinski format provides much slower compression times, which although insignificant today, may have significantly increased the time it would have taken developers to build test roms during development.

For Sonic 3 and Sonic & Knuckles, most of the art is compressed using Kosinski compression. Despite the lower compression ratio, the use of Kosinski allowed the developers to make large changes to the appearance of the level between acts, and even during the level, without any apparent seams, and without causing visible slowdowns. This was a key factor in creating some of the indoor/outdoor and other mid-level transitions used throughout these games, and allowed the developers to create maps which had much more variety within each level. The extensive use of Kosinski compression for art contributes to the larger ROM size for these later games, 2MB for each game, as opposed to 512KB for Sonic 1, and 1MB for Sonic 2.

Sega Megadrive games which use Nemesis compression