Difference between revisions of "Kosinski compression"

From Sega Retro

old>SoNick
m (Categorization)
old>ElFly
Line 9: Line 9:
 
(Document originally written by [[Sonic Hachelle-Bee]] and posted [http://www.sws2b.com/forums/index.php?showtopic=3482 here])
 
[[Category:Hacking Information]]
 

Revision as of 00:55, 15 February 2007

Kosinski compression is the name given by the Sonic community to a compression format used in Sonic games for the Sega Genesis/Mega Drive. It is named after the person who cracked it, Brett Kosinski. It appears to be an extension or variation of the LZSS algorithm.

Kosinski compression is used to compress the following data types:

Kosinski Compression Theory

(Document originally written by Sonic Hachelle-Bee and posted here)

The aim of this topic is to explain how the famous Brett Kosinski format works. After reading this post, you will know how to compress and decompress data in this format by hand in a hex editor, without utilities.

THEORY:

First of all, you have to know that throughout the Brett Kosinski format there are 16-bit values (2 bytes) that tell the game which of the following bytes are uncompressed data and which are not. These 16-bit values (called bitfields by Brett Kosinski) begin at the very start: the first 2 bytes of the compressed data are a bitfield.

As an example, we have this beginning of compressed data:


53 FB 01 23 FE 67 FF F8 0D 98 FF FD 65 98 75 FB B2 00 15...


53 FB is the first 16-bit bitfield.

Let's convert it into bits:


53 FB = 0101 0011 1111 1011


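You have to read it carefully. Cut it into 2 halves in the middle, put the left half after the right half, then read the result from right to left (in other words, the 2 bytes form a little-endian word whose bits are read starting from the lowest one), like this: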


0101 0011 1111 1011 --> 1111 1011 0101 0011 --> 1100 1010 1101 1111
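The swap-and-reverse step above can be sketched in a few lines of Python. This is only an illustration of the reading order described here; the function name `bitfield_bits` is made up for this example and is not part of any existing tool.

```python
def bitfield_bits(first: int, second: int) -> str:
    """Return the 16 flag bits of a bitfield in the order the game reads them.

    `first` and `second` are the two bitfield bytes as they appear in the
    compressed data (for example 0x53 and 0xFB).
    """
    # "Paste the left part after the right part": the second byte becomes
    # the high byte of a 16-bit word.
    word = first | (second << 8)
    # "Read it from right to left": take the lowest bit first.
    return "".join(str((word >> i) & 1) for i in range(16))


print(bitfield_bits(0x53, 0xFB))  # 1100101011011111
```

The printed string is the same 1100 1010 1101 1111 sequence obtained by hand above.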


Now, these 16 bits describe the data that follows the bitfield: each code says whether the next item is an uncompressed byte or a compressed one, and in which form. There are 3 codes you can encounter in the Brett Kosinski format. I will use the terms of Brett Kosinski himself to name them:

1 --> Uncompressed byte (UB).

01 --> Separate compression (SC).

00 XX --> Inline compression (IC).

In our example, we have this:


53 FB 01 23 FE 67 FF F8 0D 98 FF FD 65 98 75 FB B2 00 15...

1100 1010 1101 1111 --> 1 (UB) / 1 (UB) / 00 10 (IC) / 1 (UB) / 01 (SC) / 1 (UB) / 01 (SC) / 1 (UB) / 1 (UB) / 1 (UB) / 1 (UB)

Then:

01 is a UB. Read it as is.

23 is another UB.

FE is IC. I will explain it later.

67 is a UB.

FF F8 0D is SC. I will explain it later.

98 is a UB.

FF FD is SC.

65 is a UB.

98 is a UB.

75 is a UB.

FB B2 is the next bitfield. It comes before the last byte of the previous bitfield.

00 is a UB, the last one described by the first bitfield.

...

When the current 16-bit bitfield runs out, the game reads another one. The next bitfield is placed just before the last byte (or compression form) referred to by the current bitfield. In this example, the next bitfield is FB B2. Convert it into bits again, and continue decoding with 15...
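Splitting a bit string into the three codes can be sketched as below. The function name `split_codes` and the "incomplete" label are invented for this illustration. The sketch looks at a single bitfield only, so a code cut off at the end of the 16 bits is just reported as incomplete (its remaining bits would come from the next bitfield).

```python
def split_codes(bits: str) -> list:
    """Split a bitfield (as a string of '0'/'1' in reading order) into codes."""
    codes = []
    i = 0
    while i < len(bits):
        if bits[i] == "1":
            codes.append("UB")                    # 1 --> uncompressed byte
            i += 1
        elif bits[i:i + 2] == "01":
            codes.append("SC")                    # 01 --> separate compression
            i += 2
        elif len(bits) - i >= 4:
            codes.append("IC " + bits[i + 2:i + 4])  # 00 XX --> inline compression
            i += 4
        else:
            # The code continues in the next bitfield.
            codes.append("incomplete " + bits[i:])
            break
    return codes


print(split_codes("1100101011011111"))
# ['UB', 'UB', 'IC 10', 'UB', 'SC', 'UB', 'SC', 'UB', 'UB', 'UB', 'UB']
```

The result has the same eleven codes as the manual breakdown of 53 FB above.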


Inline compression:

This is when you read something like this in the bitfield:

00 XX

XX is a value that tells how many bytes to copy after the current uncompressed byte:

00 --> 2 bytes.

01 --> 3 bytes.

10 --> 4 bytes.

11 --> 5 bytes.

The byte that this 00 XX code refers to is a negative value (a signed byte). In our previous example, it is FE = -2. The game goes back that many bytes in the data decompressed so far and copies from there, one byte at a time, until it has copied the number of bytes given by XX.

In our example: 01 23 FE

The bitfield code for the byte FE is 00 10, so XX = 10 (4 bytes). Then, 01 23 FE = 01 23 01 23 01 23. Writing FF instead of FE gives this: 01 23 FF = 01 23 23 23 23 23.
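A minimal sketch of the inline copy, assuming the decompressed data so far is held in a Python `bytearray`. The name `inline_copy` and its parameters are hypothetical, chosen only for this example.

```python
def inline_copy(output: bytearray, xx: int, offset_byte: int) -> None:
    """Apply one inline compression code to the data decompressed so far.

    `xx` is the 2-bit value from the bitfield (0 to 3) and `offset_byte`
    is the byte read from the compressed data (for example 0xFE).
    """
    count = xx + 2                  # 00 -> 2 bytes ... 11 -> 5 bytes
    offset = offset_byte - 0x100    # read the byte as negative: 0xFE -> -2
    for _ in range(count):
        # Copy one byte at a time, so freshly written bytes can be re-read.
        output.append(output[offset])


out = bytearray([0x01, 0x23])
inline_copy(out, 0b10, 0xFE)
print(out.hex(" "))  # 01 23 01 23 01 23
```

Copying byte by byte is what makes FF repeat the last byte: each copied 23 immediately becomes the "previous byte" for the next copy.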


Separate compression:

This is when there is 01 in the bitfield. The separate compression uses at least 2 bytes, and sometimes 3 (the last one is optional).

In binary form: NN DC (CC) = NNNN NNNN DDDD DCCC (CCCC CCCC)

NN is another negative value. It tells the game where to read/copy the (uncompressed) data from. Unlike IC, the number of bytes to copy is not stored in the bitfield but in CC.

Writing FF for NN will read/copy the previous byte only. Writing another value will read/copy starting from the earlier byte this value refers to, until CC bytes have been copied.

DD is again a negative value, on 5 bits. It is an add-on to NN. Take NN and subtract 256 (100 in hex) * |DD+1|. Take the result as your new NN value.

DD = 11111 = -1 --> NN = NN - (256 * |-1+1|) = NN (Do nothing)

DD = 11110 = -2 --> NN = NN - (256 * |-2+1|) = NN - 256

DD = 11101 = -3 --> NN = NN - (256 * |-3+1|) = NN - 256 * 2 = NN - 512

DD = 11100 = -4 --> NN = NN - (256 * |-4+1|) = NN - 256 * 3 = NN - 768

...

DD = 00000 = -32 --> NN = NN - (256 * |-32+1|) = NN - 256 * 31 = NN - 7936

Example:

66 67 FF F8 0D --> NN = -1 and DD = 11111 --> NN = -1, we are starting at 67.

66 67 FE F8 0D --> NN = -2 and DD = 11111 --> NN = -2, we are starting at 66.

...

66 67 FF F0 0D --> NN = -1 and DD = 11110 --> NN = -257, we are starting 257 bytes before.

66 67 FD E7 --> NN = -3 and DD = 11100 --> NN = -771, we are starting 771 bytes before.
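The NN/DD arithmetic can be checked with a short Python sketch. The function name `separate_offset` is made up for this illustration; it follows the formula given above step by step.

```python
def separate_offset(nn: int, second: int) -> int:
    """Return the (negative) copy offset of a separate compression code.

    `nn` is the first byte of the code and `second` is the DDDD DCCC byte.
    """
    dd = second >> 3            # upper 5 bits of the second byte
    signed_nn = nn - 0x100      # NN as a negative value: 0xFF -> -1
    signed_dd = dd - 0x20       # DD as a negative 5-bit value: 11111 -> -1
    return signed_nn - 256 * abs(signed_dd + 1)


print(separate_offset(0xFF, 0xF8))  # -1
print(separate_offset(0xFE, 0xF8))  # -2
print(separate_offset(0xFF, 0xF0))  # -257
print(separate_offset(0xFD, 0xE7))  # -771
```

The same number comes out if DD and NN are joined into one 13-bit value and 8192 is subtracted, that is `((dd << 8) | nn) - 0x2000`, which may be easier to use in a program.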

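CC: Count.

If you have no more than 9 bytes to read/copy, then the last CC byte is not needed: the count is held in the 3 CCC bits of the second byte, and the number of bytes copied is CCC + 2. For the second byte: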

F0 --> Like F8, be careful of the DD value.

F1 --> Copy the read data for 3 bytes, be careful of the DD value.

F2 --> Copy the read data for 4 bytes, be careful of the DD value.

F3 --> Copy the read data for 5 bytes, be careful of the DD value.

F4 --> Copy the read data for 6 bytes, be careful of the DD value.

F5 --> Copy the read data for 7 bytes, be careful of the DD value.

F6 --> Copy the read data for 8 bytes, be careful of the DD value.

F7 --> Copy the read data for 9 bytes, be careful of the DD value.

F8 --> Read next.

F9 --> Copy the read data for 3 bytes.

FA --> Copy the read data for 4 bytes.

FB --> Copy the read data for 5 bytes.

FC --> Copy the read data for 6 bytes.

FD --> Copy the read data for 7 bytes.

FE --> Copy the read data for 8 bytes.

FF --> Copy the read data for 9 bytes.

Otherwise, write CCC = 000 in the second byte (for example F8 = 1111 1000), and use the last byte to actually write your count, minus 1:

F8 09 --> Copy the read data for 10 bytes.

F8 0A --> Copy the read data for 11 bytes.

...

Writing 00 F8 00 (NN = 00 and both counts are 00's) will end the compressed data.

In our example:

53 FB 01 23 FE 67 FF F8 0D 98 FF FD 65 98 75 FB B2 00 15...

We have 2 SC: FF F8 0D and FF FD.

67 FF F8 0D = 67 67 67 67 67 67 67 67 67 67 67 67 67 67 67 (67 + 0Ex67)

98 FF FD = 98 98 98 98 98 98 98 98 (98 + 07x98)

Another: 12 34 56 78 9A BC DE 10 FA F9 = 12 34 56 78 9A BC DE 10 56 78 9A
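The count rules and the three examples above can be reproduced with the following sketch. The name `separate_copy` and its return value are invented for this illustration. It handles only what this document describes: a third byte of 00 is treated as the end marker, and any other third byte as count minus 1.

```python
def separate_copy(output: bytearray, code: bytes) -> bool:
    """Apply one separate compression code (2 or 3 bytes) to `output`.

    Returns False when the code is the end marker, True otherwise.
    """
    nn, second = code[0], code[1]
    # DDDDD and NN together form a negative 13-bit offset.
    offset = (((second >> 3) << 8) | nn) - 0x2000
    count = second & 0b111          # the CCC bits
    if count:
        count += 2                  # CCC = 001..111 -> 3..9 bytes
    else:
        extra = code[2]             # CCC = 000: the count is in the third byte
        if extra == 0:
            return False            # for example 00 F8 00: end of data
        count = extra + 1           # the third byte stores count - 1
    for _ in range(count):
        output.append(output[offset])
    return True


out = bytearray([0x67])
separate_copy(out, bytes.fromhex("FFF80D"))
print(len(out))  # 15

out = bytearray.fromhex("123456789ABCDE10")
separate_copy(out, bytes.fromhex("FAF9"))
print(out.hex(" ").upper())  # 12 34 56 78 9A BC DE 10 56 78 9A
```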

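EXAMPLE:

We have this compressed data:

FF 3F 54 3B C4 44 54 33 33 5B 2D 5C 44 5C C4 C5
FC 15 FE C3 44 78 88 98 44 30 FF FF 00 F8 00

Bitfields:

FF 3F = 1111 1111 0011 1111 --> 1111 1111 1111 1100

FC 15 = 1111 1100 0001 0101 --> 0011 1111 1010 1000

(In the original post, the bytes were colored: green for uncompressed bytes, red for separate compression, blue for inline compression.)

Reading the data with these bitfields: the first 14 bits are 1, so 54 to C5 are 14 UB. The last 2 bits, 00, start an IC whose XX bits (00, so 2 bytes) are the first 2 bits of the next bitfield, FC 15. Its byte FE = -2 copies C4 C5. Then come 7 UB (C3 to 30), then the SC FF FF (NN = -1, CCC = 111) copies 30 nine times, and finally the SC 00 F8 00 ends the data.

Uncompressed data:

54 3B C4 44 54 33 33 5B 2D 5C 44 5C C4 C5 C4 C5
C3 44 78 88 98 44 30 30 30 30 30 30 30 30 30 30

I hope someone will understand something.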