From Sega Retro

The Musical Instrument Digital Interface (MIDI) is an industry-standard protocol defined in 1982 that enables electronic musical instruments, computers, and other equipment to communicate, control, and synchronize with each other. MIDI allows computers, synthesizers, MIDI controllers, sound cards, samplers and drum machines to control one another, and to exchange system data (acting as a raw data encapsulation method for SysEx commands).

MIDI does not transmit an audio signal or media — it transmits "event messages" such as the pitch and intensity of musical notes to play, control signals for parameters such as volume, vibrato and panning, cues, and clock signals to set the tempo. As an electronic protocol, it is notable for its widespread adoption throughout the industry.

Although most Sonic games rely on other sound formats, such as SMPS, GEMS or, more recently, ADX, MP3 and other forms of digital audio, a number of games use MIDI files for their soundtracks, in particular some Sonic Mobile games and the PC version of Sonic & Knuckles Collection. The latter does not store "pure" MIDI files in its data; instead it uses a proprietary format, which is nevertheless streamed through a MIDI device during playback. Sonic & Knuckles Collection also features a number of original compositions that replace some of the original songs. One theory proposed for this replacement is that the replaced songs could not be reproduced with MIDI files because of the large number of voice samples they contain. However, fan-made MIDI files of the scrapped songs exist, which partially disproves this theory.

Furthermore, there is a program that allows MIDI files to be used in a Sonic game: saxman's Sonic QX. It can import a MIDI sequence (previously set up to fit the sound hardware of the Genesis) into a Sonic 2 Beta ROM. From there, further conversions are possible to adapt the SMPS subformat to the one used in the Sonic game of choice.

File Format Specifications

Data Formats

All data values are stored in Big-Endian (most significant byte first) format. Also, many values are stored in a variable-length format which may use one or more bytes per value. Variable-length values use the lower 7 bits of a byte for data and the top bit to signal a following data byte. If the top bit is set to 1, then another value byte follows. Below is a table of examples to help demonstrate how variable length values are used.

Value (Hex) | Value (Bin) | Variable-Length (Hex) | Variable-Length (Bin)
00 | 00000000 | 00 | 00000000
C8 | 11001000 | 81 48 | 10000001 01001000
100000 | 00010000 00000000 00000000 | C0 80 00 | 11000000 10000000 00000000
Example values and their variable-length equivalents

A variable-length value may use a maximum of 4 bytes. This means the maximum value that can be represented is 0x0FFFFFFF (represented as 0xFF, 0xFF, 0xFF, 0x7F).
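
The encoding described above can be sketched in Python as follows. This is a minimal illustration, not code from any particular tool; the function names encode_vlq and decode_vlq are hypothetical. The groups of 7 bits are written most significant first, and every byte except the last has its top bit set.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer (max 0x0FFFFFFF) as a variable-length value."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value out of range for a 4-byte variable-length value")
    groups = [value & 0x7F]                    # last byte: top bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: top bit set
        value >>= 7
    return bytes(reversed(groups))             # most significant group first


def decode_vlq(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a variable-length value at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(4):                         # at most 4 bytes are allowed
        byte = data[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:                    # top bit clear: final byte
            return value, pos + i + 1
    raise ValueError("variable-length value longer than 4 bytes")
```

For example, encode_vlq(0xC8) produces the bytes 0x81 0x48 from the table, and decode_vlq(b"\xff\xff\xff\x7f") returns the maximum value 0x0FFFFFFF together with the position just past it.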

File Structure

MIDI files are organized into data chunks (similar to RIFF files). Each chunk is prefixed with an 8-byte header: a 4-byte ID string that identifies the type of chunk, followed by a 4-byte size that gives the chunk's length as the number of bytes following this chunk's header.
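
As a rough sketch of walking this chunk structure, the hypothetical generator below reads each 8-byte header with Python's struct module (">4sI" = four ID bytes plus a big-endian unsigned 32-bit size) and skips ahead by the stated size. It assumes the whole file is already in memory.

```python
import struct
from typing import Iterator


def iter_chunks(data: bytes) -> Iterator[tuple[bytes, bytes]]:
    """Yield (chunk_id, chunk_body) pairs from a MIDI file held in memory."""
    pos = 0
    while pos + 8 <= len(data):
        # 4-byte ASCII ID followed by a 4-byte big-endian length
        chunk_id, size = struct.unpack(">4sI", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if len(body) != size:
            raise ValueError(f"chunk {chunk_id!r} is truncated")
        yield chunk_id, body
        pos += 8 + size  # the size does not include the 8-byte header
```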

Header Chunk

The header chunk contains information about the entire song including MIDI format type, number of tracks and timing division. There is only one header chunk per standard MIDI file and it always comes first. Before describing each element of the header chunk, here is a chart to help give an overview of the chunk's organization.

Offset Length Type Description Value
0x00 4 char[4] chunk ID "MThd" (0x4D546864)
0x04 4 dword chunk size 6 (0x00000006)
0x08 2 word format type 0 - 2
0x0A 2 word number of tracks 1 - 65,535
0x0C 2 word time division see following text
MIDI Header Chunk Format

Chunk ID and Size

The chunk ID is always "MThd" (0x4D546864) and the size is always 6 because the header chunk always contains the same 3 word values.

Format Type

The first word describes the MIDI format type. It can be a value of 0, 1 or 2 and describes how the following track information is to be interpreted. A type 0 MIDI file has one track that contains all of the MIDI events for the entire song, including the song title, time signature, tempo and music events. A type 1 MIDI file should have two or more tracks. The first, by convention, contains song information such as the title, time signature, tempo, etc. (more detail in the Track Chunk section). The second and following tracks contain a title, musical event data, etc. specific to that track. This closely matches the organization of modern multi-track MIDI sequencers. A type 2 MIDI file is something of a combination of the other two types. It contains multiple tracks, but each track represents a different sequence which may not necessarily be played simultaneously. This is meant to be used to save drum patterns or other multi-pattern music sequences.

Number of Tracks

The second word simply defines the number of track chunks that follow this header chunk. A type 0 MIDI file may only contain a value of 1, because it can only contain one track. Type 1 and 2 MIDI files may contain up to 65,535 (0xFFFF) tracks.
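
The header layout can be read with a short sketch like the one below. The MidiHeader type and parse_header function are hypothetical names used for illustration; the time division is left as the raw word here and interpreted separately.

```python
import struct
from typing import NamedTuple


class MidiHeader(NamedTuple):
    format_type: int  # 0, 1 or 2
    num_tracks: int   # number of MTrk chunks that follow
    division: int     # raw time division word


def parse_header(data: bytes) -> MidiHeader:
    """Parse the 14-byte MThd chunk at the start of a MIDI file."""
    chunk_id, size = struct.unpack(">4sI", data[0:8])
    if chunk_id != b"MThd" or size != 6:
        raise ValueError("not a MIDI header chunk")
    # Three big-endian 16-bit words at offsets 0x08, 0x0A and 0x0C
    format_type, num_tracks, division = struct.unpack(">HHH", data[8:14])
    if format_type == 0 and num_tracks != 1:
        raise ValueError("a type 0 file must contain exactly one track")
    return MidiHeader(format_type, num_tracks, division)
```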

Time Division

The third and final word in the MIDI header chunk is a bit more complicated than the first two. It contains the time division used to decode the track event delta times into "real" time. This value represents either ticks per beat or frames per second. If the top bit of the word (bit mask 0x8000) is 0, the remaining 15 bits (bit mask 0x7FFF) describe the time division in ticks per beat. Otherwise, the remaining 15 bits describe the time division in SMPTE frames per second. Ticks per beat is the number of clock ticks or track delta positions (described in the Track Chunk section) in every quarter note of music. Common values range from 48 to 960, although newer sequencers go far beyond this range to ease working with MIDI and digital audio together.

Frames per second is defined by splitting the word into two bytes. The top byte (bit mask 0xFF00) holds the number of SMPTE frames per second as a negative two's complement value: -24, -25, -29 (for 29.97 fps) or -30, which appear in the file as 0xE8, 0xE7, 0xE3 and 0xE2 respectively. Because all of these bytes have their top bit set, the 0x8000 test above identifies this format. The bottom byte (bit mask 0x00FF) defines how many clock ticks or track delta positions there are per frame. So a time division of 0xE778 could be broken down as follows: the top bit is one, so it is in SMPTE frames per second format; the top byte 0xE7 is -25 in two's complement; and the bottom byte has a value of 120 (0x78). This means the example plays at 25 frames per second SMPTE time and has 120 ticks per frame.
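
The two time division formats can be told apart as in the following sketch. It follows the Standard MIDI File 1.0 convention that the high byte of an SMPTE division is the negative frame rate in two's complement; the function name and the dictionary it returns are illustrative assumptions.

```python
def decode_time_division(word: int) -> dict:
    """Interpret the 16-bit time division word from the header chunk."""
    if word & 0x8000 == 0:
        # Top bit clear: low 15 bits are ticks per quarter note
        return {"mode": "ticks_per_beat", "ticks_per_beat": word & 0x7FFF}
    # Top bit set: high byte is -fps in two's complement, low byte is ticks per frame
    high = word >> 8
    fps = 256 - high                     # e.g. 0xE7 (231) -> 25
    if fps not in (24, 25, 29, 30):      # 29 stands for 29.97 fps
        raise ValueError(f"unexpected SMPTE frame rate byte 0x{high:02X}")
    return {"mode": "smpte", "fps": fps, "ticks_per_frame": word & 0xFF}
```

With this sketch, decode_time_division(0xE778) reports 25 fps and 120 ticks per frame, while a typical ticks-per-beat value such as 0x01E0 decodes to 480 ticks per quarter note.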

Track Chunk

Track chunks contain all of the information for an individual track, including the track name and music events. Here is an overview of a track chunk's organization.

Offset Length Type Description Value
0x00 4 char[4] chunk ID "MTrk" (0x4D54726B)
0x04 4 dword chunk size see following text
0x08 track event data see following text
MIDI Track Chunk Format

Chunk ID and Size

The chunk ID is always "MTrk" (0x4D54726B) and the size varies depending on the number of bytes used for all of the events contained in the track.

Track Event Data

The track event data contains a stream of MIDI events that define information about the sequence and how it is played. The next section describes the different types of events.

MIDI Events

Track events are used to describe all of the musical content of a MIDI file, from tempo changes to sequence and track titles to individual music events. Each event includes a delta time, event type and usually some event type specific data.


The event delta time is defined by a variable-length value. It determines when an event should be played relative to the track's last event. A delta time of 0 means that it should play simultaneously with the last event. A track's first event delta time defines the amount of time to wait before playing this first event. Events unaffected by time are still preceded by a delta time, but should always use a value of 0 and come first in the stream of track events. Examples of this type of event include track titles and copyright information. The most important thing to remember about delta times is that they are relative values, not absolute times. The actual time they represent is determined by two factors: the time division (defined in the MIDI header chunk) and the tempo (defined with a track event). If no tempo is defined, 120 beats per minute is assumed.
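
The relationship between delta times, time division and tempo can be shown with a small sketch. It assumes a ticks-per-beat time division and a tempo that stays constant over the interval being converted (a real player would split the calculation at every tempo change); the names are hypothetical. A tempo of 120 beats per minute corresponds to 500,000 microseconds per quarter note.

```python
DEFAULT_TEMPO_US = 500_000  # 120 BPM expressed as microseconds per quarter note


def ticks_to_seconds(delta_ticks: int, ticks_per_beat: int,
                     tempo_us_per_beat: int = DEFAULT_TEMPO_US) -> float:
    """Convert a delta time in ticks to seconds, assuming a constant tempo."""
    return delta_ticks * tempo_us_per_beat / (ticks_per_beat * 1_000_000)


def absolute_times(deltas: list[int]) -> list[int]:
    """Turn a track's relative delta times into absolute tick positions."""
    total, positions = 0, []
    for delta in deltas:
        total += delta
        positions.append(total)
    return positions
```

For instance, at 480 ticks per beat and the default tempo, a delta of 480 ticks lasts half a second, and delta times of 0, 96, 0, 48 place their events at ticks 0, 96, 96 and 144.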

Types of Events

There are three types of events: MIDI Channel Events, System Exclusive Events and Meta Events.
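
After reading an event's delta time, a reader can tell the three types apart by the next byte, using the values given in the sections that follow: 0xFF for Meta Events, 0xF0 or 0xF7 for SysEx Events and 0x80-0xEF for MIDI Channel Events. The function below is a hypothetical sketch; it does not handle running status (files that omit a repeated status byte), which this article does not describe.

```python
def classify_event(status: int) -> str:
    """Classify a track event by the first byte after its delta time."""
    if status == 0xFF:
        return "meta"
    if status in (0xF0, 0xF7):
        return "sysex"
    if 0x80 <= status <= 0xEF:   # event types 0x8-0xE, channels 0-15
        return "channel"
    raise ValueError(f"unexpected event byte 0x{status:02X}")
```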

MIDI Channel Events

Musical control information, such as playing a note or adjusting a MIDI channel's modulation value, is defined by MIDI Channel Events. Each MIDI Channel Event consists of a variable-length delta time (like all track events) and a two- or three-byte description which determines the MIDI channel it corresponds to, the type of event it is and one or two event type specific values. Below is a table illustrating how MIDI Channel Events are formatted.

Delta Time Event Type Value MIDI Channel Parameter 1 Parameter 2
variable-length 4 bits 4 bits 1 byte 1 byte
MIDI Channel Event Format

MIDI Channel Events are the most common type of track event and usually make up the bulk of a MIDI file. The following table gives an overview of the seven MIDI Channel Events, listing their numeric value and parameters.

Event Type Value Parameter 1 Parameter 2
Note Off 0x8 note number velocity
Note On 0x9 note number velocity
Note Aftertouch 0xA note number aftertouch value
Controller 0xB controller number controller value
Program Change 0xC program number not used
Channel Aftertouch 0xD aftertouch value not used
Pitch Bend 0xE pitch value (LSB) pitch value (MSB)
MIDI Channel Events
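
The table above maps directly onto a decoder like the following sketch, which splits the status byte into its event type (top 4 bits) and MIDI channel (bottom 4 bits) and reads one or two parameters. The names are hypothetical, the delta time is assumed to have been read already, and running status is not handled.

```python
CHANNEL_EVENT_NAMES = {
    0x8: "Note Off", 0x9: "Note On", 0xA: "Note Aftertouch",
    0xB: "Controller", 0xC: "Program Change",
    0xD: "Channel Aftertouch", 0xE: "Pitch Bend",
}


def decode_channel_event(data: bytes, pos: int) -> tuple[str, int, tuple[int, ...], int]:
    """Decode the channel event whose status byte is at data[pos].

    Returns (event name, channel, parameters, next position).
    """
    status = data[pos]
    event_type, channel = status >> 4, status & 0x0F
    if event_type not in CHANNEL_EVENT_NAMES:
        raise ValueError(f"0x{status:02X} is not a MIDI Channel Event")
    # Program Change and Channel Aftertouch take one parameter, the rest two
    count = 1 if event_type in (0xC, 0xD) else 2
    params = tuple(data[pos + 1:pos + 1 + count])
    return CHANNEL_EVENT_NAMES[event_type], channel, params, pos + 1 + count
```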

Although all of the MIDI Channel Events follow the same basic format, each one requires a bit of explanation. Below is a detailed description of each and how it is used.

Note Off Event

The Note Off Event is used to signal when a MIDI key is released. These events have the same two parameters as a Note On event. The note number specifies which of the 128 MIDI keys is being released and the velocity determines how fast/hard the key was released. The note number is normally used to specify which previously pressed key is being released, and the velocity is usually ignored, but is sometimes used to adjust the slope of an instrument's release phase.

Note Off MIDI Channel Note Number Velocity
8 (0x8) 0-15 0-127 0-127
Note Off Event Value Ranges
Note On Event

The Note On Event is used to signal when a MIDI key is pressed. This type of event has two parameters: the note number, which specifies which of the 128 MIDI keys is being played, and the velocity, which determines how fast/hard the key is pressed. The note number is normally used to specify the instrument's musical pitch and the velocity is usually used to specify the instrument's playback volume and intensity.

Note On MIDI Channel Note Number Velocity
9 (0x9) 0-15 0-127 0-127
Note On Event Value Ranges
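
Building these two events is a matter of combining the event type and channel into one status byte, as in this hypothetical sketch (delta times are not included). The is_note_release helper also reflects a common convention from the MIDI 1.0 specification that is not described in this article: a Note On with a velocity of 0 is treated as a key release.

```python
def _check(channel: int, note: int, velocity: int) -> None:
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15, note and velocity 0-127")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Status byte 0x9n followed by note number and velocity."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Status byte 0x8n followed by note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])


def is_note_release(event: bytes) -> bool:
    """True for a Note Off, or a Note On with velocity 0."""
    kind = event[0] & 0xF0
    return kind == 0x80 or (kind == 0x90 and event[2] == 0)
```
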
Note Aftertouch Event

The Note Aftertouch Event is used to indicate a pressure change on one of the currently pressed MIDI keys. It has two parameters: the note number of the key whose pressure is changing, and the aftertouch value, which specifies the amount of pressure being applied (0 = no pressure, 127 = full pressure). Note Aftertouch is used for extra expression of particular notes, often introducing or increasing some type of modulation during the instrument's sustain phase.

Note Aftertouch MIDI Channel Note Number Amount
10 (0xA) 0-15 0-127 0-127
Note Aftertouch Event Value Ranges
Controller Event

The Controller Event signals a change in a MIDI channel's state. There are 128 controllers which define different attributes of the channel, including volume, pan, modulation, effects, and more. This event type has two parameters. The controller number specifies which control is changing and the controller value defines its new setting.

Controller MIDI Channel Controller Type Value
11 (0xB) 0-15 0-127 0-127
Controller Event Value Ranges

Below is a list of the defined MIDI controller types.

Value Controller Type
0 (0x00) Bank Select
1 (0x01) Modulation
2 (0x02) Breath Controller
4 (0x04) Foot Controller
5 (0x05) Portamento Time
6 (0x06) Data Entry (MSB)
7 (0x07) Main Volume
8 (0x08) Balance
10 (0x0A) Pan
11 (0x0B) Expression Controller
12 (0x0C) Effect Control 1
13 (0x0D) Effect Control 2
16-19 (0x10-0x13) General-Purpose Controllers 1-4
32-63 (0x20-0x3F) LSB for controllers 0-31
64 (0x40) Damper pedal (sustain)
65 (0x41) Portamento
66 (0x42) Sostenuto
67 (0x43) Soft Pedal
68 (0x44) Legato Footswitch
69 (0x45) Hold 2
70 (0x46) Sound Controller 1 (default: Timbre Variation)
71 (0x47) Sound Controller 2 (default: Timbre/Harmonic Content)
72 (0x48) Sound Controller 3 (default: Release Time)
73 (0x49) Sound Controller 4 (default: Attack Time)
74-79 (0x4A-0x4F) Sound Controller 5-10
80-83 (0x50-0x53) General-Purpose Controllers 5-8
84 (0x54) Portamento Control
91 (0x5B) Effects 1 Depth (formerly External Effects Depth)
92 (0x5C) Effects 2 Depth (formerly Tremolo Depth)
93 (0x5D) Effects 3 Depth (formerly Chorus Depth)
94 (0x5E) Effects 4 Depth (formerly Celeste Detune)
95 (0x5F) Effects 5 Depth (formerly Phaser Depth)
96 (0x60) Data Increment
97 (0x61) Data Decrement
98 (0x62) Non-Registered Parameter Number (LSB)
99 (0x63) Non-Registered Parameter Number (MSB)
100 (0x64) Registered Parameter Number (LSB)
101 (0x65) Registered Parameter Number (MSB)
121-127 (0x79-0x7F) Mode Messages
Defined MIDI Controllers
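
The "LSB for controllers 0-31" entry means that controllers 0-31 can be given 14-bit precision by pairing each one with the controller 32 higher. The hypothetical sketch below splits a 14-bit value into two Controller events, sending the MSB before the LSB; in a track chunk each event would still need its own delta time.

```python
def controller_pair(channel: int, controller: int, value14: int) -> tuple[bytes, bytes]:
    """Build (MSB event, LSB event) for a 14-bit value on controller 0-31."""
    if not (0 <= channel <= 15 and 0 <= controller <= 31 and 0 <= value14 <= 0x3FFF):
        raise ValueError("channel 0-15, controller 0-31, value 0-16383")
    status = 0xB0 | channel                    # Controller event type 0xB
    msb, lsb = value14 >> 7, value14 & 0x7F
    return (bytes([status, controller, msb]),       # e.g. 7 = Main Volume
            bytes([status, controller + 32, lsb]))  # e.g. 39 = its LSB
```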
Program Change Event

The Program Change Event is used to change which program (instrument/patch) should be played on the MIDI channel. This type of event takes only one parameter, the program number of the new instrument/patch.

Program Change MIDI Channel Program Number
12 (0xC) 0-15 0-127
Program Change Event Value Ranges
Channel Aftertouch Event

The Channel Aftertouch Event is similar to the Note Aftertouch message, except it affects all keys currently pressed on the specific MIDI channel. This type of event takes only one parameter, the aftertouch amount (0 = no pressure, 127 = full pressure).

Channel Aftertouch MIDI Channel Amount
13 (0xD) 0-15 0-127
Channel Aftertouch Event Value Ranges
Pitch Bend Event

The Pitch Bend Event is similar to a controller event, except that it is a unique MIDI Channel Event that has two bytes to describe its value. The pitch value is defined by both parameters of the MIDI Channel Event, joined in the format yyyyyyyxxxxxxx, where the y characters represent the last 7 bits of the second parameter and the x characters represent the last 7 bits of the first parameter. Combining both parameters allows high-accuracy values (0 - 16383). The pitch value affects all playing notes on the current channel. Values below 8192 decrease the pitch, while values above 8192 increase the pitch. The pitch range may vary from instrument to instrument, but is usually +/-2 semitones.

Pitch Bend MIDI Channel Value (LSB) Value (MSB)
14 (0xE) 0-15 0-127 0-127
Pitch Bend Event Value Ranges
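
The yyyyyyyxxxxxxx layout can be sketched as follows; the helper names are hypothetical, and bend_semitones assumes an instrument with the usual +/-2 semitone range unless told otherwise.

```python
def pitch_bend_bytes(channel: int, value: int) -> bytes:
    """Encode a 14-bit pitch value (0-16383, 8192 = no bend) as a Pitch Bend event."""
    if not (0 <= channel <= 15 and 0 <= value <= 0x3FFF):
        raise ValueError("channel 0-15, value 0-16383")
    lsb, msb = value & 0x7F, value >> 7   # parameter 1 = LSB, parameter 2 = MSB
    return bytes([0xE0 | channel, lsb, msb])


def pitch_bend_value(param1: int, param2: int) -> int:
    """Join the parameters as yyyyyyyxxxxxxx (y = param 2, x = param 1)."""
    return ((param2 & 0x7F) << 7) | (param1 & 0x7F)


def bend_semitones(value: int, bend_range: float = 2.0) -> float:
    """Map a pitch value onto semitones for a +/- bend_range instrument."""
    return (value - 8192) / 8192 * bend_range
```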

Meta Events

Events that are not to be sent or received over a MIDI port are called Meta Events. These events are defined by an event type value of 0xFF and have a variable size of parameter data which is defined after the event type.

Meta Event Type Length Data
255 (0xFF) 0-255 variable-length type specific
Meta Event Values
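
Reading a meta event therefore means reading the 0xFF byte, the type byte, a variable-length length and then that many data bytes. A minimal sketch, with hypothetical function names and its own small variable-length reader:

```python
def read_vlq(data: bytes, pos: int) -> tuple[int, int]:
    """Read a variable-length value; return (value, next_pos)."""
    value = 0
    for i in range(4):
        byte = data[pos + i]
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("variable-length value longer than 4 bytes")


def parse_meta_event(data: bytes, pos: int) -> tuple[int, bytes, int]:
    """Parse a meta event whose 0xFF byte is at data[pos].

    Returns (meta type, data bytes, next position).
    """
    if data[pos] != 0xFF:
        raise ValueError("meta events start with 0xFF")
    meta_type = data[pos + 1]
    length, pos = read_vlq(data, pos + 2)
    return meta_type, data[pos:pos + length], pos + length
```

For example, the bytes FF 03 05 followed by "Intro" parse as a Sequence/Track Name event (type 0x03) with the five-byte text "Intro".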

There are currently fifteen defined Meta Events. Each one is described in detail below.

Sequence Number

This meta event defines the pattern number of a Type 2 MIDI file or the number of a sequence in a Type 0 or Type 1 MIDI file. This meta event should always have a delta time of 0 and come before all MIDI Channel Events and non-zero delta time events.

Meta Event Type Length Number (MSB) Number (LSB)
255 (0xFF) 0 (0x00) 2 0-255 0-255
Sequence Number Meta Event Values
Text Event

This meta event defines some text which can be used for any reason including track notes, comments, etc. The text string is usually ASCII text, but may be any character (0x00-0xFF).

Meta Event Type Length Text
255 (0xFF) 1 (0x01) string length ASCII text
Text Meta Event Values
Copyright Notice

This meta event defines copyright information including the copyright symbol © (0xA9), the year and the author. This meta event should always be in the first track chunk, have a delta time of 0 and come before all MIDI Channel Events and non-zero delta time events.

Meta Event Type Length Text
255 (0xFF) 2 (0x02) string length ASCII text
Copyright Notice Meta Event Values
Sequence/Track Name

This meta event defines the name of a sequence when in a Type 0 or Type 2 MIDI file or in the first track of a Type 1 MIDI file. It defines a track name when it appears in any track after the first in a Type 1 MIDI file. This meta event should always have a delta time of 0 and come before all MIDI Channel Events and non-zero delta time events.

Meta Event Type Length Text
255 (0xFF) 3 (0x03) string length ASCII text
Sequence/Track Name Meta Event Values
Instrument Name

This meta event defines the name of an instrument being used in the current track chunk. This event can be used with the MIDI Channel Prefix meta event to define which instrument is being used on a specific channel.

Meta Event Type Length Text
255 (0xFF) 4 (0x04) string length ASCII text
Instrument Name Meta Event Values
Lyrics

This meta event defines the lyrics in a song and is usually used to define a syllable or group of words per quarter note. This event can be used as an equivalent of sheet music lyrics or for implementing a karaoke-style system.

Meta Event Type Length Text
255 (0xFF) 5 (0x05) string length ASCII text
Lyrics Meta Event Values
Marker

This meta event marks a significant point in time for the sequence. It is usually found in the first track chunk, but may appear in any one. This event can be useful for marking the beginning/end of a new verse or chorus.

Meta Event Type Length Text
255 (0xFF) 6 (0x06) string length ASCII text
Marker Meta Event Values
Cue Point

This meta event marks the start of some type of new sound or action. It is usually found in the first track chunk, but may appear in any one. This event is sometimes used by sequencers to mark when playback of a sample or video should begin.

Meta Event Type Length Text
255 (0xFF) 7 (0x07) string length ASCII text
Cue Point Meta Event Values
MIDI Channel Prefix

This meta event associates a MIDI channel with the following meta events. Its effect is terminated by another MIDI Channel Prefix event or any non-meta event. It is often used before an Instrument Name Event to specify which channel an instrument name represents.

Meta Event Type Length Channel
255 (0xFF) 32 (0x20) 1 0-15
MIDI Channel Prefix Meta Event Values
End Of Track

This meta event is used to signal the end of a track chunk and must always appear as the last event in every track chunk.

Meta Event Type Length
255 (0xFF) 47 (0x2F) 0
End Of Track Meta Event Values
Set Tempo

This meta event sets the sequence tempo in terms of microseconds per quarter-note, which is encoded in three bytes. It is usually found in the first track chunk, time-aligned to occur at the same time as a MIDI clock message to promote more accurate synchronization. If no Set Tempo event is present, 120 beats per minute is assumed. The following formulas can be used to translate the tempo from microseconds per quarter-note (MPQN) to beats per minute (BPM) and back, using the 60,000,000 microseconds in one minute: BPM = 60,000,000 / MPQN and MPQN = 60,000,000 / BPM.

Meta Event Type Length Microseconds/Quarter-Note
255 (0xFF) 81 (0x51) 3 0-16777215
Set Tempo Meta Event Values
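
The BPM conversion and the three-byte encoding can be combined in a short sketch. The function names are hypothetical, and rounding to the nearest whole microsecond is a choice made here, not something the format requires.

```python
MICROSECONDS_PER_MINUTE = 60_000_000


def bpm_to_mpqn(bpm: float) -> int:
    """Beats per minute -> microseconds per quarter note (rounded)."""
    return round(MICROSECONDS_PER_MINUTE / bpm)


def mpqn_to_bpm(mpqn: int) -> float:
    """Microseconds per quarter note -> beats per minute."""
    return MICROSECONDS_PER_MINUTE / mpqn


def set_tempo_event(bpm: float) -> bytes:
    """Build FF 51 03 tt tt tt (the preceding delta time is not included)."""
    mpqn = bpm_to_mpqn(bpm)
    if not 0 < mpqn <= 0xFFFFFF:
        raise ValueError("tempo does not fit in three bytes")
    return bytes([0xFF, 0x51, 0x03]) + mpqn.to_bytes(3, "big")
```

At the default 120 BPM this gives 500,000 microseconds per quarter note, stored as the bytes 07 A1 20.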
SMPTE Offset

This meta event is used to specify the SMPTE starting point offset from the beginning of the track. It is defined in terms of hours, minutes, seconds, frames and sub-frames (always 100 sub-frames per frame, no matter what sub-division is specified in the MIDI header chunk). The byte used to specify the hour offset also specifies the frame rate in the following format: 0rrhhhhh, where rr is two bits for the frame rate (00 = 24 fps, 01 = 25 fps, 10 = 30 fps drop frame, 11 = 30 fps) and hhhhh is five bits for the hour (0-23). The hour byte's top bit is always 0. The frame byte's possible range depends on the encoded frame rate in the hour byte. For example, a 25 fps frame rate means that a maximum value of 24 may be set for the frame byte.

Meta Event Type Length Hour Min Sec Fr SubFr
255 (0xFF) 84 (0x54) 5 0-23 * 0-59 0-59 0-29 * 0-99
SMPTE Offset Meta Event Values, * read preceding text for details
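
Unpacking the 0rrhhhhh hour byte takes a shift and two masks, as in this hypothetical sketch of decoding the five data bytes of an SMPTE Offset event:

```python
SMPTE_RATES = {0b00: "24 fps", 0b01: "25 fps", 0b10: "30 fps (drop frame)", 0b11: "30 fps"}


def decode_smpte_offset(data: bytes) -> dict:
    """Decode the 5 data bytes of an SMPTE Offset meta event (FF 54 05 ...)."""
    if len(data) != 5:
        raise ValueError("SMPTE Offset data is exactly 5 bytes")
    hour_byte, minutes, seconds, frames, subframes = data
    rate_bits = (hour_byte >> 5) & 0b11   # bits 6-5: frame rate
    hours = hour_byte & 0x1F              # bits 4-0: hour (0-23)
    return {"rate": SMPTE_RATES[rate_bits], "hours": hours, "minutes": minutes,
            "seconds": seconds, "frames": frames, "subframes": subframes}
```

For example, an hour byte of 0x21 (binary 00100001) decodes to 25 fps and hour 1.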
Time Signature

This meta event is used to set a sequence's time signature. The time signature is defined with 4 bytes: a numerator, a denominator, a metronome pulse and the number of 32nd notes per MIDI quarter-note. The numerator is specified as a literal value, but the denominator is specified as (get ready) the power to which 2 must be raised to equal the number of subdivisions per whole note. For example, a value of 0 means a whole note because 2 to the power of 0 is 1 (whole note), a value of 1 means a half-note because 2 to the power of 1 is 2 (half-note), and so on. The metronome pulse specifies how often the metronome should click in terms of the number of clock signals per click, which come at a rate of 24 per quarter-note. For example, a value of 24 would mean to click once every quarter-note (beat) and a value of 48 would mean to click once every half-note (2 beats). Finally, the fourth byte specifies the number of 32nd notes per 24 MIDI clock signals. This value is usually 8 because there are 8 32nd notes in a quarter-note. At least one Time Signature Event should appear in the first track chunk (or all track chunks in a Type 2 file) before any non-zero delta time events. If one is not specified, 4/4, 24, 8 should be assumed.

Meta Event Type Length Numer Denom Metro 32nds
255 (0xFF) 88 (0x58) 4 0-255 0-255 0-255 1-255
Time Signature Meta Event Values
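
A minimal sketch of decoding the four data bytes, with a hypothetical function name, shows how the power-of-two denominator and the 24-clocks-per-quarter-note metronome pulse work out in practice:

```python
def decode_time_signature(data: bytes) -> dict:
    """Decode the 4 data bytes of a Time Signature meta event (FF 58 04 ...)."""
    numerator, denom_power, clocks_per_click, thirty_seconds = data
    return {
        "time_signature": f"{numerator}/{2 ** denom_power}",
        "clocks_per_click": clocks_per_click,        # 24 clocks = one quarter note
        "beats_per_click": clocks_per_click / 24,    # metronome click in quarter notes
        "32nds_per_quarter": thirty_seconds,
    }
```

The bytes 06 03 24 08 (decimal 6, 3, 36, 8), for example, describe 6/8 time with a metronome click every dotted quarter note (36 clocks, or 1.5 quarter notes).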
Key Signature

This meta event is used to specify the key (number of sharps or flats) and scale (major or minor) of a sequence. A positive value for the key specifies the number of sharps and a negative value specifies the number of flats. A value of 0 for the scale specifies a major key and a value of 1 specifies a minor key.

Meta Event Type Length Key Scale
255 (0xFF) 89 (0x59) 2 -7 to 7 0-1
Key Signature Meta Event Values
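
Because the key value can be negative, it is stored as a signed (two's complement) byte. The sketch below decodes it and, purely for illustration, names the key using circle-of-fifths tables that are an addition here; the file itself only stores the sharps/flats count and the scale.

```python
MAJOR_KEYS = ["Cb", "Gb", "Db", "Ab", "Eb", "Bb", "F", "C",
              "G", "D", "A", "E", "B", "F#", "C#"]
MINOR_KEYS = ["Ab", "Eb", "Bb", "F", "C", "G", "D", "A",
              "E", "B", "F#", "C#", "G#", "D#", "A#"]


def decode_key_signature(data: bytes) -> str:
    """Decode the 2 data bytes of a Key Signature meta event (FF 59 02 ...)."""
    key = int.from_bytes(data[0:1], "big", signed=True)  # -7 (flats) .. 7 (sharps)
    scale = data[1]                                      # 0 = major, 1 = minor
    if not -7 <= key <= 7 or scale not in (0, 1):
        raise ValueError("invalid key signature")
    names = MAJOR_KEYS if scale == 0 else MINOR_KEYS
    return f"{names[key + 7]} {'major' if scale == 0 else 'minor'}"
```

A key byte of 0xFD (-3, three flats) with a scale of 0 decodes to "Eb major".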
Sequencer Specific

This meta event is used to specify information specific to a hardware or software sequencer. The first data byte (or three bytes if the first byte is 0) specifies the manufacturer's ID and the following bytes contain information specified by the manufacturer. The individual manufacturers may document this information in their respective manuals.

Meta Event Type Length Data
255 (0xFF) 127 (0x7F) variable-length any type and amount *
Sequencer Specific Meta Event Values, * read preceding text for details

System Exclusive Events

Also known as SysEx Events, these MIDI events are used to control MIDI hardware or software that require special data bytes that will follow their manufacturer's specifications. Every SysEx event includes an ID that specifies which manufacturer's product is to be the intended receiver. All other products will ignore the event. There are three types of SysEx messages which are used to send data in a single event, across multiple events or authorize the transmission of specific MIDI messages.

Normal SysEx Events

These are the most common type of SysEx event and are used to hold a single block of manufacturer specific data. The first byte is always 0xF0 and the second is a variable-length value that specifies the length of the following SysEx data in bytes. The SysEx data bytes must always end with a 0xF7 byte to signal the end of the message.

SysEx Event Length Data
240 (0xF0) variable-length data bytes, 0xF7
Normal SysEx Event Values
Divided SysEx Events

A large amount of SysEx data in a Normal SysEx Event could cause following MIDI Channel Events to be transmitted after the time they should be played. This will cause an unwanted delay in playback of the following events. The second type of SysEx Event solves this problem by allowing a large amount of SysEx data to be divided into smaller blocks, transmitted with a delay between each division to allow the transmission of other MIDI events and prevent congestion of the limited MIDI bandwidth. The initial Divided SysEx Event follows the same format as a Normal SysEx Event, with the exception that the last data byte is not 0xF7. This indicates the SysEx data is not finished and will be continued in an upcoming Divided SysEx Event. Any following Divided SysEx Events before the final one use a similar format to the first, except that the start byte is 0xF7 instead of 0xF0 to signal continuation of SysEx data. The final block follows the same format as the continuation blocks, except the last data byte is 0xF7 to signal the completion of the divided SysEx data.

SysEx Event Length Data
240 (0xF0) variable-length data bytes
247 (0xF7) variable-length data bytes
247 (0xF7) variable-length data bytes, 0xF7
Divided SysEx Event Values
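
Reassembling a Normal or Divided SysEx message can be sketched as below. The function is hypothetical and assumes each packet has already been read from the track as a pair of (event type byte, data bytes following the variable-length length); it rebuilds the complete message, beginning with 0xF0 and ending with 0xF7.

```python
def join_sysex(packets: list[tuple[int, bytes]]) -> bytes:
    """Reassemble SysEx data from one Normal or several Divided SysEx Events."""
    if not packets or packets[0][0] != 0xF0:
        raise ValueError("a SysEx message must start with an 0xF0 event")
    message = bytearray([0xF0])
    for index, (status, data) in enumerate(packets):
        # Every block after the first is a continuation using type 0xF7
        if index > 0 and status != 0xF7:
            raise ValueError("continuation blocks use the 0xF7 event type")
        message += data
    if not message.endswith(b"\xf7"):  # only the final block ends with 0xF7
        raise ValueError("SysEx data is not terminated with 0xF7")
    return bytes(message)
```
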
Authorization SysEx Events

The last type of SysEx Event authorizes and enables the transmission of special messages such as Song Position Pointer, MIDI Time Code and Song Select messages. These SysEx Events use the event type value 0xF7.

SysEx Event Length Data
247 (0xF7) variable-length data bytes
Authorization SysEx Event Values