SVP documentation (2014-09-23)
This is a copy of an "unofficial" document containing original research, for use as a source on Sega Retro. Original source: http://notaz.gp2x.de/docs/svpdoc.txt
-------------------------------------------------------------------------------
 notaz's SVP doc
 $Id: svpdoc.txt 964 2014-09-23 00:27:41Z notaz $
 Copyright 2008, Grazvydas Ignotas (notaz)
-------------------------------------------------------------------------------

If you use this, please credit me in your work or its documentation.
Tasco Deluxe should also be credited for his pioneering work on the subject.
Thanks.

Use monospace font and disable word wrap when reading this document.

updates:
  2014-09-23: minor additions about memory map
  2008-03-05: added a few notes about arithmetic op operands.
  2008-02-06: added Tasco Deluxe's correction about PMC register reads.


-------------------------------------------------------------------------------
 Table of Contents
-------------------------------------------------------------------------------

 0. Introduction
 1. Overview
 2. The SSP160x DSP
    2.1. General registers
    2.2. External registers
    2.3. Pointer registers
    2.4. The instruction set
 3. Memory map
 4. Other notes


-------------------------------------------------------------------------------
 0. Introduction
-------------------------------------------------------------------------------

This document is an attempt to provide technical information needed to
emulate Sega's SVP chip. It is based on reverse engineering the Virtua Racing
game and on various Internet sources. Only some of the information provided
here has been verified on the real hardware, so some things are likely to be
inaccurate.

The following information sources were used while writing this document and
emulator implementation:

 [1] SVP Reference Guide (annotated) and SVP Register Guide (annotated)
     by Tasco Deluxe < tasco.deluxe @ gmail.com >
     http://www.sharemation.com/TascoDLX/SVP%20Reference%20Guide%202007.02.11.txt
     http://www.sharemation.com/TascoDLX/SVP%20Register%20Guide%202007.02.11.txt
 [2] SSP1610 disassembler
     written by Pierpaolo Prazzoli, MAME source code.
     http://mamedev.org/
 [3] SSP1601 DSP datasheet
     http://notaz.gp2x.de/docs/SSP1601.pdf
 [4] DSP page (with code samples) in Samsung Semiconductor website from 1997
     retrieved from Internet Archive: The Wayback Machine
     http://web.archive.org/web/19970607052826/www.sec.samsung.com/Products/dsp/dspcore.htm
 [5] Sega's SVP Chip: The Road not Taken?
     Ken Horowitz, Sega-16
     http://sega-16.com/feature_page.php?id=37&title=Sega's%20SVP%20Chip:%20The%20Road%20not%20Taken?


-------------------------------------------------------------------------------
 1. Overview
-------------------------------------------------------------------------------

The only game released with the SVP chip was Virtua Racing. There are at least
4 versions of the game: USA, Jap and 2 different Eur revisions. Three of them
share identical SSP160x code, one of the Eur revisions has some differences.

From the software developer's point of view, the game cartridge contains
at least:

  * Samsung SSP160x 16-bit DSP core, which includes [3]:
    * Two independent high-speed RAM banks, accessed in a single clock cycle,
      256 words each.
    * 16 x 16 bit multiply unit.
    * 32-bit ALU, status register.
    * Hardware stack of 6 levels.
  * 128KB of DRAM.
  * 2KB of IRAM (instruction RAM).
  * Memory controller with address mapping capability.
  * 2MB of game ROM.

[5] claims there is also "2 Channels PWM" in the cartridge, but it's either
not used or not there at all. The European cartridge doesn't have audio pins
connected.

Various sources claim that the SSP160x is an SSP1601, which is likely to be
true, because the code doesn't seem to use any SSP1605+ features.


-------------------------------------------------------------------------------
 2. The SSP160x DSP
-------------------------------------------------------------------------------

SSP160x is a 16-bit DSP, capable of performing multiplication + addition in
a single clock cycle [3]. It has 8 general, 8 external and 8 pointer registers.
There is a status register which has operation control bits and condition
flags. Condition flags are set/cleared during ALU (arithmetic, logic)
operations. It also has a 6-level hardware stack and 2 internal RAM banks,
RAM0 and RAM1, 256 words each.

The device is only capable of addressing 16-bit words, so all addresses refer
to words (a 16bit value in ROM, accessed by 68k through address 0x84, would be
accessed by SSP160x using address 0x42).

[3] mentions interrupt pins, but interrupts don't seem to be used by SVP code
(actually there are functions which look like interrupt handler routines, but
they don't seem to do anything important).

2.1. General registers
----------------------

There are 8 general registers: -, X, Y, A, ST, STACK, PC and P ([2] [4]).
Size is given in bits.

2.1.1. "-"
  Constant register with all bits set (0xffff). Also used for programming
  external registers (blind reads/writes, see 2.2).
  size: 16

2.1.2. "X"
  Generic register. Also acts as multiplier 1 for the P register.
  size: 16

2.1.3. "Y"
  Generic register. Also acts as multiplier 2 for the P register.
  size: 16

2.1.4. "A"
  Accumulator. Stores the result of all ALU (but not multiply) operations,
  status register is updated according to this. When directly accessed,
  only the upper word is read/written. The low word can be accessed by using
  AL (see 2.2.8).
  size: 32

2.1.5. "ST"
  STatus register. Bits 0-9 are CONTROL, others are FLAG [2]. Only some of
  them are actually used by SVP.
  Bits: fedc ba98 7654 3210
    210 - RPL    "Loop size". If non-zero, makes (rX+) and (rX-) respectively
                 modulo-increment and modulo-decrement (see 2.3). The value
                 shows which power of 2 to use, i.e. 4 means modulo by 16.
     43 - RB     Unknown. Not used by SVP code.
      5 - ST5    Affects behavior of external registers. See 2.2.
      6 - ST6    Affects behavior of external registers. See 2.2.
                 According to [3] (5,6) bits correspond to hardware pins.
      7 - IE     Interrupt enable? Not used by SVP code.
      8 - OP     Saturated value? Not used by SVP code.
      9 - MACS   MAC shift? Not used by SVP code.
      a - GPI_0  Interrupt 0 enable/status? Not used by SVP code.
      b - GPI_1  Interrupt 1 enable/status? Not used by SVP code.
      c - L      L flag. Similar to carry? Not used by SVP code.
      d - Z      Zero flag. Set after ALU operations, when all 32 accumulator
                 bits become zero.
      e - OV     Overflow flag. Not used by SVP code.
      f - N      Negative flag. Set after ALU operations, when bit31 in
                 accumulator is 1.
  size: 16

2.1.6. "STACK"
  Hardware stack of 6 levels [3]. Values are "pushed" by directly writing to
  it, or by the "call" instruction. "Pop" is performed by directly reading the
  register or by the "ret" instruction.
  size: 16

2.1.7. "PC"
  Program Counter. Can be written directly to perform a jump. It is not clear
  if it is possible to read it (SVP code never does).
  size: 16

2.1.8. "P"
  multiply Product - multiplication result register.
  Always contains the 32-bit multiplication result of X, Y and 2
  (P = X * Y * 2). X and Y are sign-extended before performing the
  multiplication.
  size: 32

2.2. External registers
-----------------------

The external registers, as the name says, are external to SSP160x. They are
hooked to the memory controller in SVP, so by accessing them we actually
program the memory controller. They act as programmable memory access
registers or external status registers [1]. Some of them can act as both,
depending on how the ST5 and ST6 bits are set in the status register. After
a register is programmed, accessing it causes reads/writes from/to external
memory (see section 3 for the memory map). The access may also cause some
additional effects, like an increment of the address associated with the
accessed register.

In this document and my emu, instead of using names EXT0-EXT7 from [4] I used
different names for these registers. Those names are from Tasco Deluxe's [1]
doc.

All these registers can be blind-accessed (as said in [1]) by performing
(ld -, PMx) or (ld PMx, -). This programs them to access memory (except PMC,
where the effect is different).
All registers are 16-bit.

2.2.1. "PM0"
  If ST5 or ST6 is set, acts as Programmable Memory access register
  (see 2.2.7). Else it acts as status of XST (2.2.4). It is also mapped
  to a15004 on 68k side:
     ???????? ??????10
     0: set, when SSP160x has written something to XST
        (cleared when a15004 is read by 68k)
     1: set, when 68k has written something to a15000 or a15002
        (cleared on PM0 read by SSP160x)
  Note that this is likely to be incorrect, but such behavior is OK for
  emulation to work.

2.2.2. "PM1"
  Programmable Memory access register. Only accessed with ST bits set by
  SVP code.

2.2.3. "PM2"
  Same as PM1.

2.2.4. "XST"
  If ST5 or ST6 is set, acts as Programmable Memory access register
  (only used by memory test code). Else it acts as eXternal STatus
  register, which is also mapped to a15000 and a15002 on 68k side.
  Affects PM0 when written to.

2.2.5. "PM4"
  Programmable Memory access register. Not affected by ST5 and ST6 bits,
  always stays in PMAR mode.

2.2.6. "EXT5"
  Not used by SVP, so not covered by this document.

2.2.7. "PMC"
  Programmable Memory access Control. It is set using 2 16bit writes, first
  the address word, then the mode word. After setting PMC, PMx should be
  blind accessed using (ld -, PMx) or (ld PMx, -) to program it for reading
  or writing external memory respectively. Every PMx register can be
  programmed to access its own memory location with its own mode. Registers
  are programmed separately for reading and writing.

  Reading the PMC register also shifts its state (from "waiting for address"
  to "waiting for mode" and back). In state "waiting for address" reads
  return the address word related to the last PMx register accessed. If read
  in "waiting for mode" state, we get the same value as in the other state,
  but rotated by 4 (or with nibbles swapped, VR always does this to words
  with both bytes equal, like 'abab' to get 'baba' for chessboard dithering
  effect).

  The address word contains bits 0-15 of the memory word-address.
  The mode word format is as follows:
     dsnnnv?? ???aaaaa
     a: bits 16-20 of memory word-address.
     n: auto-increment value. If set, after every access of PMx, the
        word-address value related to it will be incremented by (words):
          1 - 1    5 - 16
          2 - 2    6 - 32
          3 - 4    7 - 128
          4 - 8
     d: make auto-increment negative - decrement by count listed above.
     s: special-increment mode. If current address is even (when accessing
        programmed PMx), increment it by 1. Else, increment by 32. It is not
        clear what happens if d and n bits are also set (never done by SVP).
     v: over-write mode when writing, unknown when reading (not used).
        Over-write mode splits the word being written into 4 nibbles and
        only writes out ones which are non zero.
  When auto-increment is performed, it affects all 21 address bits.

2.2.8. "AL"
  This register acts more like a general register.
  If this register is blind-accessed, it is "dummy programmed", i.e. nothing
  happens and PMC is reset to "waiting for address" state.
  In all other cases, it is Accumulator Low - the 16 least significant bits
  of the accumulator. Normally when reading the accumulator (ld X, A) you get
  the 16 most significant bits, so this allows you to access the low word of
  the 32bit accumulator.

2.3. Pointer registers
----------------------

There are 8 8-bit pointer registers rX, which are internal to SSP160x and are
used to access internal RAM banks RAM0 and RAM1, or program memory indirectly.
r0-r3 (ri) point to RAM0, r4-r7 (rj) point to RAM1. Each bank has 256 words of
RAM, so 8bit registers can fully address them. The registers can be accessed
directly, or 2 indirection levels can be used [ (rX), ((rX)) ]. They work
similar to * and ** operators in C, only they use different types of memory
and ((rX)) also performs post-increment. First indirection level (rX) accesses
a word in RAMx, second accesses program memory at the address read from (rX),
and increments the value in (rX).

Only r0,r1,r2,r4,r5,r6 can be directly modified (ldi r0, 5), or by using
modifiers. 3 modifiers can be applied when using the first indirection level
(optional):
  + : post-increment (ld a, (r0+) ). Increment register value after operation.
      Can be made modulo-increment by setting RPL bits in status register
      (see 2.1.5).
  - : post-decrement. Also can be made modulo-decrement by using RPL bits
      in ST.
  +!: post-increment, unaffected by RPL (probably).
These are only used on the 1st indirection level, so things like
( ld a, ((r0+)) ) and (ld X, r6-) are probably invalid.

r3 and r7 are special and can not be changed (at least Samsung samples [4] and
SVP code never do). They are fixed to the start of their RAM banks. (They are
probably changeable for ssp1605+, Samsung's old DSP page claims that).
1 of these 4 modifiers must be used on these registers (short form direct
addressing? [2]):
  |00: RAMx[0] The very first word in the RAM bank.
  |01: RAMx[1] Second word
  |10: RAMx[2] ...
  |11: RAMx[3]

2.4. The instruction set
------------------------

The Samsung SSP16 series assembler uses right-to-left notation ([2] [4]):
ld X, Y
means the value from Y should be copied to X.

Size of every instruction is one word, some have extension words for immediate
values. When writing an interpreter, the 7 most significant bits are usually
enough to determine which opcode it is.

encoding bits are marked as:
  rrrr - general or external register, in order specified in 2.1 and 2.2
         (0 is '-', 1 'X', ..., 8 is 'PM0', ..., 0xf is 'AL')
  dddd - same as above, as destination operand
  ssss - same as above, as source operand
  jpp  - pointer register index, 0-7
  j    - specifies RAM bank, i.e. RAM0 or RAM1
  i*   - immediate value bits
  a*   - offset in internal RAM bank
  mm   - modifier for pointer register, depending on register:
           r0-r2,r4-r6   r3,r7   examples
         0:  (none)       |00    ld  a, (r0)      cmp a, (r7|00)
         1:  +!           |01    ld  (r0+!), a    ld  (r7|01), a
         2:  -            |10    add a, (r0-)
         3:  +            |11
  cccc - encodes condition, only 3 used by SVP, see check_cond() below
  ooo  - operation to perform

Operation is written in C-style pseudo-code, where:
  program_memory[X]  - access program memory at address X
  RAMj[X]            - access internal RAM bank j=0,1 (RAM0 or RAM1),
                       word offset X
  RIJ[X]             - pointer register rX, X=0-7
  pr_modif_read(m,X) - read pointer register rX, applying modifier m:
                       if register is r3 or r7, return value m
                       else switch on value m:
                         0: return rX;
                         1: tmp = rX; rX++; return tmp;                 // rX+!
                         2: tmp = rX; modulo_decrement(rX); return tmp; // rX-
                         3: tmp = rX; modulo_increment(rX); return tmp; // rX+
                       the modulo value used (if used at all) depends on ST
                       RPL bits (see 2.1.5)
  check_cond(c,f)    - checks if a flag matches f bit:
                       switch (c) {
                         case 0: return true;
                         case 5: return (Z == f) ? true : false; // check Z flag
                         case 7: return (N == f) ? true : false; // check N flag
                       } // other conditions are possible, but they are not used
  update_flags()     - update ST flags according to last ALU operation.
  sign_extend(X)     - sign extend 16bit value X to 32bits.
  next_op_address()  - address of instruction after current instruction.

2.4.1. ALU instructions

  All of these instructions update flags, which are set according to the full
  32bit accumulator. The SVP code only checks N and Z flags, so it is not
  known when exactly OV and L flags are set. Operations are performed on full
  A, so (andi A, 0) would clear all 32 bits of A.

  They share the same addressing modes.
  The exact arithmetic operation is determined by the 3 most significant
  (ooo) bits:
    001 - sub - subtract    (OP -=)
    011 - cmp - compare     (OP -, flags are updated according to result)
    100 - add - add         (OP +=)
    101 - and - binary AND  (OP &=)
    110 - or  - binary OR   (OP |=)
    111 - eor - exclusive OR (OP ^=)

  syntax         encoding              operation
  OP  A, s       ooo0 0000 0000 rrrr   A OP r << 16;
  OP  A, (ri)    ooo0 001j 0000 mmpp   A OP RAMj[pr_modif_read(m,jpp)] << 16;
  OP  A, adr     ooo0 011j aaaa aaaa   A OP RAMj[a] << 16;
  OPi A, imm     ooo0 1000 0000 0000   A OP i << 16;
                 iiii iiii iiii iiii
  OP  A, ((ri))  ooo0 101j 0000 mmpp   tmp = pr_modif_read(m,jpp);
                                       A OP program_memory[RAMj[tmp]] << 16;
                                       RAMj[tmp]++;
  OP  A, ri      ooo1 001j 0000 00pp   A OP RIJ[jpp] << 16;
  OPi simm       ooo1 1000 iiii iiii   A OP i << 16;

  Note that in the (OP A, s) case, if s is a 32bit register, the operation is
  performed on all 32 bits, including when s is the accumulator itself, like
  for (and A, A), which is a valid operation.

  There is also a "perform operation on accumulator" instruction:

  syntax         encoding              operation
  mod cond, op   1001 000f cccc 0ooo   if (check_cond(c,f)) switch(o) {
                                         case 2: A >>= 1; break; // arithmetic shift
                                         case 3: A <<= 1; break;
                                         case 6: A = -A; break;  // negate A
                                         case 7: A = abs(A); break; // absolute val.
                                       } // other operations are possible, but
                                         // they are not used by SVP.

2.4.2. Load (move) instructions

  These instructions never affect flags (even ld A).
  If the destination is A, and the source is 16bit, only the upper word is
  transferred (the same thing happens in the opposite direction). If dest. is
  A, and source is P, the whole 32bit value is transferred. It is not clear if
  P can be a destination operand (probably not, no code ever does this).
  Writing to STACK pushes a value there, reading pops. It is not known what
  happens on overflow/underflow (never happens in SVP code).
  ld -, - is used as a nop.

  syntax         encoding              operation
  ld  d, s       0000 0000 dddd ssss   d = s;
  ld  d, (ri)    0000 001j dddd mmpp   d = RAMj[pr_modif_read(m,jpp)];
  ld  (ri), s    0000 010j ssss mmpp   RAMj[pr_modif_read(m,jpp)] = s;
  ldi d, imm     0000 1000 dddd 0000   d = i;
                 iiii iiii iiii iiii
  ld  d, ((ri))  0000 101j dddd mmpp   tmp = pr_modif_read(m,jpp);
                                       d = program_memory[RAMj[tmp]];
                                       RAMj[tmp]++;
  ldi (ri), imm  0000 110j 0000 mmpp   RAMj[pr_modif_read(m,jpp)] = i;
                 iiii iiii iiii iiii
  ld  adr, a     0000 111j aaaa aaaa   RAMj[a] = A;
  ld  d, ri      0001 001j dddd 00pp   d = RIJ[jpp];
  ld  ri, s      0001 010j ssss 00pp   RIJ[jpp] = s;
  ldi ri, simm   0001 1jpp iiii iiii   RIJ[jpp] = i;
  ld  d, (a)     0100 1010 dddd 0000   d = program_memory[A[31:16]];
                                       // read a word from program memory.
                                       // Offset is the upper word in A.

2.4.3. Program control instructions

  Only 3 instructions: call, ret (alias of ld PC, STACK) and branch. Indirect
  jumps can be performed by simply writing to PC.

  syntax           encoding              operation
  call cond, addr  0100 100f cccc 0000   if (check_cond(c,f)) {
                   aaaa aaaa aaaa aaaa     STACK = next_op_address(); PC = a;
                                         }
  bra  cond, addr  0100 110f cccc 0000   if (check_cond(c,f)) PC = a;
                   aaaa aaaa aaaa aaaa
  ret              0000 0000 0110 0101   PC = STACK; // same as ld PC, STACK

2.4.4. Multiply-accumulate instructions

  Not sure if (ri) and (rj) really get loaded into X and Y, but the
  multiplication result surely is loaded into P. There is probably an optional
  3rd operand (1, 0; encoded by bit16, default 1), but it's not used by SVP
  code.

  syntax           encoding              operation
  mld  (rj), (ri)  1011 0111 nnjj mmii   A = 0; update_flags();
                                         X = RAM0[pr_modif_read(m,0ii)];
                                         Y = RAM1[pr_modif_read(n,1jj)];
                                         P = sign_extend(X) * sign_extend(Y) * 2
  mpya (rj), (ri)  1001 0111 nnjj mmii   A += P; update_flags();
                                         X = RAM0[pr_modif_read(m,0ii)];
                                         Y = RAM1[pr_modif_read(n,1jj)];
                                         P = sign_extend(X) * sign_extend(Y) * 2
  mpys (rj), (ri)  0011 0111 nnjj mmii   A -= P; update_flags();
                                         X = RAM0[pr_modif_read(m,0ii)];
                                         Y = RAM1[pr_modif_read(n,1jj)];
                                         P = sign_extend(X) * sign_extend(Y) * 2

  Here nn is the modifier applied to rj and mm the one applied to ri, both
  with the same meaning as mm in 2.4.


-------------------------------------------------------------------------------
 3. Memory map
-------------------------------------------------------------------------------

The SSP160x can access its own program memory, and external memory through
EXT registers (see 2.2).

Program memory is read-execute-only, the size of this space is 64K words
(this is how much a 16bit PC can address):

  byte address   word address   name
      0-  7ff       0- 3ff      IRAM
    800-1ffff     400-ffff      ROM

There were reports that SVP has internal ROM, but fortunately they were wrong.
The location 800-1ffff is mapped from the same location in the 2MB game ROM.
The IRAM is read-only (as SSP160x doesn't have any means of writing to its
program memory), but it can be changed through external memory space, as it's
also mapped there.

The external memory space seems to match the one visible by 68k, with some
differences:

     68k space       SVP space     word address    name
        0-1fffff        0-1fffff        0- fffff   game ROM
   200000-2fffff        ?               ?          unused (1)
   300000-31ffff   300000-31ffff   180000-18ffff   DRAM
   320000-37ffff        ?               ?          3 mirrors of DRAM
   380000-38ffff        ?               ?          unused (1)
        ?          390000-3907ff   1c8000-1c83ff   IRAM
   390000-39ffff        ?               ?          "cell arrange" 1
   3a0000-3affff        ?               ?          "cell arrange" 2
   3b0000-3fffff        ?               ?          unused (2)
   a15000-a1500f       n/a             n/a         Status/control registers

unused (1) - reads seem to return data from internal bus (last word read by
             SSP160x). Writes probably have no effect.
unused (2) - reads return 0xffff, writes have no effect.
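
The mapping in the table above can be sketched in C for the three regions
whose SVP-side word addresses are listed (game ROM, DRAM, IRAM). This is only
an illustration under the assumption that a word address is simply the SVP
space byte address divided by 2, which is what the listed ranges suggest. The
names svp_region, svp_byte_to_word() and svp_classify() are made up for this
sketch and do not come from any emulator or from the hardware.

```c
#include <stdint.h>

/* Hypothetical region tags, only for this sketch. */
enum svp_region { SVP_ROM, SVP_DRAM, SVP_IRAM, SVP_UNKNOWN };

/* SVP space byte address -> word address used by the SSP160x side.
 * Assumption: plain division by 2, as the table ranges suggest. */
static uint32_t svp_byte_to_word(uint32_t byte_addr)
{
    return byte_addr >> 1;
}

/* Classify a word address using only the ranges listed in the table.
 * Everything marked "?" in the table is reported as SVP_UNKNOWN. */
static enum svp_region svp_classify(uint32_t word_addr)
{
    if (word_addr <= 0x0fffff)
        return SVP_ROM;                       /* 0-fffff       */
    if (word_addr >= 0x180000 && word_addr <= 0x18ffff)
        return SVP_DRAM;                      /* 180000-18ffff */
    if (word_addr >= 0x1c8000 && word_addr <= 0x1c83ff)
        return SVP_IRAM;                      /* 1c8000-1c83ff */
    return SVP_UNKNOWN;
}
```

For example, svp_byte_to_word(0x84) gives 0x42, matching the ROM example in
section 2, and svp_byte_to_word(0x300000) gives 0x180000, the first DRAM word.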

The external memory can be read/written by SSP160x (except game ROM, which can
only be read).

"cell arrange" 1 and 2 are similar to the one used in SegaCD, they map the
300000-30ffff location to 390000-39ffff and 3a0000-3affff, where a linear
image written to 300000 can be read as VDP patterns at 390000. Virtua Racing
doesn't seem to use this feature, it is only used by memory test code.

Here is the list of status/control registers (16bit size):

  addr    reset  description
  a15000  ffff   w/r command/result register. Visible as XST for SSP160x
                 (see 2.2.4).
  a15002  ffff   mirror of the above.
  a15004  0      status of command/result register (see 2.2.1).
  a15006  ffff   possibly halts the SVP. Before doing DMA from DRAM, 68k code
                 writes 0xa, and after it's finished, writes 0. This is
                 probably done to prevent SVP accessing DRAM and avoid bus
                 clashes.
  a15008  ffff   possibly causes an interrupt. There is (unused?) code which
                 writes 0, 1, and again 0 in sequence.
  a1500a  ffff   ?
  a1500c  ffff   ?
  a1500e  ffff   ?


-------------------------------------------------------------------------------
 4. Other notes
-------------------------------------------------------------------------------

The game has an arcade-style memory self-check mode, which can be accessed by
pressing _all_ buttons (including directions) on a 3-button controller. There
was probably some loopback plug for this.

SVP seems to have a DMA latency issue similar to the one in Sega CD, as the
code always sets the DMA source address value larger by 2 than intended for
the copy. This is even true for DMAs from ROM, as it's probably hooked through
SVP's memory controller.

The entry point for the code seems to be at address 0x800 (word 0x400) in ROM,
but it is not clear where the address is fetched from when the system powers
up. The memory test code also sets up "ld PC, .." opcodes at 0x7f4, 0x7f8 and
0x7fc, which jump to some routines, possibly interrupt handlers. This means
that the mentioned addresses might be built-in interrupt vectors.

The SVP code doesn't seem to be timing sensitive, so it can be emulated
without knowing timing of the instructions or even how fast the chip is
clocked. Overclocking doesn't have any effect, underclocking causes slowdowns.
Running 10-12M instructions/sec (or possibly less) is sufficient.
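
As a closing illustration for emulator writers, the PMC address/mode word pair
from 2.2.7 could be decoded roughly as in the C sketch below. It follows the
"dsnnnv?? ???aaaaa" layout and the auto-increment table given there. The
struct pm_state and the functions pm_decode() and pm_advance() are
hypothetical names for this sketch. One assumption is made where the text
leaves a gap: when the s bit is set, the d and n bits are ignored (their real
effect in that case is not known).

```c
#include <stdint.h>

/* Hypothetical decoded state of one programmed PMx register. */
struct pm_state {
    uint32_t addr;      /* 21-bit word address                       */
    int      step;      /* signed auto-increment in words, 0 if none */
    int      special;   /* s bit: special-increment mode             */
    int      overwrite; /* v bit: over-write mode for writes         */
};

/* Combine the address word and the mode word written to PMC. */
static struct pm_state pm_decode(uint16_t addr_word, uint16_t mode)
{
    /* n field -> increment in words, from the table in 2.2.7 */
    static const int inc_tab[8] = { 0, 1, 2, 4, 8, 16, 32, 128 };
    struct pm_state pm;

    pm.addr = ((uint32_t)(mode & 0x1f) << 16) | addr_word; /* a + address */
    pm.step = inc_tab[(mode >> 11) & 7];                   /* nnn         */
    if (mode & 0x8000)                                     /* d           */
        pm.step = -pm.step;
    pm.special   = (mode >> 14) & 1;                       /* s           */
    pm.overwrite = (mode >> 10) & 1;                       /* v           */
    return pm;
}

/* Update the address after one access through the PMx register. */
static void pm_advance(struct pm_state *pm)
{
    if (pm->special) {
        /* even address: +1, odd address: +32.
         * Assumption: d and n are ignored in this mode. */
        pm->addr += (pm->addr & 1) ? 32 : 1;
    } else {
        pm->addr += (uint32_t)pm->step;
    }
    pm->addr &= 0x1fffff; /* auto-increment affects all 21 address bits */
}
```

With address word 0x1234 and mode word 0x0818, pm_decode() yields word address
0x181234 (inside DRAM) with an increment of 1 word per access.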