Difference between revisions of "Sega Dreamcast/Hardware comparison"
From Sega Retro
(Added rewrite and clean up tags.) |
|||
(25 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
+ | {{otherPage|desc=technical details on the Sega Dreamcast|page=Sega Dreamcast/Technical specifications}} | ||
+ | {{rewrite}} | ||
{{cleanup}} | {{cleanup}} | ||
+ | This article presents a hardware comparison between the [[Sega Dreamcast]] and other rival systems in its time. It compares the technical specifications and hardware advantages/disadvantages between the systems. | ||
==Vs. Arcade== | ==Vs. Arcade== | ||
Line 13: | Line 16: | ||
===Pentium, GeForce, Voodoo=== | ===Pentium, GeForce, Voodoo=== | ||
− | In most ways, the Dreamcast was generally the most powerful home system during 1998–1999, outperforming high-end PC hardware in most ways during that era.{{ | + | In most ways, the Dreamcast was generally the most powerful home system during 1998–1999, outperforming high-end PC hardware in most ways during that era.{{magref|gr|3|29}} The Dreamcast's [[Hitachi]] [[SuperH|SH-4]] CPU calculates 3D graphics several times faster than a [[wikipedia:Pentium II|Pentium II]] from 1998,{{magref|gr|3|29}} and faster than a [[wikipedia:Pentium III|Pentium III]] and [[NVIDIA]] [[wikipedia:GeForce256|GeForce 256]] from 1999. The Dreamcast's PowerVR CLX2 GPU, due to its tiled rendering architecture, also has has a higher [[fillrate]] and faster polygon rendering throughput than a [[wikipedia:Voodoo3|Voodoo3]] and GeForce 256 from 1999. On the other hand, the GeForce 256 has a higher fillrate for translucent polygons, whereas the Dreamcast's CLX2 has a higher fillrate for opaque polygons and an overall higher average fillrate (for scenes with both opaque and translucent polygons). |
− | The Dreamcast's CPU–GPU transmission bus is faster than the Voodoo3 and has a higher effective bandwidth than the GeForce 256 due to the Dreamcast's efficient bandwidth usage, including its lack of CPU overhead from the [[wikipedia:Operating system|operating system]] | + | The Dreamcast's CPU–GPU transmission bus is faster than the Voodoo3 and has a higher effective bandwidth than the GeForce 256 due to the Dreamcast's efficient bandwidth usage, including its lack of CPU overhead from the [[wikipedia:Operating system|operating system]], [[wikipedia:Texture mapping|textures]] loaded directly to VRAM (freeing up CPU–GPU transmission bus for polygons), higher [[wikipedia:Texture compression|texture compression]], and the CLX2's tiled rendering architecture: on-chip tile buffer with internal [[wikipedia:Z-buffer|Z-buffering]], and [[wikipedia:Deferred shading|deferred rendering]] (no need to draw, [[wikipedia:Shading|shade]] or texture overdrawn polygons). The CLX2 is also capable of [[wikipedia:Order-independent transparency|order-independent transparency]] (which the Voodoo3 and GeForce 256 lacked) and [[wikipedia:Normal mapping|Dot3 normal mapping]] (which the Voodoo3 lacked).{{ref|1=''[[wikipedia:PC Magazine|PC Magazine]]'', [https://books.google.co.uk/books?id=90OvoBUqQoIC&pg=PA193 December 1999, page 193]}} |
In terms of game engine performance, the CLX2 peaks at 5 million polygons/sec,{{ref|[http://planetdc.segaretro.org/games/reviews/testdrivelemans/index.html Test Drive: Le Mans] ([[wikipedia:IGN|IGN]])}} compared to the GeForce 256 which peaks at 2.9 million polygons/sec.{{ref|[http://ixbtlabs.com/articles/gf2hwtl/ Actual HW T&L perfomance of NVIDIA GeForce/GeForce2 chips (IXBT Labs)]}} Dreamcast game engines rendered 50,000–160,000 polygons per scene (3–5 million polygons/sec),{{ref|[http://planetdc.segaretro.org/games/reviews/testdrivelemans/index.html Test Drive: Le Mans] ([[wikipedia:IGN|IGN]])}} while PC game engines of 1999 rendered up to 10,000 polygons per scene{{ref|1=''[[wikipedia:PC Magazine|PC Magazine]]'', [https://books.google.co.uk/books?id=90OvoBUqQoIC&pg=PA203 December 1999, page 203]}}{{ref|[https://udn.epicgames.com/Two/UnrealModeling.html Unreal Modeling Guide (Unreal Developer Network)]}} (1–1.6 million polygons/sec).{{ref|[http://gamepilgrimage.com/sites/default/files/32-bitCompare/6thGen/QuakeIIIArena/QIIIPolygonCountsEstimated.png QIII Arena High Polygon Count]}} Character models in particular were significantly more detailed in Dreamcast games than in PC games during 1998–1999.{{ref|1=[https://www.youtube.com/watch?v=c0blSBgpRUg DF Retro: Shenmue - A Game Ahead Of Its Time] ([[wikipedia:Eurogamer|Digital Foundry]])}} | In terms of game engine performance, the CLX2 peaks at 5 million polygons/sec,{{ref|[http://planetdc.segaretro.org/games/reviews/testdrivelemans/index.html Test Drive: Le Mans] ([[wikipedia:IGN|IGN]])}} compared to the GeForce 256 which peaks at 2.9 million polygons/sec.{{ref|[http://ixbtlabs.com/articles/gf2hwtl/ Actual HW T&L perfomance of NVIDIA GeForce/GeForce2 chips (IXBT Labs)]}} Dreamcast game engines rendered 50,000–160,000 polygons per scene (3–5 million polygons/sec),{{ref|[http://planetdc.segaretro.org/games/reviews/testdrivelemans/index.html Test Drive: Le Mans] ([[wikipedia:IGN|IGN]])}} while PC game engines of 
1999 rendered up to 10,000 polygons per scene{{ref|1=''[[wikipedia:PC Magazine|PC Magazine]]'', [https://books.google.co.uk/books?id=90OvoBUqQoIC&pg=PA203 December 1999, page 203]}}{{ref|[https://udn.epicgames.com/Two/UnrealModeling.html Unreal Modeling Guide (Unreal Developer Network)]}} (1–1.6 million polygons/sec).{{ref|[http://gamepilgrimage.com/sites/default/files/32-bitCompare/6thGen/QuakeIIIArena/QIIIPolygonCountsEstimated.png QIII Arena High Polygon Count]}} Character models in particular were significantly more detailed in Dreamcast games than in PC games during 1998–1999.{{ref|1=[https://www.youtube.com/watch?v=c0blSBgpRUg DF Retro: Shenmue - A Game Ahead Of Its Time] ([[wikipedia:Eurogamer|Digital Foundry]])}} | ||
Line 32: | Line 35: | ||
The [[GameCube]] and [[Xbox]] are both generally more powerful than the Dreamcast, but the Dreamcast has several hardware advantages. The [[wikipedia:Transform and lighting|T&L]] geometry performance of the Dreamcast's SH-4 CPU is faster than the Xbox's Pentium III CPU but slower than the GameCube's PowerPC CPU; however, the GameCube and Xbox have T&L GPU, each with faster geometry performance than the Dreamcast. | The [[GameCube]] and [[Xbox]] are both generally more powerful than the Dreamcast, but the Dreamcast has several hardware advantages. The [[wikipedia:Transform and lighting|T&L]] geometry performance of the Dreamcast's SH-4 CPU is faster than the Xbox's Pentium III CPU but slower than the GameCube's PowerPC CPU; however, the GameCube and Xbox have T&L GPU, each with faster geometry performance than the Dreamcast. | ||
− | The Dreamcast has an on-chip Z-buffer, which the GameCube also has but the Xbox lacks. The Dreamcast has a faster Z-buffer bandwidth than both | + | The Dreamcast has an on-chip Z-buffer, which the GameCube also has but the Xbox lacks. The Dreamcast has a faster Z-buffer bandwidth than both. Its tiled rendering also gives it a higher opaque fillrate, but with lower translucent fillrate. The higher opaque fillrate allows the Dreamcast to draw a higher number of large opaque polygons, whereas the GameCube and Xbox can draw a higher number of small polygons and/or translucent polygons. |
==Graphics comparison table== | ==Graphics comparison table== | ||
Line 40: | Line 43: | ||
|- | |- | ||
! colspan="2" | System | ! colspan="2" | System | ||
− | ! scope="col" | [[Dreamcast]] (1998) | + | ! scope="col" | [[Dreamcast]] (1998){{intref|Sega Dreamcast/Technical specifications}} |
− | ! scope="col" | [[NAOMI]] (1998) | + | ! scope="col" | [[NAOMI]] (1998){{intref|Sega NAOMI}} |
! scope="col" | [[wikipedia:PC game|PC]] (1998) | ! scope="col" | [[wikipedia:PC game|PC]] (1998) | ||
! scope="col" colspan="3" style="text-align:center;" | PC (1999) | ! scope="col" colspan="3" style="text-align:center;" | PC (1999) | ||
Line 48: | Line 51: | ||
! scope="col" | [[Xbox]] (2001) | ! scope="col" | [[Xbox]] (2001) | ||
|- | |- | ||
− | ! colspan="2" | [[wikipedia:Geometry pipelines|Geometry | + | ! colspan="2" | [[wikipedia:Geometry pipelines|Geometry processor]] |
! [[Hitachi]] [[SuperH|SH-4]] <br> (200 MHz) | ! [[Hitachi]] [[SuperH|SH-4]] <br> (200 MHz) | ||
! Hitachi SH-4 <br> (200 MHz) | ! Hitachi SH-4 <br> (200 MHz) | ||
! [[wikipedia:Pentium II|Intel Pentium II]] <br> (450 MHz) | ! [[wikipedia:Pentium II|Intel Pentium II]] <br> (450 MHz) | ||
− | ! colspan="3" style="text-align:center;" | [[wikipedia:Pentium III|Intel Pentium III 800EB]] (800 MHz) | + | ! colspan="3" style="text-align:center;" | [[wikipedia:Pentium III|Intel Pentium III 800EB]] <br> (800 MHz){{ref|GeForce 256 T&L unit outperformed by Pentium III (742 MHz){{ref|[https://www.beyond3d.com/content/articles/50/ Benchmarking T&L in 3DMark 2000]}}|group=n}} |
! [[wikipedia:Emotion Engine|Emotion Engine]] <br> (294.912 MHz) | ! [[wikipedia:Emotion Engine|Emotion Engine]] <br> (294.912 MHz) | ||
− | ! [[wikipedia: | + | ! [[wikipedia:Nintendo GameCube technical specifications|ATI Flipper]] <br> (162 MHz){{ref|[[wikipedia:Gekko (microprocessor)|Gekko]] (485 MHz) CPU could be used as an alternative geometry processor.|group=n}} |
− | ! | + | ! [[Nvidia]] [[wikipedia:Xbox technical specifications|NV2A]] <br> (233 MHz){{ref|Pentium III (733 MHz) CPU could be used as an alternative geometry processor.|group=n}} |
|- | |- | ||
− | ! rowspan="3" | [[wikipedia:Transformation matrix|Matrix <br> | + | ! rowspan="3" | [[wikipedia:Transformation matrix|Matrix <br> transformations]] <br> {{ref|Matrix transformation (4×4 matrix × 4×1 vector) - 28 computations (16 multiplies, 12 adds){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (page 95)]}}|group=n}} |
− | ! [[wikipedia:FLOPS|FLOPS]] | + | ! Matrix [[wikipedia:FLOPS|FLOPS]] |
| 1.4 [[wikipedia:GFLOPS|GFLOPS]]{{ref|1.4 GFLOPS,{{ref|http://web.archive.org/web/20000823204755/computer.org/micro/articles/dreamcast_2.htm}}{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=5}} 7 floating-point operations per cycle (28 computations per 4 cycles){{fileref|Entertainment Systems and High-Performance Processor SH-4.pdf|page=4}}{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=31}}|group=n}} | | 1.4 [[wikipedia:GFLOPS|GFLOPS]]{{ref|1.4 GFLOPS,{{ref|http://web.archive.org/web/20000823204755/computer.org/micro/articles/dreamcast_2.htm}}{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=5}} 7 floating-point operations per cycle (28 computations per 4 cycles){{fileref|Entertainment Systems and High-Performance Processor SH-4.pdf|page=4}}{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=31}}|group=n}} | ||
| 1.4 GFLOPS | | 1.4 GFLOPS | ||
| 230 [[wikipedia:MFLOPS|MFLOPS]]{{ref|28 floating-point operations per 53 cycles{{fileref|Streaming SIMD Extensions - Matrix Multiplication.pdf|page=7}}|group=n}} | | 230 [[wikipedia:MFLOPS|MFLOPS]]{{ref|28 floating-point operations per 53 cycles{{fileref|Streaming SIMD Extensions - Matrix Multiplication.pdf|page=7}}|group=n}} | ||
− | | colspan="3" style="text-align:center;" | | + | | colspan="3" style="text-align:center;" | 1.1 GFLOPS{{ref|24 floating-point operations per 17 cycles{{ref|[http://www.cortstratton.org/articles/OptimizingForSSE.php Optimizing for SSE: A Case Study]}}|group=n}} |
| 5.5 GFLOPS{{ref|Emotion Engine FPU: 0.64 GFLOPS <br> Emotion Engine VU0/VU1: 5.52 GFLOPS|group=n}} | | 5.5 GFLOPS{{ref|Emotion Engine FPU: 0.64 GFLOPS <br> Emotion Engine VU0/VU1: 5.52 GFLOPS|group=n}} | ||
− | | 7.5 GFLOPS{{ref| | + | | 7.5 GFLOPS{{ref|Flipper: 7.533 GFLOPS (46.5 floating-point operations per cycle){{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}} <br> Gekko: 1.94 GFLOPS (4 floating-point operations per cycle)|group=n}} |
− | | 5.8 GFLOPS{{ref| | + | | 5.8 GFLOPS{{ref|NV2A: 5.8 GFLOPS (24 floating-point operations per cycle) <br> Pentium III: 1 GFLOPS (24 floating-point operations per 17 cycles)|group=n}} |
|- | |- | ||
! [[wikipedia:Multiply–accumulate operation|MACs]]/sec | ! [[wikipedia:Multiply–accumulate operation|MACs]]/sec | ||
| 800 million{{ref|4 MAC operations per cycle{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=31}}|group=n}} | | 800 million{{ref|4 MAC operations per cycle{{fileref|SH-4 Next-Generation DSP Architecture.pdf|page=31}}|group=n}} | ||
| 800 million | | 800 million | ||
− | | | + | | 130 million{{ref|3.3125 cycles per MAC operation: 53 cycles per 12 MACs{{fileref|Streaming SIMD Extensions - Matrix Multiplication.pdf|page=7}}|group=n}} |
− | | colspan="3" style="text-align:center;" | | + | | colspan="3" style="text-align:center;" | 420 million{{ref|17 cycles per 9 MAC operations{{ref|[http://www.cortstratton.org/articles/OptimizingForSSE.php Optimizing for SSE: A Case Study]}}|group=n}} |
| 2 billion{{ref|8 MAC operations per cycle (4 MAC operations per VU){{fileref|ThePowerOfPS2.pdf|page=6}}|group=n}} | | 2 billion{{ref|8 MAC operations per cycle (4 MAC operations per VU){{fileref|ThePowerOfPS2.pdf|page=6}}|group=n}} | ||
| 3 billion{{ref|19 MAC operations (12 MACs matrix transform, 7 MACs perspective transform) per cycle (8 cycles per 8 vertices){{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | | 3 billion{{ref|19 MAC operations (12 MACs matrix transform, 7 MACs perspective transform) per cycle (8 cycles per 8 vertices){{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | ||
− | | 2 billion{{ref| | + | | 2 billion{{ref|NV2A: 2.2 billion MAC operations per second (116.5 million vertices/sec, 19 MACs each) per second (9.5 MAC operations per cycle) <br> Pentium III: 380 million MAC operations per second (17 cycles per 9 MAC operations)|group=n}} |
|- | |- | ||
! [[wikipedia:Vertex (computer graphics)|Vertices]] | ! [[wikipedia:Vertex (computer graphics)|Vertices]] | ||
Line 80: | Line 83: | ||
| 50 MVertices/s | | 50 MVertices/s | ||
| 8.4 MVertices/s{{ref|53 cycles per matrix transformation{{fileref|Streaming SIMD Extensions - Matrix Multiplication.pdf|page=7}}|group=n}} | | 8.4 MVertices/s{{ref|53 cycles per matrix transformation{{fileref|Streaming SIMD Extensions - Matrix Multiplication.pdf|page=7}}|group=n}} | ||
− | | colspan="3" style="text-align:center;" | | + | | colspan="3" style="text-align:center;" | 47 MVertices/s{{ref|17 cycles per matrix transformation{{ref|[http://www.cortstratton.org/articles/OptimizingForSSE.php Optimizing for SSE: A Case Study]}}|group=n}} |
| 140 MVertices/s{{ref|2 matrix transformations (1 transformation per VU) per 4 cycles{{fileref|ThePowerOfPS2.pdf|page=12}}|group=n}} | | 140 MVertices/s{{ref|2 matrix transformations (1 transformation per VU) per 4 cycles{{fileref|ThePowerOfPS2.pdf|page=12}}|group=n}} | ||
| 162 MVertices/s{{ref|1 matrix transformation per cycle{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | | 162 MVertices/s{{ref|1 matrix transformation per cycle{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | ||
− | | 116 MVertices/s{{ref|Pentium III: | + | | 116 MVertices/s{{ref|NV2A: 116 million vertices per second <br> Pentium III: 43 million vertices per second (17 cycles per matrix transformation)|group=n}} |
|- | |- | ||
− | ! colspan="2" | [[wikipedia:3D projection|Perspective | + | ! colspan="2" | [[wikipedia:3D projection|Perspective transformations]] |
| 16 MVertices/s{{ref|1=MVertices/s = Million vertices per second|group=n}} | | 16 MVertices/s{{ref|1=MVertices/s = Million vertices per second|group=n}} | ||
| 16 MVertices/s | | 16 MVertices/s | ||
| 2.6 MVertices/s{{ref|170 cycles per perspective transformation: 53 cycles matrix transformation, 117 cycles projection (2 multiplies, 2 adds, 3 divides){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (pages 95-97)]}} (2 cycles per multiply, 1 cycle per add, 37 cycles per divide){{fileref|Instruction Tables.pdf|page=107}}|group=n}} | | 2.6 MVertices/s{{ref|170 cycles per perspective transformation: 53 cycles matrix transformation, 117 cycles projection (2 multiplies, 2 adds, 3 divides){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (pages 95-97)]}} (2 cycles per multiply, 1 cycle per add, 37 cycles per divide){{fileref|Instruction Tables.pdf|page=107}}|group=n}} | ||
− | | colspan="3" style="text-align:center;" | | + | | colspan="3" style="text-align:center;" | 11 MVertices/s{{ref|72 cycles per perspective transformation: 17 cycles matrix transformation, 55 cycles projection (2 multiplies, 2 adds, 3 divides){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (pages 95-97)]}} (1 cycle per multiply, 1 cycle per add, 17 cycles per divide){{fileref|Instruction Tables.pdf|page=110}}|group=n}} |
| 80 MVertices/s | | 80 MVertices/s | ||
| 160 MVertices/s{{ref|8 cycles per 8 perspective transformations in T&L pipeline{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | | 160 MVertices/s{{ref|8 cycles per 8 perspective transformations in T&L pipeline{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | ||
− | | 110 MVertices/s{{ref| | + | | 110 MVertices/s{{ref|NV2A: 116.5 million vertices per second (2 cycles per vertex) <br> Pentium III: 10 million vertices per second (72 cycles per perspective transformation)|group=n}} |
|- | |- | ||
! rowspan="2" | [[wikipedia:Transform, clipping, and lighting|Lighting]] | ! rowspan="2" | [[wikipedia:Transform, clipping, and lighting|Lighting]] | ||
Line 102: | Line 105: | ||
| 39 MPolygons/s{{ref|15 cycles/vertex{{ref|1=[http://www.gamasutra.com/view/feature/131444/procedural_rendering_on_.php?page=4 Procedural Rendering on Playstation 2 (page 4)] ([[wikipedia:Gamasutra|Gamasutra]])}} per VU|group=n}} | | 39 MPolygons/s{{ref|15 cycles/vertex{{ref|1=[http://www.gamasutra.com/view/feature/131444/procedural_rendering_on_.php?page=4 Procedural Rendering on Playstation 2 (page 4)] ([[wikipedia:Gamasutra|Gamasutra]])}} per VU|group=n}} | ||
| 90 MPolygons/s{{ref|14 cycles per 8 vertices{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | | 90 MPolygons/s{{ref|14 cycles per 8 vertices{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | ||
− | | 46 MPolygons/s{{ref| | + | | 46 MPolygons/s{{ref|NV2A: 46 million vertices per second, 5 cycles per vertex (2 cycles transform, 21 MACs lighting){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (pages 95-97)]}} <br> Pentium III: 6.6 million vertices per second (110 cycles per vertex)|group=n}} |
|- | |- | ||
! 4 light sources | ! 4 light sources | ||
Line 111: | Line 114: | ||
| 9.8 MPolygons/s{{ref|60 cycles/vertex per VU: 4 light sources,{{fileref|ThePowerOfPS2.pdf|page=4}} 15 cycles/vertex per light source{{ref|1=[http://www.gamasutra.com/view/feature/131444/procedural_rendering_on_.php?page=4 Procedural Rendering on Playstation 2 (page 4)] ([[wikipedia:Gamasutra|Gamasutra]])}}|group=n}} | | 9.8 MPolygons/s{{ref|60 cycles/vertex per VU: 4 light sources,{{fileref|ThePowerOfPS2.pdf|page=4}} 15 cycles/vertex per light source{{ref|1=[http://www.gamasutra.com/view/feature/131444/procedural_rendering_on_.php?page=4 Procedural Rendering on Playstation 2 (page 4)] ([[wikipedia:Gamasutra|Gamasutra]])}}|group=n}} | ||
| 20 MPolygons/s{{ref|63 cycles per 8 vertices{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | | 20 MPolygons/s{{ref|63 cycles per 8 vertices{{ref|''[[wikipedia:The Nikkei|Nikkei Electronics]]'' (2000/10/9)}}|group=n}} | ||
− | | 16 MPolygons/s{{ref| | + | | 16 MPolygons/s{{ref|NV2A: 13 million vertices per second, 14 cycles per vertex (2 cycles transform, 3 cycles per light source){{ref|1=[https://books.google.co.uk/books?id=iAvHt5RCHbMC&pg=PA95 ''Design of Digital Systems and Devices'' (pages 95-97)]}} <br> Pentium III: 5.3 million vertices per second (136 cycles per vertex)|group=n}} |
|- | |- | ||
! colspan="2" | [[wikipedia:Rendering pipeline|Rendering processors]] | ! colspan="2" | [[wikipedia:Rendering pipeline|Rendering processors]] | ||
− | ! [[Sega Dreamcast#Graphics|PowerVR CLX2]] <br> (100 MHz) | + | ! [[Sega Dreamcast/Technical specifications#Graphics|PowerVR CLX2]] <br> (100 MHz) |
! [[Sega NAOMI#Graphics|PowerVR2]] <br> (100 MHz){{ref|High-end arcade revision of PowerVR2 with twice the performance of the Dreamcast's PowerVR CLX2|group=n}} | ! [[Sega NAOMI#Graphics|PowerVR2]] <br> (100 MHz){{ref|High-end arcade revision of PowerVR2 with twice the performance of the Dreamcast's PowerVR CLX2|group=n}} | ||
! 2x [[wikipedia:Voodoo2|Voodoo2]] ([[wikipedia:Scan-Line Interleave|SLI]]) <br> (90 MHz){{ref|2x framebuffer (twice Voodoo2), 2x TMU (texture mapping units) (same as Voodoo2)|group=n}} | ! 2x [[wikipedia:Voodoo2|Voodoo2]] ([[wikipedia:Scan-Line Interleave|SLI]]) <br> (90 MHz){{ref|2x framebuffer (twice Voodoo2), 2x TMU (texture mapping units) (same as Voodoo2)|group=n}} | ||
Line 121: | Line 124: | ||
! GeForce 256 <br> (120 MHz) | ! GeForce 256 <br> (120 MHz) | ||
! [[wikipedia:Graphics Synthesizer|Graphics Synthesizer]] <br> (147.456 MHz) | ! [[wikipedia:Graphics Synthesizer|Graphics Synthesizer]] <br> (147.456 MHz) | ||
− | ! Flipper (162 MHz) | + | ! Flipper <br> (162 MHz) |
− | ! NV2A (233 MHz) | + | ! NV2A <br> (233 MHz) |
|- | |- | ||
! rowspan="2" | [[wikipedia:Tiled rendering|Tiled <br> rendering]] | ! rowspan="2" | [[wikipedia:Tiled rendering|Tiled <br> rendering]] | ||
Line 142: | Line 145: | ||
| 32×16 pixels | | 32×16 pixels | ||
|- | |- | ||
− | ! rowspan=" | + | ! rowspan="3" | [[Pixel]] <br> [[fillrate]] |
! Opaque | ! Opaque | ||
| 3.2 [[Pixel|GPixels/s]]{{ref|ISP unit's PE Array of 32 processor elements process 32 pixels per cycle|group=n}} | | 3.2 [[Pixel|GPixels/s]]{{ref|ISP unit's PE Array of 32 processor elements process 32 pixels per cycle|group=n}} | ||
Line 148: | Line 151: | ||
| 100 [[Pixel|MPixels/s]]{{ref|720 MB/s framebuffer bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 646 MB/s available bandwidth, 6 bytes per pixel (16-bit color, 32-bit Z-buffer precision)|group=n}} | | 100 [[Pixel|MPixels/s]]{{ref|720 MB/s framebuffer bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 646 MB/s available bandwidth, 6 bytes per pixel (16-bit color, 32-bit Z-buffer precision)|group=n}} | ||
| 500 MPixels/s | | 500 MPixels/s | ||
− | | rowspan=" | + | | rowspan="3" | 200 MPixels/s |
− | | rowspan=" | + | | rowspan="3" | 430 MPixels/s <br> {{ref|2.656 GB/s VRAM bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 2.582 GB/s available bandwidth, 6 bytes per pixel (16-bit color, 32-bit Z-buffer precision)|group=n}} |
− | | rowspan=" | + | | rowspan="3" | 2.3 GPixels/s <br> {{ref|16 pixel pipelines|group=n}} |
− | | rowspan=" | + | | rowspan="3" | 648 MPixels/s |
− | | rowspan=" | + | | rowspan="3" | 870 MPixels/s <br> {{ref|5.336 GB/s video RAM bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 5.262 GB/s available bandwidth, 6 bytes per pixel (double-buffered 16-bit color, 32-bit Z-buffer precision)|group=n}} |
|- | |- | ||
− | ! Opaque/ | + | ! Opaque/Translucent |
| 500 MPixels/s | | 500 MPixels/s | ||
| 1 [[Pixel|GPixel/s]] | | 1 [[Pixel|GPixel/s]] | ||
| 100 MPixels/s | | 100 MPixels/s | ||
| 250 MPixels/s | | 250 MPixels/s | ||
+ | |- | ||
+ | ! [[wikipedia:Alpha blending|Translucent]] | ||
+ | | 200 MPixels/s | ||
+ | | 400 MPixels/s | ||
+ | | 100 MPixels/s | ||
+ | | 125 MPixels/s | ||
|- | |- | ||
! rowspan="2" | [[Texel|Texture <br> fillrate]] | ! rowspan="2" | [[Texel|Texture <br> fillrate]] | ||
Line 265: | Line 274: | ||
| 6:1 (S3TC) | | 6:1 (S3TC) | ||
|- | |- | ||
− | ! rowspan="2" | CPU–GPU <br> transfer <br> | + | ! rowspan="2" | CPU–GPU <br> transfer bus <br> {{ref|Bus interface transfers polygons and/or textures from the CPU to the GPU's [[VRAM]]|group=n}} |
! [[Byte|Bandwidth]] | ! [[Byte|Bandwidth]] | ||
| 800 [[Byte|MB/s]]{{ref|http://web.archive.org/web/20000823204755/computer.org/micro/articles/dreamcast_2.htm}} | | 800 [[Byte|MB/s]]{{ref|http://web.archive.org/web/20000823204755/computer.org/micro/articles/dreamcast_2.htm}} | ||
Line 361: | Line 370: | ||
|- | |- | ||
! Texture buffer | ! Texture buffer | ||
− | | 800 MB/s <br> ( | + | | 800 MB/s <br> (compressed 6 GB/s) |
− | | 1 GB/s <br> ( | + | | 1 GB/s <br> (compressed 7 GB/s) |
− | | 720 MB/s{{ref|90 MHz, 2x 64-bit TMU, 2x 64-bit framebuffer|group=n}} | + | | 720 MB/s{{ref|90 MHz, 2x 64-bit TMU, 2x 64-bit framebuffer|group=n}} |
| 9.6 GB/s{{ref|38.4 GB/s framebuffer, 9.6 GB/s texture cache|group=n}} | | 9.6 GB/s{{ref|38.4 GB/s framebuffer, 9.6 GB/s texture cache|group=n}} | ||
| 10 GB/s{{ref|10.368 GB/s texture cache (512-bit, 162 MHz), 9.632 GB/s framebuffer|group=n}} | | 10 GB/s{{ref|10.368 GB/s texture cache (512-bit, 162 MHz), 9.632 GB/s framebuffer|group=n}} |
Latest revision as of 18:01, 24 October 2022
- For technical details on the Sega Dreamcast, see Sega Dreamcast/Technical specifications.
This article presents a hardware comparison between the Sega Dreamcast and rival systems of its time. It compares their technical specifications and the hardware advantages and disadvantages of each.
Vs. Arcade
The Sega Dreamcast's arcade counterpart, the Sega NAOMI, has the same CPU, the Hitachi SH-4, at the same clock rate, but is more powerful in other ways, including an updated PowerVR2 GPU with faster performance, additional RAM and VRAM, higher bandwidth, and faster ROM cartridge storage. The NAOMI was released at $1,995, ten times the price of the Dreamcast and more expensive than a high-end PC at the time, but cheaper than the Sega Model 3 arcade system (which debuted at $20,000 in 1996).
The NAOMI was, in turn, the basis for two significantly more powerful arcade systems, the Hikaru (debuted 1999) and NAOMI 2 (debuted 2000). Sega later packaged the Dreamcast into an arcade board as the Atomiswave. While the Dreamcast is not as powerful as 1997–1999 Sega arcade hardware, including the Model 3 Step 2 (debuted 1997), NAOMI, and Hikaru, it surpasses the Model 3 Step 1 (debuted 1996) in performance.[1]
Vs. PC
Neon 250
The Dreamcast's PowerVR CLX2 GPU was the basis for the PowerVR PMX1, a PC GPU released with the Neon 250 graphics card in 1999. However, the Neon 250 lacks many of the tiled rendering features of the CLX2: the tile size is halved (halving the fillrate), it lacks the CLX2's internal Z-buffering and alpha test capability with hardware front-to-back translucency sorting (further reducing the fillrate and performance, as well as requiring the Neon 250 to render a Z-buffer externally), and the tiling is partially handled by software (the CLX2 handles the tiling entirely in hardware). The Neon 250 also lacks the CLX2's latency buffering and palettized texture support while VQ texture compression performance is halved, and it has bus contention due to having a single data bus (whereas the CLX2 has two data buses).
The PowerVR2 was also optimized for the Hitachi SH-4's geometry processing capabilities (rather than for a Pentium II or III), while PC drivers and software were not optimized for the Neon 250's tiled rendering architecture (unlike Dreamcast games, which were optimized for the CLX2's tiled rendering architecture). The Neon 250 thus had only a fraction of the Dreamcast CLX2's fillrate and rendering performance. The reduction in performance from the Dreamcast's CLX2 to the Neon 250 was comparable to the reduction in performance from the Sega Model 3's Real3D Pro-1000 to the Intel740.
Pentium, GeForce, Voodoo
The Dreamcast was generally the most powerful home system during 1998–1999, outperforming high-end PC hardware of that era in most respects.[2] The Dreamcast's Hitachi SH-4 CPU calculates 3D graphics several times faster than a Pentium II from 1998,[2] and faster than a Pentium III and NVIDIA GeForce 256 from 1999. The Dreamcast's PowerVR CLX2 GPU, due to its tiled rendering architecture, also has a higher fillrate and faster polygon rendering throughput than a Voodoo3 and GeForce 256 from 1999. On the other hand, the GeForce 256 has a higher fillrate for translucent polygons, whereas the Dreamcast's CLX2 has a higher fillrate for opaque polygons and an overall higher average fillrate (for scenes with both opaque and translucent polygons).
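The CPU comparison can be made concrete with a rough Python sketch that derives peak matrix-transformation rates from clock speeds and the cycle counts cited in this article's comparison table notes (28 computations per 4 cycles on the SH-4, 53 cycles per transformation on the Pentium II, 17 on the Pentium III). The function and dictionary names are illustrative assumptions, and the results are theoretical peaks that ignore memory stalls and other overhead.

```python
# Illustrative sketch: peak vertex transform rate = clock / cycles per transform.
# Clock speeds and cycle counts are the figures cited in this article's table notes.

def peak_vertices_per_second(clock_hz: float, cycles_per_transform: float) -> float:
    """Theoretical peak number of 4x4 matrix transformations per second."""
    return clock_hz / cycles_per_transform

# (clock in Hz, cycles per 4x4 matrix transformation)
CPUS = {
    "SH-4 (200 MHz)": (200e6, 4),
    "Pentium II (450 MHz)": (450e6, 53),
    "Pentium III (800 MHz)": (800e6, 17),
}

rates = {
    name: peak_vertices_per_second(clock, cycles)
    for name, (clock, cycles) in CPUS.items()
}

if __name__ == "__main__":
    for name, rate in rates.items():
        print(f"{name}: {rate / 1e6:.1f} MVertices/s")
```

Under these assumptions the SH-4 comes out at roughly six times the Pentium II's peak rate and slightly ahead of the Pentium III, which is in line with the table's figures.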
The Dreamcast's CPU–GPU transmission bus is faster than that of the Voodoo3 and has a higher effective bandwidth than that of the GeForce 256. This is due to the Dreamcast's efficient bandwidth usage: there is no CPU overhead from an operating system, textures are loaded directly to VRAM (freeing up the CPU–GPU transmission bus for polygons), texture compression is higher, and the CLX2's tiled rendering architecture provides an on-chip tile buffer with internal Z-buffering and deferred rendering (no need to draw, shade or texture overdrawn polygons). The CLX2 is also capable of order-independent transparency (which the Voodoo3 and GeForce 256 lacked) and Dot3 normal mapping (which the Voodoo3 lacked).[3]
In terms of game engine performance, the CLX2 peaks at 5 million polygons/sec,[4] compared to the GeForce 256 which peaks at 2.9 million polygons/sec.[5] Dreamcast game engines rendered 50,000–160,000 polygons per scene (3–5 million polygons/sec),[4] while PC game engines of 1999 rendered up to 10,000 polygons per scene[6][7] (1–1.6 million polygons/sec).[8] Character models in particular were significantly more detailed in Dreamcast games than in PC games during 1998–1999.[9]
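The per-scene and per-second polygon figures above are related through the frame rate. A minimal Python sketch of that conversion follows; the frame rates of 30 and 60 frames per second are assumptions for illustration and are not stated in the source, and the function name is hypothetical.

```python
def polygons_per_second(polygons_per_scene: int, frames_per_second: int) -> int:
    """Polygon throughput implied by a per-scene count at a given frame rate."""
    return polygons_per_scene * frames_per_second

# Dreamcast per-scene figures quoted above, at assumed frame rates.
low_end = polygons_per_second(50_000, 60)    # assumed 60 frames per second
high_end = polygons_per_second(160_000, 30)  # assumed 30 frames per second

if __name__ == "__main__":
    print(f"{low_end / 1e6:.1f} to {high_end / 1e6:.1f} million polygons/sec")
```

With these assumed frame rates, the per-scene counts land within the 3–5 million polygons/sec range quoted for Dreamcast game engines.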
Vs. Consoles
PlayStation 2
Compared to the rival PlayStation 2, the Dreamcast is more effective at textures, anti-aliasing, and image quality, while the PS2 is more effective at polygon geometry, physics, particles, and lighting. The PS2 has a more powerful CPU geometry engine, higher translucent fillrate, and more main RAM (32 MB, compared to Dreamcast's 16 MB), while the DC has more VRAM (8 MB, compared to PS2's 4 MB), higher opaque fillrate, and more GPU hardware features, with CLX2 capabilities like tiled rendering, super-sample anti-aliasing, Dot3 normal mapping, order-independent transparency, and texture compression, which the PS2's GPU lacks.
With larger VRAM and tiled rendering, the DC can render a larger framebuffer at higher native resolution (with an on-chip Z-buffer), and with texture compression, it can fit around 20–60 MB of texture data, in compressed form, into its VRAM. Because the PS2 has only 4 MB VRAM, it relies on the main RAM to store textures. While the PS2's CPU–GPU transmission bus for transferring polygons and textures is 50% faster than the Dreamcast's CPU–GPU transmission bus, the DC loads textures directly to VRAM (freeing up the CPU–GPU transmission bus for polygons), and its texture compression gives it higher effective texture bandwidth.
Dreamcast games were effectively using 20–30 MB of texture data[1] (compressed to around 5–6 MB),[10] while PS2 games up until 2003 peaked at 5.5 MB of texture data (average 1.5 MB). PS2 games up until 2003 rendered up to 7.5 million polygons/sec (145,000 polygons per scene), with most rendering 2–5 million polygons/sec (average 52,000 polygons per scene);[11] in comparison, Dreamcast game engines rendered up to 5 million polygons/sec (160,000 polygons per scene),[4] with most games rendering 2–4 million polygons/sec (average 50,000 polygons per scene).
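The texture figures above imply a compression ratio that can be bounded with simple arithmetic. The following is a minimal Python sketch, assuming the quoted ranges (20–30 MB uncompressed, 5–6 MB compressed) can be combined endpoint by endpoint, which the source does not state; the function and constant names are illustrative.

```python
def compression_ratio(uncompressed_mb: float, compressed_mb: float) -> float:
    """Return the uncompressed-to-compressed size ratio."""
    if compressed_mb <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed_mb / compressed_mb

# Ranges quoted in the text above (megabytes).
UNCOMPRESSED_MB = (20, 30)
COMPRESSED_MB = (5, 6)

# Smallest ratio: least texture data packed into the most space, and vice versa.
lowest = compression_ratio(UNCOMPRESSED_MB[0], COMPRESSED_MB[1])   # 20 / 6
highest = compression_ratio(UNCOMPRESSED_MB[1], COMPRESSED_MB[0])  # 30 / 5

if __name__ == "__main__":
    print(f"Implied compression ratio: {lowest:.1f}:1 to {highest:.1f}:1")
```

Under that assumption, the figures imply an overall ratio of somewhere between roughly 3:1 and 6:1 for the texture sets in question.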
The Dreamcast is easier to develop for than the PS2; this is the reverse of the 32-bit era, when the PlayStation was the more developer-friendly system and the Saturn the more difficult one.
GameCube and Xbox
The GameCube and Xbox are both generally more powerful than the Dreamcast, but the Dreamcast has several hardware advantages. The T&L geometry performance of the Dreamcast's SH-4 CPU is faster than that of the Xbox's Pentium III CPU but slower than that of the GameCube's PowerPC CPU; however, the GameCube and Xbox both have GPUs with hardware T&L, each with faster geometry performance than the Dreamcast.
The Dreamcast has an on-chip Z-buffer, which the GameCube also has but the Xbox lacks. The Dreamcast has a higher Z-buffer bandwidth than both. Its tiled rendering also gives it a higher opaque fillrate, but a lower translucent fillrate. The higher opaque fillrate allows the Dreamcast to draw a higher number of large opaque polygons, whereas the GameCube and Xbox can draw a higher number of small polygons and/or translucent polygons.
Graphics comparison table
- See Sega Dreamcast technical specifications for more technical details on Dreamcast hardware
System | Dreamcast (1998)[12] | NAOMI (1998)[13] | PC (1998) | PC (1999) | PlayStation 2 (2000) | GameCube (2001) | Xbox (2001) | |||
---|---|---|---|---|---|---|---|---|---|---|
Geometry processor | Hitachi SH-4 (200 MHz) | Hitachi SH-4 (200 MHz) | Intel Pentium II (450 MHz) | Intel Pentium III 800EB (800 MHz)[n 1] | Emotion Engine (294.912 MHz) | ATI Flipper (162 MHz)[n 2] | Nvidia NV2A (233 MHz)[n 3] | |||
Matrix transformations [n 4] |
Matrix FLOPS | 1.4 GFLOPS[n 5] | 1.4 GFLOPS | 230 MFLOPS[n 6] | 1.1 GFLOPS[n 7] | 5.5 GFLOPS[n 8] | 7.5 GFLOPS[n 9] | 5.8 GFLOPS[n 10] | ||
MACs/sec | 800 million[n 11] | 800 million | 130 million[n 12] | 420 million[n 13] | 2 billion[n 14] | 3 billion[n 15] | 2 billion[n 16] | |||
Vertices | 50 MVertices/s[n 17] | 50 MVertices/s | 8.4 MVertices/s[n 18] | 47 MVertices/s[n 19] | 140 MVertices/s[n 20] | 162 MVertices/s[n 21] | 116 MVertices/s[n 22] | |||
Perspective transformations | 16 MVertices/s[n 23] | 16 MVertices/s | 2.6 MVertices/s[n 24] | 11 MVertices/s[n 25] | 80 MVertices/s | 160 MVertices/s[n 26] | 110 MVertices/s[n 27] | |||
Lighting | 1 light source | 14 MPolygons/s[n 28] | 14 MPolygons/s | 2 MPolygons/s[n 29] | 7.2 MPolygons/s[n 30] | 39 MPolygons/s[n 31] | 90 MPolygons/s[n 32] | 46 MPolygons/s[n 33] | ||
4 light sources | 6.8 MPolygons/s | 6.8 MPolygons/s | 1.1 MPolygons/s[n 34] | 5.8 MPolygons/s[n 35] | 9.8 MPolygons/s[n 36] | 20 MPolygons/s[n 37] | 16 MPolygons/s[n 38] | |||
Rendering processors | PowerVR CLX2 (100 MHz) | PowerVR2 (100 MHz)[n 39] | 2x Voodoo2 (SLI) (90 MHz)[n 40] | Neon 250 (125 MHz) | Voodoo3 SE (200 MHz)[n 41] | GeForce 256 (120 MHz) | Graphics Synthesizer (147.456 MHz) | Flipper (162 MHz) | NV2A (233 MHz) | |
Tiled rendering |
Tiling FPU | 720 MFLOPS | 1 GFLOPS | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
Tile size | 32×32 pixels | 32×32 pixels | N/A | 32×16 pixels | ||||||
Pixel fillrate |
Opaque | 3.2 GPixels/s[n 42] | 6 GPixels/s[n 43] | 100 MPixels/s[n 44] | 500 MPixels/s | 200 MPixels/s | 430 MPixels/s[n 45] | 2.3 GPixels/s[n 46] | 648 MPixels/s | 870 MPixels/s[n 47] |
Opaque/Translucent | 500 MPixels/s | 1 GPixel/s | 100 MPixels/s | 250 MPixels/s | ||||||
Translucent | 200 MPixels/s | 400 MPixels/s | 100 MPixels/s | 125 MPixels/s | ||||||
Texture fillrate |
Opaque | 3.2 GTexels/s | 6 GTexels/s | 100 MTexels/s | 500 MTexels/s | 380 MTexels/s[n 48] | 320 MTexels/s[n 49] | 1.1 GTexels/s | 648 MTexels/s | 650 MTexels/s[n 50] |
Opaque/Translucent | 500 MTexels/s | 1 GTexel/s | 100 MTexels/s | 250 MTexels/s | ||||||
Multi-texture fillrate |
Opaque | 1.6 GTexels/s | 3 GTexels/s | 100 MTexels/s | 250 MTexels/s | 310 MTexels/s[n 51] | 250 MTexels/s[n 52] | 580 MTexels/s | 648 MTexels/s | 520 MTexels/s[n 53] |
Opaque/Translucent | 250 MTexels/s | 500 MTexels/s | 100 MTexels/s | 120 MTexels/s | ||||||
Textured polygons |
32-pixel | 7.1 MPolygons/s | 12 MPolygons/s | 2 MPolygons/s | 4 MPolygons/s[31] | 6 MPolygons/s | 7 MPolygons/s | 30 MPolygons/s | 20 MPolygons/s | 20 MPolygons/s |
100-pixel (opaque) | 7.1 MPolygons/s | 12 MPolygons/s | 1 MPolygons/s | 4 MPolygons/s | 2 MPolygons/s | 4.3 MPolygons/s | 10 MPolygons/s | 6.4 MPolygons/s | 8 MPolygons/s | |
100-pixel (opaque/translucent) | 5 MPolygons/s | 10 MPolygons/s | 1 MPolygons/s | 2.5 MPolygons/s | ||||||
Multi-texture polygons |
32-pixel | 7.1 MPolygons/s | 12 MPolygons/s | 2 MPolygons/s | 4 MPolygons/s | 5 MPolygons/s | 7 MPolygons/s | 18 MPolygons/s | 20 MPolygons/s | 16 MPolygons/s |
100-pixel (opaque) | 7.1 MPolygons/s | 12 MPolygons/s | 1 MPolygons/s | 2.5 MPolygons/s | 2 MPolygons/s | 2.5 MPolygons/s | 5 MPolygons/s | 6.4 MPolygons/s | 5 MPolygons/s | |
100-pixel (opaque/translucent) | 2.5 MPolygons/s | 5 MPolygons/s | 1 MPolygons/s | 1.2 MPolygons/s | ||||||
Texture compression ratio | 7.98:1 (VQ) | 7.98:1 (VQ) | 3:1 (palette)[n 54] | 4:1 (VQ) | 4:1 (FXT1) | 6:1 (S3TC) | 3:1 (palette)[n 55] | 6:1 (S3TC) | 6:1 (S3TC) | |
CPU–GPU transfer bus [n 56] |
Bandwidth | 800 MB/s[16] | 800 MB/s | 260 MB/s[n 57] | 530 MB/s[n 58] | 530 MB/s[n 59] | 1 GB/s[n 60] | 1.2 GB/s[30] | 1.3 GB/s[n 61] | 1.064 GB/s[n 62] |
Texture compression | 6.3 GB/s | 6.3 GB/s | 800 MB/s | 2.1 GB/s | 2.1 GB/s | 6 GB/s | 3.6 GB/s | 7.7 GB/s | 6.3 GB/s | |
Internal GPU cache |
Cache memory | 33 KB[n 63] | 46 KB[n 64] | N/A | 16 KB[n 65] | N/A | N/A | 4 MB | 3 MB | N/A |
Bandwidth | 15 GB/s[n 66] | 28 GB/s[n 67] | N/A | 1 GB/s | N/A | N/A | 48 GB/s[n 68] | 20 GB/s[n 69] | ||
External video memory |
External memory | 24 MB (SDRAM)[n 70] | 48 MB (SDRAM), 100 MB (VROM)[1] | 16 MB (SDRAM)[n 71] | 32 MB (SDRAM) | 16 MB (SDRAM) | 32 MB (SDRAM) | 32 MB (RDRAM)[n 72] | 24 MB (1T-SRAM) | 64 MB (DDR SDRAM) |
Texture compression | 190 MB | 300 MB (SDRAM), 700 MB (VROM) | 32 MB[n 73] | 120 MB | 64 MB | 190 MB | 96 MB | 144 MB | 300 MB | |
Bandwidth | 1.6 GB/s[n 74] | 1.8 GB/s[n 75] | 2.8 GB/s[n 76] | 1 GB/s | 3.1 GB/s | 2.6 GB/s | 3.2 GB/s[n 77] | 2.6 GB/s[n 78] | 5.3 GB/s[n 79] | |
Buffering bandwidth |
Framebuffer | 800 MB/s (tiled 6.4 GB/s)[n 80] | 1 GB/s (tiled 12 GB/s)[n 81] | 720 MB/s[n 76] | 1 GB/s | 3.1 GB/s | 2.6 GB/s | 38 GB/s[n 68] | 9.6 GB/s[n 69] | 5.3 GB/s |
Z-buffer | 12 GB/s[n 66] | 25 GB/s[n 67] | ||||||||
Texture buffer | 800 MB/s (compressed 6 GB/s) | 1 GB/s (compressed 7 GB/s) | 720 MB/s[n 76] | 9.6 GB/s[n 68] | 10 GB/s[n 69] | |||||
System | Dreamcast (1998) | NAOMI (1998) | PC (1998) | PC (1999) | PlayStation 2 (2000) | GameCube (2001) | Xbox (2001) |
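Several geometry figures in the table follow from a clock speed and a cycle count given in the notes. The Python sketch below reproduces a few of them. The helper name `millions_per_second` is hypothetical, and the table rounds or truncates some of its values, so the computed results only approximate the published ones.

```python
def millions_per_second(clock_mhz: float, ops: float, cycles: float) -> float:
    """Millions of operations per second when `ops` complete every `cycles` clocks."""
    return clock_mhz * ops / cycles


# SH-4 at 200 MHz: one 4x4 matrix transformation (28 FLOPs) every 4 cycles.
sh4_mvertices = millions_per_second(200, 1, 4)
sh4_mflops = millions_per_second(200, 28, 4)

# Pentium III at 800 MHz: one matrix transformation every 17 cycles.
p3_mvertices = millions_per_second(800, 1, 17)

# Flipper at 162 MHz: 46.5 floating-point operations per cycle.
flipper_mflops = millions_per_second(162, 46.5, 1)

print(sh4_mvertices, sh4_mflops, round(p3_mvertices, 1), flipper_mflops)
```

These correspond to the 50 MVertices/s and 1.4 GFLOPS entries for the SH-4, the 47 MVertices/s entry for the Pentium III, and the 7.533 GFLOPS figure for Flipper.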
Notes
- ↑ GeForce 256 T&L unit outperformed by Pentium III (742 MHz)[14]
- ↑ Gekko (485 MHz) CPU could be used as an alternative geometry processor.
- ↑ Pentium III (733 MHz) CPU could be used as an alternative geometry processor.
- ↑ Matrix transformation (4×4 matrix × 4×1 vector) - 28 computations (16 multiplies, 12 adds)[15]
- ↑ 1.4 GFLOPS,[16][17] 7 floating-point operations per cycle (28 computations per 4 cycles)[18][19]
- ↑ 28 floating-point operations per 53 cycles[20]
- ↑ 24 floating-point operations per 17 cycles[21]
- ↑ Emotion Engine FPU: 0.64 GFLOPS; Emotion Engine VU0/VU1: 5.52 GFLOPS
- ↑ Flipper: 7.533 GFLOPS (46.5 floating-point operations per cycle)[22]; Gekko: 1.94 GFLOPS (4 floating-point operations per cycle)
- ↑ NV2A: 5.8 GFLOPS (24 floating-point operations per cycle); Pentium III: 1 GFLOPS (24 floating-point operations per 17 cycles)
- ↑ 4 MAC operations per cycle[19]
- ↑ 3.3125 cycles per MAC operation: 53 cycles per 12 MACs[20]
- ↑ 17 cycles per 9 MAC operations[21]
- ↑ 8 MAC operations per cycle (4 MAC operations per VU)[23]
- ↑ 19 MAC operations (12 MACs matrix transform, 7 MACs perspective transform) per cycle (8 cycles per 8 vertices)[22]
- ↑ NV2A: 2.2 billion MAC operations per second (116.5 million vertices/sec, 19 MACs each; 9.5 MAC operations per cycle); Pentium III: 380 million MAC operations per second (17 cycles per 9 MAC operations)
- ↑ 4 cycles per matrix transformation[24]
- ↑ 53 cycles per matrix transformation[20]
- ↑ 17 cycles per matrix transformation[21]
- ↑ 2 matrix transformations (1 transformation per VU) per 4 cycles[25]
- ↑ 1 matrix transformation per cycle[22]
- ↑ NV2A: 116 million vertices per second; Pentium III: 43 million vertices per second (17 cycles per matrix transformation)
- ↑ MVertices/s = Million vertices per second
- ↑ 170 cycles per perspective transformation: 53 cycles matrix transformation, 117 cycles projection (2 multiplies, 2 adds, 3 divides)[26] (2 cycles per multiply, 1 cycle per add, 37 cycles per divide)[27]
- ↑ 72 cycles per perspective transformation: 17 cycles matrix transformation, 55 cycles projection (2 multiplies, 2 adds, 3 divides)[26] (1 cycle per multiply, 1 cycle per add, 17 cycles per divide)[28]
- ↑ 8 cycles per 8 perspective transformations in T&L pipeline[22]
- ↑ NV2A: 116.5 million vertices per second (2 cycles per vertex); Pentium III: 10 million vertices per second (72 cycles per perspective transformation)
- ↑ MPolygons/s = Million polygons per second
- ↑ 223 cycles per vertex: 170 cycles perspective transformation, 53 cycles lighting (21 multiplies, 11 adds),[26] 2 cycles per multiply, 1 cycle per add, 37 cycles per divide[27]
- ↑ 110 cycles per vertex: Pentium III (742 MHz) calculates 6,752,000 triangle strips per second, with 1 light, faster than GeForce 256's T&L unit[14]
- ↑ 15 cycles/vertex[29] per VU
- ↑ 14 cycles per 8 vertices[22]
- ↑ NV2A: 46 million vertices per second, 5 cycles per vertex (2 cycles transform, 21 MACs lighting)[26]; Pentium III: 6.6 million vertices per second (110 cycles per vertex)
- ↑ 382 cycles per vertex: 170 cycles perspective transformation, 212 cycles lighting (84 multiplies, 44 adds),[26] 2 cycles per multiply, 1 cycle per add, 37 cycles per divide
- ↑ 136 cycles per vertex: Pentium III (742 MHz) calculates 5,453,000 triangle strips per second, with 4 lights, faster than GeForce 256's T&L unit[14]
- ↑ 60 cycles/vertex per VU: 4 light sources,[30] 15 cycles/vertex per light source[29]
- ↑ 63 cycles per 8 vertices[22]
- ↑ NV2A: 13 million vertices per second, 14 cycles per vertex (2 cycles transform, 3 cycles per light source)[26]; Pentium III: 5.3 million vertices per second (136 cycles per vertex)
- ↑ High-end arcade revision of PowerVR2 with twice the performance of the Dreamcast's PowerVR CLX2
- ↑ 2x framebuffer (twice Voodoo2), 2x TMU (texture mapping units) (same as Voodoo2)
- ↑ Falcon Voodoo3 3500 TV Special Edition
- ↑ ISP unit's PE Array of 32 processor elements process 32 pixels per cycle
- ↑ 2 ISP units, PE Arrays of 64 processor elements process 64 pixels per cycle
- ↑ 720 MB/s framebuffer bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 646 MB/s available bandwidth, 6 bytes per pixel (16-bit color, 32-bit Z-buffer precision)
- ↑ 2.656 GB/s VRAM bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 2.582 GB/s available bandwidth, 6 bytes per pixel (16-bit color, 32-bit Z-buffer precision)
- ↑ 16 pixel pipelines
- ↑ 5.336 GB/s video RAM bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 5.262 GB/s available bandwidth, 6 bytes per pixel (double-buffered 16-bit color, 32-bit Z-buffer precision)
- ↑ 3.19 GB/s bandwidth, 74 MB/s framebuffer (640×480, 16-bit color, double-buffered, 60 FPS), 3.116 GB/s available bandwidth, 8 bytes per pixel (16-bit pixel, 16-bit texel, 32-bit Z-buffer)
- ↑ 2.582 GB/s available bandwidth, 8 bytes per pixel (16-bit pixel, 16-bit texel, 32-bit Z-buffer)
- ↑ 5.262 GB/s available bandwidth, 8 bytes per pixel (16-bit pixel, 16-bit texel, 32-bit Z-buffer)
- ↑ 3.116 GB/s available bandwidth, 10 bytes per pixel (16-bit pixel, dual 16-bit texels, 32-bit Z-buffer)
- ↑ 2.582 GB/s available bandwidth, 10 bytes per pixel (16-bit pixel, dual 16-bit texels, 32-bit Z-buffer)
- ↑ 5.262 GB/s available bandwidth, 10 bytes per pixel (16-bit pixel, dual 16-bit texels, 32-bit Z-buffer)
- ↑ Low-quality compression[32]
- ↑ Low-quality compression[32]
- ↑ Bus interface transfers polygons and/or textures from the CPU to the GPU's VRAM
- ↑ 1x AGP bus[33]
- ↑ 2x AGP bus[31][33]
- ↑ 2x AGP bus[33]
- ↑ Transmission bus from Pentium III 800EB (133 MHz FSB, 1 GB/s) to GeForce 256 (4x AGP)[33]
- ↑ 162 MHz (64-bit) CPU FSB
- ↑ 133 MHz (64-bit) CPU FSB
- ↑ 8.25 KB register memory, 12.25 KB ISP cache, 13 KB TSP cache, 256 bytes FIFO buffer
- ↑ 8.25 KB register memory, 24.5 KB ISP cache, 13 KB TSP cache, 256 bytes FIFO buffer
- ↑ 12 KB ISP Parameter Cache, 4 KB TSP Parameter Cache[34]
- ↑ 66.0 66.1 1.2 GB/s register memory, 12.8 GB/s ISP PE Array, 1.6 GB/s TSP cache
- ↑ 67.0 67.1 1.6 GB/s register memory, 25.6 GB/s ISP PE Array, 1.6 GB/s TSP cache
- ↑ 68.0 68.1 68.2 38.4 GB/s framebuffer, 9.6 GB/s texture cache
- ↑ 69.0 69.1 69.2 10.368 GB/s texture cache (512-bit, 162 MHz), 9.632 GB/s framebuffer
- ↑ 16 MB main RAM (accessible by SH-4 and CLX2), 8 MB VRAM (accessible by CLX2)
- ↑ 8 MB texture RAM, 8 MB (2x 4 MB) framebuffer RAM
- ↑ Accessible by Emotion Engine and Graphics Synthesizer
- ↑ 24 MB texture RAM compression, 8 MB framebuffer RAM
- ↑ 800 MB/s main RAM, 800 MB/s VRAM
- ↑ 800 MB/s main RAM, 1 GB/s VRAM, 612 MB/s VROM
- ↑ 76.0 76.1 76.2 90 MHz, 2x 64-bit TMU, 2x 64-bit framebuffer
- ↑ Accessible by Emotion Engine at 3.2 GB/s, accessible by Graphics Synthesizer through 1.2 GB/s transmission bus
- ↑ 162 MHz (128-bit) bus
- ↑ 6.4 GB/s RAM bandwidth - 1.064 GB/s CPU FSB bandwidth[35]
- ↑ 3.2 gigapixels/sec effective fillrate, 16-bit (2 bytes) per pixel
- ↑ 6 gigapixels/sec effective fillrate, 16-bit (2 bytes) per pixel
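
The bandwidth-limited fillrate notes above all follow the same arithmetic: subtract the framebuffer traffic from the memory bandwidth, then divide by the bytes touched per pixel. The Python sketch below illustrates it using the GeForce 256 figures from the notes. The function names are hypothetical, and decimal megabytes (10^6 bytes) are assumed.

```python
def framebuffer_traffic_mb_s(width: int, height: int, bytes_per_pixel: int,
                             buffers: int, fps: int) -> float:
    """Framebuffer traffic in MB/s (decimal megabytes assumed)."""
    return width * height * bytes_per_pixel * buffers * fps / 1e6


def fillrate_mpixels_s(total_mb_s: float, framebuffer_mb_s: float,
                       bytes_per_pixel_drawn: int) -> float:
    """Pixels per second (in millions) that the remaining bandwidth can sustain."""
    return (total_mb_s - framebuffer_mb_s) / bytes_per_pixel_drawn


# 640x480, 16-bit color (2 bytes), double-buffered, 60 FPS.
traffic = framebuffer_traffic_mb_s(640, 480, 2, 2, 60)

# GeForce 256: 2.656 GB/s VRAM bandwidth, 6 bytes per pixel
# (16-bit color plus 32-bit Z-buffer).
geforce_fill = fillrate_mpixels_s(2656, traffic, 6)

print(f"{traffic:.1f} MB/s framebuffer traffic")
print(f"{geforce_fill:.0f} MPixels/s")
```

The framebuffer traffic comes to about 74 MB/s and the fillrate to about 430 MPixels/s, matching the GeForce 256 entry in the table.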
References
- ↑ 1.0 1.1 1.2 Hideki Sato Sega Interview (Edge)
- ↑ 2.0 2.1 Gamers' Republic, "August 1998" (US; 1998-07-21), page 29
- ↑ PC Magazine, December 1999, page 193
- ↑ 4.0 4.1 4.2 Test Drive: Le Mans (IGN)
- ↑ Actual HW T&L perfomance of NVIDIA GeForce/GeForce2 chips (IXBT Labs)
- ↑ PC Magazine, December 1999, page 203
- ↑ Unreal Modeling Guide (Unreal Developer Network)
- ↑ QIII Arena High Polygon Count
- ↑ DF Retro: Shenmue - A Game Ahead Of Its Time (Digital Foundry)
- ↑ htt (Wayback Machine: 2001-03-06 00:59)
- ↑ File:HowFarHaveWeGot.pdf
- ↑ Sega Dreamcast/Technical specifications
- ↑ Sega NAOMI
- ↑ 14.0 14.1 14.2 Benchmarking T&L in 3DMark 2000
- ↑ Design of Digital Systems and Devices (page 95)
- ↑ 16.0 16.1 htt (Wayback Machine: 2000-08-23 20:47)
- ↑ File:SH-4 Next-Generation DSP Architecture.pdf, page 5
- ↑ File:Entertainment Systems and High-Performance Processor SH-4.pdf, page 4
- ↑ 19.0 19.1 File:SH-4 Next-Generation DSP Architecture.pdf, page 31
- ↑ 20.0 20.1 20.2 File:Streaming SIMD Extensions - Matrix Multiplication.pdf, page 7
- ↑ 21.0 21.1 21.2 Optimizing for SSE: A Case Study
- ↑ 22.0 22.1 22.2 22.3 22.4 22.5 Nikkei Electronics (2000/10/9)
- ↑ File:ThePowerOfPS2.pdf, page 6
- ↑ File:SH-4 Next-Generation DSP Architecture.pdf, page 12
- ↑ File:ThePowerOfPS2.pdf, page 12
- ↑ 26.0 26.1 26.2 26.3 26.4 26.5 Design of Digital Systems and Devices (pages 95-97)
- ↑ 27.0 27.1 File:Instruction Tables.pdf, page 107
- ↑ File:Instruction Tables.pdf, page 110
- ↑ 29.0 29.1 Procedural Rendering on Playstation 2 (page 4) (Gamasutra)
- ↑ 30.0 30.1 File:ThePowerOfPS2.pdf, page 4
- ↑ 31.0 31.1 htt (Wayback Machine: 2007-08-07 15:12)
- ↑ 32.0 32.1 htt (Wayback Machine: 1999-04-28 17:17)
- ↑ 33.0 33.1 33.2 33.3 AGP Peak Speeds
- ↑ PC 3D Graphics Accelerators FAQ: VideoLogic PowerVR
- ↑ Hardware Behind the Consoles - Part I: Microsoft's Xbox (Understanding the Hardware – The X-CPU) (AnandTech)