Difference between revisions of "Sega Dreamcast/Hardware comparison"
From Sega Retro
Revision as of 01:33, 27 November 2016
Vs. PC
The Sega Dreamcast's PowerVR CLX2 GPU was the basis for the PowerVR PMX1, a PC GPU released with the Neon 250 graphics card in 1999. However, the Neon 250 lacks many of the tiled rendering features of the CLX2: the tile size is halved from 32×32 pixels to 32×16 pixels (halving the fillrate), it lacks the CLX2's internal Z-buffering and alpha test capability with hardware front-to-back translucency sorting (further reducing the fillrate and performance, and requiring the Neon 250 to render a Z-buffer externally), and the tiling is partially handled by software (the CLX2 handles the tiling entirely in hardware). The Neon 250 also lacks the CLX2's latency buffering and palettized texture support, its VQ texture compression performance is halved, and it suffers from bus contention because it has a single data bus (whereas the CLX2 has two data buses). The PowerVR2 was also optimized for the Hitachi SH-4's geometry processing capabilities (rather than for a Pentium II or III), while PC drivers and software were not optimized for the Neon 250's tiled rendering architecture (unlike Dreamcast games, which were optimized for the CLX2's tiled rendering architecture). The Neon 250 thus had only a fraction of the Dreamcast CLX2's fillrate and rendering performance. The reduction in performance from the Dreamcast's CLX2 to the Neon 250 was comparable to the reduction in performance from the Sega Model 3's Real3D Pro-1000 to the Intel740.
The Dreamcast was generally the most powerful home system during 1998–1999, outperforming high-end PC hardware at the time.[1] The Dreamcast's Hitachi SH-4 CPU calculates 3D graphics four times faster than a Pentium II from 1998,[1] and faster than a Pentium III and NVIDIA GeForce 256 from 1999. The Dreamcast's PowerVR CLX2 GPU, due to its tiled rendering architecture, also has a higher fillrate and faster polygon rendering throughput than a Voodoo3 and GeForce 256 from 1999.
The Dreamcast's CPU–GPU transmission bus is faster than the Voodoo3's and has a higher effective bandwidth than the GeForce 256's. This is due to the Dreamcast's efficient bandwidth usage, including its lack of CPU overhead from an operating system and the CLX2's tiled rendering architecture: textures are loaded directly to VRAM (freeing up the CPU–GPU transmission bus for polygons), texture compression ratios are higher, the on-chip tile buffer performs Z-buffering internally, and rendering is deferred (no need to draw, shade or texture overdrawn polygons). The CLX2 is also capable of order-independent transparency (which the Voodoo3 and GeForce 256 lacked) and Dot3 normal mapping (which the Voodoo3 lacked).[2]
In terms of game engine performance, the CLX2 peaks at 5 million polygons/s,[3] compared to the GeForce 256 which peaks at 2.9 million polygons/s.[4] Dreamcast game engines rendered 50,000–166,666 polygons per scene (3–5 million polygons/s),[3] while PC game engines of 1999 rendered up to 10,000 polygons per scene[5][6] (1–1.6 million polygons/s).[7] Character models in particular were significantly more detailed in Dreamcast games than in PC games during 1998–1999.[8]
Vs. PlayStation 2
Compared to the rival PlayStation 2, the Dreamcast is better at textures, anti-aliasing, and image quality, while the PS2 is better at polygon geometry, particles, and lighting. The PS2 has a more powerful CPU geometry engine, higher translucent fillrate, and more main RAM (32 MB, compared to Dreamcast's 16 MB), while the DC has more VRAM (8 MB, compared to PS2's 4 MB), higher opaque fillrate, and more GPU hardware features, with CLX2 capabilities like tiled rendering, super-sample anti-aliasing, Dot3 normal mapping, order-independent transparency, and texture compression, which the PS2's GPU lacks.
With larger VRAM and tiled rendering, the DC can render a larger framebuffer at higher native resolution (with an on-chip Z-buffer), and with texture compression, it can compress around 20–60 MB of texture data in its VRAM. Because the PS2 has only 4 MB VRAM, it relies on the main RAM to store textures. While the PS2's CPU–GPU transmission bus for transferring polygons and textures is 50% faster than the Dreamcast's CPU–GPU transmission bus, the DC has textures loaded directly to VRAM (freeing up the CPU–GPU transmission bus for polygons) and texture compression gives it higher effective texture bandwidth.
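The effective texture bandwidth figures in the graphics comparison table appear to follow from multiplying each bus bandwidth by the texture compression ratio. The sketch below reproduces that arithmetic under this assumption; the function name and the dictionary layout are illustrative only, and the input figures are the ones listed in the table.

```python
# Assumed model: effective texture bandwidth = bus bandwidth x compression ratio.
# Inputs are (bus bandwidth in MB/s, compression ratio) from the comparison table.
SYSTEMS = {
    "Dreamcast (VQ, 8:1)": (800, 8),
    "PC 1998, Voodoo Banshee (none, 1:1)": (267, 1),
    "PC 1999, Voodoo3 (FXT1, 4:1)": (533, 4),
    "PC 1999, GeForce 256 (S3TC, 6:1)": (1000, 6),
    "PlayStation 2 (none, 1:1)": (1200, 1),
}


def effective_texture_bandwidth_mb_s(bus_mb_s, compression_ratio):
    """Return the amount of uncompressed texture data moved per second, in MB/s."""
    return bus_mb_s * compression_ratio


for name, (bus_mb_s, ratio) in SYSTEMS.items():
    effective = effective_texture_bandwidth_mb_s(bus_mb_s, ratio)
    print(f"{name}: {effective / 1000:.2f} GB/s")
```

Under this model the Dreamcast comes out at 6.4 GB/s and the GeForce 256 at 6 GB/s, matching the table; the Voodoo3 comes out slightly above 2 GB/s, which the table rounds down.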
Dreamcast games were effectively using 20–30 MB of texture data[9] (compressed to around 5–6 MB),[10] while PS2 games up until 2003 peaked at 5.5 MB of texture data (average 1.5 MB). PS2 games up until 2003 rendered up to 7.5 million polygons/s (145,000 polygons per scene), with most rendering 2–5 million polygons/s (average 52,000 polygons per scene);[11] in comparison, Dreamcast game engines rendered up to 5 million polygons/s (166,666 polygons per scene), with most games rendering 2–4 million polygons/s (average 50,000 polygons per scene).[3]
The Dreamcast is more developer-friendly than the PS2, which is more difficult to develop for; this is the reverse of the 32-bit era, when the PlayStation was the more developer-friendly system and the Saturn the more difficult one.
Vs. GameCube and Xbox
The Xbox and GameCube were both more powerful than the Dreamcast, but the Dreamcast had several hardware advantages. The Dreamcast has a higher opaque fillrate than the GameCube and Xbox (both under 1 GPixels/s). The Dreamcast's opaque/translucent fillrate was comparable to the Xbox's practical fillrate (250–700 MPixels/s), but lower than the GameCube's fillrate (648–800 MPixels/s).[12] The Dreamcast's SH-4 CPU has higher floating-point performance than the Xbox's PIII-based CPU (733 MFLOPS), but lower than the GameCube's CPU (1.9 GFLOPS). However, the GameCube and Xbox both have T&L GPUs with floating-point capabilities, giving both higher floating-point performance than the Dreamcast.
Graphics comparison
- See Sega Dreamcast technical specifications for more technical details on Dreamcast hardware
System | Sega Dreamcast (1998) | PC (1998) | PC (1999) | Sony PlayStation 2 (2000)
---|---|---|---|---
Geometry processors | Hitachi SH-4 (200 MHz) | Intel Pentium II (450 MHz) | Intel Pentium III 800EB (800 MHz), NVIDIA GeForce 256 (120 MHz) | Sony-Toshiba Emotion Engine (294.912 MHz)
Floating-point operations | 1400 MFLOPS[n 1] | 450 MFLOPS[n 2] | 800 MFLOPS[n 3] | 6200 MFLOPS[20]
MAC operations | 600 million MACs/s[n 4] | 150 million MACs/s[n 5] | 400 million MACs/s[n 6] | 1179 million MACs/s[n 7]
Matrix transformations | 50 million vertices/s[n 8] | 10 million vertices/s[n 9] | 28 million vertices/s[n 10] | 73 million vertices/s[n 11]
Transform/Lighting computations | 13 million polygons/s | 2.1 million polygons/s[n 12] | 7.2 million polygons/s[n 13] | 38 million polygons/s

System | Sega Dreamcast (1998) | PC (1998) | PC (1999) | PC (1999) | Sony PlayStation 2 (2000)
---|---|---|---|---|---
Rendering processor | NEC-VideoLogic PowerVR CLX2 (100 MHz) | 3dfx Voodoo Banshee (100 MHz) | 3dfx Voodoo3 3500 TV SE (200 MHz) | NVIDIA GeForce 256 (120 MHz) | Sony Graphics Synthesizer (147.456 MHz)
Tiled rendering calculations | 200 MFLOPS | N/A | N/A | N/A | N/A
Rendering fillrate: opaque polygons | 3200 megapixels/s[13] | 100 megapixels/s | 200 megapixels/s | 480 megapixels/s | 2359 megapixels/s[n 14]
Rendering fillrate: opaque/translucent polygons | 500 megapixels/s[24] | | | |
Texture mapping: texture fillrate | 3200 megatexels/s (opaque), 500 megatexels/s (opaque/translucent) | 100 megatexels/s | 400 megatexels/s | 480 megatexels/s | 1200 megatexels/s
Texture mapping: texture compression | 8:1 (VQ) | 1:1 (N/A) | 4:1 (FXT1) | 6:1 (S3TC) | 1:1 (N/A)
CPU–GPU transmission bus: bandwidth | 800 MB/s[13] | 267 MB/s[n 15] | 533 MB/s[n 16] | 1 GB/s[n 17] | 1.2 GB/s[26]
CPU–GPU transmission bus: effective texture bandwidth | 6.4 GB/s | 267 MB/s | 2 GB/s | 6 GB/s | 1.2 GB/s[26]
Polygon rendering: 100-pixel polygons | 7.1 million polygons/s (opaque), 5 million polygons/s (opaque/translucent) | 1 million polygons/s | 2 million polygons/s | 4.8 million polygons/s | 23 million polygons/s (flat), 11 million polygons/s (textured)
Polygon rendering: 500-pixel polygons | 6.4 million polygons/s (opaque), 1 million polygons/s (opaque/translucent) | 200,000 polygons/s | 400,000 polygons/s | 960,000 polygons/s | 4.7 million polygons/s (flat), 2.3 million polygons/s (textured)
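Most of the polygon rendering figures above are consistent with a simple fill-bound estimate: pixel fillrate divided by the number of pixels per polygon. The sketch below checks that reading; the function name is illustrative, the fillrates are the ones listed in the table, and the model itself is an assumption rather than something the cited sources state.

```python
# Assumed model: fill-bound polygon rate = pixel fillrate / pixels per polygon.
# Fillrates (megapixels/s) are taken from the comparison table.
FILLRATES_MPIXELS_S = {
    "Dreamcast CLX2 (opaque/translucent)": 500,
    "Voodoo Banshee": 100,
    "Voodoo3 3500": 200,
    "GeForce 256": 480,
    "PS2 Graphics Synthesizer (flat)": 2359,
}


def fill_bound_polygons_per_s(fillrate_mpixels_s, pixels_per_polygon):
    """Upper bound on polygons/s when pixel fillrate is the only limit."""
    return fillrate_mpixels_s * 1_000_000 / pixels_per_polygon


for name, fillrate in FILLRATES_MPIXELS_S.items():
    for size in (100, 500):
        rate = fill_bound_polygons_per_s(fillrate, size)
        print(f"{name}, {size}-pixel polygons: {rate / 1e6:.2f} million polygons/s")
```

The Dreamcast's 100-pixel opaque figure (7.1 million polygons/s) is well below what its 3200 megapixels/s opaque fillrate would allow under this estimate, which suggests that figure is limited by something other than fillrate. The PS2 textured figures are likewise not reproduced by this simple model.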
Notes
- [n 1] 1.4 GFLOPS,[13][14] 7 floating-point operations per cycle (28 floating-point operations per 4 cycles)[15]
- [n 2] 1 cycle per floating-point add[16]
- [n 3] Pentium III: 800 MFLOPS,[17][18] 1 cycle per floating-point add or multiply[16]. GeForce 256: Outperformed by Pentium III (742 MHz)[19]
- [n 4] 3 MAC operations per cycle (12 MAC operations per 4 cycles)[15]
- [n 5] 3 cycles per MAC operation (2 cycles per multiply, 1 cycle per add)[16]
- [n 6] 2 cycles per MAC operation (1 cycle per multiply, 1 cycle per add)[16]
- [n 7] 4 MAC operations per cycle[20]
- [n 8] 4 cycles per matrix transformation[15]
- [n 9] 44 cycles per matrix transformation (16 multiplies, 12 adds)[21]
- [n 10] 28 cycles per matrix transformation (16 multiplies, 12 adds)[21]
- [n 11] 4 cycles per matrix transformation[22]
- [n 12] 214 cycles per vertex (39 multiplies, 25 adds, 3 divides),[23] 2 cycles per multiply, 1 cycle per add, 37 cycles per divide[16]
- [n 13] Pentium III (742 MHz) calculates 6,752,000 triangle strips per second, faster than GeForce 256's T&L unit[19]
- [n 14] 16 pixel pipelines
- [n 15] 1x AGP bus[25]
- [n 16] 2x AGP bus[25]
- [n 17] Transmission bus from Pentium III 800EB (133 MHz FSB, 1 GB/s) to GeForce 256 (4x AGP)[25]
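The geometry figures in the table can be reproduced from the cycle counts in these notes by dividing each processor's clock by the cycles needed per operation. The following is a minimal sketch of that arithmetic; the function and variable names are illustrative, and the clock speeds and cycle counts are the ones given in the table and notes.

```python
# Throughput = clock frequency / cycles per operation.
# Clock speeds (MHz) come from the table, cycle counts from the notes.
MATRIX_TRANSFORM = {
    "Hitachi SH-4": (200, 4),
    "Pentium II": (450, 44),
    "Pentium III 800EB": (800, 28),
    "Emotion Engine": (294.912, 4),
}


def ops_per_second(clock_mhz, cycles_per_op):
    """Operations per second for a unit needing cycles_per_op cycles each."""
    return clock_mhz * 1_000_000 / cycles_per_op


for name, (clock_mhz, cycles) in MATRIX_TRANSFORM.items():
    rate = ops_per_second(clock_mhz, cycles)
    print(f"{name}: {rate / 1e6:.1f} million vertices/s")

# Pentium II transform/lighting cost per vertex, from the per-instruction
# latencies in the notes: 2 cycles per multiply, 1 per add, 37 per divide.
pentium_ii_cycles_per_vertex = 39 * 2 + 25 * 1 + 3 * 37
pentium_ii_tl_rate = ops_per_second(450, pentium_ii_cycles_per_vertex)
print(f"Pentium II T&L: {pentium_ii_cycles_per_vertex} cycles per vertex, "
      f"{pentium_ii_tl_rate / 1e6:.1f} million vertices/s")
```

The per-vertex sum comes to 214 cycles, which agrees with the note, and at 450 MHz that gives roughly the 2.1 million figure shown in the table. These are peak figures derived from instruction latencies, so they say nothing about memory stalls or other overheads.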
References
- [1] File:GamersRepublic US 03.pdf, page 29
- [2] PC Magazine, December 1999, page 193
- [3] Test Drive: Le Mans (IGN)
- [4] Actual HW T&L perfomance of NVIDIA GeForce/GeForce2 chips (IXBT Labs)
- [5] PC Magazine, December 1999, page 203
- [6] Unreal Modeling Guide (Unreal Developer Network)
- [7] '95-'99 PC Comparisons
- [8] DF Retro: Shenmue - A Game Ahead Of Its Time (Digital Foundry)
- [9] Hideki Sato Sega Interview (Edge)
- [10] How Many Polygons Can the Dreamcast Render?
- [11] Reaching for the Limits of PS2 Performance: How Far Have We Got? (2003) (SCEE) (Wayback Machine: 2003-12-10 07:46)
- [12] Graphics Processor Specifications (IGN) (Wayback Machine: 2001-03-31 05:05)
- [13] Sega Dreamcast: Implementation (IEEE) (Wayback Machine: 2000-08-23 20:47)
- [14] File:SH-4 Next-Generation DSP Architecture.pdf, page 5
- [15] File:SH-4 Next-Generation DSP Architecture.pdf, page 12
- [16] Instruction tables (page 107)
- [17] Automatic Performance Tuning of Sparse Matrix Kernels, Volume 1, page 14
- [18] Cluster Computing, page 9
- [19] Benchmarking T&L in 3DMark 2000
- [20] File:ThePowerOfPS2.pdf, page 6
- [21] Design of Digital Systems and Devices (page 95)
- [22] File:ThePowerOfPS2.pdf, page 12
- [23] Design of Digital Systems and Devices (pages 95-97)
- [24] File:Edge UK 067.pdf, page 11
- [25] AGP Peak Speeds
- [26] File:ThePowerOfPS2.pdf, page 4