GPU
Now there is a bit of nuance here, as AMD’s GPU architecture is offered piecemeal: the shader cores, the memory controllers, the display controllers, etc. are all separate blocks that can be mixed and matched.
This is how the PS4 Pro uses just parts of Vega. So it’s entirely possible that there are other bits and pieces in Scorpio that are newer than Polaris; however, the all-important shader cores and ROP backends clearly point to Polaris.
Diving into the specs a bit deeper, we do have the clockspeeds and configurations for both the GPU and the memory. Scorpio’s GPU is a 40 CU (2,560 SP) wide design – a bit wider than the Radeon RX 480 – which is a rather extensive upgrade over the original Xbox One. Ignoring clockspeeds for the moment (more on those in a sec), just the CU count itself is 3.33 times the 12 CUs in the original XB1. Similarly, Microsoft has doubled the number of ROP backends from 16 to 32. The ROP change is badly needed in order for Microsoft to reach their 4K goal, and it has been a pretty universal suspicion that the original XB1’s 16 ROPs were a big part of the reason that major multiplatform games tended to go with 900p instead of native 1080p.
Meanwhile on the clockspeed front, the new GPU is clocked at 1172MHz, giving Microsoft 6 TFLOPS right on the dot. This is a 37% clockspeed increase over the original XB1, and a 28% increase over the XB1S, which received a slight clockspeed bump of its own.
These clockspeeds are well within the range of what the Polaris architecture can offer, and while not as conservative as Sony’s design choices, should still be reasonably power efficient, though I’m very much interested in seeing what total power consumption is like.
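For anyone who wants to check the arithmetic, here is a minimal sketch of where the 6 TFLOPS figure and the clockspeed percentages come from. It assumes the usual GCN accounting of 2 FLOPs per SP per clock (one fused multiply-add), and it uses 853MHz for the original XB1 and 914MHz for the XB1S; those two clocks are not stated above and are my assumption here. The function and variable names are purely illustrative.

```python
def peak_tflops(stream_processors, clock_mhz, flops_per_sp_per_clock=2):
    """Theoretical peak FP32 throughput in TFLOPS.

    Assumes each stream processor retires one fused multiply-add
    (2 FLOPs) per clock, the usual way GCN peak figures are quoted.
    """
    return stream_processors * flops_per_sp_per_clock * clock_mhz * 1e6 / 1e12


def percent_increase(new, old):
    """Relative increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


SCORPIO_SPS = 2560        # 40 CUs, from the specs above
SCORPIO_CLOCK_MHZ = 1172  # from the specs above
XB1_CLOCK_MHZ = 853       # assumption: original Xbox One GPU clock
XB1S_CLOCK_MHZ = 914      # assumption: Xbox One S GPU clock

scorpio_tflops = peak_tflops(SCORPIO_SPS, SCORPIO_CLOCK_MHZ)
uplift_vs_xb1 = percent_increase(SCORPIO_CLOCK_MHZ, XB1_CLOCK_MHZ)
uplift_vs_xb1s = percent_increase(SCORPIO_CLOCK_MHZ, XB1S_CLOCK_MHZ)

print(f"Scorpio peak: {scorpio_tflops:.2f} TFLOPS")
print(f"Clock uplift vs. XB1:  {uplift_vs_xb1:.0f}%")
print(f"Clock uplift vs. XB1S: {uplift_vs_xb1s:.0f}%")
```

These are theoretical peaks only; they say nothing about how much of that throughput a real game can sustain.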
More importantly, combined with the much wider GPU, the impact on the various throughput metrics is staggering. Shader/texture throughput will be 4.58x the original XB1, and ROP throughput will be 2.75x. Microsoft had a very large gap to close from the original Xbox One if they wanted to do 4K, and they have certainly put together a design that is equally large to help close that gap. With that said, however, with performance that, on paper, is only slightly ahead of a Radeon RX 480, I expect we’re still going to see some compromises here to consistently hit Microsoft’s 4K goal. 6 TFLOPS often isn’t enough for native 4K at current image quality levels, which means developers will have to resort to some clever optimizations or image scaling.
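The throughput multipliers above are simply the unit-count ratio multiplied by the clockspeed ratio. A rough sketch of that calculation follows; it again assumes an 853MHz GPU clock for the original XB1, which is not given in the text, and the helper name is hypothetical.

```python
def throughput_ratio(new_units, old_units, new_clock_mhz, old_clock_mhz):
    """Peak throughput scales with (number of units) x (clock),
    so the generational ratio is the product of the two ratios."""
    return (new_units / old_units) * (new_clock_mhz / old_clock_mhz)


SCORPIO_CLOCK_MHZ = 1172  # from the specs above
XB1_CLOCK_MHZ = 853       # assumption: original Xbox One GPU clock

# Shader and texture throughput both scale with CU count: 40 vs. 12.
shader_ratio = throughput_ratio(40, 12, SCORPIO_CLOCK_MHZ, XB1_CLOCK_MHZ)

# ROP throughput scales with ROP count: 32 vs. 16.
rop_ratio = throughput_ratio(32, 16, SCORPIO_CLOCK_MHZ, XB1_CLOCK_MHZ)

print(f"Shader/texture throughput: {shader_ratio:.2f}x")
print(f"ROP throughput:            {rop_ratio:.2f}x")
```

Note that the pixel count going from 900p to 4K grows by more than either of these ratios, which is part of why compromises still seem likely.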
Memory subsystem
What makes things especially interesting though is that Microsoft didn’t just switch out DDR3 for GDDR5; they’re using a wider memory bus as well, expanding it by 50% to 384 bits wide. Not only does this expand the console’s memory bandwidth even further – now to a total of 326GB/sec, or 4.8x the XB1’s DDR3 – but it means we have an odd mismatch between the ROP backends and the memory bus. Briefly, the ROP backends and memory bus are typically balanced 1-to-1 in a GPU, so a single memory controller will feed one or two ROP partitions. However in this case, we have a 384-bit bus feeding 32 ROPs, which is not a compatible mapping.
What this means is that at some level, Microsoft is running an additional memory crossbar in the SoC, which would be very similar to what AMD did back in 2012 with the Radeon HD 7970. Because the console SoC needs to split its memory bandwidth between the CPU and the GPU, things aren’t as cut and dried here as they are with discrete GPUs. But, at a high level, what we saw from the 7970 is that the extra bandwidth plus crossbar setup did not offer much of a benefit over a straight-connected, lower bandwidth configuration. Accordingly, AMD has never done it again in their dGPUs. So I think it will be very interesting to see if developers can consistently consume more than 218GB/sec or so of bandwidth using the GPU.
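The bandwidth figures in this section follow from bus width times per-pin data rate. As a sketch, assuming 6.8Gbps GDDR5 for Scorpio and DDR3-2133 on a 256-bit bus for the original XB1 (neither data rate is stated above, so treat both as my assumptions), the numbers work out like this:

```python
def bandwidth_gb_per_sec(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/sec: bytes per transfer across
    the whole bus, times the per-pin data rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps


GDDR5_GBPS = 6.8   # assumption: Scorpio's per-pin GDDR5 data rate
DDR3_GBPS = 2.133  # assumption: DDR3-2133 in the original XB1

scorpio_bw = bandwidth_gb_per_sec(384, GDDR5_GBPS)
# The same memory on a 256-bit bus, i.e. the "straight-connected"
# configuration that would map cleanly onto 32 ROPs.
narrow_bus_bw = bandwidth_gb_per_sec(256, GDDR5_GBPS)
xb1_bw = bandwidth_gb_per_sec(256, DDR3_GBPS)

print(f"Scorpio, 384-bit:   {scorpio_bw:.1f} GB/sec")
print(f"Same GDDR5, 256-bit: {narrow_bus_bw:.1f} GB/sec")
print(f"XB1 DDR3, 256-bit:   {xb1_bw:.1f} GB/sec")
print(f"Scorpio vs. XB1:     {scorpio_bw / xb1_bw:.1f}x")
```

The 256-bit case is where the 218GB/sec or so threshold comes from: it is what the same memory would deliver without the extra 128 bits of bus and the crossbar that goes with them.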