The controller itself hooks up to the main processor via a four-lane PCI Express 4.0 interconnect, and contains a number of bespoke hardware blocks designed to eliminate SSD bottlenecks. The system has six priority levels, meaning that developers can literally prioritise the delivery of data according to the game's needs.
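To make the idea of prioritised delivery concrete, here is a minimal sketch of a six-level request queue. It is purely illustrative: the `ReadQueue` class, the request names and the convention that 0 is the most urgent level are assumptions, not anything taken from Sony's hardware or SDK.

```python
import heapq
import itertools

NUM_PRIORITIES = 6  # the article states six priority levels


class ReadQueue:
    """Toy I/O request queue; 0 is treated as most urgent (an assumption)."""

    def __init__(self):
        self._heap = []
        # Monotonic counter keeps first-in, first-out order within a level.
        self._counter = itertools.count()

    def submit(self, priority, name):
        if not 0 <= priority < NUM_PRIORITIES:
            raise ValueError(f"priority must be in 0..{NUM_PRIORITIES - 1}")
        heapq.heappush(self._heap, (priority, next(self._counter), name))

    def next_request(self):
        # Pops the most urgent request; oldest first among equals.
        _priority, _order, name = heapq.heappop(self._heap)
        return name

    def drain(self):
        served = []
        while self._heap:
            served.append(self.next_request())
        return served


queue = ReadQueue()
queue.submit(5, "distant_terrain_lod")
queue.submit(0, "player_animation")
queue.submit(2, "nearby_textures")
queue.submit(0, "collision_mesh")
print(queue.drain())
# prints ['player_animation', 'collision_mesh', 'nearby_textures', 'distant_terrain_lod']
```

The two level-0 requests are served first, in the order they were submitted, while the low-priority terrain data waits until everything more urgent has gone out.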
The controller supports hardware decompression for the industry-standard ZLIB, as well as the new Kraken format from RAD Game Tools, which offers an additional 10 per cent of compression efficiency. The bottom line? 5.5GB/s of raw bandwidth translates into an effective eight or nine gigabytes per second fed into the system. "By the way, in terms of performance, that custom decompressor equates to nine of our Zen 2 cores, that's what it would take to decompress the Kraken stream with a conventional CPU," Cerny reveals.
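The arithmetic behind those figures is simple: effective throughput is raw bandwidth multiplied by the average compression ratio. The short calculation below works backwards from the quoted numbers to the ratio they imply. The function names are illustrative, and the ratios are derived from the figures above rather than stated by Sony.

```python
RAW_GBPS = 5.5  # raw SSD bandwidth quoted in the article, in GB/s


def implied_compression_ratio(effective_gbps, raw_gbps=RAW_GBPS):
    """Average compression ratio needed to reach a given effective rate."""
    return effective_gbps / raw_gbps


def effective_throughput(raw_gbps, ratio):
    """Uncompressed data delivered per second for a given ratio."""
    return raw_gbps * ratio


for effective in (8.0, 9.0):
    ratio = implied_compression_ratio(effective)
    print(f"{effective:.1f} GB/s effective implies roughly {ratio:.2f}:1 compression")
```

In other words, the eight to nine gigabytes per second range corresponds to typical game data shrinking by a factor of about 1.45 to 1.64.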
A dedicated DMA controller (equivalent to one or two Zen 2 cores in performance terms) directs data to where it needs to be, while two dedicated, custom processors handle I/O and memory mapping. On top of that, coherency engines operate as housekeepers of sorts.
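As a rough illustration of that housekeeping role, the following sketch contrasts flushing an entire cache with evicting only the lines that cover a freshly overwritten address range. The `ToyCache` class, the 64-byte line size and the method names are all assumptions made for illustration; nothing here describes the actual hardware.

```python
LINE_SIZE = 64  # bytes per cache line; an assumed figure


class ToyCache:
    """Software model of a cache keyed by line base address."""

    def __init__(self):
        self.lines = {}  # line base address -> cached data

    def fill(self, address, data):
        base = address - address % LINE_SIZE
        self.lines[base] = data

    def flush_all(self):
        """Blunt approach: drop every cached line."""
        evicted = len(self.lines)
        self.lines.clear()
        return evicted

    def scrub(self, start, length):
        """Targeted approach: evict only lines overlapping [start, start + length)."""
        if length <= 0:
            return 0
        first = start - start % LINE_SIZE
        last = (start + length - 1) // LINE_SIZE * LINE_SIZE
        victims = [base for base in self.lines if first <= base <= last]
        for base in victims:
            del self.lines[base]
        return len(victims)


cache = ToyCache()
for address in (0, 64, 128, 4096):
    cache.fill(address, b"stale")

# New data lands at addresses 100..159, touching the lines at 64 and 128.
print(cache.scrub(100, 60))   # prints 2
print(sorted(cache.lines))    # prints [0, 4096]
```

The lines at 0 and 4096 survive, so anything reading them still hits the cache, whereas `flush_all` would have discarded all four.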
"Coherency comes up in a lot of places, probably the biggest coherency issue is stale data in the GPU caches," explains Cerny in his presentation. "Flushing all the GPU caches whenever the SSD is read is an unattractive option - it could really hurt the GPU performance - so we've implemented a gentler way of doing things, where the coherency engines inform the GPU of the overwritten address ranges and custom scrubbers in several dozen GPU caches do pinpoint evictions of just those address ranges."
All of this is delivered to developers without them needing to do anything. Even the decompression is taken care of by the custom silicon. "You just indicate what data you'd like to read from your original, uncompressed file, and where you'd like to put it, and the whole process of loading it happens invisibly to you and at very high speed," Cerny explains.
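A software analogy may help show what that transparency means in practice. In the sketch below, a file is stored as independently compressed zlib chunks, and the caller asks for a byte range of the original, uncompressed data plus a destination buffer. The chunk size and the `pack` and `read_into` helpers are hypothetical and are not the PlayStation 5 API; on the console this work is done by the custom silicon rather than by CPU code.

```python
import zlib

CHUNK_SIZE = 64 * 1024  # uncompressed bytes per chunk; an assumed figure


def pack(data):
    """Store data as a list of independently compressed zlib chunks."""
    return [
        zlib.compress(data[i:i + CHUNK_SIZE])
        for i in range(0, len(data), CHUNK_SIZE)
    ]


def read_into(chunks, offset, length, dest, dest_offset=0):
    """Copy `length` bytes from uncompressed `offset` into `dest`.

    The caller only ever deals in uncompressed offsets; decompression
    happens inside this function. Returns the number of bytes copied.
    """
    copied = 0
    while copied < length:
        index, start = divmod(offset + copied, CHUNK_SIZE)
        if index >= len(chunks):
            break  # request runs past the end of the file
        plain = zlib.decompress(chunks[index])
        piece = plain[start:start + (length - copied)]
        if not piece:
            break
        begin = dest_offset + copied
        dest[begin:begin + len(piece)] = piece
        copied += len(piece)
    return copied


original = bytes(range(256)) * 1000      # 256,000 bytes of sample data
stored = pack(original)                  # what would sit on the SSD
buffer = bytearray(100_000)              # where the caller wants the data
count = read_into(stored, 60_000, len(buffer), buffer)
print(count, bytes(buffer) == original[60_000:160_000])  # prints 100000 True
```

The request spans three chunks, yet the caller never sees a chunk boundary or a compressed byte.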