What I would like for IceLake and Zen2... but is not happening

AMD and Intel have their own plans for their forthcoming products, and we have to accept those plans and choose our future hardware from what both companies offer us in the coming years. My vision is different, as I wrote in a pair of recent tweets. This is what I would like.


AVX512 hardware occupies too much space: register files, caches, datapaths, and big execution units. AVX512 is important in HPC and servers, but it is less important to gamers and mainstream customers. Even in HPC/servers, I don't think that wide SIMD is the optimal design point because of the divergence problem.
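For readers unfamiliar with the divergence problem, here is a minimal sketch of it as a toy model. It assumes a masked if/else where both paths cost the same, and that a vector whose lanes disagree must issue both paths. The function name `simd_utilization` and the branch pattern are hypothetical, chosen only for illustration; this is not a model of any real AVX512 implementation.

```python
from typing import Sequence


def simd_utilization(taken: Sequence[bool], width: int) -> float:
    """Fraction of issued lane-slots that do useful work.

    `taken` holds the branch outcome of each element; elements are processed
    in vectors of `width` lanes. Assumption: a vector whose lanes all agree
    issues one path, a vector with mixed outcomes issues both paths (masked).
    """
    if not taken:
        return 1.0
    useful = len(taken)  # every element needs exactly one path
    issued = 0
    for start in range(0, len(taken), width):
        chunk = taken[start:start + width]
        paths = len(set(chunk))  # 1 if lanes agree, 2 if they diverge
        issued += paths * width
    return useful / issued


# Hypothetical branch pattern: runs of four taken, then four not taken.
pattern = ([True] * 4 + [False] * 4) * 2

for width in (4, 8, 16):
    print(width, simd_utilization(pattern, width))
```

In this toy pattern the 4-lane vectors never diverge, while the 8-lane and 16-lane vectors always do, so the wider hardware spends half of its lane-slots on masked-off work.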

Personally I would eliminate both 512bit and 256bit hardware and make a narrower core. The core I propose would measure about 4 mm² on Intel 10nm. This is about one half the size of a Skylake client core. This thinning of the core has two advantages. First, the ability to reach higher clocks, because the maximum frequency achievable by a core is a function of its size; do not expect miracles, but a few hundred extra MHz are welcome. Second, the possibility of adding more cores in the same die size; that is, in the same cost and power slots. Who would not prefer more cores for general code rather than AVX512 hardware that works only in half a dozen customer workloads and zero games?
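The core-count argument is simple arithmetic, sketched below. The 4 mm² figure and the "about one half" ratio come from the text; the area budget (the space of four Skylake-sized cores) is a hypothetical choice for illustration, and the sketch ignores caches, uncore, and I/O.

```python
# Figures from the text: proposed core is about 4 mm^2 on Intel 10nm,
# roughly half the size of a Skylake client core.
NARROW_CORE_MM2 = 4.0
SKYLAKE_CORE_MM2 = 2 * NARROW_CORE_MM2  # implied by "about one half the size"


def cores_in_budget(budget_mm2: float, core_mm2: float) -> int:
    """Whole cores that fit in a core-area budget (ignores uncore, caches, I/O)."""
    return int(budget_mm2 // core_mm2)


# Hypothetical budget: the area taken by four Skylake-sized cores.
budget = 4 * SKYLAKE_CORE_MM2

wide = cores_in_budget(budget, SKYLAKE_CORE_MM2)
narrow = cores_in_budget(budget, NARROW_CORE_MM2)
print(f"wide cores: {wide}, narrow cores: {narrow}")
```

Under these assumptions the same core area holds twice as many of the narrower cores.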

I would also like to see the iGPU made optional. Currently the mainstream client platform from Intel includes integrated graphics, whereas the enthusiast platform does not include one. Give people more choice! Let graphics be optional on both platforms. And to reduce costs, use a multidie approach around EMIB. Below I add an illustration of the concept. This is a dual-die configuration. The SoC on top is made of two CPU dies for a total of eight CPU cores. The SoC at the bottom has half the CPU cores replaced by three GPU cores.


The AMD Zen core already lacks 256bit and 512bit SIMD units, but it comes in a weird 4-core CCX configuration with disjoint L3 caches. I would like both CCXs in the Zeppelin die to be combined into a single module with eight cores sharing a unified L3 cache.