@ProjectPhysX @fclc @chipsandcheese We actually do have data for an optimized MI300X FP32 run: with the grid resolution set to 1188 x 1188 x 1188, we got a peak of 24299 MLUPs and a bandwidth of 3718 GB/s.
But we didn't run with those numbers because we didn't optimize the Nvidia runs at all, so it wouldn't have been a fair comparison.
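For anyone checking how the MLUPs and bandwidth figures above relate, here is a rough sketch. It assumes a D3Q19 lattice with FP32 storage, where each cell update moves 153 bytes (19 populations x 4 bytes, read and written once each, plus 1 flag byte). That byte count is an assumption on my part and is not stated in the post; the function name is made up for illustration.

```python
# Assumption (not from the post): D3Q19 lattice, FP32 storage.
# 19 populations x 4 bytes, each read once and written once, plus 1 flag byte.
BYTES_PER_CELL_UPDATE = 19 * 4 * 2 + 1  # = 153


def implied_bandwidth_gb_s(mlups: float,
                           bytes_per_update: int = BYTES_PER_CELL_UPDATE) -> float:
    """Convert million lattice updates per second into GB/s (1 GB = 1e9 bytes)."""
    return mlups * 1e6 * bytes_per_update / 1e9


# Figures quoted in the post.
grid_cells = 1188 ** 3                      # cells in the 1188 x 1188 x 1188 grid
bandwidth = implied_bandwidth_gb_s(24299)   # peak MLUPs from the MI300X run

print(f"grid cells: {grid_cells}")
print(f"implied bandwidth: {bandwidth:.0f} GB/s")
```

Under that assumption, 24299 MLUPs works out to roughly 3718 GB/s, which lines up with the bandwidth quoted above.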