NVIDIA’s Upcoming GK110 GPU With Hyper-Q To More Than Double HPC Performance

NVIDIA has more than doubled High Performance Computing (HPC) performance in its upcoming Kepler architecture-based GK110 GPU due to a major new feature: Hyper-Q.

On the previous Fermi architecture-based GF110 GPU, HPC performance can at times be limited because the CPU cannot keep the highly parallel GPU fed with work at all times. Work reaches the GPU through a single connection, so independent work blocks are serialized and pick up false dependencies on one another, which slows execution. Hyper-Q removes this bottleneck, as NVIDIA explains: “Hyper-Q enables multiple CPU cores to launch work on a single GPU simultaneously, thereby dramatically increasing GPU utilization and slashing CPU idle times. This feature increases the total number of connections between the host and the Kepler GK110 GPU by allowing 32 simultaneous, hardware managed connections, compared to the single connection available with Fermi.


“Hyper-Q is a flexible solution that allows connections for both CUDA streams and Message Passing Interface (MPI) processes, or even threads from within a process. Existing applications that were previously limited by false dependencies can see up to a 32x performance increase without changing any existing code.”
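
To see where the "up to 32x" figure comes from, here is a rough back-of-the-envelope model in Python. It is only a sketch, not NVIDIA's scheduler: the function `makespan`, the round-robin assignment of streams to queues, and the kernel durations are all assumptions made for illustration. It also optimistically assumes that the GPU has enough spare capacity to run every hardware queue fully in parallel.

```python
from typing import List


def makespan(streams: List[List[float]], hardware_queues: int) -> float:
    """Time to drain all streams in a toy model (hypothetical helper).

    Each stream is a list of kernel durations that must run in order.
    Streams are assigned round-robin to hardware queues. Everything placed
    in the same queue runs strictly back to back (the false dependency),
    while separate queues are assumed to run fully in parallel.
    """
    if hardware_queues < 1:
        raise ValueError("need at least one hardware queue")
    queue_time = [0.0] * hardware_queues
    for index, stream in enumerate(streams):
        queue_time[index % hardware_queues] += sum(stream)
    # The run finishes when the busiest queue finishes.
    return max(queue_time)


# 32 independent streams, each with two kernels of 1.0 time units (made-up numbers).
streams = [[1.0, 1.0] for _ in range(32)]

fermi_like = makespan(streams, hardware_queues=1)    # single connection
kepler_like = makespan(streams, hardware_queues=32)  # 32 connections
speedup = fermi_like / kepler_like

print(f"1 queue: {fermi_like}, 32 queues: {kepler_like}, speed-up: {speedup}x")
```

In this idealised case the single queue takes 64 time units and the 32 queues take 2, which gives the 32x ceiling. Real applications share the GPU's execution resources between queues, so measured gains are smaller, as the CP2K figure of about 2.5 times shows.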

NVIDIA has benchmarked the speed-up possible using CP2K, a popular MPI-based molecular simulation code that is traditionally difficult to run efficiently on GPUs. Hyper-Q maximizes GPU utilization for CP2K, resulting in more than double the performance of the same code running without it. NVIDIA's benchmark graph shows a performance boost on the order of 2.5 times.


The GK110, “big brother” to the GK104, will appear in NVIDIA’s Tesla K20 accelerator “by the end of the year”. It would be awesome if this GPU were built into a gaming graphics card, wouldn’t it? Let’s hope NVIDIA does this, regardless of price, as I’m sure the extreme performance would be worth it. Full details of the new GPU, along with Hyper-Q and its performance benefits, are available at the link below, in a blog entry by Peter Messmer, senior devtech engineer at NVIDIA.

CP2K is a widely used atomic and molecular simulation code that runs at many of the world’s supercomputing sites. CP2K is parallelized using MPI and OpenMP, and CUDA is used in some models where GPUs are targeted.

With Fermi-based GPUs, developers actually experienced reduced performance gains when MPI processes were limited to small amounts of work, particularly in strong-scaling simulations. While the CPU was highly utilized, the GPU stayed completely inactive for substantial portions of the simulation.
