June 16-20, 2013

Leipzig, Germany

Presentation Details

Name: Lattice QCD on Intel Xeon Phi Coprocessors
Time: Monday, June 17, 2013
2:30 PM - 3:00 PM
Room:   Hall 5
CCL - Congress Center Leipzig
Breaks:   3:00 PM - 4:00 PM Coffee Break
Speakers:   Balint Joo, Jefferson Lab
Abstract:   Lattice Quantum Chromodynamics (LQCD) is currently the only known model-independent, non-perturbative computational method for calculations in the theory of the strong interactions, and is of importance in studies of nuclear and high-energy physics. LQCD codes use large fractions of supercomputing cycles worldwide and are often amongst the first to be ported to new high-performance computing architectures. The recently released Intel Xeon Phi architecture from Intel Corporation features parallelism at the level of many x86-based cores, multiple threads per core, and vector processing units. In this contribution, we describe our experiences with optimizing a key LQCD kernel for the Xeon Phi architecture. On a single node, using single precision, our Dslash kernel sustains a performance of up to 320 GFLOPS, while our Conjugate Gradients solver sustains up to 237 GFLOPS. Furthermore, we demonstrate a fully ‘native’ multi-node LQCD implementation running entirely on Knights Corner (KNC) nodes with minimum involvement of the host CPU. Our multi-node implementation of the solver has been strong-scaled to 3.9 TFLOPS on 32 KNCs.

Paper authors:
Balint Joo (1), Dhiraj D. Kalamkar (2), Karthikeyan Vaidyanathan (2), Mikhail Smelyanskiy (3), Kiran Pamnany (2), Victor W. Lee (3), Pradeep Dubey (3), William Watson III (1)
(1) Thomas Jefferson National Accelerator Facility, USA
(2) Parallel Computing Lab, Intel Corporation, India
(3) Parallel Computing Lab, Intel Corporation, USA
The program may be subject to change.