Chundoong: A Low-Cost Energy-Efficient Heterogeneous Supercomputer
Chundoong is a heterogeneous CPU/GPU supercomputer designed and built by the THUNDER Research Group at Seoul National University and ManyCoreSoft in October 2012. It is used to evaluate parallel programming models (SnuCL), software techniques, and applications developed at Seoul National University.
After porting the double-precision LINPACK benchmark (HPL) to Chundoong using MPI + OpenCL and optimizing it with our software techniques for multiple GPUs, we achieved 106.8 TFLOPS (1.907 TFLOPS per node). Chundoong is ranked 277th in the TOP500 list and 32nd in the Green500 list of November 2012. Its per-node performance is #1 among the 412 clusters in the TOP500 list of November 2012.
The design of Chundoong focuses on low cost and low power consumption. As a result, Chundoong cost only US$0.67 million (159 MFLOPS per dollar). Chundoong is referred to as the 7th most power-efficient architecture in the TOP500 list of November 2012.
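The per-node and per-dollar figures quoted above follow directly from the HPL result, the node count, and the system cost. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative and not part of any Chundoong software.

```python
# Figures quoted in the text.
HPL_TFLOPS = 106.8   # measured HPL performance of the whole cluster
NUM_NODES = 56       # number of compute nodes
COST_USD = 0.67e6    # total system cost in US dollars


def per_node_tflops(total_tflops, nodes):
    """Average HPL performance contributed by one compute node."""
    return total_tflops / nodes


def mflops_per_dollar(total_tflops, cost_usd):
    """Cost efficiency; 1 TFLOPS equals 1e6 MFLOPS."""
    return total_tflops * 1e6 / cost_usd


print(f"{per_node_tflops(HPL_TFLOPS, NUM_NODES):.3f} TFLOPS per node")
print(f"{mflops_per_dollar(HPL_TFLOPS, COST_USD):.0f} MFLOPS per dollar")
```

Both printed values match the numbers stated in the text (1.907 TFLOPS per node and 159 MFLOPS per dollar).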
Chundoong is built using only commodity hardware. The detailed description of the Chundoong cluster is as follows:
- Each of the 56 compute nodes contains
  - 2 × 8-core Intel Xeon E5-2650 CPUs (i.e., 16 CPU cores)
  - 4 × AMD Radeon HD 7970 graphics cards
  - Main memory: 128 GB (1600 MHz DDR3)
  - Single-port Mellanox InfiniBand QDR HCA
- Four storage nodes provide 88 TB of HDD storage space
  - 1 × metadata server
  - 3 × object storage servers
- 36-port Mellanox InfiniBand QDR switches
  - 2 × level 2
  - 4 × level 1
- Water cooling system for CPUs and GPUs
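To get a sense of the scale this list implies, the per-node figures can be multiplied out over the 56 compute nodes. The sketch below does that in Python; the `ComputeNode` class and `cluster_totals` function are hypothetical names introduced only for this illustration, and the totals are derived from the list rather than quoted from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeNode:
    """Per-node hardware as listed above (names are illustrative)."""
    cpus: int = 2
    cores_per_cpu: int = 8
    gpus: int = 4
    memory_gb: int = 128


def cluster_totals(node, num_nodes):
    """Multiply the per-node hardware out over all compute nodes."""
    return {
        "cpu_cores": node.cpus * node.cores_per_cpu * num_nodes,
        "gpus": node.gpus * num_nodes,
        "memory_gb": node.memory_gb * num_nodes,
        "gpus_per_cpu": node.gpus / node.cpus,
    }


totals = cluster_totals(ComputeNode(), num_nodes=56)
for name, value in totals.items():
    print(f"{name}: {value}")
```

Under these figures the compute nodes hold 896 CPU cores, 224 GPUs, and 7168 GB of main memory in total, with two GPUs per CPU socket.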
A self-made 4U chassis contains two compute nodes, or a single storage node with 23 HDDs. A 19-inch rack contains 8 chassis (up to 16 compute nodes), InfiniBand switches, and a pump unit.
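The packing rules above allow a rough estimate of how many chassis and racks the cluster needs. The Python sketch below is only a lower-bound calculation under two assumptions that the source does not state: storage chassis occupy the same rack slots as compute chassis, and every rack is filled to its 8-chassis capacity. The source does not give the actual rack count.

```python
import math

NODES_PER_CHASSIS = 2   # compute nodes per 4U chassis (from the text)
CHASSIS_PER_RACK = 8    # chassis per 19-inch rack (from the text)


def chassis_needed(compute_nodes, storage_nodes):
    """Each storage node takes a whole chassis; compute nodes share one in pairs."""
    return math.ceil(compute_nodes / NODES_PER_CHASSIS) + storage_nodes


def min_racks(compute_nodes, storage_nodes):
    """Lower bound on racks, assuming every rack is completely filled."""
    total = chassis_needed(compute_nodes, storage_nodes)
    return math.ceil(total / CHASSIS_PER_RACK)


print(chassis_needed(56, 4))  # chassis for 56 compute + 4 storage nodes
print(min_racks(56, 4))       # minimum number of fully packed racks
```

With 56 compute nodes and four storage nodes this gives 32 chassis and at least 4 racks.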
Chundoong runs Red Hat Enterprise Linux 6.3 and Lustre 2.3.54. SnuCL and MPI libraries (Open MPI 1.6.4, MVAPICH2 1.9b) are provided to users. Chundoong also uses self-made job scheduling and monitoring software called thor.
GPU-Centric Design and Software Techniques
Chundoong features a high ratio of GPUs to CPUs to fully obtain the benefits of accelerators: low cost and high energy efficiency. As a result, the CPUs become a major performance bottleneck. Optimizations aimed at reducing CPU overhead are applied to our HPL implementation. We are planning to incorporate these optimization techniques into SnuCL, an OpenCL framework for heterogeneous clusters developed by our research group.
Gaming GPUs and Water Cooling System
The AMD Radeon HD 7970 is a graphics card for desktop PCs, not for high-performance computing. Thus, it does not have ECC memory and is not designed for high-density systems. To reduce memory errors and improve the reliability of the GPUs, Chundoong adopts a self-made water cooling system that keeps CPUs and GPUs at a low temperature (below 50°C). Chundoong is the first supercomputer in the world that contains high-density gaming GPUs and ensures their reliability.