A Low-Cost Energy-Efficient Heterogeneous Supercomputer.
A Runtime for Heterogeneous Accelerator Clusters with CUDA Unified Memory.
An OpenCL framework for FPGAs.
A CUDA framework that hides host-to-device memory copy time by exploiting CUDA Unified Memory and fault mechanisms.
The NAS Parallel Benchmarks (NPB) suite ported to C, OpenMP C, and OpenCL, for both a single compute device and multiple compute devices.
An OpenCL framework for a heterogeneous cluster that consists of nodes with multiple heterogeneous devices (e.g., multicore CPUs, AMD GPUs, NVIDIA GPUs, and Intel Xeon Phi coprocessors).
An OpenCL framework for ARM CPUs, IBM Cell BE processors, and TI DSPs.
A lock-free and mostly synchronization-free dynamic memory allocator for manycores.
A fast, cycle-accurate ARM architecture simulator.
A coherent shared-memory runtime interface for IBM Cell BE processors.