From: Sapporo2: a versatile direct N-body library

Time required to solve \(N^{2}\) force computations using different configurations. In both panels the number of source particles equals the number of sink particles and is indicated on the x-axis. The y-axis indicates the wall-clock time required to execute the gravity computation and to perform the data transfers. Unless otherwise indicated, we use CUDA for the NVIDIA devices. The left panel shows the performance of Sapporo1 on a K20m GPU and of Sapporo2 on five different GPUs using a mixture of CUDA and OpenCL. The right panel shows the difference in performance between double-single and double precision for three different devices; the double-single timings are indicated by the filled symbols and the double-precision timings by the lines with the open symbols. In both panels the straight solid line indicates \(N^{2}\) scaling.
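
To illustrate why equal numbers of source and sink particles lead to \(N^{2}\) scaling, the following is a minimal CPU sketch of a direct force summation in plain Python. It is not Sapporo2 code and does not use its API: the function names `direct_accelerations` and `interaction_count`, the choice of units with \(G = 1\), and the softening parameter `eps2` are assumptions made purely for illustration.

```python
import math


def direct_accelerations(sinks, sources, eps2=1e-4):
    """Sum the gravitational acceleration on every sink from every source.

    Hypothetical helper for illustration only (G = 1 units assumed).
    sinks   : list of (x, y, z) positions
    sources : list of (x, y, z, mass) tuples
    eps2    : assumed softening length squared; it also keeps the
              self-interaction term finite (it then contributes zero).
    """
    accelerations = []
    for (xi, yi, zi) in sinks:
        ax = ay = az = 0.0
        # Every sink visits every source: len(sinks) * len(sources) interactions.
        for (xj, yj, zj, mj) in sources:
            dx, dy, dz = xj - xi, yj - yi, zj - zi
            r2 = dx * dx + dy * dy + dz * dz + eps2
            m_over_r3 = mj / (r2 * math.sqrt(r2))
            ax += dx * m_over_r3
            ay += dy * m_over_r3
            az += dz * m_over_r3
        accelerations.append((ax, ay, az))
    return accelerations


def interaction_count(n_sinks, n_sources):
    """Number of pairwise force evaluations in the double loop above."""
    return n_sinks * n_sources
```

With equal source and sink counts, `interaction_count(N, N)` is \(N^{2}\), so doubling \(N\) quadruples the number of force evaluations. Measured wall-clock times can be expected to approach this slope only when the force computation dominates over data transfers and other overheads.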

