Figure 2 | Computational Astrophysics and Cosmology

Figure 2

From: Sapporo2: a versatile direct N-body library

Performance for different numbers of active sink particles. The x-axis indicates the number of active particles and the y-axis the time required to compute the gravitational force using 131,072 source particles (\(N_{\mathrm{active}} \times N\) gravity computations). The presented time includes only the time required to compute the gravity; data transfer times are not included. In both panels the straight striped line shows, for one of the shown devices, the ideal scaling from the optimal configuration with 256 active particles to the worst-case situation with 1 active particle. The left panel shows the effect on the performance of using 1D thread-blocks instead of 2D on AMD and NVIDIA hardware. It also shows the effect of using OpenCL instead of CUDA on NVIDIA hardware. When using 1D thread-blocks the GPU becomes underutilized when \(N_{\mathrm{active}}\) drops below 128. This is visible as an increase in execution time as \(N_{\mathrm{active}}\) decreases. The right panel compares the performance of the five different GPUs as indicated. Furthermore, it shows that the performance of Sapporo2 is comparable to that of Sapporo1.
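
The arithmetic behind the axes and the ideal-scaling line can be illustrated with a minimal sketch. It assumes that ideal scaling means the execution time is proportional to \(N_{\mathrm{active}}\), anchored at the 256-active-particle configuration. The function names and the reference time `T_REF` are hypothetical placeholders introduced for illustration; they are not measured values from the figure and not part of the Sapporo2 API.

```python
# Sketch of the quantities described in the caption (assumptions labeled).

N_SOURCES = 131_072   # number of source particles, as stated in the caption
N_ACTIVE_REF = 256    # optimal configuration named in the caption


def interactions(n_active, n_sources=N_SOURCES):
    """Number of gravity computations: N_active x N."""
    return n_active * n_sources


def ideal_time(n_active, t_ref, n_active_ref=N_ACTIVE_REF):
    """Ideal (linear) scaling: time proportional to the number of active
    particles, anchored at the reference configuration.
    t_ref is the time at n_active_ref; here it is a hypothetical input."""
    return t_ref * n_active / n_active_ref


if __name__ == "__main__":
    T_REF = 1.0e-3  # hypothetical placeholder in seconds, NOT a measured value
    for n in (1, 16, 128, 256):
        print(n, interactions(n), ideal_time(n, T_REF))
```

Under this assumption, a measured curve that lies above the ideal line at small \(N_{\mathrm{active}}\) indicates that the device is not fully utilized there, which is the effect the caption describes for 1D thread-blocks.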
