From: Sapporo2: a versatile direct N-body library

Performance for different thread-block configurations. The figure shows the required integration time (y-axis) for \(N=131{,}072\) source particles as a function of the number of sink particles (number of threads, x-axis). Each line indicates a different configuration, in which we changed the number of blocks launched per GPU multi-processor. Each panel shows a different GPU architecture: (a) NVIDIA Fermi, (b) NVIDIA Kepler GK104, (c) NVIDIA Kepler GK110, and (d) AMD Tahiti. The AMD architectures are limited to 256 threads. The lines with filled circle markers indicate the configurations that we have chosen as our default settings for the number of blocks.
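
To make the swept parameters concrete, the following is a minimal sketch of how one point in such a sweep could be turned into a launch layout. It is not taken from the library: the helper `launch_config`, its parameter names, the even split of source particles over blocks, and the multi-processor count used in the example are all assumptions made for illustration.

```python
import math


def launch_config(n_sinks, n_sources, n_multiprocessors, blocks_per_mp,
                  max_threads=None):
    """Hypothetical helper: derive a launch layout for one sweep point.

    Assumptions (not from the source): one thread per sink particle, and
    the source particles are divided evenly over all launched blocks.
    """
    # Some devices cap the threads per block (the caption gives 256 for AMD).
    if max_threads is not None and n_sinks > max_threads:
        raise ValueError("n_sinks exceeds the per-block thread limit")

    # The swept quantity: blocks launched per multi-processor.
    total_blocks = n_multiprocessors * blocks_per_mp

    # Each block handles a slice of the sources (rounded up).
    sources_per_block = math.ceil(n_sources / total_blocks)

    return {
        "threads_per_block": n_sinks,
        "total_blocks": total_blocks,
        "sources_per_block": sources_per_block,
    }


# Illustrative values only: 16 multi-processors is a made-up device size.
config = launch_config(n_sinks=256, n_sources=131072,
                       n_multiprocessors=16, blocks_per_mp=2,
                       max_threads=256)
print(config)
```

Under these assumptions, raising the number of blocks per multi-processor shrinks the slice of source particles each block processes, which is one way to picture why the curves in the figure separate by configuration.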

