- Research
- Open Access
On the parallelization of stellar evolution codes
- David Martin^{1, 2},
- Jordi José^{1, 2} and
- Richard Longland^{3}
https://doi.org/10.1186/s40668-018-0025-5
© The Author(s) 2018
- Received: 28 May 2018
- Accepted: 8 November 2018
- Published: 16 November 2018
Abstract
Multidimensional nucleosynthesis studies with hundreds of nuclei linked through thousands of nuclear processes are still computationally prohibitive. To date, most nucleosynthesis studies rely either on hydrostatic/hydrodynamic simulations in spherical symmetry, or on post-processing simulations using temperature and density versus time profiles directly linked to huge nuclear reaction networks.
Parallel computing has been regarded as the main enabling factor of computationally intensive simulations. This paper explores the pros and cons of the parallelization of stellar codes, providing recommendations on when and how parallelization may help in improving the performance of a code for astrophysical applications.
We report on different parallelization strategies successfully applied to the spherically symmetric, Lagrangian, implicit hydrodynamic code SHIVA, extensively used in the modeling of classical novae and type I X-ray bursts.
When only the matrix build-up and inversion processes in the nucleosynthesis subroutines are parallelized (a suitable approach for post-processing calculations), the huge amount of time spent on communications between cores, together with the small problem size (limited by the number of isotopes of the nuclear network), results in a much worse performance of the parallel application compared to the 1-core, sequential version of the code. Parallelization of the matrix build-up and inversion processes in the nucleosynthesis subroutines is not recommended unless the number of isotopes adopted largely exceeds 10,000.
In sharp contrast, speed-up factors of 26 and 35 have been obtained with a parallelized version of SHIVA, in a 200-shell simulation of a type I X-ray burst carried out with two nuclear reaction networks: a reduced one, consisting of 324 isotopes and 1392 reactions, and a more extended network with 606 nuclides and 3551 nuclear interactions. Maximum speed-ups of ∼41 (324-isotope network) and ∼85 (606-isotope network) are also predicted for 200 cores, stressing that the number of shells of the computational domain constitutes an effective upper limit on the maximum number of cores that can be used in a parallel application.
Keywords
- Numerical methods
- Hydrodynamics
- Parallel computing
- Nuclear reactions
- Nucleosynthesis
- Abundances
- Stellar evolution
- Stellar explosions: classical novae
- Stellar explosions: X-ray bursts
1 Introduction
Computational astrophysics has revolutionized our knowledge of the physics of stars. Alongside the progress achieved in observational astrophysics (through high-resolution spectroscopy and photometry, sometimes including multiwavelength observations with space-borne and ground-based observatories), cosmochemistry (isotopic abundance determinations in presolar meteoritic grains) and nuclear physics (determination of nuclear cross sections at or close to stellar energies), computers have provided astrophysicists with the appropriate arena in which complex physical processes operating in stars (e.g., rotation, convection and mixing, mass loss...) can be properly modeled (see, e.g., Ref. Bodenheimer et al. 2006).
Stellar evolution models are becoming increasingly sophisticated and complex. The dawn of supercomputing and multi-core machines has made it possible to (partially) overcome the limitations imposed by the assumption of spherical symmetry. The pay-off, however, is still very expensive. Two- and, especially, three-dimensional simulations are so computationally demanding that other simplifications, such as the use of truncated nuclear reaction networks, just large enough to account for the energetics of the star, must be adopted. Multidimensional nucleosynthesis studies with hundreds of nuclear species linked through thousands of nuclear processes are still prohibitive. Accordingly, most of our understanding of element synthesis in stars relies either on hydrostatic/hydrodynamic simulations in spherical symmetry (1D), or on post-processing simulations using temperature and density versus time profiles extracted from stellar evolution models, and directly linked to huge nuclear reaction networks. Even such post-processing calculations can sometimes become computationally very intensive: for instance, the sensitivity study of the effect of nuclear uncertainties in X-ray burst nucleosynthesis performed by Parikh et al. (2008), requiring 50,000 post-processing calculations with a network containing 600 species (from H to ^{113}Xe) and more than 3500 nuclear reactions, took about 9 CPU months on a single-core computer.
In the 1D codes used in the modeling of a wide range of astrophysical scenarios, such as classical novae, X-ray bursts, supernovae, or asymptotic giant branch (AGB) stars (e.g., FRANEC Limongi and Chieffi 2003, Chieffi and Limongi 2013, MESA Paxton et al. 2011, 2013, SHIVA José and Hernanz 1998, José 2016), stars are divided into \({\sim}100\text{s} - 1000\text{s}\) of concentric shells. They also incorporate a similar number of nuclear processes, which link hundreds of nuclear species. The subroutines that handle the suite of different nuclear processes and the associated nucleosynthesis are often the most time-consuming components of a stellar evolution code (unless very small nuclear reaction networks are used). Different strategies have been adopted to reduce the computational cost of such simulations, therefore improving the performance of a code. One possibility relies on the use of more efficient numerical techniques to handle integration of large nuclear networks (Timmes 1999, Longland et al. 2014). Another possibility involves parallelization of the stellar code, so that the high computational cost can be split and handled by different cores working cooperatively.
Parallel computing has been regarded as the main enabling factor of more precise, computationally intensive simulations. Indeed, most of the existing multidimensional stellar codes have been parallelized. Naively, parallelization simply relies on applying several cores to the solution of a single problem, so that speed-ups are accomplished by executing independent, non-sequential portions of the code. In practice, however, parallelization comes with a high cost in both engineering and programming efforts. And on top of that, it may turn out that parallelization does not pay off at all for specific applications. Therefore, the main goal of this paper is to explore the advantages (and disadvantages) associated with the parallelization of stellar codes, outlining recommendations on when and how parallelization may help in improving the performance of a code for astrophysical applications. We discuss speed-up factors ranging between 26 and 35 that allow the execution of hydrodynamic simulations coupled to large nuclear reaction networks in affordable times.
The structure of this paper is as follows: different strategies in the parallelization of a stellar evolution code (and of the matrix build-up and inversion processes in the nucleosynthesis subroutines) are described in Sects. 2 and 3. Special emphasis is devoted to the expected speed-ups obtained as a function of the size of the nuclear reaction network and the number of cores involved in the simulation. The performance of the parallelized version of the SHIVA code is qualitatively compared with other codes, with similar or different architectures, in Sect. 4. The main results and conclusions of this work, together with a list of open issues, are also summarized in Sect. 4.
2 Parallelization of a stellar code with a decoupled, time-explicit treatment of the nucleosynthesis subroutines
At each time-step, the set of 5N unknowns is determined from a system of 5N linearized equations (i.e., conservation of mass, momentum and energy, the definition of the Lagrangian velocity and an equation that accounts for energy transport), which is solved by means of an iterative technique—Henyey’s method (Henyey et al. 1964). The basic set of stellar structure equations, supplemented by a suitable equation of state (EOS, which includes radiation, ions, and electrons with different degrees of degeneracy), opacities and a nuclear reaction network, constitutes the building blocks of any stellar evolution code. In SHIVA, convection and nuclear energy production are decoupled from the set of hydrodynamic equations, and handled by means of a time-explicit scheme. In general, partial differential equations involving time derivatives can be discretized in terms of variables evaluated (i.e., known) at the previous time-step (explicit schemes) or at the current time-step (implicit schemes). Explicit schemes are usually easier to implement than implicit schemes. However, in explicit schemes the time-step is limited by the Courant–Friedrichs–Lewy condition, which prevents any disturbance traveling at the speed of sound from traversing more than one numerical cell per time-step; violating this condition leads to unphysical results. Implicit schemes impose no such precondition on the time-step and thus allow larger time-steps than explicit schemes, but they require an iterative procedure to solve the system at each step. In SHIVA, all compositional changes driven by nuclear processes or convective transport are evaluated at the end of the iterative procedure,^{1} once the temperature, density and the other physical variables have been determined at each computational shell. In particular, SHIVA implements a two-step, time-explicit scheme to calculate the new chemical composition at each time-step (see Ref. Wagoner 1969).
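The Courant–Friedrichs–Lewy restriction on the explicit time-step mentioned above can be illustrated in a few lines (an illustrative Python sketch; the grid widths, sound speeds, and safety factor below are hypothetical choices, not values taken from SHIVA):

```python
import numpy as np

def cfl_timestep(dr, cs, safety=0.5):
    """Largest explicit time-step allowed by the CFL condition:
    no signal traveling at the local sound speed may cross more
    than one cell per step, i.e. dt <= min(dr / cs), times a
    safety factor < 1."""
    return safety * np.min(np.asarray(dr) / np.asarray(cs))

# Hypothetical grid: 200 shells, widths in cm, sound speeds in cm/s
dr = np.full(200, 1.0e5)
cs = np.linspace(1.0e7, 5.0e8, 200)
dt = cfl_timestep(dr, cs)
```

An implicit scheme, by contrast, would accept time-steps well above this bound at the price of an iterative solve per step.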
While such decoupling of the nucleosynthesis subroutines from the hydrodynamic equations has a minor effect on the results, it has a huge impact on the speed-up factors that can be obtained after parallelization (see Sect. 4, for a more detailed discussion).
2.1 Parallelization strategy
A first analysis of SHIVA’s architecture suggests two main points where parallelization might be exploited: the solution of the linearized system of equations for the determination of the physical variables (i.e., Henyey’s method), and the multizone calculation of the nuclear energy generation rate and nucleosynthesis. The first one relies on the parallel solution of a system of 5N linear equations, where N is the number of shells adopted in the simulation. For a typical astrophysical application, \(N \sim100\text{--}1000\). However, as will be discussed later (see Sect. 3.3), such a parallel approach only achieves acceptable performance for ≥10,000 equations. Very modest speed-up factors are obtained otherwise (i.e., less than a factor of 2), which do not justify the effort. In contrast, the multizone calculation of nuclear energy generation and nucleosynthesis is performed independently at each shell, and can result in large speed-up factors if parallelized. This is the specific parallelization strategy adopted hereafter, and presented in this Section. Each core goes redundantly through almost all processing stages. However, with regard to the nucleosynthesis part, each core performs the computation on a non-overlapping subset of shells. After this, each core broadcasts its (partial) results, and from this stage onward, the simulation proceeds again on all cores redundantly. In the parallelization strategy adopted, there are only two points of communication: at the beginning of the simulation (where the root process broadcasts all the initial information and parameters to the rest of the processes), and repeatedly at each (successful) iteration, after the distributed computation of the nucleosynthesis has been performed. This choice maximizes parallel performance by keeping communication points to a minimum or, in other words, by maximizing the computation to communication ratio (McKenney 2011).
In order to obtain equivalent workloads on all cores, the total number of shells of the computational domain must be split up into approximately equally sized groups. The shells assigned to each core are consecutive, so that the different cores compute energy and nucleosynthesis for shells \(1 \ldots j\), \(j+1 \ldots i\), \(i+1 \ldots m\), and so on. The last core will have assigned shells \(m+1\) to N.
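The shell decomposition just described can be sketched as follows (an illustrative Python sketch; the function name is ours, and the 200-shell/42-core figures merely echo the simulations reported later):

```python
def partition_shells(n_shells, n_cores):
    """Split shells 1..n_shells into consecutive, near-equal groups,
    one per core; the first (n_shells % n_cores) cores receive one
    extra shell so that workloads differ by at most one shell."""
    base, extra = divmod(n_shells, n_cores)
    bounds, start = [], 1
    for c in range(n_cores):
        size = base + (1 if c < extra else 0)
        bounds.append((start, start + size - 1))
        start += size
    return bounds

# e.g. 200 shells on 42 cores: 32 cores handle 5 shells, 10 handle 4
groups = partition_shells(200, 42)
```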
2.2 Performance prediction
2.3 Results
The results obtained approach the performance of a perfect parallel application: the computation to communication ratio is large enough that processing work can be distributed in an extremely efficient way amongst cores. Accordingly, larger speed-ups are expected if the number of cores used in the parallel execution is increased. Figure 2 also displays the theoretical speed-ups expected for both simulations, as given by Eq. (1). Such theoretical estimates do not take into account communication or synchronization times, and as a result, the observed performance always falls short of the theoretical, ideal speed-up.
As expected, higher speed-ups are obtained when the problem size is increased by using a nuclear reaction network with 606 isotopes and 3551 reactions (i.e., Model 2). The speed-up accomplished in this simulation exceeds by approximately 34% the one obtained with the reduced nuclear network (speed-up factors of 35 versus 26, respectively). This is a direct consequence of increasing the problem size, which is essentially equivalent to increasing the amount of parallelizable computation (that is, the nucleosynthesis calculation), and therefore the parallel content (\(p = 0.99127\) for Model 1, versus \(p = 0.99738\) for the simulation with the larger nuclear reaction network, i.e., Model 2). This, in turn, raises the modelled, theoretical speed-up curve, narrowing the gap with respect to an ideal speed-up.
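In the communication-free limit, a parallel fraction p translates into an ideal speed-up via Amdahl's law, which can be evaluated directly (a hedged Python sketch; treating Eq. (1) as reducing to Amdahl's law when communication time is neglected is an assumption here, and the resulting numbers are ideal upper bounds, not the paper's measured or modelled values):

```python
def amdahl_speedup(p, n_cores):
    """Ideal (communication-free) speed-up of a code whose parallel
    fraction is p, executed on n_cores: S = 1 / ((1 - p) + p / n)."""
    return 1.0 / ((1.0 - p) + p / n_cores)

# Parallel fractions quoted in the text, evaluated for 42 cores
s_reduced  = amdahl_speedup(0.99127, 42)   # Model 1, 324-isotope network
s_extended = amdahl_speedup(0.99738, 42)   # Model 2, 606-isotope network
```

Both values exceed the measured speed-ups (26 and 35), consistent with the statement that communication and synchronization costs make the observed performance fall short of the ideal curve.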
It is also important to note that the performance model presented here is valid for the execution environment discussed, and cannot be extrapolated to other clusters, which may have different latencies and communication bandwidths. That said, the model can be taken as a reference for the capabilities of a parallelized application, and can be used to decide whether access time at a supercomputing facility, where latencies and transmission bandwidths are highly optimized for parallel executions, should be requested. On such platforms, even better speed-up factors can be expected.
3 Parallelization of the nuclear energy generation and nucleosynthesis subroutines
In this section, we report on the expected speed-ups resulting from parallelization of the matrix build-up and inversion processes in the nucleosynthesis subroutines, for different sizes of the adopted nuclear reaction networks. This is a completely different parallelization approach compared to the one described in Sect. 2. In the strategy described for SHIVA, the method of solving the system of equations was not modified, but executed in parallel on a subset of non-overlapping shells. Now, it is the build-up and inversion of the matrix containing the rates of the different nuclear interactions (i.e., the solution of the system of equations) that is being parallelized. The strategy adopted in this section is of interest for stellar evolution models that rely on reasonably large nuclear reaction networks, and also for post-processing nucleosynthesis calculations, in which temperature and density versus time profiles (frequently extracted from stellar models) are directly coupled to huge nuclear networks.
3.1 Numerical treatment of nuclear abundances
Different methods have been reported to solve Eq. (5), such as Wagoner’s two-step linearization technique (Wagoner 1969), Bader–Deuflhard’s semi-implicit method (Bader and Deuflhard 1983), or Gear’s backward differentiation technique (Gear 1971). The performance of these different integration methods for stellar nucleosynthesis calculations has been analyzed in a number of studies (see Refs. Timmes 1999, Longland et al. 2014, and references therein). Here, we will explore the gain in performance driven by parallelization of one particular method: Wagoner’s. As described in Ref. Prantzos et al. (1987), Wagoner’s two-step linearization procedure exploits the special properties of matrix A, which consists of an upper left square matrix, an upper horizontal band, a left vertical band, and a diagonal band. The sparse nature of matrix A results from the fact that the different isotopes, when ordered in terms of increasing atomic number, are only linked with close neighbors through nuclear interactions that usually involve light particles^{5} (e.g., n, p, α).
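A minimal sketch of the kind of linearized, semi-implicit abundance update underlying such schemes is shown below (illustrative Python only: this is not the exact coefficient structure of Wagoner's method, the rate matrix is a random, hypothetical stand-in, and a production code would exploit the banded sparsity of A rather than performing a dense solve):

```python
import numpy as np

def implicit_abundance_step(J, x_old, dt):
    """One backward-Euler step for the linearized network ODE
    dX/dt = J X: solve (I/dt - J) X_new = X_old / dt.
    J plays the role of the matrix of nuclear interaction rates."""
    n = J.shape[0]
    A = np.eye(n) / dt - J
    return np.linalg.solve(A, x_old / dt)

# Hypothetical 324-species network with a small random rate matrix
rng = np.random.default_rng(0)
J = rng.random((324, 324)) * 1.0e-2
x0 = rng.random(324)
x0 /= x0.sum()                      # normalized mock abundances
x1 = implicit_abundance_step(J, x0, 1.0e-4)
```

For a small time-step the update barely perturbs the abundances, while remaining stable for stiff systems, which is the point of the implicit treatment.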
3.2 Parallelization strategy
- 1
Interpolation (calculation) of reaction rates from tables (analytic fits), for the specific temperature and density of each shell, at a given time.
- 2
Assembly of matrices \(\mathbf{X_{0}}\) and A.
- 3
Solution of Eq. (5), for the new abundances of all chemical species at each shell.
- 4
Convergence check; determination of the new time-step, Δt.
- 5
Determination of the overall nuclear energy released at each shell.
Reaction-rate determinations are partitioned amongst cores, such that at each iteration step each core performs the interpolation (calculation) of only those reaction rates that are strictly needed for the construction of the local partition of matrix A (Eq. (5)). Given a typical nuclear reaction, of the form \(i(j,k)l\), there are 8 possible combinations contributing to matrix A: \(\mathbf{A}(i,i)\), \(\mathbf{A}(i,j)\), \(\mathbf {A}(j,j)\), \(\mathbf{A}(j,i)\), \(\mathbf{A}(k,i)\), \(\mathbf{A}(k,j)\), \(\mathbf{A}(l,i)\), and \(\mathbf{A}(l,j)\), according to the linearization technique described in Ref. Wagoner (1969). The parallel solution of the system of equations is obtained using MUMPS^{6} (Amestoy et al. 2001, 2004), a widely used software package for the solution of large sparse systems of linear algebraic equations, of the form \(\mathbf{A}\mathbf{x}=\mathbf {b}\), on distributed-memory (parallel) computers.
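The accumulation of these eight contributions can be sketched as follows (illustrative Python; the sign convention, where i and j are destroyed while k and l are produced, is a simplification that omits the abundance and stoichiometric factors of the actual Wagoner linearization):

```python
import numpy as np

def add_reaction(A, i, j, k, l, rate):
    """Accumulate the eight matrix entries generated by a reaction
    i(j,k)l into the rate matrix A (0-based indices). Rows i and j
    lose flux (reactants), rows k and l gain it (products); the
    columns i and j reflect the linearization in the reactant
    abundances."""
    for row, col, sign in [(i, i, -1), (i, j, -1),
                           (j, j, -1), (j, i, -1),
                           (k, i, +1), (k, j, +1),
                           (l, i, +1), (l, j, +1)]:
        A[row, col] += sign * rate

# Toy 5-species network with a single hypothetical reaction 0(1,2)3
A = np.zeros((5, 5))
add_reaction(A, 0, 1, 2, 3, 2.5)
```

In the parallel scheme described in the text, each core would call such a routine only for the reactions feeding its own partition of A.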
The right-hand side of Eq. (5) is centralized in the root process. This requires that the complete solution from the previous iteration be gathered by the root process at some point during the simulation. In contrast, the solution of the system of equations is kept distributed, so that after solving the system each core holds a non-overlapping subset of elements of the solution (i.e., a subset of the new abundances). At this point, the solution must be exploited in its distributed form, which requires that subsequent processing stages (e.g., convergence and accuracy checks) be executed independently on each core.
- 1
During the parallel solution of the system of equations (MUMPS).
- 2
Once the system of equations is solved; the distributed solution is shared amongst all cores.
- 3
To check convergence and accuracy of the solution.
- 4
To sum up energy contributions from the distributed reactions; every core computes only the energy released by a subset of reactions.
3.3 Results
Given such a loss in performance in the solution of the system of equations, it is necessary to analyze whether the selection of MUMPS as a solver was appropriate. MUMPS represents one of the few professional and supported public-domain implementations of the multifrontal method. Amestoy et al. (2001) have shown that the MUMPS solver performance for large matrices is excellent. For matrices of order ≥100,000, very good speed-ups are accomplished (e.g., between 2.8 and 3.7 with 4 cores, and between 7.1 and 10.6 with 16 cores). Note that speed-ups increase with the matrix size, as the computation to communication ratio increases. For matrices of order between 10,000 and 100,000, moderate speed-ups are accomplished with MUMPS (e.g., 2.4–3.1 with 4 cores, and 7.2–8.4 with 16 cores; Amestoy et al. 2001). Finally, not much data is available for matrices of order ≤10,000. This is due to the fact that as the problem dimension shrinks, the distributed computation time is also reduced, whilst the communication time diminishes much less noticeably. Accordingly, the resulting speed-ups are dramatically reduced. For instance, Fox (2007), in solving a system with 5535 elements with the MUMPS solver, reports a speed-up of 1 (i.e., no speed-up at all) with 4 cores, and a speed-up of 1.8 with 16 cores. It seems clear that the poor performance reported in this work is mostly due to the size (order) of the nucleosynthesis matrix, which is too small to maximize the ratio between computation and communication times. Efficient parallelization of the matrix build-up and inversion processes in the nucleosynthesis subroutines is therefore not possible, unless ≥10,000 nuclear interactions are included.
4 Conclusions
This paper reports on several parallelization strategies that can be applied to stellar evolution codes, providing recommendations on when and how parallelization may help in improving the performance of a code for astrophysical applications. Parallelization frequently forces one to think about a program in new ways, and may require partial or even total rewriting of the serial code. It is therefore important to understand the potential benefits and risks beforehand, since parallelized codes may sometimes perform even worse than their sequential counterparts.
To this end, two different parallelization strategies have been reported in this work. With regard to the nucleosynthesis part, efforts have focused on the parallelization of the solution of the system of equations (that is, the build-up and inversion of the matrix containing the rates of the different nuclear interactions). In Wagoner’s two-step linearization technique, the integration method for stellar nucleosynthesis calculations discussed in this work, the iterative procedure places this application in the worst possible category for parallelization, in which all cores have to participate throughout the iteration, exchanging intermediate results on a regular basis. The huge amount of time spent on communications between cores, together with the small problem size (limited by the number of isotopes of the nuclear network), results in a much worse performance of the parallel application than that of the 1-core, sequential version of the code. This stems from the fact that the communication and message-passing times between processes largely outgrow the time spent on computation. It is therefore not advisable to parallelize the nucleosynthesis portion of a stellar code (or, by extension, a post-processing code) unless the number of isotopes adopted largely exceeds 10,000.
With regard to the parallelization of a complete stellar evolution code, efforts have focused on the spherically symmetric, Lagrangian, implicit hydrodynamic code SHIVA (José and Hernanz 1998, José 2016), in the framework of a 200-shell simulation of a typical type I X-ray burst. Two different nuclear reaction networks have been considered: a reduced one, consisting of 324 isotopes and 1392 reactions; and a more extended network, with 606 nuclides and 3551 nuclear interactions. The performance of the parallelized version of SHIVA turned out to be excellent: speed-up factors of 26 and 35 have been obtained, for the reduced (i.e., Model 1) and extended networks (Model 2), respectively, when 42 cores were used. These results did not match the maximum values expected for a perfect parallel application, but the computation to communication ratio was nevertheless large enough for the processing work to be distributed very efficiently amongst processes. To put these results into context, in our execution environment, a parallel simulation using 42 cores took ∼5.7 hr to compute 200,000 time-steps with the reduced nuclear network (cf. 6.1 days in its sequential version). The computation time increased to ∼20 hr when the extended network (with 606 nuclides and 3551 nuclear reactions) was used, for the same number of time-steps (cf. 28.6 days in its sequential version). Such excellent results fully justify the time invested in the parallelization of the code. Moreover, maximum speed-ups of ∼41 and ∼85 have been predicted by the performance model when using 200 cores, for the reduced and extended nuclear networks, respectively.
A key ingredient in achieving the large speed-up factors reported above is the decoupling of the nucleosynthesis subroutines from the set of hydrodynamic/structure equations adopted in SHIVA. This approach, while having a minor effect on the expected energetics and chemical composition of a star, is essential to justify a parallelization effort. In sharp contrast, efforts to parallelize FRANEC (see Refs. Limongi and Chieffi 2003, Chieffi and Limongi 2013, and references therein), another Henyey-type code in which the nucleosynthesis and structure equations are solved simultaneously by means of a time-implicit scheme,^{8} yielded very poor speed-up factors (A. Chieffi, private com.).
In summary, parallelization of a fully coupled, time-implicit code can only result in large speed-up factors if the most time-consuming parts of the code (e.g., the nucleosynthesis subroutines) are decoupled from the hydro equations, and therefore can be handled in a time-explicit way. Most multidimensional stellar evolution codes available to date (e.g., PROMETHEUS Fryxell et al. 1989; FLASH Fryxell et al. 2000; DJEHUTY Dearborn et al. 2005, 2006; GADGET2 Springel 2005) are (time) explicit. While, in general, explicit schemes are easier to implement than implicit schemes, the real pay-off is the huge speed-up factors achievable when parallelized, compared with their 1-core, sequential versions.
As with the nuclear energy production and nucleosynthesis, neutrino losses are implemented explicitly in the SHIVA code. However, as they do not require intensive computation, the subroutines handling neutrino losses have not been parallelized in this work.
Note that \(T_{\mathrm{S}} \equiv T_{\mathrm{in}}+T_{\mathrm{pp}}+T_{\mathrm{out}}\) and \(T_{\mathrm{P}} \equiv T_{\mathrm{in}}+T_{\mathrm{pp}}/N_{\mathrm{P}}+T_{\mathrm{comm}}+T_{\mathrm{out}}\).
All simulations reported in this paper have been executed in the 42-core Hyperion cluster of the Astronomy and Astrophysics Group at UPC.
A few exceptions involve reactions such as ^{12}C + ^{12}C, ^{16}O + ^{16}O, ^{20}Ne + ^{20}Ne, that take place during some stages of the evolution of stars. See Refs. Iliadis (2015), José (2016), for details.
The SHIVA code uses a number of convergence and accuracy criteria to guarantee, for instance, that the new solution satisfies the mass, momentum and energy conservation equations.
More recent versions of the FRANEC code, known as FUNS, contain several solver schemes in which the equations of nucleosynthesis, mixing and structure can be handled in a coupled or decoupled way (O. Straniero, private com.). The extensively used MESA code (Paxton et al. 2011, 2013) also solves the nucleosynthesis and composition equations directly coupled to the structure equations. Note, however, that MESA contains a number of explicit modules that can be computed in parallel using OpenMP (Paxton et al. 2011).
Declarations
Acknowledgements
This article benefited from discussions within the “ChETEC” COST Action (CA16117).
Availability of data and materials
A simplified version of the SHIVA code, freefall.f, is available at http://www.fen.upc.edu/users/jjose/CRC-Downloads.html. The code applies Henyey’s method to simulate the free-fall collapse of a homogeneous sphere. See Ref. José (2016), for details.
Funding
This work has been partially supported by the Spanish MINECO grant AYA2017–86274–P, by the E.U. FEDER funds, and by the AGAUR/Generalitat de Catalunya grant SGR-661/2017.
Authors’ contributions
All authors have equally contributed to this work. All authors read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
- Amdahl, G.M.: Validity of the single processor approach to achieving large scale computing capabilities. In: Proceedings of the April 18–20, 1967, Spring Joint Computer Conference, pp. 483–485. ACM Publ., New York (1967)
- Amestoy, P., Guermouche, A., L’Excellent, J.Y., Pralet, S.: Hybrid scheduling for the parallel solution of linear systems. CERFACS, Tech. Rep., Toulouse, France (2004)
- Amestoy, P.R., Duff, I.S., L’Excellent, J.Y., Koster, J.: A fully asynchronous multifrontal solver using distributed dynamic scheduling. SIAM J. Matrix Anal. Appl. 23, 15–41 (2001)
- Amestoy, P.R., Duff, I.S., L’Excellent, J.Y., Li, X.S.: Performance and tuning of two distributed memory sparse solvers. In: Meza, J., Koelbel, C. (eds.) Proceedings of the 10th SIAM Conference on Parallel Processing for Scientific Computing. Society for Industrial & Applied Mathematics, Portsmouth (2001)
- Bader, G., Deuflhard, P.: A semi-implicit mid-point rule for stiff systems of ordinary differential equations. Numer. Math. 41, 373–398 (1983)
- Bodenheimer, P., Laughlin, G.P., Rózyczka, M., Yorke, H.W.: Numerical Methods in Astrophysics: An Introduction. CRC/Taylor and Francis, Boca Raton (2006)
- Chieffi, A., Limongi, M.: Pre-supernova evolution of rotating solar metallicity stars in the mass range 13–120 M_{⊙} and their explosive yields. Astrophys. J. 764, 21–36 (2013)
- Dearborn, D.S.P., Lattanzio, J.C., Eggleton, P.P.: Three-dimensional numerical experimentation on the core helium flash of low-mass red giants. Astrophys. J. 639, 405–415 (2006)
- Dearborn, D.S.P., Wilson, J.R., Mathews, G.J.: Relativistically compressed exploding white dwarf model for Sagittarius A East. Astrophys. J. 630, 309–320 (2005)
- Foster, I.: Designing and Building Parallel Programs. Addison-Wesley, Boston (1995)
- Fox, J.: Fully-kinetic PIC simulations for hall-effect thrusters. PhD thesis, Massachusetts Institute of Technology (2007)
- Fryxell, B., Müller, E., Arnett, W.D.: Hydrodynamics and nuclear burning. Max-Planck Inst. for Astrophysics. Rep. 449, Garching, Germany (1989)
- Fryxell, B., Olson, K., Ricker, P., Timmes, F.X., Zingale, M., Lamb, D.Q., MacNeice, P., Rosner, R., Truran, J.W., Tufo, H.: FLASH: an adaptive mesh hydrodynamics code for modeling astrophysical thermonuclear flashes. Astrophys. J. Suppl. Ser. 131, 273–334 (2000)
- Gear, C.W.: The automatic integration of ordinary differential equations. Commun. ACM 14, 176–179 (1971)
- Graham, R.: MPI: a message-passing interface standard, V3.0. University of Tennessee, Tech. Rep., Knoxville (2012)
- Henyey, L.G., Forbes, J.E., Gould, N.L.: A new method of automatic computation of stellar evolution. Astrophys. J. 139, 306–317 (1964)
- Iliadis, C.: Nuclear Physics of Stars, 2nd edn. Wiley-VCH Verlag, Weinheim (2015)
- José, J.: Stellar Explosions: Hydrodynamics and Nucleosynthesis. CRC/Taylor and Francis, Boca Raton (2016)
- José, J., Hernanz, M.: Nucleosynthesis in classical novae: CO versus ONe white dwarfs. Astrophys. J. 494, 680–690 (1998)
- José, J., Moreno, F., Parikh, A., Iliadis, C.: Hydrodynamic models of type I X-ray bursts: metallicity effects. Astrophys. J. Suppl. Ser. 189, 204–239 (2010)
- Limongi, M., Chieffi, A.: Evolution, explosion, and nucleosynthesis of core-collapse supernovae. Astrophys. J. 592, 404–433 (2003)
- Longland, R., Martin, D., José, J.: Performance improvements for nuclear reaction network integration. Astron. Astrophys. 563, 67–113 (2014)
- McKenney, P.E.: Is Parallel Programming Hard, and, If so, What Can You do About It?. Paper Linux Technology Center, New York (2011)
- Pacheco, P.: Parallel Programming with MPI. Morgan Kaufmann Publ., San Francisco (1997)
- Parikh, A., José, J., Moreno, F., Iliadis, C.: The effects of variations in nuclear processes on type I X-ray burst nucleosynthesis. Astrophys. J. Suppl. Ser. 178, 110–136 (2008)
- Paxton, B., Bildsten, L., Dotter, A., Herwig, F., Lesaffre, P., Timmes, F.: Modules for experiments in stellar astrophysics (MESA). Astrophys. J. Suppl. Ser. 192, 3–35 (2011)
- Paxton, B., Cantiello, M., Arras, P., Bildsten, L., Brown, E.F., Dotter, A., Mankovich, C., Montgomery, M.H., Stello, D., Timmes, F.X., Townsend, R.: Modules for experiments in stellar astrophysics (MESA): planets, oscillations, rotation, and massive stars. Astrophys. J. Suppl. Ser. 208, 4–42 (2013)
- Prantzos, N., Arnould, M., Arcoragi, J.P.: Neutron capture nucleosynthesis during core helium burning in massive stars. Astrophys. J. 315, 209–228 (1987)
- Springel, V.: The cosmological simulation code GADGET-2. Mon. Not. R. Astron. Soc. 364, 1105–1134 (2005)
- Thakur, R., Gropp, W.: Improving the performance of MPI collective communication on switched networks. In: Dongarra, J.D.L., Orlando, S. (eds.) Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp. 257–276. Springer, Berlin (2003)
- Timmes, F.X.: Integration of nuclear reaction networks for stellar hydrodynamics. Astrophys. J. 124, 241–263 (1999)
- Wagoner, R.V.: Synthesis of the elements within objects exploding from very high temperatures. Astrophys. J. Suppl. Ser. 18, 247–295 (1969)