SYCL is a promising programming model for heterogeneous computing, allowing a single-source code to target devices from multiple vendors. One significant task performed on such devices is a primitive operation for integer sum reduction. This paper presents several SYCL implementations of integer sum reduction (using atomic functions, shared local memory, vectorized memory accesses, and parameterized workload sizes) to compare the performance and maturity of SYCL against open-source vendor-specific implementations of the same reduction. For a sufficiently large number of integers, tuning the parameters of our SYCL implementations achieves a 1.4X speedup over the open-source implementations on an Intel UHD630 integrated GPU.

We present an implementation of embedded-atom method (EAM) and Finnis-Sinclair (FS) interatomic potentials, which are widely used in simulating metallic systems, in HOOMD-blue, a software package designed to perform classical molecular dynamics simulations using GPU acceleration. We first discuss the details of our implementation and then report extensive benchmark tests. We demonstrate that single-precision floating point operations efficiently implemented on GPUs can produce sufficient accuracy when compared against double-precision codes, as shown in test simulations of the glass-transition temperature of Cu64.5Zr35.5 and the pair correlation function of liquid Ni3Al. Our code scales well with the size of the simulated system on NVIDIA Tesla M40 and P100 GPUs. Compared with another popular package, LAMMPS, running on 32 cores of AMD Opteron 6220 processors, the GPU/CPU performance ratio can reach as high as 4.6. The source code can be accessed for free through the HOOMD-blue web page by any interested user.

Time-dependent electronic structure methods are growing in popularity as tools for modeling ultrafast and/or nonlinear processes, for computing spectra, and as the electronic structure component of mean-field molecular dynamics simulations. Time-dependent configuration interaction (TD-CI) offers several advantages over the widely used real-time time-dependent density functional theory: it correctly models Rabi oscillations; it offers a spin-pure description of open-shell systems; and a hierarchy of TD-CI methods can be defined that systematically approach the exact solution of the time-dependent Schrödinger equation (TDSE). In this work, we present a novel TD-CI approach that extends TD-CI to large complete active-space configuration expansions. Such extension is enabled by use of a direct configuration interaction approach that eliminates the need to explicitly build, store, or diagonalize the Hamiltonian matrix. Graphics processing unit (GPU) acceleration enables fast solution of the TDSE even for large active spaces, up to 12 electrons in 12 orbitals (853776 determinants) in this work. A symplectic split operator propagator yields long-time norm conservation. We demonstrate the applicability of our approach by computing the response of a large molecule with a strongly correlated ground state, decacene (C42H24), to various pulses (δ-function, transform limited, chirped). Our simulations predict that chirped pulses can be used to induce dipole-forbidden transitions. Simulations of decacene using the 6-31G(d) basis set and a 12 electrons/12 orbitals active space took 20.1 h to propagate for 100 fs with a 1 attosecond time step on a single NVIDIA K40 GPU. Convergence with respect to time step is found to depend on the property being computed and the chosen active space.

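
To make the norm-conservation claim concrete, here is a minimal sketch of a symplectic splitting for the TDSE with a real symmetric Hamiltonian. It is an illustration under stated assumptions, not the propagator from the paper: the 2x2 matrix `H`, the time step, and the helper names are all hypothetical, and the scheme shown is a plain second-order kick-drift-kick splitting. Writing the CI vector as C = q + i p turns the TDSE into dq/dt = H p and dp/dt = -H q, so each step needs only matrix-vector products, in the spirit of a direct approach that never diagonalizes the Hamiltonian.

```python
import math

def matvec(h, v):
    """Matrix-vector product; stands in for the sigma-vector build of direct CI."""
    return [sum(hij * vj for hij, vj in zip(row, v)) for row in h]

def norm(q, p):
    """Norm of the complex vector C = q + i*p."""
    return math.sqrt(sum(x * x for x in q) + sum(x * x for x in p))

def propagate(h, q, p, dt, n_steps):
    """Kick-drift-kick splitting of dq/dt = H p, dp/dt = -H q.

    Returns the final (q, p) and the largest norm deviation seen along the way.
    """
    start = norm(q, p)
    max_dev = 0.0
    for _ in range(n_steps):
        hq = matvec(h, q)
        p = [pi - 0.5 * dt * x for pi, x in zip(p, hq)]   # half kick
        hp = matvec(h, p)
        q = [qi + dt * x for qi, x in zip(q, hp)]         # full drift
        hq = matvec(h, q)
        p = [pi - 0.5 * dt * x for pi, x in zip(p, hq)]   # half kick
        max_dev = max(max_dev, abs(norm(q, p) - start))
    return q, p, max_dev

# Hypothetical two-state Hamiltonian (arbitrary units), chosen only for illustration.
H = [[0.0, 0.1],
     [0.1, 0.5]]
q0, p0 = [1.0, 0.0], [0.0, 0.0]   # normalized initial state

q, p, norm_drift = propagate(H, q0, p0, dt=0.01, n_steps=20000)
print(f"largest norm deviation over the run: {norm_drift:.2e}")
```

The norm is not conserved exactly at every step; for a stable time step it oscillates within a small bound instead of drifting, which is the sense in which a symplectic scheme gives long-time norm conservation.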
In this study, we have extended our graphics processing unit (GPU)-accelerated direct configuration interaction program to multiple devices, reducing iteration times for configuration spaces of 165 million determinants to only 3 s using NVIDIA P100 GPUs. Similar improvements in the one- and two-particle reduced density matrix formation allow for fast analytical energy gradients and electronic properties. Our parallel algorithm enables the calculation of arbitrarily large configuration spaces (limited only by available system memory), with iteration times of 13 min for an active space of 18 electrons in 18 orbitals (2.4 billion determinants) using six consumer-grade NVIDIA 1080Ti GPUs. These advances enable routine molecular dynamics simulations, geometry optimizations, and absorption spectrum calculations for molecules with large configuration spaces, a task that has heretofore required massive computational effort. In this work, we demonstrate the utility of our program by generating the absorption spectrum for diphenyl acetylene at the floating occupation molecular orbital complete active space configuration interaction level of theory. Lastly, several active spaces were investigated to assess the dependence of spectral features on orbital space.
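
The determinant counts quoted in these abstracts can be checked with a short calculation. The sketch below assumes an even number of electrons split equally between alpha and beta spin, so the count is the product of two binomial coefficients; the function name is hypothetical. The 165 million figure is consistent with 16 electrons in 16 orbitals, although the text does not name that active space.

```python
from math import comb

def cas_determinants(n_electrons, n_orbitals):
    """Number of Slater determinants in a complete active space.

    Assumes equal (or near-equal) alpha and beta electron counts:
    each spin string is a choice of occupied orbitals.
    """
    n_alpha = n_electrons // 2
    n_beta = n_electrons - n_alpha
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)

counts = {n: cas_determinants(n, n) for n in (12, 16, 18)}
for n, count in counts.items():
    print(f"{n} electrons in {n} orbitals: {count} determinants")
```

This reproduces 853776 determinants for 12 electrons in 12 orbitals and roughly 2.4 billion for 18 electrons in 18 orbitals, which shows how quickly the configuration space grows with active-space size.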