Accelerated X-Ray Analysis for Nanoscale Imaging (XANI) of Novel Materials
A massive-scale X-ray free-electron laser (XFEL) enables tracking of structural and electron dynamics in novel systems, including fusion materials, semiconductors, batteries, and catalysts. It produces ultrashort X-ray pulses that can record the movements of atoms and electrons. These instruments can detect the smallest changes in material structure caused by defects and other influences. The bright X-ray bursts can reach repetition rates of up to 1 million shots per second, recorded with 35-million-pixel cameras. The acquired multidimensional datasets contain rich physical information about the fastest microscopic movements of electrons and atoms, which can help identify defects in materials. Processing and analyzing these datasets to extract the physics has conventionally required more than nine months of computational time.

XFEL research facilities include SwissFEL in Switzerland, the SPring-8 Angstrom Compact Free Electron Laser (SACLA) in Japan, the Linac Coherent Light Source (LCLS-II) at SLAC, the European XFEL in Germany, and the Pohang Accelerator Laboratory (PAL) in Korea.

This post highlights new technical breakthroughs in the Accelerated X-ray Analysis for Nanoscale Imaging (XANI) workflow. The NVIDIA team demonstrated the workflow on the characterization of quantum materials, reconstructing the phonon dispersion from ultrafast femtosecond laser pump/hard X-ray probe experiments. Specifically, the team accelerated the XANI workflow so that the computational time to process and analyze 42 terabytes (TB) of data shrinks from nine months to less than four hours on 32 NVIDIA GB200 Grace Blackwell Superchips, while preserving the precision of the acquired data. The XANI project has been adopted by different communities, from quantum physics to materials chemistry, demonstrating the ability of CUDA Python and distributed computing to accelerate scientific discoveries.

What are the challenges of single-node Python for exascale science?
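As a rough illustration of the scale involved, the figures quoted above can be turned into data rates. The sketch below is a back-of-the-envelope estimate, not a measurement. The 2 bytes per pixel and the decimal definition of a terabyte are assumptions that the post does not state, and the function names are hypothetical.

```python
# Figures quoted in the post.
SHOTS_PER_SECOND = 1_000_000      # "up to 1 million shots per second"
PIXELS_PER_SHOT = 35_000_000      # "35-million-pixel cameras"
DATASET_BYTES = 42 * 10**12       # 42 TB (ASSUMPTION: decimal terabytes)
PROCESSING_HOURS = 4              # "less than four hours", used as an upper bound

# ASSUMPTION: 16-bit pixels. The post does not give the pixel depth.
BYTES_PER_PIXEL = 2


def peak_pixel_rate(shots_per_second: int, pixels_per_shot: int) -> int:
    """Pixels per second if both quoted maxima were reached at once."""
    return shots_per_second * pixels_per_shot


def peak_byte_rate(shots_per_second: int, pixels_per_shot: int,
                   bytes_per_pixel: int) -> int:
    """Raw bytes per second at the peak pixel rate."""
    return peak_pixel_rate(shots_per_second, pixels_per_shot) * bytes_per_pixel


def seconds_to_fill(dataset_bytes: int, byte_rate: int) -> float:
    """Acquisition time needed to accumulate a dataset at a given byte rate."""
    return dataset_bytes / byte_rate


def sustained_throughput(dataset_bytes: int, hours: float) -> float:
    """Average bytes per second needed to process a dataset in `hours`."""
    return dataset_bytes / (hours * 3600)


if __name__ == "__main__":
    rate = peak_byte_rate(SHOTS_PER_SECOND, PIXELS_PER_SHOT, BYTES_PER_PIXEL)
    print(f"peak raw rate: {rate / 1e12:.0f} TB/s")
    print(f"time to acquire 42 TB at peak: {seconds_to_fill(DATASET_BYTES, rate):.1f} s")
    needed = sustained_throughput(DATASET_BYTES, PROCESSING_HOURS)
    print(f"throughput to finish in 4 h: {needed / 1e9:.2f} GB/s")
```

Under these assumptions, a detector running at both quoted maxima would produce about 70 TB/s of raw data, so 42 TB corresponds to well under a second of peak-rate acquisition, while finishing the analysis within four hours requires sustaining roughly 2.9 GB/s end to end.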
Massive-scale XFEL facilities can operate at up to megahertz (MHz) rates and generate hundreds of TB to petabytes (PB) of data. This massive volume of data must be processed and analyzed in real time to steer scientific experiments and accelerate discovery. Traditional CPU-bound pipelines require significant manual parameter tuning and subsampling, often processing only 10% of a dataset during an experiment. For high-resolution imaging of new phases in quantum materials, the computational cost of nonlinear fitting and 3D reconstruction previously relegated analysis to the post-experiment phase. A single experiment could require nine months of computational time.

How does XANI accelerate numerical computation and I/O performance?

Starting from the original vectorized NumPy and SciPy implementation, the NVIDIA team accelerated the XANI workflow 43x on a single GPU of a GB200 Grace Blackwell Superchip and 1,000x on 64 GPUs. As a result, the computational time to process and analyze 42 TB of data shrinks to less than four hours, while preserving the precision of the acquired data. To achieve this improvement, new cuPyNumeric libraries were developed, including LMFIT and multithreaded Hierarchical Data Format 5 (HDF5). These libraries further improve GPU utilization for numerical computation and deliver a 165x acceleration in I/O throughput with GPUDirect Storage (GDS) and multithreaded HDF5.

What are the benefits of XANI architecture?

XANI facilitates migration from a CPU-orchestrated workflow to a GPU-centric distributed model using cuPyNumeric. This approach enables live-feedback and automated experimental…
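To make the migration from vectorized NumPy to cuPyNumeric more concrete, here is a minimal sketch of the drop-in import pattern. It is not the XANI code: the function names and the synthetic data are hypothetical, and the nonlinear LMFIT-style fit described in the post is swapped for a much simpler closed-form log-linear fit of an exponential decay, chosen only because it can be written entirely with array operations. The script falls back to NumPy when cuPyNumeric is not installed.

```python
# Use cuPyNumeric when it is installed; otherwise fall back to NumPy so the
# same script still runs on a CPU-only machine.
try:
    import cupynumeric as np
except ImportError:
    import numpy as np


def fit_decay_rates(delays, signal):
    """Per-pixel log-linear least-squares fit of signal = A * exp(-k * t).

    delays: shape (T,), pump-probe delays.
    signal: shape (T, H, W), strictly positive intensities.
    Returns (amplitude, rate), each of shape (H, W).
    """
    t = delays.reshape(-1, 1, 1)          # broadcast delays over all pixels
    y = np.log(signal)                    # linearize: log(signal) = log(A) - k * t
    t_mean = t.mean()
    y_mean = y.mean(axis=0)
    # Ordinary least-squares slope and intercept, evaluated for every pixel at once.
    slope = ((t - t_mean) * (y - y_mean)).sum(axis=0) / ((t - t_mean) ** 2).sum()
    intercept = y_mean - slope * t_mean
    return np.exp(intercept), -slope


def make_synthetic_stack(delays, height, width):
    """Hypothetical noise-free test data: rate varies by row, amplitude by column."""
    rates = np.linspace(0.5, 2.0, height).reshape(1, height, 1)
    amps = np.linspace(1.0, 3.0, width).reshape(1, 1, width)
    return amps * np.exp(-rates * delays.reshape(-1, 1, 1))


if __name__ == "__main__":
    delays = np.linspace(0.0, 2.0, 21)
    stack = make_synthetic_stack(delays, height=4, width=5)
    amplitude, rate = fit_decay_rates(delays, stack)
    print("recovered rates per row:", rate[:, 0])
```

The point of the pattern is that the fitting function contains no explicit per-pixel loop, so the choice of array library is confined to the import at the top of the file.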

