
About
Add-On Workshop: NVIDIA
February 27, 2025
Add-on workshops will be available as part of the 18th annual Energy HPC Conference. Each workshop will take place at Rice University's BRC on Thursday, February 27, 2025. The workshops run simultaneously, so only one can be chosen per registration.

Building an Optimized Elastic Finite-difference Propagator from Scratch for FWI on NVIDIA's Latest GPUs
Exhibit Hall | 8:30 am - 5:00 pm
Limited Seating (30 registrants) - Sold Out

Speakers:
- Guillaume Barnier (NVIDIA)
- Guillaume Thomas-Collignon (NVIDIA)
- Igor Terentyev (NVIDIA)

Schedule:
- 8:30 am - 9:00 am: Check-in + Breakfast
- 9:00 am - 10:00 am: Introduction, Theory Review (PDE + Numerical Scheme)
- 10:00 am - 11:30 am: Initial Implementation + Profiler Report Introduction and Analysis
- 11:30 am - 12:30 pm: Lunch
- 12:30 pm - 1:30 pm: Optimization #1 Using Shared Memory
- 1:30 pm - 2:30 pm: Optimization #2 Using Asynchronous Shared Memory Loads
- 2:30 pm - 3:30 pm: Optimization #3 Using TMA
- 3:30 pm - 4:00 pm: Break
- 4:00 pm - 5:00 pm: Theory Review on Adjoint System of Equations for FWI, Numerical Implementation, and Differences with Forward
Materials: Attendees are strongly encouraged to bring their own laptop, but the speakers will make the workshop understandable and accessible for those who do not have a computer. Power will be available, but please charge in advance as some outlets may need to be shared.
Abstract: Elastic full waveform inversion (FWI) is becoming the industry standard for subsurface model parameter estimation. However, this technique requires simulating hundreds of thousands of wave propagations by numerically solving a system of partial differential equations (PDEs). Consequently, implementing an efficient numerical scheme on GPUs is critical.
In this workshop, we teach attendees how to gradually build finite-difference (FD) propagators for elastic media, both isotropic (ISO) and vertical transverse isotropic (VTI), optimized for NVIDIA's latest GPUs (Ampere, Hopper, and Blackwell).
We provide a brief theoretical review and describe the numerical scheme we implement, which is based on a staggered-grid approach in both time and space. We then gradually implement multiple versions of the forward propagator, starting from a baseline implementation that requires minimal GPU hardware knowledge and ending with our fastest version, which uses asynchronous loads to shared memory. At each step, we use our profiling tool, Nsight Compute (NCU), to identify bottlenecks in our kernels, and we show how to leverage NVIDIA's new hardware features to mitigate these bottlenecks. Finally, we show how to derive and efficiently implement the adjoint propagator required for the elastic FWI gradient computation.
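For readers unfamiliar with staggered grids, the idea can be sketched in a few lines. The example below is only an illustration and is not the workshop's code: it assumes a 1D medium with constant density and shear modulus, second-order differences, and rigid boundaries, whereas the workshop targets 3D elastic media on GPUs. The names `step` and `run_demo` and all parameter values are hypothetical. Velocity is stored on integer grid nodes and stress on half nodes, and the two fields are updated alternately in time (leapfrog).

```python
import math


def step(v, s, rho, mu, dt, dx):
    """Advance velocity v and stress s by one leapfrog time step, in place."""
    n = len(v)
    # Velocity lives on integer nodes; the two end nodes stay at zero (rigid walls).
    for i in range(1, n - 1):
        v[i] += dt / rho * (s[i] - s[i - 1]) / dx
    # Stress lives on half nodes: s[i] sits midway between v[i] and v[i + 1].
    for i in range(n - 1):
        s[i] += dt * mu * (v[i + 1] - v[i]) / dx


def run_demo(n=201, steps=100):
    """Propagate a Gaussian stress pulse; return (position, amplitude) of the right-going peak."""
    dx, rho, mu = 1.0, 1.0, 1.0          # assumed illustrative values
    c = math.sqrt(mu / rho)              # wave speed
    dt = 0.5 * dx / c                    # Courant number 0.5, within the stability limit
    center = 0.5 * (n - 1) * dx
    v = [0.0] * n
    s = [math.exp(-(((i + 0.5) * dx - center) / 5.0) ** 2) for i in range(n - 1)]
    for _ in range(steps):
        step(v, s, rho, mu, dt, dx)
    # The initial pulse splits in two; look for the right-going half only.
    i_peak = max(range(n // 2, n - 1), key=lambda i: s[i])
    return (i_peak + 0.5) * dx, s[i_peak]


if __name__ == "__main__":
    print(run_demo())
```

With zero initial velocity, the pulse splits into two halves travelling in opposite directions at speed `c`, which gives a quick sanity check on any implementation of the scheme before moving to higher-order stencils or GPU kernels.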