Add-On Workshops
February 27, 2025
Add-on workshops are offered as part of the 18th annual Energy HPC Conference. All workshops take place at Rice University's BioScience Research Collaborative (BRC) on Thursday, February 27, 2025. They run concurrently, so each registration can include only one workshop. Details for each workshop will be listed below as they become available.
​
1. Scientific Machine Learning
Auditorium
​
Organized by Rice University's Beatrice Riviere and Matthias Heinkenschloss
​​
Schedule
8:30 - 9:00 am: Check-in + Breakfast
9:00 - 10:00 am: Marta D’Elia (Pasteur Labs, previously at Meta and at Sandia National Laboratories)
10:00 - 11:00 am: Charbel Farhat (Stanford)
11:00 - 11:20 am: Adrian Celaya (Rice University)
11:20 - 11:40 am: Jonathan Cangelosi (Rice University)
11:40 am - 1:00 pm: Lunch
1:00 - 2:00 pm: Elizabeth Qian (Georgia Tech)
2:00 - 3:00 pm: Benjamin Peherstorfer (NYU)
​
2. Best Practices in HPC Systems Management
2nd Floor, Room 280
​
Organizers: Jonathan Anderson (CIQ), Kent Blancett (bp), Donny Cooper (TotalEnergies), John DeSantis (TACC), Keith Gray (TotalEnergies), Shawn Hall (Jump Trading), Adam Hough (Shell), Tommy Minyard (TACC), Timothy D Osborne (ORNL), and Wade Vinson (NVIDIA).
​
Speakers: Practitioners and Experts from Industry, Academia, and National Labs
​
Schedule
8:00 - 8:30 am: Check-in + Breakfast
8:30 am - 12:00 pm: Talks
12:00 - 1:00 pm: Lunch
1:00 - 3:45 pm: Talks
​​​​​​
​
3. Devito Codes
10th Floor, Room 1003
Limited seating; 50 registrants
​
Organizers: Gerard Gorman (Devito Codes), Paul Holzhauer (Devito Codes), Fabio Luporini (Devito Codes)
​
Schedule
8:00 - 8:30 am: Check-in + Breakfast
8:30 - 9:00 am: Workshop Start
​​​
Power outlets will be available, but please charge your devices in advance, as some outlets may need to be shared.
​​
​
4. Building an Optimized Elastic Finite-difference Propagator from Scratch for FWI on NVIDIA's Latest GPUs
Exhibit Hall
Limited seating; 30 registrants
​​
Power outlets will be available, but please charge your devices in advance, as some outlets may need to be shared.
​​
​
5. Performance Evaluation of GPU-accelerated HPC and AI applications using HPCToolkit, TAU, and ParaTools Pro for E4S(TM)
1st Floor, Room 106
Limited seating; 20 registrants
​
Speakers:
- John Mellor-Crummey, PhD, Professor of Computer Science and of Electrical and Computer Engineering, Rice University
- Sameer Shende, PhD, Research Professor and Director of the Performance Research Laboratory, University of Oregon
​
Course Skill Level: 25% basic content, 25% intermediate content, and 50% advanced content
​
Materials: Attendees must bring a laptop to access materials during the workshop. Power outlets will be available, but please charge your devices in advance, as some outlets may need to be shared.
​
Abstract:
This hands-on workshop presents two performance evaluation tools, HPCToolkit and TAU, for evaluating and optimizing the performance of GPU-accelerated HPC and AI applications.
HPCToolkit (https://hpctoolkit.org) is an integrated suite of tools for profiling and tracing parallel programs on computers ranging from multicore desktop systems to GPU-accelerated supercomputers and cloud platforms. HPCToolkit can measure and analyze executions of fully optimized, dynamically linked parallel applications on tens of thousands of CPU cores and GPUs. It supports multilingual codes with external binary-only libraries. On CPUs, it collects sampling-based measurements with controllable overhead. On GPUs, it uses vendor APIs to collect fine-grained measurements through PC sampling or instrumentation, and it monitors asynchronous GPU operations through activity APIs. HPCToolkit can attribute performance measurements to rich, dynamic calling contexts containing procedures, inlined functions, loop nests, and source lines on both CPUs and GPUs.
The TAU Performance System (http://tau.uoregon.edu) is a versatile performance evaluation toolkit that supports both profiling and tracing modes of measurement. It supports performance evaluation of applications running on CPUs and GPUs, and it can preload a Dynamic Shared Object (DSO) at runtime so that users can measure performance without modifying the source code or binary. The workshop will describe how TAU may be used with MVAPICH and how it supports advanced performance introspection capabilities at the runtime layer. TAU's support for tracking the idle time spent in implicit barriers within collective operations will be demonstrated. TAU also supports event-based sampling at the function, file, and statement level. TAU's support for runtime systems such as CUDA (for NVIDIA GPUs), Level Zero (for Intel oneAPI DPC++/SYCL), ROCm (for AMD GPUs), OpenMP with support for OMPT and Target Offload directives, Kokkos, and MPI allows instrumentation at the runtime system layer while using sampling to evaluate statement-level performance data.
HPCToolkit and TAU will be demonstrated on AWS using the ParaTools Pro for E4S(TM) image. The Extreme-scale Scientific Software Stack (E4S, https://e4s.io) is a curated, Spack-based software distribution of 100+ HPC and AI/ML packages. The Spack package manager is a core component of E4S. E4S is a platform for product integration and deployment of performance evaluation tools such as HPCToolkit, TAU, DyninstAPI, and PAPI, and it supports both bare-metal and containerized deployment on CPU and GPU platforms. E4S provides a Spack binary cache and a set of base and full-featured container images with vendor runtimes to support GPU architectures from NVIDIA, Intel, and AMD. It is a community effort to provide open-source software packages for developing, deploying, and running scientific applications and tools on HPC platforms.