The Exascale Computing Project and the Future of HPC

Title: The Exascale Computing Project and the Future of HPC
Date: Tuesday, April 30, 2019 10:00 AM ET / 7:00 AM PT
Duration: 1 hour

SPEAKER: Doug Kothe, Director, Exascale Computing Project at Oak Ridge National Laboratory

TechTalk Registration
High-Performance Computing on Complex Environments (Skillsoft book, free book for ACM Members)
Parallel Programming for Modern High Performance Computing Systems (Skillsoft book, free book for ACM Members)
High-Performance Computing on the Intel Xeon Phi: How to Fully Exploit MIC Architectures (Skillsoft book, free book for ACM Members)
Research and Applications in Global Supercomputing (Skillsoft book, free book for ACM Members)
High Performance Computing: Modern Systems and Practices (ScienceDirect book, free book for ACM Members)
High Performance Computing and the Discrete Element Model: Opportunity and Challenge (ScienceDirect book, free book for ACM Members)

It is requested that the exascale project management make strong-scaling benchmarks publicly available, sampled at roughly 1/64, 1/32, 1/16, 1/8, 1/4, 1/2, and the whole machine.

By way of example, strong-scaling results on Summit are published only up to 25% of the machine (light blue represents missing data):

See: “Scalable Molecular Dynamics with NAMD on the Summit System”.
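The motivation for sampling at these fractions can be illustrated with a toy strong-scaling model (all constants here are hypothetical, chosen only to show the shape of the curve): a fixed serial time, perfectly parallel work, and a communication term that grows with the number of ranks.

```cpp
// Toy strong-scaling model (illustrative only, not measured ECP or Summit
// data): T(p) = serial + parallel/p + comm*p, so the speedup T(1)/T(p)
// eventually "turns around" once the communication term dominates.
double model_time(double serial, double parallel, double comm_per_rank, int p) {
    return serial + parallel / p + comm_per_rank * p;
}

double speedup(double serial, double parallel, double comm_per_rank, int p) {
    return model_time(serial, parallel, comm_per_rank, 1) /
           model_time(serial, parallel, comm_per_rank, p);
}
```

With, say, serial = 1 s, parallel = 4096 s, and 0.01 s of communication per rank, the modeled speedup peaks near p ≈ 640 and then declines, which is precisely the high-fraction region where published data is being requested.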

Really enjoyed the event. I come from a Power energy (Bulk Electric System) industry background and was interested to learn if there were projects/initiatives underway solving interesting challenges in that space using HPC.

I missed the HPC webcast and am trying to register now. A “Password” field is required, apparently for the webcast, but no password was communicated in the invite for the event. Can somebody please advise how to obtain the password?
Thank you,
-Alexander Mitchem

Hello Alexander,

Due to the nature of the material presented, the video archive is undergoing review. We hope to make it available in the next two weeks and will notify all registrants once it’s available. Thank you for your patience.

Thanks for replying! Will be looking forward to getting the updated info on this. -AM

Q&A from TechTalk

  1. Focusing on the SW “tool chain”: programming languages, math libraries, user interface SW, compilers/assemblers, open source versus closed source, debugger tools, etc.

The breadth and depth of the Software Technology (ST) effort in ECP can be best scrutinized by consulting the ST Capability Assessment Report (CAR) at:

  1. How does a person with a standard undergraduate computer science education, several years of experience in software engineering, but no HPC experience, switch gears and get started in an HPC career?

Take advantage of the training courses offered by ECP or consult good online resources (e.g., Coursera). Browse the many job openings in HPC at DOE labs, universities, and industry. Attend key conferences like Supercomputing (the next one is SC19), or just contact any of the ECP leadership for help.

  1. Slide 9 says: “At least two diverse system architectures”. What kind of architectures are you referring to? Are we talking about communication vs. computation architectures? CPU vs. co-processor architectures?

Here we are referring to the architectures of the computer systems themselves that DOE is deploying. Architectural diversity has many dimensions: hardware architecture, system prime contractor and subcontractors, component suppliers, software stack, programming models/algorithms, and so on.

  1. Focusing on computer graphics: graphics tools, animation tools, user interface to graphics manipulations, animation speed control & zoom & transparency & color gradient, data storage for replay, etc.

The breadth and depth of the Software Technology (ST) effort in ECP can be best scrutinized by consulting the ST Capability Assessment Report (CAR) at:

  1. What are the most common programming languages currently used for HPC applications?

While many programming languages are used in HPC (C++, C, Fortran, Python, UPC, UPC++, CUDA, Julia, etc.), C++ is currently the most predominant. The ECP ST effort and its integration into software stacks at DOE HPC facilities must support many languages and therefore be more or less “language agnostic”.

  1. ECP is targeting software for DOE’s exascale’s computing, but will other organizations be able to get those tools and make use of them?

Yes, most of the software and application efforts in ECP are building on existing open source repositories or have created new ones. Details can be found on the ECP website or, more specifically, in the latest ECP Software Technology Capability Assessment Report (CAR) at

  1. How did you decide on the application portfolio that you have chosen to work on?

A number of technical and programmatic criteria were used, but this was still a very difficult choice because there are literally hundreds of viable candidates. The applications chosen, viewed as “first movers” to exascale, are envisioned to “show the way” for many other applications. ECP applications are also intended to be very general capabilities, able to tackle many S&T problems in their respective domains beyond the specific “exascale challenge problem” being focused on in ECP. The overarching criteria for selection were whether the application was targeting a strategic DOE mission problem of national interest and whether appropriate DOE stakeholders (program managers) supported the effort.

  1. How will the general HPC community benefit from this project if all the focus is on exascale apps and an exascale software stack - when in reality - many of us will never get to run our codes on an exascale system?

Yes, the general audience should benefit from ECP’s ultimate products and solutions. Most of the hard-core software development occurs on laptops and desktops whose hardware reasonably emulates a single node of the envisioned exascale computers. The software is designed to recognize the hardware it is being compiled for (say, a laptop) and make automatic adjustments accordingly.

  1. Is an HPC software career stable and secure or does it require moving from site to site looking for new projects?

I worked for 22 years at one HPC center…the sites tend to represent very long-term investments because of the dollars involved. The center I work at now (TACC) has been around for over 25 years (through a couple of name changes).

  1. There are two fundamental avenues to facilitate HPC: RAM-resident computing, and massively-parallel partition of both code and data. How is ECP addressing (1) massively parallel shared memory and (2) access to large data partitions on the part of multiple code partitions?

Most ECP and DOE HPC software in general is designed to partition the computational domain across distributed memory at run time, as the simulations typically need the aggregate memory available in a distributed system to execute the simulations of interest. For shared-memory hardware (which may have many processors), the software is also designed to partition tasks into a number of independent execution threads that each access a portion of the shared memory at any given time during execution. This kind of architectural software design is not unique to ECP; HPC researchers have been addressing it since the late 1980s, but it remains a concern as programming models and the underlying hardware architectures evolve.
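As a concrete sketch of the thread-level half of this (a minimal illustration, not ECP code), the same contiguous block-decomposition arithmetic used to split a domain across distributed-memory ranks can also split shared-memory work across threads:

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Split the index range [0, n) into nparts nearly equal contiguous blocks
// and return the half-open range [lo, hi) owned by block `part`.
std::pair<std::size_t, std::size_t> block_range(std::size_t n,
                                                std::size_t nparts,
                                                std::size_t part) {
    std::size_t base = n / nparts, rem = n % nparts;
    std::size_t lo = part * base + std::min(part, rem);
    std::size_t hi = lo + base + (part < rem ? 1 : 0);
    return {lo, hi};
}

// Each thread sums only its own block of the shared array; the per-thread
// partial sums are reduced after all threads join.
double threaded_sum(const std::vector<double>& data, std::size_t nthreads) {
    std::vector<double> partial(nthreads, 0.0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < nthreads; ++t)
        pool.emplace_back([&, t] {
            auto [lo, hi] = block_range(data.size(), nthreads, t);
            for (std::size_t i = lo; i < hi; ++i) partial[t] += data[i];
        });
    for (auto& th : pool) th.join();
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

In a distributed-memory code the same `block_range` arithmetic would decide which slice of the global domain each rank owns, with message passing replacing the shared array.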

  1. I am teaching an intro to HPC class in the fall. Do you think the ECP application would be good problems for senior students of a CS BS to look at or would that be too difficult? Any hint on where to start looking at using ECP application in classes?

The application proxies that Doug talked about are helpful for HPC developers because they do real work, but don’t have any export control or sensitivity issues. Depending on the level of your class you may find that those apps are a bit complex for an intro class. Feel free to contact the ECP leadership for possibilities, however, because many of the ECP applications have simpler “proxies” that might be more amenable to students in an introductory HPC class.
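By way of illustration only (this is a toy sketch, not one of the actual ECP proxy apps), a classroom-sized mini-app often boils down to a simple stencil kernel such as explicit 1D heat diffusion:

```cpp
#include <vector>

// Toy 1D explicit heat-diffusion step (illustrative mini-app kernel):
// v[i] = u[i] + alpha * (u[i-1] - 2*u[i] + u[i+1]), endpoints held fixed.
// Numerically stable for 0 < alpha <= 0.5.
std::vector<double> heat_step(const std::vector<double>& u, double alpha) {
    std::vector<double> v = u;
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        v[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
    return v;
}
```

A kernel like this is small enough for an intro class yet exhibits the same parallelization questions (domain decomposition, halo exchange) as full applications.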

  1. How practical are directed graph data organizations in MPP grids versus more array-, row-, or column-based structured physical data models?

In many cases, much more practical and appropriate, particularly in working with task-based runtime programming models where directed acyclic graphs (DAGs) are used to map tasks & associated data onto the hardware. I suggest you contact the ECP Software Technology (ST) leadership for more info.
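For readers unfamiliar with the DAG mapping mentioned here, a minimal sketch (Kahn's algorithm, a generic textbook technique rather than any particular ECP runtime) of ordering tasks so that each runs only after its dependencies:

```cpp
#include <queue>
#include <utility>
#include <vector>

// Kahn's algorithm: produce an execution order for tasks 0..n-1 given
// dependency edges (a, b) meaning "task a must finish before task b".
// Returns an empty vector if the graph contains a cycle.
std::vector<int> task_order(int n,
                            const std::vector<std::pair<int, int>>& deps) {
    std::vector<std::vector<int>> out(n);
    std::vector<int> indegree(n, 0);
    for (auto [a, b] : deps) { out[a].push_back(b); ++indegree[b]; }

    std::queue<int> ready;  // tasks whose dependencies are all satisfied
    for (int v = 0; v < n; ++v)
        if (indegree[v] == 0) ready.push(v);

    std::vector<int> order;
    while (!ready.empty()) {
        int v = ready.front(); ready.pop();
        order.push_back(v);
        for (int w : out[v])
            if (--indegree[w] == 0) ready.push(w);
    }
    if ((int)order.size() != n) order.clear();  // cycle detected
    return order;
}
```

A task-based runtime does essentially this, but dynamically: the "ready" queue feeds worker threads, and data placement follows the tasks.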

  1. Processing at such a large scale leads to high energy consumption. What would be the workaround for reducing energy consumption?

The first exascale systems will likely consume on the order of 20 MW each. One item to note: in today’s (and future exascale) HPC systems, the power cost of moving data (say, a DP DRAM read to a register) is two to three orders of magnitude greater than the power cost of compute (say, a DP fused multiply-add). So ECP software and applications must be developed to be as “communication avoiding” as possible (to save on power and therefore be more efficient).
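To make the orders-of-magnitude point concrete, a back-of-the-envelope sketch (the picojoule figures below are assumed, illustrative values, not ECP measurements):

```cpp
// Assumed, illustrative per-operation energy costs (order of magnitude only):
constexpr double pj_per_fma = 20.0;          // one DP fused multiply-add
constexpr double pj_per_dram_read = 5000.0;  // one DP (8-byte) DRAM read

// Energy in joules for a kernel performing `fmas` FMAs and `dram_reads`
// DRAM reads. 1 pJ = 1e-12 J.
double kernel_energy_joules(double fmas, double dram_reads) {
    return (fmas * pj_per_fma + dram_reads * pj_per_dram_read) * 1e-12;
}
```

Under these assumptions a streaming kernel that fetches one operand from DRAM per FMA spends roughly 250x more energy on data movement than on arithmetic, which is why "communication-avoiding" design pays off directly in power.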

  1. The software technologies depend heavily on whether the platform provides FLOPS primarily via GPUs.

In general ECP software and applications must be able to recognize (either at compile or run time) if the hardware has or doesn’t have GPUs and adjust accordingly for the most effective hardware utilization. Focus typically occurs in the most compute intensive kernels of the code, where conditional (branch-based) programming and software design is available that essentially says “oh this is a GPU - I’ve seen this before so I know how to lay out my data and compute on it.”
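A common shape for this kind of compile-time dispatch (a generic sketch, not ECP's actual code) uses the device compiler's predefined macros; only the CPU branch is active when built with an ordinary host compiler:

```cpp
#include <cstddef>
#include <string>

// Report which backend this translation unit was compiled for. __CUDACC__
// and __HIPCC__ are defined by the NVIDIA and AMD device compilers.
std::string compiled_backend() {
#if defined(__CUDACC__) || defined(__HIPCC__)
    return "gpu";
#else
    return "cpu";
#endif
}

// AXPY kernel: on a GPU build one would launch a device kernel here
// (omitted); the portable CPU loop below is the fallback in this sketch.
void axpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}
```

Run-time detection works similarly, except the branch queries the system (e.g., a device count) instead of a preprocessor macro.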

  1. How can you build an effective exascale software stack if you don’t know the architecture of the forthcoming exascale systems?

Very good question. You really cannot. That said, ECP works very closely with the DOE HPC facilities and US HPC vendor companies, so it has a pretty good idea of the hardware architectures to target. Details of the planned US exascale systems are presently emerging and will be known in detail very soon.

  1. Focusing on the “metal”: how deep into the HW can users orchestrate (i.e., firmware, OS kernel extensions, ethernet/infiniband/thunderbolt buffering & switching, timing, syncing, GPUs, FPGAs, ASICs), and what OSs are supported? What options are there for segmenting HW for multiple users at a time (including new debugger SW)? How will you include new HW, SW, and algorithms?

While US HPC vendors now have and provide very efficient and robust operating systems (that DOE software development efforts build on), there are still possibilities to get “deep” into the OS and HW if the situation warrants and the arrangements with vendors allow this kind of work. In general, if “going deep” results in big payoffs in terms of application and software stack efficiency, then that’s what happens. This kind of work, however, is not nearly as needed as it was, say, a decade or more ago.

  1. A particularly important aspect of HPC is the unwanted “turnaround” in strong scaling that arises with increased problem size and with increased communication. Will the exascale management team make strong-scaling benchmark data available, with codes running on portions of the machine reaching the range of 50% or more of the machine?

Yes indeed. Strong scaling remains a big scientific driver, ranging from engineering (mesh-based) continuum simulations down to atomistic (MD) and quantum (electronic structure) simulations. Benchmarks have been, are, and will be made available.

  1. Do you know the best place(s) to look for open opportunities in this field?

In terms of job opportunities, I would start at the HPC center websites and the DOE lab job-opening websites. You can also contact the leaders of SIGHPC; they may have some ideas for you. Send me an email and I’ll help if I can.

  1. A metric was listed as <1 fault/week (I believe slide 9 or 10, and I believe it was under the HW section). Is this per machine, and will the solutions for this be reliable hardware, OS, apps, or a combination?

This metric is very important from an application perspective and is often referred to as “MTTI” (Mean Time To Interrupt). It is per machine and usually requires a shared-fate combination of resilient hardware and fault-tolerant software. A high MTTI is always important, and always challenging, to achieve.
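One concrete consequence of MTTI for applications is checkpoint frequency. A classic rule of thumb is Young's first-order approximation for the optimal checkpoint interval (the concrete numbers in the note below are illustrative assumptions, not ECP figures):

```cpp
#include <cmath>

// Young's (1974) approximation: with mean time to interrupt M seconds and a
// cost of C seconds to write one checkpoint, checkpoint roughly every
// tau = sqrt(2 * C * M) seconds to minimize expected lost work.
double young_checkpoint_interval(double checkpoint_cost_s, double mtti_s) {
    return std::sqrt(2.0 * checkpoint_cost_s * mtti_s);
}
```

At the <1 fault/week target (MTTI ≈ 604,800 s) and an assumed 300 s checkpoint write, this suggests checkpointing roughly every 5.3 hours.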

  1. Can Dr. Kothe please comment on the role of modern Fortran in exascale computing e.g., current Fortran 2018 standard with COARRAYs and teams toward parallel computing? As a chemical engineer working in computational fluid-phase thermodynamics and phase equilibria, I’m motivated to come back to Fortran where I started in grad school, since I find other programming languages tend to move me away from basic science and domain knowledge when I’m trying to develop scientific software. Does modern Fortran have a future in HPC?

Fortran is still heavily used in the DOE, now mostly for what we call “legacy” applications (those that have existed and been in “production” for a number of years), so knowing Fortran (and its latest 2018 standard) remains a valued skill. That said, the risk with Fortran (especially in embracing all the bells and whistles of the latest standard) is the availability of a large suite of robust compilers with adequate front ends. There just aren’t that many good Fortran compilers around anymore, and that list is likely to shrink in the coming years.

  1. The speaker mentioned RAJA to abstract multiple execution platforms at the programming level. How does that relate/compete with other approaches like kokkos?

This is a good place to start: Kokkos - Performance Portability

  1. We are working at my university through an NSF industry/university cooperative research center with an industry group developing new standards and methods for data center and server control directly applicable to HPC (Redfish). Is there a way to engage with the ECP hardware & integration team on this project?

Indeed. Please contact ECP Leadership (Hardware & Integration leads) or, since this is an NSF activity, John West can be a POC and liaison with ECP as well.

  1. Can you provide some details about hardware that is going to be used in ECP?

Consult the DOE ASCR and ASC websites or the DOE Lab HPC facility web sites (e.g., NERSC, OLCF, ALCF) for more information. The most recent and detailed information can be found there.

  1. There are probably a lot of computing professionals attending this webinar who come from outside DOE, computational science, and exascale (or even petascale) computing. Nevertheless, do you think ECP is developing technologies or processes that might benefit them directly? You mentioned ECP participation in standards bodies as one example, I’m wondering if there are other strategic examples?

Yes! Lots of the work DOE and ECP are doing should, and frankly will, benefit staff, companies, research institutions, universities, etc. outside DOE. If you can send us examples of what you are thinking about, we can provide you with more details.

  1. How do you consider the difference between ExaML and what industry is doing regarding distributed ML?

There are indeed differences, albeit some subtle. ECP’s ExaLearn co-design center does not want to recreate or overlap with useful existing industry frameworks, but instead build on them, augment, and if necessary develop new capabilities to fill in gaps. I suggest you consult recent public presentations on ExaLearn, e.g., the one recently given by the ExaLearn PI at the HPC User Forum in Santa Fe, NM (early April 2019).


  1. How are the ISV analysis companies (CAE, CFD, etc.) reacting to this effort?

ECP is in regular contact with ISVs in the US, who are in general enthusiastic. In fact, a few key ISVs are on the ECP Industry Council. ECP is not out to directly compete with ISVs, but instead to help them advance their products and technologies by making ECP’s software broadly available for possible integration into an ISV’s future releases, products, etc. Success for ECP would in fact be an ISV’s ingestion of ECP products and solutions directly into its next-generation release(s).

  1. For Chameleon: how does ECP SW migrate to Chameleon at TACC and UChicago?

A great question. ECP is currently engaging more closely with NSF’s HPC efforts, and in particular TACC (through Dan Stanzione and John West). Through this engagement ECP envisions being able to deploy its SDKs and selected applications on the Chameleon cloud for testing, performance analysis, and possibly production work. Details remain to be worked out here, but ECP is interested in more intimate collaborations and co-development with important NSF facilities like TACC.

  1. Is ECP mainly targeting performance portability in their technologies?

Performance portability is very tough (more of a journey than a destination), but important. Attention not paid to this issue will rear its ugly head later. While performance portability is not a hard metric for ECP, we will be checking to ensure that all of ECP’s software and applications can execute everywhere (at least on all DOE HPC platforms), and their performance will be closely monitored. Performance portability work is often encapsulated in smaller kernels of the software (e.g., the compute-intensive portions) with conditionals or compiler directives, but more and more ECP expects to take advantage of software abstraction layers (e.g., Kokkos from SNL or RAJA from LLNL), portions of which we hope to incorporate into future programming language/model standards.
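The abstraction-layer idea can be sketched with a toy policy-based `parallel_for` (this mimics the spirit of Kokkos/RAJA, not their actual APIs): the loop body is written once, and a compile-time policy tag selects the backend.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

// Execution-policy tags: the backend is chosen at compile time by overload
// resolution, while the loop body stays identical.
struct Serial {};
struct Threads {};

// Serial backend: run body(i) for i in [0, n) on the calling thread.
template <class Body>
void parallel_for(Serial, std::size_t n, Body body) {
    for (std::size_t i = 0; i < n; ++i) body(i);
}

// Threaded backend: cyclically distribute iterations over hardware threads.
template <class Body>
void parallel_for(Threads, std::size_t n, Body body) {
    std::size_t nt = std::thread::hardware_concurrency();
    if (nt == 0) nt = 1;
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < nt; ++t)
        pool.emplace_back([=] {
            for (std::size_t i = t; i < n; i += nt) body(i);
        });
    for (auto& th : pool) th.join();
}
```

A real abstraction layer adds GPU backends and memory-layout control behind the same interface, which is where most of the portability effort actually goes.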

  1. Why are these powerful computer systems being used to address national critical problems such as predicting recessions, solving the spectrum shortage, and global warming?

For most (if not all) problems of national interest, high-end modeling and simulation or data-analytic computing (e.g., AI/ML) solutions provide valuable insight for scientists and decision/policy-makers. Many national critical problems can currently be addressed with higher (and in some cases much higher) confidence, such that very consequential decisions can be made that are in part reliant upon these results. Exascale computing resources offer many benefits: they enable much faster turnaround on these simulations, permit more simulation ensembles (for more reliable statistics), and, perhaps most importantly, allow for the reduction of simplifying assumptions in the physical model(s) implemented on the computer, making the simulation “error bars” smaller. In the end, however, ECP acknowledges that simulation results never replace the subject matter expert, who is always needed to use those results to take the next step in the scientific process or to help make a key decision.

Hi, Alexander. The archive is now available. You can use the link at the top of this page to access the on-demand webcast.

The request was not for the benchmarks themselves.

The request was for the benchmark data showing the strong scaling of various (individual) codes running on machines managed by the exascale project.

Thus far, there appears to be no published strong-scaling data for any code using more than 50% of the Summit machine.

Without published strong-scaling data reaching to most of the processing elements, there is no meaningful way to track improvements across time and across updates to code and to algorithms.

If there are published strong-scaling measures for codes running on Summit (reaching to more than 50% of the machine), it would be appreciated by the community if a link to those results could be provided by the exascale management team.

I am the inventor of the Topological-triple Iterable Topology of a Directed Acyclic Graph.
Recently, I extended the algorithm to support general directed graphs.
The Topological-triple not only reflects the topology of a given directed graph but also its structure. It contains, in a compressed form, not only information about all paths through the graph but also the organization of its vertices, which makes sub-graph decomposition readily available. If needed, one can trace all paths simply by iterating through a compact data structure in the direction of the binary relations represented by the edges, or in the reverse direction. Recurring traversals are not necessary even if new edges are inserted. The time complexity of inserting an edge is linear in the number of vertices.
The proof of the algorithm will be published in the proceedings of “FICC 2021”. A reference implementation is about 21 KB and less than 1 KLOC in size.
Can a data-structure with these features be useful in HPC space?
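For comparison, the textbook baseline such a structure would be measured against (this is the standard technique, not the Topological-triple described above): counting all paths from a source in a DAG with one dynamic-programming sweep over a topological order, so no path is ever traversed twice.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Count the paths from `src` to every vertex of a DAG by dynamic programming
// over a topological order (Kahn's algorithm). Linear in vertices + edges.
std::vector<long long> count_paths_from(
        int n, const std::vector<std::pair<int, int>>& edges, int src) {
    std::vector<std::vector<int>> adj(n);
    std::vector<int> indeg(n, 0);
    for (auto [a, b] : edges) { adj[a].push_back(b); ++indeg[b]; }

    std::queue<int> ready;
    for (int v = 0; v < n; ++v)
        if (indeg[v] == 0) ready.push(v);

    std::vector<long long> paths(n, 0);
    paths[src] = 1;                        // the empty path reaches src
    while (!ready.empty()) {               // visit in topological order
        int v = ready.front(); ready.pop();
        for (int w : adj[v]) {
            paths[w] += paths[v];          // extend every path ending at v
            if (--indeg[w] == 0) ready.push(w);
        }
    }
    return paths;
}
```

This baseline must redo its sweep when edges change, so a structure that supports incremental edge insertion without re-traversal, as claimed above, would be the interesting point of comparison for HPC task graphs.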