This is the final post summarizing the fourth section of the IESP Roadmap document. It discusses important crosscutting dimensions of the upcoming exascale systems, areas of concern to all users and engineers of such systems. The section focuses on: (1) Resilience, (2) Power Management, (3) Performance Optimization, and (4) Programmability. Although these dimensions are critical, they are very difficult to study independently of the other components of exascale systems. I think they should be integral parts of each software layer in the next generation of HPC systems. They are already addressed in current and past HPC systems, but only at a very limited scale. For example, power management is considered only at the OS layer, and performance optimization only at the application level.

Resilience

Original contributors of this subsection are: Franck Cappello (INRIA, FR), Al Geist (ORNL), Sudip Dosanjh (SNL), Marc Snir (UIUC), Bill Gropp (UIUC), Sanjay Kale (UIUC), Bill Kramer (NCSA), Satoshi Matsuoka (TITECH), David Skinner (NERSC)

Before summarizing this section, I looked up some of the authors and found an interesting white paper by the same authors, except Satoshi Matsuoka, that discusses software resilience in more depth. I highly recommend reading it. Its title is Toward Exascale Resilience, and you can find it here.

The main upcoming challenge in building resilient systems for the era of exascale computing is the inapplicability of traditional checkpoint/restart techniques. Checkpointing the state of millions of threads would consume considerable time, space, and energy. New techniques are required to minimize the overhead of resilience. Given this general picture, the authors believe the following will be the main drivers of R&D in resilient exascale computing:

  • The increased number of components (both hardware and software) will increase the likelihood of failures, even in short-running tasks.
  • Silent soft errors will become significant and raise the issues of result and end-to-end data correctness.
  • New storage and memory technologies, such as SSDs and phase-change memory, will bring great opportunities for faster and more efficient state management and checkpointing.
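
To make the first driver concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the roadmap: it assumes that components fail independently with exponentially distributed lifetimes, and the component MTBF, component counts, and checkpoint cost are made-up illustrative values. The checkpoint interval uses Young's classic first-order approximation, which is only one of several possible models.

```python
import math


def system_mtbf(component_mtbf_hours, n_components):
    """MTBF of a system that fails as soon as any one of n independent components fails."""
    return component_mtbf_hours / n_components


def failure_probability(component_mtbf_hours, n_components, run_hours):
    """Probability of at least one failure during a run of the given length."""
    rate = n_components / component_mtbf_hours  # failures per hour, whole system
    return 1.0 - math.exp(-rate * run_hours)


def young_checkpoint_interval(checkpoint_cost_hours, mtbf_hours):
    """Young's approximation of the checkpoint interval: sqrt(2 * cost * MTBF)."""
    return math.sqrt(2.0 * checkpoint_cost_hours * mtbf_hours)


if __name__ == "__main__":
    component_mtbf = 5 * 365 * 24  # hypothetical: 5 years per component
    checkpoint_cost = 0.25         # hypothetical: 15 minutes per checkpoint
    for n in (1_000, 100_000, 1_000_000):
        mtbf = system_mtbf(component_mtbf, n)
        p_fail = failure_probability(component_mtbf, n, run_hours=1.0)
        interval = young_checkpoint_interval(checkpoint_cost, mtbf)
        print(f"{n:>9} components: system MTBF {mtbf:8.3f} h, "
              f"P(failure in 1 h) {p_fail:.3f}, "
              f"checkpoint every {interval:.3f} h")
```

Under these assumptions the system MTBF shrinks linearly with the number of components, and once the suggested checkpoint interval approaches the cost of a single checkpoint, most of the machine's time goes to saving state. That is one way to read the claim that traditional checkpoint/restart stops being applicable at this scale.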

I recommend reading the authors’ original white paper to learn more about these challenges.

The authors also did a quick gap analysis to pinpoint, in more detail, the areas of fault tolerance that need rethinking. Among these points:

  • The most common programming model, MPI, does not offer a paradigm for resilient programming.
  • Most present applications and system software are neither fault tolerant nor fault aware, and they are not designed to confine errors and faults.
  • Software layers lack the communication and coordination needed to handle faults at different levels inside the system.
  • Deeper analysis of the root causes of different faults is mandatory to find efficient solutions.
  • Efficient verification of global results from long executions is missing as well.
  • Standardized metrics for measuring and comparing the resilience of different applications are missing so far.

The authors see many possibilities and a lot of complexity on the way to resilient exascale systems. However, they conclude that research should focus on two main threads:

  • Extend the applicability of rollback toward more local recovery.
  • Develop fault-avoidance and fault-oblivious software to limit the need for recovery by rollback.

Power Management

Original contributors of this subsection are: John Shalf (LBNL), Satoshi Matsuoka (TITECH, JP)

The goal of power management for exascale systems is to sustain the best attainable performance at minimum power consumption. This involves allocating power only to the system components actively involved in application or algorithm execution. According to the authors, existing power management infrastructure has been derived from consumer electronic devices and was never designed with large-scale systems in mind. A cross-cutting power management infrastructure is mandatory; its absence will limit the scale and feasibility of exascale systems. For large HPC systems, power is part of the total cost of ownership, and it will be a critical part of exascale systems management. Accordingly, the authors propose two alternative R&D strategies:

  • Power down components when they are underutilized. For example, the OS can reduce the frequency and operating voltage of a hardware component when it has not been used for a relatively long time.
  • Explicitly manage data movement, which simply means avoiding unnecessary data movement. This should reduce power consumption in networks, hard disks, memory, etc.
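
The first strategy can be illustrated with a small sketch. It is not from the roadmap: it assumes the textbook CMOS dynamic power model P = C * V^2 * f, ignores static (leakage) power entirely, and assumes a compute-bound workload whose runtime is inversely proportional to frequency. The function names and scaling factors are hypothetical.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook CMOS dynamic power estimate: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def relative_power(voltage_scale, frequency_scale):
    """Dynamic power after scaling voltage and frequency, relative to the baseline."""
    return voltage_scale ** 2 * frequency_scale


def relative_energy_fixed_work(voltage_scale, frequency_scale):
    """Relative energy for a fixed amount of compute-bound work.

    Assumes runtime grows as 1 / frequency_scale, so energy = power * time.
    """
    runtime_scale = 1.0 / frequency_scale
    return relative_power(voltage_scale, frequency_scale) * runtime_scale


if __name__ == "__main__":
    # Hypothetical operating point: 20% lower voltage and frequency.
    print(f"relative power:  {relative_power(0.8, 0.8):.3f}")
    print(f"relative energy: {relative_energy_fixed_work(0.8, 0.8):.3f}")
```

In this simplified model, lowering voltage and frequency together cuts power much faster than it cuts speed. Real systems also pay leakage power for as long as the job runs, so the actual savings depend on the hardware and the workload.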

The authors suggest six main research areas for effective power management inside exascale systems:

  • OS-based power management. The authors believe that two changes should be considered: (1) fair management of shared resources among hundreds or thousands of processors on the same machine, and (2) the ability to manage power levels for heterogeneous architectures inside the same machine, such as GPGPUs.
  • System-Scale Resource Management. Standard interfaces need to be developed that allow millions of cores to work in complete synchrony to implement effective power management policies.
  • Algorithms. Power-aware algorithms are simply those that reduce the communication overhead per FLOP. Libraries should be considered to articulate the tradeoffs between communication, power, and FLOPs.
  • Libraries. According to the authors, library designers need to use their domain-specific knowledge of the algorithm to provide power management and policy hints to the power management infrastructure.
  • Compilers. Compilers should make it easier to program for power management by automatically instrumenting code for power management.
  • Applications. Applications should give power-aware systems and libraries hints about their power-related policies for the best power optimization.

Given these possible research areas, which span the whole software stack, the authors believe that the following should be the key metrics for effectively managing the power consumption of exascale systems:

  • Performance. The ability to predict execution patterns inside applications would help reduce power consumption while attaining the best possible performance.
  • Programmability. Application developers are not expected to do power management explicitly inside their applications. Coordination between all layers of the software stack should be possible for power management.
  • Composability. Power management components built by different teams should be able to work in harmony.
  • Scalability, which requires integration of power management information for system wide power management policies.

Performance Optimization

Original contributors of this subsection are: Bernd Mohr (Juelich, DE), Adolfy Hoisie (LANL), Matthias Mueller (TU Dresden, DE), Wolfgang Nagel (Dresden, DE), David Skinner (LBL), Jeffrey Vetter (ORNL)

This is one of my favorite subsections. The expected increase in hardware and software stack complexity makes performance optimization a very complex task. Having millions or billions of threads working on the same problem requires different ways to measure and optimize performance. The authors believe that the following areas are important for performance optimization in exascale systems: statistical profiling, automatic or automated analysis, advanced filtering techniques, on-line monitoring, clustering and analysis, and data mining. They also believe that self-monitoring and self-tuning frameworks, middleware, and runtime schedulers, especially at the node level, are necessary. The way a system’s performance is captured under power and reliability constraints needs to change radically. Without properly designed new tools, aggregating and analyzing performance measurements while the system is running may incur significant overhead. The authors believe that the complexity of exascale systems puts performance optimization in many configurations beyond what humans can monitor and tune manually, and they see auto-tuning as an important technique. Hence, they believe that research in performance optimization should be directed to these areas:

  • Support for modeling, measurement, and analysis of heterogeneous hardware systems.
  • Support for modeling, measurement and analysis of hybrid programming models (mixing MPI, PGAS, OpenMP and other threading models, accelerator interfaces).
  • Automated / automatic diagnosis / autotuning.
  • Reliable and accurate performance analysis in the presence of noise, system adaptation, and faults, which requires the inclusion of appropriate statistical descriptions.
  • Performance optimization for metrics other than time (e.g. power).
  • Programming models should be designed with performance analysis in mind. Software and runtime systems must expose their model of execution and adaptation, and its corresponding performance through a (standardized) control mechanism in the runtime system.
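
Auto-tuning and noise-tolerant measurement, two of the directions above, can be sketched in a few lines. This is a toy illustration, not the authors' proposal: the names measure, autotune, and blocked_sum are hypothetical, the tuned parameter is just a block size for a summation loop, and the median of repeated timings stands in for the "appropriate statistical descriptions" the authors call for.

```python
import statistics
import time


def measure(run, param, repeats=5, timer=time.perf_counter):
    """Return the median wall-clock time of run(param) over several repeats.

    The median is less sensitive to occasional noisy samples than the mean.
    """
    samples = []
    for _ in range(repeats):
        start = timer()
        run(param)
        samples.append(timer() - start)
    return statistics.median(samples)


def autotune(run, candidates, repeats=5, timer=time.perf_counter):
    """Time every candidate parameter and return (best_param, all_timings)."""
    timings = {p: measure(run, p, repeats, timer) for p in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def blocked_sum(block_size, n=200_000):
    """Toy kernel: sum the integers 0..n-1 in blocks of the given size."""
    data = range(n)
    total = 0
    for start in range(0, n, block_size):
        total += sum(data[start:start + block_size])
    return total


if __name__ == "__main__":
    best, timings = autotune(blocked_sum, [16, 256, 4096, 65536])
    for size, seconds in timings.items():
        print(f"block size {size:>6}: {seconds * 1e3:.2f} ms")
    print("selected block size:", best)
```

An exhaustive search like this only works for a handful of candidates. The point is the shape of the loop: measure repeatedly, summarize robustly, pick the best configuration, and leave the human out of it.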

Programmability

Original contributors of this subsection are: Thomas Sterling (LSU), Hiroshi Nakashima (Kyoto U., JP)

Programmability of exascale systems is another critical factor for their success. It is quite difficult to benchmark programmability and to find a baseline against which to set and measure our objectives in this area. However, the authors identified the following basic challenges of system programmability:

  • Massive parallelism through millions or billions of concurrent collaborating threads.
  • Huge number of distributed resources and difficulty of allocation and locality management.
  • Latency hiding by overlapping computations with communications.
  • Hardware Idiosyncrasies. Different models will emerge with significant differences in ISA, memory hierarchy, etc.
  • Portability. Application programs must be portable across machine types, machine scales, and machine generations.
  • Synchronization Bottlenecks of millions of threads trying to synchronize control or data access.
  • Data structures representation and distribution.
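
Of the challenges above, latency hiding is the easiest to show in miniature. The sketch below is an illustration under stated assumptions, not code from the roadmap: fetch_remote_block is a hypothetical stand-in that sleeps to mimic network latency, and a single background thread plays the role that nonblocking communication would play in a real HPC code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_remote_block(block_id, latency_s=0.05):
    """Hypothetical stand-in for a remote fetch: sleeps to mimic latency."""
    time.sleep(latency_s)
    return [block_id] * 1000


def compute(block):
    """Toy computation on one block of data."""
    return sum(x * x for x in block)


def process_serial(n_blocks, latency_s=0.05):
    """Fetch, then compute, one block at a time (no overlap)."""
    return [compute(fetch_remote_block(i, latency_s)) for i in range(n_blocks)]


def process_overlapped(n_blocks, latency_s=0.05):
    """Prefetch block i+1 in the background while computing on block i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_remote_block, 0, latency_s)
        for i in range(n_blocks):
            block = pending.result()  # wait for the block requested earlier
            if i + 1 < n_blocks:
                pending = pool.submit(fetch_remote_block, i + 1, latency_s)
            results.append(compute(block))  # overlaps with the next fetch
    return results


if __name__ == "__main__":
    for name, fn in (("serial", process_serial), ("overlapped", process_overlapped)):
        start = time.perf_counter()
        fn(8)
        print(f"{name:>10}: {time.perf_counter() - start:.3f} s")
```

Both functions return the same results; only the schedule differs. How much time the overlapped version saves depends on the ratio of computation to communication per block, which is exactly the kind of tuning that becomes hard with millions of threads.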

If you have read the other postings summarizing the rest of this document, you will realize how complicated programmability is. It cuts across all the layers of the software stack, starting from the ISA and operating systems and ending with applications. Going through the authors’ suggested research agenda, I found that they recommend all the R&D directions proposed by the rest of the authors for their corresponding stack layers and components. I recommend reading the other related postings to appreciate the challenges awaiting researchers who want to make exascale systems easier to program and utilize.

This posting is part of a series summarizing the roadmap document of the Exascale Software Project:
