In this blog post I’m summarizing the two remaining subsections of the development environment: numerical libraries, and Debugging tools. In my last posting I summarized the first three subsections: programming models, frameworks, and compilers.

Numerical Libraries

Original Contributors are: Jack Dongarra (U. of Tennessee), Bill Gropp (UIUC), Mike Heroux (SNL), Anne Trefethen (Oxford U., UK) Aad van der Steen (NCF, NL)

Numerical libraries are necessary to separate HPC applications developers from a lot of architectural details. However, numerical libraries can be considered as reusable applications. Hence, their technology drivers are the common drivers mentioned in the programming model and systems area. Specifically, numerical libraries are driven by: hybrid architectures, programming models, accuracy, fault detection, energy budget, memory hierarchy and the relevant standards. Authors believe that numerical libraries have three main R&D alternatives:

  • Message Passing Libraries
  • Develop Global Address Space Libraries
  • Message Driven Work Queues

Authors believe that “the existing numerical libraries will need to be rewritten and extended in the light of the emerging architectural changes.” Therefore, the following should be the R&D agenda for the coming 10 years:

  • Develop hybrid and hierarchal-based libraries/applications.
  • Develop auto-tuning mechanisms inside these libraries to get best possible performance.
  • Consider fault oblivious and error tolerant implementations.
  • Link precision and performance with energy saving
  • Build algorithms that adapt with the architectural properties with minimal or no changes.
  • Build algorithms that minimize communication inside these parallel libraries.
  • Enhance current shared memory models as they will continue in parallel programming for many years to come.


Debugging Tools

Original contributors of this subsection are: David Skinner (LBL), Wolfgang Nagel (Dresden, DE).

Massive parallelism inside Exascale systems is making a big challenge for debugging tools. Debugging tools are built to help discovering unintended behavior inside applications and assist in solving it. However, inside applications running on thousands of cores with thousands of concurrent threads it is a very difficult task. Debuggers may have to look into the operating system and possible hardware interactions that may cause failures inside applications. The authors believe that the following are the technology drivers for the debugging tools:

  • Concurrency driven overhead in debugging
  • Scalability of debugger methodologies (data and interfaces)
  • Concurrency scaling of the frequency of external errors/failures
  • Heterogeneity and lightweight operating systems

The authors did not talk about R&D alternatives for debugging tools. However, they stressed on two possible improvement areas: (1) Include debugging tools and APIs at each layer, which can emit necessary debugging information to the upper layers, and (2) Integrate debugging information from all layers and provide different debugging contexts to pinpoint root causes of bugs. The authors then move into their suggested research agenda and propose these research areas for Exascale debugging tools:

  • Finding ways to cluster or group threads for easier monitoring and root cause identification.
  • Interoperate debugging tools with fault-tolerance mechanisms. This should guarantee continuous debugging, which makes it easier to find a bug that is very hard to reproduce.
  • Vertical integration of debug and performance information across software layers.
  • Automatically triggered debugging – Instead of debugging test cases in a separate session, some exascale debugging must be delivered just-in-time as problems unfold.


Next Time

In my next posting I will summarize the third section of this document, the applications. It is an interesting section for end users since it discusses challenges in algorithms, data analysis and visualization, and scientific data management in the exascale computing. It is a pragmatic perspective for potential exascale users.

This posting is part of a series summarizing the roadmap document of the Exascale Software Project: