Welcome to the sixth part of this series summarizing the roadmap document of the International Exascale Software Project (IESP). This part moves higher in the software stack and discusses the challenges, technology drivers, and suggested research agenda for exascale applications. It focuses on algorithms, data analysis and visualization, and scientific data management.


Algorithms

The original authors of this subsection are Bill Gropp (UIUC), Fred Streitz (LLNL), Mike Heroux (SNL), and Anne Trefethen (Oxford U., UK).

The authors see the following important drivers for exascale algorithms:

  • Scalability to millions or billions of threads working collaboratively or independently.
  • Fault tolerance and fault resilience. Algorithms should be able to detect faults inside the system and recover from them using as few resources as possible, moving away from the traditional checkpointing techniques in use today.
  • Matching an algorithm’s compute demands with the available capabilities. The authors focus mainly on power-aware algorithms. This is attainable through accurate performance modeling, use of heterogeneous architectures, and awareness of the memory hierarchy and the performance constraints of each component inside an exascale system.

The authors then discuss the critical challenges, stressing the importance of developing performance models against which algorithm developments can be evaluated.
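
To make the idea of a performance model concrete, here is a minimal sketch of the well-known roofline bound, which caps a kernel's rate at the lower of the compute peak and the memory bandwidth times the kernel's arithmetic intensity. The roadmap does not prescribe this particular model; the function names and the node parameters (1000 GFLOP/s, 100 GB/s) are assumptions chosen only for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline bound in GFLOP/s: the lower of the compute and memory ceilings."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


def estimated_seconds(total_flops, peak_gflops, bandwidth_gbs, flops_per_byte):
    """Lower bound on run time for a kernel that performs total_flops operations."""
    rate = attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte) * 1e9
    return total_flops / rate


# Hypothetical node (assumed numbers): 1000 GFLOP/s peak, 100 GB/s bandwidth.
PEAK, BANDWIDTH = 1000.0, 100.0

# A streaming kernel at 0.25 flop/byte is limited by memory bandwidth ...
streaming = attainable_gflops(PEAK, BANDWIDTH, 0.25)
# ... while a dense kernel at 20 flop/byte is limited by the compute peak.
dense = attainable_gflops(PEAK, BANDWIDTH, 20.0)
print(streaming, dense)
```

A model this simple ignores latency, communication, and power, but even so it gives an algorithm developer a quantitative target to compare measured performance against.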

I could not find a clear research agenda in this section. However, the authors identify the following critical challenges for applications in the era of exascale computing:

  • Gap analysis: a detailed analysis of the applications is needed, particularly with respect to quantitative models of performance and scalability.
  • Scalability, particularly relaxing synchronization constraints.
  • Fault tolerance and resilience, including fault detection and recovery.
  • Heterogeneous systems: algorithms that are suitable for systems made of functional units with very different abilities.
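
As an illustration of fault detection and recovery inside an algorithm, the following sketch uses a checksum in the style of algorithm-based fault tolerance (ABFT) on a matrix-vector product: the sum of the result must equal the dot product of the matrix's column sums with the input vector. This technique is not taken from the roadmap; the function names, the tolerance, and the `inject` fault-injection hook are hypothetical and exist only for the demonstration.

```python
def matvec(a, x):
    """Plain dense matrix-vector product y = A x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def column_checksum(a):
    """Column sums of A; their dot product with x must equal sum(A x)."""
    return [sum(col) for col in zip(*a)]


def checked_matvec(a, x, checksum, tol=1e-9, corrupt=None):
    """Compute y = A x and verify it against the checksum invariant."""
    y = matvec(a, x)
    if corrupt is not None:  # hypothetical hook that simulates a soft error
        index, delta = corrupt
        y[index] += delta
    expected = sum(c * xj for c, xj in zip(checksum, x))
    ok = abs(sum(y) - expected) <= tol * max(1.0, abs(expected))
    return y, ok


def resilient_matvec(a, x, max_retries=2, inject=None):
    """Recompute only this kernel when the check fails, instead of
    rolling the whole application back to a checkpoint."""
    checksum = column_checksum(a)
    for attempt in range(max_retries + 1):
        fault = inject if attempt == 0 else None  # fault hits the first try only
        y, ok = checked_matvec(a, x, checksum, corrupt=fault)
        if ok:
            return y, attempt
    raise RuntimeError("fault persisted after retries")


A = [[1.0, 2.0], [3.0, 4.0]]
X = [1.0, 1.0]
result, retries = resilient_matvec(A, X, inject=(0, 5.0))
print(result, retries)
```

The check costs one extra dot product per call, and recovery is local to the kernel, which is the kind of low-overhead alternative to global checkpointing that the challenges above call for. A single checksum can miss errors that cancel each other out, so this is a sketch rather than a complete scheme.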

Data Analysis and Visualization

The original contributors of this subsection are Michael E. Papka (ANL), Pete Beckman (ANL), Mark Hereld (ANL), Rick Stevens (ANL), and John Taylor (CSIRO, Australia).

According to Wikipedia, visualization is the transformation, selection or representation of data from simulations or experiments, with an implicit or explicit geometric structure, to allow the exploration, analysis and understanding of the data. As simulations become more complex and generate more data, visualization is gaining importance in scientific and high-performance computing. The authors accordingly propose the following alternative R&D options:

  • Develop new analysis and visualization algorithms to fit well within the new large and complex architectures of exascale computing.
  • Identify new mathematical and statistical approaches for data analysis
  • Develop integrated adaptive techniques to enable on-the-fly and learned-pattern performance optimization from fine to coarse grain.
  • Expand the role of supporting visualization environments to include more pro-active software: model and goal aware agents, estimated and fuzzy results, and advanced feature identification.
  • Invest in methods and tools to keep track of the processes and products of exploration and discovery. These will include aids to process navigation, hypothesis tracking, workflows, provenance tracking, and advanced collaboration and sharing tools.
  • Plan the deployment of a global system of large-scale, high-resolution (100 Mpixel) visualization and data analysis systems, based on an open source architecture, to link universities and research laboratories.

Hence, the authors propose the following R&D strategy so that data analysis and visualization develop at the same pace as the new exascale systems:

  • Identification of features of interest in exabytes of data
  • Visualization of streams of exabytes of data from scientific instruments
  • Integrating simulation, analysis and visualization at the exascale
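
One way to picture feature identification in a data stream that is too large to store is a single-pass detector that keeps only running statistics. The sketch below uses Welford's online algorithm for the mean and variance and flags values far from the running mean. It is an assumed, deliberately simple stand-in for the far richer feature detectors the authors have in mind; the class and function names, the threshold, and the warm-up length are all hypothetical.

```python
import math


class RunningStats:
    """Welford's online mean and variance: O(1) memory per stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_features(stream, threshold=3.0, warmup=5):
    """Yield (index, value) for samples more than `threshold` standard
    deviations from the running mean, without storing the stream."""
    stats = RunningStats()
    for index, value in enumerate(stream):
        if stats.n >= warmup:
            sd = stats.std()
            if sd > 0 and abs(value - stats.mean) > threshold * sd:
                yield index, value
        stats.update(value)


samples = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 9.0, 1.0]
flagged = list(flag_features(samples))
print(flagged)
```

Because the detector touches each sample once and holds three numbers of state, the same pattern can run alongside a simulation or an instrument feed, which is the sense in which simulation, analysis, and visualization are meant to be integrated.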

Scientific Data Management

The original contributor of this subsection is Alok Choudhary (Northwestern U.).

Managing scientific data has been identified by the scientific community as one of the most important emerging needs because of the sheer volume and increasing complexity of data. Effectively generating, managing, and analyzing this information requires a comprehensive, end-to-end approach to data management that encompasses all of the stages from the initial data acquisition to the final analysis of the data. Although file systems can now scale to store huge numbers of files and large volumes of scientific data, the scientific computing community does not consider this sufficient. Another layer should be built on top of these file systems to provide easy-to-use tools for storing, retrieving, and analyzing the huge data sets generated by scientific analysis algorithms.

Accordingly, the author suggests considering the following alternatives and R&D strategies for building effective scientific data management tools and frameworks:

  • Building new data analysis and mining tools for knowledge discovery from massive datasets produced and/or collected.
  • Designing scalable workflow tools with easy-to-use interfaces. These would be very important for exascale systems, both for the performance and productivity of scientists and for effective use of the systems.
  • Investigating new approaches to building database systems for scientific computing that scale in performance, usability, query, and data modeling, that can incorporate the complex data types found in scientific applications, and that eliminate the over-constraining usage models which impede scalability in traditional databases.
  • Designing new scalable data formats and high-level libraries without binding these efforts to backward compatibility. Exascale systems will use different I/O architectures and different processing paradigms. Spending significant effort on backward compatibility would put the viability of these systems at risk.
  • Developing new technology for efficient and scalable searching and filtering of large-scale scientific multivariate datasets with hundreds of searchable attributes, in order to deliver the most relevant data and results.
  • Supporting wide-area data access, which is becoming an increasingly important part of many scientific workflows. To interact seamlessly with wide-area storage systems, tools must be developed that can span various data management techniques across the wide area, integrated with scalable I/O, workflow tools, and query and search techniques.
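
To illustrate what searching and filtering over many attributes can look like, here is a minimal bitmap-index sketch: each distinct attribute value gets a bitmap of the rows that hold it, and a multi-attribute query reduces to bitwise AND and OR. The roadmap does not name this technique; it is an assumption chosen because bitmap indexes are a common fit for read-mostly data. The class, the attribute names, and the records are hypothetical, and Python integers stand in for real compressed bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """One bitmap (a Python int) per distinct value of each indexed attribute."""

    def __init__(self, records, attributes):
        self.bitmaps = {attr: defaultdict(int) for attr in attributes}
        for row, record in enumerate(records):
            for attr in attributes:
                # Set bit `row` in the bitmap for this attribute value.
                self.bitmaps[attr][record[attr]] |= 1 << row

    def equals(self, attr, value):
        """Bitmap of rows where record[attr] == value (0 if none match)."""
        return self.bitmaps[attr].get(value, 0)


def rows(bitmap):
    """Expand a bitmap into the sorted list of matching row numbers."""
    matches = []
    row = 0
    while bitmap:
        if bitmap & 1:
            matches.append(row)
        bitmap >>= 1
        row += 1
    return matches


# Hypothetical binned simulation output.
records = [
    {"phase": "gas", "temp_bin": "hot"},
    {"phase": "liquid", "temp_bin": "hot"},
    {"phase": "liquid", "temp_bin": "cold"},
    {"phase": "liquid", "temp_bin": "hot"},
]
index = BitmapIndex(records, ["phase", "temp_bin"])

# Conjunctive query: phase == "liquid" AND temp_bin == "hot".
hot_liquid = rows(index.equals("phase", "liquid") & index.equals("temp_bin", "hot"))
print(hot_liquid)
```

Adding a searchable attribute adds bitmaps without changing the query path, which is why this layout copes well with datasets that have hundreds of attributes. Continuous values would first need to be binned, as the `temp_bin` attribute here assumes.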

This posting is part of a series summarizing the roadmap document of the Exascale Software Project: