Research: finding a way to advance machine learning techniques for HPC optimizations

Goal

We are building a highly efficient adaptation of the MPDATA algorithm for a powerful hybrid (CPU+GPU) node provided by our partner, Megware. The goal is to find the best configuration of the MPDATA algorithm with respect to the user-selected criteria: execution time and energy consumption.
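The goal leaves open how one configuration is chosen once both criteria are known. The sketch below assumes that execution time and energy have already been measured for each candidate configuration. The `Measurement` type, the `best_by` and `pareto_front` functions, and the energy-delay product (EDP) option are illustrative assumptions and are not part of our code. The sketch either minimizes a single criterion chosen by the user or returns the Pareto front. The Pareto front is the set of configurations that no other configuration matches or beats in both time and energy while being strictly better in at least one of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    config: str      # hypothetical label of one MPDATA configuration
    time_s: float    # execution time in seconds
    energy_j: float  # energy consumption in joules


# Criteria a user might pick; "edp" (energy-delay product) is an assumption.
CRITERIA = {
    "time": lambda m: m.time_s,
    "energy": lambda m: m.energy_j,
    "edp": lambda m: m.energy_j * m.time_s,
}


def best_by(measurements, criterion):
    """Return the measurement that minimizes the chosen criterion."""
    return min(measurements, key=CRITERIA[criterion])


def pareto_front(measurements):
    """Return the measurements that are not dominated in both time and energy."""
    def dominates(a, b):
        return (a.time_s <= b.time_s and a.energy_j <= b.energy_j
                and (a.time_s < b.time_s or a.energy_j < b.energy_j))
    return [m for m in measurements
            if not any(dominates(o, m) for o in measurements)]


# Toy values for illustration only, not measurements.
runs = [
    Measurement("A", 10.0, 900.0),
    Measurement("B", 12.0, 700.0),
    Measurement("C", 13.0, 950.0),
]
fastest = best_by(runs, "time")      # A
greenest = best_by(runs, "energy")   # B
front = pareto_front(runs)           # A and B; C is dominated by A
```

Reporting the Pareto front instead of a single winner lets the user choose a trade-off point after the search. A single criterion reduces the result to one configuration.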

Search space

  • MPDATA (Multidimensional Positive Definite Advection Transport Algorithm) is a finite-difference solver for geophysical applications. It belongs to a class of methods for the numerical simulation of fluid flows that are based on the sign-preserving property of upstream differencing.
  • Our MPDATA implementation is highly parameterized and can be configured in more than 1 billion ways. Example parameters include the number of nodes, accelerators per node, CPU cores, processor/node topology, memory policy, memory alignment, buffering type, number of streams, and many others.
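The following minimal 1D sketch makes the upstream-differencing idea concrete. It implements the basic MPDATA scheme for a nonnegative field on a periodic grid with a constant Courant number |C| <= 1. The scheme takes a donor-cell (upwind) pass and then applies one corrective upwind pass driven by antidiffusive Courant numbers. This is a textbook illustration only and not our multidimensional CPU+GPU implementation. The function names are hypothetical.

```python
import math


def donor_cell_flux(psi_left, psi_right, courant):
    """Upstream (donor-cell) flux through one cell face."""
    return max(courant, 0.0) * psi_left + min(courant, 0.0) * psi_right


def upwind_step(psi, courant_faces):
    """One flux-form upwind step on a periodic 1D grid.

    courant_faces[i] is the Courant number at face i+1/2 (between cells i and i+1).
    """
    n = len(psi)
    flux = [donor_cell_flux(psi[i], psi[(i + 1) % n], courant_faces[i])
            for i in range(n)]
    # flux[i - 1] is the flux through face i-1/2; flux[-1] wraps around for i = 0.
    return [psi[i] - (flux[i] - flux[i - 1]) for i in range(n)]


def mpdata_step(psi, courant, eps=1e-15):
    """Basic MPDATA step: upwind pass plus one antidiffusive corrective pass.

    Assumes psi >= 0 and a constant Courant number with |courant| <= 1.
    """
    n = len(psi)
    first = upwind_step(psi, [courant] * n)
    # Antidiffusive Courant numbers that compensate the upwind truncation error.
    anti = [(abs(courant) - courant ** 2)
            * (first[(i + 1) % n] - first[i])
            / (first[(i + 1) % n] + first[i] + eps)
            for i in range(n)]
    return upwind_step(first, anti)


# Usage: advect a Gaussian pulse for 50 steps at Courant number 0.5.
psi0 = [math.exp(-((i - 20) / 4.0) ** 2) for i in range(100)]
upwind, mpdata = psi0, psi0
for _ in range(50):
    upwind = upwind_step(upwind, [0.5] * len(upwind))
    mpdata = mpdata_step(mpdata, 0.5)
```

Both passes are written in flux form, so total mass is conserved. The corrective Courant numbers satisfy |C~| <= |C| - C^2 <= 1/4, which keeps the second upwind pass sign-preserving. The correction also keeps the pulse peak noticeably higher than plain upwind differencing does.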
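The search space grows quickly because its size is the product of the sizes of the individual parameter domains. The domains below are hypothetical placeholders and are not the actual value sets of our code. Seven such parameters already give 4·5·28·3·5·3·5 = 126,000 combinations, and each additional parameter multiplies that count. The generator enumerates configurations lazily, so the full space never has to be held in memory.

```python
from itertools import islice, product
from math import prod

# Hypothetical parameter domains for illustration; the real code has more
# parameters and different value sets.
SPACE = {
    "nodes": [1, 2, 4, 8],
    "gpus_per_node": [0, 1, 2, 3, 4],
    "cpu_cores": list(range(1, 29)),  # 1..28, i.e. up to 2 x 14 cores
    "memory_policy": ["local", "interleave", "preferred"],
    "alignment_bytes": [64, 128, 256, 512, 4096],
    "buffering": ["single", "double", "triple"],
    "streams": [1, 2, 4, 8, 16],
}


def space_size(space):
    """Number of configurations = product of the domain sizes."""
    return prod(len(values) for values in space.values())


def configurations(space):
    """Lazily yield every configuration as a dict (last parameter varies fastest)."""
    keys = list(space)
    for values in product(*space.values()):
        yield dict(zip(keys, values))


total = space_size(SPACE)                              # 126000
first_three = list(islice(configurations(SPACE), 3))
```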

Research

  • We are developing a machine learning module that selects the best-fitting configuration. Among other techniques, it uses supervised learning with the random forest algorithm. Its main function is to prune the search space by eliminating the worst configurations. This yields a small set of 100-250 configurations that contains the best configuration with 90% probability.
  • The adaptation targets a node equipped with 2x Intel Xeon CPU E5-2680 v4 and 4x NVIDIA Tesla P100-PCIE GPUs. Each CPU has 14 cores. Each of the four Tesla P100 GPUs has 3584 CUDA cores and 16 GB of CoWoS HBM2 stacked memory with 732 GB/s memory bandwidth. The peak double-precision performance of a single GPU is 4.7 TFLOPS.
  • To exploit the full power of the node, we use hybrid code that combines OpenMP and CUDA.
  • To make the adaptation efficient, we use machine learning methods that let us calibrate the code while taking into account the features of the CPU+GPU architecture.
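The pruning step can be sketched as a minimal, dependency-free random forest regressor. Each tree is trained on a bootstrap sample and considers a random feature subset at each split. The sketch assumes that configurations are encoded as numeric feature vectors and that the forest predicts execution time from a sample of measured runs. Candidates are then ranked by predicted time, and only the best `keep` of them are retained for actual measurement. This is an illustration and not our module. All names, hyperparameters, and the toy cost surface are hypothetical.

```python
import random
from statistics import fmean


def _sse(ys):
    """Sum of squared deviations from the mean (regression impurity)."""
    m = fmean(ys)
    return sum((y - m) ** 2 for y in ys)


def _build_tree(X, y, depth, max_depth, n_feats, rng, min_leaf):
    """Grow a regression tree: leaves are floats, inner nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    if depth == max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return fmean(y)
    best = None
    # Random forests consider only a random subset of features at each split.
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None:
        return fmean(y)
    _, f, t, left, right = best

    def grow(idx):
        return _build_tree([X[i] for i in idx], [y[i] for i in idx],
                           depth + 1, max_depth, n_feats, rng, min_leaf)

    return (f, t, grow(left), grow(right))


def _predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=8, n_feats=1, min_leaf=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  0, max_depth, n_feats, rng, min_leaf))
    return forest


def predict(forest, x):
    """Forest prediction = mean of the individual tree predictions."""
    return fmean(_predict_tree(tree, x) for tree in forest)


def prune(forest, candidates, keep):
    """Keep the `keep` candidates with the lowest predicted execution time."""
    return sorted(candidates, key=lambda x: predict(forest, x))[:keep]


# Toy stand-in for measured run times (NOT MPDATA data): two integer-encoded
# parameters and a made-up cost surface.
grid = [(a, b) for a in range(10) for b in range(10)]
toy_time = {x: x[0] + 2 * x[1] for x in grid}
train = random.Random(1).sample(grid, 60)
forest = fit_forest(train, [toy_time[x] for x in train])
shortlist = prune(forest, grid, keep=30)
```

A figure such as the 90% above would be estimated empirically. For example, one could measure the whole shortlist on many problem instances and count how often the truly best configuration survives pruning.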
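The GPU figures above give a quick estimate of the node's aggregate GPU resources and of the machine balance of a single P100. The CPU peak is omitted because it is not stated here. These are peak values derived only from the listed numbers, and sustained performance will be lower.

```latex
\begin{aligned}
P_{\text{GPU}}^{\text{node}} &= 4 \times 4.7~\text{TFLOP/s} = 18.8~\text{TFLOP/s} \\
B_{\text{GPU}}^{\text{node}} &= 4 \times 732~\text{GB/s} = 2928~\text{GB/s} \\
M_{\text{GPU}}^{\text{node}} &= 4 \times 16~\text{GB} = 64~\text{GB} \\
\frac{P_{\text{GPU}}}{B_{\text{GPU}}} &= \frac{4.7 \times 10^{12}~\text{FLOP/s}}{732 \times 10^{9}~\text{B/s}} \approx 6.4~\text{FLOP/B}
\end{aligned}
```

In the roofline model, a kernel whose arithmetic intensity falls below roughly 6.4 FLOP per byte of HBM2 traffic is limited by memory bandwidth rather than by the double-precision peak.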

The implementation and performance results of this work will be described thoroughly in our journal paper, working title: "Performance-Energy tradeoff for CPU-GPU simulations".