
The MSA architecture provides an unprecedented level of flexibility, efficiency and performance by combining modules with different characteristics. Moreover, some modules can themselves be heterogeneous, combining different compute, memory and network devices on the same node. These two levels of intra- and inter-node heterogeneity are hard to leverage with programming models that rely only on traditional fork-join and/or Single Program Multiple Data (SPMD) execution models.

In the DEEP projects, we have developed a hybrid programming model that leverages the OmpSs-2 dataflow execution model to orchestrate computations and memory transfers between multi-cores and accelerators, as well as inter-node communications using MPI.


CUDA Support

We have extended OmpSs-2 to support CUDA C kernels that can be invoked like regular tasks, easing the development of hybrid applications. The synchronization of CUDA C kernels with other tasks is transparently managed by the runtime, and the memory transfers are handled by the hardware if CUDA Unified Memory (UM) is used. Otherwise, the OmpSs-2 runtime uses a software directory and cache to explicitly manage the memory copies between the host and the accelerator, and vice versa. CUDA-enabled libraries such as cuBLAS or cuFFT are also supported.


Task-Aware MPI

A clean integration of OmpSs-2 with ParaStation MPI has been achieved using the Task-Aware MPI (TAMPI) library. This library improves the interoperability between task-based programming models and MPI by allowing the use of both blocking and non-blocking MPI operations inside tasks. On the one hand, the library avoids potential deadlocks caused by blocking MPI primitives. On the other hand, non-blocking primitives are directly integrated into the dataflow model, linking the release of the dependencies of a given task to the completion of all non-blocking MPI operations that have been executed inside it.


Multi-Core Support

OmpSs-2 has been extended to support a nested dataflow execution model that relies on fine-grained synchronizations between task nesting levels to unveil additional parallelism. Additionally, the dataflow execution model has been extended to support task reductions, including scalar and array types. Finally, the OmpSs-2 tasking model has been extended to support malleable tasks, which can be executed concurrently by several cores.