Doctoral thesis defense of Juan Antonio Rico, supervised by Juan Carlos Díaz of the GIM research group - Friday, January 29

Venue:  Salón de Actos, Edificio de Institutos Universitarios de Investigación

 

Date:  January 29, 2016, 12:00.

 

Supervisor:  Juan Carlos Díaz Martín

 

Committee:

Jesús Carretero Pérez (Universidad Carlos III)

José Daniel García Sánchez (Universidad Carlos III)

Alexey L. Lastovetsky (University College Dublin)

Antonio Plaza Miguel (Universidad de Extremadura)

José Luis González Sánchez (Centro de Supercomputación de Extremadura)

 

Author:

Juan Antonio Rico Gallego

Departamento de Ingeniería de Sistemas Informáticos y Telemáticos.

 

Title:

τ–Lop: Scalably and Accurately Modeling Contention and Mapping Effects in Multi-core Clusters

Abstract:

Modern HPC multi-core platforms are complex systems composed of heterogeneous processors and a hierarchy of shared communication channels. Achieving optimal performance of MPI applications on these platforms is not trivial. Formal analysis using parallel performance models helps describe the behavior and communication complexity of algorithms, with the goal of predicting their cost and improving their performance.

Currently accepted communication models, such as the representative LogGP, were initially conceived to predict the cost of algorithms on mono-processor clusters as a sequence of point-to-point transmissions characterized by network latency and bandwidth parameters. Although multiple extensions have been proposed to cover issues arising from the complexity of current platforms, such as contention and channel hierarchy, these specific extensions are not enough to meaningfully and accurately model anything beyond simple algorithms. As modern supercomputers are built upon cheap commodity boards with a growing number of cores, intra-node communication becomes progressively more relevant, as does the resulting contention in the communication channels. These heterogeneous high performance computing platforms need new approaches to communication performance modeling that address their complexities.
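As an illustration of the point-to-point view described above, the following minimal sketch uses the standard LogGP formulation, in which a k-byte message costs o + (k - 1)G + L + o. It is not taken from the thesis; the class and function names and the parameter values are hypothetical, chosen only for the example. The cost of an algorithm is approximated as a chain of such transmissions, with no term for contention on shared channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    """LogGP parameters, in seconds (values used below are illustrative)."""
    L: float  # network latency
    o: float  # per-message processor overhead (paid by sender and receiver)
    g: float  # minimum gap between consecutive messages
    G: float  # gap per byte (inverse of bandwidth for long messages)


def p2p_cost(p: LogGPParams, k: int) -> float:
    """Cost of one k-byte point-to-point message: o + (k - 1)G + L + o."""
    if k < 1:
        raise ValueError("message size must be at least 1 byte")
    return p.o + (k - 1) * p.G + p.L + p.o


def chain_cost(p: LogGPParams, k: int, n_transfers: int) -> float:
    """Cost of n dependent transmissions executed one after another.

    Assumption for this sketch: each transfer starts only after the
    previous one completes, and transfers never share a channel.
    """
    return n_transfers * p2p_cost(p, k)


if __name__ == "__main__":
    # Hypothetical parameter values, not measurements.
    params = LogGPParams(L=1e-6, o=5e-7, g=1e-6, G=1e-9)
    print(p2p_cost(params, 1001))
    print(chain_cost(params, 1001, 4))
```

Because every transfer is costed in isolation, such a sketch has no way to express several processes in the same node competing for one channel, which is the limitation the paragraph above points out.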

This work identifies the reasons for the poor fit of the cited representative models in this domain and proposes a new model named τ–Lop, which addresses the challenge of accurately modeling MPI communications on heterogeneous multi-core clusters. τ–Lop is based on the concept of concurrent transfers and applies it to meaningfully represent the behavior of algorithms on platforms with hierarchical shared communication channels, taking into account the effects of contention and of the mapping of processes to processors. It demonstrates the ability to predict, with high accuracy, the cost of advanced algorithms and communication mechanisms used by mainstream MPI implementations such as MPICH or Open MPI. In addition, an exhaustive and reproducible methodology for measuring the parameters of the model is described.