Contract type: Fixed-term contract
Level of qualifications required: PhD or equivalent
Position: Post-Doctoral Research Visit
Grenoble Rhône-Alpes Research Center brings together just under 800 people in 39 research teams and 8 research support departments.
Staff are located on 5 campuses in Grenoble and Lyon, working in close collaboration with laboratories, research and higher-education institutions in Grenoble and Lyon, as well as with economic players in these areas.
Active in software, high-performance computing, the Internet of Things, image and data, as well as simulation in oceanography and biology, the center contributes at the highest level to international scientific achievements and collaborations in Europe and the rest of the world.
The research context for this work is the Joint Laboratory for Extreme Scale Computing (JLESC) (https://jlesc.github.io/), and more particularly the cooperation between Inria and ANL (https://jlesc.github.io/projects/energy_autonomic/).
The behaviour of HPC systems (performance, power consumption, thermal distribution) is increasingly hard to predict due to manufacturing process variations and dynamic voltage management in processors (dark silicon), which conflicts with conventional bulk-parallel software models such as MPI. Additionally, computing facilities are now interested in allocating computing resources by power or energy budget rather than by CPU time, a goal undermined by the unpredictable behaviour of modern HPC systems. Hardware-level power-limiting capability is becoming a standard feature in modern processors and accelerators. Intel Running Average Power Limit (RAPL), for example, periodically adjusts CPU core frequency (and CPU states if needed) so that CPU power consumption does not exceed a user-specified per-socket power budget. While RAPL is an excellent mechanism for independent workloads, its use in HPC raises several concerns: variation in performance across sockets/nodes for the same power budget, synchronizing and balancing control across nodes involved in the same computation, and balancing power between compute elements of the same node. The effective range of user-specifiable power budgets can also be limited, with issues at extreme values (e.g., a dip in efficiency at low power).
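To make the RAPL mechanism more concrete: on Linux, the `intel_rapl` powercap driver exposes each package domain under `/sys/class/powercap/intel-rapl:N/`, with a cumulative energy counter (`energy_uj`), its wraparound range (`max_energy_range_uj`) and the user-set long-term cap (`constraint_0_power_limit_uw`). The sketch below assumes that sysfs layout and read access (which may require root on recent kernels); it derives average power from two energy readings, the kind of sensor a power-capping control loop would consume. The helper names are hypothetical and not part of any Inria or ANL tool.

```python
import time
from pathlib import Path

# Assumed location of the first RAPL package domain exposed by the Linux
# powercap driver; adjust for your machine.
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(domain: Path, name: str) -> int:
    """Read an integer sysfs attribute (microjoules or microwatts)."""
    return int((domain / name).read_text().strip())


def average_power_watts(e0_uj: int, e1_uj: int, dt_s: float, max_range_uj: int) -> float:
    """Average power between two energy_uj samples, handling counter wraparound."""
    delta = e1_uj - e0_uj
    if delta < 0:  # the counter wrapped past max_energy_range_uj
        delta += max_range_uj
    return delta / 1e6 / dt_s


def sample_power(domain: Path = RAPL_DOMAIN, interval_s: float = 1.0) -> float:
    """Measure average package power (W) over interval_s seconds."""
    max_range = read_uj(domain, "max_energy_range_uj")
    e0, t0 = read_uj(domain, "energy_uj"), time.monotonic()
    time.sleep(interval_s)
    e1, t1 = read_uj(domain, "energy_uj"), time.monotonic()
    return average_power_watts(e0, e1, t1 - t0, max_range)


def power_limit_watts(domain: Path = RAPL_DOMAIN) -> float:
    """Current long-term power cap (constraint 0) in watts."""
    return read_uj(domain, "constraint_0_power_limit_uw") / 1e6
```

On a real node, `sample_power()` returns the package power RAPL is regulating, which can be compared against `power_limit_watts()` to observe how closely the hardware tracks the requested budget.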
The work will contribute a software-level solution providing additional control mechanisms that augment hardware power-limiting features and work across multiple nodes, by applying Autonomic Computing techniques to HPC workloads.
The adopted approach uses Control Theory to identify significant sensors and actuators (e.g., switching cores on/off, changing CPU frequency), build adaptive models of the process to be controlled, and define robust adaptive control objectives with respect to appropriate metrics.
The objective is then to design and experimentally validate controllers that regulate consumption while ensuring performance. Variants beyond simple threshold-based control will be explored, e.g., predictive control to avoid overshooting or adaptive control for robustness to natural variability, while considering the costs of actions, avoiding oscillations and over-reaction, handling multiple criteria, and coordinating multiple autonomic loops. Previous experience in Cloud-oriented Autonomic Computing will be generalized, while exploring the novel issues raised by adapting such approaches to the specificities of HPC and power management.
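As an illustration of the step from threshold rules to control-theoretic design, the sketch below closes a feedback loop with a plain proportional-integral (PI) controller. It is a swapped-in textbook technique, not the controllers planned in this project, and the plant is a hypothetical first-order model: power moves halfway each step towards 20 W + 40 W·u, with u in [0.5, 2.0] standing in for a normalized CPU frequency. The velocity (incremental) form clamps the actuator directly, so the integral action cannot wind up while saturated.

```python
from dataclasses import dataclass


@dataclass
class IncrementalPI:
    """Velocity-form PI controller: u_k = u_{k-1} + kp*(e_k - e_{k-1}) + ki*e_k.

    Clamping u itself means integral action cannot wind up while the
    actuator is saturated."""
    kp: float
    ki: float
    u_min: float
    u_max: float
    u: float                 # current actuator value (e.g. normalized frequency)
    prev_error: float = 0.0

    def step(self, setpoint: float, measurement: float) -> float:
        error = setpoint - measurement
        self.u += self.kp * (error - self.prev_error) + self.ki * error
        self.u = min(max(self.u, self.u_min), self.u_max)
        self.prev_error = error
        return self.u


def simulate(setpoint_w: float, steps: int = 50, power_w: float = 60.0) -> list[float]:
    """Run the loop against the hypothetical toy plant; returns power per step."""
    ctrl = IncrementalPI(kp=0.025, ki=0.025, u_min=0.5, u_max=2.0,
                         u=(power_w - 20.0) / 40.0)  # start at the plant's equilibrium
    history = []
    for _ in range(steps):
        u = ctrl.step(setpoint_w, power_w)                  # sensor reading -> actuator
        power_w = 0.5 * power_w + 0.5 * (20.0 + 40.0 * u)   # first-order lag
        history.append(power_w)
    return history
```

For this particular toy plant, kp = ki = 0.025 place the closed-loop poles at 0 and 0.5, so `simulate(50.0)` settles on 50 W without oscillating, while an unreachable setpoint such as 120 W leaves the actuator pinned at u_max and power at 100 W. Those gains do not transfer to real nodes, whose response varies across sockets and workloads; that variability is what motivates the adaptive and predictive variants described above.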
The work will benefit from preliminary experience at ANL on instrumentation of HPC applications, and from an existing infrastructure for runtime adaptation of HPC systems on Intel processors. This infrastructure (called Argo Node Resource Manager, NRM) also offers the ability to offload control to a remote Jupyter notebook, allowing collaborators to independently design and implement their own control algorithms using a simple interface and a high-level language. This infrastructure can be deployed on a wide range of servers and easily extended with new sensors and actuators.
Short-term plans are to finalize the design and implementation of some of the controllers, in order to integrate them into the NRM platform for experimental evaluation. The plan is then to consider more elaborate control problems, taking into account system variables other than power (e.g., thermal aspects), and more elaborate control techniques, to obtain controllers that are more robust or make more efficient use of the system.
In the longer term, a feedback-loop approach for HPC will be considered at a higher level than the processor, starting with complex node topologies that include multiple independently controlled compute elements (processors and accelerators). This will lead to exploring the coordination of multiple management feedback loops, and opportunities for combining control-based methods with scheduling-based techniques for the general problem of runtime allocation and placement.
Research experience in one or more of the following:
- High-Performance Computing
- Self-adaptive systems and Autonomic Computing
- Control of Computing Systems
Gross salary: €2,653 per month (before income tax).
An interest in cooperating with researchers in other teams and with expertise in other domains, as well as in development and experimental work, is particularly relevant to this position.
Inria is the French national research institute dedicated to digital science and technology. It employs 2,600 people. Its 200 agile project teams, generally run jointly with academic partners, include more than 3,500 scientists and engineers working to meet the challenges of digital technology, often at the interface with other disciplines. The Institute also employs numerous talents in over forty different professions. 900 research support staff contribute to the preparation and development of scientific and entrepreneurial projects that have a worldwide impact.
Defence Security:
This position is likely to be situated in a restricted area (ZRR), as defined in Decree No. 2011-1425 relating to the protection of national scientific and technical potential (PPST). Authorisation to enter an area is granted by the director of the unit, following a favourable Ministerial decision, as defined in the decree of 3 July 2012 relating to the PPST. An unfavourable Ministerial decision in respect of a position situated in a ZRR would result in the cancellation of the appointment.
Recruitment Policy:
As part of its diversity policy, all Inria positions are accessible to people with disabilities.
Warning: you must enter your e-mail address in order to save your application to Inria. Applications must be submitted online on the Inria website. Processing of applications sent through other channels is not guaranteed.
| Title | 2020-02816 - Post-Doctoral Research Visit F/M Improving the Performance and Energy Efficiency of HPC Applications Using Autonomic Computing Techniques |
| Job location | 655 Avenue de l’Europe - CS 90051, 38334 Montbonnot CEDEX |
| Published | July 8, 2020 |
| Application deadline | September 7, 2020 |
| Job type | Postdoc |
| Research fields | Computer architecture; Computing in mathematics, natural science, engineering and medicine; Distributed computing; Information systems (business informatics); Programming languages; Software engineering; Control systems engineering; Computer engineering |