Presentation

Job Grouping-Based Intelligent Resource Recommendation Framework
Description

In current large-scale computing systems, users from diverse scientific backgrounds submit batch jobs, each with a set of requested resources. Manual resource selection at HPC facilities leads either to early job terminations and out-of-memory errors when resources are underestimated, or to idle compute and memory resources when they are overallocated. In this work, we provide a recommendation framework based on job grouping and intelligent prediction methods that estimates the resource needs of HPC applications before the jobs are submitted to the system. Our framework underpredicts resource requests in fewer than 2% of cases and produces fewer overestimations than the baseline methods. We also implement a module for deploying the framework on a real HPC system; this deployment forms the basis of our future work.
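The abstract does not specify how jobs are grouped or which predictor is used. The sketch below is a hypothetical illustration, not the authors' method, of the two quantities the evaluation is about. It groups past jobs by an assumed key (for example "user:application") and recommends the group's largest observed peak memory plus an assumed 10% safety margin. It then measures the underprediction rate (the recommendation falls below actual usage, so the job would run out of memory) and the mean overallocation (idle memory) on held-out jobs. The names `Job`, `build_recommender`, `evaluate`, `margin`, and `fallback_gb` are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    group: str          # hypothetical grouping key, e.g. "user:app"
    used_mem_gb: float  # peak memory the job actually used


def build_recommender(history, margin=0.1, fallback_gb=64.0):
    """Map a group key to a memory recommendation in GB.

    Assumption: recommend the largest peak usage seen in the group,
    padded by a safety margin; groups with no history get a fallback.
    """
    peaks = defaultdict(float)
    for job in history:
        peaks[job.group] = max(peaks[job.group], job.used_mem_gb)

    def recommend(group):
        if group in peaks:
            return peaks[group] * (1.0 + margin)
        return fallback_gb

    return recommend


def evaluate(recommend, jobs):
    """Return (underprediction rate, mean overallocation in GB).

    Overallocation is averaged only over jobs that were not underpredicted.
    """
    if not jobs:
        return 0.0, 0.0
    under = 0
    over_total = 0.0
    for job in jobs:
        rec = recommend(job.group)
        if rec < job.used_mem_gb:
            under += 1  # job would hit an out-of-memory error
        else:
            over_total += rec - job.used_mem_gb  # memory left idle
    ok = len(jobs) - under
    return under / len(jobs), (over_total / ok if ok else 0.0)


history = [Job("alice:lammps", 10.0), Job("alice:lammps", 12.0), Job("bob:vasp", 40.0)]
held_out = [Job("alice:lammps", 11.0), Job("bob:vasp", 45.0)]
recommend = build_recommender(history)
print(evaluate(recommend, held_out))
```

Raising `margin` lowers the underprediction rate at the cost of more idle memory. That trade-off is the one the abstract's two metrics (fewer than 2% underpredictions, fewer overestimations than the baselines) are meant to capture.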