Presentation

Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling
Description
High-Performance Computing (HPC) job scheduling involves balancing conflicting objectives such as minimizing makespan, reducing wait times, optimizing resource use, and ensuring fairness. Heuristic methods such as FCFS and SJF, as well as computationally intensive optimization techniques, often lack adaptability to dynamic workloads and cannot optimize multiple objectives simultaneously in HPC systems. We propose a novel LLM-based scheduler built on a ReAct-style framework, enabling iterative, interpretable decision-making. It incorporates a scratchpad memory to track scheduling history and refine decisions via natural-language feedback, while a constraint enforcement module ensures feasibility and safety.
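The loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `propose` step stands in for the LLM's reason-and-act call (here replaced by a simple shortest-job-first stub), and all class and field names (`Job`, `Scheduler`, `scratchpad`, `feasible`) are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Job:
    job_id: str
    nodes: int     # nodes requested
    runtime: int   # estimated runtime in minutes

@dataclass
class Scheduler:
    """Sketch of a ReAct-style scheduling loop with scratchpad and constraint check."""
    total_nodes: int
    scratchpad: list = field(default_factory=list)  # natural-language decision history

    def propose(self, queue):
        # Stand-in for the LLM reasoning step: a shortest-job-first
        # heuristic plays the role of the model's proposed action.
        return min(queue, key=lambda j: j.runtime)

    def feasible(self, job, free_nodes):
        # Constraint-enforcement module: reject infeasible actions.
        return job.nodes <= free_nodes

    def step(self, queue, free_nodes):
        # One reason-act-observe iteration: propose, enforce constraints,
        # and record the outcome in the scratchpad as feedback.
        candidate = self.propose(queue)
        if self.feasible(candidate, free_nodes):
            self.scratchpad.append(f"scheduled {candidate.job_id}")
            queue.remove(candidate)
            return candidate
        self.scratchpad.append(
            f"rejected {candidate.job_id}: needs {candidate.nodes} > {free_nodes} free"
        )
        return None

sched = Scheduler(total_nodes=8)
queue = [Job("a", nodes=4, runtime=30), Job("b", nodes=2, runtime=10)]
picked = sched.step(queue, free_nodes=8)   # picks "b" (shortest runtime)
```

In the real system the scratchpad entries would be fed back into the model's context on the next iteration, letting it revise its strategy in natural language rather than through a fixed heuristic.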
We evaluate our approach using OpenAI's o4-mini and Anthropic's Claude 3.7 across seven workload scenarios, including heterogeneous mixes and bursty patterns. The comparison reveals that LLM-based scheduling effectively balances multiple objectives while offering transparent reasoning through natural-language traces. The method excels at constraint satisfaction and adapts to diverse workloads without domain-specific training. However, a trade-off between reasoning quality and computational overhead challenges real-time deployment.