Presentation

Compute System Simulator: Modeling the Impact of Allocation Policy and Hardware Reliability on HPC Cloud Resource Utilization
Description

We have developed a simulation tool that models the launch, progression, and completion of virtual machines and their workloads within a cloud cluster of arbitrary size. The simulator applies a configurable policy to allocate computational resources to these virtual machines, simulates hardware failures and the workload interruptions they cause, and allocates replacement resources as needed. The primary goal of this work is to test how allocation policy design interacts with various types of hardware failures, and to analyze the expected resource utilization and workload delay in each scenario. The simulator's modular design provides a framework for implementing and analyzing new allocation policies as they emerge. Through a series of experiments, we use the simulator to compare how effectively different policies manage resource allocation when hardware fails, which informs the optimization of cloud infrastructure and the development of resilient resource management strategies.
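The loop described above (inject failures, interrupt affected workloads, reallocate, make progress, then measure utilization and delay) can be illustrated with a minimal sketch. This is not the presented simulator: every name here (`VM`, `first_fit`, `random_fit`, `simulate`) and every parameter value is hypothetical, and the model makes simplifying assumptions that the abstract does not state, namely one VM per node, independent per-step node failures with a fixed repair time, and interrupted VMs that keep their progress.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VM:
    work_left: int              # time steps of compute still needed
    node: Optional[int] = None  # hosting node index, None while waiting
    waited: int = 0             # steps spent without a node (workload delay)


def first_fit(free: List[int], rng: random.Random) -> int:
    """Hypothetical policy: take the lowest-numbered free node."""
    return free[0]


def random_fit(free: List[int], rng: random.Random) -> int:
    """Hypothetical policy: take a uniformly random free node."""
    return rng.choice(free)


def simulate(policy: Callable[[List[int], random.Random], int],
             n_nodes: int = 8, n_vms: int = 6, work: int = 20,
             p_fail: float = 0.05, repair: int = 3,
             seed: int = 0, max_steps: int = 10_000) -> dict:
    rng = random.Random(seed)
    down_until = [0] * n_nodes  # step at which each node is healthy again
    vms = [VM(work) for _ in range(n_vms)]
    busy_steps = 0
    step = 0
    while any(vm.work_left > 0 for vm in vms) and step < max_steps:
        # 1. Inject failures: each healthy node fails with probability p_fail.
        for node in range(n_nodes):
            if down_until[node] <= step and rng.random() < p_fail:
                down_until[node] = step + repair
        # 2. Interrupt VMs on failed nodes (assumption: progress is kept).
        for vm in vms:
            if vm.node is not None and down_until[vm.node] > step:
                vm.node = None
        # 3. Reallocate waiting VMs onto healthy, unoccupied nodes.
        used = {vm.node for vm in vms if vm.node is not None}
        free = [n for n in range(n_nodes)
                if down_until[n] <= step and n not in used]
        for vm in vms:
            if vm.work_left > 0 and vm.node is None:
                if free:
                    vm.node = policy(free, rng)
                    free.remove(vm.node)
                else:
                    vm.waited += 1
        # 4. Advance every running VM by one step; release finished VMs.
        for vm in vms:
            if vm.work_left > 0 and vm.node is not None:
                vm.work_left -= 1
                busy_steps += 1
                if vm.work_left == 0:
                    vm.node = None
        step += 1
    return {
        "steps": step,
        "utilization": busy_steps / (step * n_nodes) if step else 0.0,
        "total_delay": sum(vm.waited for vm in vms),
    }


if __name__ == "__main__":
    for name, policy in [("first_fit", first_fit), ("random_fit", random_fit)]:
        print(name, simulate(policy))
```

Because a policy is just a function passed to `simulate`, a new policy can be compared against existing ones without touching the failure model, which loosely mirrors the modularity the abstract describes. In this toy model all nodes are identical, so the two policies differ very little; a more realistic failure model (for example, correlated or node-specific failure rates) would be needed before the comparison says anything meaningful.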