Presentation
PhySiViT: A Physics Simulation Vision Transformer
Description
Modern scientific computing generates massive simulation data across physics domains, yet researchers lack general-purpose tools for efficient analysis. While vision transformers like CLIP and DINO have revolutionized natural image analysis, no equivalent exists for physics simulation data. This project trains a custom vision transformer on "the Well" dataset, a 15 TB collection of diverse physics simulations. Using only 7 million images (compared to >100 million for CLIP/DINOv2), we trained our physics foundation model in 22 hours on a single Cerebras CS-3 server. Despite the reduced training scale, our model demonstrates competitive classification performance while excelling at physics-specific tasks: temporal forecasting (R² = 0.33 vs. DINOv2's 0.23) and physics clustering (silhouette score = 0.232 vs. DINOv2's 0.195). This work demonstrates that efficient, domain-focused foundation models can outperform general-purpose models in specialized scientific domains.
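The two evaluation metrics cited above can be reproduced on any embedding model with standard tooling. Below is a minimal, hypothetical sketch using scikit-learn: silhouette score for clustering quality and an R²-scored linear probe for one-step temporal forecasting. The data, dimensions, and probe choice here are illustrative assumptions, not the authors' actual protocol.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score, silhouette_score

rng = np.random.default_rng(0)

# Stand-in for per-frame embeddings from a vision transformer:
# 200 frames, 64-dim features, each labeled by physics domain (4 classes).
# Real embeddings would come from the trained model's encoder.
emb = rng.normal(size=(200, 64))
labels = rng.integers(0, 4, size=200)
emb += labels[:, None] * 0.5  # inject some per-class structure

# Physics clustering: silhouette score of embeddings against domain labels
# (ranges from -1 to 1; higher means tighter, better-separated clusters).
sil = silhouette_score(emb, labels)

# Temporal forecasting probe: predict the next frame's embedding from the
# current one with ridge regression, scored by R^2 on held-out pairs.
x, y = emb[:-1], emb[1:]
probe = Ridge(alpha=1.0).fit(x[:150], y[:150])
r2 = r2_score(y[150:], probe.predict(x[150:]))
print(f"silhouette={sil:.3f}, forecast R2={r2:.3f}")
```

A frozen-backbone linear probe like this is a common way to compare foundation-model representations without fine-tuning, which matches how such metrics are typically reported.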

Event Type
Research and ACM SRC Posters
Time
Thursday, 20 November 2025, 8:00am - 5:00pm CST
Location
Second Floor Atrium