Presentation

Leveraging Large Language Models for Property Prediction in Polymorphic Organic Semiconductors
Description

Organic semiconductors (OSCs) are promising candidates for next-generation electronics, but polymorphism complicates accurate property prediction and makes traditional methods costly. We investigate transformer-based large language models (LLMs) for predicting energy gaps in polymorphic OSC crystals. A Pegasus-managed workflow is deployed across heterogeneous hardware (PSC Bridges-2 and the Neocortex Cerebras CS-2) to evaluate three crystal text encodings (Materials String, SLICES, and SLICES-PLUS) against a baseline XGBoost regressor. The results show that the LLM using the Materials String encoding achieves the highest accuracy, particularly on polymorph-rich datasets, outperforming the other representations in both pretraining efficiency and downstream tasks, as well as the XGBoost baseline. These findings highlight the potential of LLM-driven crystal encodings to accelerate materials discovery and enable scalable, data-driven design of organic semiconductors.
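The evaluation described above, in which a crystal is serialized to a text string and used to predict an energy gap, can be sketched in miniature. This is not the authors' code: the bag-of-tokens features and nearest-neighbour regressor below are simple stand-ins for the real featurization and XGBoost baseline, and the crystal strings and gap values are invented for illustration.

```python
# Illustrative sketch: predict an energy gap from a text encoding of a crystal.
# Featurization (token counts) and model (1-nearest-neighbour by cosine
# similarity) are stand-ins for the pipeline's real components; the training
# strings and gap values (eV) below are hypothetical.
from collections import Counter
import math

def featurize(encoding: str) -> Counter:
    """Bag-of-tokens features from a whitespace-separated crystal string."""
    return Counter(encoding.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_gap(query: str, train: list[tuple[str, float]]) -> float:
    """Predict the gap of `query` from its most similar training string."""
    feats = featurize(query)
    best = max(train, key=lambda pair: cosine(feats, featurize(pair[0])))
    return best[1]

# Hypothetical (crystal string, energy gap in eV) training pairs.
train = [
    ("C6 H4 S2 ring planar", 2.1),
    ("C6 H4 S2 ring twisted", 2.4),
    ("C10 H8 acene planar", 1.9),
]
print(predict_gap("C6 H4 S2 ring planar herringbone", train))  # → 2.1
```

A string-based pipeline like this is what makes the encoding itself the variable under study: swapping Materials String for SLICES changes only the input text, while the downstream model is held fixed.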