Presentation
Advancing Search and Automated Workflows for Light-Source Data: Progress, Challenges, and Future Outlook
Description
Enabling automated data annotation and efficient search is critical to workflow automation at experimental user facilities such as the Linac Coherent Light Source (LCLS) that produce large amounts of data annually. Current data annotation methods are primarily manual, which limits scalability. To this end, we investigate the potential of automated ML pipelines such as Sciencesearch for generating metadata from unstructured text sources such as experiment descriptions and logbook entries. Early results demonstrate that natural language processing pipelines can effectively produce relevant keywords, paving the way for making light source data searchable. We identify critical challenges that must be addressed: data sharing policies that hinder access to data, lack of homogeneity in logbook formats, vocabulary drift, and the evolving role of generative AI. We also propose potential short- and long-term solutions to these challenges, with the long-term goal of improving metadata management for AI-enabled workflows.
