Monitoring Distributed ML Systems at Massive Scale with Spark & Databricks
WhyLabs

Streamed live on Jun 26, 2024

In this hands-on workshop, we explore monitoring and observability for machine learning (ML) and AI systems that run on a distributed data processing engine like Apache Spark. Monitoring works differently in a distributed environment: WhyLabs is purpose-built for massive production systems and offers a key property that many other ML monitoring approaches lack at scale: mergeability. We'll see first-hand the impact of mergeability on the large datasets common to production AI systems.
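To give a flavor of why mergeability matters, here is a minimal toy sketch in plain Python (not the actual WhyLabs API): each partition of a dataset is summarized independently, and the lightweight summaries are then merged, producing the same statistics as profiling the whole dataset at once. This is the property that lets profiling run in parallel across Spark executors.

```python
from dataclasses import dataclass
import math


@dataclass
class Summary:
    """A toy mergeable profile: count, sum, min, and max of a column."""
    count: int = 0
    total: float = 0.0
    min_v: float = math.inf
    max_v: float = -math.inf

    def add(self, x: float) -> None:
        self.count += 1
        self.total += x
        self.min_v = min(self.min_v, x)
        self.max_v = max(self.max_v, x)

    def merge(self, other: "Summary") -> "Summary":
        # Merging two summaries is cheap and order-independent.
        return Summary(
            self.count + other.count,
            self.total + other.total,
            min(self.min_v, other.min_v),
            max(self.max_v, other.max_v),
        )


def profile(values) -> Summary:
    s = Summary()
    for x in values:
        s.add(x)
    return s


# Each partition (e.g. one Spark executor's slice) profiles its own data...
part_a = profile([1.0, 2.0, 3.0])
part_b = profile([4.0, 5.0])

# ...and the merged result matches profiling the whole dataset at once.
merged = part_a.merge(part_b)
whole = profile([1.0, 2.0, 3.0, 4.0, 5.0])
```

WhyLabs' open-source profiling tools apply the same idea with approximate, fixed-size sketches for richer statistics (distributions, cardinality), so the merge cost stays constant no matter how large the dataset grows.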

You'll learn the following in this workshop:
- Basics of distributed ML systems and best practices
- Working with Apache Spark DataFrames on Databricks
- Collecting ML telemetry using Apache Spark on Databricks

What you'll need to participate:
- Free trial account for Databricks (https://www.databricks.com/try-databr...)
- Free account for WhyLabs (https://www.whylabs.ai/free)

About the Speaker:
Bernease Herman is a Sr. Data Scientist at WhyLabs, where she builds model and data monitoring solutions using approximate statistics techniques. Earlier in her career, Bernease built ML-driven solutions for inventory planning at Amazon and conducted quantitative research at Morgan Stanley. Her ongoing academic research focuses on evaluation metrics for machine learning and LLMs. Bernease serves on the faculty of the University of Washington Master of Science in Data Science program and as chair of the Rigorous Evaluation for AI Systems (REAIS) workshop series. She has published work in top machine learning conferences and workshops such as NeurIPS, ICLR, and FAccT. She is a PhD student at the University of Washington and holds a Bachelor's degree in mathematics and statistics from the University of Michigan.

About WhyLabs:
WhyLabs, Inc. (www.whylabs.ai / @whylabs) enables teams to harness the power of AI with precision and control. From Fortune 100 companies to AI-first startups, teams have adopted WhyLabs’ tools to secure and monitor real-time predictive and generative AI applications. WhyLabs’ open source tools and SaaS observability platform surface bad actors, bias, hallucinations, performance decay, data drift, and data quality issues. With WhyLabs, teams reduce manual operations by over 80% and cut down time-to-resolution of AI incidents by 20x.

