Kubeflow Virtual Planning Symposium 2025
This event unites the Kubeflow Leadership, Working Group Leads, Project Leads, and Contributors to define the strategic direction of the Kubeflow Project. It includes updates from Working Groups and Teams, roadmaps, marketing and outreach strategies, and a call to action for increased participation. Join us to learn how the Kubeflow community drives GenAI and MLOps/LLMOps innovation in the cloud native ecosystem.
Kubeflow 2025 VIRTUAL topics include:
- Kubeflow 101
- GenAI
- MLOps/LLMOps innovations
- New Sub-Projects (Spark Operator, Model Registry)
- Proposed New Sub-Projects (Feast, Arrow Cache)
- Working Group Updates
- Curated Talks
- Project Growth + Community Outreach
-
11:00 AM EDT
Opening Remarks - Chase Christensen, Kubeflow Outreach Chair
Join us as Chase Christensen, one of the Chairs of the Kubeflow Outreach Committee, opens the day with a warm welcome from the Kubeflow community. He’ll introduce Kubeflow and explain how to get involved, where to share your ideas, and how to connect with the community. We’ll also walk through the day’s schedule and highlight the exciting talks and discussions ahead.
-
11:05 AM EDT
Bringing Kubeflow Training Local: SDK-Driven “Local-exec” Mode - Saad Zaher & Anna Kramar & Eoin Fennessy, OpenShift AI
Kubeflow’s Python SDK makes it easy to define and submit training jobs on remote clusters, but developing, debugging, and iterating on your training code still often requires a full Kubernetes round trip. In this session you’ll learn how the SDK’s new local_exec execution mode lets you run your training job locally on your machine before submitting it to Kubernetes.
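The abstract names local_exec without showing it; as a rough sketch of the workflow it enables. The commented SDK calls and names below are illustrative assumptions, not the confirmed Kubeflow SDK surface:

```python
# Illustrative sketch only: the commented SDK calls are assumptions,
# not the confirmed Kubeflow SDK API.

def train_fn():
    """Plain Python training function of the kind the SDK submits as a job."""
    loss = 10.0
    for _ in range(5):
        loss *= 0.5  # stand-in for one optimization step
    return loss

# Remote path (hypothetical names):
#   from kubeflow.trainer import TrainerClient, CustomTrainer
#   TrainerClient().train(trainer=CustomTrainer(func=train_fn))
#
# A local-exec mode would run the same function in-process, so you can
# debug and iterate on it before any Kubernetes round trip:
result = train_fn()
print(result)  # 10.0 * 0.5**5 = 0.3125
```

The appeal of such a mode is that the exact same function object is either run locally or packaged into a remote TrainJob, so local debugging exercises the code the cluster will run.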
-
11:20 AM EDT
Transition Time
Take five to grab a snack, refill your coffee, or send that quick email while our next speaker gets set up.
-
11:25 AM EDT
Inferencing LLMs in production with Kubernetes and Kubeflow - Chamod Perera, Circles & Suresh Peiris, Articom.io
Large Language Models (LLMs) are powerful, but deploying them reliably, cost-effectively, and at scale in production is a different challenge altogether. In this session, we’ll walk through how to operationalize LLM inference using Kubeflow on Kubernetes, leveraging open-source and cloud-native tools to build resilient, scalable, and observable GenAI infrastructure.
-
11:40 AM EDT
Transition Time
Take a few minutes to stretch, refuel, or catch up on messages while we get ready for the next session.
-
11:45 AM EDT
Streamline LLM Fine-Tuning on Kubernetes with Kubeflow LLM Trainer - Shao Wang, Kubeflow Maintainer
Fine-tuning LLMs on Kubernetes is challenging for data scientists due to complex Kubernetes configuration, diverse fine-tuning techniques, and different distributed strategies such as data and model parallelism.
It’s crucial to hide the complex infrastructure configuration from users and allow them to shift gracefully among diverse models, datasets, fine-tuning techniques, and distributed strategies.
This talk will introduce Kubeflow LLM Trainer, a tool that leverages pre-configured blueprints and flexible configuration overrides to streamline the LLM fine-tuning lifecycle on Kubernetes.
Shao Wang (Kubeflow WG Training/AutoML) will demonstrate how Kubeflow LLM Trainer integrates with multiple fine-tuning techniques and distributed strategies, while offering a simple yet flexible Python API.
Attendees will see how LLMs can be fine-tuned on Kubernetes with just a single line of code, highlighting how the Kubeflow LLM Trainer streamlines, simplifies, and scales LLM fine-tuning on Kubernetes.
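The blueprint-plus-overrides idea the abstract describes can be sketched generically. The keys and values below are illustrative stand-ins, not the LLM Trainer's actual configuration schema:

```python
# Generic sketch of "pre-configured blueprint + flexible overrides".
# Keys and values are illustrative, not the real LLM Trainer schema.
BLUEPRINT = {
    "model": "example/llm-8b",
    "technique": "lora",
    "strategy": "data_parallel",
    "num_nodes": 1,
}

def resolve_config(blueprint, overrides):
    """Start from the pre-configured blueprint, then apply user overrides."""
    config = dict(blueprint)
    config.update(overrides)
    return config

# A data scientist specifies only what differs from the blueprint;
# everything else (model, technique) stays pre-configured:
config = resolve_config(BLUEPRINT, {"strategy": "fsdp", "num_nodes": 4})
print(config["strategy"], config["num_nodes"])  # fsdp 4
```

This is the pattern that lets a trainer expose a one-line entry point: sensible defaults carry the infrastructure detail, and the user call only names the deltas.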
-
12:15 PM EDT
Break
We’re taking a 15-minute break! Use this time to grab a bite, take a walk, or recharge before we jump back into the next session.
-
12:30 PM EDT
Kubeflow for enabling AI Powered Drug Discovery and development in AstraZeneca - Shrinidhi Venkataraman & Nithin R, AstraZeneca
Azimuth, AstraZeneca’s first enterprise cloud-native machine learning platform, relies heavily on Kubeflow to power scalable and efficient AI workflows. In this session, we’ll explore how Kubeflow supports diverse AI use cases, with each project operating in its own dedicated namespace and persistent volumes ensuring durable data storage.
We’ll cover how to enable cross-namespace volume access, build custom Kubeflow notebook images with VS Code and other editors, and use self-hosted GitHub runners to trigger pipelines. You’ll also see how integrations with tools like Grafana, ArgoCD, and Argo Workflows enhance the platform’s functionality.
To maintain security and compliance, custom image governance is enforced through Kyverno policies. Finally, we’ll introduce the GreenOps framework, a set of practices focused on building sustainable AI solutions.
Join us for an in-depth look at how Kubeflow powers enterprise-scale AI at AstraZeneca.
-
1:00 PM EDT
Transition Time
We’ll get started in just a few minutes. Take five to stretch, top off your drink, or get settled before the next session begins.
-
1:05 PM EDT
Spark Operator - Feature Engineering with Spark on Kubeflow - Vikas Saxena, RAICS.AI
Real-world ML rarely deals with clean tables; more often, it involves messy inputs like PDFs, scanned documents, images, ZIP files, and data from enterprise warehouses.
In this session, we’ll explore how to transform that diverse data into model-ready features using Apache Spark with the Kubeflow Spark Operator, all orchestrated through Kubeflow Pipelines.
We’ll walk through how this approach bridges a previous gap in Kubeflow: extracting actionable insights from massive volumes of raw data (hundreds of terabytes) using fully open-source tools and technologies.
Target Audience: Data and ML engineers with basic Spark or Kubernetes experience.
-
1:35 PM EDT
Transition Time
We’re taking five! Step away for a moment while we prepare for the upcoming session.
-
1:40 PM EDT
Simplifying Generative AI Model Training on Kubernetes using Helm Charts - Ajay Vohra & Omri Shiv, AWS
Training generative AI models on Kubernetes involves choosing among a wide range of frameworks, tools, and orchestration options. While this diversity fuels innovation, it also introduces significant complexity.
In this talk, we present a Helm-based approach that simplifies AI model training using Kubeflow Training Operators. This method abstracts much of the underlying complexity while preserving flexibility in choosing training technologies.
Our solution is accelerator-agnostic and provides a consistent YAML interface across various training frameworks. We’ll also introduce a new Kubeflow Pipeline component that enables the construction of complex, end-to-end training workflows using Helm charts.
Through real-world examples, we’ll showcase training pipelines using Accelerate, Ray Train + Lightning, and NVIDIA’s NeMo-Megatron libraries. We’ll also demonstrate automatic scaling of accelerator infrastructure using Karpenter.
-
2:10 PM EDT
Closing Remarks - Valentina Rodriguez Sosa, Red Hat
Join Valentina Rodriguez Sosa, one of the Chairs of the Kubeflow Outreach Committee and Principal Architect at Red Hat, as she closes out the day and shares a heartfelt farewell, for now. She’ll highlight upcoming events, calls to action, and ways you can stay involved in the Kubeflow community.