Kubeflow Virtual Planning Symposium 2025
This event unites the Kubeflow Leadership, Working Group Leads, Project Leads, and Contributors to define the strategic direction of the Kubeflow Project. It includes updates from Working Groups and Teams, roadmaps, marketing and outreach strategies, and a call to action for increased participation. Join us to learn how the Kubeflow community drives GenAI and MLOps/LLMOps innovation in the cloud native ecosystem.
Kubeflow 2025 VIRTUAL topics include:
- Kubeflow 101
- GenAI
- MLOps/LLMOps innovations
- New Sub-Projects (Spark Operator, Model Registry)
- Proposed New Sub-Projects (Feast, Arrow Cache)
- Working Group Updates
- Curated Talks
- Project Growth + Community Outreach
-
11:00 AM EDT
Opening Remarks - Chase Christensen, Kubeflow Outreach Chair
Join us as Chase Christensen, one of the Chairs of the Kubeflow Outreach Committee, opens the day with a warm welcome from the Kubeflow community. He’ll introduce Kubeflow and explain how to get involved, where to share your ideas, and how to connect with the community. We’ll also walk through the day’s schedule and highlight the exciting talks and discussions ahead.
-
11:05 AM EDT
Bringing Kubeflow Training Local: SDK-Driven “Local-exec” Mode - Saad Zaher & Anna Kramar & Eoin Fennessy, OpenShift AI
Kubeflow’s Python SDK makes it easy to define and submit training jobs on remote clusters, but developing, debugging, and iterating on your training code still often requires a full Kubernetes round trip. In this session you’ll learn how the SDK’s new local_exec execution mode lets you run your training job locally on your machine before submitting it to Kubernetes.
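The abstract names local_exec without showing it; as a rough sketch of the workflow it enables. The commented SDK calls and names below are illustrative assumptions, not the confirmed Kubeflow SDK surface:

```python
# Illustrative sketch only: the commented SDK calls are assumptions,
# not the confirmed Kubeflow SDK API.

def train_fn():
    """Plain Python training function of the kind the SDK submits as a job."""
    loss = 10.0
    for _ in range(5):
        loss *= 0.5  # stand-in for one optimization step
    return loss

# Remote path (hypothetical names):
#   from kubeflow.trainer import TrainerClient, CustomTrainer
#   TrainerClient().train(trainer=CustomTrainer(func=train_fn))
#
# A local-exec mode would run the same function in-process, so you can
# debug and iterate on it before any Kubernetes round trip:
result = train_fn()
print(result)  # 10.0 * 0.5**5 = 0.3125
```

The appeal of such a mode is that the exact same function object is either run locally or packaged into a remote TrainJob, so local debugging exercises the code the cluster will run.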
-
11:20 AM EDT
Transition Time
Take five to grab a snack, refill your coffee, or send that quick email while our next speaker gets set up.
-
11:25 AM EDT
Inferencing LLMs in production with Kubernetes and Kubeflow - Chamod Perera, Circles & Suresh Peiris, Articom.io
Large Language Models (LLMs) are powerful, but deploying them reliably, cost-effectively, and at scale in production is a different challenge altogether. In this session, we’ll walk through how to operationalize LLM inference using Kubeflow on Kubernetes, leveraging open-source and cloud-native tools to build resilient, scalable, and observable GenAI infrastructure.
-
11:40 AM EDT
Transition Time
Take a few minutes to stretch, refuel, or catch up on messages while we get ready for the next session.
-
11:45 AM EDT
Streamline LLM Fine-Tuning on Kubernetes with Kubeflow LLM Trainer - Shao Wang, Kubeflow Maintainer
Fine-tuning LLMs on Kubernetes is challenging for data scientists due to complex Kubernetes configuration, diverse fine-tuning techniques, and different distributed strategies such as data and model parallelism.
It’s crucial to hide the complex infrastructure configuration from users and allow them to shift gracefully among diverse models, datasets, fine-tuning techniques, and distributed strategies.
This talk will introduce Kubeflow LLM Trainer, a tool that leverages pre-configured blueprints and flexible configuration overrides to streamline the LLM fine-tuning lifecycle on Kubernetes.
Shao Wang (Kubeflow WG Training/AutoML) will demonstrate how Kubeflow LLM Trainer integrates with multiple fine-tuning techniques and distributed strategies, while offering a simple yet flexible Python API.
Attendees will see how LLMs can be fine-tuned on Kubernetes with just a single line of code, highlighting how the Kubeflow LLM Trainer streamlines, simplifies, and scales LLM fine-tuning on Kubernetes.
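The blueprint-plus-overrides idea the abstract describes can be sketched generically. The keys and values below are illustrative stand-ins, not the LLM Trainer's actual configuration schema:

```python
# Generic sketch of "pre-configured blueprint + flexible overrides".
# Keys and values are illustrative, not the real LLM Trainer schema.
BLUEPRINT = {
    "model": "example/llm-8b",
    "technique": "lora",
    "strategy": "data_parallel",
    "num_nodes": 1,
}

def resolve_config(blueprint, overrides):
    """Start from the pre-configured blueprint, then apply user overrides."""
    config = dict(blueprint)
    config.update(overrides)
    return config

# A data scientist specifies only what differs from the blueprint;
# everything else (model, technique) stays pre-configured:
config = resolve_config(BLUEPRINT, {"strategy": "fsdp", "num_nodes": 4})
print(config["strategy"], config["num_nodes"])  # fsdp 4
```

This is the pattern that lets a trainer expose a one-line entry point: sensible defaults carry the infrastructure detail, and the user call only names the deltas.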
-
12:15 PM EDT
Break
We’re taking a 15-minute break! Use this time to grab a bite, take a walk, or recharge before we jump back into the next session.
-
12:30 PM EDT
Kubeflow for enabling AI Powered Drug Discovery and development in AstraZeneca - Shrinidhi Venkataraman & Nithin R, AstraZeneca
Azimuth, AstraZeneca’s first enterprise cloud-native machine learning platform, relies heavily on Kubeflow to power scalable and efficient AI workflows. In this session, we’ll explore how Kubeflow supports diverse AI use cases, with each project operating in its own dedicated namespace and persistent volumes ensuring durable data storage.
We’ll cover how to enable cross-namespace volume access, build custom Kubeflow notebook images with VS Code and other editors, and use self-hosted GitHub runners to trigger pipelines. You’ll also see how integrations with tools like Grafana, ArgoCD, and Argo Workflows enhance the platform’s functionality.
To maintain security and compliance, custom image governance is enforced through Kyverno policies. Finally, we’ll introduce the GreenOps framework, a set of practices focused on building sustainable AI solutions.
Join us for an in-depth look at how Kubeflow powers enterprise-scale AI at AstraZeneca.
-
1:00 PM EDT
Transition Time
We’ll get started in just a few minutes. Take five to stretch, top off your drink, or get settled before the next session begins.
-
1:05 PM EDT
Spark Operator - Feature Engineering with Spark on Kubeflow - Vikas Saxena, RAICS.AI
Real-world ML rarely deals with clean tables; more often, it involves messy inputs like PDFs, scanned documents, images, ZIP files, and data from enterprise warehouses.
In this session, we’ll explore how to transform that diverse data into model-ready features using Apache Spark with the Kubeflow Spark Operator, all orchestrated through Kubeflow Pipelines.
We’ll walk through how this approach bridges a previous gap in Kubeflow: extracting actionable insights from massive volumes of raw data (hundreds of terabytes) using fully open-source tools and technologies.
Target Audience: Data and ML engineers with basic Spark or Kubernetes experience.
-
1:35 PM EDT
Transition Time
We’re taking five! Step away for a moment while we prepare for the upcoming session.
-
1:40 PM EDT
Simplifying Generative AI Model Training on Kubernetes using Helm Charts - Ajay Vohra & Omri Shiv, AWS
Training generative AI models on Kubernetes involves choosing among a wide range of frameworks, tools, and orchestration options. While this diversity fuels innovation, it also introduces significant complexity.
In this talk, we present a Helm-based approach that simplifies AI model training using Kubeflow Training Operators. This method abstracts much of the underlying complexity while preserving flexibility in choosing training technologies.
Our solution is accelerator-agnostic and provides a consistent YAML interface across various training frameworks. We’ll also introduce a new Kubeflow Pipeline component that enables the construction of complex, end-to-end training workflows using Helm charts.
Through real-world examples, we’ll showcase training pipelines using Accelerate, Ray Train + Lightning, and NVIDIA’s NeMo-Megatron libraries. We’ll also demonstrate automatic scaling of accelerator infrastructure using Karpenter.
-
2:10 PM EDT
Closing Remarks - Valentina Rodriguez Sosa, Red Hat
Join Valentina Rodriguez Sosa, one of the Chairs of the Kubeflow Outreach Committee and Principal Architect at Red Hat, as she closes out the day and shares a heartfelt farewell, for now. She’ll highlight upcoming events, calls to action, and ways you can stay involved in the Kubeflow community.