EvalOps Unfiltered

LLMs behave differently with the slightest prompt tweak, context change, or input variation. If you're building anything real with GenAI, you already know the outputs can surprise you, and not in a good way. That's why testing isn't optional; it's essential.
EvalOps Unfiltered is a practical event series for GenAI teams tackling the real-world challenges of evaluating LLM applications. Focused on the emerging field of EvalOps, it goes beyond benchmarks to address unpredictable model behavior, adversarial risks, and production readiness.
Each session features live experiments, tool deep-dives, breakout discussions, and honest conversations about what truly works when deploying LLMs.
What to expect (on 17 September):
🔧 Lightning talks from three teams presenting their real testing challenges — the kind that don't show up in research papers
🧠 Breakout sessions where you'll dig deep into one challenge, discuss solutions, share experiences, and test ideas with fellow builders
🍺 Drinks while the conversations continue
No panels, no pitches — just builders sharing what's actually broken and collaborating on what might work. This isn't about theory. It's about the unglamorous, critical work of making GenAI systems reliable enough for the real world.
Target Audience:
- GenAI engineers wrestling with evaluation pre-release
- Technical leads managing LLM-powered products
- Data scientists designing and fine-tuning LLM-based applications
- Product owners responsible for delivering reliable AI-driven features
🎟️ Apply to attend below!
Become a part of the AI Campus.
There are many ways to join our community. Sign up for our newsletter below, or select one of the other two options and get in touch with us: