This page contains press release content distributed by XPR Media. Members of the editorial and news staff of the USA TODAY Network were not involved in the creation of this content.

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between coding ability and real-world SRE work.

“OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning required for production engineering.”

— Jacek Migdał, founder of Quesma

WARSAW, POLAND, January 20, 2026 /EINPresswire.com/ — Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry instrumentation tasks. The open-source dataset tests 14 state-of-the-art models across 23 real-world tasks in 11 programming languages, revealing significant gaps in AI’s ability to handle production-grade Site Reliability Engineering (SRE) work.

While frontier LLMs have demonstrated impressive coding capabilities, the benchmark reveals a stark reality: the best-performing model, Claude Opus 4.5, achieved only a 29% pass rate on OpenTelemetry instrumentation tasks, compared with an 80.9% pass rate on SWE-Bench. This gap highlights a critical distinction between writing code and performing the complex, cross-cutting engineering work required for production systems.

The $1.4 Million Per Hour Problem
Enterprise outages cost an average of $1.4 million per hour, making production visibility mission-critical. Distributed tracing, the gold standard for debugging complex microservices, allows teams to link user actions to every underlying service call. However, implementing this visibility remains difficult, with 39% of organizations citing complexity as their top observability obstacle. OpenTelemetry has emerged as the industry standard with backing from 1,100+ organizations, yet configuring it correctly remains a major source of toil for SRE teams.

Fundamental Limitations Exposed
The benchmark tested models on agentic coding tasks where they were given source code from realistic applications, an interactive Linux terminal, and clear instrumentation objectives. The results revealed several critical failure modes:

Context propagation, passing trace context between services to maintain parent-child span relationships, proved to be an insurmountable barrier for most models. This is particularly concerning because context propagation is fundamental to distributed tracing.
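To make the idea concrete, here is a minimal Python sketch of context propagation between two services. It hand-rolls the W3C Trace Context `traceparent` header using only the standard library; it is not the OpenTelemetry SDK and is not taken from OTelBench. The helper names `new_span`, `inject`, and `extract` are hypothetical, chosen for illustration only.

```python
import re
import secrets

# W3C Trace Context "traceparent" header layout:
#   version(2 hex)-trace_id(32 hex)-parent_span_id(16 hex)-flags(2 hex)
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_span(parent=None):
    """Create a span record; a child span reuses its parent's trace_id."""
    return {
        "trace_id": parent["trace_id"] if parent else secrets.token_hex(16),
        "span_id": secrets.token_hex(8),
        "parent_id": parent["span_id"] if parent else None,
    }


def inject(span, headers):
    """Caller side: write the current span's identity into outgoing headers."""
    headers["traceparent"] = f"00-{span['trace_id']}-{span['span_id']}-01"


def extract(headers):
    """Callee side: recover the remote parent from incoming headers, if any."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None  # no valid context, so the callee starts a new trace
    trace_id, span_id, _flags = match.groups()
    return {"trace_id": trace_id, "span_id": span_id, "parent_id": None}


# Service A starts a trace and calls service B (the HTTP call is simulated).
span_a = new_span()
outgoing_headers = {}
inject(span_a, outgoing_headers)

# Service B continues the same trace as a child of A's span.
span_b = new_span(parent=extract(outgoing_headers))

# If the header is never injected, B silently starts an unrelated trace.
orphan = new_span(parent=extract({}))
```

In this sketch, `span_b` shares `span_a`'s trace ID and records `span_a`'s span ID as its parent, while `orphan` shows what a missed injection looks like: a second, disconnected trace. Real instrumentation would normally rely on the propagators that ship with OpenTelemetry rather than building headers by hand.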

“The backbone of the software industry consists of complex, high-scale production systems with mission-critical reliability, and seasoned engineers are architecting, evolving, and troubleshooting them,” said Jacek Migdał, founder of Quesma. “OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning and sustained problem-solving required for production engineering. This gap matters because many vendors are marketing AI SRE solutions with bold claims but no independent verification. We need benchmarks like this to separate reality from hype.”

Language Ecosystems Matter
Success rates varied dramatically across programming languages, revealing that AI generalization is far weaker than that of human engineers. Models had moderate success with Go and, quite surprisingly, C++. A few tasks were completed in JavaScript, PHP, .NET, and Python. Just one model solved a single task in Rust. None of the models solved a single task in Swift, Ruby, or, to our biggest surprise, Java (due to a build issue).

Why This Matters for AI Development
OTelBench reveals several reasons why OpenTelemetry instrumentation challenges current LLMs:
– Reliability-critical applications reside in private repositories at companies like Apple, Airbnb, and Netflix, limiting training data.
– Instrumentation requires cross-cutting changes across codebases, rather than sequential additions.
– Some tasks required 50+ commands over 10+ minutes. Models consistently performed worse as tasks lengthened.

Migdał added, “AI SRE in 2026 is what DevOps Anomaly Detection was in 2016—lots of marketing, huge budgets, but lacking independent benchmarks. Just as SWE-Bench became the standard for coding evaluation, we need SRE-style benchmarks to determine what actually works. That’s why we’re releasing OTelBench as open-source: to create a North Star for navigating the AI hype and to enable the community to track real progress.”

A Path Forward
Despite the challenges, the benchmark reveals promising signals. Claude Opus 4.5, GPT-5.2, and Gemini 3 models show capability on specific tasks, with the go-otel-microservices-traces task reaching a 52% pass rate. With more environments for Reinforcement Learning with Verifiable Rewards, OpenTelemetry instrumentation appears to be a solvable problem for future AI systems.

Until then, organizations requiring distributed tracing across services should expect to write that code themselves—or work alongside AI assistants that understand their limitations.

OTelBench is available today as an open-source project at https://quesma.com/benchmarks/otel/, enabling researchers and practitioners to reproduce results and contribute additional test cases.

Lucie Šimečková
Quesma
press@quesma.com

