OpenAI has announced FrontierScience, a new benchmark designed to evaluate artificial intelligence systems on expert-level scientific reasoning across physics, chemistry and biology. As AI models increasingly demonstrate their ability to support real scientific research, OpenAI said the key question is how meaningfully they can contribute to scientific discovery. The company said reasoning lies at the heart of scientific work, going beyond factual recall to include hypothesis generation, testing, refinement and cross-disciplinary synthesis.
In the past year, OpenAI’s models have reached major milestones, including gold-medal-level performance at the International Math Olympiad and the International Olympiad in Informatics. At the same time, advanced systems such as GPT-5 are already being used by researchers to accelerate scientific workflows.
According to OpenAI, scientists are deploying these models for tasks such as cross-disciplinary literature searches, multilingual research reviews and complex mathematical proofs. In many cases, work that once took days or weeks can now be completed in hours.
This progress was detailed in OpenAI’s November 2025 paper, Early science acceleration experiments with GPT-5, which presented early evidence that GPT-5 can measurably speed up scientific workflows.
OpenAI said that as models' reasoning and knowledge capabilities scale, existing scientific benchmarks are no longer sufficient: many rely on multiple-choice questions, have become saturated, or do not center on real scientific reasoning.
FrontierScience was created to fill this gap by measuring expert-level scientific capabilities using difficult, original and meaningful questions written and verified by domain experts.