AI outperforms experts in predicting research outcomes.

Summary: A new study demonstrates that large language models (LLMs) can predict the outcomes of neuroscience studies more accurately than human experts, achieving 81% accuracy compared to 63% for neuroscientists.

Researchers tested LLMs and human neuroscientists on identifying real study abstracts using a benchmark called BrainBench, and found that the AI models excelled even when the neuroscientists had domain-specific expertise. A specialised neuroscience-focused LLM, dubbed BrainGPT, achieved even higher accuracy at 86%.

The research highlights the potential of AI for designing experiments, predicting outcomes, and accelerating scientific progress across fields.

Key Facts:

  • LLMs were more accurate than human neuroscientists at predicting study outcomes (81% vs. 63%).
  • A neuroscience-specific LLM, BrainGPT, achieved 86% prediction accuracy.
  • The findings suggest that AI tools could improve experiment design and accelerate scientific progress.

Source: UCL

Large language models, a type of AI that analyses text, can predict the outcomes of proposed neuroscience studies more accurately than human experts, finds a new study led by UCL (University College London) researchers.

The findings, published in Nature Human Behaviour, demonstrate that large language models (LLMs) trained on vast datasets of text can distil patterns from scientific literature, enabling them to forecast scientific outcomes with superhuman accuracy.

Although the study focused on neuroscience, the researchers believe their approach should be applicable to all branches of science. Credit: Neuroscience News

The researchers say this demonstrates the potential of LLMs as powerful tools for accelerating research, going beyond simple information retrieval.

Lead author Dr Ken Luo (UCL Psychology & Language Sciences) said: “Since the advent of generative AI like ChatGPT, much research has focused on LLMs’ question-answering capabilities, showcasing their remarkable skill in summarising knowledge from extensive training data.

“However, rather than emphasising their backward-looking ability to retrieve past information, we explored whether LLMs could synthesise knowledge to predict future outcomes.

“Scientific progress often relies on trial and error, but each meticulous experiment demands time and resources. Even the most skilled researchers may overlook important insights from the literature.

“Our work investigates whether LLMs can identify patterns across vast scientific texts and forecast the outcomes of experiments.”

The international research team began by developing BrainBench, a tool to evaluate how well large language models (LLMs) can predict neuroscience results.

BrainBench consists of numerous pairs of neuroscience study abstracts. In each pair, one version is the real abstract, briefly describing the background of the research, the methods used and the study results.

The other version has the same background and methods, but experts in the field of neuroscience have altered the results to produce a plausible but incorrect conclusion.

To test whether the AI or the humans could correctly identify which of the two paired abstracts was the real one containing the actual study results, the researchers evaluated 15 different general-purpose LLMs and 171 human neuroscience experts (all of whom had passed a screening test to confirm their expertise).
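To make the forced-choice setup concrete, the sketch below represents a BrainBench-style item and measures how often a model rates the real abstract as less surprising than the altered one. The field names and the `score` function are illustrative assumptions for this article, not the study's actual data format or evaluation code.

```python
# Illustrative sketch only (not the study's code): a BrainBench-style item and
# a simple two-alternative forced-choice evaluation. `score` is assumed to
# return how "surprising" a text is to the model (e.g. perplexity);
# lower means more plausible.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class BenchItem:
    context: str          # shared background and methods
    real_results: str     # results from the published abstract
    altered_results: str  # plausible but incorrect results, edited by experts

def forced_choice_accuracy(items: List[BenchItem],
                           score: Callable[[str], float]) -> float:
    """Fraction of items where the real abstract is scored as less surprising."""
    correct = 0
    for item in items:
        real_surprise = score(item.context + " " + item.real_results)
        fake_surprise = score(item.context + " " + item.altered_results)
        correct += real_surprise < fake_surprise
    return correct / len(items)
```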

All of the LLMs outperformed the neuroscientists, with the LLMs averaging 81% accuracy and the humans averaging 63% accuracy.

Even when the researchers restricted the human responses to those with the highest self-reported expertise in a given area of neuroscience, the neuroscientists’ accuracy, at 66%, remained below that of the LLMs.

Additionally, the researchers found that when LLMs were more confident in their decisions, they were more likely to be correct.

This finding, according to the researchers, opens the door for human experts to work with well-calibrated models in the future.

The researchers then adapted an existing LLM (a version of Mistral, an open-source LLM) by training it on neuroscience literature specifically.

The new LLM specialising in neuroscience, which they dubbed BrainGPT, was even better at predicting study results, attaining 86% accuracy (an improvement on the general-purpose version of Mistral, which was 83% accurate).

Senior author Professor Bradley Love (UCL Psychology & Language Sciences) said: “We think it won’t be long before scientists are using AI tools to design the most effective experiment possible for their question. Our study focused on neuroscience, but our approach should be applicable to all branches of science.

“It is remarkable how well LLMs can predict the neuroscience literature. This success suggests that a great deal of science is not truly novel, but conforms to existing patterns of results in the literature. We are left to wonder whether scientists are being sufficiently innovative and exploratory.”

Dr Luo added: “Building on our results, we are developing AI tools to assist researchers. In the future, researchers could input their proposed experiment designs and anticipated outcomes, with AI forecasting the likelihood of various results. This would speed up experiment design and enable more informed decision-making.”

Funding: The study was supported by the Economic and Social Research Council (ESRC), Microsoft, and a Royal Society Wolfson Fellowship, and involved researchers at UCL, the University of Cambridge, the University of Oxford, the Max Planck Institute for Neurobiology of Behavior (Germany), Bilkent University (Turkey) and other institutions in the UK, US, Switzerland, Russia, Germany, Belgium, Denmark, Canada, Spain and Australia.

Note: When presented with two abstracts, the LLM calculates the likelihood of each, based on both its own learned knowledge and the context (background and methods).

The researchers judged an LLM’s choice by which abstract it found less surprising (less perplexing), and used the size of the perplexity difference between the real and altered abstracts as a measure of the model’s confidence.
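As a rough illustration of that scoring rule, the sketch below computes perplexity with an off-the-shelf causal language model and picks the abstract the model finds less perplexing, using the perplexity gap as a crude confidence signal. The choice of model (GPT-2 here) and the exact confidence measure are assumptions for illustration; the study's own implementation may differ.

```python
# Minimal sketch, not the authors' implementation: compare two candidate
# abstracts by perplexity under a causal LM and treat the perplexity gap
# as a rough confidence signal. Model choice (GPT-2) is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Exponentiated mean negative log-likelihood per token."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    return torch.exp(out.loss).item()

def choose_real(context: str, abstract_a: str, abstract_b: str):
    """Return (index of the less perplexing abstract, perplexity gap)."""
    ppl_a = perplexity(context + " " + abstract_a)
    ppl_b = perplexity(context + " " + abstract_b)
    choice = 0 if ppl_a < ppl_b else 1
    return choice, abs(ppl_a - ppl_b)
```

A function like `perplexity` here could play the role of the `score` callable in the earlier evaluation sketch.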

About this AI and neuroscience research news

Author: Chris Lane
Source: UCL
Contact: Chris Lane – UCL
Image: The image is credited to Neuroscience News

Original Research: Open access.
“Large language models outperform human experts in predicting neuroscience results” by Ken Luo and colleagues. Nature Human Behaviour


Abstract

Large language models outperform human experts in predicting neuroscience results

Scientific discoveries often hinge on synthesising decades of research, a task that may exceed human information-processing capacities. Large language models (LLMs) offer a solution.

LLMs trained on the vast scientific literature could potentially integrate noisy yet interrelated findings to forecast novel results better than human experts.

Here, to evaluate this possibility, we created BrainBench, a forward-looking benchmark for predicting neuroscience results.

We discover that LLMs predict experimental outcomes more accurately than experts. BrainGPT, an LLM we tuned on the neuroscience literature, performed better yet.

Like human experts, when LLMs indicated high confidence in their predictions, their responses were more likely to be correct, which presages a future where LLMs assist humans in making discoveries.

Our approach is not neuroscience specific and is transferable to other knowledge-intensive endeavours.
