Summary: Researchers have developed a computational model that maps how the brain processes speech during real-world conversations. Using electrocorticography (ECoG) and AI speech models, the study analyzed over 100 hours of brain activity, revealing how different regions handle sounds, speech patterns, and word meanings.
The results show that the brain processes speech in a sequence, moving from thought to sound before speaking and working in reverse to interpret spoken words. The model accurately predicted brain activity even for new conversations, outperforming earlier models.
These insights could improve speech recognition technology and help people with communication disorders. The study also provides a deeper understanding of how the brain engages in conversation so effortlessly.
Key Facts:
- Layered Processing: The brain handles speech at three levels: sounds, speech patterns, and word meaning.
- Sequential Processing: Before speaking, the brain turns words into sounds; after listening, it decodes their meaning.
- Real-World Insights: The AI model accurately predicted brain activity during natural conversations.
Source: Hebrew University of Jerusalem
A new study led by Dr. Ariel Goldstein, from the Department of Cognitive and Brain Sciences and the Business School at the Hebrew University of Jerusalem and Google Research, in partnership with the Hasson Lab at the Neuroscience Institute at Princeton University and with Dr. Flinker and Dr. Devinsky from the NYU Langone Comprehensive Epilepsy Center, has developed a unified computational framework to explore the neural basis of human conversations.
The study bridges acoustic, speech, and word-level language structures, offering new insights into how the brain processes everyday speech in real-world settings.
The study, published in Nature Human Behaviour, recorded brain activity during more than 100 hours of natural, open-ended conversations using a technique called electrocorticography (ECoG).
To analyze this data, the team used a speech-to-text model called Whisper, which breaks speech down into three levels: raw sounds, speech patterns, and the meaning of words. These levels were then mapped onto brain activity using encoding models.
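For readers who want a concrete picture of what such an analysis can look like, the sketch below shows one possible encoding pipeline in Python: mid-level "speech" embeddings are pulled from Whisper's encoder, and a ridge regression maps them onto electrode activity. This is an illustrative sketch under assumed inputs, not the authors' code; the ECoG arrays and the word-level alignment are hypothetical placeholders.

```python
# Minimal sketch of a Whisper-based encoding model (illustrative, not the study's code).
import numpy as np
import torch
from sklearn.linear_model import Ridge
from transformers import WhisperModel, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
whisper = WhisperModel.from_pretrained("openai/whisper-tiny").eval()

def speech_embeddings(audio_16khz: np.ndarray) -> np.ndarray:
    """Whisper encoder states for a mono 16 kHz clip: (frames, 384) for whisper-tiny."""
    feats = processor(audio_16khz, sampling_rate=16000,
                      return_tensors="pt").input_features
    with torch.no_grad():
        return whisper.encoder(feats).last_hidden_state.squeeze(0).numpy()

# Smoke test on one second of silence, just to show the call pattern.
emb = speech_embeddings(np.zeros(16000, dtype=np.float32))

# Placeholder design matrices: one embedding row per word, one ECoG response
# per word (per electrode, averaged over a lag window). Real data would be
# word-aligned neural recordings from the training conversations.
X_train = np.random.randn(500, 384)   # Whisper embeddings
Y_train = np.random.randn(500, 64)    # ECoG activity, 64 electrodes
encoding_model = Ridge(alpha=10.0).fit(X_train, Y_train)  # linear encoding model
```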
The findings showed that the framework could predict brain activity with high accuracy. Even when applied to conversations that were not part of the original training data, the model correctly matched different parts of the brain to specific language functions.
For example, regions involved in hearing and speaking aligned with sounds and speech patterns, while regions involved in higher-level understanding aligned with the meanings of words.
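To illustrate how such a region-by-level comparison could be quantified, the toy sketch below computes per-electrode correlations between predicted and recorded activity on held-out conversations and averages them within labelled electrode groups. All inputs here (`held_out_preds`, `held_out_ecog`, `electrode_labels`) are hypothetical placeholders, not the study's data.

```python
# Toy evaluation of held-out encoding performance, grouped by electrode region.
import numpy as np

def per_electrode_correlation(pred: np.ndarray, actual: np.ndarray) -> np.ndarray:
    """Pearson r between predicted and recorded activity for every electrode."""
    return np.array([np.corrcoef(pred[:, e], actual[:, e])[0, 1]
                     for e in range(actual.shape[1])])

rng = np.random.default_rng(0)
held_out_ecog = rng.standard_normal((200, 6))        # placeholder recordings
held_out_preds = {                                   # placeholder predictions per level
    "acoustic": rng.standard_normal((200, 6)),
    "speech":   rng.standard_normal((200, 6)),
    "language": rng.standard_normal((200, 6)),
}
electrode_labels = np.array(["motor", "motor", "auditory",
                             "auditory", "language", "language"])

for level, pred in held_out_preds.items():
    r = per_electrode_correlation(pred, held_out_ecog)
    for region in np.unique(electrode_labels):
        print(f"{level:9s} | {region:9s} | mean r = {r[electrode_labels == region].mean():+.3f}")
```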
The study also found that the brain processes speech in a sequence. Before we speak, our brain moves from thinking about words to forming sounds, and after we listen, it works in reverse to make sense of what was said.
The model used in this research was better than earlier approaches at capturing these complex processes.
“Our findings help us understand how the brain processes conversations in real-life settings,” said Dr. Goldstein.
“By connecting different levels of language, we’re uncovering the mechanisms behind something we all do naturally: talking and understanding one another.”
This study has potential practical applications, from improving speech recognition technology to developing better tools for people with communication disorders. It also offers new insights into how the brain makes conversation feel so effortless, whether it’s a casual chat with a colleague or a longer discussion.
The study marks an important step toward building more advanced tools to study how the brain handles language in real-world situations.
About this speech processing and neuroscience research news
Author: Yarden Mills
Source: Hebrew University of Jerusalem
Contact: Yarden Mills – Hebrew University of Jerusalem
Image: The image is credited to Neuroscience News
Original Research: Open access.
“A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations” by Ariel Goldstein et al. Nature Human Behaviour
Abstract
A unified acoustic-to-speech-to-language embedding space captures the neural basis of natural language processing in everyday conversations
This study introduces a unified computational framework connecting acoustic, speech and word-level linguistic structures to study the neural basis of everyday conversations in the human brain.
We used electrocorticography to record neural signals across 100 h of speech production and comprehension as participants engaged in open-ended real-life conversations. We extracted low-level acoustic, mid-level speech and contextual word embeddings from a multimodal speech-to-text model (Whisper).
We developed encoding models that linearly map these embeddings onto brain activity during speech production and comprehension.
Remarkably, this model accurately predicts neural activity at each level of the language processing hierarchy across hours of new conversations not used in training the model.
The internal processing hierarchy in the model is aligned with the cortical hierarchy for speech and language processing, where sensory and motor regions better align with the model’s speech embeddings, and higher-level language areas better align with the model’s language embeddings.
The Whisper model captures the temporal sequence of language-to-speech encoding before word articulation (speech production) and speech-to-language encoding post articulation (speech comprehension). The embeddings learned by this model outperform symbolic models in capturing neural activity supporting natural speech and language.
These findings support a paradigm shift towards unified computational models that capture the entire processing hierarchy for speech comprehension and production in real-world conversations.