Generative language reconstruction from brain recordings

Journal: Communications Biology, Volume: 8, Issue: 1
DOI: https://doi.org/10.1038/s42003-025-07731-7
PMID: https://pubmed.ncbi.nlm.nih.gov/40025160
Publication Date: 2025-03-01
Author(s): Ziyi Ye et al.
Primary Topic: Action Observation and Synchronization

Overview

The study addresses the challenge of reconstructing language from non-invasive brain recordings, a problem traditionally approached with classification methods that select from pre-constructed language candidates. The authors propose an autoregressive generation method that uses representations decoded from functional magnetic resonance imaging (fMRI) as direct input to a large language model (LLM). This approach removes the need for pre-defined candidates and improves the alignment of the generated content with the visual or auditory stimuli that elicited the brain recordings, particularly when the content is unexpected.

The proposed method can reconstruct a 10-minute language stimulus autoregressively, matching or outperforming existing classification-based techniques across various task settings. It also has the advantage of estimating the likelihood of generating any semantic content. The findings highlight the potential of brain-language interfaces built on a generative framework, which offer an efficient means of mapping the functional representations of language perception in the brain.
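
The likelihood claim can be read as follows, assuming the model factorizes text token by token in the way standard autoregressive LLMs do. The symbols are notation introduced here for illustration and are not taken from the paper: $p$ is the text prompt, $b$ is the fMRI-derived representation, and $w_1, \dots, w_n$ is a candidate continuation.

```latex
P(w_1, \dots, w_n \mid p, b) = \prod_{i=1}^{n} P(w_i \mid w_{<i},\, p,\, b)
\qquad\Longleftrightarrow\qquad
\log P(w_1, \dots, w_n \mid p, b) = \sum_{i=1}^{n} \log P(w_i \mid w_{<i},\, p,\, b)
```

Under this reading, any continuation receives a probability, so arbitrary content can be scored or ranked without a fixed candidate set.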

Methods

This section outlines the methodology for generating language from brain recordings. The authors introduce the BrainLLM framework, describe its components, and justify their design choices. They also describe the datasets, the training procedure, and the evaluation metrics used to assess the model’s performance. Together, these explain how neural activity is translated into coherent language output.
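
As a rough illustration of what "fMRI representations as direct input to an LLM" could look like, the sketch below projects each fMRI frame into the model's embedding space and prepends the result to the prompt embeddings. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear adapter, the prepending order, and the names `make_adapter`, `project`, and `build_input` are all hypothetical, and the random weights stand in for parameters that would be learned during training.

```python
import random


def make_adapter(n_voxels, embed_dim, seed=0):
    """Random (embed_dim x n_voxels) matrix standing in for a trained adapter (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(n_voxels)] for _ in range(embed_dim)]


def project(adapter, fmri_frame):
    """Linearly map one fMRI frame (a list of voxel values) into the embedding space."""
    return [sum(w * x for w, x in zip(row, fmri_frame)) for row in adapter]


def build_input(adapter, fmri_frames, prompt_embeddings):
    """Prepend one projected brain embedding per fMRI frame to the prompt embeddings."""
    brain_embeddings = [project(adapter, frame) for frame in fmri_frames]
    return brain_embeddings + prompt_embeddings


# Toy dimensions chosen for readability; real voxel counts and embedding sizes are far larger.
n_voxels, embed_dim = 5, 3
adapter = make_adapter(n_voxels, embed_dim)
fmri_frames = [[0.1] * n_voxels, [0.2] * n_voxels]          # two fMRI frames
prompt_embeddings = [[0.0] * embed_dim for _ in range(4)]   # four prompt tokens

sequence = build_input(adapter, fmri_frames, prompt_embeddings)
print(len(sequence), len(sequence[0]))  # prints "6 3": 2 brain + 4 prompt positions, each of size 3
```

The combined sequence would then be passed to the language model in place of ordinary token embeddings, which is why the adapter's output dimension has to match the model's embedding size.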

Results

The authors evaluate BrainLLM on three fMRI datasets, focusing on its ability to generate language continuations from brain recordings made while participants perceived visual or auditory stimuli. The model was trained and tested on individual participants in each dataset. In the generation task, the preceding text served as the prompt and the perceived continuation comprised 3-10 words. BrainLLM was compared with a control model (PerBrainLLM), with concurrent methods for open-vocabulary language decoding, and with a standard language model that receives no brain input (StdLLM).

BrainLLM significantly outperformed PerBrainLLM, with win rates of 64.9%, 78.9%, and 66.5% on the three datasets (false discovery rate (FDR) < 0.05). Human evaluation pointed in the same direction: annotators preferred BrainLLM's outputs over PerBrainLLM's in 48.4% of cases. BrainLLM also scored higher on language similarity metrics, with improvements exceeding 40.2% in BLEU-1 over state-of-the-art methods.

Performance was also analyzed as a function of the surprise level of the continuations, and BrainLLM's performance remained more stable as surprise increased. Notably, the model performed well even without a text prompt, with win rates of 88.85%, 88.16%, and 67.28% on the three datasets. This suggests that brain input helps most when surprise is high, as it is when no prompt is available. Language similarity scores were lower in this setting, however, which highlights how difficult it is to generate coherent language from brain input alone. Overall, the findings suggest that integrating brain recordings into language generation can substantially improve model performance, particularly when the continuation is unexpected.
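
The two kinds of metric reported above can be sketched in a few lines. This is an illustrative sketch, not the paper's evaluation code: it assumes that a "win" means the model assigns a strictly higher score (for example, a log-likelihood of the perceived continuation) than the control on the same sample, and it uses the standard single-reference BLEU-1 definition (clipped unigram precision times a brevity penalty). The function names and the toy numbers are hypothetical.

```python
import math
from collections import Counter


def win_rate(model_scores, control_scores):
    """Fraction of samples on which the model's score is strictly higher than the control's."""
    if not model_scores or len(model_scores) != len(control_scores):
        raise ValueError("score lists must be non-empty and of equal length")
    wins = sum(m > c for m, c in zip(model_scores, control_scores))
    return wins / len(model_scores)


def bleu1(candidate, reference):
    """Single-reference BLEU-1 on whitespace tokens: clipped unigram precision x brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate word is credited at most as many times as it occurs in the reference.
    clipped = sum(min(n, ref_counts[word]) for word, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Candidates shorter than the reference are penalized; longer ones are not.
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * precision


# Toy per-sample log-likelihoods (made-up numbers, not results from the paper).
brain_scores = [-2.0, -3.5, -1.0, -4.0]
control_scores = [-2.5, -3.0, -1.5, -4.5]
print(win_rate(brain_scores, control_scores))  # prints 0.75 (3 wins out of 4)

score = bleu1("the cat sat on the mat", "the cat is on the mat")
print(round(score, 4))  # prints 0.8333 (5 of 6 unigrams match, no brevity penalty)
```

A win rate near 0.5 would mean the brain-conditioned model is indistinguishable from the control, which is why the reported values are read against that baseline.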

Discussion

The discussion considers how the choice of large language model (LLM) affects language generation from brain recordings. The study primarily used Llama-2 (7 billion parameters) and compared it with models of other sizes, including GPT-2 variants. Language similarity metrics improved as the number of model parameters increased, reinforcing the view that larger LLMs are more effective at language generation. Notably, BrainLLM, which integrates brain input, outperformed PerBrainLLM, particularly when using language processing regions such as Broca’s area, suggesting that LLMs benefit from brain-derived data.

The paper also emphasizes the importance of the amount of neural activity data available for training: larger datasets, such as Huth’s and Narratives datasets, lead to better language generation. This agrees with existing literature, which holds that expanding neural activity datasets improves the mapping between brain and language representations. Furthermore, BrainLLM generates language directly from brain recordings, in contrast to traditional methods that select from pre-defined candidates. This generative paradigm broadens the scope of language reconstruction and opens avenues for applications in brain-computer interfaces (BCIs), particularly for individuals with communication impairments. Future work may integrate BrainLLM with other BCI modalities to improve its practical applicability and effectiveness in real-world scenarios.