Shared functional specialization in transformer-based language models and the human brain

Journal: Nature Communications, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41467-024-49173-5
PMID: https://pubmed.ncbi.nlm.nih.gov/38951520
Publication Date: 2024-06-29
Author(s): Sreejan Kumar et al.
Primary Topic: Functional Brain Connectivity Studies

Methods

The authors evaluated their models on two datasets from the publicly available “Narratives” fMRI collection, in which subjects listened to naturalistic spoken stories. The first dataset, comprising the stories “Slumlord” and “Reach for the Stars One Small Step at a Time,” included 18 participants (mean age 21 years, 9 female); the second, “I Knew You Were Black,” included 45 participants (mean age 23.3 years, 33 female). All subjects provided informed consent, and the study followed protocols approved by the Princeton University Institutional Review Board. Functional MRI data were acquired with a repetition time (TR) of 1.5 seconds. Preprocessing included susceptibility distortion correction, slice-timing correction, volume registration, and spatial normalization to MNI space.

The authors resampled the functional data to a 1000-parcel cortical parcellation derived from intrinsic functional connectivity and averaged the time series across voxels within each parcel. They defined ten regions of interest (ROIs) spanning the cortical hierarchy for language and narrative processing, from low-level sensory areas (e.g., Heschl’s gyrus) to high-level association areas (e.g., premotor cortex). The ROIs were based on existing atlases and covered areas with reliable stimulus-driven activity during story listening. Data visualization and analysis relied on a range of open-source software tools.
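The parcel-averaging step can be sketched as follows. This is a minimal illustration in plain Python, assuming each voxel already carries an integer parcel label. The function name `parcel_average` and the toy values are hypothetical, and a real pipeline would more likely work on array or surface data than on lists.

```python
from collections import defaultdict


def parcel_average(voxel_ts, labels):
    """Average voxel time series within each parcel.

    voxel_ts: list of voxel time series (one list of floats per voxel).
    labels:   parcel label for each voxel, in the same order.
    Returns a dict mapping parcel label -> mean time series.
    """
    groups = defaultdict(list)
    for ts, label in zip(voxel_ts, labels):
        groups[label].append(ts)
    # zip(*series) walks the grouped voxels one time point at a time.
    return {
        label: [sum(vals) / len(vals) for vals in zip(*series)]
        for label, series in groups.items()
    }


# Toy example (made-up values): three voxels, two parcels, four time points.
voxels = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [0.0, 0.0, 1.0, 1.0]]
parcels = parcel_average(voxels, labels=[7, 7, 9])
```

With 1000 parcels, the same reduction turns a voxelwise dataset into 1000 time series per subject, one per parcel.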

Results

The authors used a model-based encoding framework to relate Transformer features to brain activity measured with fMRI while participants listened to naturalistic spoken stories. The focus was on the model’s transformations, the computations that integrate contextual information from surrounding words into the current word. The authors posited that these transformations, although they may approximate syntactic operations, do not cleanly separate syntax from meaning, much as the cortical language network does not cleanly separate the two.

The study addressed two main questions. First, the authors hypothesized that the transformations would predict brain activity better than other language features and would map onto cortical language areas in a more layer-specific way than the embeddings, which accumulate contextual information across layers. Second, they asked whether the functional specialization of individual attention heads corresponds to brain activity in a structured way, operationalizing this as a correlation between the linguistic dependencies encoded in headwise transformations and headwise predictions of brain activity. For this analysis, the authors used a fine-grained functional atlas to reduce the brain data to 1000 cortical parcels, estimated parcelwise encoding models with banded ridge regression, and evaluated them by correlating model predictions with the actual brain activity time series.
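The encoding-model idea for a single parcel can be illustrated with a small sketch. It assumes that banded ridge regression here means ridge regression with a separate penalty for each group ("band") of feature columns. The data are synthetic, the penalty values are arbitrary, and all function names are hypothetical; this is not the authors' implementation, which would also handle details such as hemodynamic delays and cross-validated penalty selection that this summary does not describe.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def banded_ridge_fit(X, y, bands, alphas):
    """Minimize ||y - Xw||^2 + sum_b alphas[b] * ||w_b||^2.

    bands[j] is the band index of feature column j.
    """
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    for j in range(p):
        XtX[j][j] += alphas[bands[j]]  # band-specific penalty on the diagonal
    Xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    return solve(XtX, Xty)


def predict(X, w):
    return [sum(xj * wj for xj, wj in zip(row, w)) for row in X]


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Synthetic parcel: only the first band of features carries signal.
rng = random.Random(0)
true_w = [1.0, -0.5, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [sum(x * w for x, w in zip(row, true_w)) + rng.gauss(0, 0.1) for row in X]

# Fit on the first 150 time points, evaluate on the held-out 50.
w = banded_ridge_fit(X[:150], y[:150], bands=[0, 0, 1, 1], alphas=[1.0, 10.0])
score = pearson(predict(X[150:], w), y[150:])
```

Giving each band its own penalty lets feature spaces of different size or scale (for example, embeddings versus transformations) be regularized independently within one joint model.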

Discussion

The authors examine the functional anatomy of the BERT-base Transformer model, describing its architecture and the roles of embeddings and transformations in processing natural language. BERT applies self-attention across 12 layers, each with 12 attention heads, and can process up to 512 tokens in parallel. Each attention head computes attention weights that reflect the relevance of other words in the context and uses them to produce a transformation vector; the headwise transformations are then combined into a higher-dimensional output embedding. The study finds that BERT’s contextual embeddings and transformations outperform classical linguistic features and non-contextual GloVe embeddings in predicting brain activity across language regions, suggesting that the rich contextual information these features capture is important for explaining neural responses during language processing.
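The computation inside one attention head can be sketched with standard scaled dot-product attention. This is a simplified illustration, not the authors' code: queries, keys, and values are passed in directly, whereas BERT derives them from the token embeddings through learned projections. The 768-dimensional hidden size is a known property of BERT-base that this summary does not state, and the function names are hypothetical.

```python
import math

N_LAYERS = 12
N_HEADS = 12
HIDDEN_SIZE = 768  # BERT-base hidden size (not stated in the summary above)
HEAD_DIM = HIDDEN_SIZE // N_HEADS  # dimensions per headwise transformation
TOTAL_HEADS = N_LAYERS * N_HEADS   # heads across the whole model


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(Q, K, V):
    """Scaled dot-product attention for a single head.

    Q, K, V: lists of per-token vectors.
    Returns (attention weights, transformation vectors), one row per query.
    """
    d = len(K[0])
    weights, outputs = [], []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        weights.append(w)
        # The transformation is the attention-weighted sum of value vectors.
        outputs.append(
            [sum(w[t] * V[t][j] for t in range(len(V))) for j in range(len(V[0]))]
        )
    return weights, outputs


# Toy example: one query, two context tokens with identical keys,
# so the attention weights are uniform and the output is the mean of V.
weights, transformations = attention_head(
    Q=[[1.0, 0.0]],
    K=[[1.0, 0.0], [1.0, 0.0]],
    V=[[2.0, 0.0], [0.0, 4.0]],
)
```

Under these assumptions each head yields a 64-dimensional transformation per token, and concatenating the 12 heads of a layer gives a vector of the full hidden size.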

The authors then compare embeddings and transformations layer by layer. Embeddings accumulate contextual meaning progressively, whereas transformations show more layer-specific fluctuations. Notably, transformations peak in performance at earlier layers than embeddings, which points to distinct computational roles. The analysis of attention heads reveals emergent functional specialization: specific heads correlate with particular syntactic dependencies and outperform traditional linguistic indicators. This suggests that BERT’s transformations not only approximate classical syntactic operations but also capture a broader range of linguistic relationships, offering insight into how language is encoded in the brain. The findings underscore the value of examining the separate contributions of Transformer features to understand their functional properties and their relationship to cortical language processing.
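The two comparisons described above, finding the layer at which each feature type peaks and correlating headwise scores, can be sketched as follows. All numbers are made-up placeholders chosen only to make the example run; they are not results from the study, and the function names are hypothetical.

```python
import math


def peak_layer(layer_scores):
    """Return the 1-indexed layer with the highest score."""
    return max(range(len(layer_scores)), key=lambda i: layer_scores[i]) + 1


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


# Placeholder encoding performance per layer (12 layers), NOT real results.
embedding_scores = [0.10, 0.12, 0.14, 0.15, 0.17, 0.18,
                    0.19, 0.20, 0.21, 0.20, 0.19, 0.18]
transformation_scores = [0.11, 0.14, 0.16, 0.19, 0.18, 0.20,
                         0.22, 0.19, 0.17, 0.16, 0.15, 0.14]

embedding_peak = peak_layer(embedding_scores)
transformation_peak = peak_layer(transformation_scores)

# Placeholder per-head scores for five hypothetical heads, NOT real results:
# how well each head encodes a dependency vs. how well it predicts a parcel.
dependency_scores = [0.2, 0.5, 0.3, 0.8, 0.6]
brain_scores = [0.05, 0.12, 0.06, 0.20, 0.13]
headwise_r = pearson(dependency_scores, brain_scores)
```

A positive `headwise_r` in this toy setup would mean that heads encoding a given dependency well also tend to predict activity well in the parcel being examined.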