A GRU–CNN model for auditory attention detection using microstate and recurrence quantification analysis

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-58886-y
PMID: https://pubmed.ncbi.nlm.nih.gov/38632246
Publication Date: 2024-04-17
Author(s): MohammadReza EskandariNasab et al.
Primary Topic: EEG and Brain-Computer Interfaces

Overview

This research investigates auditory attention detection (AAD) from multichannel electroencephalography (EEG) signals, that is, determining which speaker a listener is attending to among competing talkers. The study uses microstate analysis and recurrence quantification analysis (RQA) to extract dynamic features that reflect changes in brain state during cognitive tasks. An optimized feature set is selected by keeping the features that contribute most to classification performance, and a hybrid classifier is developed that integrates Gated Recurrent Units (GRU) and Convolutional Neural Networks (CNN). The proposed AAD method outperforms existing approaches, reaching an accuracy of 98.9% with the recurrence ratio extracted from the mean global field power using the GCQL classifier.
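The two quantities named above, global field power (GFP) and a recurrence measure, can be illustrated with a minimal NumPy sketch. It uses the common definitions (GFP as the standard deviation across channels at each time point, recurrence rate as the fraction of recurrent pairs in a time-delay embedding). The function names and the `dim`, `delay`, and `radius` values are illustrative assumptions, not parameters reported for the study, and the sketch is not the authors' implementation.

```python
import numpy as np

def global_field_power(eeg):
    """GFP per time point: standard deviation across channels.

    eeg: array of shape (n_channels, n_samples).
    """
    return eeg.std(axis=0)

def recurrence_rate(x, dim=3, delay=1, radius=0.5):
    """Fraction of recurrent point pairs in a time-delay embedding of x.

    dim, delay and radius are illustrative values (assumptions).
    """
    x = np.asarray(x, dtype=float)
    n = x.size - (dim - 1) * delay          # number of embedded vectors
    if n < 2:
        raise ValueError("signal too short for this embedding")
    # Row i is the delay vector [x[i], x[i+delay], ..., x[i+(dim-1)*delay]].
    emb = np.stack([x[i * delay : i * delay + n] for i in range(dim)], axis=1)
    # Pairwise Euclidean distances between all embedded vectors.
    dist = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    recurrent = dist <= radius
    # Exclude the main diagonal: every point trivially recurs with itself.
    return (recurrent.sum() - n) / (n * (n - 1))
```

In practice the radius is often chosen relative to the spread of the signal rather than as a fixed number; the fixed value here only keeps the example short.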

The findings indicate that the AAD-GCQL model captures auditory attention without requiring access to the auditory stimuli, addressing a limitation of previous methods that relied solely on CNNs. The study highlights the potential for better classification performance within shorter decision windows, a critical requirement for practical applications. The current work uses EEG signals from 64 channels, and the authors suggest that electrode reduction methods could lower the computational cost. The experimental setup involves only two competing talkers, so AAD performance still needs to be explored in more complex auditory environments, such as real-world cocktail-party scenarios.

Methods

In this section, the authors describe the methodology used to evaluate the proposed auditory attention detection (AAD) method with microstate (MS) and recurrence quantification analysis (RQA) features. Two experiments were conducted: the first assessed the AAD procedure using MS and RQA features independently, while the second evaluated various combinations of these features. EEG data from 43 subjects across 48 trials were analyzed, with four types of microstates and eight types of RQA features extracted from non-overlapping windows of 256 samples. Several classifiers were used to distinguish attended from unattended speech: K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and the proposed GRU-CNN hybrid classifier (GCQL). For training, 70% of the data (34 trials) was used, with the remaining 30% serving as the test set.
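The windowing and the 70/30 split described above can be sketched as follows. The summary does not say how trials were assigned to the two sets, so the random, trial-level shuffle below is an assumption, as are the function names `split_windows` and `split_trials`.

```python
import numpy as np

def split_windows(trial, win=256):
    """Cut one trial (n_channels, n_samples) into non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    Returns an array of shape (n_windows, n_channels, win).
    """
    n_ch, n_samp = trial.shape
    n_win = n_samp // win
    trimmed = trial[:, : n_win * win]
    return trimmed.reshape(n_ch, n_win, win).transpose(1, 0, 2)

def split_trials(n_trials=48, train_frac=0.7, seed=0):
    """Shuffle trial indices and split them into train and test sets.

    The random assignment is an assumption, not a reported detail.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_trials)
    n_train = int(round(train_frac * n_trials))   # 0.7 * 48 = 33.6 -> 34
    return np.sort(order[:n_train]), np.sort(order[n_train:])
```

Splitting by trial rather than by window keeps windows from the same trial out of both sets at once, which is one common way to avoid an optimistic accuracy estimate.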

In the second experiment, the authors combined MS and RQA features in a multivariate analysis to improve attention detection from EEG signals. The proposed AAD method was benchmarked against several recently developed attention detection systems from the literature, including those by O’Sullivan et al., Lu et al., Ciccarelli et al., Geirnaert et al., Zakeri et al., Cai et al., and Niu et al., which served as baseline systems. The aim is to identify the feature combinations that give the highest performance in detecting auditory attention from EEG data.

Results

In this section, the authors present the results of the two experiments, which aim to optimize auditory attention detection through dynamic state analysis of brain activity. First, a statistical analysis was conducted on both single and multivariate features to identify significant differences between the two groups, with a significance threshold of \( p < 0.05 \). Next, the classical and modern methods outlined in the "Literature Review" were used to evaluate the effectiveness of each feature set. The study also examines how the duration of the EEG segments influences the performance of the proposed AAD method, showing which features and segment lengths give the best results.
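The summary does not name the statistical test behind the \( p < 0.05 \) threshold. As one illustration of how a single feature could be compared between two groups, the sketch below uses a two-sided permutation test on the difference of means; this choice of test, the function name `permutation_test`, and the `n_perm` value are all assumptions.

```python
import numpy as np

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    a, b: 1-D arrays holding one feature value per window for each group.
    Returns an estimated p-value.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: a.size].mean() - shuffled[a.size :].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (count + 1) / (n_perm + 1)
```

A feature would be treated as significant under the stated threshold when the returned value is below 0.05.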

Discussion

The discussion section of the paper reviews advances in decoding selective auditory attention in multi-talker environments. It begins with Mesgarani and Chang (2012), who demonstrated that the attended speaker can be decoded by reconstructing speech spectrograms from cortical responses (recorded intracranially in that study). Subsequent research, such as that by O’Sullivan et al., confirmed that single-trial EEG data can be used to decode attention and that decoding correlates with attentional performance in naturalistic settings. However, traditional linear decoding methods, including multivariate temporal response functions (mTRF), have shown limited accuracy, prompting the exploration of non-linear models, particularly convolutional neural networks (CNNs), to improve auditory attention detection (AAD).

The paper also details the methodology of the current study, which uses two publicly available EEG databases (DTU and KUL) to analyze auditory attention with several machine learning algorithms, including K-Nearest Neighbors (KNN), Support Vector Machine (SVM), and Long Short-Term Memory (LSTM) networks. The study emphasizes the importance of EEG preprocessing and microstate analysis in capturing the temporal dynamics of brain activity. The results indicate that traditional microstate features alone struggled to reach high classification accuracy, whereas combining recurrence quantification analysis (RQA) with advanced classifiers such as the gated recurrent unit-convolutional neural network (GRU-CNN) improved performance significantly, reaching accuracies of up to 98.9% in detecting attended speakers across varying EEG segment lengths. This suggests that the proposed method offers a robust approach to real-time auditory attention detection, with implications for applications such as neuro-steered hearing aids.