Enhanced feature fusion with hand gesture recognition system for sign language accessibility to aid hearing and speech impaired individuals

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-34100-5
PMID: https://pubmed.ncbi.nlm.nih.gov/41484456
Publication Date: 2026-01-02
Author(s): Najm Alotaibi et al.
Primary Topic: Hand Gesture Recognition Systems

Overview

The paper outlines the challenges that individuals with intellectual or communication disabilities face in everyday interactions and stresses the need for effective communication systems. To address them, it introduces a system that translates sign language into text and speech, using dynamic hand gesture recognition (HGR) as a key component of human-computer interaction (HCI). Interest in HGR is growing across domains such as education, virtual reality, and mobile devices, driven by advances in artificial intelligence (AI) and computer vision (CV).

Specifically, the paper presents the Feature Fusion-based Hand Gesture Recognition for Sign Language Accessibility using the Tornado Optimisation Algorithm (FFHGR-SLATOA) model. This deep learning-based approach aims to improve communication accessibility for hearing- and speech-impaired individuals by recognizing sign language gestures automatically.

Introduction

The introduction highlights the pressing issue of hearing and speech impairment, which affects 466 million people globally and, according to World Health Organization (WHO) projections, could affect 900 million by 2050. Sign language is a vital communication tool for the hearing-impaired community, enabling interaction with both peers and the general population. Sign language recognition remains difficult, however, because suitable input data is hard to acquire: capture is sensitive to environmental factors, and sign language itself is inherently variable.

The paper emphasizes the importance of gesture-based communication in human-computer interaction (HCI) and human-robot interaction (HRI). Video-based gesture recognition (GR) must cope with background separation and lighting variations, while data gloves, though precise, are costly and restrictive. To bridge the communication gap for those unfamiliar with sign language, the authors propose a novel deep learning-based model, Feature Fusion-based Hand Gesture Recognition for Sign Language Accessibility using the Tornado Optimisation Algorithm (FFHGR-SLATOA). The model combines image pre-processing, feature extraction, and a deep belief network for classification, and uses the tornado optimization algorithm for parameter tuning, with the aim of improving communication accessibility for hearing- and speech-impaired individuals.

Methods

In this section, the authors introduce the FFHGR-SLATOA technique, which aims to improve sign language gesture recognition for hearing- and speech-impaired users. The methodology’s performance is validated on the GR dataset, where training (TRAN) and validation (VALD) loss values are tracked over 0 to 3000 epochs. Both loss values decrease consistently, which indicates that the model balances fitting the training data against generalizing to unseen data.
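One way to read loss curves like those described above is to reduce them to a few numbers: how far each curve dropped, the final gap between validation and training loss, and the epoch index at which validation loss was lowest. The sketch below assumes the per-epoch losses are available as plain lists. The helper name `summarize_loss_curves` and the sample values are hypothetical and are not taken from the paper.

```python
from typing import List


def summarize_loss_curves(train: List[float], val: List[float]) -> dict:
    """Summarize paired loss curves (illustrative helper, not from the paper)."""
    if not train or len(train) != len(val):
        raise ValueError("curves must be non-empty and of equal length")
    # Index of the sample at which validation loss is lowest.
    best_val_index = min(range(len(val)), key=val.__getitem__)
    return {
        "train_drop": train[0] - train[-1],   # total decrease in training loss
        "val_drop": val[0] - val[-1],         # total decrease in validation loss
        "final_gap": val[-1] - train[-1],     # generalization gap at the end
        "best_val_index": best_val_index,
    }


# Hypothetical curves sampled every 500 epochs; NOT the paper's values.
train_loss = [1.20, 0.60, 0.35, 0.22, 0.15, 0.11, 0.09]
val_loss = [1.25, 0.68, 0.42, 0.29, 0.22, 0.18, 0.16]

summary = summarize_loss_curves(train_loss, val_loss)
print(summary)
```

A small final gap, with the lowest validation loss at or near the last sample, is consistent with the balance between fitting and generalization that the authors report. A validation minimum well before the end would instead suggest overfitting.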

Comparative analyses on the GR and Sign Language MNIST datasets show that the FFHGR-SLATOA model outperforms existing methodologies. It achieves an accuracy of 97.56%, precision of 97.79%, recall of 97.50%, and F1 score of 97.74%, surpassing CNN, RNN, LSTM, Conv LSTM, and GRU-LSTM models. In the ablation study, variants that pair a Deep Belief Network (DBN) with different architectures score lower than the full FFHGR-SLATOA model, which reaches an overall accuracy of 99.14% with precision, recall, and F1 score all at 97.85%. These results support the effectiveness of the proposed methodology.
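The four metrics quoted above can all be derived from a multi-class confusion matrix. The following is a minimal sketch that assumes macro averaging (each class weighted equally); the summary does not state which averaging scheme the authors used, and the function name `macro_metrics` and the toy matrix are illustrative only.

```python
from typing import Dict, List


def macro_metrics(confusion: List[List[int]]) -> Dict[str, float]:
    """confusion[i][j] = count of samples with true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy 3-class confusion matrix (hypothetical counts, not the paper's data).
toy = [
    [48, 1, 1],
    [2, 47, 1],
    [0, 3, 47],
]
scores = macro_metrics(toy)
print({name: round(value, 4) for name, value in scores.items()})
```

Under macro averaging, the F1 score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.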

Discussion

The FFHGR-SLATOA method improves hand gesture recognition (HGR) through a four-stage pipeline: image pre-processing, feature extraction, classification, and parameter optimization. First, the method applies a median filter (MF) during pre-processing, which reduces noise while preserving the edge details needed for accurate gesture recognition. This step provides cleaner input data, which in turn improves feature extraction and classification accuracy in the later stages.
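To illustrate why a median filter suits this pre-processing step, the sketch below applies a 3x3 median filter to a tiny grayscale image containing a sharp vertical edge and two impulse-noise pixels. It is a plain-Python illustration under assumed settings: the window size, the border handling (edge replication), and the function name `median_filter` are choices made here, not details given in the summary.

```python
from statistics import median
from typing import List

Image = List[List[int]]


def median_filter(image: Image, size: int = 3) -> Image:
    """Apply a size x size median filter, replicating pixels at the border."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    height, width = len(image), len(image[0])
    radius = size // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the neighbourhood, clamping coordinates to the image.
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            out[y][x] = int(median(window))
    return out


# Step edge (10 | 200) with one "salt" pixel (255) and one "pepper" pixel (0).
noisy = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 255, 10, 200, 0, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]
filtered = median_filter(noisy)
for row in filtered:
    print(row)
```

In this example both outlier pixels are replaced by their neighbourhood median while the boundary between the 10 and 200 regions stays in the same column, which is the noise-reduction and edge-preservation behaviour the paragraph describes. A mean filter of the same size would blur that boundary.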

The model fuses features from three convolutional neural networks (CNNs), namely ConvNeXt Base, VGG16, and EfficientNet-V2, to extract diverse and rich features from gesture inputs. This fusion improves the model’s ability to capture intrinsic spatial patterns, leading to better recognition accuracy. A Deep Belief Network (DBN) then performs hierarchical classification, which lets the model capture complex relationships among the fused features. Finally, the Tornado Optimization Algorithm (TOA) tunes the model parameters to refine classification performance and make recognition more robust across varying conditions. Overall, the FFHGR-SLATOA methodology is distinguished by its combination of multi-CNN feature fusion, DBN-based classification, and TOA-driven optimization, which the authors present as an efficient and scalable recognition framework.
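The summary does not say how the three CNN feature sets are combined. A common choice, assumed here purely for illustration, is to L2-normalize each backbone's feature vector and concatenate the results, so that no single backbone dominates because of its scale. The tiny vectors below are hypothetical stand-ins for real backbone embeddings, and `fuse_features` is an illustrative name rather than anything from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [v / norm for v in vec]


def fuse_features(per_backbone: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate L2-normalized feature vectors from several backbones."""
    fused: List[float] = []
    for vec in per_backbone:
        fused.extend(l2_normalize(vec))
    return fused


# Hypothetical low-dimensional stand-ins for the three backbone outputs.
convnext_feat = [3.0, 4.0]
vgg16_feat = [0.0, 5.0, 12.0]
efficientnet_feat = [1.0, 0.0, 0.0, 0.0]

fused = fuse_features([convnext_feat, vgg16_feat, efficientnet_feat])
print(len(fused), [round(v, 4) for v in fused])
```

The fused vector would then be passed to the classifier (a DBN in the paper). Other fusion operators, such as weighted sums or learned attention, are equally plausible readings of the summary.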