Efficient human activity recognition on edge devices using DeepConv LSTM architectures

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-98571-2
PMID: https://pubmed.ncbi.nlm.nih.gov/40263516
Publication Date: 2025-04-22
Author(s): Haotian Zhou et al.
Primary Topic: Context-Aware Activity Recognition Systems

Overview

This section of the paper discusses deploying lightweight deep learning models for human activity recognition (HAR) on resource-constrained hardware with TinyML, motivated by advances in the Internet of Things (IoT). The authors designed and evaluated three models: a 2D convolutional neural network (2D CNN), a 1D convolutional neural network (1D CNN), and a DeepConv LSTM. The DeepConv LSTM performed best, reaching an accuracy of 98.24% and an F1 score of 98.23%, which the authors attribute to its ability to capture both spatial and temporal features.
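To make the "spatial and temporal features" idea concrete, the sketch below traces tensor shapes through a DeepConv LSTM-style stack: 1D convolutions extract local patterns along the time axis of each window, and an LSTM then models how those patterns evolve over the remaining timesteps. The layer sizes (four convolutions with 64 filters and kernel size 5, a 128-unit LSTM) are hypothetical placeholders, not the paper's configuration; the 100 × 3 input assumes a 5-second window of tri-axial accelerometer data at 20 Hz, and the six output classes assume the six WISDM activities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conv1D:
    """Hypothetical 1D convolution layer: 'valid' padding, no pooling."""
    filters: int
    kernel_size: int
    stride: int = 1


def conv_output_length(length: int, kernel_size: int, stride: int = 1) -> int:
    """Number of output timesteps of a 'valid' (unpadded) 1D convolution."""
    return (length - kernel_size) // stride + 1


def trace_deepconv_lstm(timesteps, channels, conv_layers, lstm_units, num_classes):
    """Return (layer name, output shape) pairs for a conv -> LSTM -> softmax stack."""
    shapes = [("input", (timesteps, channels))]
    length = timesteps
    for i, layer in enumerate(conv_layers, start=1):
        # Convolutions slide along time; the channel axis becomes the filter count.
        length = conv_output_length(length, layer.kernel_size, layer.stride)
        shapes.append((f"conv{i}", (length, layer.filters)))
    # The LSTM reads the convolved sequence step by step; its final hidden
    # state summarizes the whole window for the classifier.
    shapes.append(("lstm_last_state", (lstm_units,)))
    shapes.append(("softmax", (num_classes,)))
    return shapes


if __name__ == "__main__":
    layers = [Conv1D(filters=64, kernel_size=5)] * 4  # hypothetical sizes
    for name, shape in trace_deepconv_lstm(100, 3, layers, lstm_units=128, num_classes=6):
        print(name, shape)
```

With these placeholder sizes, each unpadded convolution trims four timesteps (100, 96, 92, 88, 84), so the LSTM sees an 84-step sequence of 64-dimensional feature vectors.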

After full integer quantization, the DeepConv LSTM model shrank from 513.23 KB to 136.51 KB, which allowed it to be deployed on an Arduino Nano 33 BLE Sense Rev2 through the Edge Impulse platform. On the device, the model used 29.1 KB of memory (RAM) and 189.6 KB of flash, with an average inference time of 21 ms. Each inference requires approximately 0.01395 GOP (giga-operations), which at 21 ms corresponds to a throughput of about 0.664 GOPS (0.01395 GOP / 0.021 s ≈ 0.664 GOPS). Even after quantization, the model retained an accuracy of 97% and an F1 score of 97%, underscoring the suitability of TinyML for low-latency, resource-efficient HAR systems in real-time applications.
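Full integer quantization maps each floating-point tensor onto 8-bit integers through an affine scale and zero point, so that real ≈ scale × (q − zero_point). The sketch below illustrates that mapping for a single tensor in plain Python. It is a simplified per-tensor asymmetric scheme (the form TensorFlow Lite uses for int8 activations; TFLite quantizes weights symmetrically per channel), and it is not the exact converter used by Edge Impulse or the authors. The function names are hypothetical.

```python
QMIN, QMAX = -128, 127  # signed int8 range


def choose_int8_params(values):
    """Pick (scale, zero_point) mapping the tensor's range onto [-128, 127]."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    if hi == lo:  # all-zero tensor: any scale works
        return 1.0, 0
    scale = (hi - lo) / (QMAX - QMIN)
    zero_point = round(QMIN - lo / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(values, scale, zero_point):
    """float -> int8: q = clamp(round(x / scale) + zero_point)."""
    return [max(QMIN, min(QMAX, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """int8 -> float: x ≈ scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
    scale, zp = choose_int8_params(weights)
    q = quantize(weights, scale, zp)
    print(scale, zp, q, dequantize(q, scale, zp))
```

Because a float32 value takes 4 bytes and an int8 value 1 byte, an ideal conversion shrinks the weights about 4×. The reported reduction (513.23 KB / 136.51 KB ≈ 3.76×) is close to but below that, which is plausible since some parameters (for example biases, which TFLite keeps as int32) and model metadata are not reduced to one byte.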

Methods

The “Methods” section outlines the experimental design and analytical techniques used in the study. The researchers took a quantitative approach: three deep learning architectures (1D CNN, 2D CNN, and DeepConv LSTM) were trained and compared on smartphone accelerometer data, and the data pipeline was varied systematically (for example, the sliding-window length) to observe its effect on recognition performance.

Performance was assessed with accuracy, precision, recall, F1 score, and confusion matrices. The best model was then quantized and deployed on a microcontroller to measure its on-device accuracy, memory footprint, and latency. Together, these steps were designed to yield robust, reproducible results that support the study's conclusions.

Results

In the Results section, the paper applies explainable AI (XAI) techniques to make the HAR models more interpretable. XAI helps reveal how a model reaches its decisions, so that researchers and users can see how activities are differentiated and pinpoint likely sources of misclassification. The authors used t-SNE (t-distributed stochastic neighbor embedding) to visualize the feature space learned by the DeepConv LSTM model, projecting the high-dimensional features onto a two-dimensional plane while preserving local neighborhood structure.
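For reference, the standard t-SNE formulation is written out below. Here x_i would be the DeepConv LSTM feature vector for window i (which layer the authors embedded is an assumption; a late hidden layer is a common choice), y_i its 2D position, N the number of windows, and σ_i a per-point bandwidth chosen so that each conditional distribution matches a user-set perplexity. The heavy-tailed Student-t kernel in q_ij counteracts the crowding problem, letting moderately distant points in feature space sit far apart in 2D; distances between clusters in the plot are therefore not directly meaningful, only neighborhoods are.

```latex
p_{j\mid i} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                   {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

The 2D positions y_i are found by gradient descent on C.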

The t-SNE plot showed distinct clusters for activities such as Walking and Jogging, indicating that the model separates these activities based on their characteristic motion patterns. In contrast, Upstairs and Downstairs overlapped considerably, reflecting how hard they are to tell apart given their similar acceleration and gait patterns; this overlap may explain some of the misclassifications seen in the confusion matrix. Overall, the clear, well-separated clusters for most activities indicate strong feature extraction.
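One simple way to quantify an overlap like Upstairs/Downstairs is to tally the off-diagonal confusion-matrix entries per unordered class pair. The helper below is a hypothetical illustration in plain Python, and the labels in the example are invented toy data, not the paper's predictions.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top=3):
    """Tally misclassifications per unordered class pair, most frequent first.

    Sorting each (true, predicted) pair merges A->B and B->A errors, which is
    what matters when asking whether two classes overlap in feature space.
    """
    pairs = Counter(
        tuple(sorted((t, p)))
        for t, p in zip(y_true, y_pred)
        if t != p  # diagonal (correct) predictions are ignored
    )
    return pairs.most_common(top)


if __name__ == "__main__":
    # Invented toy labels for illustration only.
    y_true = ["Walking", "Walking", "Jogging", "Upstairs",
              "Upstairs", "Downstairs", "Downstairs", "Downstairs"]
    y_pred = ["Walking", "Walking", "Jogging", "Downstairs",
              "Upstairs", "Upstairs", "Upstairs", "Walking"]
    print(most_confused_pairs(y_true, y_pred))
    # [(('Downstairs', 'Upstairs'), 3), (('Downstairs', 'Walking'), 1)]
```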

Discussion

In this study, the authors used the WISDM Lab dataset, in which 36 volunteers performed various activities while carrying an Android smartphone. The accelerometer data, sampled at 20 Hz, are imbalanced across activities, with notably fewer samples for static activities such as Standing and Sitting. Despite this imbalance, the authors chose not to oversample, preserving the original data distribution, which they argue improves model generalization and accuracy. Preprocessing consisted of removing null values, sorting the records by user and timestamp, and segmenting the signal with a sliding window at 50% overlap to preserve continuity and support feature extraction. After comparing several window sizes, the authors found that a 5-second window (100 samples at 20 Hz) gave the best accuracy and feature extraction.
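A minimal sketch of the described preprocessing in plain Python, assuming each record is a (user, timestamp, activity, x, y, z) tuple with None marking a missing value. Windows are cut per user so that no window spans two volunteers, and each window is labeled with its majority activity; both choices are assumptions rather than details confirmed by the paper, and the function names are hypothetical.

```python
from collections import Counter

SAMPLING_HZ = 20
WINDOW_SECONDS = 5
WINDOW = SAMPLING_HZ * WINDOW_SECONDS  # 100 samples per window
STEP = WINDOW // 2                      # 50% overlap -> 50-sample stride


def clean_and_sort(rows):
    """Drop records with any missing field, then sort by (user, timestamp)."""
    complete = [r for r in rows if all(v is not None for v in r)]
    return sorted(complete, key=lambda r: (r[0], r[1]))


def sliding_windows(rows, window=WINDOW, step=STEP):
    """Yield (xyz_samples, label) windows, cut separately for each user.

    Assumes `rows` is already sorted by user and timestamp; the label is the
    most common activity inside the window (an assumed labeling rule).
    """
    by_user = {}
    for r in rows:
        by_user.setdefault(r[0], []).append(r)
    for user_rows in by_user.values():
        # Only full windows are kept; a trailing partial window is dropped.
        for start in range(0, len(user_rows) - window + 1, step):
            chunk = user_rows[start:start + window]
            xyz = [(r[3], r[4], r[5]) for r in chunk]
            label = Counter(r[2] for r in chunk).most_common(1)[0][0]
            yield xyz, label
```

With a 100-sample window and a 50-sample step, each window shares half its samples with the previous one, roughly doubling the number of windows compared with non-overlapping segmentation of the same recording.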

The authors developed three HAR models: a 1D CNN, a 2D CNN, and a DeepConv LSTM. The 1D CNN reached an accuracy of 96.88% and the 2D CNN 96.10%; the DeepConv LSTM outperformed both at 98.24%, effectively capturing both spatial and temporal features. The models were evaluated with accuracy, precision, recall, and F1 score, and the DeepConv LSTM was the best at distinguishing between activities, in particular reducing misclassifications between similar activities such as Upstairs and Downstairs. The study highlights the importance of model architecture for recognition performance and addresses the challenges posed by subtle differences between activities.
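The four evaluation metrics can be computed from true and predicted labels as sketched below. The text does not say whether the reported F1 score is a macro or a support-weighted average, so this plain-Python sketch assumes a macro average (the unweighted mean of per-class F1); scikit-learn's `f1_score` exposes both through its `average` parameter. The function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of windows whose predicted activity matches the true one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred):
    """Map each class to (precision, recall, F1), treating it one-vs-rest."""
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[c] = (precision, recall, f1)
    return metrics


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    m = per_class_metrics(y_true, y_pred)
    return sum(f1 for _, _, f1 in m.values()) / len(m)


if __name__ == "__main__":
    # Invented toy labels for illustration only.
    y_true = ["Upstairs", "Upstairs", "Downstairs", "Downstairs"]
    y_pred = ["Upstairs", "Downstairs", "Downstairs", "Downstairs"]
    print(accuracy(y_true, y_pred), per_class_metrics(y_true, y_pred), macro_f1(y_true, y_pred))
```

A macro average gives rare classes such as Standing and Sitting the same weight as frequent ones, which matters for an imbalanced dataset like WISDM; a weighted average would track overall accuracy more closely.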