A improved pooling method for convolutional neural networks

Journal: Scientific Reports, Volume: 14, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-51258-6
PMID: https://pubmed.ncbi.nlm.nih.gov/38238357
Publication Date: 2024-01-18
Author(s): Lei Zhao et al.
Primary Topic: Advanced Neural Network Applications

Overview

The research paper presents a novel pooling layer, termed T-Max-Avg, designed to improve feature extraction in convolutional neural networks (CNNs). Traditional pooling methods such as max and average pooling reduce spatial dimensions efficiently, but they often lose information in the process. T-Max-Avg selects the $K$ highest-valued pixels in each pooling window and introduces a threshold parameter $T$ that controls whether the output is their maximum or their average, giving a flexible operator that can be adapted to specific datasets. Experimental results show that, when integrated into the LeNet-5 model, this method outperforms conventional pooling techniques and achieves the highest accuracy among the compared methods on the CIFAR-10, CIFAR-100, and MNIST datasets.

The findings indicate that the T-Max-Avg pooling method not only surpasses average and max pooling but also improves upon the Avg-TopK method, showcasing greater stability and compatibility. Optimal parameters for color and grayscale images were identified, with specific pool sizes, $K$ values, and $T$ values yielding the best performance. Future work will focus on optimizing the time complexity of the T-Max-Avg method while preserving its accuracy, aiming for a balance between computational efficiency and predictive performance in deep learning applications.

Methods

The proposed T-Max-Avg pooling method addresses the limitations of the two traditional pooling techniques, max pooling and average pooling. Average pooling weights every value in a window equally, so strong activations are diluted when most values are close to zero; max pooling keeps only the single highest value, which makes it sensitive to noisy pixels. T-Max-Avg first keeps the top $K$ pixel values in each pooling window and then uses a threshold parameter $T$ (ranging from 0 to 1) to decide whether the output is their maximum or their average, emphasizing significant features while limiting the impact of noise. With $Y_1, \dots, Y_K$ denoting the $K$ largest values in a pooling window $X$, the method is expressed as

$$\text{T-Max-Avg}(X) = \begin{cases} \max \{Y_i\}_{i=1}^{K}, & Y_i \geq T \\ \dfrac{1}{K} \sum_{i=1}^{K} Y_i, & Y_i < T \end{cases}$$

Experimental evaluations using the LeNet-5 CNN model on three datasets (CIFAR-10, CIFAR-100, and MNIST) show that T-Max-Avg consistently outperforms the traditional pooling methods. For instance, on CIFAR-10 with a pooling size of 2, it achieves an average accuracy improvement of 2.45% over Avg-TopK and 3.44% over max pooling. The gains also hold in other configurations: with a pooling size of 4 and $K = 6$, it surpasses Avg-TopK by 0.28% on CIFAR-10 and 0.53% on CIFAR-100. Overall, T-Max-Avg extracts critical information from images more effectively than the baselines, making it a promising alternative pooling layer for convolutional neural networks.
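The piecewise definition of T-Max-Avg can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes a single-channel 2-D feature map stored as nested lists and non-overlapping windows (stride equal to the pooling size), and it reads the condition as "output the maximum if at least one of the top $K$ values reaches $T$ (equivalently, if the largest one does), otherwise output their mean". The names `t_max_avg_window` and `t_max_avg_pool2d` are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def t_max_avg_window(window: List[float], k: int, t: float) -> float:
    """T-Max-Avg for one flattened pooling window (hypothetical sketch).

    Assumed reading of the formula: keep the k largest values Y_1..Y_K;
    if the largest of them reaches the threshold t, return it (max branch),
    otherwise return the mean of the k values (average branch).
    """
    if not 1 <= k <= len(window):
        raise ValueError("k must be between 1 and the window size")
    top_k = sorted(window, reverse=True)[:k]  # Y_1 >= Y_2 >= ... >= Y_K
    if top_k[0] >= t:
        return top_k[0]          # max branch
    return sum(top_k) / k        # mean of the top-k values


def t_max_avg_pool2d(x: Matrix, pool_size: int, k: int, t: float) -> Matrix:
    """Apply T-Max-Avg over a 2-D map with stride == pool_size.

    Rows/columns that do not fill a complete window are dropped
    (an assumption, similar to "valid" padding).
    """
    rows, cols = len(x), len(x[0])
    out: Matrix = []
    for r in range(0, rows - pool_size + 1, pool_size):
        out_row = []
        for c in range(0, cols - pool_size + 1, pool_size):
            # Flatten the pool_size x pool_size window starting at (r, c).
            window = [x[i][j]
                      for i in range(r, r + pool_size)
                      for j in range(c, c + pool_size)]
            out_row.append(t_max_avg_window(window, k, t))
        out.append(out_row)
    return out


if __name__ == "__main__":
    feature_map = [
        [0.9, 0.1, 0.2, 0.3],
        [0.2, 0.3, 0.1, 0.4],
        [0.0, 0.1, 0.6, 0.7],
        [0.2, 0.1, 0.8, 0.5],
    ]
    print(t_max_avg_pool2d(feature_map, pool_size=2, k=3, t=0.5))
```

With `pool_size=2`, `k=3`, and `t=0.5`, the two windows whose largest value reaches 0.5 return that maximum (0.9 and 0.8), while the other two return the mean of their top three values (about 0.3 and 0.133). For the top-right window, plain max pooling would give 0.4 and plain average pooling 0.25.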

Discussion

In the discussion section, several recent pooling methods are reviewed as alternatives to traditional pooling in convolutional neural networks (CNNs). Bilinear pooling captures second-order feature statistics by taking outer products of feature vectors, modeling pairwise feature interactions. Lee et al. introduced mixed, gated, and tree pooling, which make the pooling operation learnable and, in the gated variant, responsive to the input. Detail-preserving pooling (DPP) maintains structural details using inverse bilateral filters, while adaPool employs region-specific fusion to improve detail retention across tasks. The AlphaMEX Global Pool method improves classification accuracy without adding redundant parameters, and a self-attention-based graph pooling method demonstrates superior performance in graph classification tasks. The vector pooling block (VPB) and the Avg-TopK pooling method also address the limitations of conventional pooling; Avg-TopK does so by averaging the top K pixel values in each window.
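To make the relationship between Avg-TopK and the classical operators concrete, the following hypothetical sketch applies each of them to a single flattened pooling window. It assumes plain Python lists and illustrative function names. It shows that averaging the top $K$ values reduces to max pooling for $K = 1$ and to average pooling when $K$ equals the window size, with intermediate $K$ in between.

```python
from typing import List


def max_pool(window: List[float]) -> float:
    """Classical max pooling over one window: keep only the largest value."""
    return max(window)


def avg_pool(window: List[float]) -> float:
    """Classical average pooling over one window: every value weighted equally."""
    return sum(window) / len(window)


def avg_topk_pool(window: List[float], k: int) -> float:
    """Avg-TopK (sketch): average of the k largest values in the window."""
    if not 1 <= k <= len(window):
        raise ValueError("k must be between 1 and the window size")
    top_k = sorted(window, reverse=True)[:k]
    return sum(top_k) / k


if __name__ == "__main__":
    window = [0.0, 0.1, 0.05, 0.9]  # mostly near-zero values plus one strong response
    print(max_pool(window))          # 0.9
    print(avg_pool(window))          # diluted by the near-zero values
    print(avg_topk_pool(window, 2))  # mean of the two largest values
```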

The proposed T-Max-Avg pooling method combines advantages from both max pooling and Avg-TopK, showing improved performance across various models, including ChestX and ResNet50, particularly in classification tasks using the CIFAR-10 dataset. Experimental results indicate that T-Max-Avg outperforms traditional pooling methods, achieving higher accuracy and stability. The findings suggest that optimal parameters for T-Max-Avg vary with dataset characteristics, necessitating further research to refine these parameters. Future work will focus on enhancing the efficiency of the T-Max-Avg method while preserving its accuracy, aiming for a balance between computational speed and model performance.
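The claim that T-Max-Avg combines max pooling and Avg-TopK can be checked on a single window with a small, hypothetical sketch. It assumes the threshold condition is read as "return the maximum if the largest of the top $K$ values reaches $T$, otherwise their mean", and that activations are nonnegative (for example, after a ReLU). Under those assumptions, $T = 0$ always selects the maximum, so the layer behaves like max pooling, while a $T$ above every value in the window always selects the mean of the top $K$ values, so it behaves like Avg-TopK; intermediate thresholds switch branch per window. Function names are illustrative.

```python
from typing import List


def avg_topk(window: List[float], k: int) -> float:
    """Avg-TopK (sketch): mean of the k largest values in the window."""
    top_k = sorted(window, reverse=True)[:k]
    return sum(top_k) / k


def t_max_avg(window: List[float], k: int, t: float) -> float:
    """T-Max-Avg (sketch, assumed reading): max of the top k if it reaches t, else their mean."""
    top_k = sorted(window, reverse=True)[:k]
    return top_k[0] if top_k[0] >= t else sum(top_k) / k


if __name__ == "__main__":
    window = [0.05, 0.6, 0.3, 0.2]  # nonnegative, e.g. ReLU outputs
    k = 3
    # T = 0: any nonnegative window maximum passes the threshold, i.e. max pooling.
    print(t_max_avg(window, k, t=0.0) == max(window))
    # T above every value in the window: the mean branch is taken, i.e. Avg-TopK.
    print(t_max_avg(window, k, t=1.0) == avg_topk(window, k))
    # Sweeping T shows where this window switches from the max to the mean branch.
    for t in (0.2, 0.4, 0.6, 0.8):
        print(t, t_max_avg(window, k, t))
```

Because the best branch depends on how activations are distributed, this switching behavior is consistent with the observation that the optimal $T$ (and $K$) varies with dataset characteristics.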