A multi-stage ensemble framework for classifying pig vocalizations under noisy animal farm environments

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-17205-9
PMID: https://pubmed.ncbi.nlm.nih.gov/41053224
Publication Date: 2025-10-06
Author(s): Seyeon Chung et al.
Primary Topic: Animal Behavior and Welfare Studies

Overview

The study presents the Pig Vocalization Multi-stage Classification (PVMC) model, a novel framework for detecting and classifying a wide range of pig vocalizations in order to assess health and emotional stress under real-world farming conditions. Unlike previous studies, which focused primarily on isolated vocal patterns such as coughs, PVMC uses a multi-stage approach that integrates cough and scream detection with emotional state classification. Its key features are robustness to varying vocalization durations and noise levels, an architecture customized for each processing stage, and an ensemble learning strategy that combines Wav2Vec2 and the Audio Spectrogram Transformer (AST) to improve performance. The reported results include a signal-to-noise ratio (SNR) improvement of up to 4.9 dB, 95.80% accuracy in vocalization segmentation, 98.88% accuracy in key vocalization classification, and 92.15% accuracy in emotional state detection.
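One common way to combine two classifiers such as Wav2Vec2 and AST is soft voting, where the class probabilities of both models are averaged before the final label is chosen. The summary does not say which fusion rule the authors used, so the sketch below is only an illustration under that assumption. The logits, the `weight_a` parameter, and the function names are hypothetical; the emotion labels are the four named in the Discussion.

```python
import math

# Emotion classes named in the summary's Discussion section.
EMOTIONS = ["calm", "feeding", "frightened", "anxious"]


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(logits_a, logits_b, weight_a=0.5):
    """Weighted average of the two models' class probabilities.

    Assumption: soft voting; the paper's actual fusion rule is not
    described in this summary.
    """
    probs_a = softmax(logits_a)
    probs_b = softmax(logits_b)
    return [weight_a * pa + (1.0 - weight_a) * pb
            for pa, pb in zip(probs_a, probs_b)]


def predict(logits_a, logits_b, weight_a=0.5):
    """Return the winning emotion label and the fused probabilities."""
    probs = soft_vote(logits_a, logits_b, weight_a)
    best = max(range(len(probs)), key=probs.__getitem__)
    return EMOTIONS[best], probs


# Made-up logits for a single clip (not taken from the paper).
wav2vec2_logits = [2.0, 0.5, 0.1, 1.9]   # narrowly prefers "calm"
ast_logits = [0.3, 0.2, 0.1, 2.5]        # clearly prefers "anxious"

label, probs = predict(wav2vec2_logits, ast_logits)
print(label, [round(p, 3) for p in probs])
```

In this toy case the first model alone is nearly undecided between two classes, and the second model's confident vote settles the ambiguity, which is the kind of situation an ensemble is meant to help with.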

In conclusion, the PVMC model shows substantial potential for real-time monitoring of pig vocalizations and could contribute to intelligent livestock management systems focused on animal welfare. The study notes that further refinement is needed for effective deployment across diverse farming environments. Future work is aimed at an end-to-end framework that integrates noise reduction, feature extraction, and classification into a single pipeline, and at improving the model's generalization through more diverse datasets and advanced domain adaptation techniques. The ultimate goal is earlier detection of health and stress-related problems in pigs, together with better operational efficiency in precision livestock farming.

Methods

The “Methods” section outlines the materials and methodologies employed in the research. It details the specific materials used, including any reagents, biological samples, or equipment necessary for the experiments. The section also describes the experimental design, including the procedures followed, the controls implemented, and the statistical analyses conducted to ensure the validity and reliability of the results.

Additionally, the methods may include information on the sampling techniques, data collection processes, and any computational tools or software utilized for analysis. This comprehensive approach ensures that the research can be replicated and that the findings are robust and scientifically sound.

Results

The Results section presents a comprehensive evaluation of the proposed method for analyzing pig vocalizations, covering the noise filtering, segmentation, and classification stages. The experiments were conducted on a high-performance workstation using various Python libraries. The noise filtering evaluation highlighted the limitations of the baseline SEGAN model, which achieved suboptimal performance (CBAK = -7.51, CSIG = 2.49, PESQ = 1.10, SNR = 3.15 dB) in noisy agricultural environments. In contrast, the enhanced SEGAN+ model improved these metrics substantially (CBAK = 18, CSIG = 28, PESQ = 4.16, SNR = 23.8 dB), demonstrating superior noise suppression and underscoring the importance of effective noise reduction for the subsequent analysis stages.
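For readers unfamiliar with the SNR figures above, the following minimal sketch shows the textbook definition: signal power divided by the power of the residual error, expressed in decibels. It assumes a clean reference recording is available, and it is not necessarily the exact variant the authors computed. The synthetic tone and the alternating "noise" pattern are invented purely for illustration.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB of `estimate` measured against a clean `reference`.

    Textbook definition: 10 * log10(signal power / residual power).
    Assumption: both sequences are time-aligned and of equal length.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have equal length")
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Synthetic example: a sine tone plus a deterministic alternating disturbance.
clean = [math.sin(2 * math.pi * k / 16) for k in range(64)]
pattern = [1.0 if k % 2 == 0 else -1.0 for k in range(64)]

noisy = [c + 0.5 * n for c, n in zip(clean, pattern)]      # before filtering
denoised = [c + 0.05 * n for c, n in zip(clean, pattern)]  # residual 10x smaller

gain = snr_db(clean, denoised) - snr_db(clean, noisy)
print(f"SNR gain: {gain:.2f} dB")
```

Shrinking the residual amplitude by a factor of ten raises the SNR by 20 dB, which gives a sense of scale for the values reported for SEGAN and SEGAN+.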

In the segmentation phase, the study compared single-input and multi-input models for pig vocalization activity segmentation (PVAS). The multi-input model, which combined multiple acoustic features, consistently outperformed the single-input model, particularly in complex farm environments where background noise interfered with vocalization detection. The evaluation of classification models for real-time pig sound analysis revealed that SqueezeNet and ShuffleNet-V1 provided the best balance of accuracy and computational efficiency, with SqueezeNet being selected as the optimal model due to its superior precision in distinguishing nuanced vocalizations. Finally, the integration of synthetic data and advanced audio-specialized models, including an ensemble of AST and Wav2Vec2, led to improved classification of ambiguous emotional states, achieving an F1-score of 92.01%. This ensemble approach effectively addressed the challenges of accurately classifying pig emotional states, underscoring its potential for real-time applications in precision livestock farming.
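The F1-score quoted above is the harmonic mean of precision and recall. The short sketch below computes it per class and then takes an unweighted (macro) average. Whether the paper reports a macro, micro, or weighted average is not stated in this summary, so the macro choice is an assumption, and the labels in the example are made up.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 = 2PR / (P + R) for one class, from true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lb) for lb in labels) / len(labels)


# Toy labels, not data from the paper.
y_true = ["calm", "calm", "anxious", "anxious", "feeding", "feeding"]
y_pred = ["calm", "anxious", "anxious", "anxious", "feeding", "calm"]

score = macro_f1(y_true, y_pred)
print(f"macro F1 = {score:.4f}")
```

Because F1 penalizes both missed detections and false alarms, it is a more informative summary than accuracy when emotional classes are imbalanced, which the Discussion notes is the case for this dataset.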

Discussion

The proposed PVMC model is a comprehensive multi-stage classification framework for monitoring pig vocalizations to support health and behavioral assessment in livestock farming. The framework consists of four stages: (1) preprocessing the audio with SEGAN+ noise suppression to enhance signal clarity; (2) segmenting the refined audio to isolate relevant vocalizations; (3) classifying these segments into categories such as coughs, screams, and contextual vocalizations; and (4) further categorizing contextual vocalizations into emotional states such as calm, feeding, frightened, and anxious. The model employs ensemble learning, integrating AST and Wav2Vec2, to improve classification accuracy, particularly in acoustically challenging environments.
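The control flow of the four stages can be sketched as a small routing function in which each stage is a pluggable callable. This is only a structural illustration under stated assumptions: the `Event` type, the function names, and the toy stand-in models are all hypothetical, and none of them reproduces SEGAN+, the segmentation model, or the classifiers from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Audio = Sequence[float]


@dataclass
class Event:
    start: int                     # first sample index of the segment
    end: int                       # one past the last sample index
    category: str                  # "cough", "scream", or "contextual"
    emotion: Optional[str] = None  # set only for contextual vocalizations


def run_pipeline(
    audio: Audio,
    denoise: Callable[[Audio], Audio],
    segment: Callable[[Audio], List[Tuple[int, int]]],
    classify: Callable[[Audio], str],
    classify_emotion: Callable[[Audio], str],
) -> List[Event]:
    """Route audio through the four stages described in the text."""
    clean = denoise(audio)                    # stage 1: noise suppression
    events = []
    for start, end in segment(clean):         # stage 2: segmentation
        clip = clean[start:end]
        category = classify(clip)             # stage 3: key vocalization class
        emotion = None
        if category == "contextual":          # stage 4: emotion, contextual only
            emotion = classify_emotion(clip)
        events.append(Event(start, end, category, emotion))
    return events


# Toy stand-ins so the sketch runs; real models would replace these.
def toy_denoise(audio):
    return list(audio)


def toy_segment(audio):
    return [(0, 4), (4, 8)]


def toy_classify(clip):
    return "scream" if max(abs(x) for x in clip) > 0.8 else "contextual"


def toy_emotion(clip):
    return "calm"


audio = [0.9, -0.9, 0.9, -0.9, 0.1, 0.1, -0.1, 0.1]
events = run_pipeline(audio, toy_denoise, toy_segment, toy_classify, toy_emotion)
for ev in events:
    print(ev)
```

Keeping each stage behind a plain callable mirrors the modular structure the summary describes, and it also makes the limitation raised in the next paragraph visible: the stages are trained and swapped independently, so nothing here optimizes them jointly.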

Despite achieving promising results, including higher recall and precision compared to previous models, the PVMC framework faces limitations due to its modular architecture, which restricts end-to-end optimization and may affect real-time deployment efficiency. Additionally, the dataset utilized was limited in size and class balance, leading to misclassification in overlapping emotional categories. Future research will focus on creating a more extensive and representative dataset, integrating system components into a unified pipeline, and exploring adaptive learning techniques to enhance model generalization across diverse farming conditions. These advancements aim to establish the PVMC model as a reliable tool for real-time animal welfare monitoring and intelligent livestock management.