Video-based automatic lameness detection of dairy cows using pose estimation and multiple locomotion traits


Journal: Computers and Electronics in Agriculture, Volume 223
DOI: https://doi.org/10.1016/j.compag.2024.109040
Published: 2024-06-04


Helena Russello, Rik van der Tol, Menno Holzhauer, Eldert J. van Henten, Gert Kootstra
Agricultural Biosystems Engineering Group, Wageningen University & Research, Wageningen, the Netherlands
Department of Ruminant Health, Royal GD AH, Deventer, the Netherlands

Abstract

This study presents an automatic lameness detection system that uses deep-learning image processing techniques to extract multiple locomotion traits associated with lameness. Using the T-LEAP pose estimation model, the motion of nine keypoints was extracted from videos of walking cows. The videos were recorded outdoors, with varying illumination conditions, and T-LEAP extracted 99.6% of correct keypoints. The trajectories of the keypoints were then used to compute six locomotion traits: back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. The three most important traits were back posture measurement, head bobbing, and tracking distance. For the ground truth, we showed that a thoughtful merging of the scores of the observers could improve intra-observer reliability and agreement. We showed that including multiple locomotion traits improves the classification accuracy from 76.66% with only one trait to 79.87% with the three most important traits and to 80.07% with all six locomotion traits.

Keywords: lameness, detection, cows, locomotion, pose estimation, deep learning

15 August 2025

1. Introduction

Lameness is a painful gait disorder in dairy cows and is often characterized by abnormal locomotion of the cow. A recent literature review [1] estimated the global prevalence of lameness at […], with little change over the last thirty years. Lameness has a negative impact on welfare [2] and leads to substantial economic losses [3] due to reduced milk production and reproduction [4] as well as early culling [3]. While lameness is commonly assessed by trained observers who visually score the locomotion of the herd, the procedure is time-consuming and cannot realistically be carried out on a regular basis. Dairy farms could therefore benefit from automatic lameness detection.

To date, a number of studies have investigated ways to automate locomotion scoring and lameness detection with camera systems. Video cameras are an attractive sensor for this application because they are relatively inexpensive, non-intrusive, and scale well to large herds. A three-step approach is typically taken to detect lameness from videos: (1) use computer vision methods to localize body parts of interest, (2) compute one or more locomotion traits from the extracted body parts, and (3) train a classifier to score lameness using the locomotion traits as features. In the past, body parts were localized with classical computer vision methods such as background subtraction [5, 6, 7, 8]. These methods worked in experimental settings but were sensitive to changes in background and lighting, which makes them less applicable in practice. Others placed physical markers (tags or paint marks) on the body parts of the cows and tracked the markers with specialized software [9, 10]. In practical settings, however, physical markers do not scale well to large herds because they need to be placed on every cow and cleaned regularly to remain visible. More recently, with the advent of deep neural networks, studies have started to use deep-learning-based object detection [11, 12, 13, 7] to localize the legs or backs of cows, object segmentation [14] to extract the body contour from the background, or markerless pose estimation (i.e., without physical markers) [15, 16, 8, 17, 18] to localize multiple body parts in videos. Although they typically require more data than classical methods, deep-learning methods deal well with complex backgrounds and lighting conditions and can sometimes handle occlusions such as fences [16, 18].

Once localized in images or video frames, the outline of the spine can be used, for instance, to compute the back posture [6, 19, 20, 21, 22, 13, 7], and the location of the legs to compute the tracking distance [5, 9] or the stride length [9, 11, 7]. To the best of our knowledge, almost all studies on lameness detection from videos use only a single trait as a feature for lameness scoring, and so far only [23], [17], and [8] have combined multiple locomotion traits.

Using the locomotion trait(s) as feature(s), supervised-learning classifiers can then be trained to score lameness. In supervised learning, classifiers learn from given examples, also known as the ground truth or gold standard. Manual locomotion scores, i.e., locomotion scores given by one or more observers, form the ground truth of lameness detection classifiers. The subjective nature of manual locomotion scoring is a well-known problem [24] and often results in low intra- and inter-observer reliability and agreement. Since a classifier can only be as good as its ground truth, information about the reliability of the locomotion scale is essential. Yet observer reliability and agreement are seldom reported, let alone analyzed.

Three critical gaps emerge from the studies discussed so far: (1) older image-processing methods are still in common use, (2) few studies combine multiple locomotion traits to classify lameness, and (3) the reliability of the ground truth is seldom reported. This paper addresses these three gaps and proposes a non-intrusive and fully automated approach to camera-based lameness detection that includes multiple locomotion traits. In addition, the code of this paper is open source.

We used videos of walking cows that were scored on a 5-point locomotion scale by four observers. We first reported and discussed the intra- and inter-observer reliability and agreement of the ground truth, and merged the scores of the multiple observers into a binary scale. We then trained T-LEAP [16], a deep-learning-based markerless pose estimation model, to automatically extract the motion of multiple body parts (hereafter referred to as keypoints) from videos of walking cows. The keypoint sequences were used to compute six locomotion traits known to be associated with locomotion scores [25], namely back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. Using these locomotion traits as input features, we trained multiple machine-learning models to classify the gait as normal or lame. We evaluated the performance of each model and showed the effect of using different combinations of locomotion traits on lameness classification.

2. Materials

2.1. Data collection

The data were collected in Tilburg, the Netherlands, on a commercial dairy farm with a herd of about one hundred Holstein-Friesian cows. Data collection took place between 9 a.m. and 4 p.m. on 8 different days between May and July 2019. The cows were filmed from the side as they walked freely through an outdoor alley. A ZED RGB-D stereo camera was placed 2 m above the ground, 4.5 m from the fence of the alley. The camera faced the alley directly and recorded in landscape mode at Full-HD resolution (1080p) and 30 frames per second. The recordings were saved as short video clips of about 7.6 s, the average time a cow needs to cross the visible part of the alley (9.5 m). The same data collection campaign, at the same farm, was used by [16]. A total of 1101 videos were collected, and a subset of 272 videos was selected according to the following criteria: only one cow was in the alley, and the cow walked from left to right without distraction or interruption.

During data collection, no process was in place to automatically link videos to an individual cow (for instance, with an RFID tag reader). A unique identifier was therefore assigned to the cows afterwards by manually grouping individual cows. We identified 98 unique cows, of which 24 appeared in the videos only once, 21 twice, 25 three times, 17 four times, 6 five times, 3 six times, 1 seven times, and 1 eight times. For cows that appeared several times, some were recorded at different times of the same day and some on different days.

2.2. Locomotion scoring

Locomotion scoring was performed with the discrete 5-point scale described by Sprecher et al. (1997), in which score 1 corresponds to normal gait, 2 to mildly lame, 3 to moderately lame, 4 to lame, and 5 to severely lame. Four observers scored the videos: one expert (A) with 20 years of experience in visual locomotion scoring, and three observers (B, C, D) with no prior experience in locomotion scoring but with a background in animal science and dairy farming. The inexperienced observers were trained by the expert (A) before the scoring session. During the scoring session, each video was shown twice in a row to give enough time to observe the locomotion. For consistency, the observers were asked to give the lowest score when hesitating between two scores. All videos were scored on the same day. After the scoring session, the observers reported that they had not recognized individual cows appearing in several videos. Table 1 shows the distribution of the scores given by the four observers. The score distribution was highly imbalanced and indicated a homogeneous herd, with most cows falling in the first two levels of the scale (normal, mildly lame), which is typical of herds with a low prevalence of lameness.

Table 1: Distribution of the locomotion scores assigned by the observers.

Observer	Score 1	Score 2	Score 3	Score 4	Score 5	Total
A	115	99	27	31	0	272
B	109	80	54	26	3	272
C	101	119	34	15	3	272
D	141	80	38	12	1	272

2.3. Observer reliability and agreement

Manual locomotion scoring is subjective [28]. Investigating the inter-rater (between observers) and intra-rater (within observer) reliability and agreement can help assess the quality of the data. Reliability measures the ability of raters to distinguish between different scores, whereas agreement assesses the ability of raters to give the same score to the same data point. Reliability was measured with Krippendorff's α [29] for ordinal values, and agreement was reported as the percentage of agreement (PA). The inter- and intra-observer measures are reported in Table 2. Note that, since the videos were scored only once, intra-observer agreement was estimated by comparing the scores between pairs of videos of the same cow recorded less than 48 hours apart.

Table 2: Inter-observer (A, B, C, D) and intra-observer reliability and agreement of the locomotion scoring.

Observer	(A, B, C, D)	A	B	C	D
Krippendorff's α	0.602	0.611	0.552	0.653	0.585
PA (%)	55.8	56.4	49.1	60.0	58.2
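
As an illustration of the agreement measure, the sketch below computes a percentage of agreement from per-observer score lists. How the paper aggregates PA over four observers is not stated here, so averaging over all observer pairs is an assumption, and the function name and toy scores are hypothetical.

```python
from itertools import combinations


def percentage_agreement(scores_by_observer):
    """Mean pairwise percentage of videos on which two observers give the same score.

    Assumption: agreement over more than two observers is the average of the
    pairwise agreements; the paper may aggregate differently.
    """
    observers = list(scores_by_observer.values())
    pair_rates = []
    for a, b in combinations(observers, 2):
        matches = sum(1 for x, y in zip(a, b) if x == y)
        pair_rates.append(100.0 * matches / len(a))
    return sum(pair_rates) / len(pair_rates)


# Toy scores for four videos (invented for illustration only).
toy_scores = {"A": [1, 2, 3, 1], "B": [1, 2, 2, 1], "C": [1, 3, 3, 1]}
pa = percentage_agreement(toy_scores)
```

The same function can be applied to two score lists of one observer (paired videos of the same cow) to estimate intra-observer agreement.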

2.4. Merging the locomotion scores

To build the ground truth of the classification dataset, a single ground-truth label was assigned to each sample (video) based on the locomotion scores given by the multiple observers. To this end, the scores of the four observers were merged into a single value by computing the rounded-down average of the two observers with the highest intra-observer reliability.

The vast majority of studies on lameness detection focus on two-level (normal, lame) or three-level (normal, moderately lame, lame) locomotion scales rather than a five-level scale [26, 28]. The main motivation for resorting to a coarser locomotion scale is the distribution of the scores across the scale. Severely lame cows are rare, as most of them are treated or culled before reaching that level of lameness [30]. This leads to a highly imbalanced score distribution, with most scores in levels 1, 2, and 3. Training a classifier on imbalanced datasets is difficult, especially when few examples are available for some classes. As shown in Table 1, the distribution of the locomotion scores was highly imbalanced. Therefore, to balance the dataset, we merged the levels of the scale into a binary scale in which level 1 denotes a normal gait and levels 2, 3, 4, and 5 denote a lame gait. This resulted in an agreement of […] and a reliability of 0.590. Note that reliability measures such as Krippendorff's α can decrease when the rating scale is smaller because the chance of agreement is larger.

After merging the scores of the multiple observers and converting to a binary scale, the ground truth of the classification dataset, which consists of 272 videos, contained 143 videos labeled as normal and 129 videos labeled as lame.
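
The merging and binarization can be sketched as follows. This is a minimal sketch under stated assumptions: the two scores passed in are those of the two observers with the highest intra-observer reliability, merging is applied before binarization, and the function names are hypothetical.

```python
import math


def merge_scores(score_first, score_second):
    """Rounded-down mean of the 5-point scores of the two selected observers."""
    return math.floor((score_first + score_second) / 2)


def to_binary(score):
    """Level 1 is a normal gait; levels 2 to 5 are grouped as lame."""
    return "normal" if score == 1 else "lame"


# Toy score pairs for four videos (invented for illustration only).
pairs = [(1, 2), (2, 3), (1, 1), (4, 5)]
merged = [merge_scores(a, b) for a, b in pairs]
labels = [to_binary(s) for s in merged]
```

Rounding down means that a video scored 1 by one observer and 2 by the other ends up in the normal class, which matches the instruction given to observers to choose the lowest score when in doubt.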

3. Methods

Our methodology consists of three main parts: pose estimation, gait feature extraction, and lameness classification. These parts are described in detail in the following subsections, and a graphical summary of the methods is given in Figure 1.

Figure 1: Summary of the video processing procedure.

3.1. Pose estimation

Pose estimation models can be used to predict the location of keypoints (body parts) in images and videos without the need for physical markers. T-LEAP is a recent deep-learning-based temporal pose estimation model that was trained to detect keypoints on the body of cows in videos. The model uses sequences of consecutive frames to predict the keypoint coordinates and has been shown to outperform static approaches in the presence of occlusions (such as fences). In this study, we used T-LEAP to extract the coordinates of nine keypoints from the video frames. In the following paragraphs, we describe the steps for cropping the images, estimating the pose, and correcting the keypoints.

3.1.1. Detection and cropping

The T-LEAP model requires input frames that are square and cropped around the body of the cow. The cows were localized automatically in the video frames with a Faster Region-based Convolutional Neural Network (Faster R-CNN), an object detection model that returns the coordinates of a bounding box (bbox) around each object of interest (here, cows). We used the Faster R-CNN model (with a ResNeXt-101 backbone) pre-trained on the COCO-2017 dataset from the Detectron2 library [31]. The COCO-2017 dataset contains 118k training images with annotations for 80 object classes, including 8014 bounding-box annotations of cows. The Faster R-CNN model from Detectron2 worked out of the box and was able to detect the cows in our video frames without fine-tuning. Each frame of each video was fed to the object detection model, which returned a list of bounding boxes, one per detected cow. For each frame, the bounding box was made square by extending its top and bottom coordinates to match the width while keeping the cow vertically centered. A padding of 100 pixels was added on all four sides to ensure that the whole body of the cow was visible in the cropped region. The image was cropped to the coordinates of the extended bounding box and rescaled to a size of […] pixels. The coordinates of the crop box were saved so that the keypoint predictions could be transformed back to the real coordinates of the video frame.
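
The box squaring, padding, and back-projection described above can be sketched in a few lines. This is an illustrative sketch only: the function names are hypothetical, the crop size used in the example (200 pixels) is an assumed value because the actual size is not given here, and clipping the box to the frame borders is left out.

```python
def square_padded_box(x1, y1, x2, y2, pad=100):
    """Make a detection box square by extending top and bottom to match the
    width (keeping the cow vertically centered), then pad all four sides."""
    width = x2 - x1
    center_y = (y1 + y2) / 2
    top = center_y - width / 2
    bottom = center_y + width / 2
    return (x1 - pad, top - pad, x2 + pad, bottom + pad)


def crop_to_frame(point, box, crop_size):
    """Map a keypoint predicted in the rescaled square crop back to frame coordinates."""
    bx1, by1, bx2, _ = box
    scale = (bx2 - bx1) / crop_size  # frame pixels per crop pixel
    return (bx1 + point[0] * scale, by1 + point[1] * scale)


# Toy detection box in a 1920x1080 frame (invented for illustration only).
box = square_padded_box(400, 300, 1200, 700)
frame_point = crop_to_frame((100, 50), box, crop_size=200)  # crop_size is assumed
```

Storing `box` per frame is what allows the predicted keypoints to be expressed in the coordinates of the original video.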

3.1.2. Keypoint detection

We trained T-LEAP to predict the location of 9 keypoints. They represent the location of the following anatomical landmarks: nose, forehead, withers, sacrum, caudal thoracic vertebrae, and the four hooves (Figure 2). The location of these nine keypoints was needed to extract the gait features described in Section 3.2. T-LEAP was trained with sequences of two consecutive frames as input because its authors reported the best performance with this sequence length [16].

A pose estimation dataset was created to train and evaluate T-LEAP, using 28 videos of unique cows randomly selected from the 272 available videos. The coordinates of the nine keypoints were annotated in every frame of the 28 videos, and the frames were split into 968 non-overlapping sequences of two frames. We refer to each set of consecutive frames as a sample. T-LEAP was trained on a random subset of 80% of the samples (i.e., 774 training samples) and evaluated on the remaining 20% of the samples (i.e., 194 test samples). We used the same training procedure and hyperparameter settings as described in the original T-LEAP paper [16].

The trained T-LEAP model was then used to predict the location of the nine keypoints in all 272 videos of walking cows, including the 28 videos used for training. Each video frame was cropped around the body of the cow, and sequences of two consecutive frames were fed to the pose estimation model. The keypoint coordinates predicted by the model were then transformed to the real coordinates of the video. For each video, this yielded the (x, y) coordinates of each keypoint in each frame. We refer to the set of keypoints of one video as the "keypoint trajectories". In essence, these trajectories represent the motion of the anatomical landmarks localized by the pose estimation model in the 2D image plane.

Figure 2: The nine keypoints (anatomical landmarks), as described in [16]. The keypoints are labeled as follows: 1: left hind hoof, 2: right hind hoof, 3: left front hoof, 4: right front hoof, 5: nose, 6: forehead, 7: withers, 8: caudal thoracic vertebrae, 9: sacrum.

3.1.3. Keypoint correction

In our set of 272 videos, we identified 98 individual cows. The 28 videos used to train the pose estimation model each showed a unique cow, which left 70 cows unseen by the pose estimation model. In their generalization experiment, the authors of T-LEAP reported a percentage of correct keypoints (PCKh@0.2) of 93.8% on known cows (i.e., cows included in the training set) and a performance of […] on unknown cows (i.e., cows not included in the training set). Errors were therefore expected in the predicted keypoint trajectories. To deal with this, we developed a keypoint correction method. First, to identify and correct large outliers in the trajectories, we used a median absolute deviation (MAD) filter with a temporal window of size 3. We then applied a Savitzky-Golay filter [32] (window = 10, order = 3) to smooth the trajectories over time. Figure 3 shows examples of trajectories with outliers before and after applying the filters.

Figure 3: Example of keypoint trajectories extracted with T-LEAP (left) and after filtering (right), for a normal gait (top) and a lame gait (bottom).
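
The outlier step can be illustrated with a small MAD-based filter on a single coordinate trajectory. This is a sketch under assumptions, not the authors' implementation: the window of size 3 is interpreted as one neighboring frame on each side, the rejection threshold `k` is a hypothetical parameter, and the subsequent Savitzky-Golay smoothing is not shown.

```python
from statistics import median


def mad_filter(signal, half_window=1, k=3.0):
    """Replace a sample by the window median when it deviates from that median
    by more than k scaled median absolute deviations (MAD).

    half_window=1 gives a 3-frame window; k is an assumed threshold.
    """
    scale = 1.4826  # makes the MAD comparable to a standard deviation
    filtered = list(signal)
    for i in range(len(signal)):
        lo = max(0, i - half_window)
        hi = min(len(signal), i + half_window + 1)
        window = signal[lo:hi]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        if abs(signal[i] - med) > k * scale * mad:
            filtered[i] = med
    return filtered


# Toy x-trajectory with one spurious jump at index 2 (invented for illustration).
x_raw = [100, 101, 180, 103, 104, 105]
x_clean = mad_filter(x_raw)
```

With a 3-frame window, a sample on a monotone stretch is its own window median and is left untouched, so only isolated spikes are candidates for correction.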

3.2. Gait feature extraction

Using the keypoint trajectories, we computed six locomotion traits that have been shown to be associated with locomotion scores [25], namely back posture measurement (BPM), head bobbing amplitude (HBA), tracking distance (TRK), stride length (STL), stance duration (STD), and swing duration (SWD). All features relied on step detection, i.e., knowing when each hoof was moving (swing phase) or stationary (stance phase). In the following paragraphs, we therefore first describe the implementation of the step detection, followed by the implementation of the gait features.

3.2.1. Step detection

For each leg, the horizontal movement (x-coordinate) of the hoof was used to detect the stance and swing phases. A stance phase starts when a hoof lands on the ground and ends when the hoof moves forward again. At that moment, the swing phase starts. The hoof keeps moving forward for the duration of the swing phase until it lands and stays stationary for another stance phase. The start and end frames of the stance phases were detected by finding when the x-coordinate of the hoof stayed the same, i.e., by finding plateaus of at least 10 frames in which the absolute difference in x-coordinate between two frames stayed within […] pixels, to account for small jitter. We define the mid-swing as a frame between the lift-off and the landing of the hoof, before the hoof starts to decelerate. Mid-swing moments were detected by finding the peaks of the acceleration of the x-coordinate. The horizontal acceleration of the hoof was computed by taking the second-order derivative of the x-coordinate and passing it through a uniform filter of size 3. An example of x-coordinate trajectories is shown in Figure 4, with the stance phases and mid-swings identified by the step detection.

Figure 4: Example of step detection using the x-coordinate trajectories of the hooves. The vertical lines mark the start and end of a stance phase. The crosses mark the peak of the swing phase.
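
The plateau search behind the stance detection can be sketched as below. The 10-frame minimum comes from the text; the pixel tolerance `max_step` is a hypothetical value because the actual threshold is not available here, and the function name is illustrative. Mid-swing detection from acceleration peaks is not shown.

```python
def find_stance_phases(x, max_step=2.0, min_frames=10):
    """Return (start, end) frame indices, inclusive, of plateaus in a hoof
    x-trajectory: runs of at least `min_frames` frames in which consecutive
    x-coordinates differ by at most `max_step` pixels (assumed tolerance)."""
    phases = []
    start = 0
    for i in range(1, len(x) + 1):
        # A run ends at the end of the signal or when the hoof moves again.
        run_ends = i == len(x) or abs(x[i] - x[i - 1]) > max_step
        if run_ends:
            if i - start >= min_frames:
                phases.append((start, i - 1))
            start = i
    return phases


# Toy trajectory: 12 stationary frames, a 5-frame swing, 11 stationary frames.
hoof_x = [0] * 12 + [10, 20, 30, 40, 50] + [60] * 11
stances = find_stance_phases(hoof_x)
```

The frames between two consecutive stance phases of the same hoof then form the swing phase.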

3.2.2. Step correction

The step detection was checked and corrected automatically with the following procedure: for any given leg, mid-swings must occur before or after stance phases, and mid-swings must occur during the stance phase of the opposite (left-right) leg. When the step detection failed to meet these requirements, it indicated that the keypoint predictions were too noisy for that hoof. Only four videos were found to have problematic step detection. The frames with problematic steps were then removed from the keypoint trajectories, which resulted in trajectories with one or more gaps. The trajectories were then trimmed to the segment with the largest number of remaining frames.

3.2.3. Back posture measurement (BPM)

To estimate the back posture, or back curvature, an approach similar to the one described in [6] was taken. A circle was fitted through the three keypoints on the spine. The curvature of the circle is the inverse of its radius. The radius ($r$) of the fitted circle was normalized by the head length ($h$) of the cow (in pixels), as the size of the cows can vary. The head length was taken as the Euclidean distance between the forehead and nose keypoints. The BPM was then computed as follows:

$\mathrm{BPM} = \left(\frac{r}{h}\right)^{-1} = \frac{h}{r}$

For each leg, the BPM was computed at each mid-swing. If there were multiple swing phases, the median BPM of that leg was kept. The largest BPM across all four legs was used as the final BPM.
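
For a single frame, the circle fit and normalization can be sketched as follows. With exactly three points, the fitted circle is the circumscribed circle of the triangle they form. The sketch assumes that the BPM is the inverse of the head-length-normalized radius, as the description above suggests; the function names and toy coordinates are hypothetical.

```python
import math


def circumradius(p1, p2, p3):
    """Radius of the circle through three 2D points (infinite if collinear)."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Twice the signed triangle area, from the 2D cross product.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    area = abs(cross) / 2
    if area == 0:
        return math.inf
    return a * b * c / (4 * area)


def back_posture_measure(withers, thoracic, sacrum, nose, forehead):
    """Curvature of the spine circle, normalized by head length (assumed h / r)."""
    radius = circumradius(withers, thoracic, sacrum)
    head_length = math.dist(nose, forehead)
    return head_length / radius


# Toy keypoints: spine points on a circle of radius 5, head length 2.5.
bpm = back_posture_measure((-3, 4), (0, 5), (3, 4), (0, 0), (1.5, 2.0))
flat_bpm = back_posture_measure((0, 0), (1, 0), (2, 0), (0, 0), (1.5, 2.0))
```

A perfectly straight back gives an infinite radius and therefore a BPM of 0, while a more arched back gives a smaller radius and a larger BPM.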

3.2.4. Head bobbing amplitude (HBA)

Head bobbing is defined as an exaggerated movement of the head when the affected limbs land on and lift off the ground [25, 9]. In the presence of head bobbing, the head therefore moves noticeably up and down periodically (at least once per gait cycle). Healthy individuals are expected to have a more stable head position. Examples of noticeable head bobbing and of a steady head position are shown in Figure 5. The amplitude of the vertical movement (y-signal) of the forehead keypoint was used as a measure of head bobbing. The amplitude of the y-signal was computed with the fast Fourier transform [33], based on the number of frames in the video, the number of frames per gait cycle, the frequency, the Fourier transform of the signal, and the amplitude at each frequency. The HBA value was then set to the largest amplitude within a gait cycle.

Figure 5: Example of the y-signal with and without head bobbing.
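
To illustrate how an amplitude can be read from the spectrum of the forehead y-signal, the sketch below uses a plain discrete Fourier transform. It is not the paper's exact definition: how the search is restricted to the gait cycle is not recoverable here, so the sketch simply takes the largest amplitude over all non-zero frequencies below the Nyquist frequency. Function names and the toy signal are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(y):
    """Single-sided amplitudes of a real signal for the non-zero frequency bins
    below Nyquist, computed with a direct DFT after removing the mean."""
    n = len(y)
    mean = sum(y) / n
    centered = [v - mean for v in y]
    spectrum = []
    for k in range(1, (n + 1) // 2):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered)
        )
        spectrum.append((k, 2 * abs(coeff) / n))
    return spectrum


def head_bob_amplitude(y):
    """Largest spectral amplitude of the y-signal (assumed proxy for HBA)."""
    return max(amp for _, amp in amplitude_spectrum(y))


# Toy y-signal: 60 frames, 3 oscillations of 8 pixels around y = 500.
y_bobbing = [500 + 8 * math.sin(2 * math.pi * 3 * t / 60) for t in range(60)]
y_steady = [500.0] * 60
hba_bobbing = head_bob_amplitude(y_bobbing)
hba_steady = head_bob_amplitude(y_steady)
```

A direct DFT is quadratic in the number of frames, which is acceptable for clips of a few hundred frames; an FFT routine gives the same amplitudes faster.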

3.2.5. Tracking distance (TRK)

The tracking distance is defined as the horizontal distance (x-coordinate) between the landing location of the front hoof and the landing location of the subsequent hind hoof on the same side. If the hind hoof lands at the same location as the front hoof, this indicates that there is no serious gait problem [5], and the TRK value is equal (or close) to 0. The tracking distance was measured on the left side (TRK_L) and the right side (TRK_R) of the cow and normalized by the head length. For any given side (left, right), it was computed from the x-coordinate of the front hoof at the start frame of its stance phase and the x-coordinate of the hind hoof at the start frame of its next stance phase. When there was more than one value per side, the median TRK value of that side was returned.

3.2.6. Stride length difference (STL)

The stride length is defined as the horizontal distance between two consecutive landings of the same hoof. The stride length was measured for each hoof between each pair of consecutive stance phases and normalized by the head length. If there was more than one stride length per hoof, the median value was kept. We measured the difference in stride length between the left and right sides for the hind legs and for the front legs.

3.2.7. Stance duration difference (STD)

We define the stance duration as the time (in seconds) between the start (a) and the end (b) of each stance phase. The time in seconds was derived from the frame rate of the video recording (here, 30 frames per second).

The stance duration was measured for each hoof and each stance phase. If a leg had more than one stance phase, the median duration was used. We measured the difference in duration between the left and right sides for the hind legs and for the front legs.
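
The left-right asymmetry features share the same pattern, shown here for the stance duration. This is a sketch under assumptions: stance phases are given as (start, end) frame indices, per-leg values are summarized by the median, and the asymmetry is taken as an absolute difference, since the exact formula is not available here. All names are hypothetical.

```python
from statistics import median

FPS = 30  # frame rate of the recordings


def phase_durations(phases, fps=FPS):
    """Durations in seconds of (start, end) frame-index pairs."""
    return [(end - start) / fps for start, end in phases]


def left_right_difference(left_phases, right_phases, fps=FPS):
    """Absolute difference between the per-leg median durations (assumed form)."""
    left = median(phase_durations(left_phases, fps))
    right = median(phase_durations(right_phases, fps))
    return abs(left - right)


# Toy stance phases for the two hind legs (invented for illustration only).
left_hind = [(0, 18), (40, 58)]  # two stance phases of 0.6 s each
right_hind = [(20, 29)]          # one stance phase of 0.3 s
std_hind = left_right_difference(left_hind, right_hind)
```

The same helper applies to swing phases, and an analogous difference of normalized stride lengths gives the stride length asymmetry.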

3.2.8. Swing duration difference (SWD)

We define the swing duration as the time (in seconds) between the start (a) and the end (b) of each swing phase. The time in seconds was derived from the frame rate of the video recording (here, 30 frames per second).

The swing duration was measured for each hoof and each swing phase. If a leg had more than one swing phase, the median duration was used. We measured the difference in duration between the left and right sides for the hind legs and for the front legs.

A summary of the extracted features is given in Table 3, and Figure 6 shows the distribution of the values of each feature per lameness class.

Table 3: List of the features extracted from the keypoint trajectories.

Feature	Description
BPM	Back posture measurement
HBA	Head bobbing amplitude
TRK_L	Tracking distance on the left side
TRK_R	Tracking distance on the right side
STL (front)	Difference in stride length between the left and right front hooves
STL (hind)	Difference in stride length between the left and right hind hooves
STD (front)	Difference in stance duration between the left and right front hooves
STD (hind)	Difference in stance duration between the left and right hind hooves
SWD (front)	Difference in swing duration between the left and right front hooves
SWD (hind)	Difference in swing duration between the left and right hind hooves

Figure 6: Distribution of the features per lameness class, where 0 corresponds to normal and 1 to lame.

3.3. Lameness classification

The layout of our machine-learning experiments is described in the following paragraphs. First, we split the data into training and validation sets using cross-validation. We then trained and evaluated different binary classifiers for lameness detection using all the extracted features. Finally, we investigated the importance of the features for the classification performance.

3.3.1. Data preparation

Given the relatively small size of the dataset (272 videos), the dataset was split into training and validation sets using stratified group 5-fold cross-validation. To prevent data leakage, the grouping was done on the cow identifiers so that no cow identifier was shared between the training set and the validation set of a fold. Under this non-overlapping constraint, the split creates folds that preserve the class distribution as much as possible [34]. To ensure a balanced class distribution during training, we applied the Synthetic Minority Over-sampling Technique (SMOTE) [35] to the minority classes of the training sets. SMOTE creates new training samples whose feature values are close to those of other samples of the minority class. Finally, the features were rescaled, as machine-learning models often require features to be on a similar scale. The range of the features was rescaled with robust scaling [36], which uses statistics that are robust to outliers to scale the data.
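
As an illustration of the last step, robust scaling is commonly implemented by centering on the median and dividing by the interquartile range, with both statistics estimated on the training fold only. The sketch below assumes that common definition; the function names are hypothetical, and a zero interquartile range is not handled.

```python
from statistics import median, quantiles


def fit_robust_scaler(train_values):
    """Return (median, interquartile range) of one feature on the training fold."""
    q1, _, q3 = quantiles(train_values, n=4, method="inclusive")
    return median(train_values), q3 - q1


def robust_scale(values, center, iqr):
    """Center on the training median and scale by the training IQR."""
    return [(v - center) / iqr for v in values]


# Toy feature values with one outlier (invented for illustration only).
train = [1, 2, 3, 4, 100]
center, iqr = fit_robust_scaler(train)
train_scaled = robust_scale(train, center, iqr)
val_scaled = robust_scale([5], center, iqr)
```

Because the median and the interquartile range barely move when the outlier (100) is added, the bulk of the data keeps a comparable scale, which would not be the case with mean and standard deviation.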

3.3.2. Classification models

We compared the performance of the following six classifiers: logistic regression (LR), random forest (RF), support vector machines with a linear kernel (SVL) and with a radial kernel (SVR), multi-layer perceptron (MLP), and gradient boosting machines (GB). These classifiers were selected because they performed well in previous research on lameness detection [8, 7, 14]. We used a flat cross-validation approach to tune the hyperparameters and train the models, as it is computationally cheaper than nested cross-validation and usually leads to the selection of an algorithm of similar quality to the one selected with nested cross-validation [37]. The hyperparameters of the classifiers were first optimized with a cross-validated randomized search of 100 iterations over the 5 folds. The classifiers were then retrained on the 5 folds with the best set of hyperparameters.

3.3.3. Evaluation metrics

The performance of the classification models was evaluated with the following metrics: accuracy, F1 score, sensitivity, and specificity. The F1 score was macro-averaged, i.e., the metric was computed for each class and then averaged. Macro-averaging is particularly useful with imbalanced datasets, as all classes contribute equally to the metric.
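
For the binary case, the four metrics follow directly from the confusion counts. The sketch below treats the lame class (label 1) as the positive class, which is an assumption about the labeling convention; function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def evaluate(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    f1_lame = f1_from_counts(tp, fp, fn)
    # For the normal class, the roles of the counts are swapped.
    f1_normal = f1_from_counts(tn, fn, fp)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "macro_f1": (f1_lame + f1_normal) / 2,
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Toy labels: 1 = lame, 0 = normal (invented for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = evaluate(y_true, y_pred)
```

Sensitivity is the share of lame gaits that are detected, and specificity is the share of normal gaits that are correctly left unflagged.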

3.3.4. Feature importance

An additional experiment was carried out to check whether including multiple features could lead to improvements in lameness classification. The predictive value of a feature was assessed by measuring its feature importance, i.e., how much the feature contributes to a correct classification. To measure feature importance, we chose the permutation importance method [38], as it can be applied to any classifier. The feature importance was evaluated on the best-performing classifier among the 6 classifiers trained with all features. The permutation importance method was implemented as follows: for each cross-validation fold, the model was fitted on the training set and its F1 score was evaluated on the validation set. Then, one feature column of the validation set was shuffled randomly, and the model was evaluated again. The importance score was the difference between the F1 score on the unshuffled and on the shuffled validation data. The permutations were repeated 100 times for each feature. The features were then ranked by their mean importance score. To estimate whether including multiple features could improve lameness classification, the classifier was then retrained with the most important feature, the two most important features, and so on, adding one feature at a time in order of importance.
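
The permutation procedure for one fold can be sketched in a model-agnostic way. This is an illustrative sketch, not the authors' code: `predict` and `score` are hypothetical callables standing in for a fitted classifier and a metric, and the toy example uses accuracy instead of the F1 score to stay short.

```python
import random


def permutation_importance(predict, score, X_val, y_val, n_repeats=100, seed=0):
    """Mean drop in score when one validation column is shuffled, per feature."""
    rng = random.Random(seed)
    baseline = score(y_val, [predict(row) for row in X_val])
    importances = []
    for j in range(len(X_val[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X_val]
            rng.shuffle(column)
            # Rebuild the validation rows with only column j permuted.
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X_val, column)]
            drops.append(baseline - score(y_val, [predict(r) for r in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def toy_predict(row):
    # Hypothetical fitted model that only looks at the first feature.
    return 1 if row[0] > 0.5 else 0


X_val = [[0.9, 3.0], [0.8, 1.0], [0.1, 2.0], [0.2, 5.0], [0.7, 0.0], [0.3, 4.0]]
y_val = [1, 1, 0, 0, 1, 0]
importances = permutation_importance(toy_predict, accuracy, X_val, y_val)
```

A feature that the model ignores gets an importance of exactly zero, because shuffling it cannot change any prediction.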

4. Results

4.1. Pose estimation

The test results of T-LEAP are presented in Table 4. On average, 99.60% of the keypoints were detected correctly (PCKh@0.2). In other words, the Euclidean distance between a predicted keypoint and its ground truth was less than 20% of the head length in 99.60% of the cases. This is in line with the results reported in the original paper [16], which achieved a detection rate of […] with the same model and 17 keypoints. The keypoint correction and filtering were run on all 272 videos, and the MAD filter (with a window size of 3) identified […] of the keypoints as outliers, whose coordinates were then corrected to the median value of the temporal window. Because keypoint annotations were not available for all videos, the keypoint correction could only be evaluated qualitatively. The keypoint trajectories before and after filtering were plotted for each video and checked visually. The quality of the filtered trajectories was considered well balanced, as most outliers could be corrected and the trajectories looked smooth, without over-correction or flattening. Outliers that could not be corrected sufficiently led to erroneous step detection. These steps were then discarded from the trajectories, as detailed in Section 3.2.

Table 4: Percentage of correct keypoints (PCKh@0.2) of T-LEAP on the test set. The keypoints are labeled as follows: 1: left hind hoof, 2: right hind hoof, 3: left front hoof, 4: right front hoof, 5: nose, 6: forehead, 7: withers, 8: sacrum, 9: caudal thoracic vertebrae.

Keypoint	1	2	3	4	5	6	7	8	9	Mean
PCKh@0.2	98.45	100	99.48	98.45	100	100	100	100	100	99.60

4.2. Lameness detection

The results of the different binary classifiers are listed in Table 5. The SVM with a radial kernel, the random forest, and the gradient boosting classifiers performed best, with accuracies above 79%. SVM-R had a higher specificity, whereas the random forest and gradient boosting had a higher sensitivity. Logistic regression, the SVM with a linear kernel, and the multi-layer perceptron performed slightly worse.

Table 5: Results of the binary classifiers using all features. Values are expressed in percentages.

Model	Accuracy	F1 score	Sensitivity	Specificity
Logistic regression	78.49	77.26	77.33	77.90
SVM with linear kernel	77.25	76.31	75.39	77.90
SVM with radial kernel	80.07	78.70	76.78	81.15
Random forest	79.66	78.44	83.68	74.64
Gradient boosting	79.12	77.79	[…]	72.05
Multi-layer perceptron	78.97	77.60	80.74	74.59

4.3. Feature importance

The scores returned by the permutation importance are plotted in Figure 7. For each feature, the score indicates how much a random permutation of the feature values affects the prediction scores, averaged over 100 permutations. The back posture measurement (BPM) had the highest permutation score, followed by the head bobbing amplitude (HBA) and the left tracking distance (TRK_L). The remaining features showed lower importance.

Figure 7: Feature importance results over 100 random permutations.

Using the permutation importance results, the SVM classifier with a radial kernel (SVM-R) was retrained by adding one feature at a time, in order of importance. The classification results of the classifier with these different feature combinations are shown in Table 6. In terms of accuracy and F1 score, using two or more features improves the classification results compared with using BPM only. The best classification scores are reached with the combinations of 3 and 6 features.

Table 6: Results (in percentages) of the SVM-R classifier when adding one feature at a time in order of importance score.

SVM-R features	Accuracy	F1 score	Sensitivity	Specificity
BPM	76.66	74.81	63.26	86.69
BPM, HBA	79.31	77.50	77.42	77.32
BPM, HBA, TRK	79.87	78.22	76.35	80.14
BPM, HBA, TRK, STD	79.47	77.87	77.09	78.89
BPM, HBA, TRK, STD, STL	79.18	78.03	78.31	79.17
BPM, HBA, TRK, STD, STL, SWD	80.07	78.70	76.78	81.15

5. Discussion

5.1. Video processing

The video processing consisted of the following steps: using Faster R-CNN to detect and isolate the cows in the video frames, using T-LEAP to extract the time series of the keypoint locations, and using the MAD and Savitzky-Golay filters to reduce the noise in the keypoint predictions. For our set of videos, the pre-trained Faster R-CNN worked well and detected the location of the cows in every video frame. The performance of T-LEAP was on par with the results described in the original paper [16], and little effort should be needed to transfer it to videos recorded at new farms, as [18] showed that little new training data was required to fine-tune the T-LEAP model. Nevertheless, there were some keypoint detection errors that needed correction. The parameters of the MAD outlier filter and of the Savitzky-Golay smoothing filter had to be tuned manually until a good balance between under- and over-correction was found. With no or insufficient correction of the keypoint trajectories, the features can take erroneous values. With over-correction, on the other hand, one risks removing the true signal from the keypoint trajectories, and the extracted features would not be discriminative. For instance, if the forehead signal is flattened too much, head bobbing would be missed systematically.

The videos were selected such that only one cow at a time was in the field of view. This constraint makes the gait analysis more reliable in two ways. First, having a single cow in the field of view ensures that cows do not occlude each other's body parts, which makes the pose estimation more reliable. Second, it ensures that there is enough space between cows for them to walk at their own pace and display voluntary walking. In practice, this constraint can be implemented by skipping videos in which Faster R-CNN (or any other object detector) detects more than one cow or, as was done in [17], by implementing a tracking algorithm that follows each cow through the video.

Another constraint of the video selection was that the cows had to walk from left to right. Our proposed method would nevertheless still work if the cows walked in the opposite direction. The walking direction can be determined by examining the horizontal movement of the keypoints along the x-axis. The x-coordinates of the keypoints increase over time when the cows walk to the right and decrease when they walk to the left. Therefore, if the cows were to walk from right to left, the keypoint trajectories could simply be flipped along the x-axis before extracting the features.

5.2. Locomotion scoring

A classifier learns to classify samples from a set of labeled examples, also known as the ground truth or gold standard. Since a classifier cannot be more accurate than its gold standard [25], a reliable locomotion scale is essential. Here, the initial inter- and intra-observer reliability was sub-optimal. It is worth noting that reliability is usually lower on homogeneous data because the probability of agreeing by chance is higher when the scores are not evenly distributed [24]. Scoring from live observations instead of videos would be unlikely to improve the scores, as [39] showed no difference in the reliability of inexperienced observers between live and video scoring and showed improved reliability of experienced observers when scoring from video. The quality of the ground truth could perhaps have been improved further by organizing additional locomotion scoring sessions or by holding shorter scoring sessions spread over several days. However, as the availability of the observers was limited and a perfect gold standard was neither necessary nor likely attainable, we took other steps to address the low reliability and agreement. First, since we had several observers, we could disregard the votes of the least reliable observers. Second, we addressed the class (score) imbalance by merging the levels of the scale into a binary score: normal and lame. Although these steps improved the quality of our gold standard, some bias may remain, which may limit the accuracy of the classifiers.

5.3. Lameness detection

Lameness detection was implemented as a binary classification task (normal vs. lame) and therefore focused on detecting lameness rather than on fine-grained gait scoring. Fine-grained locomotion scoring was left for future research, as it would require collecting more video footage with enough examples of gait scores 3 and above.

The linear classifiers (i.e., logistic regression and SVM with a linear kernel) performed worse than the non-linear classifiers. This suggests that, when all features are combined, the decision boundary between the normal and lame classes is non-linear. The multi-layer neural network did not perform as well as the other non-linear classifiers, most likely because of the relatively small dataset. The performance of the three best classifiers, SVM-R, RF, and BG, is in line with the findings of [40] and [37]: they found that these three binary classifiers performed best on 115 open-source datasets covering a variety of real-world problems in medicine and biology (but unrelated to lameness detection). Although the SVM classifier with a radial kernel achieved the best accuracy and F1-score on our dataset, this may not be the case for other datasets. This is a well-known machine-learning challenge, also known as the "no free lunch" theorem, which states that no algorithm can outperform all others on all problems [41]. Our recommendation is therefore to try multiple classifiers; SVMs with a radial kernel, random forests, and gradient-boosting classifiers provide a good starting point.

5.4. Feature importance

The relationship between individual locomotion traits and locomotion scores has been investigated in several studies [28, 42, 43, 25]. They found that, when assessed individually, traits such as arched back, asymmetric gait, head bobbing, reluctance to bear weight, and tracking up were significantly correlated with the locomotion score. The features selected in this study were designed to measure the same traits. The arched back was measured by the back posture measurement (BPM), the asymmetric gait by the stride-length (STL) difference between the left and right limbs, the head bobbing by the head bobbing amplitude (HBA), the reluctance to bear weight by the stance duration (STD) and swing duration (SWD), and tracking up by the tracking distance (TRK).

The BPM, HBA, and TRK features returned the highest scores in the permutation importance test. BPM and HBA showed a clear separation between the normal and lame classes in Figure 6. As reported by [28], [42], and [43], this suggests that back posture, head bobbing, and tracking up are easier for human observers to recognize than asymmetric gait (e.g., stride length). The tracking distance on the left side (TRK-L) had a higher importance than that on the right side (TRK-R). This may indicate that, in our dataset, more cows were tracking up on the left side than on the right side.
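For readers unfamiliar with permutation importance, the idea can be sketched in a few lines of plain Python: shuffle one feature column at a time and record how much the accuracy drops. This is a generic illustration, not necessarily the implementation used in the study; the toy rule, the feature values, and the function names are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    rows = [tuple(row) for row in rows]
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the labels
            shuffled = [row[:j] + (value,) + row[j + 1:]
                        for row, value in zip(rows, column)]
            drops.append(baseline - accuracy(predict, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy example: a hypothetical rule that only looks at the first feature.
def toy_rule(row):
    return int(row[0] > 0.5)


features = [(0.9, 0.1), (0.8, 0.7), (0.7, 0.3), (0.6, 0.9),
            (0.4, 0.2), (0.3, 0.8), (0.2, 0.4), (0.1, 0.6)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(permutation_importance(toy_rule, features, labels))
```

A feature the classifier ignores gets an importance of zero, while a feature it relies on gets a positive score. Correlated features can share or mask each other's importance, which is one reason why a low score does not always mean a feature is useless.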

A clear difference in the duration of the stance/swing phases between the classes can be observed for both the stance duration (STD) and the swing duration (SWD) of the hind legs (Figure 6), whereas the differences between the classes are less pronounced for the front legs. This can be explained by the fact that lameness occurs more commonly in the hind legs [6, 28]. Including SWD as a feature increased the classification performance, even though SWD had the lowest importance score. Conversely, the importance score of STD was higher than that of SWD, but adding the STD feature to the classifier inputs led to a slight decrease in accuracy and F1-score. This could indicate multicollinearity with other features.

The STL features had the second-lowest importance score, and the separation between the classes was hard to discern in Figure 6. Interestingly, the F1-score, sensitivity, and specificity were higher when the STL features were included. This suggests that stride length can be useful when used in combination with other features. It is worth noting that if cows are bilaterally lame, that is, lame in both the left and right limbs, the stride length will show little or no difference [9].
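The metrics mentioned in this section have standard definitions for a binary task. The sketch below computes them from true and predicted labels, treating "lame" as the positive class; the function name and the example labels are illustrative and do not come from the study's data.

```python
def binary_metrics(y_true, y_pred, positive="lame"):
    """Accuracy, sensitivity, specificity and F1-score for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on lame cows
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on normal cows
    precision = tp / (tp + fp) if tp + fp else 0.0
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


y_true = ["lame", "lame", "lame", "normal", "normal", "normal", "normal", "lame"]
y_pred = ["lame", "lame", "normal", "normal", "normal", "lame", "normal", "lame"]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity next to accuracy matters on imbalanced data, where a classifier can reach a high accuracy while missing most of the minority class.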

Overall, combining multiple locomotion traits led to better classification performance than using a single trait. Using a combination of 3 and 6 traits led to the best accuracy and F1-scores with the SVM classifier with a radial kernel. Although additional traits could be extracted from the keypoint trajectories, it is unknown whether they would lead to substantial improvements in gait classification. Our recommendation is to include at least the following locomotion traits in an automatic lameness detection system: back posture, head bobbing, and tracking distance, as these traits showed good overall classification metrics and a high correlation with locomotion scores.

Directly comparing the performance of our lameness classifiers with related work is not straightforward because, although the task at hand (i.e., detecting lameness from videos) is the same, there is a large variation in the materials, methods, and evaluations used in the papers on this topic. Moreover, a comprehensive literature review is beyond the scope of this paper, and we refer the reader to [44] for an overview of past and current advances in bovine gait analysis. Here, we compare our results with previous work that we consider directly related to ours.

The back posture measurement (BPM) was first introduced by [6], and back curvature has since been used in many studies [6, 19, 20, 21, 22, 13, 7, 8, 17]. The BPM is usually measured during the stance phase of the hind legs rather than that of the front legs, because lameness is more common in the hind legs than in the front legs. However, this practice may cause the algorithm to systematically miss cases of front-leg lameness. To prevent this, we computed the BPM based on the stance phase of all four legs. When using the BPM as the single locomotion trait, the lameness classification accuracy ranged from

[19] to

[13]. When including only the BPM feature in our SVM-R classifier, we achieved an accuracy of

, which is in line with the literature.

The work presented in [8] is perhaps the most closely related to this study. In [8], the authors used a combination of classical and deep-learning-based computer vision to develop a lameness detection system. They used DeepLabCut [15], a deep-learning model, trained to track the location of the hooves and head in videos of walking cows without physical markers. The pose estimation model achieved a percentage of correct keypoints (PCK) of

Videos in which the keypoint predictions were too erroneous were manually discarded. The outline of the cow's spine was obtained using a pixel-wise background subtraction method. However, this method may not cope well with changing backgrounds and lighting conditions. In contrast, we used pose estimation to track all keypoints on the cow's body, including those on the spine. As a result, our proposed method is more robust to such variations. In addition, we automatically corrected erroneous keypoint trajectories using a keypoint-correction and step-correction approach, which removed the need for manual verification.

In total, they used 212 videos of walking cattle, where cows with a score of 1 or 2 were labeled as normal and those with a score of 3 or 4 as lame. The back curvature was computed from the spine outline, and the keypoints on the hooves and neck were used to extract the following features: head bobbing, stride-length asymmetry, tracking up, landing speed, stance-phase asymmetry, and movement speed. The back curvature and head bobbing were computed when the hind hooves were in contact with the ground, and lameness in the front limbs was therefore not accounted for. In contrast, we computed the back curvature on all four legs, and the head bobbing amplitude was extracted from the whole trajectory. Their feature selection was performed as follows: a chi-square test was run on the complete dataset. The test showed that the back posture measurement and head bobbing were the most important features. By contrast, we found that adding tracking to the two other features led to better results on our dataset. This could mean that, in their dataset, the lame animals were not tracking up. Another explanation could be that, as the number of traits increases, so does the complexity of the data, and a non-linear classifier such as SVM-R may be needed. Several classifiers were trained using back curvature and head bobbing, and the logistic regression classifier returned the best results, with a classification accuracy of

. In comparison, our accuracy of

may seem modest. However, it is important to note that several differences in the data and experimental setup may have affected the results. The datasets and methods used in the two studies were different, which makes a direct comparison difficult. In our study, most locomotion scores were at the lower end of the scale, and the small number of severely lame cows may have made it harder for the classifier to distinguish lame from non-lame cows. In addition, the quality of the ground truth may have affected the performance of our classifiers. We also followed machine-learning best practices by performing feature selection only on the training sets and by separating individual cows between the training and validation sets to prevent data leakage. Failing to do so can artificially inflate performance results and lead to over-optimistic conclusions. Overall, we expect our method to give similar results if run on their dataset, with the advantage of a fully automatic pipeline that requires no manual verification of the keypoint trajectories, and a pose estimation method that is robust to lighting conditions and occlusions.
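Separating individual cows between training and validation sets can be sketched as a split over cow identifiers rather than over videos, so that all videos of a given cow end up on the same side. The function below is a minimal illustration; its name, the validation fraction, and the identifiers are assumptions, and the study's own cross-validation setup may differ.

```python
import random


def split_by_cow(cow_ids, val_fraction=0.2, seed=0):
    """Split video indices so that no cow appears in both sets.

    cow_ids: list giving the cow identifier of each video.
    Returns (train_indices, val_indices).
    """
    cows = sorted(set(cow_ids))
    rng = random.Random(seed)
    rng.shuffle(cows)
    n_val = max(1, round(len(cows) * val_fraction))
    val_cows = set(cows[:n_val])
    train_idx = [i for i, cow in enumerate(cow_ids) if cow not in val_cows]
    val_idx = [i for i, cow in enumerate(cow_ids) if cow in val_cows]
    return train_idx, val_idx


# Eight videos of five hypothetical cows; some cows appear several times.
cow_ids = ["c1", "c1", "c2", "c3", "c3", "c3", "c4", "c5"]
train_idx, val_idx = split_by_cow(cow_ids, val_fraction=0.4)
print(train_idx, val_idx)
```

Any step fitted on data, such as feature selection or scaling, would then be fitted on the training indices only and applied unchanged to the validation indices.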

In [17], a fully automatic multi-cow lameness detection system was developed. They used Mask R-CNN, a deep-learning model, to simultaneously perform cow object detection and pose estimation of 7 keypoints on the back, neck, and head. In total, they used 250 videos of 10 different cows. The keypoints were used to extract back curvature and head position as locomotion traits. Each locomotion trait was extracted for every video frame and aggregated per video into statistical features such as the mean, median, standard deviation, minimum, and maximum. They trained a CatBoost gradient-boosting classifier and achieved

accuracy for binary lameness detection and

accuracy for lameness scoring on a 4-point scale. In our work, although we included four more locomotion traits, we only aggregated the values into the per-video median. In light of the excellent performance of their classifiers, a promising direction to extend our work would be to extract more statistical features from the locomotion traits, such as the mean, standard deviation, minimum, and maximum, to further improve our classification performance.
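The aggregation discussed here, from one value per frame or per step to a few statistics per video, can be sketched with the Python standard library. The function name and the example values are hypothetical, and the choice of the population standard deviation is an assumption.

```python
import statistics


def aggregate_trait(values):
    """Summarize the values of one locomotion trait over a video."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical per-step values of a trait measured in one video.
trait_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(aggregate_trait(trait_values))
```

Using only the median, as in the present work, gives one feature per trait; the five statistics above would give five features per trait, which increases the feature count and may call for feature selection on small datasets.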

6. Conclusion

In this paper, we developed a fully automatic lameness detection system. Using the T-LEAP pose estimation model, the motion of nine keypoints was extracted from videos of walking cows. The keypoint trajectories were then used to compute six locomotion traits, namely back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. We found that the three most important traits were back posture measurement, head bobbing, and tracking distance, and that including multiple locomotion traits led to better classification than using a single locomotion trait. For the ground truth, we showed that a thoughtful merging of the observers' scores can improve observer reliability and agreement. Future work should evaluate the system in a less constrained setting, for instance, with multiple cows in the field of view. Another area for future research could focus on exploiting the temporal nature of the videos by, for instance, including more statistical features for each locomotion trait.

Acknowledgements

This publication is part of the project Deep Learning for Human and Animal Health (with project number EDL P16-25-P5) of the research programme Efficient Deep Learning (https://efficientdeeplearning.nl), which is (partly) financed by the Dutch Research Council (NWO).

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] P. T. Thomsen, J. K. Shearer, H. Houe, Prevalence of lameness in dairy cows, The Veterinary Journal (2023) 105975.
[2] H. R. Whay, J. K. Shearer, The impact of lameness on welfare of the dairy cow, Veterinary Clinics: Food Animal Practice 33 (2) (2017) 153-164.
[3] H. Enting, D. Kooij, A. Dijkhuizen, R. Huirne, E. Noordhuizen-Stassen, Economic losses due to clinical lameness in dairy cattle, Livestock production science 49 (3) (1997) 259-267.
[4] J. Huxley, Impact of lameness and claw lesions in cows on health and production, Livestock Science 156 (1-3) (2013) 64-70.
[5] X. Song, T. Leroy, E. Vranken, W. Maertens, B. Sonck, D. Berckmans, Automatic detection of lameness in dairy cattle-Vision-based trackway analysis in cow’s locomotion, Computers and Electronics in Agriculture 64 (1) (2008) 39-44. doi:10.1016/j.compag.2008.05.016.
[6] A. Poursaberi, C. Bahr, A. Pluk, A. Van Nuffel, D. Berckmans, Real-time automatic lameness detection based on back posture extraction in dairy cattle: Shape analysis of cow with image processing techniques, Computers and Electronics in Agriculture 74 (1) (2010) 110-119. doi:10.1016/j.compag.2010.07.004.
URL http://dx.doi.org/10.1016/j.compag.2010.07.004
[7] Z. Zheng, X. Zhang, L. Qin, S. Yue, P. Zeng, Cows’ legs tracking and lameness detection in dairy cattle using video analysis and Siamese neural networks, Computers and Electronics in Agriculture 205 (2023) 107618. doi:10.1016/j.compag.2023.107618.
URL https://www.sciencedirect.com/science/article/pii/S0168169923000066
[8] K. Zhao, M. Zhang, J. Ji, R. Zhang, J. M. Bewley, Automatic lameness scoring of dairy cows based on the analysis of head- and back-hoof linkage features using machine learning methods, Biosystems Engineering 230 (2023) 424-441. doi:10.1016/j.biosystemseng.2023.05.003.
URL https://www.sciencedirect.com/science/article/pii/S153751102300106X
[9] N. Blackie, E. Bleach, J. Amory, J. Scaife, Associations between locomotion score and kinematic measures in dairy cows with varying hoof lesion types, Journal of Dairy Science 96 (6) (2013) 3564-3572. doi:10.3168/jds.2012-5597.
URL http://linkinghub.elsevier.com/retrieve/pii/S0022030213002282
[10] Y. Karoui, A. A. B. Jacques, A. B. Diallo, E. Shepley, E. Vasseur, A Deep Learning Framework for Improving Lameness Identification in Dairy Cattle, Proceedings of the AAAI Conference on Artificial Intelligence 35 (18) (2021) 15811-15812.
URL https://ojs.aaai.org/index.php/AAAI/article/view/17902
[11] D. Wu, Q. Wu, X. Yin, B. Jiang, H. Wang, D. He, H. Song, Lameness detection of dairy cows based on the YOLOv3 deep learning algorithm and a relative step size characteristic vector, Biosystems Engineering 189 (2020) 150-163. doi:10.1016/j.biosystemseng.2019.11.017.
URL https://doi.org/10.1016/j.biosystemseng.2019.11.017
[12] X. Kang, X. D. Zhang, G. Liu, Accurate detection of lameness in dairy cattle with computer vision: A new and individualized detection strategy based on the analysis of the supporting phase, Journal of Dairy Science 103 (11) (2020) 10628-10638. doi:10.3168/jds.2020-18288.
URL https://www-journalofdairyscience-org.ezproxy.library.wur.nl/article/S0022-0302(20)30713-X/abstract
[13] B. Jiang, H. Song, H. Wang, C. Li, Dairy cow lameness detection using a back curvature feature, Computers and Electronics in Agriculture 194 (2022) 106729. doi:10.1016/j.compag.2022.106729.
URL https://www.sciencedirect.com/science/article/pii/S0168169922000461
[14] E. Arazo, R. Aly, K. McGuinness, Segmentation Enhanced Lameness Detection in Dairy Cows from RGB and Depth Video, arXiv:2206.04449 [cs] (Jun. 2022). doi:10.48550/arXiv.2206.04449.
URL http://arxiv.org/abs/2206.04449
[15] A. Mathis, P. Mamidanna, K. M. Cury, T. Abe, V. N. Murthy, M. W. Mathis, M. Bethge, DeepLabCut: markerless pose estimation of user-defined body parts with deep learning, Nature Neuroscience 21 (9) (2018) 1281-1289. doi:10.1038/s41593-018-0209-y.
URL https://www.nature.com/articles/s41593-018-0209-y
[16] H. Russello, R. van der Tol, G. Kootstra, T-LEAP: Occlusion-robust pose estimation of walking cows using temporal information, Computers and Electronics in Agriculture 192 (2022) 106559. doi:10.1016/j.compag.2021.106559.
URL https://www.sciencedirect.com/science/article/pii/S0168169921005767
[17] S. Barney, S. Dlay, A. Crowe, I. Kyriazakis, M. Leach, Deep learning pose estimation for multi-cattle lameness detection, Scientific Reports 13 (1) (2023) 4499.
[18] M. Taghavi, H. Russello, W. Ouweltjes, C. Kamphuis, I. Adriaens, Cow key point detection in indoor housing conditions with a deep learning model, Journal of Dairy Science (2023).
[19] S. Viazzi, C. Bahr, A. Schlageter-Tello, T. Van Hertem, C. Romanini, A. Pluk, I. Halachmi, C. Lokhorst, D. Berckmans, Analysis of individual classification of lameness using automatic measurement of back posture in dairy cattle, Journal of Dairy Science 96 (1) (2012) 257-266. doi:10.3168/jds.2012-5806.
URL http://dx.doi.org/10.3168/jds.2012-5806
[20] T. Van Hertem, S. Viazzi, M. Steensels, E. Maltz, A. Antler, V. Alchanatis, A. A. Schlageter-Tello, K. Lokhorst, E. C. Romanini, C. Bahr, D. Berckmans, I. Halachmi, Automatic lameness detection based on consecutive 3D-video recordings, Biosystems Engineering 119 (2014) 108-116. doi:10.1016/j.biosystemseng.2014.01.009.
URL http://dx.doi.org/10.1016/j.biosystemseng.2014.01.009
[21] S. Viazzi, C. Bahr, T. Van Hertem, A. Schlageter-Tello, C. E. B. Romanini, I. Halachmi, C. Lokhorst, D. Berckmans, Comparison of a three-dimensional and two-dimensional camera system for automated measurement of back posture in dairy cows, Computers and Electronics in Agriculture 100 (2014) 139-147. doi:10.1016/j.compag.2013.11.005.
URL http://dx.doi.org/10.1016/j.compag.2013.11.005
[22] T. Van Hertem, A. Schlageter Tello, S. Viazzi, M. Steensels, C. Bahr, C. E. B. Romanini, K. Lokhorst, E. Maltz, I. Halachmi, D. Berckmans, Implementation of an automatic 3D vision monitor for dairy cow locomotion in a commercial farm, Biosystems Engineering 173 (2018) 166-175. doi:10.1016/j.biosystemseng.2017.08.011.
[23] K. Zhao, J. Bewley, D. He, X. Jin, Automatic lameness detection in dairy cattle based on leg swing analysis with an image processing technique, Computers and Electronics in Agriculture 148 (2018) 226-236.
[24] A. Schlageter-Tello, E. A. Bokkers, P. W. Groot Koerkamp, T. Van Hertem, S. Viazzi, C. E. Romanini, I. Halachmi, C. Bahr, D. Berckmans, K. Lokhorst, Effect of merging levels of locomotion scores for dairy cows on intra- and interrater reliability and agreement, Journal of Dairy Science 97 (9) (2014) 5533-5542. doi:10.3168/jds.2014-8129.
URL http://dx.doi.org/10.3168/jds.2014-8129
[25] A. Schlageter-Tello, E. A. Bokkers, P. W. Groot Koerkamp, T. Van Hertem, S. Viazzi, C. E. Romanini, I. Halachmi, C. Bahr, D. Berckmans, K. Lokhorst, Relation between observed locomotion traits and locomotion score in dairy cows, Journal of Dairy Science 98 (12) (2015) 8623-8633. doi:10.3168/jds.2014-9059
URL https://linkinghub.elsevier.com/retrieve/pii/S0022030215006633
[26] D. Sprecher, D. Hostetler, J. Kaneene, A LAMENESS SCORING SYSTEM THAT USES POSTURE AND GAIT TO PREDICT DAIRY CATTLE REPRODUCTIVE PERFORMANCE, Science (97) (1997).
[27] P. Thomsen, L. Munksgaard, F. Tøgersen, Evaluation of a lameness scoring system for dairy cows, Journal of dairy science 91 (1) (2008) 119-126.
[28] F. C. Flower, D. M. Weary, Effect of Hoof Pathologies on Subjective Assessments of Dairy Cow Gait, Journal of Dairy Science 89 (1) (2006) 139-146. doi:10.3168/jds.S0022-0302(06)72077-X.
URL https://www.sciencedirect.com/science/article/pii/S002203020672077X
[29] K. Krippendorff, Computing krippendorff’s alpha-reliability (2011).
[30] B. Engel, G. Bruin, G. Andre, W. Buist, Assessment of observer performance in a subjective scoring system: visual classification of the gait of cows, The Journal of Agricultural Science 140 (3) (2003) 317-333. doi:10.1017/S0021859603002983.
URL http://www.cambridge.org/core/journals/
journal-of-agricultural-science/article/assessment-of-observer-performance-in-a-subjectiv A4C2BDAAE4803FE2DFE34013FC8F6DE9#access-block
[31] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, R. Girshick, Detectron2, https://github.com/facebookresearch/detectron2 (2019).
[32] A. Savitzky, M. J. Golay, Smoothing and differentiation of data by simplified least squares procedures., Analytical chemistry 36 (8) (1964) 1627-1639.
[33] J. W. Cooley, J. W. Tukey, An algorithm for the machine calculation of complex fourier series, Mathematics of computation 19 (90) (1965) 297-301.
[34] L. Buitinck, G. Louppe, M. Blondel, F. Pedregosa, A. Mueller, O. Grisel, V. Niculae, P. Prettenhofer, A. Gramfort, J. Grobler, et al., Api design for machine learning software: experiences from the scikit-learn project, arXiv preprint arXiv:1309.0238 (2013).
[35] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, Smote: synthetic minority over-sampling technique, Journal of artificial intelligence research 16 (2002) 321-357.
[36] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825-2830.
[37] J. Wainer, G. Cawley, Nested cross-validation when selecting classifiers is overzealous for most practical applications, Expert Systems with Applications 182 (2021) 115222.
[38] L. Breiman, Random forests, Machine learning 45 (2001) 5-32.
[39] A. Schlageter-Tello, E. Bokkers, P. G. Koerkamp, T. Van Hertem, S. Viazzi, C. Romanini, I. Halachmi, C. Bahr, D. Berckmans, K. Lokhorst, Comparison of locomotion scoring for dairy cows by experienced and inexperienced raters using live or video observation methods, Animal Welfare 24 (1) (2015) 69-79.
[40] J. Wainer, Comparison of 14 different families of classification algorithms on 115 binary datasets, arXiv preprint arXiv:1606.00930 (2016).
[41] D. H. Wolpert, The lack of a priori distinctions between learning algorithms, Neural computation 8 (7) (1996) 1341-1390.
[42] T. Borderas, A. Fournier, J. Rushen, A. De Passille, Effect of lameness on dairy cows’ visits to automatic milking systems, Canadian Journal of Animal Science 88 (1) (2008) 1-8.
[43] N. Chapinal, A. De Passille, D. Weary, M. Von Keyserlingk, J. Rushen, Using gait score, walking speed, and lying behavior to detect hoof lesions in dairy cows, Journal of dairy science 92 (9) (2009) 4365-4374.
[44] A. Nejati, A. Bradtmueller, E. Shepley, E. Vasseur, Technology applications in bovine gait analysis: A scoping review, Plos one 18 (1) (2023) e0266287.
[45] Q. Wang, H. Bovenhuis, Validation strategy can result in an overoptimistic view of the ability of milk infrared spectra to predict methane emission of dairy cattle, Journal of dairy science 102 (7) (2019) 6288-6295.

*Corresponding authors
Email addresses: helena.russello@wur.nl (Helena Russello ), gert.kootstra@wur.nl (Gert Kootstra )
Preprint submitted to Elsevier
Code available at: https://github.com/hrussel/lameness-detection
https://www.stereolabs.com/zed-2/

Journal: Computers and Electronics in Agriculture, Volume: 223
DOI: https://doi.org/10.1016/j.compag.2024.109040
Publication Date: 2024-06-04

Video-based Automatic Lameness Detection of Dairy Cows using Pose Estimation and Multiple Locomotion Traits

Helena Russello, Rik van der Tol, Menno Holzhauer, Eldert J. van Henten, Gert Kootstra
Agricultural Biosystems Engineering group, Wageningen University & Research, Wageningen, The Netherlands
Ruminant Health Department, Royal GD AH, Deventer, The Netherlands

Abstract

This study presents an automated lameness detection system that uses deep-learning image processing techniques to extract multiple locomotion traits associated with lameness. Using the T-LEAP pose estimation model, the motion of nine keypoints was extracted from videos of walking cows. The videos were recorded outdoors, with varying illumination conditions, and T-LEAP extracted of correct keypoints. The trajectories of the keypoints were then used to compute six locomotion traits: back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. The three most important traits were back posture measurement, head bobbing, and tracking distance. For the ground truth, we showed that a thoughtful merging of the scores of the observers could improve intra-observer reliability and agreement. We showed that including multiple locomotion traits improves the classification accuracy from with only one trait to with the three most important traits and to with all six locomotion traits.

Keywords: lameness, detection, cows, locomotion, pose-estimation, deep-learning

August 15, 2025

1. Introduction

Lameness is a painful gait disorder in dairy cows and is often characterized by abnormal locomotion of the cow.

A recent literature review [1] estimated the global prevalence of lameness at

, with little change in the last 30 years. Lameness has a negative impact on welfare [2] and leads to substantial economic losses [3] due to decreased milk production and reproduction [4] as well as premature culling [3]. While lameness is commonly assessed by trained observers performing visual locomotion scoring of the herd, the procedure is time-consuming and cannot realistically be performed on a regular basis. Hence, dairy farms could benefit from automatic lameness detection.

To date, a number of studies have investigated ways to automate locomotion scoring and lameness detection using camera systems. Video cameras are an attractive sensor for this application as they are relatively inexpensive, non-intrusive, and scale well with large herds. A three-step approach is commonly taken to detect lameness from videos: (1) use computer vision methods to localize body parts of interest, (2) compute one or more locomotion traits from the extracted body parts, and (3) train a classifier to score lameness using the locomotion traits as features. In the past, the body parts were localized using classical computer vision methods such as background subtraction [5, 6, 7, 8]. These methods worked in experimental settings but were sensitive to changes in background and light, making them less applicable in practice. Others placed physical markers (tags or paint marks) on the cows’ body parts and tracked the markers with specialized software [9, 10]. In practical settings, however, physical markers do not scale well to large herds as they need to be placed on each cow and cleaned regularly to remain visible. More recently, with the emergence of deep neural networks, studies started using deep-learning-based object detection [11, 12, 13, 7] to localize the legs or the back of the cows, object segmentation [14] to extract the body contour from the background, or markerless (i.e., without physical markers) pose estimation [15, 16, 8, 17, 18] to localize multiple body parts in videos. Although they typically require more data than classical approaches, the deep-learning methods cope well with complex background and light conditions and can sometimes even cope with occlusions such as fences [16, 18].

Once localized in the images or video frames, the outline of the spine, for instance, can be used to compute the back posture [6, 19, 20, 21, 22, 13, 7], and the location of the legs to compute the tracking distance [5, 9] or stride length [9, 11, 7]. To the best of our knowledge, almost all studies on lameness detection from videos use only one locomotion trait as a feature to score lameness, and so far, only [23], [17], and [8] combined multiple locomotion traits.

Using the locomotion trait(s) as feature(s), supervised learning classifiers can then be trained to score lameness. In supervised learning, classifiers learn from given examples, also known as ground truth or gold standard. Manual locomotion scores, that is, locomotion scores provided by one or more observers, make up the ground truth of lameness detection classifiers. The subjective nature of manual locomotion scoring is a well-known problem [24] and often leads to low intra- and inter-observer reliability and agreement. However, a classifier can only be as good as its ground truth, so information about the reliability of the locomotion scale is necessary. Yet, observer reliability and agreement are seldom reported, let alone analyzed.

Three critical gaps emerge from the studies discussed so far: (1) the use of obsolete image processing methods remains frequent, (2) few studies combine multiple locomotion traits for lameness classification, and (3) the reliability of the ground truth is seldom reported. This paper addresses the three gaps mentioned above and proposes a non-intrusive and fully automated approach to camera-based lameness detection that includes multiple locomotion traits. Additionally, the code of this paper is open-source.

We used videos of walking cows that were scored on a 5-point locomotion scoring scale by four observers. We first reported and discussed the intra- and inter-observer reliability and agreement of the ground truth. We merged scores from the multiple observers to a binary scale. We then trained T-LEAP [16], a deep-learning markerless pose estimation model, to automatically extract the motion of multiple body parts (later referred to as keypoints) from videos of walking cows. The sequences of keypoints were used to compute six locomotion traits that are known to be correlated with locomotion scores [25], namely back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. Using the locomotion traits mentioned above as input features, we trained multiple machine-learning models to classify the gait as normal or lame. We evaluated the performance of each model and showed the impact of using different combinations of locomotion traits on the lameness classification.

2. Materials

2.1. Data acquisition

The data were collected in Tilburg, The Netherlands, at a commercial dairy farm whose herd contained about a hundred Holstein-Friesian cows. The data were collected between 9 am and 4 pm on 8 different days between May and July 2019. The cows were filmed from the side while they walked freely through an outdoor passageway. A ZED RGB-D stereo camera was placed 2 meters above the ground, at 4.5 m from the fence of the passageway. The camera directly faced the passageway and recorded in landscape mode at Full-HD (1080p) resolution at 30 frames per second. The recordings were saved into short videos of about 7.6 seconds, which was the average time a cow needed to walk the visible part of the passageway (9.5 meters). The same data acquisition campaign was used by [16] on the same farm. In total, 1101 videos were collected, and a subset of 272 videos was selected according to the following criteria: there was only one cow on the passageway, and the cow walked from the left to the right without distraction or interruption.

During the data collection, no process was set in place to automatically link the videos to an individual cow (e.g., by means of an RFID tag reader). The cows were, therefore, assigned a unique identifier at a later time by manually grouping the individual cows. We identified 98 unique cows, out of which 24 cows were present in the videos only once, 21 cows twice, 25 three times, 17 four times, 6 five times, 3 six times, 1 seven times, and 1 eight times. For the cows that were present multiple times, some were recorded at different times on the same day, and some on different days.
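The counts in this paragraph can be cross-checked with a few lines of arithmetic: the number of cows per appearance count should add up to the 98 unique cows, and weighting each group by its number of appearances should give the 272 selected videos. The variable names below are illustrative.

```python
# Number of cows (value) that appear in a given number of videos (key),
# as listed in the paragraph above.
cows_per_appearance_count = {1: 24, 2: 21, 3: 25, 4: 17, 5: 6, 6: 3, 7: 1, 8: 1}

unique_cows = sum(cows_per_appearance_count.values())
total_videos = sum(times * cows for times, cows in cows_per_appearance_count.items())

print(unique_cows, total_videos)  # 98 272
```

Both totals match the figures reported in the text, so the breakdown is internally consistent.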

2.2. Locomotion scoring

The locomotion scoring was performed using the 5-point discrete scale described by Sprecher et al. (1997) [26], where a score of 1 corresponds to normal gait, 2 to mildly lame, 3 to moderately lame, 4 to lame, and 5 to severely lame. Four observers scored the videos: one expert (A) with 20 years of experience in visual locomotion scoring and three observers (B, C, D) with no prior experience in locomotion scoring but with a background in animal science and dairy farming. The inexperienced observers were trained by the expert (A) before the scoring session. During the scoring session, each video was played twice in a row to give enough time to observe the locomotion. To ensure consistency, the observers were asked to give the lowest score if they were hesitating between two scores. All the videos were scored on the same day. After the scoring session, the observers indicated no cow recognition, i.e., that they did not recognize the individual cows that appeared in multiple videos. Table 1 shows the distribution of the scores assigned by the four observers. The distribution of the scores was highly imbalanced and indicated a homogeneous herd, where most cows were distributed throughout the first two levels of the scale (normal, mildly lame), which is typical of herds with a low prevalence of lameness [27].

Table 1: Distribution of the locomotion scores assigned by the observers

Observer	Score 1	Score 2	Score 3	Score 4	Score 5	Total
A	115	99	27	31	0	272
B	109	80	54	26	3	272
C	101	119	34	15	3	272
D	141	80	38	12	1	272

2.3. Observers reliability and agreement

Manual locomotion scoring is subjective [28]. Investigating the reliability and agreement between raters (inter-rater) and within raters (intra-rater) can inform on the quality of the data. Reliability estimates the capability of the raters to differentiate between the different scores, whereas agreement assesses the capability of the raters to assign the same score to the same data point. Reliability was measured with Krippendorff's $\alpha$ [29] for ordinal values, and agreement was presented as the Percentage of Agreement (PA). The inter-observer and intra-observer measures are reported in Table 2. Note that, as the videos were only scored once, the intra-observer agreement was estimated by comparing the scores between pairs of videos of the same cows recorded at less than 48-hour intervals.

Table 2: Inter-observer (A,B,C,D) and intra-observer reliability and agreement of the locomotion scoring.

Measure	Inter-observer (A,B,C,D)	Intra-observer A	Intra-observer B	Intra-observer C	Intra-observer D
Krippendorff's $\alpha$	0.602	0.611	0.552	0.653	0.585
Percentage of Agreement (%)	55.8	56.4	49.1	60.0	58.2

2.4. Merging the locomotion scores

In order to create the ground truth for the classification dataset, each sample (video) was assigned a single ground-truth label based on the locomotion scores provided by multiple observers. To achieve this, the scores from the four observers were combined into one value by calculating the rounded-down average of the two observers with the highest intra-observer reliability.

The majority of the studies on lameness detection focus on 2-level (normal, lame) or 3-level (normal, moderately lame, lame) locomotion scales rather than on a 5-level scale [26, 28]. The main motivation for resorting to coarser locomotion scales is the distribution of the scores throughout the scale. Severely lame cows are rare, as most get treatment or are culled before they reach this level of lameness [30]. This results in a heavily unbalanced score distribution, with most scores being levels 1, 2, and 3. It is then challenging to train a classifier on unbalanced datasets, especially when few examples are available for some classes. As shown in Table 1, the distribution of the locomotion scores was highly imbalanced. Therefore, in order to balance the dataset, we merged the levels of the scale into a binary scale where level 1 indicated a normal gait, and levels 2, 3, 4, and 5 indicated a lame gait. This resulted in an agreement of […] and a reliability of 0.590. Note that reliability metrics such as Krippendorff's $\alpha$ can decrease when the scoring scale is smaller because the chance of agreement is larger.

After combining the scores of the multiple observers and turning to a binary scale, the ground truth for the classification dataset, consisting of 272 videos, contained 143 videos labeled as normal, and 129 videos labeled as lame.
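The merging procedure can be illustrated with a minimal sketch. It assumes that the two retained observers' scores are first averaged and rounded down, and that the result is then mapped to the binary scale; the function names and the example scores are hypothetical and not taken from the study.

```python
import math


def merge_scores(score_a: int, score_b: int) -> int:
    """Rounded-down average of two observers' 5-point locomotion scores."""
    return math.floor((score_a + score_b) / 2)


def to_binary(score: int) -> str:
    """Level 1 maps to 'normal'; levels 2 to 5 map to 'lame'."""
    return "normal" if score == 1 else "lame"


# Hypothetical scores from the two retained observers for three videos.
pairs = [(1, 2), (2, 3), (1, 1)]
labels = [to_binary(merge_scores(a, b)) for a, b in pairs]
print(labels)  # ['normal', 'lame', 'normal']
```

Rounding down means that a disagreement between scores 1 and 2 resolves to "normal", which mirrors the instruction given to the observers to pick the lowest score when hesitating.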

3. Methods

Our methodology consisted of three main parts: pose estimation, gait features extraction, and lameness classification. These parts are described in detail in the following subsections, and a graphical summary of the methods is provided in Figure 1.

Figure 1: Summary of the video processing procedure.

3.1. Pose estimation

Pose estimation models can be used to predict the position of keypoints (body parts) in images and videos without requiring physical markers. T-LEAP is a recent, deep-learning-based, temporal pose estimation model that was trained to detect keypoints on the body of cows in videos [16]. The model uses sequences of successive frames to predict the coordinates of the keypoints, and was shown to perform better than static approaches in the presence of occlusions (such as fences). In this study, we used T-LEAP to extract nine keypoint coordinates from the video frames (Figure 2). In the next paragraphs, we describe the steps necessary for image cropping, pose estimation, and correction.

3.1.1. Detect-and-crop

The T-LEAP model required the input frames to be square and cropped around the cow's body. The cows were automatically localized in the video frames using the Faster Region-based Convolutional Neural Network (Faster R-CNN), an object-detection model that returns the coordinates of a bounding box (bbox) around each object of interest (here, cows). We used the Faster R-CNN model (with ResNeXt-101 backbone) trained on the COCO-2017 dataset from the Detectron2 library [31]. The COCO-2017 dataset contained 118K training images with annotations for 80 categories of objects, among which 8014 bounding-box annotations of cows. The Faster R-CNN model from Detectron2 worked out of the box and could detect the cows in our video frames without fine-tuning. Each frame of each video was fed to the object-detection model, which returned a list of bounding boxes, one for each detected cow. For each frame, the bounding box was made square by extending the top and bottom coordinates to match the width while keeping the cow vertically centered. A 100-pixel padding was added to all four sides to ensure that the body of the cow was fully visible in the cropped area. The image was cropped to the coordinates of the extended bounding box and re-scaled to a size of […] pixels. The coordinates of the cropping bounding box were saved to transform the keypoint predictions back to the true coordinates for the video frame.
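The geometry of the detect-and-crop step can be sketched as follows. This is an illustration only: it assumes the detected box is wider than it is tall (a cow seen from the side), it does not clip the box to the image borders, and the re-scaled crop size is left as a parameter (`crop_size`) because its value is not given here. Both function names are hypothetical.

```python
def square_crop_box(x1, y1, x2, y2, pad=100):
    """Extend a bounding box vertically to a square, then pad all four sides."""
    width = x2 - x1
    center_y = (y1 + y2) / 2.0
    top = center_y - width / 2.0
    bottom = center_y + width / 2.0
    return (x1 - pad, top - pad, x2 + pad, bottom + pad)


def crop_to_frame(kx, ky, box, crop_size):
    """Map a keypoint from re-scaled crop pixels back to video-frame pixels."""
    x1, y1, x2, _ = box
    scale = (x2 - x1) / crop_size  # the crop box is square, so one scale suffices
    return (x1 + kx * scale, y1 + ky * scale)


# Hypothetical detection and a hypothetical crop size of 200 pixels.
box = square_crop_box(200, 300, 600, 500)
keypoint_in_frame = crop_to_frame(100, 100, box, crop_size=200)
```

Saving `box` per frame is what makes the inverse mapping possible after pose estimation.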

3.1.2. Keypoint detection

We trained T-LEAP to predict the location of 9 keypoints. They represented the location of the following anatomical landmarks: Nose, Forehead, Withers, Sacrum, Caudal thoracic vertebrae, and the four Hooves (Figure 2). The location of these nine keypoints was needed for extracting the gait features described in subsection 3.2. T-LEAP was trained with sequences of 2 consecutive frames as input because the authors reported the best performance with this sequence length [16].

A pose estimation dataset was created for training and evaluating T-LEAP, using 28 videos of unique cows randomly selected from the 272 available videos. The coordinates of the nine keypoints were annotated for each frame of the 28 videos and divided into 968 non-overlapping sequences of 2 frames. We refer to each set of consecutive frames as a sample. T-LEAP was trained with a random subset of 80% of the samples (i.e., 774 training samples) and evaluated on the remaining 20% of the samples (i.e., 194 test samples). We used the same training procedure and hyper-parameter settings as described in the original T-LEAP paper [16].

The trained T-LEAP model was then used to predict the location of the nine keypoints on all 272 videos of walking cows, including the 28 videos used for training. Each video frame was cropped around the body of the cow, and sequences of 2 consecutive frames were fed to the pose estimation model. The keypoint coordinates predicted by the model were then transformed to the true coordinates of the video. For each video, this resulted in the coordinates $(x, y)$ of each keypoint for each frame. We refer to the collection of keypoints of one video as “keypoint trajectories”. In essence, these trajectories represent the motion of the anatomical landmarks localized by the pose estimator in the 2D image plane.

Figure 2: The 9 keypoints (anatomical landmarks) as described in [16]. The keypoints are named as follows: 1: Left-hind hoof, 2: Right-hind hoof, 3: Left-front hoof, 4: Right-front hoof, 5: Nose, 6: Forehead, 7: Withers, 8: Caudal thoracic vertebrae, 9: Sacrum.

3.1.3. Keypoint correction

In our set of 272 videos, we identified 98 individual cows. There were 28 videos of unique cows included in training the pose estimation model, and thus 70 cows that the pose estimation model did not see. In their generalization experiment, the authors of T-LEAP reported a percentage of correct keypoints (PCKh@0.2) of 93.8% on known cows (i.e., cows included in the training set) and a performance of […] on unknown cows (i.e., cows not included in the training set). Errors in the predicted keypoint trajectories were, therefore, expected. To deal with them, we developed a method for correcting the keypoints. First, to identify and correct large outliers in the trajectories, we used a Median-Absolute-Deviation (MAD) filter with a temporal window of size 3. We then applied a Savitzky-Golay filter [32] (window=10, order=3) to smooth the trajectories temporally. Figure 3 shows examples of trajectories with outliers before and after applying the filters.
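A MAD-based outlier filter of the kind described above can be sketched as follows. This is a Hampel-style illustration under stated assumptions, not the study's implementation: the window of size 3 is taken to mean one frame on either side of the current frame, and the rejection threshold (`n_sigmas`, with the usual 1.4826 scaling of the MAD) is a hypothetical choice. The Savitzky-Golay smoothing step is not shown.

```python
from statistics import median


def mad_filter(values, half_window=1, n_sigmas=3.0):
    """Replace points that deviate strongly from the local median by that median."""
    out = list(values)
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        med = median(window)
        # Median absolute deviation of the window around its own median.
        mad = median(abs(v - med) for v in window)
        if abs(values[i] - med) > n_sigmas * 1.4826 * mad:
            out[i] = med
    return out


# Hypothetical x-coordinates of one keypoint with a single mis-detection (90).
trajectory = [10, 11, 12, 90, 14, 15, 16]
print(mad_filter(trajectory))  # [10, 11, 12, 14, 14, 15, 16]
```

Because the outlier is replaced by the median of its temporal window, a single mis-detected frame is pulled back to a neighboring value while the remaining frames are left untouched.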

Figure 3: Example of the keypoint trajectories extracted with T-LEAP (left), and after filtering (right) for a normal gait (top) and a lame gait (bottom).

3.2. Gait features extraction

Using the keypoint trajectories, we computed six locomotion traits that were shown to be correlated with locomotion scores [25], namely Back Posture Measurement (BPM), Head Bobbing Amplitude (HBA), Tracking distance (TRK), Stride Length (STL), Stance Duration (STD) and Swing Duration (SWD). All features relied on step detection, that is, knowing when each hoof was moving (swing phase) or remained still (stance phase). Hence, in the following paragraphs, we first describe the implementation of the step detection, followed by the implementation of the gait features.

3.2.1. Step detection

For each leg, the horizontal movement (x-coordinate) of the hoof was used to detect the stance and swing phases. The stance phase starts when a hoof lands on the floor and ends when the hoof moves forward again. At that moment, the swing phase starts. The hoof continues moving forward for the whole duration of the swing phase until it lands and remains still for another stance phase. The start and end frames of the stance phases were detected by finding when the x-coordinates of the hoof remained the same, that is, by finding plateaus of at least 10 frames where the absolute difference in x-coordinates between two frames was […] pixels, to account for small jitters. We define mid-swing as a frame between the liftoff and landing of the hoof, just before the hoof starts to slow down. The mid-swing moments were detected by finding the peaks of the acceleration of the x-coordinates. The horizontal acceleration of the hoof was computed by taking the second-order derivative of the x-coordinates and then passed through a uniform filter of size 3. An example of the x-coordinate trajectories is shown in Figure 4, with the stance and mid-swing phases identified by the step detection.
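The plateau search for stance phases can be sketched as below. The jitter tolerance is exposed as a parameter (`max_step`) because its value is not given here; the function name and the example trajectory are hypothetical, and the mid-swing detection from acceleration peaks is not covered.

```python
def detect_stance_phases(xs, max_step, min_len=10):
    """Return (start, end) frame pairs where the hoof x-coordinate plateaus.

    A plateau is a run of at least `min_len` frames in which consecutive
    x-coordinates differ by at most `max_step` pixels.
    """
    phases = []
    start = 0
    for i in range(1, len(xs) + 1):
        # A run ends at the end of the signal or when the hoof jumps forward.
        run_ends = i == len(xs) or abs(xs[i] - xs[i - 1]) > max_step
        if run_ends:
            if i - start >= min_len:
                phases.append((start, i - 1))
            start = i
    return phases


# Hypothetical hoof trajectory: still for 12 frames, swing, still for 11 frames.
hoof_x = [0] * 12 + [10, 20, 30, 40] + [50] * 11
print(detect_stance_phases(hoof_x, max_step=1))  # [(0, 11), (16, 26)]
```

The frames between two consecutive plateaus then delimit the swing phase of that hoof.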

Figure 4: Example of the step detection, using the trajectories of the x-coordinates of the hooves. The vertical lines mark the beginning and end of the stance phase. The crosses mark the peak of the swing phase.

3.2.2. Step correction

The step detection was automatically checked and corrected using the following procedure: for any given leg, mid-swings must happen before or after the stance phases, and the mid-swings must happen during the supporting phase of the opposite leg (left-right). When the step detection failed to meet these requirements, this indicated that the keypoint predictions were too noisy on that hoof. Only four videos were found to have problematic step detection. The frames with problematic steps were then removed from the keypoint trajectories, resulting in trajectories with one or several gaps. The trajectories were then trimmed to the part with the most remaining frames.

3.2.3. Back posture measurement (BPM)

To estimate the back posture, or curvature of the back, an approach similar to the one described in [6] was taken. A circle was fitted through the three keypoints on the spine. The curvature of a circle can be found by taking the inverse of its radius. The radius ($r$) of the fitted circle was normalized with the head length ($h$) of the cow (in pixels), as the length of cows can differ. The head length was taken as the Euclidean distance between the keypoints on the forehead and the nose. The BPM was then calculated as follows:

$$\mathrm{BPM} = \left(\frac{r}{h}\right)^{-1} = \frac{h}{r}$$

For each leg, the BPM was computed at each mid-swing phase. If there were multiple swing phases, the median BPM value was kept for that leg. The largest BPM over all four legs was used as the final BPM value.
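As a minimal sketch of the back posture computation for a single frame, the circle through the three spine keypoints can be obtained from the circumradius of the triangle they form. The sketch assumes that the BPM is the inverse of the radius after normalization by the head length; the function names and the example coordinates are hypothetical.

```python
import math


def circumradius(p1, p2, p3):
    """Radius of the circle through three 2D points (infinite if collinear)."""
    a = math.dist(p2, p3)
    b = math.dist(p1, p3)
    c = math.dist(p1, p2)
    # Twice the triangle area, from the 2D cross product.
    area2 = abs((p2[0] - p1[0]) * (p3[1] - p1[1]) - (p3[0] - p1[0]) * (p2[1] - p1[1]))
    if area2 == 0:
        return math.inf
    return (a * b * c) / (2.0 * area2)


def back_posture(withers, thoracic, sacrum, forehead, nose):
    """Curvature of the spine circle, normalized by the head length."""
    head_length = math.dist(forehead, nose)
    return head_length / circumradius(withers, thoracic, sacrum)


# Hypothetical keypoints: spine points on a circle of radius 5, head length 10.
bpm = back_posture((5, 0), (0, 5), (-5, 0), forehead=(20, 0), nose=(30, 0))
```

A perfectly flat back (collinear keypoints) gives an infinite radius and therefore a BPM of 0, while a more arched back gives a smaller radius and a larger BPM.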

3.2.4. Head bobbing amplitude (HBA)

Head bobbing is defined as an exaggerated movement of the head when an affected limb lands and lifts from the ground [25, 9]. Hence, in the presence of head bobbing, the head moves significantly up and down cyclically (at least once per gait cycle). Sound subjects are expected to have a steadier head stance. Examples of a noticeable head bob and steady head stance are shown in Figure 5. The amplitude of the vertical movement ($y$-signal) of the forehead keypoint was used as a measure of head bobbing. The amplitude of the $y$-signal was computed with fast Fourier transforms [33]: given the number of frames in a video and the number of frames per gait cycle, the signal was Fourier-transformed to obtain the amplitude at each frequency. The value of the HBA was then assigned as the largest amplitude in a gait cycle.

Figure 5: Example of y-signal with and without head bobbing.
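The idea of reading the head-bobbing amplitude from a frequency spectrum can be sketched as follows. This illustration uses a plain discrete Fourier transform from the standard library rather than an FFT routine, and it simply takes the largest single-sided amplitude over all non-zero frequencies; how the search is restricted to the gait cycle in the study is not reproduced here, and the function names are hypothetical.

```python
import cmath
import math


def amplitude_spectrum(signal):
    """Single-sided amplitudes for frequency bins 1 .. below Nyquist."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [v - mean for v in signal]  # remove the constant offset
    amplitudes = []
    for k in range(1, (n - 1) // 2 + 1):
        coef = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(centered))
        amplitudes.append(2.0 * abs(coef) / n)
    return amplitudes


def head_bobbing_amplitude(y_signal):
    """Largest oscillation amplitude found in the vertical forehead signal."""
    return max(amplitude_spectrum(y_signal))


# Hypothetical forehead y-signal: 5-pixel oscillation, 4 cycles over 64 frames.
y = [100 + 5 * math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
hba = head_bobbing_amplitude(y)  # close to 5 pixels
```

A steady head yields a flat spectrum and a small value, whereas a cyclic head bob shows up as a distinct peak.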

3.2.5. Tracking distance (TRK)

The tracking distance is defined as the horizontal distance (x-coordinate) between the landing position of the front hoof and the subsequent landing position of the hind hoof of the same side. If the hind hoof lands at the same location as the front hoof, it indicates no serious walking problem [5], and the TRK value is equal (or close) to 0. The tracking distance was measured on the left (TRK_L) and right (TRK_R) side of the cow and was normalized to the head length as follows: for any given side (left, right), the horizontal distance was taken between the x-coordinate of the front hoof at the start frame of one of its stance phases and the x-coordinate of the hind hoof at the start frame of its subsequent stance phase, and this distance was divided by the head length. When there was more than one value per side, the median TRK value of that side was returned.
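For one side of the cow, the tracking distance can be sketched as below. The sketch assumes that the distance is taken as an absolute value and that the "subsequent" hind landing is the first hind stance start after the front landing; the function name, argument names, and example values are hypothetical.

```python
from statistics import median


def tracking_distance(front_x, hind_x, front_starts, hind_starts, head_length):
    """Median normalized distance between front and subsequent hind landings."""
    values = []
    for f in front_starts:
        # First hind stance phase that starts after this front landing.
        later = [h for h in sorted(hind_starts) if h > f]
        if later:
            values.append(abs(front_x[f] - hind_x[later[0]]) / head_length)
    return median(values) if values else None


# Hypothetical per-frame x-coordinates (pixels) for the hooves of one side.
front_x = [100] * 30 + [200] * 30
hind_x = [0] * 20 + [90] * 20 + [195] * 20
trk = tracking_distance(front_x, hind_x, front_starts=[0, 30],
                        hind_starts=[20, 40], head_length=50)
```

In this example the hind hoof lands 10 and 5 pixels short of the front hoof prints, giving normalized distances of 0.2 and 0.1 and a median of 0.15.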

3.2.6. Stride length difference (STL)

The stride length is defined as the horizontal distance between two successive landings of the same hoof. The stride length was measured for each hoof between each pair of successive stance phases and normalized to the head length. If there was more than one stride length per hoof, the median value was kept. The STL features were the differences in stride length between the left and right sides, computed separately for the hind legs and for the front legs.

3.2.7. Stance duration difference (STD)

We define the stance duration as the time (in seconds) between the start (a) and end (b) of each stance phase. The time in seconds was derived from the frame rate of the video recording (here, 30 fps).

The stance duration was measured per hoof for each stance phase. If a leg had more than one stance phase, the median duration was used. The STD features were the differences in stance duration between the left and right sides, computed separately for the hind legs and for the front legs.
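The left-right comparison used for the duration features can be sketched as follows, here for the stance duration. The sketch assumes that the difference is taken as an absolute value, which is an illustrative choice; the function names and the example phases are hypothetical.

```python
from statistics import median

FPS = 30  # frame rate of the recordings


def median_duration(phases, fps=FPS):
    """Median duration in seconds of (start_frame, end_frame) phases."""
    return median((end - start) / fps for start, end in phases)


def left_right_difference(left_phases, right_phases, fps=FPS):
    """Absolute difference between the median durations of two legs."""
    return abs(median_duration(left_phases, fps) - median_duration(right_phases, fps))


# Hypothetical stance phases (frames) for the left-hind and right-hind hooves.
left_hind = [(0, 18), (40, 61)]    # 0.6 s and 0.7 s
right_hind = [(20, 35), (60, 75)]  # 0.5 s and 0.5 s
std_hind = left_right_difference(left_hind, right_hind)
```

The same two functions apply unchanged to swing phases, and an analogous median-then-difference scheme applies to stride lengths.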

3.2.8. Swing duration difference (SWD)

We define the swing duration as the time (in seconds) between the start (a) and end (b) of each swing phase. The time in seconds was derived from the frame rate of the video recording (here, 30 fps).

The swing duration was measured per hoof for each swing phase. If a leg had more than one swing phase, the median duration was used. The SWD features were the differences in swing duration between the left and right sides, computed separately for the hind legs and for the front legs.

A summary of the features extracted is listed in Table 3, and Figure 6 presents the distribution of the values of each feature per lameness class.

Table 3: List of the features extracted from the keypoint trajectories.

Feature	Description
BPM	Back posture measurement
HBA	Head bobbing amplitude
TRK_L	Tracking distance on the left side
TRK_R	Tracking distance on the right side
STL (front)	Stride length difference between left- and right-front hooves
STL (hind)	Stride length difference between left- and right-hind hooves
STD (front)	Stance duration difference between left- and right-front hooves
STD (hind)	Stance duration difference between left- and right-hind hooves
SWD (front)	Swing duration difference between left- and right-front hooves
SWD (hind)	Swing duration difference between left- and right-hind hooves

Figure 6: Distribution of the features per lameness class, where 0 corresponds to normal, and 1 to lame.

3.3. Lameness classification

The layout of our machine-learning experiments is described in the next paragraphs. We first split the data into training and validation sets using cross-validation. We then trained and evaluated different binary classifiers to detect lameness using all the extracted features. Lastly, we investigated the importance of features on classification performance.

3.3.1. Data preparation

Considering the relatively small dataset size (272 videos), the dataset was split into training and validation sets using a 5-fold cross-validation (CV) with stratified grouping. In order to prevent data leakage, the grouping was performed on the cow IDs to ensure that, in each fold, there was no overlap of cow IDs between the training and the validation set. Given this non-overlapping constraint, the stratification creates folds that retain, as much as possible, the same class distribution [34]. To ensure a balanced class distribution during training, we applied the Synthetic Minority Oversampling Technique (SMOTE) [35] to the minority class in the training sets. SMOTE generates new training samples whose feature values are close to the other samples in the minority class. Lastly, the features were re-scaled as machine-learning models often require the features to be on a similar scale. The range of the features was re-scaled using Robust Scaling [36], which uses statistics that are robust to outliers for scaling the data.
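The grouping constraint can be illustrated with a small standard-library sketch that assigns every cow to exactly one fold, so that no cow appears in both the training and the validation set of a fold. This is a simplified grouped split that only balances fold sizes; it does not perform the class stratification, the SMOTE oversampling, or the robust scaling described above, and the function name and cow IDs are hypothetical.

```python
from collections import defaultdict


def grouped_folds(cow_ids, n_folds=5):
    """Return one fold index per video, with each cow confined to one fold."""
    counts = defaultdict(int)
    for cow in cow_ids:
        counts[cow] += 1
    fold_sizes = [0] * n_folds
    fold_of = {}
    # Place cows with the most videos first, always into the smallest fold.
    for cow, n_videos in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        target = fold_sizes.index(min(fold_sizes))
        fold_of[cow] = target
        fold_sizes[target] += n_videos
    return [fold_of[cow] for cow in cow_ids]


# Hypothetical cow IDs for nine videos.
cow_ids = ["a", "a", "a", "b", "b", "c", "d", "e", "f"]
folds = grouped_folds(cow_ids, n_folds=3)
for k in range(3):
    train = {c for c, f in zip(cow_ids, folds) if f != k}
    valid = {c for c, f in zip(cow_ids, folds) if f == k}
    assert train.isdisjoint(valid)  # no cow leaks across the split
```

In practice, a library routine such as scikit-learn's `StratifiedGroupKFold` provides grouped splitting together with stratification.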

3.3.2. Classification models

We compared the performance of the following six classifiers: Logistic Regression (LR), Random Forest (RF), Support Vector Machines with a linear kernel (SVL) and with a radial kernel (SVR), Multi-Layer Perceptron (MLP) and Gradient Boosting Machines (GB). These classifiers were selected as they showed good performance in previous research on lameness detection [8, 7, 14]. We used a flat cross-validation approach to tune the hyper-parameters and train the models, as it is computationally less expensive than nested cross-validation, and generally results in the selection of an algorithm of similar quality to that selected via nested cross-validation [37]. The hyper-parameters of the classifiers were first optimized using a random cross-validated search of 100 iterations over the 5 folds. The classifiers were then re-trained on the 5 folds with the best set of hyper-parameters.

3.3.3. Evaluation metrics

The performance of the classification models was evaluated with the following metrics: accuracy, F1-score, sensitivity, and specificity. The F1-score was macro-averaged; that is, the metric was calculated per class and then averaged. The macro-average is especially useful with imbalanced datasets, as all classes contribute equally to the metric.
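The four metrics can be computed from the confusion counts of a binary problem, as in the following sketch. It assumes that the lame class is encoded as 1 and the normal class as 0, and that both classes occur in the labels; the function name and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, macro-averaged F1, sensitivity and specificity (lame = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def f1(hits, misses_a, misses_b):
        denom = 2 * hits + misses_a + misses_b
        return 2 * hits / denom if denom else 0.0

    return {
        "accuracy": (tp + tn) / len(y_true),
        # Macro average: F1 of each class computed separately, then averaged.
        "f1_macro": (f1(tp, fp, fn) + f1(tn, fn, fp)) / 2,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical ground truth and predictions for eight videos.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'f1_macro': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
```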

3.3.4. Feature importance

An additional experiment was run to investigate whether including multiple features could lead to improvements in lameness classification. The predictive value of a feature was evaluated by measuring the feature importance, that is, how much a feature contributed to a correct classification. To measure the feature importance, we selected the permutation importance method [38] as it can be applied to any classifier. The importance of features was evaluated on the best-performing classifier among the 6 classifiers that were trained with all the features. The permutation importance method was performed as follows: For each cross-validation fold, the model was fitted on the training dataset and evaluated on the F1-score on the validation set. Then, a feature column from the validation set was randomly shuffled, and the model was evaluated again. The importance score was then the difference between the F1-score on the non-shuffled and the shuffled validation data. The permutations were repeated 100 times for each feature. The features were then ranked in the order of their mean importance score. To estimate whether including multiple features could lead to improvements in the lameness classification, the classifier was then retrained with the most important feature, the two most important features, and so on, gradually adding one feature in the order of their importance.
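The permutation procedure can be sketched for a single validation set as follows. For brevity, the sketch uses a hypothetical threshold rule as the fitted model and accuracy as the score instead of the macro-averaged F1-score; all names and data are illustrative assumptions.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, score, X, y, n_repeats=100, seed=0):
    """Mean drop in score after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(row) for row in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical validation data: feature 0 drives the label, feature 1 is unused.
X = [[0.1, 5], [0.2, 3], [0.9, 4], [0.8, 1], [0.3, 2], [0.7, 6], [0.05, 7], [0.95, 8]]
y = [0, 0, 1, 1, 0, 1, 0, 1]


def toy_model(row):
    return int(row[0] > 0.5)


scores = permutation_importance(toy_model, accuracy, X, y)
```

Here the first feature receives a positive importance, whereas the second receives exactly zero because the toy model never reads it.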

4. Results

4.1. Pose estimation

The test results of T-LEAP are presented in Table 4. On average, 99.60% of the keypoints were correctly detected (PCKh@0.2). In other words, the Euclidean distance between the predicted keypoint and its ground truth was smaller than 20% of the head length in 99.60% of the cases. This is in line with the results presented in the original paper [16], where they achieved a […] detection rate on the same model with 17 keypoints. The keypoint correction and filtering were run on all 272 videos, and the MAD filter (of window size 3) identified […] of outlier keypoints, whose coordinates were then corrected to the median value of the temporal window. Because of the lack of keypoint annotations on all videos, the keypoint correction could only be assessed qualitatively. The trajectories of the keypoints before and after the filtering were plotted for each video and checked visually. The quality of the filtered trajectories was deemed balanced, in that most of the outliers could be corrected and the trajectories appeared smooth, without over-correction or flattening. The outliers that could not be corrected sufficiently led to a wrong step detection. These steps were then discarded from the trajectories, as detailed in Section 3.2.

Table 4: Percentage of Correct Keypoints (PCKh@0.2) of T-LEAP on the test set. The keypoints are named as follows: 1: Left-hind hoof, 2: Right-hind hoof, 3: Left-front hoof, 4: Right-front hoof, 5: Nose, 6: Forehead, 7: Withers, 8: Sacrum, 9: Caudal thoracic vertebrae.

Keypoint	1	2	3	4	5	6	7	8	9	Mean
PCKh@0.2	98.45	100	99.48	98.45	100	100	100	100	100	99.60

4.2. Lameness detection

The results of the different binary classifiers are listed in Table 5. The SVM with radial kernel, Random Forests, and Gradient Boosting classifiers performed best, with an accuracy above 79%. SVM-R had a higher specificity, while the Random Forests and Gradient Boosting had a higher sensitivity. The logistic regression, the SVM with linear kernel, and the Multi-Layer Perceptron performed slightly worse.

Table 5: Results of the binary classifiers using all the features. Values are expressed in %. The best results are highlighted in bold.

Model	Accuracy	F1-score	Sensitivity	Specificity
Logistic Regression	78.49	77.26	77.33	77.90
SVM linear kernel	77.25	76.31	75.39	77.90
SVM radial kernel	80.07	78.70	76.78	81.15
Random Forests	79.66	78.44	83.68	74.64
Gradient Boosting	79.12	77.79		72.05
Multi-Layer Perceptron	78.97	77.60	80.74	74.59

4.3. Feature importance

A plot with the scores returned by the permutation importance is shown in Figure 7. For each feature, the score indicates how much a random permutation of the feature values impacted the prediction scores, averaged over 100 permutations. The Back Posture Measurement (BPM) had the highest permutation score, followed by the Head Bobbing Amplitude (HBA) and Left Tracking Distance (TRK_L). The remaining features showed less importance.

Figure 7: Results of the feature importance over 100 random permutations.

Using the permutation importance results, the SVM classifier with radial kernel (SVM-R) was then retrained by gradually adding one feature, in the order of their importance. The classification results of the classifier using these different combinations of features are presented in Table 6. In terms of accuracy and F1-score, using two or more features improves the classification results compared to only using BPM. The best classification scores are reached by using combinations of 3 and 6 features.

Table 6: Results (in %) of the SVM-R classifier after gradually adding one feature per order of their importance score.

SVM-R Features	Accuracy	F1-score	Sensitivity	Specificity
BPM	76.66	74.81	63.26	86.69
BPM, HBA	79.31	77.50	77.42	77.32
BPM, HBA, TRK	79.87	78.22	76.35	80.14
BPM, HBA, TRK, STD	79.47	77.87	77.09	78.89
BPM, HBA, TRK, STD, STL	79.18	78.03	78.31	79.17
BPM, HBA, TRK, STD, STL, SWD	80.07	78.70	76.78	81.15

5. Discussion

5.1. Video processing

The video processing consisted of the following steps: using Faster R-CNN to detect and isolate the cows from the video frames, using T-LEAP to extract time-series of keypoint locations, and using the MAD and Savitzky-Golay filters to reduce noise from the keypoint predictions. For our set of videos, the pre-trained Faster R-CNN worked out of the box and detected the location of the cows in each video frame. The performance of T-LEAP was on par with the results described in the original paper [16], and it would require little effort to be transferred to videos recorded in new farms, as [18] showed that little new training data was needed to fine-tune the T-LEAP model. However, some keypoint mis-detections needed to be corrected. The parameters for the MAD outlier filter and the smoothing Savitzky-Golay filter had to be tuned manually until a good trade-off between under- and over-correction was found. With no or insufficient correction of the keypoint trajectories, the features could give erroneous values. With over-correction, on the other hand, one would run the risk of removing the true signal from the keypoint trajectories, and the extracted features would not be discriminative. For instance, if the signal of the forehead were flattened too much, head bobbing would be systematically missed.

The videos were selected such that there was only one cow at a time in the field of view. This constraint makes the gait analysis more reliable in two ways. First, having a single cow in the field of view ensures that the cows don’t occlude each other’s body parts, making the pose estimation more reliable. Second, a single cow in the field of view ensures enough space between the cows such that they can walk at their own pace and display a voluntary gait. In practice, this constraint could be implemented by skipping the videos where the Faster R-CNN (or any other object detector) detects more than one cow, or as done in [17], by implementing a tracking algorithm that follows each cow through the video.

Another constraint for selecting the videos was that the cows had to walk from left to right. However, our proposed method would still work if the cows were walking in the opposite direction. We can determine the walking direction by examining the horizontal movement of the keypoints along the x-axis. The values of the keypoints' x-coordinates increase over time when the cows walk to the right and decrease when they walk to the left. Therefore, if the cows were to walk from right to left, their keypoint trajectories could simply be mirrored along the x-axis before extracting the features.
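The direction check and mirroring can be sketched as below. The sketch compares the first and last x-coordinates of one keypoint to infer the walking direction and mirrors coordinates as `frame_width - x`; both choices, as well as the function names and the frame width, are illustrative assumptions.

```python
def walks_left_to_right(xs):
    """True if the x-coordinate of a keypoint increases over the video."""
    return xs[-1] > xs[0]


def mirror_x(xs, frame_width):
    """Flip x-coordinates horizontally within a frame of the given width."""
    return [frame_width - x for x in xs]


# Hypothetical withers x-coordinates for a cow walking from right to left.
withers_x = [900, 700, 400]
if not walks_left_to_right(withers_x):
    withers_x = mirror_x(withers_x, frame_width=1920)
print(withers_x)  # [1020, 1220, 1520]
```

The same mirroring would have to be applied to every keypoint of the video so that the trajectories stay consistent with each other.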

5.2. Locomotion scoring

A classifier learns to classify samples from a set of labeled examples, also known as ground truth or gold standard. Because a classifier can only be as accurate as its gold standard [25], a reliable locomotion scale is necessary. Here, the initial inter- and intra-observer reliability was under par. It is worth noting that the reliability is usually lower in homogeneous data because the probability of agreement by chance is higher when scores are not equally distributed [24]. It is unlikely that scoring from live observations instead of from videos would have improved the scores, as [39] showed no difference in the reliability of inexperienced observers between live and video scoring and showed improved reliability of experienced observers when scoring from video. The quality of the ground truth could perhaps have been further improved by organizing additional locomotion scoring sessions or by having shorter scoring sessions over multiple days. However, given that the availability of the observers was limited and that a perfect gold standard was neither necessary nor likely achievable, we took other steps to address the problem of low reliability and agreement. Firstly, because we had multiple observers, we could discard the votes from the least reliable observers. Secondly, we addressed the problem of class (score) imbalance by merging the levels of the scale to a binary score: normal and lame. Although these steps improved the quality of our gold standard, some biases might have remained, which could limit the classifiers’ accuracy.

5.3. Lameness detection

The lameness detection was performed as a binary classification task (normal vs. lame) and therefore focused on lameness detection rather than fine-grained gait scoring. Fine-grained locomotion scoring is left for future research as it would require collecting more video footage with sufficient examples of gait scores of 3 and above.

The performance of the linear classifiers (i.e., logistic regression and SVM with linear kernel) was lower than that of the non-linear classifiers. This implies that when combining all the features, the decision boundary between the normal and lame classes is non-linear. The Multi-Layer Perceptron didn’t perform as well as the other non-linear classifiers, most likely because of the relatively small dataset. The performance of the three best classifiers, SVM-R, RF, and GB, aligns with the conclusions of [40] and [37]: they found these three binary classifiers to perform the best on 115 open-source datasets tackling a variety of real-world problems in medicine and biology (but not related to lameness detection). Although, on this dataset, the SVM classifier with radial kernel achieved the best performance in terms of accuracy and F1-score, it might not be the case for other datasets. This is a well-known machine-learning challenge, also known as the “no-free-lunch” theorem, that suggests that no algorithm can outperform all others for all problems [41]. Our recommendation would then be to try several classifiers, and the SVMs with radial kernel, random forests, and gradient boosting classifiers provide a good starting point.

5.4. Feature importance

Multiple studies investigated the relationship between individual locomotion traits and locomotion scores [28, 42, 43, 25]. They found that, when scored individually, the traits arched back, asymmetric gait, head bobbing, reluctance to bear weight and tracking-up were highly correlated with the locomotion score. The features selected in this study were designed to measure the same traits. The arched back was measured by the Back Posture Measurement (BPM), the asymmetric gait by the Stride Length (STL) difference between left and right limbs, the head bobbing by the Head Bobbing Amplitude (HBA), the reluctance to bear weight by the Stance Duration (STD) and Swing Duration (SWD), and the tracking up was measured by the Tracking distance (TRK).

The BPM, HBA, and TRK features returned the highest scores in the permutation importance test. BPM and HBA displayed a clear demarcation between the normal and lame classes in Figure 6. In line with [28], [42], and [43], this suggests that the back posture, head bobbing, and tracking-up are, for human observers, easier to recognize than an asymmetric gait (e.g., stride length). The tracking distance on the left side (TRK_L) had a higher importance than the one on the right side (TRK_R). This could indicate that, in our dataset, there were more cows tracking-up on the left than on the right side.

For both the Stance Duration (STD) and the Swing Duration (SWD) on the hind legs (Fig. 6), one can see a clear difference in the duration of the stance/swing phases between the classes, whereas class differences are less obvious on the front legs. This could be explained by the fact that lameness happens more often on the hind legs [6, 28]. Including SWD as a feature increased the classification performance, even though SWD had the lowest importance score. In contrast, STD had a larger importance score than SWD, but adding the STD feature to the input of the classifier led to a small decrease in accuracy and F1-score. This could indicate multi-collinearity with other features.

The STL features had the second lowest importance score and the class separation was harder to distinguish in Figure 6. Interestingly, the F1-score, sensitivity, and specificity were higher when the STL features were included. This suggests that the stride length can be informative when used in combination with other features. It is worth noting that if the cows have bilateral lameness, i.e., are lame on left and right limbs, then the stride length would show little to no difference [9].
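The remark on bilateral lameness can be made concrete with a small numerical sketch. The function below is an assumed, simplified form of a left/right stride length difference (absolute difference of medians), and the stride lengths are invented values in metres; neither is taken from this study.

```python
import numpy as np

def stride_length_difference(left_strides, right_strides):
    """Absolute difference between median left and right stride lengths."""
    return abs(float(np.median(left_strides)) - float(np.median(right_strides)))

# Hypothetical stride lengths (metres) for three illustrative cases.
sound = stride_length_difference([1.50, 1.52, 1.48], [1.50, 1.49, 1.51])
unilateral = stride_length_difference([1.20, 1.22, 1.18], [1.50, 1.49, 1.51])
bilateral = stride_length_difference([1.20, 1.22, 1.18], [1.20, 1.19, 1.21])
```

In this sketch the bilateral case yields a difference as small as the sound case, even though both sides have shortened strides, which is why an asymmetry measure alone cannot flag it.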

Overall, combining multiple locomotion traits led to a better classification performance than using a single trait. Using a combination of 3 and 6 traits led to the best accuracy and F1-scores on the SVM classifier with a radial kernel. Even though additional traits could be extracted from the keypoint trajectories, it is unknown whether they would lead to significant improvements in the gait classification. Our recommendation would be to include at least the following locomotion traits in an automatic lameness detection system: back posture, head bobbing, and tracking distance, as they demonstrated good overall classification metrics, and these features have been shown to be highly correlated with the locomotion scores [28, 42, 43].

Directly comparing the performance of our lameness classifiers against related work is not straightforward, because even though the task at hand (i.e., detecting lameness from videos) is the same, there is a large variation in the material, methods, and evaluations used in papers that address it. Furthermore, a comprehensive literature review is out of the scope of this paper, and we refer the reader to [44] for an overview of past and current advances in bovine gait analysis. We will here compare our results and contrast our findings with previous work that we deem directly related to ours.

The Back Posture Measurement (BPM) was first introduced by [6] and curvature of the back has since then been used in numerous studies [6, 19, 20, 21, 22, 13, 7, 8, 17]. The BPM is commonly measured during the supporting phase of the hind hooves, and not during the supporting phase of the front hooves because lameness is more common on the hind hooves than on the front ones. However, this practice could lead to the algorithm systematically missing front lameness cases. To prevent this, we computed the BPM based on the supporting phase of the four legs. When using BPM as a single locomotion trait, the accuracy of lameness classification ranged from

[19] to

[13]. When only including the BPM feature in our SVM-R classifier, we reached an accuracy of

, which is in line with the literature.

The work presented in [8] is perhaps the most closely related to this study. In [8], the authors used a combination of traditional and deep-learning-based computer vision to develop a lameness detection system. They used DeepLabCut [15], a deep-learning model trained to track the location of the hooves and the head in videos of walking cows without physical markers. The pose estimation model achieved a percentage of correct keypoints (PCK) of

and the videos where the keypoint predictions were too erroneous were manually discarded. The outline of the cow’s spine was obtained using a background subtraction method at the pixel level. However, this method may not be effective in handling varying backgrounds and lighting conditions. On the other hand, we used pose estimation to track all the keypoints on the cow’s body, including those on the spine. As a result, our proposed method is more robust in dealing with such changes. Additionally, we automatically corrected erroneous keypoint trajectories with our keypoint-correction and step-correction approaches, which removed the need for manual validation.

In total, they used 212 videos of walking cows, where cows with a score of 1 or 2 were classified as normal, and a score of 3 or 4 as lame. The back curvature was computed from the outline of the spine, and the keypoints on the hooves and on the neck were used to extract the following features: head bobbing, stride length asymmetry, tracking up, landing speed, supporting phase asymmetry, and moving speed. The back curvature and head bobbing were computed when the hind hooves were in contact with the floor and, therefore, didn’t account for lameness on the front limbs. In contrast, we computed the back curvature over all four legs, and the head bobbing amplitude was extracted from the entire trajectory. The feature selection was performed as follows: a Chi-square test was run on the whole dataset. The test revealed that back posture measurement and head bobbing were the most important features. In contrast, we found that adding tracking-up to the other two features led to better results on our dataset. This could mean that, in their dataset, lame subjects were not tracking up. Another explanation could be that as the number of traits increases, the complexity of the data increases, and a non-linear classifier, such as SVM-R, would be needed. Several classifiers were trained with the back curvature and head bobbing, and the logistic regression classifier returned the best results, with a classification accuracy of

. In comparison, our accuracy of

may seem modest. However, it’s important to note that there are several differences in the data and experimental setup that could have impacted the results. The datasets and methods used in the studies differed, making a direct comparison challenging. In our study, most of the locomotion scores were on the lower end of the scale, and the small number of severely lame cows may have made it harder for the classifier to distinguish between lame and non-lame cows. Additionally, the quality of the ground truth data might have affected the performance of our classifiers. We also followed best machine-learning practices by conducting feature selection only on the training sets and by separating individual cows in the training and validation sets to prevent data leakage. Failing to do so can artificially inflate performance results and lead to overly optimistic conclusions [45]. Overall, we would expect our method to yield similar results if it were run on their dataset, with the advantage of a fully automatic pipeline that doesn’t require manual validation of the keypoint trajectories, and a pose estimation method robust to light conditions and occlusions.
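The two leakage safeguards mentioned above (feature selection fitted on the training folds only, and no cow shared between training and validation sets) can be sketched with scikit-learn as follows. The data are synthetic, and the selector (`SelectKBest` with `f_classif`), the number of folds, and all variable names are assumptions chosen for illustration rather than the configuration used in this study.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic data (assumption): 40 cows, 5 videos each, 6 features,
# one label per cow, and only the first feature is informative.
rng = np.random.default_rng(0)
n_cows, videos_per_cow, n_features = 40, 5, 6
cow_ids = np.repeat(np.arange(n_cows), videos_per_cow)
y = (np.arange(n_cows) % 2)[cow_ids]
X = rng.normal(size=(n_cows * videos_per_cow, n_features))
X[:, 0] += 1.5 * y

# Scaling and feature selection sit inside the pipeline, so they are
# re-fitted on the training part of every fold only.
pipeline = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=3),
    SVC(kernel="rbf"),
)

# GroupKFold keeps all videos of a cow in the same fold.
cv = GroupKFold(n_splits=5)
scores = cross_val_score(pipeline, X, y, groups=cow_ids, cv=cv)

# Sanity check: count cows that appear on both sides of any split.
shared_cows = [
    len(set(cow_ids[train_idx]) & set(cow_ids[val_idx]))
    for train_idx, val_idx in cv.split(X, y, groups=cow_ids)
]
```

Running the selector on the full dataset before splitting, or splitting by video instead of by cow, would let information from the validation data influence training, which is the leakage the text warns about.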

In [17], a fully automated multi-cow lameness detection system was developed. They used Mask R-CNN, a deep-learning model, to simultaneously perform object detection of the cows and pose estimation of 7 keypoints on the back, neck, and head. In total, they used 250 videos of 10 different cows. The keypoints were used to extract the back curvature and head position locomotion traits. Each locomotion trait was extracted per video frame and aggregated per video into statistical features such as the mean, median, standard deviation, min, and max values. They trained the CatBoost gradient boosting classifier and achieved a

accuracy on binary lameness detection and

accuracy on a 4-point scale lameness scoring. In our work, although we included four more locomotion traits, we only aggregated the values into the median value of the video. In light of their classifiers’ excellent performance, a promising direction for extending our work would be to extract more statistical features from the locomotion traits, such as mean, standard deviation, and min and max values, to improve our classification performance further.
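The per-video aggregation discussed above can be sketched as a small helper that turns a series of trait values into several summary statistics instead of the median alone. The function name, the handling of missing values, and the sample values are assumptions made for the example.

```python
import numpy as np

def aggregate_trait(values):
    """Summarise one locomotion trait over a video as several statistics."""
    values = np.asarray(values, dtype=float)
    # Drop missing measurements (e.g. frames without a valid value).
    values = values[~np.isnan(values)]
    return {
        "mean": float(np.mean(values)),
        "median": float(np.median(values)),
        "std": float(np.std(values)),  # population standard deviation
        "min": float(np.min(values)),
        "max": float(np.max(values)),
    }

# Hypothetical trait values measured over one video.
stats = aggregate_trait([1.0, 2.0, np.nan, 3.0, 4.0, 10.0])
```

Each trait would then contribute five columns to the classifier input rather than one, at the cost of a larger feature space relative to the number of videos.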

6. Conclusion

In this paper, we developed a fully automated lameness detection system. Using the T-LEAP pose estimation model, the motion of nine keypoints was extracted from videos of walking cows. The trajectories of the keypoints were then used to compute six locomotion traits, namely back posture measurement, head bobbing, tracking distance, stride length, stance duration, and swing duration. We found that the three most important traits were back posture measurement, head bobbing, and tracking distance, and that including multiple locomotion traits led to a better classification than using a single locomotion trait. For the ground truth, we showed that a thoughtful merging of the scores of the observers could improve intra-observer reliability and agreement. Future work should evaluate the system in a less constrained environment, for instance, with multiple cows in the field of view. Another area for future research could focus on leveraging the temporal nature of the videos, for instance by including more statistical features per locomotion trait.

Acknowledgements

This publication is part of the project Deep Learning for Human and Animal Health (with project number EDL P16-25-P5) of the research program Efficient Deep Learning (https://efficientdeeplearning.nl) which is (partly) financed by the Dutch Research Council (NWO).

Declaration of interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

*Corresponding authors
Email addresses: helena.russello@wur.nl (Helena Russello ), gert.kootstra@wur.nl (Gert Kootstra )
Preprint submitted to Elsevier
Code available at: https://github.com/hrussel/lameness-detection
https://www.stereolabs.com/zed-2/
