Horse Surgery and Survival Prediction with Artificial Intelligence Models: Performance Comparison of Original, Imputed, Balanced, and Feature-Selected Datasets

Journal: Kafkas Universitesi Veteriner Fakultesi Dergisi
DOI: https://doi.org/10.9775/kvfd.2023.30908
Publication Date: 2024-01-01
Author(s): Pınar Cihan
Primary Topic: Veterinary Equine Medical Research

Overview

Artificial intelligence (AI) systems rely on mathematical models that must be trained on sample data. The training data carries the information a model needs to distinguish between cases and produce the desired outputs, so the quality and relevance of that data are central to building robust AI systems.

Introduction

The introduction of the research paper highlights the growing significance of artificial intelligence (AI) in various fields, particularly in prediction and classification tasks. While AI applications in veterinary medicine are still developing, existing studies indicate promising results, especially concerning horse health. Colic, a prevalent and potentially fatal condition in horses, underscores the need for rapid diagnosis and intervention. Approximately 90% of colic cases can be managed with medical treatment, but the remaining 10% may require surgical intervention, emphasizing the importance of timely decision-making by veterinarians.

The study aims to improve predictions of the need for surgery and of survival in horses suffering from colic, using a dataset characterized by missing values and class imbalance. Key contributions include applying the missForest method to impute missing data, which improves prediction accuracy, and using the SMOTE method to address data imbalance, which further improves model performance. The research evaluates 15 AI models and finds that the Random Forest (RF) model outperforms the others on accuracy and area under the curve (AUC). The findings also suggest that using fewer features can yield better accuracy in predicting both surgery and survival.
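The two metrics named above can be made concrete with a small sketch. The Python below is a minimal illustration, not the paper's evaluation code: the labels and scores are made-up toy values, and the function names `accuracy` and `roc_auc` are assumptions introduced here. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up toy data (hypothetical): 1 = surgery needed, 0 = no surgery.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]  # hypothetical model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"accuracy = {acc:.3f}, AUC = {auc:.3f}")
```

Accuracy depends on the chosen threshold (0.5 here), whereas AUC summarizes ranking quality across all thresholds, which is one reason the two are often reported together.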

Methods

The “Material and Methods” section outlines the study design and procedures. The work is computational: it relies on a publicly available Horse Colic dataset of 299 horses and 27 features. The methodology covers feature transformation, imputation of missing values with missForest, class balancing with SMOTE, feature selection, and the comparison of 15 classification models by accuracy and AUC.

The rigor of these methods is crucial for establishing the reliability of the results, and the use of a public dataset supports reproducibility. Overall, this section is the basis for understanding how the study’s conclusions were drawn.

Results

In this study, the prediction performances of 15 different artificial intelligence (AI) models were evaluated after implementing pre-processing and post-processing techniques to enhance accuracy in predicting surgery and survival outcomes. The pre-processing phase involved transforming features, addressing missing values, and balancing the dataset, while the post-processing phase included a feature selection process. Three dataset versions were analyzed: the original dataset with missing data, an imputed dataset using the missForest method, and a balanced dataset post-imputation.

The balanced dataset achieved the highest average accuracy for both surgery (80.76%) and survival (77.96%) prediction, outperforming the original dataset, which reached 78.73% and 72.44%, respectively. This suggests that imputing and balancing the dataset improved prediction accuracy. The study reports averages across multiple AI models rather than the result of a single model, which indicates that the improvement does not depend on the choice of model. Accuracy and AUC values for the balanced dataset are provided in Table 3 and Figure 2, respectively, and guide the subsequent analyses in the research.
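The size of these gains follows directly from the reported averages. The short Python sketch below only restates that arithmetic; the variable names are illustrative and not taken from the paper.

```python
# Average accuracies (%) across the evaluated models, as reported in the text.
original = {"surgery": 78.73, "survival": 72.44}
balanced = {"surgery": 80.76, "survival": 77.96}

# Gain in percentage points for each prediction task.
gains = {task: round(balanced[task] - original[task], 2) for task in original}

for task, gain in gains.items():
    print(f"{task}: +{gain:.2f} percentage points")
```

On these figures the gain is larger for survival prediction (5.52 points) than for surgery prediction (2.03 points).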

Discussion

In this study, a systematic approach was used to predict the need for surgical intervention and the probability of survival in horses with colic, using a publicly available dataset comprising 299 horses and 27 features. The methodology included data transformation, imputation of missing values with the missForest method, and correction of class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). The imputed and balanced datasets improved prediction accuracy compared with the original dataset, highlighting the importance of proper data handling for model performance.
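The core idea of SMOTE can be sketched in a few lines: each synthetic minority row is placed at a random point on the segment between an existing minority row and one of its nearest minority neighbours. The Python below is a simplified illustration for numeric features only, not the implementation used in the study; the function name `smote`, its parameters, and the toy rows are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic rows interpolated between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority rows by Euclidean distance to the base row.
        others = [row for j, row in enumerate(minority) if j != i]
        others.sort(key=lambda row: math.dist(base, row))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: 4 minority rows against 8 majority rows.
minority_rows = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
majority_count = 8

new_rows = smote(minority_rows, n_new=majority_count - len(minority_rows))
balanced_minority = minority_rows + new_rows
print(len(balanced_minority))
```

Because every synthetic row is an interpolation, it stays within the range already covered by the minority class. Categorical features would need separate handling, which this sketch does not attempt.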

The study evaluated 15 AI classification models, with the Random Forest (RF) method showing the best predictive performance for both surgery and survival outcomes. The findings suggest that RF can be applied effectively in veterinary medicine, particularly to complex datasets such as the Horse Colic dataset, which is characterized by missing values and imbalanced classes. Feature selection was also performed to optimize model performance, and it showed that a smaller set of features could improve accuracy. Overall, this research underscores the potential of AI in veterinary applications and the need for publicly accessible datasets to foster interdisciplinary collaboration and innovation.