Lung cancer detection with machine learning classifiers with multi-attribute decision-making system and deep learning model

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-025-88188-w
PMID: https://pubmed.ncbi.nlm.nih.gov/40075131
Publication Date: 2025-03-12
Author(s): T. Meeradevi et al.
Primary Topic: COVID-19 diagnosis using AI

Overview

This paper addresses the classification of lung diseases from chest X-ray images: first distinguishing benign from malignant cases, then identifying the specific disease (Atelectasis, Infiltration, Nodule, or Pneumonia). The study uses machine learning (ML) classifiers and ranks them with a multi-attribute decision-making method, the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS). Among the classifiers tested, the Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel was the top performer, achieving an accuracy of 81.5% with Local Binary Patterns (LBP) features and 85.25% when combining statistical and textural features.
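
To make the ranking step concrete, here is a minimal TOPSIS sketch in plain Python. It follows the standard textbook formulation (vector normalisation, weighting, distance to the ideal and anti-ideal points); the criteria, weights, classifier names, and score values are hypothetical placeholders, not figures from the paper, and the authors' exact configuration is not given here.

```python
import math

def topsis(scores, weights, benefit):
    """Return one closeness coefficient per alternative (higher is better).

    scores:  one row of criterion values per alternative
    weights: one weight per criterion
    benefit: one bool per criterion, True when larger values are better
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in scores)) for j in range(n_crit)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in scores]
    # Ideal point takes the best value per criterion, anti-ideal the worst.
    columns = list(zip(*weighted))
    ideal = [max(c) if b else min(c) for c, b in zip(columns, benefit)]
    anti = [min(c) if b else max(c) for c, b in zip(columns, benefit)]
    closeness = []
    for row in weighted:
        d_pos = math.dist(row, ideal)   # distance to the ideal point
        d_neg = math.dist(row, anti)    # distance to the anti-ideal point
        closeness.append(d_neg / (d_pos + d_neg))
    return closeness

# Hypothetical example: columns are accuracy, recall, training time in seconds.
names = ["clf_a", "clf_b", "clf_c"]
scores = [
    [0.85, 0.80, 12.0],
    [0.78, 0.82, 3.0],
    [0.70, 0.65, 1.0],
]
weights = [0.5, 0.3, 0.2]
benefit = [True, True, False]   # training time is a cost criterion

closeness = topsis(scores, weights, benefit)
ranking = sorted(zip(names, closeness), key=lambda pair: pair[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

The classifier with the largest closeness coefficient is the one nearest the ideal point and farthest from the anti-ideal point across all criteria at once, which is what lets TOPSIS trade off several metrics instead of ranking on accuracy alone.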

In contrast, the deep learning model Inception v3 significantly outperformed the ML approaches, achieving an accuracy of 97.05% with a dataset of 5,000 images, which is 11.8 percentage points higher than the best ML result. The study also reports that accuracy improves with larger datasets, with an 18.41% increase when going from 1,000 images to 5,000 images. The authors conclude that deep learning methods, particularly Inception v3, are likely to play an important role in future lung disease detection and management.
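
As a quick check of the comparison above, the 11.8 figure is the absolute gap between the two reported accuracies; the gain relative to the SVM baseline is somewhat larger:

$$97.05 - 85.25 = 11.80 \text{ percentage points}, \qquad \frac{11.80}{85.25} \approx 13.8\%$$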

Methods

The study utilizes the National Institutes of Health (NIH) Chest X-ray dataset, which consists of 112,120 frontal-view X-ray images annotated for fourteen distinct diseases from 32,717 unique patients. For this research, a subset of 8,000 images was selected, comprising 3,000 benign and 5,000 malignant images categorized into four classes: Atelectasis (1,328), Infiltration (1,410), Nodule (852), and Pneumonia (1,410). The dataset is split into 80% for training and 20% for testing. The proposed methodology employs a two-tier classification system: a binary classifier to distinguish between benign and malignant images, followed by a multiclass classifier to identify the specific malignant lung diseases.
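
The subset composition and the two-tier flow can be sketched as follows. The class counts are the ones quoted above; everything else is an assumption made for illustration: the split is assumed to apply to the whole 8,000-image subset, and `toy_is_malignant` and `toy_disease_of` are hypothetical stand-ins for the trained binary and multiclass classifiers.

```python
from typing import Callable, Sequence, Tuple

# Image counts per class in the 8,000-image subset, as quoted in the text.
SUBSET = {
    "Benign": 3000,
    "Atelectasis": 1328,
    "Infiltration": 1410,
    "Nodule": 852,
    "Pneumonia": 1410,
}
DISEASES = tuple(name for name in SUBSET if name != "Benign")

def split_sizes(total: int, train_percent: int = 80) -> Tuple[int, int]:
    """Return (train, test) sizes for a percentage split."""
    train = total * train_percent // 100
    return train, total - train

def two_tier_predict(
    features: Sequence[float],
    is_malignant: Callable[[Sequence[float]], bool],
    disease_of: Callable[[Sequence[float]], str],
) -> str:
    """Tier 1 decides benign vs malignant; tier 2 runs only on malignant cases."""
    if not is_malignant(features):
        return "Benign"
    return disease_of(features)

# Hypothetical placeholder classifiers, NOT the models from the paper.
def toy_is_malignant(features: Sequence[float]) -> bool:
    return features[0] > 0.5

def toy_disease_of(features: Sequence[float]) -> str:
    return DISEASES[int(features[1]) % len(DISEASES)]

total = sum(SUBSET.values())
print(total, split_sizes(total))
print(two_tier_predict([0.2, 1.0], toy_is_malignant, toy_disease_of))
print(two_tier_predict([0.9, 2.0], toy_is_malignant, toy_disease_of))
```

One consequence of this cascade design is that tier-1 errors propagate: a malignant image that the binary classifier labels benign never reaches the disease classifier.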

Images are resized and converted to grayscale before feature extraction. For the first-level classifier, 20 Gray-Level Co-occurrence Matrix (GLCM) features and 59 Local Binary Patterns (LBP) features are extracted from each image and used to train Support Vector Machine (SVM), Naive Bayes (NB), and K-Nearest Neighbors (KNN) classifiers. An image classified as malignant is passed to the second-level classifier, which is trained on GLCM and LBP features separately and in combination. In addition, a deep learning model, InceptionV3, is trained on the malignant images for four-class disease classification. The proposed method achieves an accuracy of 97.03%, which the paper reports as superior to existing studies.
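
A 59-value LBP descriptor is what the common "uniform" LBP variant with 8 neighbours produces: 58 uniform patterns plus one shared bin for all other codes. The sketch below assumes that variant, which the text does not confirm, and simplifies it by sampling the square 3x3 neighbourhood instead of interpolating points on a circle. It is written in plain Python for readability, not speed.

```python
def transitions(code):
    """Count 0/1 changes when the 8 bits of `code` are read as a circle."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Codes with at most two circular transitions are "uniform"; each gets its own bin.
UNIFORM_CODES = [code for code in range(256) if transitions(code) <= 2]
BIN_OF = {code: index for index, code in enumerate(UNIFORM_CODES)}
NON_UNIFORM_BIN = len(UNIFORM_CODES)  # every non-uniform code shares this last bin

# The 8 neighbours in clockwise order, starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_histogram(image):
    """Normalised uniform-LBP histogram of a 2-D list of gray levels (at least 3x3)."""
    hist = [0] * (NON_UNIFORM_BIN + 1)
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):          # border pixels lack a full neighbourhood
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(OFFSETS):
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[BIN_OF.get(code, NON_UNIFORM_BIN)] += 1
    total = sum(hist)
    return [count / total for count in hist]

# Tiny illustrative image (hypothetical gray levels).
image = [
    [10, 10, 10, 10],
    [10, 50, 20, 10],
    [10, 20, 50, 10],
    [10, 10, 10, 10],
]
features = lbp_histogram(image)
print(len(features))
```

The resulting vector can be fed directly to a classifier such as an SVM, either alone or concatenated with GLCM features.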

Results

Among the machine learning classifiers ranked with TOPSIS, the SVM with an RBF kernel performed best, reaching 81.5% accuracy with LBP features and 85.25% when statistical and textural features were combined.

The Inception v3 model reached 97.05% accuracy on the 5,000-image dataset, 11.8 percentage points above the best ML result. Accuracy also rose with dataset size, by 18.41% when going from 1,000 to 5,000 images.

Discussion

The discussion section reviews prior work on lung disease detection with machine learning and image processing. Geraldo Luis’s adaptive active contour model achieved 96% accuracy in classifying lung diseases using extreme learning machine neural networks. Other studies, such as those by Rahib H. Abiyev and Sri Widodo, examined traditional and deep learning methods, including Convolutional Neural Networks (CNNs) and Support Vector Machines (SVMs), for analyzing chest X-rays. Deep learning algorithms for lung cancer detection are emphasized in particular, with reported specificity between 93% and 100% and sensitivity between 71% and 91%.
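
For reference, sensitivity and specificity are computed from the confusion counts as shown in this short sketch. The labels below are made up purely to exercise the function and are not data from any of the cited studies.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN): share of diseased cases that are detected
    specificity = TN / (TN + FP): share of healthy cases that are cleared
    Assumes both classes occur at least once in y_true.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical labels: 4 positive cases and 6 negative cases.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

A high specificity paired with a lower sensitivity, as in the ranges quoted above, means few false alarms but a larger share of missed positive cases.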

The section also describes the feature extraction methods, the Gray-Level Co-occurrence Matrix (GLCM) and Local Binary Patterns (LBP), both used for texture analysis of lung images. A GLCM records how often pairs of gray levels occur at a given pixel offset, and texture features such as contrast, correlation, and energy are computed from it. The machine learning classifiers discussed are K-Nearest Neighbors (KNN), SVM, Decision Trees, and Naïve Bayes. The Inception V3 architecture is also detailed, including its layered approach to feature extraction and classification, which the paper credits for the model’s stronger performance in identifying lung diseases.
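
The GLCM features named above can be illustrated with a small plain-Python sketch. It builds a symmetric, normalised co-occurrence matrix for a single horizontal offset and derives contrast, energy, and correlation from it. The 4-level test image, the single offset, and the choice of features are assumptions for illustration; how the paper arrives at its 20 GLCM features is not specified here. Note that "energy" is taken as the sum of squared entries, while some libraries report its square root.

```python
def glcm(image, levels, dr=0, dc=1):
    """Symmetric, normalised gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                counts[j][i] += 1   # count each pair in both directions
    total = sum(map(sum, counts))
    return [[value / total for value in row] for row in counts]

def texture_features(p):
    """Contrast, energy, and correlation of a symmetric normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    contrast = sum(v * (i - j) ** 2 for i, j, v in cells)
    energy = sum(v * v for _, _, v in cells)
    # For a symmetric matrix the row and column marginals coincide.
    marginal = [sum(row) for row in p]
    mean = sum(i * m for i, m in enumerate(marginal))
    variance = sum((i - mean) ** 2 * m for i, m in enumerate(marginal))
    covariance = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    # A constant image has zero variance; returning 1.0 there is a convention.
    correlation = covariance / variance if variance > 0 else 1.0
    return {"contrast": contrast, "energy": energy, "correlation": correlation}

# Small 4-level test image (illustrative values only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
features = texture_features(p)
print(features)
```

Repeating the computation for several offsets (for example four directions) and concatenating the results is a common way to obtain a longer GLCM feature vector.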