Application of artificial intelligence in cervical cytology: a systematic review of deep learning models, datasets, and reported metrics

Journal: Frontiers in Big Data, Volume: 8
DOI: https://doi.org/10.3389/fdata.2025.1678863
PMID: https://pubmed.ncbi.nlm.nih.gov/41550300
Publication Date: 2026-01-02
Author(s): Miguel Angel Valles-Coral et al.
Primary Topic: AI in cancer detection

Overview

This systematic review investigates the application of artificial intelligence (AI), specifically deep learning models, in cervical cytology for the early detection of precancerous lesions. Analyzing 77 peer-reviewed articles published between 2022 and 2025, the review identifies a predominance of hybrid architectures, particularly those integrating convolutional neural networks (CNNs) with attention mechanisms or Vision Transformer (ViT) models. The most frequently used datasets were SIPaKMeD and Herlev, although private datasets are increasingly common. Although the mean reported accuracy across studies is 87.76%, with some hybrid and ViT models exceeding 92%, methodological limitations such as inconsistent diagnostic criteria and limited cross-validation restrict how far these findings generalize.
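To make the cross-validation point concrete, the following is a minimal sketch of a stratified k-fold split in plain Python. It is an illustration only, not a procedure taken from the review: the function name `stratified_kfold` and the toy labels are assumptions, and a real pipeline would normally shuffle the samples (with a fixed seed) before splitting.

```python
from collections import defaultdict


def stratified_kfold(labels, k):
    """Yield (train_idx, test_idx) pairs whose folds keep roughly the class mix of `labels`."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold receives its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


# Hypothetical, imbalanced toy labels (not data from any reviewed study).
labels = ["normal"] * 6 + ["abnormal"] * 3
splits = list(stratified_kfold(labels, k=3))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Stratifying matters for cytology data because abnormal cells are usually the minority class; an unstratified split can leave a fold with no abnormal samples at all.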

The review emphasizes the need for improved evaluation protocols, as many studies underreport clinically relevant metrics such as specificity and area under the curve (AUC). It also notes that reliance on a small number of datasets is a problem, because those datasets may not reflect the clinical variability encountered in real-world use. The analysis is intended as a reference for future research, with the aim of improving the reliability and clinical applicability of AI systems in cervical cytology by addressing current methodological gaps and promoting the development of diverse, well-annotated datasets.
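For readers less familiar with the two metrics named above, here is a small, self-contained sketch of how specificity and AUC can be computed for a binary task. The labels, scores, and the 0.5 threshold are hypothetical values chosen for illustration; the AUC function uses the standard pairwise (Mann-Whitney) interpretation rather than any method described in the review.

```python
def specificity(y_true, y_pred):
    """True-negative rate, TN / (TN + FP), for binary labels 0 and 1."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp) if (tn + fp) else float("nan")


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = abnormal cell, 0 = normal cell.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold

print("specificity:", specificity(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

Specificity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is one reason the two are reported alongside accuracy rather than in place of it.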

Introduction

Cervical cancer remains a significant health issue for women globally, particularly in regions with inadequate healthcare access. The Papanicolaou test has historically enabled early detection of cervical abnormalities, preventing progression to invasive cancer. This review systematically evaluates the application of artificial intelligence (AI) in cervical cytology, focusing on the models and datasets employed and on their performance outcomes. Computational methods in medicine have moved from heuristic systems to advanced deep learning models, particularly in medical imaging, and interest is growing in hybrid architectures that combine convolutional neural networks (CNNs) and vision transformers (ViTs) to improve pattern recognition in cellular images.

Despite these advances, challenges remain, including the misconception that cell classification alone suffices for cancer diagnosis, the inadequacy of the datasets used, and the complexity of some models. AI aims to assist in the automatic identification of cytological lesions through computer vision techniques, yet errors such as false positives or false negatives can compromise patient care. The review documents a surge in AI research on cytological image analysis and categorizes studies by their reliance on public versus proprietary datasets. While some studies demonstrate the feasibility of automating cytopathological tasks, many rely on datasets that do not accurately represent clinical scenarios. The review aims to provide a comprehensive overview of AI applications in cervical cytology, covering model construction, dataset use, and reported performance metrics, while emphasizing the need for improved generalizability and clinical relevance in future research.

Methods

The systematic review evaluates the application of artificial intelligence (AI) to the classification of cervical cytology images, focusing on prevalent models, datasets, and performance metrics. The methodology follows a structured three-phase process (planning, execution, and reporting) consistent with PRISMA guidelines and the PICO strategy. References were managed with Mendeley (version 1.19.8), and Excel was used to track study selection. The PRISMA flow diagram was created using draw.io (online version, accessed June 13, 2025).

The review covers a range of methods and models, including neural networks, training techniques, and hybrid architectures, to give a comprehensive picture of the current landscape in AI-driven cervical cytology image classification. The analysis is intended to identify key trends and performance outcomes in the field and thereby support the further development of AI applications in medical diagnostics.

Results

The review found that most studies reported high accuracy, but methodological limitations were prevalent: reliance on classical public datasets, inadequate external validation, incomplete class-level metric reporting, and insufficient descriptions of training procedures. These issues call for cautious interpretation of the reported performance metrics. Accuracy was the primary reported metric in 93.9% of the 71 reviewed articles, with values ranging from 63.08% to 100%. The mean accuracy across 121 performance entries was 87.76%, with the highest values associated with Vision Transformer (ViT)-based and hybrid models.

In addition to accuracy, precision and recall were reported in approximately 65% of the studies, with mean values of 87.01% and 78.06%, respectively, indicating a growing focus on class-level performance. The F1-score, reported in 54% of studies, averaged 64.65% but reached nearly 99% in well-optimized multiclass models. Hybrid models achieved the highest average accuracy, 96.63%, while classical models such as Random Forest performed worse (63% to 83%). Although these results suggest that AI models can match or exceed human diagnostic performance in certain tasks, the review underscores the need for improved reporting practices, external validation, and thorough statistical analysis in future research to enhance the reliability and clinical applicability of AI in this field.
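The class-level metrics discussed above can all be derived from a confusion matrix. The sketch below shows one way to do so in plain Python; the three-class matrix is invented for illustration and does not come from any reviewed study, and the function names are assumptions.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def per_class_metrics(confusion):
    """Return (accuracy, macro_f1, per-class metrics).

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    metrics = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true class k, predicted as another class
        fp = sum(confusion[i][k] for i in range(n)) - tp   # other classes predicted as k
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        metrics.append({"precision": precision, "recall": recall,
                        "f1": f1_score(precision, recall)})
    accuracy = correct / total
    macro_f1 = sum(m["f1"] for m in metrics) / n
    return accuracy, macro_f1, metrics


# Hypothetical imbalanced 3-class confusion matrix (rows = true class).
confusion = [
    [90, 5, 5],
    [10, 30, 10],
    [2, 3, 5],
]
accuracy, macro_f1, metrics = per_class_metrics(confusion)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1:.3f}")
for k, m in enumerate(metrics):
    print(k, {name: round(value, 3) for name, value in m.items()})
```

In this toy matrix the overall accuracy is noticeably higher than the macro-averaged F1 because the rare class is classified poorly, which illustrates why accuracy alone can overstate performance on imbalanced data. Note also that F1 is the harmonic mean of precision and recall for a single model: a model with precision 87.01% and recall 78.06% would have an F1 near 82.3%. The pooled means quoted above are computed over different subsets of studies, so they need not satisfy that relation.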

Discussion

The discussion interprets the findings of the systematic review of artificial intelligence (AI) applications in cervical cytology, which followed the PRISMA methodology. The review critically distinguishes between cellular lesion detection and cancer diagnosis, and addresses both the technical limitations and the ethical considerations of the AI models evaluated. The analysis shows that convolutional neural networks (CNNs) and hybrid architectures predominate, with a notable trend towards more complex models that integrate multiple computational strategies. Hybrid models, which accounted for 61% of the studies, showed better performance metrics than monolithic models, indicating a shift towards approaches that better handle the variability in cytological images.

The review also highlights the reliance on classic datasets, such as Herlev and SIPaKMeD, which are favored for their structured annotations and availability. However, it notes a growing interest in proprietary datasets that cater to specific clinical contexts. The findings suggest that while hybrid models achieve high accuracy, particularly on classic datasets, there is a critical need for broader dataset diversity to enhance the generalizability of AI solutions in real-world clinical settings. The authors advocate for the development of standardized benchmarks and the promotion of open-access datasets to facilitate reproducibility and comparability across studies.
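One possible way to probe the generalizability concern raised above is to hold out an entire source dataset for testing, so a model is never evaluated on images from a source it was trained on. The sketch below is an assumption about how such a split could be organized, not a protocol proposed in the review; the function name and the sample-to-source assignments are hypothetical.

```python
def leave_one_dataset_out(sources):
    """Yield (held_out, train_idx, test_idx), holding out each source dataset in turn."""
    for held_out in sorted(set(sources)):
        train_idx = [i for i, s in enumerate(sources) if s != held_out]
        test_idx = [i for i, s in enumerate(sources) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical source tag for each image in a pooled collection.
sources = ["Herlev", "Herlev", "SIPaKMeD", "SIPaKMeD", "SIPaKMeD", "private"]
splits = list(leave_one_dataset_out(sources))
for held_out, train_idx, test_idx in splits:
    print(f"test on {held_out}: train={train_idx} test={test_idx}")
```

Pooling datasets this way assumes their class labels have been mapped to a common scheme, which is itself a nontrivial step given the inconsistent diagnostic criteria the review reports.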