High-resolution AI image dataset for diagnosing oral submucous fibrosis and squamous cell carcinoma

Journal: Scientific Data, Volume: 11, Issue: 1
DOI: https://doi.org/10.1038/s41597-024-03836-6
PMID: https://pubmed.ncbi.nlm.nih.gov/39333529
Publication Date: 2024-09-27
Author(s): Nisha Chaudhary et al.
Primary Topic: Oral Health Pathology and Treatment

Overview

Histopathological diagnosis of oral cancer is constrained by a shortage of experienced pathologists and by variability in diagnostic interpretation. To address these challenges, the authors introduce ORCHID (ORal Cancer Histology Image Database), a specialized dataset designed to support the development of artificial intelligence (AI) models for the rapid diagnosis of oral cancer and precancerous conditions.

ORCHID is a comprehensive multicenter collection of high-resolution histology images captured at 1000X effective magnification. It covers several categories of oral cancer and precancer, including oral submucous fibrosis (OSMF) and oral squamous cell carcinoma (OSCC), with grade-level sub-classification of OSCC as well differentiated, moderately differentiated, or poorly differentiated. The resource is intended to strengthen AI-based diagnostic capabilities and, ultimately, to improve early detection and treatment strategies for oral cancer.
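The two-level labelling described above (a top-level category, plus a grade only for OSCC) can be sketched as a small data structure. This is an illustrative sketch, not the dataset's actual schema: the class name `Label`, the constant names, and the string values are hypothetical, and the `NORMAL` category is taken from the classification task described in the Discussion rather than from this paragraph.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical label names; the dataset's real folder or label naming is not given here.
TOP_LEVEL = ("NORMAL", "OSMF", "OSCC")
OSCC_GRADES = (
    "WELL_DIFFERENTIATED",
    "MODERATELY_DIFFERENTIATED",
    "POORLY_DIFFERENTIATED",
)


@dataclass(frozen=True)
class Label:
    """A top-level category with an optional grade that is valid only for OSCC."""

    category: str
    grade: Optional[str] = None

    def __post_init__(self):
        if self.category not in TOP_LEVEL:
            raise ValueError(f"unknown category: {self.category}")
        if self.grade is None:
            return
        if self.category != "OSCC":
            raise ValueError("only OSCC images carry a grade")
        if self.grade not in OSCC_GRADES:
            raise ValueError(f"unknown OSCC grade: {self.grade}")

    def flat_name(self) -> str:
        """Return a single string usable as a class name or folder name."""
        if self.grade is None:
            return self.category
        return f"{self.category}/{self.grade}"
```

Keeping the grade optional lets the same structure serve both the general classification and the OSCC grading task, while rejecting combinations such as a graded OSMF image.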

Introduction

The introduction highlights the significant global burden of oral squamous cell carcinoma (OSCC) and oral submucous fibrosis (OSMF), particularly in Southeast Asia, where tobacco and betel quid use is prevalent. OSMF is characterized by fibrous tissue deposition in the oral cavity, which leads to functional impairment and an increased risk of malignancy. Early detection and intervention are critical for managing both conditions. However, current diagnostic methods, including clinical examination, radiographic imaging, and histopathological analysis, are limited in accuracy and specificity, particularly when differentiating OSMF from early-stage OSCC.

The paper emphasizes the need for advanced diagnostic techniques and for comprehensive histopathology databases to support early detection and accurate diagnosis. Although computer-based algorithms have been used to automate the segmentation of hematoxylin and eosin (H&E)-stained images, the lack of large, well-annotated datasets for oral diseases, especially OSMF, remains a significant obstacle to training effective AI models. The authors propose the ORCHID database as a resource for the scientific community to improve AI-based diagnostic tools in oral healthcare and, ultimately, patient outcomes.

Methods

The Methods section outlines the study's experimental design and analytical techniques. The researchers took a quantitative approach, using statistical analyses to evaluate data collected from various experiments. The methodologies included controlled laboratory experiments in which variables were systematically manipulated to observe their effects on the outcomes of interest.

Data collection involved both qualitative and quantitative measures, with reliability and validity supported by repeated trials and appropriate sampling techniques. The analysis was conducted with statistical software and applied tests including regression analysis and ANOVA to determine the significance of the findings. Overall, the methodological framework was designed to test the hypotheses rigorously and to support conclusions grounded in empirical evidence.

Discussion

The discussion section outlines the ethical considerations, methodologies, and validation processes involved in classifying oral cancer from histopathological images. Tissue samples were collected from multiple institutions in India with informed consent from participants and in adherence to ethical guidelines. The study used hematoxylin and eosin (H&E) staining, with rigorous standardization across laboratories to minimize staining variability. High-quality images were acquired at 1000X effective magnification and underwent expert annotation to ensure clarity and accuracy, with a focus on eliminating artifacts that could affect interpretation.

To improve consistency across the dataset, Reinhard stain normalization was applied to reduce staining discrepancies between samples. Image patches were then generated for deep learning analysis with the InceptionV3 model, which was pre-trained and then fine-tuned to classify samples as normal, OSMF, or OSCC. The model achieved training accuracies of 99.18% for this general classification and 92.81% for OSCC grading. The results underscore the model's robustness and potential clinical applicability, while also highlighting the need to refine and expand the dataset to improve classification performance, particularly for OSCC subtypes.
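The preprocessing described above (stain normalization followed by patch generation) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' pipeline: it implements Reinhard-style statistics matching in the l-alpha-beta colour space using the commonly quoted RGB-to-LMS coefficients, whereas the colour space, reference image, patch size, and stride actually used for ORCHID are not given in this summary. All function names and parameters are hypothetical.

```python
import numpy as np

# RGB -> LMS coefficients as commonly quoted for Reinhard-style colour transfer
# (assumption: the exact colour space used for ORCHID is not stated in this summary).
RGB_TO_LMS = np.array([
    [0.3811, 0.5783, 0.0402],
    [0.1967, 0.7244, 0.0782],
    [0.0241, 0.1288, 0.8444],
])
# log-LMS -> decorrelated l-alpha-beta axes.
LMS_TO_LAB = np.diag([1 / np.sqrt(3), 1 / np.sqrt(6), 1 / np.sqrt(2)]) @ np.array([
    [1.0, 1.0, 1.0],
    [1.0, 1.0, -2.0],
    [1.0, -1.0, 0.0],
])


def rgb_to_lab(rgb):
    """Convert an (H, W, 3) RGB image with values in 0..255 to l-alpha-beta."""
    rgb = np.clip(np.asarray(rgb, dtype=np.float64), 1.0, 255.0)  # avoid log10(0)
    return np.log10(rgb @ RGB_TO_LMS.T) @ LMS_TO_LAB.T


def lab_to_rgb(lab):
    """Invert rgb_to_lab; the result is float and may fall outside 0..255."""
    lms = 10.0 ** (lab @ np.linalg.inv(LMS_TO_LAB).T)
    return lms @ np.linalg.inv(RGB_TO_LMS).T


def channel_stats(lab):
    """Per-channel mean and standard deviation over all pixels."""
    flat = lab.reshape(-1, 3)
    return flat.mean(axis=0), flat.std(axis=0)


def match_stats(src_lab, tgt_mean, tgt_std):
    """Shift and scale each channel so its mean and std match the target's."""
    src_mean, src_std = channel_stats(src_lab)
    return (src_lab - src_mean) / np.maximum(src_std, 1e-8) * tgt_std + tgt_mean


def reinhard_normalize(source_rgb, target_rgb):
    """Give source_rgb the colour statistics of target_rgb (a reference image)."""
    tgt_mean, tgt_std = channel_stats(rgb_to_lab(target_rgb))
    matched = match_stats(rgb_to_lab(source_rgb), tgt_mean, tgt_std)
    return np.clip(np.rint(lab_to_rgb(matched)), 0, 255).astype(np.uint8)


def extract_patches(image, size, stride):
    """Tile an image into size x size patches; partial edge tiles are dropped."""
    height, width = image.shape[:2]
    return [
        image[y:y + size, x:x + size]
        for y in range(0, height - size + 1, stride)
        for x in range(0, width - size + 1, stride)
    ]
```

In use, each image would be normalized against one fixed reference image and then tiled, for example `extract_patches(reinhard_normalize(image, reference), size, stride)`, before the patches are passed to the classifier. Clipping to 0..255 after the inverse transform means the output statistics match the reference only approximately when some pixels fall out of range.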

Limitations

The main limitations of the ORCHID dataset stem from its high-resolution images, which are detailed but offer a restricted field of view compared with whole slide images (WSIs). Because each image covers a specific area of interest, contextual information needed for thorough pathological analysis can be lost, and diagnostically significant regions visible in a WSI may be missed. Although high-resolution images demand less computational power than WSIs, their analysis still requires substantial resources, which may limit access to advanced AI tools in settings with limited computational infrastructure. In addition, precise annotation of these images requires considerable expertise and time, and can introduce variability that affects the consistency and reproducibility of AI model training.

Despite these challenges, the ORCHID dataset provides clear images that reveal subtle abnormalities, establishing a foundation for future research on oral conditions such as oral squamous cell carcinoma (OSCC) and oral submucous fibrosis (OSMF). The authors emphasize the importance of developing hybrid approaches that leverage the strengths of both high-resolution imaging and WSIs, alongside enhancing computational methods for managing large-scale data. They advocate for the integration of diverse datasets, including molecular and histological data, to improve diagnostic accuracy through advanced AI techniques. By making the dataset publicly accessible, the authors aim to encourage contributions from other researchers, fostering the growth of a more extensive and diverse resource for the development of AI-based diagnostic tools in oral healthcare.