Deep inception neural network with residual connections for Tamil handwritten character recognition

Journal: Scientific Reports, Volume: 16, Issue: 1
DOI: https://doi.org/10.1038/s41598-026-36330-7
PMID: https://pubmed.ncbi.nlm.nih.gov/41578123
Publication Date: 2026-01-23
Author(s): Hariharan Periyasamy et al.
Primary Topic: Handwritten Text Recognition Techniques

Overview

This study develops a Tamil Handwritten Character Recognition (THCR) system built on a deep Inception neural network framework called TamHNet. The framework incorporates residual connections and preprocesses handwritten images with a non-linear bilateral filter, which helps with the subtle stylistic variations among Tamil characters. The model is trained on a custom dataset, the Tamil Isolated Character Dataset (TICD), which includes diverse handwriting samples. TamHNet applies domain-adaptive fine-tuning to the Inception-ResNet architecture, updating the model’s weights and biases by selectively unfreezing layers. According to the authors, this improves feature discrimination and yields a validation accuracy of 99.8%, which they report as exceeding existing state-of-the-art methods.
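To make the preprocessing step concrete, here is a minimal sketch of a bilateral filter in plain Python. The summary does not state the kernel size or sigma values the authors used, so `radius`, `sigma_space`, and `sigma_range` are illustrative assumptions, and a real pipeline would more likely rely on an optimized library routine than on nested loops.

```python
import math

def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=25.0):
    """Edge-preserving smoothing of a 2D grayscale image (list of lists).

    Parameter defaults are illustrative assumptions, not the paper's settings.
    """
    height, width = len(image), len(image[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours outside the image
                    neighbour = image[ny][nx]
                    # Spatial weight: nearby pixels count more.
                    w_space = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels of similar intensity count more,
                    # which is what keeps stroke edges sharp.
                    w_range = math.exp(-((neighbour - center) ** 2) / (2 * sigma_range ** 2))
                    weight = w_space * w_range
                    weighted_sum += weight * neighbour
                    weight_total += weight
            output[y][x] = weighted_sum / weight_total
    return output

# A sharp vertical edge (ink vs. paper) and a flat patch with one noisy pixel.
edge = [[0.0, 0.0, 255.0, 255.0] for _ in range(4)]
noisy = [[100.0, 100.0, 100.0], [100.0, 110.0, 100.0], [100.0, 100.0, 100.0]]
edge_out = bilateral_filter(edge)
noisy_out = bilateral_filter(noisy)
```

The range weight is what separates this filter from a plain Gaussian blur: small intensity differences are averaged away, while the large jump at a stroke boundary receives almost no weight and so stays sharp.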

The study concludes that the TamHNet architecture, combined with appropriate preprocessing and the Adam optimizer, provides a robust solution for THCR. It acknowledges limitations, particularly in segmenting isolated dots, which affects the recognition of characters such as “Aayutha Ezhuthu.” Future work aims to refine the segmentation process and extend the model to older Tamil script variants, improving both the performance and the applicability of the recognition system.

Introduction

The introduction highlights the significance of Tamil, which is spoken by over 80 million people, primarily in southern India, Sri Lanka, Singapore, and Malaysia, and is described as the fifth-most-spoken language in India. The Tamil script comprises 247 characters: 12 vowels, 18 consonants, 216 vowelized consonants formed by combining them, and the special character “Aayutha Ezhuthu.” The introduction underscores the challenge of converting handwritten Tamil manuscripts into machine-readable form and frames it in terms of Optical Character Recognition (OCR) and Scene Text Recognition (STR), which serve distinct purposes in text extraction.
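The character counts can be checked with a few lines of arithmetic. The sketch below assumes the usual breakdown of the Tamil script, in which the 247th character is the special symbol Aayutha Ezhuthu mentioned elsewhere in this summary; the variable names are illustrative.

```python
VOWELS = 12
CONSONANTS = 18
# Each consonant combines with each vowel to form one vowelized consonant.
COMPOUND = VOWELS * CONSONANTS
# The special character Aayutha Ezhuthu, counted once.
AAYTHAM = 1

total = VOWELS + CONSONANTS + COMPOUND + AAYTHAM
# The summary states that TamHNet excludes this one symbol.
recognized_by_tamhnet = total - AAYTHAM

print(COMPOUND, total, recognized_by_tamhnet)  # 216 247 246
```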

OCR is tailored to structured documents and converts printed or handwritten text into a consistent machine-readable format, while STR recognizes text in natural scenes and relies on deep learning techniques to cope with diverse fonts and orientations. The introduction notes that Tamil characters are hard to recognize because of their cursive nature, structural similarities, and overlapping strokes, all of which complicate telling similar letters apart. To tackle these challenges, the study proposes TamHNet, a system designed to recognize 246 Tamil characters by leveraging their structural features; it excludes the three-dot symbol “Aayutha Ezhuthu.”

Results

The results section evaluates TamHNet’s performance in recognizing handwritten Tamil characters. The experimental setup split the dataset into 70% for training and 30% for validation and focused on 104 unique classes derived from the 247 characters. A 104 × 104 confusion matrix was generated to assess the model, but displaying it in full was impractical. Instead, the authors used decomposed confusion plots, grouped by label subsets, to make the results easier to interpret.
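The evaluation setup described above can be sketched in plain Python as follows. The summary does not say whether the 70/30 split was random or stratified, nor how the label subsets were chosen, so the seeded random split and the helper names `split_indices`, `confusion_matrix`, `sub_matrix`, and `accuracy` are assumptions made for illustration.

```python
import random

def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into training and validation sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(round(n_samples * train_fraction))
    return indices[:cut], indices[cut:]

def confusion_matrix(true_labels, predicted_labels, n_classes):
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix

def sub_matrix(matrix, class_subset):
    """Extract the block for a subset of classes, e.g. to plot one group at a time."""
    return [[matrix[r][c] for c in class_subset] for r in class_subset]

def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0

# Toy example with 4 classes instead of 104.
train_idx, val_idx = split_indices(1000)
y_true = [0, 1, 2, 2, 3]
y_pred = [0, 1, 2, 3, 3]
cm = confusion_matrix(y_true, y_pred, n_classes=4)
block = sub_matrix(cm, [2, 3])  # confusion restricted to classes 2 and 3
```

With 104 classes, the same `sub_matrix` call could be run once per label group to produce several small plots in place of one unreadable 104 × 104 figure.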

Although the grouped visualizations showed signs of possible overfitting, the training and validation loss curves converged appropriately. TamHNet identifies handwritten characters from document images, using the representation of 104 unique symbols to keep training and validation consistent and free of redundant classes.

Discussion

The proposed model, TamHNet, reaches a maximum accuracy of 99.8% in Tamil handwritten character recognition. It is built on a fine-tuned Inception network with residual connections, which the authors credit for its improvement over existing state-of-the-art methods. The research also introduces a comprehensive dataset of isolated Tamil characters, hosted on Mendeley, which serves as the foundation for training and validating the model. The discussion highlights the value of combining deep learning techniques with optimization strategies and compares TamHNet with previous approaches, such as those based on Support Vector Machines and Generative Adversarial Networks, which reported lower accuracy.
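The idea of an Inception block with a residual connection can be illustrated with a toy one-dimensional example. This is a sketch of the general technique, not the TamHNet architecture: the 1D signal, the kernel sizes, the `projection_weights`, and the `scale` factor are all assumptions chosen to keep the example small.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output has the input's length.

    Assumes an odd-length kernel so that it can be centred on each position.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def relu(values):
    return [max(0.0, v) for v in values]

def inception_residual_block(x, branch_kernels, projection_weights, scale=1.0):
    """Parallel branches, a per-position projection, then a skip connection."""
    # Inception part: several filters of different widths see the same input.
    branches = [relu(conv1d_same(x, kernel)) for kernel in branch_kernels]
    # Stand-in for "concatenate, then 1x1 convolution": a weighted sum of the
    # branch outputs at each position, bringing the result back to x's shape.
    projected = [
        sum(w * branch[i] for w, branch in zip(projection_weights, branches))
        for i in range(len(x))
    ]
    # Residual part: add the block's input back to its output.
    return relu([xi + scale * pi for xi, pi in zip(x, projected)])

x = [1.0, 2.0, 3.0]
identity_kernels = [[1.0], [0.0, 1.0, 0.0]]  # both branches copy the input
doubled = inception_residual_block(x, identity_kernels, [0.5, 0.5])
unchanged = inception_residual_block(x, identity_kernels, [0.0, 0.0])
```

When the projection weights are zero, the block passes its (non-negative) input through unchanged, which is the property that makes residual networks easier to train and to fine-tune layer by layer.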

The paper also reviews existing methodologies for Tamil character recognition and emphasizes the need for robust models that can handle the complexities of handwritten scripts. The TamHNet framework is structured into three main phases: data collection and augmentation, preprocessing, and model design, which together account for its high performance. The Adam optimizer improves training efficiency by adapting the learning rate for each parameter, which the authors suggest helps the model cope with varying handwriting styles. Overall, the findings point to TamHNet as a strong candidate for Tamil handwritten character recognition and a basis for future research and applications in this domain.
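For readers unfamiliar with Adam, the sketch below shows its standard update rule applied to a toy quadratic objective. The hyperparameters (`lr`, `beta1`, `beta2`, `eps`, `steps`) are common defaults or illustrative choices, not values reported for TamHNet, and `adam_minimize` and `quadratic_grad` are hypothetical helper names.

```python
import math

def adam_minimize(grad_fn, theta, steps=500, lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using the Adam update rule."""
    theta = list(theta)
    m = [0.0] * len(theta)  # running mean of gradients (first moment)
    v = [0.0] * len(theta)  # running mean of squared gradients (second moment)
    for t in range(1, steps + 1):
        grads = grad_fn(theta)
        for i, g in enumerate(grads):
            m[i] = beta1 * m[i] + (1 - beta1) * g
            v[i] = beta2 * v[i] + (1 - beta2) * g * g
            # Bias correction compensates for m and v starting at zero.
            m_hat = m[i] / (1 - beta1 ** t)
            v_hat = v[i] / (1 - beta2 ** t)
            # Each parameter gets its own effective step size.
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta

def quadratic_grad(theta):
    """Gradient of f(a, b) = (a - 3)^2 + (b + 1)^2, minimized at (3, -1)."""
    a, b = theta
    return [2 * (a - 3.0), 2 * (b + 1.0)]

solution = adam_minimize(quadratic_grad, [0.0, 0.0])
```

Dividing by the square root of the second moment is what gives each parameter its own step size, which is the "adapting learning rates" behaviour referred to above.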