Information retrieval framework using knowledge graph embeddings and uncertainty modelling using probabilistic soft logic

Journal: Discover Computing, Volume: 29, Issue: 1
DOI: https://doi.org/10.1007/s10791-025-09859-w
Publication Date: 2026-02-04
Author(s): Romil Rawat et al.
Primary Topic: Advanced Graph Neural Networks

Overview

The exponential growth of unstructured and semi-structured data has underscored the need for advanced information retrieval (IR) systems that can extract relevant insights from large knowledge repositories. Traditional IR models, which rely mainly on statistical or keyword-based matching, often fail to capture semantic relationships and to manage ambiguity, particularly in domains characterized by uncertainty, such as biomedicine and law.

This study introduces a framework that combines probabilistic reasoning, transformer-enhanced knowledge graph embeddings, and dynamic uncertainty modeling to improve the reliability and interpretability of IR systems. Evaluated on the benchmark datasets CN15k and O*NET20k, the proposed model showed significant improvements in standard retrieval metrics, including normalized Discounted Cumulative Gain at 20 (nDCG@20), Mean Reciprocal Rank (MRR), and Hits@1, as well as in the newly introduced Uncertainty-Aware Performance (UAP) metric. Integrating structured knowledge with contextual semantics and calibrated confidence measures improves retrieval accuracy and also lets the system report its confidence levels transparently, which makes it suitable for high-stakes applications. The findings emphasize the importance of retrieval frameworks that prioritize both accuracy and trustworthiness, as a step toward more interpretable AI systems that operate effectively under uncertainty. Future research directions include extending the framework to multilingual and domain-specific knowledge graphs, implementing continual learning, and optimizing for low-resource environments.
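
The standard ranking metrics named above can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and toy inputs are assumptions made for illustration, and nDCG is normalized against the returned list only, which assumes every relevant item appears in that list. UAP is left out because the summary does not give its definition.

```python
import math

def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked in the top k (1-based ranks)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the correct answer over all queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def ndcg_at_k(relevances, k):
    """nDCG@k for one query; `relevances` are graded relevances in ranked order."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2).
        return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy data (hypothetical): rank of the correct entity for four queries.
ranks = [1, 3, 2, 10]
print("Hits@1 :", hits_at_k(ranks, 1))
print("MRR    :", mean_reciprocal_rank(ranks))
print("nDCG@20:", ndcg_at_k([3, 2, 0, 1], 20))
```

Hits@1 and MRR are computed from the rank of the single correct answer per query, whereas nDCG@20 needs graded relevance labels for the ranked list.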

Introduction

In this section, the authors evaluate how well their uncertainty-aware information retrieval framework generalizes, through experiments on two legal-domain datasets, LEDGAR and CaseLaw. These datasets, structured as triples representing various legal relationships (e.g., “cites,” “amends”), serve as a testbed for assessing the framework’s reasoning capabilities in complex legal contexts. The experiments cover link prediction, entity retrieval, and confidence calibration, using standard retrieval metrics such as Hits@1, Hits@10, nDCG@20, and MRR, alongside the UAP metric, which incorporates prediction confidence.

On the LEDGAR dataset, the full framework achieves Hits@1 = 0.51, nDCG@20 = 0.59, MRR = 0.61, and UAP = 0.78. Comparisons with baseline models, including TransE and PSL-only, indicate that integrating transformer-based embeddings, probabilistic reasoning, and uncertainty modeling significantly improves retrieval accuracy and confidence-aware ranking. The authors take these findings as confirmation of the framework’s robustness and adaptability across datasets, and of its ability to deliver interpretable and trustworthy predictions in high-stakes domains like law, where confidence and explainability are paramount.
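
As a rough illustration of the link-prediction task and of the TransE baseline mentioned above, the sketch below scores a triple (head, relation, tail) by the negative distance between head + relation and tail, then ranks candidate tails. The entity names, relation vectors, and helper names are hypothetical toy values, not data from LEDGAR or CaseLaw, and this is the baseline scoring idea only, not the paper's full model.

```python
import math

def transe_score(h, r, t):
    """TransE plausibility: negative L2 distance ||h + r - t|| (higher is better)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

def rank_of_true_tail(head, rel, true_tail, entities, relations):
    """Rank (1-based) of the true tail among all candidate entities."""
    scores = {
        name: transe_score(entities[head], relations[rel], vec)
        for name, vec in entities.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_tail) + 1

# Hypothetical 2-dimensional embeddings for a tiny legal graph.
entities = {
    "case_A": [0.0, 0.0],
    "case_B": [1.0, 0.0],
    "statute_X": [0.0, 1.0],
}
relations = {"cites": [1.0, 0.0], "amends": [0.0, 1.0]}

rank = rank_of_true_tail("case_A", "cites", "case_B", entities, relations)
print("rank of case_B for (case_A, cites, ?):", rank)
```

The resulting ranks are the inputs to metrics such as Hits@1 and MRR.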

Results

The reproducibility section stresses that replicating the study’s results requires following the documented workflow. Researchers are instructed to use the same datasets, specifically CN15k and O*NET20k, and to apply identical preprocessing steps. The model must also be configured with the specified hyperparameters and module settings, including the transformer-based embedding parameters and the PSL rule definitions.

To ensure accurate replication, training and evaluation must follow the outlined procedures and use the same performance metrics. The authors state that they are willing to share the necessary code, configuration files, and sample datasets upon reasonable request, so that other researchers can validate the framework’s performance by replicating the experiments exactly.
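
One way to keep such a workflow reproducible is to pin every setting in a single configuration object and record a fingerprint of it alongside the results. The sketch below is only an illustration: every field value except the dataset and metric names is a placeholder, the file name `rules.psl` is hypothetical, and none of it comes from the authors' actual configuration files.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class ExperimentConfig:
    dataset: str = "CN15k"             # dataset named in the text
    seed: int = 0                      # placeholder value
    embedding_dim: int = 128           # placeholder, not from the paper
    learning_rate: float = 1e-3        # placeholder, not from the paper
    psl_rules_file: str = "rules.psl"  # hypothetical path
    metrics: tuple = ("Hits@1", "Hits@10", "nDCG@20", "MRR", "UAP")

def config_fingerprint(cfg):
    """Short, stable hash of the configuration for logging next to results."""
    payload = json.dumps(asdict(cfg), sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]

cfg = ExperimentConfig()
random.seed(cfg.seed)  # fix the random seed before any sampling or shuffling
print("config fingerprint:", config_fingerprint(cfg))
```

Two runs that report the same fingerprint were launched with identical settings, which makes mismatched replications easier to spot.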

Discussion

The discussion section of the paper evaluates a framework for uncertainty-aware information retrieval (IR) that integrates probabilistic reasoning, contextual embeddings, and uncertainty quantification. The study takes a quantitative, experimental approach, comparing multiple configurations: PSL-only (probabilistic soft logic), TEKGE-only (the transformer-enhanced knowledge graph embeddings), and a hybrid model. The literature review highlights the limitations of existing retrieval models, particularly their inability to handle uncertainty and semantic depth effectively. The proposed framework addresses these limitations by integrating TEKGE for rich semantic representation, DUQL for real-time uncertainty estimation, and PSL for interpretable logical reasoning.
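
To make the PSL component more concrete, the sketch below shows how probabilistic soft logic generally relaxes logical rules to truth values in [0, 1] using the Łukasiewicz operators, and how a rule's "distance to satisfaction" becomes a weighted penalty. The rule, weights, and truth values are hypothetical examples, and the code illustrates the general PSL formulation rather than the specific rules used in the paper.

```python
def luk_and(a, b):
    """Lukasiewicz conjunction on soft truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)

def luk_or(a, b):
    """Lukasiewicz disjunction."""
    return min(1.0, a + b)

def luk_not(a):
    """Lukasiewicz negation."""
    return 1.0 - a

def distance_to_satisfaction(body, head):
    """For a rule body -> head: zero when head >= body, otherwise the shortfall."""
    return max(0.0, body - head)

def total_penalty(ground_rules, squared=False):
    """Weighted sum of (optionally squared) distances over ground rules."""
    total = 0.0
    for weight, body, head in ground_rules:
        d = distance_to_satisfaction(body, head)
        total += weight * (d * d if squared else d)
    return total

# Hypothetical rule: cites(A, B) & cites(B, C) -> related(A, C), weight 2.0
body = luk_and(0.9, 0.8)   # soft truth of the rule body
ground_rules = [
    (2.0, body, 0.5),      # head is less true than the body: penalized
    (1.0, 0.3, 0.6),       # head is at least as true as the body: no penalty
]
print("total penalty:", total_penalty(ground_rules))
```

Inference in PSL looks for truth-value assignments that minimize this kind of weighted penalty, which keeps each rule's contribution easy to inspect.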

The proposed architecture is designed to improve retrieval performance in noisy and incomplete environments, and its robustness is demonstrated through dedicated experiments that simulate data irregularities. The results indicate that the full framework maintains high scores on metrics such as UAP even under challenging conditions, while models lacking uncertainty-aware components suffer significant declines in precision. The framework’s distinguishing contributions are dual uncertainty modeling (epistemic and aleatoric), calibrated predictions via conformal calibration, and the integration of logic with deep learning, whereas prior models typically treat these elements in isolation. Overall, the research emphasizes the importance of combining semantic relevance with resilience to uncertainty, as a basis for more reliable and interpretable IR systems in high-stakes domains.
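
The idea behind conformal calibration can be sketched with standard split conformal prediction: nonconformity scores from a held-out calibration set give a threshold, and every candidate whose score falls under that threshold enters the prediction set. This is a generic textbook procedure shown under assumed toy numbers; the summary does not specify the authors' exact calibration method, and all names and values below are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Finite-sample quantile of calibration nonconformity scores.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest score; returns infinity
    when the calibration set is too small for the requested coverage.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_probs, qhat):
    """Candidates whose nonconformity (1 - probability) is within the threshold."""
    return {c for c, p in candidate_probs.items() if 1.0 - p <= qhat}

# Hypothetical calibration scores: 1 - model probability of the true answer.
cal_scores = [0.05, 0.10, 0.20, 0.30, 0.15, 0.40, 0.25, 0.35, 0.12]
qhat = conformal_threshold(cal_scores, alpha=0.1)

# Hypothetical model probabilities for candidate tails of one query.
candidates = {"case_B": 0.70, "case_C": 0.66, "case_D": 0.10}
print("threshold:", qhat)
print("prediction set:", sorted(prediction_set(candidates, qhat)))
```

A larger prediction set signals lower confidence for that query, which is one way a retrieval system can report its uncertainty to the user.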