SBOMs into Agentic AIBOMs: Schema Extensions, Agentic Orchestration and Reproducibility Evaluation

Journal: Digital Threats: Research and Practice, Volume: 7, Issue: 2
DOI: https://doi.org/10.1145/3798285
Publication Date: 2026-03-09
Author(s): Petar Radanliev et al.
Primary Topic: Scientific Computing and Data Management

Overview

The paper critiques conventional Software Bills of Materials (SBOMs) for falling short of the security, reproducibility, and assurance needs of modern software systems. SBOMs are essential transparency tools, but because they are static they do not capture runtime behavior, environmental variation, or the context needed to assess vulnerabilities. To address these shortcomings, the authors propose agentic Artificial Intelligence Bills of Materials (AIBOMs), which integrate the ISO/IEC 20153:2025 (CSAF v2.0) framework for standardized vulnerability interpretation. Exploitability assertions are thereby grounded in a machine-verifiable advisory structure, which improves interoperability and auditability while keeping human oversight in the decision-making process.

Empirical evaluations indicate that AIBOMs significantly improve runtime dependency fidelity, reproducibility accuracy, and exploitability alignment compared with traditional provenance systems, while incurring minimal computational overhead. The framework's advantages stem from capturing runtime context and maintaining vulnerability-state consistency, not from general-purpose replay or metadata logging. The authors argue that AIBOMs provide a robust foundation for next-generation software supply-chain assurance, applicable to any software system that is composed dynamically and depends on third-party components. Ultimately, the paper advocates a shift from static inventories to active, agent-mediated security artifacts, arguing that future transparency mechanisms must reason about increasingly complex software ecosystems rather than merely record them.

Introduction

The introduction surveys the evolving landscape of software supply-chain security and highlights the limitations of traditional SBOMs. An SBOM provides a descriptive inventory of software components, but it does not account for the dynamic nature of modern software systems, which increasingly rely on techniques such as dynamic loading and federated services. This gap hampers effective vulnerability assessment and security assurance. The paper proposes a shift towards agentic AIBOMs, which combine software composition data with autonomous reasoning and runtime telemetry and so extend what an SBOM can express.

The proposed AIBOM framework introduces execution-context awareness, dynamic dependency evolution, and auditable decision traces through a multi-agent architecture. Each agent operates within a defined perception space and decision policy and handles one concern, such as environment reconstruction, runtime drift detection, or exploitability reasoning. The framework aims to augment SBOMs with operational intelligence, enabling contextual vulnerability assessments and reproducibility assertions. The authors emphasize the importance of the recently ratified ISO/IEC 20153:2025 standard, which supports the integration of these capabilities into regulated workflows. The paper's stated contributions are improved dependency tracking, assured reproducibility, and easier compliance audits.
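
Runtime drift detection of the kind described above can be pictured as a comparison between a baseline environment snapshot and what is observed later at run time. The following is a minimal sketch, not the authors' implementation: the function name `detect_drift`, the representation of a snapshot as a name-to-version mapping, and the example package versions are all assumptions made for illustration.

```python
def detect_drift(baseline, runtime):
    """Compare two {component: version} snapshots (hypothetical format).

    Returns components that appeared, disappeared, or changed version
    between the baseline snapshot and the runtime snapshot.
    """
    shared = baseline.keys() & runtime.keys()
    return {
        "added": sorted(runtime.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - runtime.keys()),
        "changed": {
            name: (baseline[name], runtime[name])
            for name in sorted(shared)
            if baseline[name] != runtime[name]
        },
    }


# Illustrative snapshots only; the versions are made up for the example.
baseline = {"numpy": "1.26.4", "openssl": "3.0.13", "pandas": "2.2.1"}
runtime = {"numpy": "1.26.4", "openssl": "3.0.14", "pyarrow": "15.0.2"}

drift = detect_drift(baseline, runtime)
for kind, items in drift.items():
    print(kind, items)
```

A report with empty `added`, `removed`, and `changed` entries would indicate that the runtime environment still matches the baseline under this simple definition of drift.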

Methods

The evaluation measures dynamic dependency accuracy by instrumenting import hooks and shared-object loading. Ground truth is established by recording every module and library loaded during execution, and three metrics are computed for each workload: Capture Rate, False Positive Rate (FPR), and False Negative Rate (FNR). The analysis also measures time-to-completeness across pre-, mid-, and post-execution snapshots to assess late-bound dependencies. Results are presented in comparative tables that contrast declared dependencies with runtime-implicit ones.
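
To make the instrumentation and the three metrics concrete, here is a minimal Python sketch. It is not the paper's tooling. The class `ImportRecorder` and the function `dependency_metrics` are hypothetical names, and the metric definitions are assumptions, since the summary does not give formulas: Capture Rate is taken as the share of ground-truth dependencies that were reported, FNR as its complement, and FPR as the share of reported dependencies that are absent from the ground truth (strictly a false-discovery proportion, because true negatives are undefined for an open set of packages).

```python
import importlib
import sys
from importlib.abc import MetaPathFinder


class ImportRecorder(MetaPathFinder):
    """Records top-level names of import attempts, then defers to normal finders."""

    def __init__(self):
        self.seen = set()

    def find_spec(self, fullname, path=None, target=None):
        # Only called for modules not already cached in sys.modules.
        self.seen.add(fullname.split(".")[0])
        return None  # None means "not handled here"; the next finder takes over


def dependency_metrics(reported, ground_truth):
    """Assumed definitions; see the lead-in for the caveats."""
    reported, ground_truth = set(reported), set(ground_truth)
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    hits = reported & ground_truth
    capture_rate = len(hits) / len(ground_truth)
    fnr = len(ground_truth - reported) / len(ground_truth)
    fpr = len(reported - ground_truth) / len(reported) if reported else 0.0
    return {"capture_rate": capture_rate, "fpr": fpr, "fnr": fnr}


# Record what a workload imports at run time.
recorder = ImportRecorder()
sys.meta_path.insert(0, recorder)
try:
    sys.modules.pop("colorsys", None)   # force a fresh import for the demo
    importlib.import_module("colorsys")
finally:
    sys.meta_path.remove(recorder)

# Compare a declared dependency list against what was observed.
declared = {"colorsys", "yaml"}          # illustrative declared set
print(dependency_metrics(declared, recorder.seen))
```

The recorder logs import attempts, including ones that later fail, so a real pipeline would need to confirm each entry against `sys.modules`. Shared-object loading is outside the reach of this hook and would need separate instrumentation.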

Ablation studies validated the necessity of each agent (MCP, A2A, AGNTCY) by disabling them one at a time. Experimental evidence of VEX functionality was also gathered, focusing on three properties: the correctness of VEX assignments under runtime execution profiles, their sensitivity to environmental changes, and their consistency across federated Trusted Runtime Environments (TREs). Although integration with CSAF policy is planned for future work, the experiments confirm the operational viability of context-aware exploitability assertions derived from runtime evidence.
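
The idea of deriving a VEX assertion from runtime evidence can be sketched as a small rule. This is an illustration under stated assumptions, not the authors' decision policy: the function `assess`, the `VexStatement` type, the component name `libexample`, and the placeholder identifier `CVE-0000-0001` are all hypothetical. The status and justification strings follow common VEX vocabulary (`affected`, `not_affected`, `vulnerable_code_not_in_execute_path`, `component_not_present`), and treating "declared but never loaded" as "not in execute path" is a deliberate simplification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VexStatement:
    cve: str
    component: str
    status: str
    justification: Optional[str] = None


def assess(cve, component, declared, loaded):
    """Assumed rule: runtime loading evidence decides the VEX status."""
    if component in loaded:
        # Loaded at run time: flag as affected pending human review.
        return VexStatement(cve, component, "affected")
    if component in declared:
        # Listed in the SBOM but never loaded under this execution profile.
        return VexStatement(
            cve, component, "not_affected", "vulnerable_code_not_in_execute_path"
        )
    return VexStatement(cve, component, "not_affected", "component_not_present")


declared = {"libexample", "numpy"}       # from the static SBOM (illustrative)
profile_a = {"numpy"}                    # runtime profile where libexample is idle
profile_b = {"numpy", "libexample"}      # profile where libexample is loaded

# The same advisory yields different statuses under different runtime profiles.
print(assess("CVE-0000-0001", "libexample", declared, profile_a))
print(assess("CVE-0000-0001", "libexample", declared, profile_b))
```

The two calls show the sensitivity to environmental change that the experiments probe: the assertion changes only because the observed execution profile changes.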

To benchmark the AIBOM pipeline against established provenance and reproducibility frameworks (ReproZip, SciUnit, and ProvStore), the study executed a shared workload across three analytic pipelines: data anonymization, data processing, and model training. The AIBOM implementation performed best on runtime provenance fidelity and contextual vulnerability stability, and it additionally provides live telemetry and a cryptographically verifiable execution context. ReproZip and SciUnit focus primarily on experiment packaging and validation and offer no contextual filtering of CVEs. ProvStore has low overhead, but it cannot validate advisory reproducibility or assess software exploitability in context.
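
One simple way to make an execution context checkable is to compute a digest over a canonical serialization of it. The sketch below is an assumption about how such a check could look, not a description of the paper's mechanism: `context_digest` and the fields of the example context are hypothetical, and a digest alone only detects change. A real system would also sign or attest the digest.

```python
import hashlib
import json


def context_digest(context):
    """SHA-256 over a canonical JSON encoding of the execution context."""
    canonical = json.dumps(context, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative context; the field names are made up for the example.
recorded = {
    "pipeline": "model-training",
    "python": "3.11.8",
    "packages": {"numpy": "1.26.4", "pandas": "2.2.1"},
}
replayed = {
    "python": "3.11.8",
    "packages": {"pandas": "2.2.1", "numpy": "1.26.4"},
    "pipeline": "model-training",
}

# Key order does not matter because the encoding sorts keys.
print(context_digest(recorded) == context_digest(replayed))
```

Sorting keys and fixing the separators keeps the encoding stable, so two environments that agree on content produce the same digest regardless of how the dictionaries were built.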

Results

The study reports its results through three primary metrics: reproducibility success rates, cross-TRE deviation scores (deviation between the federated TREs described in the Methods), and sensitivity indices. Together these metrics evaluate the reliability and robustness of the analytical methods. Each metric is interpreted in terms of its risk implications for regulated analytic governance, using example deployment scenarios.

Specifically, effectiveness is reported through expected performance (EP) and standard performance (SP) metrics, presented both per workload and in aggregate with Wilson intervals, a type of confidence interval for binomial proportions. Reporting intervals rather than point estimates conveys the uncertainty of each figure, which supports decision-making in regulated environments.
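
For readers unfamiliar with Wilson intervals, the sketch below computes one for a success rate. The function name `wilson_interval` and the example counts are illustrative and not taken from the paper. Pooling successes and trials across workloads to form the aggregate is an assumption about how the aggregated figure might be obtained.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Illustrative per-workload counts: (successful reproductions, attempts).
workloads = {"anonymization": (18, 20), "processing": (20, 20), "training": (15, 20)}

for name, (ok, n) in workloads.items():
    lo, hi = wilson_interval(ok, n)
    print(f"{name}: {ok}/{n} -> [{lo:.3f}, {hi:.3f}]")

# Aggregate by pooling counts (an assumption, see the lead-in).
total_ok = sum(ok for ok, _ in workloads.values())
total_n = sum(n for _, n in workloads.values())
print("pooled:", wilson_interval(total_ok, total_n))
```

With 18 successes in 20 attempts the interval runs from approximately 0.70 to 0.97. Unlike the simpler normal-approximation interval, the Wilson interval stays within 0 and 1 and remains informative when every attempt succeeds, which matters for small reproducibility samples.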

Discussion

The discussion highlights the role of standardized ontologies and tools in cybersecurity information exchange and vulnerability management. Notably, the CYBEX framework, developed within the ITU Telecommunication Standardization Sector (ITU-T), aims to establish a global standard for cybersecurity communication by systematically describing and integrating existing specifications such as Common Vulnerabilities and Exposures (CVE) and Common Weakness Enumeration (CWE). This integration supports a collaborative ecosystem in which cybersecurity knowledge can be shared globally, which particularly benefits resource-constrained organizations. The paper also emphasizes the importance of SBOMs in managing software supply-chain risk. SBOMs help identify vulnerabilities, but the sheer number of known vulnerabilities (over 200,000 in the CVE index) makes contextual filtering necessary to prioritize risk management effort.

The paper also discusses the challenges of processing SBOMs, particularly the high volume of vulnerability checks required, which can overwhelm cybersecurity teams. It notes that approximately 95% of the vulnerabilities listed in SBOMs may not be exploitable, so most manual checks are wasted effort. The authors therefore argue for automated tools that manage vulnerability data efficiently and integrate with existing vulnerability management systems. Dependency-Track is highlighted as a promising open-source tool that provides real-time intelligence on vulnerabilities. The discussion concludes by underscoring the need for robust software supply-chain security measures and for compliant SBOM generators that accurately reflect the complexity of software dependencies.

Limitations

The limitations section outlines several constraints and directions for future research. First, although the generation of contextual VEX assertions is implemented, policy-gated enforcement and full CSAF advisory lifecycle management remain outside the current scope and require further investigation of institutional, regulatory, and socio-technical factors. Second, the existing agent policies are deliberately rule-based and auditable. Future research could explore learning-assisted decision support, provided that explainability and determinism are preserved.

The paper also highlights the challenges of large-scale deployment across federated organizational boundaries, particularly trust bootstrapping, advisory reconciliation, and cross-domain provenance verification. The experiments covered five workloads, including Spark and GPU profiles, but not very long-running high-performance computing (HPC) jobs or mixed-language builds with custom toolchains. Advisory poisoning tests were limited to OSV/NVD mirrors. Future work should extend the evaluation to HPC schedulers, incorporate a wider range of feed sources and attestation backends, and expand user studies to improve the ergonomics of the reviewer UI. The authors also intend to integrate VEX and CSAF end to end to strengthen the connection between vulnerability context and policy gating.