VeriFi: Towards Verifiable Federated Unlearning

Journal: IEEE Transactions on Dependable and Secure Computing, Volume: 21, Issue: 6
DOI: https://doi.org/10.1109/tdsc.2024.3382321
Publication Date: 2024-03-28
Author(s): Xiangshan Gao et al.
Primary Topic: Verifiable Federated Unlearning

Overview

This overview introduces federated learning (FL), an emerging paradigm in which participants collaboratively train models while keeping their data private. A critical requirement in FL is the right to be forgotten (RTBF): a participant who leaves the federation can request the deletion of their data from the global model. Recent server-side unlearning methods remove a leaving participant’s gradients, but whether they actually deliver RTBF has remained unverified, and this is the gap the paper addresses.

The authors introduce VeriFi, a framework that integrates federated unlearning with verification. After notifying the server of their departure, a participant can actively verify the unlearning effect. Verification has two steps: marking, which injects markers that identify the leaving participant’s data, and checking, which measures how the global model’s performance on those markers changes. The study evaluates VeriFi on seven datasets and four types of deep learning models, covering seven unlearning methods and five verification methods, including newly proposed techniques that improve efficiency and robustness. The findings support a more trustworthy approach to federated unlearning and provide empirical insight into its effectiveness.
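The checking step described above can be illustrated with a minimal sketch. It assumes the leaver already holds a set of marker examples and can query the global model before and after unlearning; the function names, the accuracy-based criterion, and the `min_drop` threshold are hypothetical choices for illustration, not the paper’s actual procedure.

```python
from typing import Callable, Sequence, Tuple

Example = Tuple[Sequence[float], int]      # (features, marked label)
Model = Callable[[Sequence[float]], int]   # maps features to a predicted label


def marker_accuracy(model: Model, markers: Sequence[Example]) -> float:
    """Fraction of marker examples for which the model predicts the marked label."""
    if not markers:
        raise ValueError("at least one marker is required")
    hits = sum(1 for features, label in markers if model(features) == label)
    return hits / len(markers)


def check_unlearning(before: Model, after: Model,
                     markers: Sequence[Example],
                     min_drop: float = 0.5) -> bool:
    """Return True if marker accuracy fell by at least `min_drop` after unlearning.

    `min_drop` is an assumed threshold; a real deployment would need a
    calibrated decision rule.
    """
    drop = marker_accuracy(before, markers) - marker_accuracy(after, markers)
    return drop >= min_drop
```

A large drop in marker accuracy suggests that the global model no longer reflects the leaver’s marked data, while little or no drop suggests the unlearning had limited effect.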

Introduction

The introduction presents federated learning (FL), a collaborative machine learning approach in which participants train models without sharing their private data, addressing privacy concerns in sensitive fields such as finance and healthcare. A critical aspect of FL is the “right to be forgotten” (RTBF), mandated by regulations such as the GDPR and CCPA, which allows participants to request the deletion of their data. Current FL frameworks struggle to support this right effectively, raising concerns that data is retained even after participants leave the federation.

To tackle these challenges, the authors introduce the concept of verifiable federated unlearning, which requires both an unlearning process and a verification process. They propose a unified framework called VeriFi, consisting of a federated unlearning module and a verification module with distinct marking and checking steps. The framework aims to let participants verify the deletion of their data in a measurable way, thereby fostering trust in FL systems. The paper also presents a systematic study of unlearning and verification methods, highlighting the limitations of existing approaches and introducing new, efficient techniques tailored to federated learning. The contributions of this work are the design of VeriFi, the identification of effective unlearning and verification methods, and an extensive evaluation across multiple datasets and model architectures.

Methods

In this section, the authors detail the experimental setup used to evaluate their proposed methods on seven diverse datasets: two low-resolution image classification datasets (MNIST and CIFAR-10), a speech recognition dataset (SpeechCommand), two high-resolution image datasets for face recognition (VGGFace_mini) and natural object recognition (ImageNet_mini), and two medical image datasets for skin cancer (ISIC) and COVID-19 diagnosis. The datasets are characterized by their number of classes, sample sizes, and resolutions, with the corresponding model accuracies reported in Table III. For instance, LeNet-5 achieved an accuracy of 99.11% on MNIST, while ResNet-18 reached 95.37% on CIFAR-10 and 88.42% on the COVID dataset.

The authors also describe the VeriFi experimental setup, including key hyperparameters such as the local and global learning rates, local batch sizes, and the structure of training rounds, as summarized in Table IV. Each dataset’s training data is split evenly among participants with no overlap. The default parameter settings and additional details of the experimental conditions are given in the appendix. Notably, the highlighted hyperparameters in Table IV pertain to early-stage testing scenarios, specifically focusing on the unlearning process.
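As an illustration of an even, non-overlapping split, the following sketch shuffles sample indices and deals them out to participants. The function name, the seed, and the striding scheme are assumptions made for this example; the summary does not state how the authors implemented the partition.

```python
import random
from typing import List


def partition_evenly(num_samples: int, num_participants: int,
                     seed: int = 0) -> List[List[int]]:
    """Shuffle sample indices and split them into disjoint, near-equal shards."""
    if num_participants <= 0:
        raise ValueError("num_participants must be positive")
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Striding keeps shards disjoint and their sizes within one of each other.
    return [indices[p::num_participants] for p in range(num_participants)]
```

Each participant then trains only on the samples whose indices fall in its shard, so no training example is shared between two participants.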

Discussion

The discussion outlines the proposed VeriFi framework, which addresses federated unlearning and its verification in federated learning (FL). The framework has three key components: an unlearning module, a verification module, and an unlearning-verification mechanism. The authors emphasize the importance of effectively unlearning the data associated with a leaving participant while preserving the integrity of the global model. They cover several unlearning methods, including traditional retraining and gradient subtraction, and propose a new method called Scale-to-Unlearn ($u_{S2U}$), which adjusts the contributions of local updates to achieve unlearning without extensive retraining.
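The idea of adjusting the contributions of local updates can be sketched as a server-side aggregation step with a per-participant scale factor. This is a generic illustration only: the summary does not give the $u_{S2U}$ update rule, so the `aggregate` function, the `scales` dictionary, and the example scale values below are all assumptions rather than the authors’ method.

```python
from typing import Dict, List


def aggregate(global_weights: List[float],
              updates: Dict[str, List[float]],
              scales: Dict[str, float],
              global_lr: float = 1.0) -> List[float]:
    """Apply the scaled average of participants' updates to the global weights.

    `scales` maps a participant id to a multiplier for its update; ids that
    are missing default to 1.0 (plain averaging).
    """
    if not updates:
        raise ValueError("at least one update is required")
    count = len(updates)
    averaged = [0.0] * len(global_weights)
    for participant, delta in updates.items():
        scale = scales.get(participant, 1.0)
        for i, value in enumerate(delta):
            averaged[i] += scale * value / count
    return [w + global_lr * a for w, a in zip(global_weights, averaged)]


# Hypothetical round: suppress the leaver's update and amplify the remaining one.
updates = {"remaining": [1.0, 0.0], "leaver": [0.0, 4.0]}
new_weights = aggregate([0.0, 0.0], updates, {"remaining": 2.0, "leaver": 0.0})
```

With all scales left at 1.0 the function reduces to plain averaging of the updates, which makes the effect of down-weighting the leaver easy to compare against the unmodified round.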

The verification process is critical for assessing the effectiveness of unlearning. The authors propose several marking methods, including unique memory markers that leverage the model’s memorization characteristics, to verify the unlearning process. These methods aim to ensure that the model no longer retains information about the deleted data. The discussion highlights the challenges of unlearning verification, particularly in FL, where traditional backdoor techniques may pose security risks. The authors advocate for non-invasive verification methods that do not compromise the privacy of participants, thus contributing to a more secure and efficient federated learning environment.