SIGMA: Secure GPT Inference with Function Secret Sharing

Journal: Proceedings on Privacy Enhancing Technologies, Volume: 2024, Issue: 4
DOI: https://doi.org/10.56553/popets-2024-0107
Publication Date: 2024-07-06
Author(s): Kanav Gupta et al.
Primary Topic: Scientific Computing and Data Management

Overview

The paper addresses secure two-party computation (2PC) for machine learning inference, focusing on transformer models. Existing secure inference methods suffer from high latency and communication overhead, and both problems worsen for transformer architectures. The authors introduce Sigma, an end-to-end system that uses function secret sharing (FSS) to make secure 2PC protocols more efficient.

Sigma implements new FSS-based protocols for complex ML functions, including Softmax, GeLU, and SiLU, and optimizes their execution on GPUs. This reduces latency by 12 to 19 times compared with state-of-the-art methods that also use preprocessing and GPU acceleration. Sigma also demonstrates secure inference for generative pre-trained transformer (GPT) models: it runs Meta's Llama2 with 13 billion parameters in 38 seconds and GPT2 in 1.5 seconds.
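For reference, the three non-linearities named above have the following standard plaintext definitions. This is only a sketch of the functions a secure protocol has to reproduce, written with the Python standard library; it is not Sigma's FSS-based implementation, and the function names are chosen here for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def gelu(x):
    """Exact GeLU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def silu(x):
    """SiLU (also called swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

if __name__ == "__main__":
    print(softmax([1.0, 2.0, 3.0]))
    print(gelu(1.0), silu(1.0))
```

All three involve exponentials or the error function, and Softmax additionally needs a maximum and a division, which is why they are costly to evaluate on secret-shared integers.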

Introduction

The introduction describes the growing trend of prediction-as-a-service (PaaS), in which companies train machine learning (ML) models on proprietary data and sell inference as a service. This model raises privacy concerns for clients, particularly in sensitive domains such as finance and healthcare, where both clients and model providers are wary of exposing input data because of potential legal liability. Secure inference addresses these concerns by guaranteeing that the client learns nothing about the model beyond the inference output and that the model provider learns nothing about the client's input.

Although secure two-party computation (2PC) can in theory provide secure inference, practical implementations have historically been too slow. Recent work has made secure inference systems more feasible, applying 2PC to increasingly complex models, including models with millions of parameters and even large transformer models such as BERT. This paper aims to extend these capabilities to generative pre-trained transformer (GPT) models with billions of parameters. The authors set out four requirements for a secure ML inference system: accuracy, security, efficiency, and scalability. They argue that existing systems such as THE-X, Iron, and CrypTen fail to meet all four, citing reduced accuracy, excessive communication overhead, and inadequate security.

Discussion

In this section, the authors present Sigma, a system for secure inference on transformer-based models that is substantially more efficient than existing methods such as CrypTen. Sigma operates in the two-party computation (2PC) setting with preprocessing and GPU acceleration, and achieves latency and communication that are an order of magnitude better than traditional secret sharing protocols. The system preserves model accuracy during secure inference by using precise approximations for complex non-linearities, and it scales to large models such as GPT with billions of parameters.
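A minimal sketch of the additive secret sharing that 2PC protocols in this setting build on. It assumes a 64-bit ring and 12 fractional bits for the fixed-point encoding; both are illustrative choices, not parameters taken from the paper, and the helper names are hypothetical.

```python
import secrets

BITS = 64            # ring Z_{2^64} (assumed for illustration)
MOD = 1 << BITS
FRAC = 12            # fixed-point fractional bits (assumed for illustration)

def encode(x):
    """Map a real number to a fixed-point ring element."""
    return round(x * (1 << FRAC)) % MOD

def decode(v):
    """Map a ring element back to a real number (two's-complement sign)."""
    if v >= MOD // 2:
        v -= MOD
    return v / (1 << FRAC)

def share(v):
    """Split v into two additive shares; each share alone is uniformly random."""
    s0 = secrets.randbelow(MOD)
    return s0, (v - s0) % MOD

def reconstruct(s0, s1):
    return (s0 + s1) % MOD

def add_local(a_i, b_i):
    """Addition of two shared values needs no communication."""
    return (a_i + b_i) % MOD

def scale_local(c, a_i):
    """Multiplication by a public integer c is also local."""
    return (c * a_i) % MOD

if __name__ == "__main__":
    x0, x1 = share(encode(1.5))
    y0, y1 = share(encode(-0.25))
    z0, z1 = add_local(x0, y0), add_local(x1, y1)
    print(decode(reconstruct(z0, z1)))   # x + y
```

Linear operations stay local to each party. Non-linear operations are where such schemes need interaction, which is the cost that the summary attributes to traditional secret sharing protocols.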

The authors note that although previous state-of-the-art methods use GPU acceleration, they still incur high communication overhead from their secret sharing protocols, which becomes a bottleneck even on high-bandwidth networks. Sigma addresses this with function secret sharing (FSS) protocols, which reduce communication at the price of some additional computation. The paper develops efficient FSS-based protocols for complex non-linear operations such as GeLU, SiLU, and Softmax, which are critical for transformer architectures. As a result, Sigma's secure inference times for various models, including GPT-Neo and Llama2, are significantly lower than those of prior approaches, which suggests it is practical for real secure machine learning workloads.
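The data flow of FSS-based 2PC with preprocessing can be sketched with a deliberately naive scheme. A dealer samples a random mask r offline and hands each party an additive share of the truth table of the offset function g(x̂) = f(x̂ − r). Online, the parties exchange only their shares of the masked input x̂ = x + r and then evaluate locally. Real FSS schemes replace the truth table with compact keys, so this toy version (whose key size grows exponentially with the input bit width) is not Sigma's protocol. The bit widths and all names below are assumptions made for illustration.

```python
import math
import secrets

N_IN = 8                 # toy input bit width (256 possible inputs)
DOM = 1 << N_IN
OUT_MOD = 1 << 32        # output ring
FRAC = 4                 # fractional bits for inputs and outputs

def to_real(v, mod):
    """Interpret a ring element as a signed fixed-point number."""
    if v >= mod // 2:
        v -= mod
    return v / (1 << FRAC)

def gelu_fixed(x_enc):
    """Plaintext GeLU on a fixed-point input, returning a fixed-point output."""
    x = to_real(x_enc, DOM)
    y = 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
    return round(y * (1 << FRAC)) % OUT_MOD

def gen(f):
    """Dealer, offline: sample mask r and share the table of f(x_hat - r)."""
    r = secrets.randbelow(DOM)
    key0, key1 = [], []
    for x_hat in range(DOM):
        val = f((x_hat - r) % DOM)
        s0 = secrets.randbelow(OUT_MOD)
        key0.append(s0)
        key1.append((val - s0) % OUT_MOD)
    return r, key0, key1

def eval_share(key, x_hat):
    """Party, online: local evaluation on the public masked input."""
    return key[x_hat]

def secure_eval(f, x_enc):
    """Simulate both parties and the dealer in one process."""
    r, key0, key1 = gen(f)                       # offline phase
    r0 = secrets.randbelow(DOM)
    r1 = (r - r0) % DOM                          # shares of the mask
    x0 = secrets.randbelow(DOM)
    x1 = (x_enc - x0) % DOM                      # shares of the input
    # Online: the only messages exchanged are the masked input shares.
    x_hat = ((x0 + r0) + (x1 + r1)) % DOM
    y0 = eval_share(key0, x_hat)
    y1 = eval_share(key1, x_hat)
    return (y0 + y1) % OUT_MOD                   # reconstruct for checking

if __name__ == "__main__":
    x_enc = round(1.5 * (1 << FRAC)) % DOM
    print(to_real(secure_eval(gelu_fixed, x_enc), OUT_MOD))
```

Because x̂ is masked by a uniformly random r, revealing it leaks nothing about x, and the online phase costs one exchange of N_IN-bit values per party regardless of how complex f is. The extra work shows up as local computation and preprocessed key material, consistent with the trade-off described above.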