عندما تلتقي نماذج اللغة الكبيرة بالتخصيص: آفاق التحديات والفرص When large language models meet personalization: perspectives of challenges and opportunities

عربي
English

المجلة: World Wide Web، المجلد: 27، العدد: 4
DOI: https://doi.org/10.1007/s11280-024-01276-1
تاريخ النشر: 2024-06-28

عندما تلتقي نماذج اللغة الكبيرة بالتخصيص: آفاق التحديات والفرص

جين تشينتشنغ ليوشيو هوانغتشنوانغغنغوي جيانغيوانهاويوشوان ليشياولونغ تشينشينغمي وانغكاي تشنغديفو ليان. إنهونغ تشين

تاريخ الاستلام: 22 نوفمبر 2023 / تاريخ المراجعة: 15 مارس 2024 / تاريخ القبول: 14 مايو 2024 /
تم النشر على الإنترنت: 28 يونيو 2024
© المؤلفون 2024

الملخص

يمثل ظهور نماذج اللغة الكبيرة اختراقًا ثوريًا في الذكاء الاصطناعي. مع الحجم غير المسبوق للتدريب ومعلمات النموذج، تحسنت قدرة نماذج اللغة الكبيرة بشكل كبير، مما أدى إلى أداء يشبه البشر في الفهم، وتوليد اللغة، والتفكير السليم، وما إلى ذلك. ستغير هذه القفزة الكبيرة في قدرة الذكاء الاصطناعي العام بشكل جذري نمط كيفية إجراء التخصيص. من ناحية، ستعيد تشكيل طريقة التفاعل بين البشر وأنظمة التخصيص. بدلاً من أن تكون وسيلة سلبية لتصفية المعلومات، مثل أنظمة التوصية التقليدية ومحركات البحث، تقدم نماذج اللغة الكبيرة الأساس لمشاركة نشطة من المستخدمين. على هذا الأساس الجديد، يمكن استكشاف طلبات المستخدمين بشكل استباقي، ويمكن تقديم المعلومات المطلوبة من قبل المستخدمين بطريقة طبيعية وقابلة للتفاعل وقابلة للتفسير. من ناحية أخرى، ستوسع أيضًا نطاق التخصيص بشكل كبير، مما يجعلها تنمو من الوظيفة الوحيدة لجمع المعلومات الشخصية إلى الوظيفة المركبة لتقديم الخدمات الشخصية. من خلال الاستفادة من نماذج اللغة الكبيرة كواجهة متعددة الأغراض، يمكن أن تقوم أنظمة التخصيص بتجميع طلبات المستخدمين في خطط، واستدعاء وظائف الأدوات الخارجية (مثل محركات البحث، الآلات الحاسبة، واجهات برمجة التطبيقات للخدمات، إلخ) لتنفيذ الخطط، ودمج مخرجات الأدوات لإكمال مهام التخصيص من البداية إلى النهاية. اليوم، لا تزال نماذج اللغة الكبيرة تتطور بسرعة، بينما لا يزال التطبيق في التخصيص غير مستكشف إلى حد كبير. لذلك، نعتبر أن الوقت مناسب لمراجعة التحديات في التخصيص والفرص لمعالجتها باستخدام نماذج اللغة الكبيرة. على وجه الخصوص، نكرس هذه الورقة للنقاش حول الجوانب التالية: التطور والتحديات للنظام الحالي للتخصيص، القدرات الجديدة التي ظهرت لنماذج اللغة الكبيرة، والطرق المحتملة للاستفادة من نماذج اللغة الكبيرة للتخصيص.

الكلمات الرئيسية: نماذج اللغة الكبيرة

أنظمة التخصيص

أنظمة التوصية

تعلم الأدوات

AIGC

1 المقدمة

ظهور نماذج اللغة الكبيرة [1]، التي أظهرت تقدمًا ملحوظًا في فهم التعبير البشري، يؤثر بشكل عميق على مجتمع الذكاء الاصطناعي. هذه النماذج، المجهزة بكميات هائلة من البيانات وشبكات عصبية واسعة النطاق، تظهر قدرات مثيرة للإعجاب في فهم اللغة البشرية وتوليد نصوص تشبه نصوصنا. من بين هذه القدرات التفكير [2]، التعلم من عدد قليل من الأمثلة [3]، ودمج المعرفة العالمية الواسعة ضمن النماذج المدربة مسبقًا [1]. يمثل هذا اختراقًا كبيرًا في مجال الذكاء الاصطناعي، مما يؤدي إلى ثورة في تفاعلاتنا مع الآلات. وبالتالي، أصبحت نماذج اللغة الكبيرة لا غنى عنها عبر تطبيقات متنوعة، تتراوح من معالجة اللغة الطبيعية والترجمة الآلية إلى توليد المحتوى الإبداعي وتطوير الدردشة الآلية. لقد حظي إدخال ChatGPT، على وجه الخصوص، باهتمام كبير من المجتمع البشري، مما أثار تأملات حول القوة التحويلية لنماذج اللغة الكبيرة وإمكاناتها لدفع حدود ما يمكن أن تحققه الذكاء الاصطناعي (AI). تحمل هذه التكنولوجيا المدمرة وعدًا بتحويل كيفية تفاعلنا مع الذكاء الاصطناعي واستغلاله في مجالات لا حصر لها، مما يفتح آفاقًا جديدة وفرصًا للابتكار. مع استمرار تقدم هذه النماذج اللغوية وتطورها، من المحتمل أن تشكل مستقبل الذكاء الاصطناعي، مما يمكّننا من استكشاف أراضٍ غير مكتشفة وإطلاق إمكانيات أكبر في التعاون بين الإنسان والآلة.

يعتبر التخصيص، فن تخصيص التجارب وفقًا لتفضيلات الأفراد، رابطًا أساسيًا وديناميكيًا يجسر الفجوة بين البشر والآلات. في عالم اليوم المدفوع بالتكنولوجيا، يلعب التخصيص دورًا محوريًا في تعزيز تفاعلات المستخدمين ومشاركاتهم مع مجموعة متنوعة من المنصات والخدمات الرقمية. من خلال التكيف مع تفضيلات الأفراد، تمكّن أنظمة التخصيص الآلات من تلبية احتياجات كل مستخدم فريدة، مما يؤدي إلى تفاعلات أكثر كفاءة ومتعة. علاوة على ذلك، يتجاوز التخصيص مجرد توصيات المحتوى؛ فهو يشمل جوانب مختلفة من تجارب المستخدمين، بما في ذلك واجهات المستخدم، وأنماط التواصل، وأكثر من ذلك. مع استمرار تقدم الذكاء الاصطناعي، يصبح التخصيص أكثر تعقيدًا في التعامل مع كميات كبيرة من التفاعلات ونوايا المستخدمين المتنوعة. يتطلب ذلك تطوير تقنيات أكثر تقدمًا للتعامل مع السيناريوهات المعقدة وتقديم تجارب أكثر متعة ورضا. إن السعي نحو تحسين التخصيص مدفوع بالرغبة في فهم المستخدمين بشكل أفضل وتلبية احتياجاتهم المتطورة باستمرار. مع تطور التكنولوجيا، من المحتمل أن تستمر أنظمة التخصيص في التطور، مما يخلق في النهاية مستقبلًا حيث تتكامل تفاعلات الإنسان والآلة بسلاسة في كل جانب من جوانب حياتنا، مقدمة تجارب شخصية ومخصصة تعزز الروتين اليومي.

تمتلك نماذج اللغة الكبيرة، بقدراتها العميقة والواسعة، القدرة على إحداث ثورة في أنظمة التخصيص، وتحويل طريقة تفاعل البشر وتوسيع نطاق التخصيص. لم يعد يمكن تصنيف التفاعل بين البشر والآلات ببساطة على أنه نشط وسلبي، تمامًا مثل محركات البحث التقليدية وأنظمة التوصية. ومع ذلك، تتجاوز هذه النماذج الكبيرة مجرد تصفية المعلومات البسيطة وتقدم مجموعة متنوعة من الوظائف الإضافية. على وجه التحديد، سيتم استكشاف نية المستخدم بشكل نشط وشامل، مما يسمح بتواصل أكثر مباشرة وسلاسة بين المستخدمين والأنظمة من خلال اللغة الطبيعية. على عكس التقنيات التقليدية التي تعتمد على تمثيل المعلومات المعتمد على الهوية والذي يكون أقل قابلية للتفسير، تمكّن نماذج اللغة الكبيرة من فهم أعمق لمطالب واهتمامات المستخدمين بدقة. يمهد هذا الفهم الأعمق الطريق لخدمات شخصية عالية الجودة، تلبي احتياجات وتفضيلات المستخدمين بطريقة أكثر دقة وفعالية. علاوة على ذلك، فإن دمج
أدوات متنوعة معزز بشكل كبير بقدرات نماذج اللغة الكبيرة، مما يوسع بشكل كبير من الإمكانيات والسيناريوهات لأنظمة التخصيص. من خلال تحويل متطلبات المستخدمين إلى خطط، بما في ذلك فهمها وتوليدها وتنفيذها، يمكن للمستخدمين الوصول إلى مجموعة متنوعة من المعلومات والخدمات. من المهم أن يظل المستخدمون غير مدركين للتحولات المعقدة التي تحدث خلف الكواليس، حيث يختبرون نموذجًا سلسًا من البداية إلى النهاية. لا تزال إمكانيات نماذج اللغة الكبيرة في التخصيص غير مستكشفة إلى حد كبير.

تتناول هذه الورقة التحديات في التخصيص وتستكشف الحلول المحتملة باستخدام نماذج اللغة الكبيرة. في الأعمال ذات الصلة الموجودة، يقدم LaMP [4] معيارًا جديدًا لتدريب وتقييم نماذج اللغة في إنتاج مخرجات مخصصة لأنظمة استرجاع المعلومات. من ناحية أخرى، تركز دراسات أخرى ذات صلة [5-7] بشكل أساسي على تقنيات التخصيص التقليدية، مثل أنظمة التوصية. من منظور آليات التعلم، يتناول LLM4Rec [5] كل من LLM التمييزي للتوصية وLLM التوليدي للتوصية. فيما يتعلق بتكييف LLM لأنظمة التوصية من حيث “أين” و”كيف”، يركز Li وآخرون [6] على خط الأنابيب العام في مراحل التوصية الصناعية. من ناحية أخرى، يقوم Fan وآخرون [7] بإجراء مراجعة تركز على أساليب ما قبل التدريب، والتعديل الدقيق، والتحفيز. بينما تناقش هذه الأعمال نماذج اللغة المدربة مسبقًا مثل Bert لتسهيل التحليل، فإنها تخصص اهتمامًا محدودًا للقدرات الناشئة لنماذج اللغة الكبيرة. تهدف هذه الورقة إلى سد هذه الفجوة من خلال فحص القدرات الفريدة والقوية لنماذج اللغة الكبيرة في التخصيص، وتوسيع نطاق التخصيص باستخدام الأدوات.

تم تنظيم بقية هذه الدراسة على النحو التالي: نستعرض تخصيص النماذج اللغوية الكبيرة في القسم 2 لمراجعة التطورات والتحديات. ثم نناقش بعناية الفاعلين المحتملين للنماذج اللغوية الكبيرة من أجل التخصيص بدءًا من الاستخدام البسيط للقدرات الناشئة والتكامل المعقد مع أدوات أخرى في القسم 3. كما نناقش التحديات المحتملة عند تكييف النماذج اللغوية الكبيرة للتخصيص. يتم عرض الهيكل الكامل لهذه الورقة في الشكل 1.

الشكل 1 نظرة عامة على هذه الورقة

2 نظرة عامة على الخلفية

2.1 تقنيات التخصيص

التخصيص، فن دقيق يكيف التجربة وفقًا للتفضيلات والاحتياجات الفريدة للمستخدمين الأفراد، أصبح حجر الزاوية في الذكاء الاصطناعي الحديث. في هذا القسم، نستكشف العالم الجذاب لتقنيات التخصيص وتأثيرها العميق على تفاعلات المستخدمين مع أنظمة الذكاء الاصطناعي. سنتناول جانبين رئيسيين من التخصيص: أنظمة التوصية والمساعدة الشخصية. لا تعزز هذه التقنيات رضا المستخدم فحسب، بل تجسد أيضًا تطور الذكاء الاصطناعي، حيث تتكامل الآلات بسلاسة مع حياتنا، وتفهمنا على مستوى عميق. من خلال تخصيص التوصيات، وتقديم المساعدة المخصصة، وتقديم نتائج بحث شخصية، تمتلك أنظمة الذكاء الاصطناعي القدرة على خلق تجربة مستخدم غامرة وفردية حقًا.

2.1.1 أنظمة التوصية

تلعب أنظمة التوصية دورًا محوريًا في التخصيص، حيث تحدث ثورة في الطريقة التي يكتشف بها المستخدمون المحتوى ويتفاعلون معه. تهدف هذه الأنظمة إلى التنبؤ واقتراح عناصر تهم المستخدمين الأفراد، مثل الأفلام أو المنتجات أو المقالات، بناءً على تفاعلاتهم وتفضيلاتهم التاريخية.

فيما يتعلق بتطوير أنظمة التوصية، فقد تطورت بشكل كبير على مر السنين، حيث يُعتبر التصفية التعاونية [8،9] واحدة من أقدم وأهم الأساليب. تعتمد التصفية التعاونية على بيانات تفاعل المستخدمين مع العناصر لتحديد الأنماط وتقديم التوصيات بناءً على المستخدمين الذين لديهم تفضيلات مشابهة. تستخرج الحلول التقليدية، مثل تحليل المصفوفات [10] والأساليب المعتمدة على المستخدم/العنصر [11]، العناصر المثيرة للاهتمام بناءً على الفكرة القائلة بأن المستخدمين الذين أظهروا تفضيلات مشابهة في الماضي من المحتمل أن يكون لديهم تفضيلات مشابهة في المستقبل. على الرغم من فعاليتها، فإن التصفية التعاونية لها قيود، مثل مشكلة “البداية الباردة” للمستخدمين والعناصر الجدد. لمعالجة هذه القيود، ظهرت التصفية المعتمدة على المحتوى [12]، التي تأخذ في الاعتبار محتوى العناصر لتقديم التوصيات. تستفيد من ميزات وخصائص العناصر للعثور على أوجه التشابه وتقديم اقتراحات مخصصة. يمكن تقسيم هذه الميزات إلى معلومات من جانب المستخدم، مثل ملفات تعريف المستخدمين، ومعلومات من جانب العنصر [13، 14]، مثل علامات العناصر وفئات العناصر، ومعلومات قائمة على التفاعل [15]، مثل المراجعات والتعليقات. ومع ذلك، قد تواجه التصفية المعتمدة على المحتوى صعوبة في التقاط تفضيلات المستخدمين المعقدة واكتشاف توصيات متنوعة مقيدة بتمثيلات الميزات المحدودة.

في السنوات الأخيرة، حظي التعلم العميق باهتمام كبير في مجال أنظمة التوصية بسبب قدرته على نمذجة الأنماط المعقدة والتفاعلات في بيانات المستخدمين والعناصر. لقد أظهرت الأساليب المعتمدة على التعلم العميق نتائج واعدة في التقاط المعلومات التسلسلية والزمنية والسياقية، بالإضافة إلى استخراج تمثيلات ذات مغزى من البيانات واسعة النطاق. مع إدخال الشبكات العميقة، يتم التقاط التفاعلات ذات الرتبة العالية بين ميزات المستخدمين والعناصر بشكل جيد لاستخراج اهتمامات المستخدم. تقدم الأساليب المعتمدة على التعلم العميق طرقًا لالتقاط التفاعلات ذات الرتبة العالية من خلال استخدام تقنيات مثل آليات الانتباه والشبكات المعتمدة على الرسوم البيانية لاستخراج العلاقات المعقدة بين المستخدم والعنصر. لقد أظهرت هذه الأساليب أنها تعزز أداء التوصية من خلال مراعاة الاعتمادات ذات الرتبة الأعلى والعلاقات بين العناصر. منطقة أخرى من أنظمة التوصية المعتمدة على التعلم العميق هي أنظمة التوصية التسلسلية، المصممة خصيصًا للتعامل مع التفاعلات التسلسلية بين المستخدمين والعناصر، مثل تسلسلات سلوك المستخدم على مر الزمن. تعتبر آليات الانتباه الذاتية ووحدات التكرار المغلقة (GRUs) خيارات شائعة لـ
نمذجة البيانات التسلسلية في أنظمة التوصية. تتفوق هذه النماذج في التقاط الاعتمادات الزمنية والسياق، مما يجعلها مناسبة تمامًا لمهام مثل توصية العنصر التالي وتوصية قائمة الجلسة. يمكن للنماذج المعتمدة على التسلسل أن تأخذ في الاعتبار الترتيب الذي يتم فيه التفاعل مع العناصر وتتعلم أنماط سلوك المستخدم التي تتطور. علاوة على ذلك، فإن ظهور نماذج اللغة مثل BERT قد تقدم بشكل أكبر أنظمة التوصية من خلال تمكين فهم أفضل لكل من ميزات اللغة الطبيعية وسلوكيات المستخدم التسلسلية. يمكن لهذه النماذج اللغوية أن تلتقط تمثيلات دلالية عميقة ومعرفة بالعالم، مما يثري عملية التوصية ويسهل تقديم توصيات أكثر تخصيصًا ووعيًا بالسياق. بشكل عام، فإن تطبيق تقنيات التعلم العميق في أنظمة التوصية قد فتح آفاقًا جديدة للبحث والابتكار، واعدًا بثورة في مجال التوصيات الشخصية وتعزيز تجارب المستخدمين.

2.1.2 المساعدة الشخصية

تشير مساعدة التخصيص إلى استخدام تقنيات الذكاء الاصطناعي وتعلم الآلة لتخصيص وتكييف التجارب أو المنتجات أو المحتوى بناءً على تفضيلات وسلوك وخصائص الأفراد. من خلال تحليل تفضيلات وسلوك وخصائص الأفراد، تخلق نظامًا بيئيًا مخصصًا يعزز من تفاعل المستخدم ورضاه. على عكس أنظمة التوصية التقليدية، التي تعتمد على التنبؤ باهتمامات المستخدم بشكل سلبي، تتبنى المساعدة الشخصية نهجًا أكثر استباقية. تتجه إلى مجال التنبؤ بنوايا أو أفعال المستخدمين التالية من خلال استخدام المعلومات السياقية، مثل التعليمات التاريخية وإشارات الكلام. يتيح هذا المستوى الأعمق من الفهم للنظام تلبية احتياجات المستخدمين بطريقة أكثر توقعًا وبديهية. في جوهر هذه القدرة تكمن دمج تقنيات متطورة مثل معالجة اللغة الطبيعية ورؤية الكمبيوتر. تمكّن هذه الأدوات المتقدمة النظام من التعرف على نوايا المستخدم وتفسيرها، سواء كانت مُعبرًا عنها من خلال اللغة المنطوقة أو المكتوبة، أو حتى الإشارات البصرية. علاوة على ذلك، يمتد إمكان المساعدة الشخصية إلى ما هو أبعد من التوصيات الثابتة إلى تفاعلات ديناميكية وواعية بالسياق. مع تزايد معرفة النظام بتفضيلات وأنماط المستخدم، يتكيف ويصقل توصياته في الوقت الفعلي، مواكبًا للاحتياجات والتفضيلات المتغيرة باستمرار للمستخدم.

أنظمة التوصية المحادثية [24، 25] تمثل خطوة ملحوظة إلى الأمام في مجال المساعدة الشخصية. من خلال إشراك المستخدمين في محادثات تفاعلية، تتعمق هذه الأنظمة في تفضيلاتهم وتضبط توصياتها وفقًا لذلك. من خلال الاستفادة من قوة فهم اللغة الطبيعية، تقوم هذه الأنظمة التوصية المحادثية بتفسير استفسارات المستخدمين وردودهم بمهارة، مما يؤدي إلى تجربة مستخدم سلسة وجذابة. ومن الأمثلة البارزة على منتجات المساعدة الشخصية، مثل سيري.

ومايكروسوفت كورتانا

لقد أثبتت بالفعل فعاليتها على الأجهزة المحمولة. بالإضافة إلى ذلك، فإن دمج نماذج اللغة الكبيرة مثل ChatGPT يعزز من قدرات الموصيات التفاعلية، مما يعد بتجارب مستخدمين محسّنة بشكل أكبر. مع استمرار تقدم هذه التكنولوجيا، يمكننا توقع أهميتها المتزايدة عبر صناعات متنوعة، بما في ذلك الرعاية الصحية والتعليم والمالية، وغيرها. بينما يعد نمو الموصيات التفاعلية والمساعدة الشخصية بفوائد هائلة، من الضروري تطوير هذه المنتجات بشكل مسؤول. إن الحفاظ على خصوصية المستخدم وضمان ممارسات شفافة في معالجة البيانات أمران أساسيان للحفاظ على ثقة المستخدم وحماية المعلومات الحساسة.

2.2 نماذج اللغة الكبيرة

تقوم نماذج اللغة بعمليات النمذجة الاحتمالية لتوليد اللغة الطبيعية، أي أنه عند تقديم سياق محدد، تقوم نماذج اللغة بعمل توقعات للكلمات التي سيتم توليدها في الخطوات المستقبلية. في الوقت الحاضر، تُبنى نماذج اللغة في الغالب على الشبكات العصبية العميقة، حيث يجب التأكيد على ميزتين. أولاً، تعتمد الغالبية العظمى من نماذج اللغة على المحولات أو تبايناتها القريبة. هذه الأنواع من الشبكات العصبية بارعة في نمذجة الاعتماد على السياق داخل اللغات الطبيعية وتظهر أداءً متفوقًا وتحسنًا مستمرًا عند توسيع نطاقها. ثانيًا، يتم تدريب نماذج اللغة مسبقًا على نطاق واسع باستخدام كمية هائلة من النصوص غير المصنفة. يتم ضبط النماذج المدربة مسبقًا بشكل إضافي باستخدام بيانات موجهة للمهام لتتكيف مع تطبيقات مختلفة في الأسفل.

لقد تحقق تقدم هائل في نماذج اللغة في السنوات الأخيرة، حيث يمثل ظهور نماذج اللغة الكبيرة، الممثلة بـ GPT-3 [3]، علامة فارقة مهمة لمجتمع الذكاء الاصطناعي بأسره. نماذج اللغة الكبيرة (LLMs)، كما يوحي الاسم، هي مشتقات موسعة بشكل كبير من نماذج اللغة التقليدية. على وجه الخصوص، تم تضخيم الشبكات الأساسية وبيانات التدريب بشكل كبير. من ناحية، على الرغم من عدم وجود معايير محددة للحد الأدنى من العدد، فإن نموذج LLM النموذجي يتكون عادة من عدة مليارات ولا يقل عن تريليونات من معلمات النموذج، وهي أعداد أكبر بكثير مما كانت عليه من قبل [33]. من ناحية أخرى، يتم إجراء التدريب المسبق بناءً على مجموعة بيانات غير خاضعة للإشراف، مع مئات المليارات أو تريليونات من الرموز التي تم تصفيتها بعناية من مصادر مثل Common Crawl وGitHub وWikipedia وBooks وArXiv، إلخ [1]. يتم توضيح تأثير التوسع من خلال قوانين التوسع [34، 35]، التي تكشف رقميًا عن العلاقة ذات القوة بين حجم النموذج وحجم البيانات ونطاق التدريب ونمو أداء النموذج.

إن توسيع الشبكة وبيانات التدريب يؤديان إلى قفزة نوعية في قدرة نماذج اللغة الكبيرة. فهي لا تصبح أكثر كفاءة في المهارات التقليدية، مثل فهم نوايا الناس وتوليد لغات تشبه اللغة البشرية فحسب، بل أيضًا في القدرات المعالجة التي نادرًا ما تظهرها النماذج الأصغر. يُشار إلى هذه الظاهرة باسم القدرات الناشئة لنماذج اللغة الكبيرة، حيث يتم مناقشة ثلاث قدرات تمثيلية بشكل متكرر. واحدة منها هي قدرة التعلم في السياق، حيث يمكن لنماذج اللغة الكبيرة أن تتعلم بسرعة من الأمثلة القليلة المقدمة في الموجه. والأخرى هي قدرة اتباع التعليمات. بعد أن يتم ضبطها بدقة مع مهام متنوعة في شكل ضبط التعليمات، تصبح نماذج اللغة الكبيرة قادرة على اتباع تعليمات البشر. وبالتالي، يمكنها التعامل مع مهام مختلفة تُعرض بطريقة عشوائية. وأخيرًا وليس آخرًا، وُجد أن نماذج اللغة الكبيرة قادرة على إجراء استدلال خطوة بخطوة. مع بعض استراتيجيات التوجيه، مثل سلسلة الأفكار، يمكن لنماذج اللغة الكبيرة الاقتراب بشكل تكراري من الإجابة النهائية لبعض المهام المعقدة، مثل مسائل الرياضيات الكلامية، من خلال تقسيم المهام إلى مشكلات فرعية واكتشاف الإجابات المتوسطة المحتملة لكل من المشكلات الفرعية.

بفضل القدرات المتفوقة على الفهم والاستدلال والتوليد، تُعتبر نماذج اللغة الكبيرة، وخاصة نماذج الدردشة الناتجة عن ضبط التعليمات، ككتل أساسية للعديد من خدمات التخصيص. أحد السيناريوهات المباشرة هو البحث التفاعلي والتوصيات. بمجرد أن تُبنى على نماذج اللغة الكبيرة، ستتمكن أنظمة البحث والتوصية من التفاعل مع المستخدمين من خلال التفاعلات، وتقديم المخرجات بطريقة لفظية وقابلة للتفسير، واستقبال الملاحظات من المستخدم وإجراء التعديلات بناءً على تلك الملاحظات، وما إلى ذلك. ستؤدي التغييرات المذكورة أعلاه إلى إحداث تحول جذري في خدمات التخصيص، من مجرد إجراء البحث والتوصية بشكل سلبي، إلى اكتشاف احتياجات المستخدم بشكل استباقي والبحث عن العناصر المفضلة لدى المستخدم. في نطاقات أوسع، قد تتجاوز نماذج اللغة الكبيرة مجرد إجراء البحث والتوصية المخصصة، بل تلعب دور المساعدين الشخصيين لمساعدة المستخدمين في إكمال مهامهم. قد تأخذ نماذج اللغة الكبيرة
ملاحظات حول المعلومات المهمة للمستخدمين داخل ذاكرتهم، وضع خطط مخصصة بناءً على المعلومات المحفوظة عند ظهور مطالب جديدة، وتنفيذ الخطط من خلال الاستفادة من أدوات مثل محركات البحث وأنظمة التوصية.

ومع ذلك، يجب علينا مواجهة الواقع بأن تطبيق نماذج اللغة الكبيرة (LLMs) من أجل التخصيص ليس مشكلة بسيطة. لنذكر بعض التحديات المفتوحة. أولاً، يتطلب التخصيص فهم تفضيلات المستخدم، وهو ما يتطلب معرفة أكثر تخصصًا بدلاً من المعرفة العامة التي تتعلمها نماذج اللغة الكبيرة. لا يزال التكيف الفعال والفعال لنماذج اللغة الكبيرة للخدمات المخصصة بحاجة إلى حل. بالإضافة إلى ذلك، يمكن أن تتذكر نماذج اللغة الكبيرة المعلومات السرية للمستخدم أثناء تقديم الخدمات المخصصة. وبالتالي، يثير ذلك مخاوف بشأن حماية الخصوصية. تم تعلم نماذج اللغة الكبيرة من بيانات الإنترنت؛ وبسبب انحياز التعرض، من شبه المؤكد أن تصدر توقعات غير عادلة للأقليات. لمعالجة التحديات المذكورة أعلاه، تحتاج المجتمعات البحثية إلى معايير ومجموعات بيانات تقييم. ومع ذلك، فإن هذه الموارد بعيدة عن الاكتمال في الوقت الحالي. لدعم التخصيص بالكامل باستخدام نماذج اللغة الكبيرة، يجب إنشاء أطر منهجية وتجريبية بشكل منهجي لجميع هذه الجوانب.

3 نماذج لغوية كبيرة للتخصيص

في الأقسام التالية، نتناول إمكانيات نماذج اللغة الكبيرة للتخصيص، مستعرضين تطورها من حالات الاستخدام البسيطة، مثل استخدام معرفة الكلمات كميزات، إلى تكامل أكثر تعقيدًا مع وحدات أدوات أخرى لتعمل كعملاء. نركز بشكل خاص على تقدم القدرات الناشئة، بدءًا من المعرفة الأساسية بالعالم وفهم نية المستخدم، والتقدم إلى قدرات التفكير عالية المستوى. نستكشف كيف يمكن أن تسهم نماذج اللغة الكبيرة في بناء قاعدة معرفة تعزز المعرفة العامة حول عناصر متنوعة. بالإضافة إلى ذلك، نناقش كيف يمكن أن تمكّن قدرة الفهم لنماذج اللغة الكبيرة المفسرين والمفسرين للمحتوى من إجراء تحليل متعمق للتفاعلات. علاوة على ذلك، نلاحظ محاولات للاستفادة من قدرة التفكير لنماذج اللغة الكبيرة لمقدمي الأنظمة لتقديم نتائج التوصية. هذه القدرات المتزايدة التعقيد تمكّن الاستخدام المعقد لنماذج اللغة الكبيرة مع وحدات أدوات أخرى، مما يمكّنها من فهم نوايا المستخدمين بشكل أفضل وتنفيذ تعليمات المستخدم. وبالتالي، نستكشف أيضًا تكامل نماذج اللغة الكبيرة مع أدوات أخرى للتخصيص، بما في ذلك تعلم الأدوات، والعملاء المحادثين ومبدعي المحتوى المخصص. يتم تصوير نظرة عامة هذا الفصل في الشكل 2. تهدف دراستنا الشاملة إلى تقديم فهم أعمق للمشهد الحالي، مسلطة الضوء على الفرص والتحديات المرتبطة بإدماج نماذج اللغة الكبيرة في التخصيص.

الشكل 2 نظرة عامة على نماذج اللغة الكبيرة للتخصيص

4 نماذج اللغة الكبيرة كقاعدة معرفة

توفر قاعدة المعرفة معلومات غنية مع دلالات، مما يجذب اهتمامًا متزايدًا لاستخدام قاعدة المعرفة في أنظمة التوصية. بشكل خاص، تعتبر الرسوم البيانية للمعرفة، حيث تمثل العقد الكيانات وتمثل الحواف العلاقات في الرسم البياني للمعلومات المتنوعة، هي الشكل الشائع لقاعدة المعرفة وتُقدم كمعلومات جانبية لتعزيز أداء الموصين. تساعد الرسوم البيانية للمعرفة في فهم العلاقات المتبادلة بين المستخدمين والعناصر وتوفر أيضًا تفسيرًا أفضل للموصين. يمكن تصنيف الطرق الحالية التي تدمج الرسوم البيانية للمعرفة في أنظمة التوصية إلى ثلاث مجموعات رئيسية: طرق قائمة على التضمين، وطرق قائمة على المسارات، وطرق موحدة. تستخدم الطرق القائمة على التضمين، مثل CKE [14] وDKN [41] وKSR [42] وSHINE [43]، تمثيلات دلالية للمستخدمين والعناصر. تهدف هذه الطرق إلى التقاط العلاقات الدلالية الأساسية بين الكيانات في الرسم البياني للمعرفة، مما يمكن أن يحسن جودة التوصيات. تستغل الأساليب القائمة على المسارات، مثل Hete-MF [44] وSemRec [45] وRuleRec [46] وEIUM [47]، معلومات الاتصال الدلالي الموجودة في الرسم البياني للمعرفة لتنظيم تمثيلات المستخدم والعنصر. تأخذ هذه الطرق في الاعتبار المسارات بين المستخدمين والعناصر في الرسم البياني وتستفيد منها لإدخال القابلية للتفسير في عملية التوصية. تقوم الطرق الموحدة، مثل RippleNet [48] وKGCN [49] وKGAT [50] وAKUPM [51] وIntentGC [52]، بتنقيح تمثيلات الكيانات في الرسم البياني للمعرفة من خلال الاستفادة من تقنيات انتشار التضمين. تقوم هذه الطرق بنشر تضمينات الكيانات عبر هيكل الرسم البياني، مما يسمح بتدفق المعلومات عبر الكيانات المتصلة وتنقيح التمثيلات وفقًا لذلك.

ومع ذلك، فإن الرسوم البيانية للمعرفة المعتمدة في أنظمة التوصية محدودة وذات قابلية استخدام منخفضة. عند مراجعة مجموعات بيانات الرسوم البيانية للمعرفة المختلفة لأنظمة التوصية، التي تغطي مجالات الأفلام والكتب والأخبار والمنتجات، وما إلى ذلك، لا تزال هذه المجموعات نادرة بشكل كبير مقارنةً بالكم الهائل من المعرفة البشرية، وخاصة نقص الحقائق، بسبب التكلفة الباهظة للإشراف على بناء الرسم البياني للمعرفة. سيكون بناء رسم بياني شامل ودقيق للمعرفة مهمة معقدة ومرهقة للموارد، والتي ستشمل جمع البيانات، والتكامل، والتنظيف لضمان جودة البيانات وتناسقها. مقيدًا بتكلفة الوسم الباهظة للرسوم البيانية للمعرفة، عادة ما توجد كيانات أو علاقات مفقودة. قد يتم تجاهل تفضيلات المستخدم لهذه الكيانات أو المسارات، مما يؤثر سلبًا على أداء التوصية.

لقد تم مناقشة قدرة نماذج اللغة الكبيرة على استرجاع المعرفة الواقعية كقواعد معرفة صريحة [53-56، 56-61]، مما يقدم فرصة لبناء رسوم بيانية للمعرفة أكثر شمولاً داخل أنظمة التوصية. بالعودة إلى العمل [53]، أظهرت نماذج اللغة الكبيرة قوتها المذهلة في تخزين المعلومات الواقعية، مثل الكيانات والمعرفة العامة، ثم يمكن نقل المعرفة العامة بشكل موثوق إلى المهام الأساسية. الطرق الحالية في الرسوم البيانية للمعرفة لا تكفي للتعامل مع الرسوم البيانية الناقصة [62] وبناء الرسوم البيانية للمعرفة باستخدام نصوص [63] ويحاول العديد من الباحثين الاستفادة من قوة نماذج اللغة الكبيرة لحل المهمتين، أي إكمال المعرفة [64] وبناء المعرفة [65]. بالنسبة لإكمال الرسم البياني للمعرفة، والذي يشير إلى مهمة الحقائق المفقودة في الرسم البياني المعطى، تم بذل جهود حديثة لترميز النص أو توليد الحقائق للرسوم البيانية للمعرفة. يقوم MTL-KGC [66] بترميز تسلسلات النص للتنبؤ بإمكانية وجود الأزواج. يتنبأ MEMKGC [67] بالكيانات المفقودة من الثلاثية. يستخدم StAR [68] مشفرات نصية سيامية لترميز الكيانات بشكل منفصل. يستخدم GenKGC [69] نماذج اللغة التي تعتمد على فك التشفير لتوليد الكيان النهائي مباشرة. يقوم TagReal [70] بتوليد مطالبات عالية الجودة من نصوص خارجية. يعتمد AutoKG [63] مباشرة على نماذج اللغة الكبيرة، مثل ChatGPT وGPT-4، ويصمم
مطالبات مخصصة للتنبؤ بالكيان النهائي. أما بالنسبة للمهمة المهمة الأخرى، أي بناء الرسم البياني للمعرفة، والتي تشير إلى إنشاء تمثيل منظم للمعرفة، يمكن تطبيق نماذج اللغة الكبيرة في عملية بناء الرسوم البيانية للمعرفة، بما في ذلك اكتشاف الكيانات [71، 72]، وحل الإشارة المرجعية [73، 74] واستخراج العلاقات [75، 76]. يمكن أن تحقق نماذج اللغة الكبيرة أيضًا البناء من البداية إلى النهاية [

] لبناء الرسوم البيانية للمعرفة مباشرة من النص الخام. تمكّن نماذج اللغة الكبيرة من تقطير المعرفة لبناء الرسوم البيانية للمعرفة. يقوم symbolic-kg [79] بتقطير الحقائق العامة من GPT3 ثم يقوم بتعديل نموذج الطالب الصغير لتوليد الرسوم البيانية للمعرفة. لقد أظهرت هذه النماذج القدرة على تخزين كميات كبيرة من المعرفة، مما يوفر خيارًا قابلاً للتطبيق لتحسين نطاق وعمق الرسوم البيانية للمعرفة. علاوة على ذلك، دفعت هذه التطورات البحث في النقل المباشر للمعرفة المخزنة من نماذج اللغة الكبيرة إلى الرسوم البيانية للمعرفة، مما يلغي الحاجة إلى الإشراف البشري. يسلط هذا البحث المثير الضوء على إمكانيات أتمتة إكمال الرسوم البيانية للمعرفة باستخدام نماذج اللغة الكبيرة المتطورة.

من خلال الاستفادة من قدرات نماذج اللغة الكبيرة، ستستفيد أنظمة التوصية من قاعدة معرفة أكثر شمولاً وتحديثًا. أولاً، يمكن استكمال المعلومات المفقودة عن الكلية لبناء رسوم بيانية معرفية أكثر شمولاً وبالتالي يمكن استخراج العلاقات بين الكيانات لتحسين التوصيات. ثانيًا، على عكس البيانات السابقة التي كانت محصورة في نطاق معين، تحتوي نماذج اللغة الكبيرة نفسها على الكثير من المعلومات عبر المجالات التي يمكن أن تساعد في تحقيق توصيات عبر المجالات، مثل التوصية بأفلام مناسبة بناءً على الأغاني المفضلة للمستخدم. باختصار، يمكن الاستفادة من المعرفة المخزنة لتعزيز دقة التوصيات وملاءمتها وتخصيصها، مما يحسن في النهاية الأداء العام لأنظمة التوصية. العمل الحالي [80] يحث نماذج اللغة الكبيرة على توليد معرفة واقعية حول الأفلام لتعزيز أداء نماذج توقع CTR. للاستفادة بشكل أفضل من المعرفة الواقعية، تم اعتماد وحدة تكيف المعرفة لاستخراج معلومات سياقية أفضل. تعتمد LLMRec [81] استراتيجيات تعزيز الرسوم البيانية المعتمدة على LLM لإثراء معلومات العناصر وتصميم طرق إزالة الضوضاء لضمان التعزيز. تعتمد LLM-KRec [82] نماذج اللغة الكبيرة لتحديد العلاقات التكميلية في كل زوج من الكيانات وبناء رسم بياني معرفي تكميلي، مما يعزز من توصيات الصناعة. يتم تلخيص العمل الحالي الذي يستفيد من قدرات قاعدة المعرفة في التخصيص في الجدول 1.

من الجدير بالذكر أن مشكلة الشبح في نماذج اللغة الكبيرة يمكن أن تكون تحديًا عند تطبيقها على مهام التوصية. يمكن أن تؤدي الطبيعة الكامنة لنماذج اللغة الكبيرة إلى الغموض أو عدم دقة المصدر [83]. يمكن أن تظهر هذه المشكلة كإدخال معلومات زائدة أو حتى ضوضاء في عملية التوصية. قد تولد نماذج اللغة الكبيرة استجابات، رغم أنها صحيحة نحويًا، تفتقر إلى السياق المعلوماتي أو الملاءمة. وفقًا لـ KoLA [84]، وهو معيار لتقييم معرفة الكلمات لنماذج LLM، فإن أفضل نموذج GPT4 يحقق فقط 0.012 في الدقة و0.013 في الاسترجاع في مهمة التعرف على الكيانات المسماة، وهو ما يقل بكثير عن أداء النماذج الخاصة بالمهام (0.712 في الدقة و0.706 في الاسترجاع) لنموذج PL-Marker [85]. تشير هذه النتيجة إلى أن الفطرة السليمة لا تزال بعيدة عن أن يتم التقاطها بشكل كافٍ بواسطة LLM.

5 نماذج اللغة الكبيرة كمفسر للمحتوى

توفر أنظمة التوصية المعتمدة على المحتوى حلاً فعالًا للتخفيف من مشكلة التغذية الراجعة النادرة في أنظمة التوصية. من خلال الاستفادة من الخصائص والسمات للعناصر، تحقق هذه الأنظمة فهمًا أعمق لخصائصها، مما يسهل دقة

الجدول 1 نماذج اللغة الكبيرة كقاعدة معرفة

النهج	المعرفة	المهمة	العمود الفقري لنموذج اللغة الكبيرة	مجموعات البيانات
KAR [80]	معرفة واقعية	توقع CTR وإعادة الترتيب	gpt-3.5-turbo	Ml-1M، كتاب أمازون
LLMRec [81]	تفاعلات المستخدم-العنصر، معلومات جانبية	توصية TopK	gpt-3.5-turbo	ML، نتفليكس
KRec [82]	علاقة تكميلية	توقع CTR	ChatGPT 3.5، ChatGLM 2	بيانات Alipay

التوافق مع تفضيلات المستخدم. ومع ذلك، قد تظهر ميزات المحتوى المستخدمة في التوصية المعتمدة على المحتوى أيضًا ندرة. الاعتماد فقط على إشارة الإشراف الموصى بها، مثل النقر والتصفح، قد لا يستغل بالكامل الفوائد المحتملة لهذه الميزات. للتغلب على هذا التحدي، تظهر نماذج اللغة كنماذج أساسية قوية تعمل كمفسرين للمحتوى في معالجة الميزات النصية. تعزز استخدامها فعالية أنظمة التوصية من خلال فهم وتفسير المحتوى النصي بشكل فعال، مما يؤدي إلى تحسين التوصيات.

5.1 مفسر المحتوى التقليدي

يتضمن مفسر المحتوى التقليدي نموذجًا إحصائيًا، وشبكات عصبية، وشبكات NLP متقدمة، كما هو ملخص في الشكل 3. تركز هذه الأساليب بشكل أساسي على تحويل معلومات المحتوى، مثل البيانات النصية، إلى تمثيلات ميزات لتسهيل عملية التوصية.

تم استخدام نماذج إحصائية مثل TF-IDF، وطول الوصف الأدنى (MDL) [86]، وكيس الكلمات تقليديًا لترميز البيانات النصية مثل المقالات الإخبارية والمستندات إلى متجهات قيم مستمرة. ومع ذلك، مع تقدم تقنيات التعلم العميق، استكشف الباحثون هياكل شبكات عصبية متنوعة لتعلم تمثيلات محتوى أكثر تعبيرًا. بدلاً من الاعتماد فقط على التمثيلات الإحصائية، تقوم بعض الأساليب بتهيئة المتجهات بتمثيلات كيس الكلمات ثم تستخدم نماذج قائمة على التشفير الذاتي لتعلم تمثيلات أكثر قوة. على سبيل المثال، يجمع CDL [16] بين المتجهات الكامنة المستمدة من التشفير الذاتي مع تمثيلات ID الأصلية لتعزيز تمثيلات المحتوى. يقدم CRAE [87] تشفيرًا تكراريًا تعاونيًا يلتقط ترتيب الكلمات في النصوص، مما يمكّن من نمذجة تسلسلات المحتوى في سيناريوهات التصفية التعاونية. يقترح Dong et al. [88] تشفيرًا تكراريًا مزيلًا للضوضاء يقوم بإعادة بناء تقييمات العناصر/المستخدمين والمعلومات النصية في وقت واحد، مما يسمح بالنمذجة المشتركة للمعرفة التعاونية والنصية. يقدم CVAE [89] تشفيرًا تكراريًا تعاونيًا متغيرًا يتعلم ميزات نصية احتمالية. بينما تكون التشفيرات الذاتية فعالة في تعلم تمثيلات منخفضة الأبعاد من بيانات النص، قد تواجه صعوبة في التقاط المعلومات الدلالية بشكل فعال [90]. في بعض الحالات، تُستخدم أساليب مثل doc2vec [91] لبناء تمثيلات المحتوى [92، 93] وتعلم التمثيلات المخفية. يقيم Okura et al. [94] هياكل شبكية مختلفة، بما في ذلك نماذج الكلمات وشبكات GRU، لتمثيل حالات المستخدم.

الشكل 3 تطور مفسر المحتوى في التوصية

في أعقاب التقدم في نماذج معالجة اللغة الطبيعية العصبية (NLP)، تم استخدام هياكل أكثر تعقيدًا مثل الشبكات العصبية التلافيفية (CNNs)، والشبكات العصبية التكرارية (RNNs)، ونماذج الانتباه العصبي كمفسرين للمحتوى لاستخراج المعلومات السياقية والتقاط تفضيلات المستخدم. تأخذ هذه النماذج مدخلات جمل، مثل عناوين الأخبار، والمراجعات، أو التعليقات، وتحولها إلى مصفوفات تمثيل الكلمات باستخدام التهيئة العشوائية أو تمثيلات word2vec [95]. يتم استخدام هياكل متنوعة، بما في ذلك CNNs، وشبكات الانتباه، والنماذج الهجينة، لتعلم تمثيلات الجمل. على سبيل المثال، تدمج NPA [96] وLSTUR [97] آليات الانتباه لتحديد أهمية الكلمات بعد طبقات CNN. تستخدم NRMS [98] وCPRS [99] شبكات الانتباه الذاتي متعددة الرؤوس لتعلم تمثيلات الكلمات. هذه النماذج فعالة في التقاط الاعتماديات الطويلة في السياق وفهم المعلومات الدلالية في النص. بالإضافة إلى نمذجة النص، تُستخدم نماذج اللغة أيضًا كمفسرين للمحتوى لالتقاط اهتمامات المستخدم بناءً على تفاعلاتهم التاريخية. على سبيل المثال، تستخدم WE3CN [100] شبكة CNN ثلاثية الأبعاد لاستخراج ميزات زمنية من البيانات التاريخية. تستخدم DKN [41] آلية انتباه لتجميع المعلومات التاريخية المتعلقة بالعناصر المرشحة. يقترح DAN [101] نموذج LSTM قائم على الانتباه لالتقاط ميزات تسلسل مخفية أغنى. تستفيد هذه النماذج من هياكل عصبية مختلفة لتعزيز تمثيل النص في سياق أنظمة التوصية. من الجدير بالذكر أن هذه النماذج لا تزال لديها قيود من حيث العمق والقدرة على تعميم المعلومات الدلالية بشكل فعال.

5.2 مفسر المحتوى المعتمد على نموذج اللغة

في السنوات الأخيرة، كان هناك اهتمام متزايد في دمج نماذج اللغة المدربة مسبقًا الأكثر قوة، مثل BERT وGPT، في أنظمة التوصية. لقد أظهرت هذه النماذج أداءً استثنائيًا في مهام معالجة اللغة الطبيعية المختلفة وألهمت الباحثين للاستفادة منها لالتقاط تمثيلات دلالية عميقة ودمج المعرفة العالمية في أنظمة التوصية. ومع ذلك، فإن تطبيق نماذج اللغة المدربة مسبقًا على مهام التوصية يقدم تحديين رئيسيين. أولاً، هناك عدم توافق في الأهداف بين نماذج اللغة العامة والأهداف المحددة لأنظمة التوصية. لمعالجة ذلك، اقترح الباحثون أساليب تقوم بضبط النماذج المدربة مسبقًا أو تصميم مهام تدريب مسبق محددة للمهام لتكييفها مع مهام التوصية. على سبيل المثال، يستخدم U-BERT [102] BERT كمفسر للمحتوى ويقدم توقعات رموز الرأي المخفية وتوقعات تقييم الرأي كمهام تدريب مسبق لتوافق BERT بشكل أفضل مع أهداف التوصية. وبالمثل، استخدمت أعمال أخرى [103-107] BERT المدرب مسبقًا لتهيئة مشفر الأخبار لتوصية الأخبار، مما يعزز تمثيل الميزات النصية. كما تم استخدام النموذج المدرب مسبقًا، ERNIE، لتعزيز قدرة التمثيل للاستفسارات والمستندات [108، 109]. التحدي الثاني هو تقليل زمن الاستدلال عبر الإنترنت الناتج عن نماذج اللغة المدربة مسبقًا، والتي يمكن أن تكون مكلفة حسابيًا. استكشف الباحثون تقنيات مثل تقطير المعرفة وتحسين النموذج للحصول على نماذج خفيفة وفعالة مناسبة للخدمات عبر الإنترنت. على سبيل المثال، يستخدم CTR-BERT [110] تقطير المعرفة للحصول على نموذج صديق للذاكرة لتوقع معدل النقر، مما يعالج مشكلة زمن الاستجابة.

علاوة على ذلك، تم تطبيق نماذج اللغة المدربة مسبقًا خارج مهام التوصية السائدة. لقد تم دمجها في سيناريوهات توصية متنوعة، بما في ذلك توصية العلامات [111]، تمثيلات التغريدات [112]، وتوصية أمثلة الشيفرة [113]، لتعزيز تمثيل الميزات النصية في تلك المجالات المحددة.

بالإضافة إلى ذلك، استكشفت بعض الأعمال الحديثة [114-117] استخدام الميزات النصية فقط كمدخلات لنماذج التوصية، مستفيدة من نماذج اللغة المدربة مسبقًا لتخفيف مشاكل البداية الباردة وتمكين التوصيات عبر المجالات. تقدم هذه الفكرة مزايا في تخفيف مشاكل البداية الباردة وتسهيل التوصيات عبر المجالات بناءً على عمومية اللغة الطبيعية. يستخدم ZESREC [114] BERT للحصول على تمثيلات مستمرة عالمية لوصف العناصر لتوصية بدون تدريب. يركز Unisrec [115] على توصية تسلسلية عبر المجالات ويستخدم وحدة معززة خفيفة الوزن لتضمين تمثيل BERT الثابت في مهمة التوصية. يقوم VQ-Rec [116] بمزيد من توافق تضمينات النص التي تنتجها نماذج اللغة المدربة مسبقًا مع مهمة التوصية بمساعدة التكميم المتجه. يستكشف Fu وآخرون [118] ضبط المحولات على مستوى الطبقات لتحقيق توصيات قابلة للنقل بكفاءة في المعلمات.

بينما تمكّن نماذج اللغة المدربة مسبقًا فهم النص مع فائدة التقاط المعرفة العالمية أولاً، فإن تطوير نماذج اللغة الكبيرة المدربة مسبقًا يوفر قدرة طوارئ كبيرة في مجالات الاستدلال والتعميم، كما هو موضح في الجدول 2. يستكشف TALLRe [119] قدرة نماذج اللغة الكبيرة على التوصية التسلسلية. يلاحظون أن نماذج اللغة الأصلية تؤدي بشكل ضعيف في سيناريوهات عدم التدريب وندرة التدريب، بينما تظهر نماذج اللغة المعدلة وفقًا للتوصية أداءً متفوقًا في التعلم القليل والتعميم عبر المجالات. يتبنى Li وآخرون [123] نماذج اللغة الكبيرة كمشفرات سياقية ويحاولون فحص الحدود العليا لنماذج اللغة الكبيرة. وبالمثل، يقترح Kang وآخرون [120] طريقة ضبط تعليمات مماثلة لمهام توقع التقييم بناءً على هيكل T5. يجدون أن نماذج اللغة المعدلة، التي تستفيد من كفاءة البيانات، تتفوق على الموصين التقليديين. يعزز PALR [121] مزيدًا من بناء خط أنابيب ضبط التعليمات المحددة للتوصية، الذي يستخدم أولاً نماذج اللغة الكبيرة لتوليد الاستدلال كميزات إضافية بناءً على تاريخ سلوك المستخدم. بعد ذلك، يتم استرجاع مجموعة صغيرة من المرشحين باستخدام أي نموذج موجود بناءً على ملف تعريف المستخدم. أخيرًا، لتكييف نماذج اللغة العامة مع مهمة التوصية، يقومون بتحويل ميزات الاستدلال المولدة، وتاريخ تفاعل المستخدم، والمرشحين المسترجعين إلى بيانات تعليمات باللغة الطبيعية وضبط نموذج اللغة. تركز طرق ضبط التعليمات الحالية لنماذج اللغة لسيناريوهات التوصية عادةً على نوع واحد فقط من مهام التوصية، مما يحد من الاستخدام الكامل لقدرة التعميم القوية لنماذج اللغة. يعالج InstructRec [122] هذا القيد من خلال صياغة التوصية كإجراء اتباع التعليمات. يقومون بتصميم قوالب تعليمات متنوعة لاستيعاب مهام التوصية المختلفة ويستخدمون GPT-3.5 لتوليد بيانات تعليمات عالية الجودة بناءً على البيانات التاريخية للمستخدمين والقوالب. يمكن لنماذج اللغة التي تم ضبطها بدقة باستخدام بيانات التعليمات هذه التعامل بفعالية مع مجموعة واسعة من مهام التوصية وتلبية متطلبات المعلومات المتنوعة من مستخدمين مختلفين.

6 نماذج اللغة الكبيرة كمفسر

بالإضافة إلى تقدير الاقتراحات المقدمة من نموذج التوصية، يهتم المستخدمون أيضًا بالتبريرات المفهومة لهذه التوصيات [124، 125]. هذا أمر حاسم حيث أن معظم أنظمة التوصية هي صناديق سوداء لا يمكن فهم آلياتها الداخلية من قبل البشر [126]، مما يقلل من ثقة المستخدم. على سبيل المثال، في توصيات الأدوية، من غير المقبول التوصية بأدوية ذات تأثيرات علاجية جيدة ببساطة ولكن الفشل في تقديم أسباب فعاليتها. لتحقيق هذه الغاية، تهدف التوصيات القابلة للتفسير إلى ربط الاقتراحات عالية الجودة بتفسيرات متاحة. هذا لا يساعد فقط في تحسين

الجدول 2 نماذج اللغة الكبيرة لمفسر المحتوى

النهج	المهمة	هيكل نموذج اللغة	استراتيجية الضبط	المجموعات البيانية
TALLRe [119]	التوصية التسلسلية	LLaMA-7B	ضبط التعليمات & الضبط الدقيق	MovieLens100k، BookCrossing
LLMs-Rec [120]	توقع التقييم	Flan-T5-Base، Flan-T5-XXL	الضبط الدقيق	MovieLens-1M، كتاب أمازون
PALR [121]	توصية العناصر	LLaMa-7B	ضبط التعليمات	MovieLens-1M، جمال أمازون
InstructRec [122]	التوصية التسلسلية البحث المخصص	Flan-T5-XL	ضبط التعليمات	Amazon-Games، CDs

شفافية النموذج، الإقناع، والموثوقية، ولكن أيضًا تسهل تحديد وتصحيح الأخطاء المحتملة من خلال تفسيرات ثاقبة. تم توثيق هذه الفوائد بشكل موسع في الأعمال الأخيرة [38، 127-129]. على سبيل المثال، [128] أجرت دراسة تضمنت معالجة 40 مهمة صعبة وتقييم تأثير التفسيرات على سيناريوهات عدم التدريب وندرة التدريب. أظهرت نتائجهم أن التفسيرات لها تأثير إيجابي على أداء النموذج من خلال إنشاء ارتباط بين الأمثلة والتفسير.

تركز الأساليب التقليدية بشكل أساسي على التفسيرات المعتمدة على القوالب، والتي يمكن تصنيفها بشكل عام إلى تفسيرات قائمة على العناصر، وتفسيرات قائمة على المستخدمين، وتفسيرات قائمة على السمات. ترتبط الأساليب القابلة للتفسير القائمة على العناصر بالتوصيات المتعلقة بالعناصر المألوفة، موضحة أن العنصر الموصى به يشبه العناصر الأخرى التي يفضلها المستخدم، والتي تتواجد بكثرة على منصات مثل أمازون ونتفليكس. ومع ذلك، بسبب التعاون، قد تؤدي هذه الأساليب إلى أداء أقل في التوصيات الشخصية التي تتطلب التنوع، وقد تواجه صعوبة في تحديد العناصر ذات الصلة بكفاءة في البيئات الصناعية التي تحتوي على عدد هائل من العناصر. في المقابل، تستفيد التفسيرات القائمة على المستخدمين من العلاقات الاجتماعية لتقديم التوصيات من خلال توضيح أن المستخدمين ذوي الاهتمامات المماثلة يفضلون أيضًا العنصر الموصى به. تجعل خاصية المستخدم الاجتماعية هذه التفسيرات أكثر إقناعًا، مما يشجع المستخدمين على تجربة التوصيات. ومع ذلك، قد تؤدي التباينات في تفضيلات المستخدمين إلى تقليل تأثير هذه الطريقة في قياس التفضيل الفعلي. أخيرًا، تركز التفسيرات القائمة على السمات على تسليط الضوء على السمات الخاصة بالعناصر الموصى بها التي قد يجدها المستخدمون جذابة، مما ينقل بشكل أساسي “قد تهمك هذه الميزات”. تتطلب هذه الطريقة تخصيصًا وفقًا لاهتمامات كل مستخدم، مما يؤدي إلى دقة ورضا أعلى. وبالتالي، فإنها في طليعة البحث.

من الواضح أن مثل هذه الشروحات عادة ما تستخدم تنسيقات محددة مسبقًا وصياغات نمطية، مثل الشروحات المستندة إلى عناصر مشابهة أو أصدقاء. على الرغم من قدرتها على نقل المعلومات الأساسية، قد تؤدي هذه التنسيقات غير المرنة إلى تقليل تجربة المستخدم ورضاه بسبب افتقارها إلى التكيف والتخصيص. لهذا السبب، حظيت أساليب توليد اللغة الطبيعية باهتمام متزايد. اعتمدت الأعمال المبكرة بشكل رئيسي على الشبكات العصبية التكرارية (مثل LSTM و GRU). ومع ذلك، وبسبب محدودية تعبير النموذج، غالبًا ما تعاني من مشكلة نقص التنوع. مع الأداء الممتاز لنماذج Transformer في مهام اللغة الطبيعية المختلفة، حاولت بعض الأعمال دمج نماذج Transformer في التوصيات القابلة للتفسير. استخدم لي وآخرون المتجهات الموضعية المقابلة لمعرفات المستخدم (العنصر) للتنبؤ بالرموز المفسرة. أظهرت الأعمال اللاحقة أن الشرح الناتج لا يمكن أن يبرر تفضيل المستخدم من خلال تجميع أوصاف غير ذات صلة. لذلك، استخدم ني وآخرون مثل هذه المعلومات كمدخلات موجهة إلى BERT للحصول على تبرير قابل للتحكم. نظرًا لأن هذه المعلومات المساعدة ليست متاحة دائمًا في السيناريوهات الواقعية، يتطلب ExBERT فقط الشروحات التاريخية المكتوبة من قبل المستخدمين، ويستخدم مشفرًا قائمًا على الانتباه الذاتي متعدد الرؤوس لالتقاط الصلة بين هذه الشروحات وأزواج المستخدم-العنصر. مؤخرًا، قامت MMCT و EG4Rec و KAER بمزيد من النمذجة الدقيقة للمعلومات مثل الصور المرئية، وسلاسل الزمن، والميول العاطفية للحصول على تفسيرات عالية الجودة. تم القيام بمحاولة مبكرة، LLM4Vis، للتركيز على التوصيات القابلة للتفسير من خلال ChatGPT. يقترح RecExplainer ثلاث استراتيجيات توافقية من أجل القابلية للتفسير.

نظرًا للقوة التعبيرية المحدودة لنماذج اللغة التقليدية، فإن طرق توليد اللغة الطبيعية عرضة لمشاكل الاعتماد على المدى الطويل، أي أن إدخال نصوص طويلة سيؤدي إلى توليد تفسيرات تفتقر إلى التنوع والتماسك في المحتوى. بالإضافة إلى ذلك، فإن هذه الطرق التفسيرية مرتبطة ارتباطًا وثيقًا بنماذج التوصية المحددة (مثل NETE)، أو تصمم مباشرة نموذج توصية جديد (مثل NRT).

بيتر [144])، وغالبًا ما يكونون عاجزين عند مواجهة نماذج التوصية المتقدمة الموجودة، مما يحد من قابليتها للتعميم. هذه أيضًا عيب في الطرق المعتمدة على القوالب. ومن الجدير بالذكر أنه في البيئات الصناعية، تتضمن خوارزميات التوصية غالبًا ليس نموذجًا واحدًا فقط، بل سلسلة أو تكامل لعدة نماذج، وهذه التركيبات المعقدة تزيد من صعوبة فهم التوصيات.

بفضل القدرة التوليدية الملحوظة لنماذج اللغة الكبيرة في مهام اللغة، فهي مثالية لمواجهة التحديات المذكورة أعلاه. أولاً، بفضل الاستفادة من بيانات التدريب الواسعة، تتقن نماذج اللغة الكبيرة استخدام اللغة البشرية، بما في ذلك السياق والاستعارات والتركيب المعقد. وهذا يجهزها لصياغة تفسيرات مخصصة تكون دقيقة وطبيعية وقابلة للتكيف مع تفضيلات المستخدمين المختلفة، مما يقلل من قيود التفسيرات التقليدية القالبية. ثانياً، تتيح القدرات الفريدة للتعلم في السياق لنماذج اللغة الكبيرة، مثل التحفيز بدون أمثلة، والتحفيز بعدد قليل من الأمثلة، والتحفيز بسلسلة من الأفكار، لها جمع تعليقات المستخدمين في الوقت الحقيقي أثناء التفاعلات، وتقديم نتائج التوصيات وتفسيراتها المقابلة، مما يعزز التوافق الثنائي الاتجاه بين الإنسان والآلة. أظهرت دراسة حديثة إمكانية نماذج اللغة الكبيرة في توضيح تعقيدات النماذج المعقدة، كما يتضح من قدرة GPT-4 على تفسير وظيفة كل خلية عصبية في GPT-2 بشكل مستقل من خلال إدخال تحفيزات مناسبة وتفعيل الخلايا العصبية المقابلة. وهذا يعرض نهجًا مبتكرًا لتفسير نماذج التوصية المعتمدة على التعلم العميق. من الضروري التأكيد على أن هذه التقنية التفسيرية غير مرتبطة بهيكل النموذج، مما يميزها عن التفسيرات التقليدية المرتبطة بخوارزميات محددة. وبالتالي، فإن التفسيرات المستندة إلى نماذج اللغة الكبيرة تمهد الطريق لإطار تفسيري متعدد الاستخدامات وقابل للتوسع مع تطبيقات أوسع.

على الرغم من أن نماذج اللغة الكبيرة تتمتع بمزايا كبيرة بطبيعتها في تفسيرات التوصيات، فإنه من الضروري التعرف على القضايا المحتملة. أولاً، مثل نماذج التوصية، تعتبر نماذج اللغة الكبيرة صناديق سوداء يصعب على البشر فهمها. لا يمكننا تحديد المفاهيم التي تستند إليها التفسيرات. أيضًا، قد تكون التفسيرات المقدمة غير صادقة؛ أي أن التفسيرات غير متسقة مع سلوكياتها الموصى بها. تتضمن بعض التطورات الأخيرة استخدام سلاسل من الأفكار لتحفيز التفكير من أجل تحسين القابلية للتفسير؛ ومع ذلك، تظل غموض عملية التفكير في كل خطوة مصدر قلق، وقد تساءل البعض عن إمكانية وجود تفسيرات غير موثوقة لتحفيز سلاسل الأفكار. ثانيًا، قد تشمل البيانات الواسعة التي تستخدمها نماذج اللغة الكبيرة تحيزات بشرية ومحتوى خاطئ. وبالتالي، حتى إذا كانت التفسير متوافقًا مع سلوك التوصية للنموذج، فقد تكون كل من التفسير والتوصية معيبة. من الضروري مراقبة ومعايرة هذه النماذج لضمان العدالة والدقة في التوصيات القابلة للتفسير. أخيرًا، تظهر النماذج التوليدية مستويات متفاوتة من الكفاءة عبر مهام مختلفة، مما يؤدي إلى عدم اتساق في الأداء. قد تؤدي الإشارات الدلالية المتطابقة إلى تفسيرات توصية متباينة. وقد تم إثبات هذا التباين من خلال دراسات حديثة تركز على قوة نماذج اللغة الكبيرة. يتطلب معالجة هذه القضايا استكشاف تقنيات للتخفيف أو حتى تجاوز سلوك التفسير منخفض الموثوقية، والتحقيق في كيفية تدريب نماذج اللغة الكبيرة على توليد تفسيرات توصية موثوقة بشكل متسق، خاصة في ظل ظروف معادية، هو مجال جدير بمزيد من البحث.

7 نماذج اللغة الكبيرة كمنطق نظام شائع

مع تطور نماذج اللغة الكبيرة، هناك ملاحظة أن نماذج اللغة الكبيرة تظهر قدرات على التفكير [2,161] عندما تكون كبيرة بما فيه الكفاية، وهو أمر أساسي للبشر.
الذكاء لاتخاذ القرارات وحل المشكلات. من خلال تزويد النماذج بـ ‘سلسلة الأفكار’ [38]، مثل التحفيز بعبارة ‘دعنا نفكر في الأمر خطوة بخطوة’، تظهر نماذج اللغة الكبيرة قدرات ناشئة للتفكير المنطقي ويمكن أن تصل إلى استنتاجات أو أحكام وفقًا للأدلة أو المنطق. وبناءً عليه، فإن أنظمة التوصية، قادرة على التفكير لمساعدة استخراج اهتمامات المستخدم، مما يحسن الأداء.

7.1 تقديم توصيات مباشرة

التعلم في السياق [167-173] هو أحد القدرات الناشئة لنماذج اللغة الكبيرة التي تميزها عن نماذج اللغة المدربة مسبقًا، حيث، عند إعطاء تعليمات بلغة طبيعية وعروض للمهام، ستقوم نماذج اللغة الكبيرة بتوليد المخرجات من خلال إكمال تسلسل الكلمات دون تدريب أو ضبط [3]. بالنسبة للتعلم في السياق، يتبع التحفيز تعليمات المهمة و/أو عدة أزواج من المدخلات والمخرجات لتوضيح المهمة ويتم إضافة مدخل اختبار يتطلب من نموذج اللغة الكبيرة إجراء توقعات. يُطلق على زوج المدخلات والمخرجات اسم لقطة. هذه القدرة الناشئة تمكن من التنبؤ بحالات جديدة دون ضبط على عكس التعلم الآلي السابق.

في مجال أنظمة التوصية، استكشفت العديد من الدراسات أداء التعلم بدون لقطة/بضع لقطات باستخدام نماذج اللغة الكبيرة، تغطي المهام الشائعة للتوصية مثل توقع التقييم، وتوقع الترتيب. تقيم هذه الدراسات قدرة نماذج اللغة على تقديم التوصيات دون ضبط صريح، كما هو ملخص في الجدول 3، حيث تتبنى جميع الطرق التعلم في السياق للمستشارين المباشرين. يمكن إرفاق العملية العامة في الشكل 4. وبناءً عليه، لدينا النتائج التالية:

ركزت الدراسات المذكورة أعلاه بشكل أساسي على تقييم المستشارين بدون لقطة/بضع لقطات باستخدام مجموعات بيانات مفتوحة النطاق، بشكل رئيسي في مجالات مثل الأفلام والكتب. تم تدريب نماذج اللغة الكبيرة على مجموعات بيانات مفتوحة النطاق واسعة، مما يمكنها من امتلاك كمية كبيرة من المعرفة العامة، بما في ذلك المعلومات حول الأفلام المعروفة. ومع ذلك، عندما يتعلق الأمر ببيانات المجال الخاص، مثل منتجات التجارة الإلكترونية أو المواقع المحددة، فإن قدرة المستشارين بدون لقطة تفتقر إلى التحقق، وهو ما يُتوقع أن يكون تحديًا.
تتطلب طرق الاختبار الحالية دمج وحدات إضافية للتحقق من أداء المستشارين بدون لقطة لمهام محددة. على وجه الخصوص، بالنسبة لمهام الترتيب التي تتضمن تقديم قائمة من العناصر حسب التفضيل، يتم استخدام وحدة توليد المرشحين لتقليص مجموعة العناصر [164] و [165]. تولد النماذج المعتمدة على التوليد مثل gpt-3.5-turbo النتائج بطريقة توليدية بدلاً من الاعتماد على الاسترجاع من الذكريات الموجودة، مما يتطلب وحدات إضافية لتنفيذ توصيات العناصر المعتمدة على الهوية.
من منظور أداء التوصية، تظهر المستشارين بدون لقطة بعض القدرات ويؤدي المتعلمون بضع لقطات بشكل أفضل. ومع ذلك، لا يزال هناك فجوة كبيرة عند مقارنتها بنماذج التوصية التقليدية، وخاصة نماذج اللغة الكبيرة المضبوطة بشكل خاص للمستشارين، مثل P5 [174] و M6-Rec [175]. وهذا يبرز أن نماذج اللغة الكبيرة لا تمتلك ميزة كبيرة في النمذجة الشخصية.

قدرة ناشئة أخرى مهمة هي التفكير ‘خطوة بخطوة’، حيث يمكن لنماذج اللغة الكبيرة حل المهام المعقدة من خلال استخدام التحفيز الذي يتضمن خطوات التفكير الوسيطة السابقة، والتي تُسمى استراتيجية ‘سلسلة الأفكار’ [38]. صمم وانغ ولِم [164] تحفيزًا من ثلاث خطوات، وهو NIR، لالتقاط تفضيلات المستخدم، واستخراج الأفلام الأكثر تمثيلاً وإعادة ترتيبها.

الجدول 3 المتعلمين بدون لقطة/بضع لقطات من نماذج اللغة الكبيرة لأنظمة التوصية

النهج

العمود الفقري لنموذج اللغة الكبيرة

المهمة

المقياس

مجموعات البيانات

ICL

COT

[162]

gpt-3.5-turbo

توقع التقييم، التوصية التسلسلية، التوصية المباشرة، توليد الشرح، تلخيص المراجعات

RMSE، MAE

HR، NDCG

BLUE4، ROUGE، إنسان

تقييم

BLUE4، ROUGE، إنسان

تقييم

أمازون الجمال

[163]

Text-davinci-002

text-davinci-003

gpt-3.5-turbo

نقطة-واحدة، زوجية، قائمة-واحدة

NDCG، MRR

MovieLens-1M

أمازون-كتاب

أمازون-موسيقى

MIND-small

[120]

Flan-U-PALM gpt-3.5-turbo text-davinci-003

توقع التقييم، توقع الترتيب

RMSE، MAE

ROC-AUC

MovieLens-1M

أمازون-كتب

[164]

Text-davinci-003

إعادة الترتيب

NDCG، HR

MovieLens 100 K

[165]

gpt-3.5-turbo

إعادة الترتيب

NDCG

MovieLens-1M

أمازون-ألعاب

[166]

gpt-3.5-turbo

إعادة الترتيب

الدقة

MIND

الشكل 4 مثال على التعلم بدون لقطة/بضع لقطات للمستشارين المباشرين

العناصر بعد تصفية العناصر. مثل هذه الاستراتيجية متعددة الخطوات تعزز بشكل كبير أداء التوصية.

7.2 التفكير للاختيار الآلي

التعلم الآلي الآلي (AutoML) يُطبق على نطاق واسع في أنظمة التوصية للقضاء على الإعداد اليدوي المكلف مع التجارب والأخطاء. يمكن تصنيف مساحة البحث في أنظمة التوصية إلى (1) حجم التضمين (2) الميزات (3) تفاعل الميزات (4) بنية النموذج. يبحث بحث حجم التضمين، مثل [176-179] عن حجم التضمين المناسب لكل ميزة لتجنب استهلاك الموارد بشكل مفرط. البحث عن الميزات يتكون من البحث عن الميزات الخام [180،181] والبحث عن الميزات الاصطناعية [182،183]، والذي يختار مجموعة فرعية من مجموعة الميزات الأصلية أو المتقاطعة للحفاظ على الميزات المفيدة لتقليل كل من تكلفة الحساب والمساحة. البحث عن تفاعل الميزات، مثل [184-188]، يقوم تلقائيًا بتصفية تفاعلات الميزات التي ليست مفيدة. بحث بنية النموذج، مثل [189-192]، يوسع مساحة البحث إلى البنى الكاملة. تنتقل استراتيجية البحث من عملية التعلم المعزز المنفصل، التي تأخذ عينات من البنى للتدريب بشكل متكرر وتستغرق وقتًا طويلاً، إلى البحث القابل للتفريق، الذي يختار بشكل تكيفي البنى ضمن التعلم من لقطة واحدة لتجنب العبء الحسابي، لتحقيق تقارب أكثر كفاءة. ثم يعمل التقييم لكل بنية مأخوذة كإشارة لضبط الاختيارات. أي أن هناك صانع قرار يتذكر النتائج السابقة لاختيارات البنى السابقة ويحلل النتائج السابقة لتقديم الاختيار الموصى به التالي.

تمتلك نماذج اللغة الكبيرة الناشئة قدرات ممتازة على التذكر والتفكير التي ستعمل من أجل التعلم الآلي الآلي. حاولت عدة أعمال التحقق من إمكانيات التعلم الآلي الآلي مع نماذج اللغة الكبيرة. أوليًا، يستفيد GPT-NAS [193] من القدرة التوليدية لنماذج اللغة الكبيرة. يتم صياغة بنية الشبكات في شخصيات متسلسلة، وبالتالي يمكن تحقيق توليد بنى الشبكات بسهولة من خلال نماذج التدريب التوليدية. يتم استخدام NAS-Bench-101 [194] للتدريب المسبق وتستخدم النتائج المتطورة لضبط دقيق. تنتج نماذج التدريب التوليدية بنى معقولة، مما يقلل من مساحة البحث لخوارزميات الجينات اللاحقة للبحث عن البنى المثلى. يتم تقييم القدرة على التفكير المتقدمة نسبيًا بشكل أكبر في GENIUS [195]، حيث يتم استخدام GPT-4 كوكيل صندوق أسود لتوليد بنى محتملة ذات أداء أفضل وفقًا للتجارب السابقة بما في ذلك البنى التي تم تجربتها مع أدائها التقييمي. وفقًا للنتائج، يمكن لـ GPT-4 توليد شبكات بنية جيدة، مما يظهر الإمكانية لمهام أكثر تعقيدًا. ومع ذلك، من الصعب جدًا على نماذج اللغة الكبيرة اتخاذ قرارات مباشرة بشأن المشكلات التقنية الصعبة فقط من خلال التحفيز. لتحقيق التوازن
الكفاءة وقابلية التفسير، إحدى الطرق هي دمج نماذج اللغة الكبيرة في استراتيجيات بحث معينة، حيث توجه خوارزمية الجينات عملية البحث وتولد نماذج اللغة الكبيرة التبادلات المرشحة. تستخدم LLMatic [196] و EvoPrompting [197] نماذج اللغة الكبيرة كعوامل طفرات وتبادلات لخوارزمية NAS الجينية. خلال عملية التطور، كل جيل لديه احتمال معين لتحديد ما إذا كان سيتم إجراء تبادل أو طفرة لإنتاج نسل جديد. يتم توليد التبادلات والطفرات من خلال تحفيز نماذج اللغة الكبيرة. مثل هذا الحل يدمج نماذج اللغة الكبيرة في خوارزمية البحث الجيني، مما يحقق أداءً أفضل من التفكير المباشر على المساحة الكاملة.

تقدم الأبحاث المذكورة أعلاه رؤى قيمة في مجال التعلم الآلي في أنظمة التوصية. ومع ذلك، هناك العديد من التحديات التي تحتاج إلى معالجة. أولاً، مساحة البحث في أنظمة التوصية أكثر تعقيدًا بشكل ملحوظ، حيث تشمل أنواعًا متنوعة من مساحة البحث وتواجه مشكلات كبيرة في الحجم. تشكل هذه التعقيدات تحديًا في استكشاف مساحة البحث وتحسينها بشكل فعال. ثانيًا، مقارنةً ببحث الهيكل الشائع في مجالات أخرى، تفتقر أنظمة التوصية إلى أساس قوي من المعرفة بشأن المكونات المعلوماتية داخل مساحة البحث، خاصةً التفاعلات الفعالة للميزات عالية الترتيب. على عكس الهياكل الشبكية المعروفة في مجالات أخرى، تعمل أنظمة التوصية في مجالات وسيناريوهات متنوعة، مما يؤدي إلى مكونات متنوعة ومحددة المجال. ستمهد معالجة هذه التحديات وتقدم فهم مساحة البحث والمكونات المعلوماتية في أنظمة التوصية الطريق لتحسينات كبيرة في أساليب التعلم الآلي.

8 نماذج اللغة الكبيرة كوكيل محادثة

نظام التوصية المحادثي (CRS) هو نوع متخصص من أدوات التوصية التي تهدف إلى اكتشاف اهتمامات وتفضيلات المستخدمين من خلال الحوار، مما يمكّن من تقديم توصيات مخصصة وتعديل استراتيجيات التوصية في الوقت الحقيقي بناءً على ملاحظات المستخدم. مقارنةً بأنظمة التوصية التقليدية، تتمتع أنظمة التوصية المحادثية بميزة الفهم الفوري لنوايا المستخدم والقدرة على تكييف التوصيات بناءً على ملاحظات المستخدم. عادةً ما يتكون نظام التوصية المحادثي من مكونين رئيسيين: وحدة الحوار ووحدة التوصية.

في هذا القسم، سنركز بشكل أساسي على مناقشة وحدة الحوار، التي تلعب دورًا حاسمًا في تسهيل التفاعلات الفعالة بين المستخدم والنظام وفهم تفضيلات المستخدم (الشكل 5).

في نظام التوصية المحادثي، تأخذ وحدة الحوار عادةً شكل نظام حوار. يمكن تصنيف أنظمة الحوار عمومًا إلى فئتين رئيسيتين: الدردشة العادية والموجهة نحو المهام. تركز الأولى على الإجابة على الأسئلة في مجالات مفتوحة، وغالبًا ما يتم استخدام طريقتين رئيسيتين: الطرق التوليدية والطرق المعتمدة على الاسترجاع. تستخدم الطرق التوليدية [198-200] هيكل نموذج تسلسل إلى تسلسل لتوليد الردود، بينما تحول الطرق المعتمدة على الاسترجاع [201-203] مهمة توليد الردود إلى مشكلة استرجاع من خلال البحث عن الرد الأكثر صلة في قاعدة بيانات الردود بناءً على سياق الحوار. في أنظمة التوصية المحادثية، غالبًا ما تكون أنظمة الحوار الموجهة نحو المهام مطلوبة، حيث تم تصميمها خصيصًا لمساعدة المستخدمين في إنجاز مهام محددة. بالنسبة لأنظمة الحوار الموجهة نحو المهام، فإن نهجًا شائعًا [24،204] هو اعتبار توليد الردود كخط أنابيب ومعالجته بشكل منفصل باستخدام أربعة مكونات: فهم الحوار [205، 206]، تتبع حالة الحوار [207-209]، تعلم سياسة الحوار [24، 210]، وتوليد اللغة الطبيعية [211،212]. نهج آخر هو استخدام طريقة شاملة [213-215]، حيث يتم تدريب نموذج ترميز-فك ترميز للتعامل مع جميع خطوات المعالجة.

الشكل 5 approaches المهمة في وكيل المحادثة

بشكل جماعي. تعاني الطريقة الأولى من مشكلات قابلية التوسع وتفتقر إلى التآزر بين المكونات، بينما تتطلب الطريقة الثانية كمية كبيرة من البيانات المشروطة للتدريب.

استنادًا إلى تصنيف أنظمة الحوار، يمكن أيضًا تقسيم الأساليب الشائعة في أنظمة التوصية المحادثية إلى فئتين: الأسئلة والأجوبة المعتمدة على السمات (QA) والطرق التوليدية. تستخدم طريقة الأسئلة والأجوبة المعتمدة على السمات [24، 216-218] طريقة خط أنابيب داخل نظام الحوار. في كل دورة حوار، يحتاج النظام إلى اتخاذ قرار بشأن ما إذا كان يجب أن يسأل المستخدم سؤالًا أو تقديم توصية. يتم التعامل مع عملية اتخاذ القرار، خاصةً فيما يتعلق بالسمة التي يجب السؤال عنها، عادةً بواسطة شبكة سياسة. من ناحية أخرى، لا تقوم الطرق التوليدية بنمذجة عملية اتخاذ القرار بشكل صريح. بدلاً من ذلك، غالبًا ما تستخدم نهج التدريب الشامل، حيث يقوم نموذج تسلسل إلى تسلسل بتوليد المخرجات مباشرة من مفردات مشتركة من الكلمات والعناصر. سواء كانت المخرجات المولدة دردشة عادية، سؤال، أو توصية يتم تحديدها ضمنيًا خلال عملية التوليد. مقارنةً بأساليب الأسئلة والأجوبة المعتمدة على السمات، تبدو الطرق التوليدية [219-222] أبسط وأكثر قابلية للتوسع. ومع ذلك، فإنها تتطلب كمية كبيرة من بيانات التدريب المشروطة. مع تقدم نماذج اللغة المدربة مسبقًا (PLMs) في مجال معالجة اللغة الطبيعية، وخاصةً نماذج مثل BERT [223] وGPT [224]، أصبحت قدرات النماذج المدربة مسبقًا في فهم اللغة وتوليدها أكثر قوة. وجد الباحثون أن تحسين النماذج المدربة مسبقًا بكمية صغيرة من البيانات المشروطة يمكن أن يؤدي إلى نتائج مثيرة للإعجاب في مهام محددة. أدى هذا الاكتشاف إلى تطبيق PLMs في أنظمة التوصية المحادثية التوليدية. على سبيل المثال، حقق DialoGPT [215] ذكاء حواري واعد من خلال تحسين GPT-2 على بيانات الحوار المجمعة من منصات مثل Reddit. بعد ذلك، استخدمت BARCOR [220] وRecInDial [222] وUniCRS [221] DialoGPT لبناء أنظمة توصية محادثية، مع اختلافات في استراتيجيات اتخاذ القرار الخاصة بهم. بينما تقلل PLMs من اعتماد نماذج الحوار التوليدية على البيانات الواسعة، لا يزال عملية التحسين تتطلب وقتًا كبيرًا من الحوسبة وتحتاج إلى جمع بيانات تدريب عالية الجودة محددة المجال بسبب مساحة المعلمات الكبيرة.

مع زيادة معلمات النموذج وبيانات التدريب، تستمر قدرة الذكاء والمعرفة للنماذج في التحسن. كانت OpenAI توسع معلمات النموذج وبيانات التدريب مع استخدام تقنيات مثل RLHF (التعلم المعزز من

ملاحظات البشر) وضبط التعليمات لتحسين GPT-3 [3]. أدى ذلك إلى ظهور قدرات نماذج مثل InstructGPT [37] ونماذج لاحقة مثل ChatGPT، التي تظهر ذكاءً مذهلاً وفتحت الأبواب أمام أنظمة الحوار الذكية الجديدة المعتمدة على نماذج اللغة الكبيرة (LLMs). علاوة على ذلك، فإن نموذج BARD من Google وLLaMA من META [32] هما أيضًا نماذج حوارية كبيرة تم اقتراحها وأظهرت أداءً ملحوظًا في القدرات المحادثية. على سبيل المثال، يستخدم نموذج Vicuna مجموعات حوارية مشتركة من قبل المستخدمين في استخدام ChatGPT لتحسين نموذج LLaMA مفتوح المصدر، حيث يدعي الفريق أنه يمكن أن يحقق أكثر من

قدرة ChatGPT. لقد جلبت هذه السلسلة من إدخال نماذج اللغة الكبيرة الجديدة رؤى جديدة إلى أنظمة التوصية المحادثية. نظرًا لاستخدام مجموعات بيانات مفتوحة واسعة النطاق أثناء تدريب LLM، فإن لديها قدرات توصية محادثية فطرية ويمكن أن تقدم توصيات معقولة في مجالات مفتوحة مثل الأفلام والموسيقى والأخبار والألعاب.

ومع ذلك، لا تزال هناك تحديات كبيرة في بناء نظام إدارة علاقات العملاء على مستوى المؤسسات. التحدي الأول هو نقص الوعي بالنماذج الكبيرة حول بيانات المجال الخاص. من المعروف أن معظم بيانات التدريب للنماذج اللغوية الكبيرة، مثل GPT-3، تأتي من مصادر متاحة للجمهور على الإنترنت. ونتيجة لذلك، قد تفتقر هذه النماذج إلى الرؤية حول البيانات التي توجد داخل منصات المعلومات، مما يجعل قدراتها على النمذجة وفهم مثل هذه البيانات ضعيفة نسبيًا. لمعالجة هذا التحدي، يتم حاليًا استكشاف نهجين: التخصيص الدقيق [215] وتعلم الأدوات [225، 226]. يتضمن التخصيص الدقيق ضبط النموذج اللغوي الكبير باستخدام بيانات حوار خاصة بالمجال. هناك قلقان رئيسيان في هذا النهج. أولاً، يتطلب الأمر بيانات حوار ضخمة وعالية الجودة خاصة بالمجال لضبط النموذج الكبير للغاية. ومع ذلك، في معظم سيناريوهات التوصية، تتكون البيانات بشكل أساسي من تفاعلات المستخدم-العنصر الصريحة أو الضمنية، والتي قد تفتقر إلى سياق المحادثة. لذلك، فإن توليد بيانات حوار عالية الجودة من بيانات التفاعل هو مصدر قلق رئيسي في هذا النهج. في RecLLM [226] و iEvaLM [227]، اقترح الباحثون استخدام النماذج اللغوية الكبيرة لبناء محاكي مستخدم لتوليد بيانات حوارية. بالإضافة إلى ذلك، تلعب تقنية التخصيص الدقيق دورًا حاسمًا في تحديد الجودة النهائية للنماذج اللغوية الكبيرة. يمكن أن تؤدي استراتيجية تخصيص دقيقة ومصممة بشكل جيد إلى تحسينات كبيرة في أداء النموذج وقدراته، مثل ضبط التعليمات وRLHF المقترح في InstructGPT [3]. تعلم الأدوات هو نهج آخر لمعالجة هذا التحدي، وفكرته الرئيسية هي اعتبار نماذج التوصية التقليدية كأدوات يمكن استخدامها، مثل تحليل المصفوفات (MF) وDeepFM. لمزيد من الشرح التفصيلي حول تعلم الأدوات، يرجى الرجوع إلى القسم 9. نظرًا لأن نماذج التوصية خاصة بالمجال، يمكن للنموذج اللغوي الكبير الاستفادة من هذه النماذج للحصول على نتائج التوصية وتوصيتها للمستخدمين في الاستجابة. في هذا النهج، هناك نقطتان تقنيتان رئيسيتان: بناء نموذج الأداة وهندسة المطالبات لتوجيه النموذج اللغوي الكبير في الاستخدام الصحيح للأداة. أولاً وقبل كل شيء، تستخدم نماذج التوصية التقليدية عمومًا معرفات أو ميزات فئوية كمدخلات، بينما يقدم المستخدمون دائمًا متطلباتهم أو تفضيلاتهم بلغة طبيعية في المحادثات. لذلك، يجب أخذ ميزات النص غير المنظم في الاعتبار في بناء الأداة. في Chat-Rec [225]، يتم استخدام نموذج توصية تقليدي ونموذج قائم على تضمين النص (text-embedding-ada-002) كأدوات. قامت RecLLM [226] بتكييف نموذج لغوي معزز بنموذج ثنائي التشفير وعدة طرق لاسترجاع النصوص كمحرك توصية. من ناحية أخرى، على الرغم من الذكاء القوي وقدرات الاستدلال للنماذج اللغوية الكبيرة، فإن الاستفادة الفعالة من هذه القدرات تتطلب مطالبات مصممة بشكل جيد للتوجيه. على سبيل المثال، يمكن أن يؤدي سلسلة التفكير المقترحة من قبل جيسون [38] إلى تحفيز النموذج اللغوي الكبير على التفكير والاستدلال خطوة بخطوة، مما
يفيد قدرة استخدام الأداة. وقد اقترحت الدراسات اللاحقة مثل ToT [228] وPlan-and-Solve [229] وReAct [230] تقنيات أكثر تقدمًا لتصميم المطالبات للمساعدة في توجيه النموذج اللغوي الكبير للانخراط في تفكير أعمق وتخطيط الأدوات.

التحدي الثاني يكمن في قضية الذاكرة والفهم في المحادثات الطويلة. بسبب قيود الإدخال للنماذج اللغوية الكبيرة، يمكن لنماذج مثل ChatGPT دعم حد أقصى يبلغ 4096 رمزًا في مكالمة واحدة، بما في ذلك كل من الإدخال والإخراج. في سيناريوهات الحوار متعددة الأدوار، غالبًا ما تواجه سياقات الحوار الأطول خطر تجاوز هذا الحد من الرموز. أبسط نهج لمعالجة هذا التحدي هو تقليم الحوار عن طريق التخلص من الأدوار السابقة. ومع ذلك، في أنظمة التوصية الحوارية، قد يعبر المستخدمون عن كمية كبيرة من المعلومات الشخصية والاهتمامات في المراحل المبكرة من المحادثة. إن حذف مثل هذه المعلومات يؤثر مباشرة على دقة التوصيات. لمعالجة هذه القضية، اقترحت عدة أعمال ذات صلة حلولًا. يعزز MemPrompt [231] المطالبة من خلال دمج وحدة ذاكرة، مما يمكّن GPT-3 من امتلاك قدرة ذاكرة أقوى للحوار الطويل. وبالمثل، تستفيد RecLLM [226] من النموذج اللغوي الكبير لاستخراج ملفات تعريف المستخدمين وتخزينها كبيانات واقعية في ذاكرة المستخدم. عند معالجة استفسارات المستخدم، يتم استرجاع الحقائق ذات الصلة بناءً على تشابه النص.

9 تعلم الأدوات وتطبيقاته في التوصية

9.1 تعلم الأدوات القائم على النماذج اللغوية الكبيرة

يعد تعلم الأدوات مجال بحث ناشئ يهدف إلى تعزيز قدرات حل المهام من خلال دمج أدوات متخصصة مع نماذج أساسية، والتي تم فهمها من قبل [232] على أنها منظورين:

يعتبر التعلم المعزز بالأدوات أدوات متخصصة كمساعدين من أجل تحسين جودة ودقة المهام، أو أداة للذكاء الاصطناعي؛
يركز التعلم الموجه بالأدوات أكثر على تدريب النماذج لاستخدام الأدوات بشكل فعال، والتحكم في عمليات تطبيق الأدوات وتحسينها، أو الذكاء الاصطناعي من أجل الأداة.
لقد وجد تعلم الأدوات تطبيقات في مجالات متنوعة، ويركز هذا القسم بشكل أساسي على نماذج تعلم الأدوات القائمة على النماذج اللغوية الكبيرة (LLMs). بينما تتضمن الأعمال الأخيرة غالبًا مزيجًا من هذين المنظورين، فإننا لا نصنف كل عمل بشكل محدد في نوع واحد. النماذج اللغوية الكبيرة، مثل GPT، مناسبة تمامًا لتطبيقات تعلم الأدوات [233]. بفضل قدراتها القوية في معالجة اللغة الطبيعية، يمكن للنماذج اللغوية الكبيرة تقسيم المهام المعقدة إلى مهام فرعية أصغر وتحويلها إلى تعليمات قابلة للتنفيذ. تتيح الأدوات المتخصصة للنماذج اللغوية الكبيرة الوصول إلى المعرفة التي تتجاوز فهمها الخاص. من خلال دمج الأدوات المتخصصة، يمكن للنماذج اللغوية الكبيرة فهم المشكلات المعقدة بشكل أفضل ثم معالجتها، مما يوفر حلولًا أكثر دقة وكفاءة للأنظمة المخصصة.

تُستخدم النماذج اللغوية الكبيرة عادة كعناصر تحكم لاختيار وإدارة نماذج الذكاء الاصطناعي المختلفة الموجودة لحل المهام المعقدة، والتي تعتمد على مدخلات المستخدم وواجهات اللغة في إجراء الملخصات. تعمل كنقطة مركزية، مسؤولة عن فهم بيانات المشكلة واتخاذ قرارات بشأن الإجراءات التي يجب تنفيذها. بالإضافة إلى ذلك، تجمع النتائج بناءً على نتائج الإجراءات المنفذة. في هذه الحالة، يستفيد HuggingGPT [243] من النماذج الموجودة من مجتمع Hugging Face

للمساعدة في

حل المهام. يجمع Visual ChatGPT [244] بين نماذج الأساس البصرية مثل BLIP [247] وStable Diffusion [248]، إلخ، مع LangChain

للتعامل مع المهام البصرية المعقدة، بينما يقوم ما يليTaskMatrix.AI[245] بالحفاظ على منصة API موحدة تمدد قدرات Visual ChatGPT، وتوسع قدرات Visual ChatGPT من خلال الحفاظ على منصة API موحدة، مما يمكّن المدخلات من عدة أنماط ويولد حلول مهام أكثر تعقيدًا. على العكس من ذلك، يعمل Auto-GPT

كوكيل يفهم بشكل مستقل الأهداف المحددة من خلال اللغة الطبيعية وينفذ جميع العمليات في حلقة آلية، دون الحاجة إلى مدخلات بشرية إلزامية.

يقدم WebGPT [239] بيئة تفاعلية لتصفح الويب تعتمد على النصوص، حيث تتعلم نماذج اللغة الكبيرة تقليد العملية الكاملة للتفاعل البشري مع متصفح الويب باستخدام تقنيات استنساخ السلوك وعينات الرفض. في ReAct [230]، من خلال الاستفادة من موجه بديهي، تتعلم نماذج اللغة الكبيرة توليد مسارات التفكير وإجراءات محددة للمهام بالتناوب عند حل مهمة معينة. يتم تفويض تنفيذ الإجراءات المحددة إلى الأدوات المقابلة، ويتم استخدام التعليقات الخارجية التي تم الحصول عليها من هذه الأدوات للتحقق من صحة وتوجيه عملية التفكير بشكل أكبر. تتماشى الدوافع وراء Toolformer [246] بشكل وثيق مع ReAct؛ ومع ذلك، فإنها تتقدم خطوة إلى الأمام من خلال دمج أدوات متنوعة ضمن نموذج واحد. يوفر هذا التكامل للنموذج قدرات مرنة في اتخاذ القرار وقدرات تحسين عامة، تم تحقيقها من خلال طريقة بسيطة ولكن فعالة للتعلم الذاتي. على عكس الأعمال السابقة، تتبنى LATM [249] نهجًا جديدًا من خلال تمكين نماذج اللغة الكبيرة من توليد الأدوات مباشرة. تحقق تقسيم العمل ضمن عملية حل المهام من خلال استخدام نماذج اللغة الكبيرة على مقاييس مختلفة: صانع الأدوات، مستخدم الأدوات، والموزع. تتكون LATM بالكامل من نماذج اللغة الكبيرة، مما يمكّن من توليد الأدوات واستخدامها ذاتيًا.

9.2 التطبيقات في سيناريوهات التخصيص

مؤخراً، أظهرت نماذج اللغة الكبيرة قدرات مثيرة للإعجاب في الاستفادة من المعرفة الداخلية للعالم والتفكير السليم لفهم نية المستخدم بدقة من الحوارات. علاوة على ذلك، يمكن لنماذج اللغة الكبيرة التواصل مع المستخدمين بطلاقة باللغة الطبيعية، مما يوفر تجربة مستخدم سلسة وممتعة. تجعل هذه المزايا نماذج اللغة الكبيرة خيارًا جذابًا كعملاء توصية لتعزيز التجربة المخصصة.

ومع ذلك، على الرغم من سعة الذاكرة المثيرة للإعجاب لنماذج اللغة الكبيرة، فإنها تواجه تحديات في تذكر المعرفة المحددة في المجالات الخاصة والخاصة دون تدريب كافٍ. على سبيل المثال، يمكن أن يكون تخزين مجموعة العناصر وجميع ملفات تعريف المستخدمين في نظام التوصية تحديًا لنماذج اللغة الكبيرة. يمكن أن تؤدي هذه القيود إلى توليد نماذج اللغة الكبيرة استجابات غير دقيقة أو خاطئة وتجعل من الصعب التحكم في سلوكها ضمن مجال معين. علاوة على ذلك، تواجه نماذج اللغة الكبيرة تحدي مشكلة التعميم الزمني حيث تستمر المعرفة الخارجية في التطور والتغيير مع مرور الوقت. لمعالجة هذه القضايا، يمكن استخدام أدوات متنوعة لتعزيز نماذج اللغة الكبيرة وزيادة فعاليتها كعملاء توصية. يوضح الجدول 4 أمثلة على الأدوات ذات الصلة.

محرك البحث تُستخدم محركات البحث على نطاق واسع لتوفير المعرفة الخارجية لنماذج اللغة الكبيرة، مما يقلل من عبء الذاكرة على نماذج اللغة الكبيرة ويخفف من حدوث الهلوسة في استجابات نماذج اللغة الكبيرة. يستخدم BlenderBot 3 [250] مجموعات بيانات محددة لضبط سلسلة من الوحدات، مما يمكّن نماذج اللغة الكبيرة من تعلم استدعاء محرك البحث في الوقت المناسب واستخراج المعرفة المفيدة من نتائج الاسترجاع. تتعلم LaMDA [238] استخدام مجموعة أدوات تتضمن IR

الجدول 4 أساليب تعلم الأدوات المعتمدة على نماذج اللغة الكبيرة

النهج	استخدام الأداة	العمود الفقري لنموذج اللغة الكبيرة	المهمة
Re3 [234]	نموذج اللغة الكبيرة	gpt3-instruct-175B gpt3-instruct13B	توليد قصص طويلة
PEER [235]	نموذج اللغة الكبيرة	T5 المعدل	الإصدارات، الاقتباسات، الاقتباسات
METALM [236]	مشفرات مدربة مسبقًا مع أوضاع متنوعة	محول (مدرب من الصفر)	مهام اللغة فقط مهام اللغة البصرية
Atlas [237]	مسترجع كثيف	T5	مهام اللغة المعتمدة على المعرفة فهم اللغة متعددة المهام الإجابة على الأسئلة التحقق من الحقائق
LaMDA [238]	مسترجع مترجم آلة حاسبة	محول فقط للترميز	الحوار
WebGPT [239]	متصفح الويب	gpt-3	الإجابة على الأسئلة
عين العقل [240]	محرك فيزياء نص إلى كود LM	gpt-3 PaLM	التفكير
PAL [241]	مترجم بايثون	CODEX(code-davinci-002)	التفكير الرمزي الرياضي الخوارزمي
SayCan [242]	روبوتات	PaLM	المهام الروبوتية في العالم الحقيقي
HuggingGPT [243]	نماذج الذكاء الاصطناعي في مجتمع Hugging Face	gpt-3.5-turbo text-davinci-003 gpt4	تصنيف الصور، وصف الصور، كشف الكائنات، إلخ.
Auto-GPT	متصفح الويب	gpt-3.5-turbo text-davinci-003 gpt4	المهام المحددة من قبل المستخدم
Visual ChatGPT [244] Taskmatrix.AI [245]	نماذج الأساس البصرية نماذج مخصصة مع شكل واجهة برمجة التطبيقات الموحدة	text-davinci-003	المهام المخصصة البصرية
ReAct [230]	واجهة برمجة تطبيقات ويكيبيديا	PaLM-540B	الإجابة على الأسئلة التحقق من الوجه
Toolformer [246]	آلة حاسبة نظام الأسئلة والأجوبة محرك البحث نظام الترجمة التقويم	GPT-J	المهام اللاحقة

نظام، مترجم، وآلة حاسبة من خلال الضبط الدقيق لتوليد استجابات أكثر دقة. RETA-LLM [251] هو مجموعة أدوات لنماذج اللغة الكبيرة المعززة بالاسترجاع. تفصل تمامًا بين أنظمة IR ونماذج اللغة الكبيرة، مما يسهل تطوير أنظمة قائمة على نماذج اللغة الكبيرة في المجال. يظهر Thoppilan وآخرون [238] حالة من تطبيق LaMDA في توصية المحتوى. مشروطًا على عدد قليل من الحوارات الخاصة بالأدوار، يمكن أن تلعب LaMDA دور وكيل توصية موسيقية.

محرك التوصية حاولت بعض الأعمال تخفيف عبء الذاكرة على نماذج اللغة الكبيرة من خلال تجهيزها بمحرك توصية كأداة، مما يمكّن نماذج اللغة الكبيرة من تقديم توصيات مستندة إلى مجموعة العناصر. يتم تقسيم محرك التوصية في ChatREC [225] إلى مرحلتين: الاسترجاع وإعادة التصنيف، مما يتماشى مع استراتيجيات نظام التوصية التقليدية. في مرحلة الاسترجاع، تستخدم نماذج اللغة الكبيرة أنظمة التوصية التقليدية كأدوات لاسترجاع 20 عنصرًا من مجموعة العناصر كمجموعة عناصر مرشحة. بعد ذلك، تستخدم نماذج اللغة الكبيرة نفسها كأدوات لإعادة تصنيف مجموعة العناصر المرشحة. تتيح قدرة نماذج اللغة الكبيرة على التفكير السليم، جنبًا إلى جنب مع المعرفة الداخلية الموجودة فيها، تقديم تفسيرات لنتائج الفرز. الأداة المستخدمة في محرك التوصية في RecLLM [226] مشابهة جدًا لها في Chat-REC، وهي أيضًا مقسمة إلى مراحل الاسترجاع وإعادة التصنيف. يوفر RecLLM العديد من الحلول العملية للاسترجاعات على نطاق واسع، مثل نموذج الترميز المزدوج العام والبحث القائم على المفهوم، وما إلى ذلك.

قاعدة البيانات تُستخدم قواعد البيانات أيضًا كأدوات لتكملة المعلومات الإضافية لنماذج اللغة الكبيرة. من أجل التعامل بشكل أفضل مع مشكلة البداية الباردة للعناصر الجديدة وتخفيف مشكلة التعميم الزمني لنماذج اللغة الكبيرة، يتم استخدام قاعدة بيانات متجهة لتوفير المعلومات للعناصر الجديدة التي لا تعرفها نماذج اللغة الكبيرة في Chat-REC [225]. عند مواجهة عناصر جديدة، يمكن لنماذج اللغة الكبيرة استخدام هذه القاعدة للوصول إلى معلومات حولها بناءً على التشابه بين تضمين طلب المستخدم وتضمينات العناصر في القاعدة. يمكن أن تساعد ملفات تعريف المستخدمين أيضًا نماذج اللغة الكبيرة في فهم نية المستخدم بشكل أفضل. يستخدم RecLLM [226] وحدة ملف تعريف المستخدم كأداة لتخزين حقائق ذات مغزى ودائمة حول المستخدمين التي تم التعرض لها خلال المحادثات التاريخية في ذاكرة المستخدم واسترجاع حقيقة واحدة تتعلق بالحوار الحالي عند الحاجة.

على الرغم من أن بعض الأعمال قد طبقت مفهوم تعلم الأدوات على أنظمة التخصيص، لا تزال هناك مواضيع بحثية مثيرة وواعدة تستحق الاستكشاف. 1) ضبط النماذج لاستخدام أفضل للأدوات. لقد أظهرت التعلم في السياق وعدًا في تعليم نماذج اللغة الكبيرة كيفية استخدام الأدوات بفعالية مع عدد قليل من العروض، كما هو موضح في Chat-REC و RecLLM. ومع ذلك، غالبًا ما واجهت نماذج اللغة الكبيرة صعوبة في تعلم استراتيجيات التعامل مع السياقات المعقدة مع عروض محدودة. يعتبر الضبط خيارًا قابلاً لتحسين استخدام الأدوات، لكنه يتطلب بيانات تدريب كافية وتقنيات فعالة. يقوم RecLLM بمزيد من الضبط لبعض وحداته باستخدام بيانات اصطناعية تم إنشاؤها بواسطة محاكي مستخدم من خلال تقنية RLHF [37]. إن التحقيق في طرق الحصول على بيانات تدريب كافية وتطوير تقنيات ضبط مخصصة لأنظمة التوصية هو اتجاه بحثي يستحق العناء. 2) تطوير محرك توصية أكثر قوة. غالبًا ما تعتمد أنظمة التوصية التقليدية على إشارات التصفية التعاونية وعلاقات الانتقال من عنصر إلى عنصر للتوصيات. ومع ذلك، مع استخدام نماذج اللغة الكبيرة كنماذج أساسية، يمكن أن تعكس تفضيلات المستخدمين من خلال اللغة الطبيعية وحتى الصور. لذلك، فإن تطوير محرك توصية يدعم البيانات متعددة الوسائط هو اتجاه بحثي حاسم. بالإضافة إلى ذلك، يجب أن يكون محرك التوصية قادرًا أيضًا على ضبط مجموعة المرشحين بناءً على تفضيلات المستخدم أو ملاحظاته (مثل استعلام عن أفلام من نوع معين أو عدم الإعجاب بعنصر في مجموعة التوصيات). 3) بناء المزيد من الأدوات. لتزويد نماذج اللغة الكبيرة بمعلومات أكثر أصالة وتخصيصًا، فإن تطوير أدوات إضافية أمر حاسم. على سبيل المثال، يمكن أن تعزز واجهات برمجة التطبيقات لاستعلام الرسوم البيانية المعرفية [252] أو الوصول إلى العلاقات الاجتماعية للمستخدمين المعرفة المتاحة لنماذج اللغة الكبيرة، مما يمكّن من تقديم توصيات أكثر دقة وتخصيصًا.
يمكن أن تعزز العلاقات الاجتماعية المعرفة المتاحة لنماذج اللغة الكبيرة، مما يمكّن من تقديم توصيات أكثر دقة وتخصيصًا.

10 نماذج اللغة الكبيرة كمنشئ محتوى مخصص

تركز أنظمة التوصية التقليدية على اقتراح العناصر الموجودة بناءً على تفضيلات المستخدم والبيانات التاريخية، حيث يتم عرض المحتوى الذي تم إنشاؤه بالفعل للاسترجاع. ومع ذلك، مع التقدم في التقنيات والمنصات لمنشئي المحتوى، جذب منشئ المحتوى المخصص المزيد والمزيد من الانتباه، حيث يتم تخصيص محتوى أكثر جاذبية ليتناسب مع اهتمامات وتفضيلات المستخدم، خاصة في مجال الإعلان عبر الإنترنت [253]. تحتوي المحتويات الشائعة على المحتويات المرئية والدلالية [254-256]، مثل العنوان، الملخص، الوصف، الكتابات الإعلانية، لافتات الإعلانات، الصورة المصغرة، ومقاطع الفيديو. موضوع آخر تم مناقشته على نطاق واسع هو توليد إعلانات نصية، حيث يتم توليد عنوان الإعلان ووصف الإعلان بمعلومات مخصصة. اعتمدت الأعمال السابقة على القوالب المحددة مسبقًا [254، 257، 258] لتقليل الجهد البشري الكبير، والذي، مع ذلك، غالبًا ما يفشل في تلبية اهتمامات وتفضيلات المستخدم بالكامل. ظهرت مؤخرًا طرق مدفوعة بالبيانات، والتي تدمج ملاحظات المستخدم كمكافآت في إطار التعلم المعزز لتوجيه عملية التوليد [259-262]. علاوة على ذلك، لعب دمج نماذج اللغة المدربة مسبقًا دورًا كبيرًا في تحسين عملية التوليد لعدة عناصر محتوى [263-266]. يساعد هذا الدمج في تحسين نماذج توليد المحتوى وزيادة قدرتها على تلبية تفضيلات المستخدم بشكل فعال.

مع استمرار تطور أنظمة التوصية ونماذج اللغة الكبيرة، فإن تقنية واعدة ستجلب فرصًا جديدة هي دمج المحتوى الذي تم إنشاؤه بواسطة الذكاء الاصطناعي (AIGC). يتضمن AIGC [267] إنشاء محتوى رقمي، مثل الصور والموسيقى واللغة الطبيعية من خلال نماذج الذكاء الاصطناعي، بهدف جعل عملية إنشاء المحتوى أكثر كفاءة وسهولة. ركزت الجهود السابقة في هذا المجال على نماذج توليد تعتمد على التعلم العميق، بما في ذلك الشبكات التنافسية التوليدية (GANs) [268]، والمشفرات التلقائية المتغيرة (VAEs) [269]، وتدفقات التطبيع [270]، ونماذج الانتشار [271] لتوليد صور عالية الجودة. مع تطور النموذج التوليدي، يظهر في النهاية كهيكل المحول [28]، الذي يعمل ككتل أساسية لـ BERT [29] و GPT [224] في مجال معالجة اللغة الطبيعية، ولـ Vision Transformer (ViT) [272] و Swin Transformer [273] في مجال الرؤية الحاسوبية. علاوة على ذلك، توسع نطاق مهام التوليد من المهام أحادية الوضع إلى المهام متعددة الأوضاع، بما في ذلك النموذج التمثيلي CLIP [274]، الذي يمكن استخدامه كمشفرات صور مع تحفيز متعدد الأوضاع للتوليد. أصبحت التوليد متعددة الأوضاع جانبًا أساسيًا من AIGC، الذي يتعلم الاتصال والتفاعل متعدد الأوضاع، وعادة ما يتضمن توليد لغة الرؤية [274]، وتوليد نصوص صوتية [275]، وتوليد نصوص الرسوم البيانية [276]، وتوليد نصوص الشيفرة [277]. مع ظهور نماذج اللغة الكبيرة، يتم تحقيق AIGC اليوم من خلال استخراج النية البشرية من التعليمات وتوليد المحتوى وفقًا لمعرفته ونيته. لقد حصلت المنتجات التمثيلية، بما في ذلك ChatGPT [278]، DALL-E-2 [279]، Codex [280] و Midjourney [281]، على اهتمام كبير من المجتمع. مع زيادة البيانات وحجم النموذج، يمكن للنموذج تعلم معلومات أكثر شمولاً وبالتالي يؤدي إلى منشئي محتوى أكثر واقعية وعالية الجودة.

بالعودة إلى منشئ المحتوى المخصص، ستجلب نماذج اللغة الكبيرة الفرص من النقاط التالية (الشكل 6). ستقوم نماذج اللغة الكبيرة بتوسيع

الشكل 6 مزايا منشئ المحتوى مع نماذج اللغة الكبيرة

قدرات النموذج المدرب مسبقًا، مما يسمح بتحسين أفضل لاستنتاج نية واهتمام المستخدم المخصص. قد يتم تحسين الطرق السابقة [264، 265] التي تعتمد على نماذج التدريب المسبق المخصصة لتحسين قدرات الاستنتاج وتحفيز القليل من العروض. ثانيًا، يمكن تطبيق استراتيجية التعلم المعزز من ملاحظات البشر (RLHF) لضبط النماذج لالتقاط معلومات نية المستخدم بشكل أفضل، مشابهًا لإطار العمل القائم على التعلم المعزز الحالي [260] لتوليد إعلانات نصية. وآخرًا وليس آخرًا، فإن القدرات التوليدية القوية لنماذج اللغة الكبيرة تمكّن من إنشاء واقعي بفضل توفر قواعد المعرفة متعددة الأوضاع الكافية. العمل [282] يقترح بشكل أكثر تحديدًا نموذج توصية قائم على ChatGPT، حيث تتلقى عملية التوليد ملاحظات وجولات متعددة من التحويلات لالتقاط تفضيلات المستخدم الصريحة بشكل أفضل. مقارنةً بأنماط التدريب السابقة، يمكن لنماذج اللغة الكبيرة فهم تعبيرات أكثر وضوحًا لاهتمام المستخدم وتحويلها إلى تعليمات مناسبة لتوجيه توليد المحتوى، مما يخفف بشكل كبير من مشكلة الملاحظات النادرة للغاية. يوفر دمج AIGC (الذكاء الاصطناعي والمحتوى التوليدي) مع أنظمة التوصية فرصًا قيمة في سيناريوهات الأعمال المختلفة. في التجارة الإلكترونية، يمكن أن تدعم نماذج اللغة الكبيرة الدردشة الآلية لتوصيات المنتجات المخصصة وإنشاء محتوى جذاب لجذب المستخدمين. في خدمة العملاء، يمكّن AIGC من الاستجابات الآلية، والأسئلة الشائعة، والمساعدة المخصصة، مما يحسن من كفاءة الدعم ورضا العملاء. بشكل عام، يعزز دمج AIGC مع أنظمة التوصية تجارب المستخدمين ويدفع نمو الأعمال.

ومع ذلك، هناك خطران رئيسيان يتعلقان بالأمان والخصوصية لمنشئي المحتوى المخصص. أحد المخاوف هو موثوقية نماذج مثل ChatGPT من حيث الحقائق، كما هو موضح في العمل [283]. بينما تولد هذه النماذج محتوى يبدو معقولًا، هناك خطر توزيع معلومات مضللة أو غير دقيقة، مما يمكن أن يضعف مصداقية محتوى الإنترنت. تصبح هذه المشكلة أكثر أهمية في التوصيات المخصصة، حيث قد يروج النموذج عن غير قصد لمعلومات مضللة مصممة لتناسب اهتمامات المستخدم. يتعلق القلق الثاني بخصوصية البيانات، التي تشمل كل من ملفات تعريف المستخدمين وتواريخ التفاعل البشرية طويلة الأمد. في حالة نماذج اللغة الكبيرة، يتم جمع هذه التواريخ أو مشاركتها، مما قد يؤدي إلى
نماذج تتذكر بيانات المستخدم الحساسة. أظهرت الأعمال السابقة [284] أن نماذج اللغة الكبيرة، وخاصة GPT-2 [285]، تتذكر وتكشف عن أمثلة تدريب فردية. وهذا يبرز الحاجة إلى موافقة صارمة من المستخدم والتعامل الحذر مع بيانات المعلقين لتقليل مخاطر الخصوصية. من الضروري تطوير تقنيات جديدة تعطي الأولوية للحفاظ على الخصوصية خلال عملية التدريب.

11 التحديات المفتوحة

11.1 صعوبات النشر

11.1.1 التحديات الصناعية

تعتبر خدمات التخصيص، وخاصة مع أنظمة التوصية، منتجات صناعية معقدة تواجه العديد من التحديات عند تنفيذها في سيناريوهات العالم الحقيقي. سنلخص الآن التحديات الرئيسية على النحو التالي:

توسيع الموارد الحاسوبية تتطلب نماذج اللغة الكبيرة الحالية، مثل BERT و GPT، طاقة حاسوبية كبيرة للتدريب والاستدلال. يشمل ذلك استخدام ذاكرة عالية واستهلاك وقت. يمكن أن يكون ضبط هذه النماذج لتوافقها مع أنظمة التخصيص، التي أظهرت نتائج واعدة لتحسين أداء التخصيص، مكلفًا من الناحية الحاسوبية. تم تطوير العديد من استراتيجيات الضبط الفعالة، مثل ضبط الخيارات في M6-Rec [175]، Lora [286]، QLora [287]، لمعالجة هذه المشكلة وتمهيد الطريق لضبط أكثر كفاءة.

زمن استجابة كبير إن تحقيق أوقات استجابة فعالة أمر حاسم للخدمة عبر الإنترنت ويؤثر بشكل كبير على تجربة المستخدم المخصصة. يشمل زمن الاستجابة كل من مرحلة الاستدلال لنماذج اللغة الكبيرة وطلبات المستخدم المتزامنة بأعداد كبيرة. يمكن أن يؤدي إدخال نماذج اللغة الكبيرة إلى زيادة زمن الاستدلال بشكل كبير، مما يشكل تحديًا للنشر في العالم الحقيقي. إحدى الطرق هي حساب تمثيلات المخرجات الوسيطة من نماذج اللغة مسبقًا، وتخزينها وفهرستها في قاعدة بيانات متجهات، خاصة للطرق التي تستخدم نماذج اللغة الكبيرة كمشفرات نصية. تهدف طرق أخرى، مثل التقطير والتكميم، إلى تحقيق توازن بين الأداء والكمون.

11.1.2 جمع البيانات الشاق

من المعروف على نطاق واسع أن نماذج اللغة الكبيرة تستفيد من كميات هائلة من المعرفة في المجالات المفتوحة خلال عمليات التدريب والضبط الخاصة بها. تشمل هذه المصادر المعرفية مراجع معروفة مثل ويكيبيديا والكتب ومواقع الويب المختلفة [3]. بالمثل، عند تطبيقها في أنظمة التوصية، غالبًا ما تعتمد هذه النماذج على مجموعات بيانات تمثيلية في المجالات المفتوحة مثل MovieLens و Amazon Books. بينما تحتوي هذه المعرفة في المجالات المفتوحة على ثروة من المعلومات البديهية، تتطلب المهام المخصصة الوصول إلى بيانات أكثر تخصصًا لا يمكن مشاركتها بسهولة. بالإضافة إلى ذلك، يمكن أن تكون طبيعة ملاحظات المستخدم في المهام المخصصة معقدة ونادرة، وغالبًا ما تكون مصحوبة بملاحظات ضوضائية. إن جمع وتصنيف هذه البيانات، على عكس الحصول على المعرفة البديهية، يمثل تحديات. يتطلب ذلك تكاليف عمل أعلى ويقدم تكرار تدريب إضافي بسبب الحاجة إلى معالجة وتصنيف البيانات بشكل مكثف. علاوة على ذلك، فإن تصميم مطالبات مناسبة لتوجيه أو ضبط نماذج اللغة الكبيرة أمر حاسم لمواءمتها مع توزيع المدخلات في المجال في مهام التخصيص. من خلال تخصيص المطالبات بعناية،
يمكن للباحثين والممارسين توجيه النموذج لإنتاج مخرجات تلبي بشكل أفضل التطبيقات المخصصة، مما يزيد من الأداء والفعالية.

11.2 قدرات نماذج اللغة الكبيرة

11.2.1 نمذجة النصوص الطويلة

تواجه نماذج اللغة الكبيرة حدًا على الحد الأقصى لعدد الرموز المدخلة التي يمكنها التعامل معها، وعادة ما تكون مقيدة بحجم نافذة السياق، مثل 4096 لـ ChatGPT. وهذا يشكل تحديات عند التعامل مع تسلسلات سلوك المستخدم الطويلة، والتي تكون شائعة في أنظمة التوصية الحديثة. من الضروري تصميم بعناية لتوليد مدخلات مطالبات فعالة ومناسبة ضمن هذا الطول المحدود. في حالة المحادثات ذات الجولات المتعددة، يمكن أن يتجاوز تراكم عدة جولات من الحوار بسهولة حد الرموز للنماذج. النهج الحالي في التعامل مع المحادثات الطويلة هو تقصير التاريخ، مع الاحتفاظ فقط بأحدث الرموز. ومع ذلك، فإن هذا التقطيع يتجاهل معلومات تاريخية قيمة، مما قد يضر بأداء النماذج. لمعالجة هذه التحديات، يمكن استخدام عدة تقنيات. إحدى الطرق هي إعطاء الأولوية واختيار الأجزاء الأكثر صلة من تسلسل سلوك المستخدم أو تاريخ المحادثة لتضمينها في المطالبة. يمكن أن يكون هذا الاختيار بناءً على معايير مختلفة مثل الحداثة أو الأهمية أو الصلة بالمهمة المطروحة. تتضمن تقنية أخرى تلخيص أو ضغط المدخلات الطويلة مع الحفاظ على المعلومات الأساسية. يمكن تحقيق ذلك من خلال تقنيات مثل التلخيص الاستخراجي أو تمثيل التسلسل الطويل في شكل مكثف. علاوة على ذلك، يمكن استكشاف التعديلات المعمارية، مثل النماذج الهرمية أو المعززة بالذاكرة، للتعامل بشكل أفضل مع التسلسلات الطويلة من خلال دمج آليات لتخزين واسترجاع المعلومات ذات الصلة بكفاءة.

بالإضافة إلى ذلك، فإن النمذجة التعاونية لبيانات النصوص الطويلة ومهام التوصية تمثل تحديًا ناشئًا وملحًا. في أنظمة التخصيص التقليدية، يتم استخدام معلومات معرف العنصر جنبًا إلى جنب مع معلومات فئوية أخرى بشكل شائع لنمذجة تفاعلات الميزات وتفضيلات المستخدم. مع ظهور نماذج اللغة الكبيرة، سيكون هناك اتجاه متزايد نحو الاستفادة من المعلومات النصية بشكل أكثر شمولاً. توفر البيانات النصية رؤى فريدة حول العناصر أو المستخدمين، مما يجعلها قيمة لأغراض النمذجة. من منظور النمذجة، يتطلب التعامل مع بيانات النصوص الطويلة مزيدًا من الاهتمام والتعقيد مقارنةً بالبيانات الفئوية، ناهيك عن الحاجة إلى مطابقة نمذجة اهتمامات المستخدم. من منظور التنفيذ، يصبح من الضروري إصلاح خط الأنابيب بالكامل لاستيعاب متطلبات الكمون الفعال. هناك تحديات تقنية تتمثل في دمج بيانات النصوص الطويلة في نماذج التوصية وتقديمها في الوقت الحقيقي.

11.2.2 القابلية للتفسير والشرح

بينما توفر نماذج اللغة الكبيرة قدرات جيدة في التفكير، إلا أنها معروفة بطبيعة ‘الصندوق الأسود’، التي تكون معقدة وغير خطية بشكل كبير في حجمها الضخم وبنيتها الطبقية، مما يجعل من الصعب فهم كيفية عملها الداخلية وفهم عملية توليد التوصيات. بدون فهم عميق لكيفية عمل النموذج، يصبح من الصعب اكتشاف ومعالجة التحيزات أو ضمان توصيات عادلة وأخلاقية. بمجرد أن تفتقر الشفافية حول الآليات الداخلية، يكافح المستخدمون للثقة في القرارات التي يتخذها النظام وقبولها. غالبًا ما يرغب المستخدمون في الحصول على تفسيرات مفهومة للاختيارات الموصى بها. يتطلب معالجة تحدي قابلية تفسير النموذج وشرحه بحثًا يتضمن معالجة اللغة الطبيعية، والذكاء الاصطناعي القابل للتفسير، وتفاعل الإنسان مع الكمبيوتر، وأنظمة التوصية. إن تطوير تقنيات
التي تكشف عن كيفية عمل نماذج اللغة، وتسهيل توليد تفسيرات ذات مغزى ودقيقة، وتمكين طرق تقييم قوية هو التركيز الرئيسي. من خلال تقديم توصيات شفافة وقابلة للتفسير، يمكن للمستخدمين بناء الثقة، وفهم الأسباب وراء التوصيات، واتخاذ قرارات مستنيرة.

11.3 التقييم

11.3.1 الأدوات والمعايير

تعتمد أنظمة التخصيص التقليدية عادةً على مقاييس محددة للمهام مثل مقاييس الترتيب، NDCG، AUC، وRecall لتقييم أداء النموذج. ومع ذلك، مع دمج نماذج اللغة الكبيرة في أنظمة التوصية، تخضع أدوات التقييم والمعايير لتغييرات كبيرة. قد لا تكون المقاييس التقليدية كافية لالتقاط أداء أنظمة التوصية المدعومة بنماذج اللغة الكبيرة، التي تقدم قدرات جديدة وتولد توصيات بطريقة مختلفة وتتطلب تطوير أدوات تقييم جديدة.

أحد الجوانب الحاسمة في التقييم هو أخذ تفضيلات المستخدمين في الأنظمة المدعومة بنماذج اللغة الكبيرة في الاعتبار، مما يتطلب نهجًا يركز على المستخدم. تصبح مقاييس مثل رضا المستخدم، والانخراط، والتجربة العامة اعتبارات أساسية. على سبيل المثال، يقترح عمل ليو [162] مهمة جمع بيانات من الجمهور لتقييم جودة الشروحات والملخصات التي تم إنشاؤها، مما يوفر وسيلة لتقييم فعالية المحتوى الناتج. بالإضافة إلى ذلك، يمكن أن تكون استبيانات رضا المستخدمين واستبيانات التعليقات خيارات قيمة.

وجهة نظر أخرى يجب أخذها في الاعتبار هي صحة النظام، والتي تتضمن تقييم الجدة وتقييم عوامل مثل التنوع، والجدة، والصدفة، ومعدلات احتفاظ المستخدمين. تساعد هذه المقاييس في تقييم حداثة التوصيات وآثار نماذج اللغة الكبيرة على المدى الطويل. علاوة على ذلك، من الضروري تقييم قابلية تفسير التوصيات وإنصافها. يركز تقييم قابلية التفسير على قياس الوضوح، والفهم، والشفافية في التوصيات. في الوقت نفسه، يهدف تقييم الإنصاف إلى معالجة التحيزات المحتملة في النتائج المخصصة. من خلال إعطاء الأولوية للإنصاف، نسعى إلى خلق تجارب مخصصة تكون عادلة وشاملة لجميع المستخدمين. كلا هذين التقييمين ضروريان لتعزيز تجربة المستخدم العامة وبناء الثقة في التوصيات المخصصة التي يقدمها النظام.

11.3.2 التوازن بين الفائدة، الصدق، وعدم الإيذاء

عندما يتم استخدام نماذج اللغة الكبيرة للتخصيص، ستتضخم بعض عيوبها. قد يأتي السعي نحو نظام أكثر صدقًا وأقل ضررًا على حساب أداء النظام.

أولاً وقبل كل شيء، يجب ضمان دقة وموضوعية النظام. على الرغم من أن نماذج اللغة الكبيرة يمكن أن تولد محتوى يبدو معقولاً، إلا أن هناك خطرًا في نشر معلومات مضللة أو غير دقيقة. يصبح هذا الأمر أكثر أهمية عند دمج ملاحظات المستخدمين، حيث قد يحاكي النموذج سلوكيات المستخدمين في محاولة للظهور بمظهر الصدق. ومع ذلك، يمكن أن تؤدي هذه المحاكاة إلى توجيه متحيز للمستخدمين، مما لا يقدم أي فوائد حقيقية.

ثانيًا، من حيث عدم الضرر، تثار المخاوف بشأن الخصوصية والتمييز والأخلاقيات. بينما تمتلك نماذج اللغة الكبيرة القدرة على تقديم توصيات مخصصة للغاية من خلال الاستفادة من بيانات المستخدم، تصبح الخصوصية وأمان البيانات في غاية الأهمية. على عكس مجموعات البيانات المفتوحة، يجب أن تكون خصوصية البيانات الفردية المستخدمة في التدريب صارمة.
محمي، مع أذونات مستخدم صارمة لمشاركة معلوماتهم الشخصية. بالنسبة للتمييز، قد تعكس نماذج اللغة الكبيرة بشكل لا مفر منه التحيزات الموجودة في بيانات التدريب، مما يؤدي إلى توصيات تمييزية. بالنظر إلى توزيع المستخدمين والعناصر المتحيز، والذي يكون أكثر أهمية في أنظمة التوصية ذات تأثير الذيل الطويل، حيث يمكن أن يؤدي توزيع المستخدمين والعناصر المتحيز إلى قرارات تفضل الخيارات الأكثر شيوعًا، مما يؤدي إلى التمييز ضد بعض المستخدمين. تتعلق القلق النهائي بالاعتبارات الأخلاقية. يمكن أن توجه الرسائل الضارة، إذا تم النقر عليها من قبل المستخدمين بشكل غير واعٍ، نماذج اللغة الكبيرة نحو إنتاج محتوى ضار مشابه. ومع ذلك، عند المساعدة في اتخاذ القرارات الشخصية، من الضروري أن تمتلك نماذج اللغة الكبيرة القدرة على تقليل التعرض للرسائل الضارة وتوجيه المستخدمين بطريقة مسؤولة. قد تقدم أساليب مثل بناء ذكاء اصطناعي دستوري، حيث يتم اعتماد النقد والمراجعات والتعلم تحت الإشراف لتحسين نماذج التدريب، رؤى قيمة.

من خلال معالجة هذه المخاوف، وحماية الخصوصية، والتقليل من التمييز، والالتزام بالإرشادات الأخلاقية، يمكن أن تستفيد أنظمة التوصية من قوة نماذج اللغة الكبيرة مع ضمان ثقة المستخدمين، والعدالة، والتوصيات المسؤولة.

12 الخاتمة

في الختام، يمثل ظهور نماذج اللغة الكبيرة اختراقًا كبيرًا في مجال الذكاء الاصطناعي. لقد فتحت قدراتها المحسنة في الفهم، وتحليل اللغة، والتفكير السليم آفاقًا جديدة للتخصيص. في هذه الورقة، نقدم عدة وجهات نظر حول متى تتكيف نماذج اللغة الكبيرة مع أنظمة التخصيص. لقد لاحظنا تقدمًا من استخدام القدرات الأساسية لنماذج اللغة الكبيرة لتعزيز الأداء، إلى الاستفادة من إمكانياتها في التفاعلات المعقدة مع الأدوات الخارجية للمهام الشاملة. يعد هذا التطور بثورة في طريقة تقديم الخدمات المخصصة. كما نعترف بالتحديات المفتوحة التي تأتي مع دمج نماذج اللغة الكبيرة في أنظمة التخصيص.

مساهمات المؤلفين نظم جين تشين الكتابة الرئيسية وكتب النص الرئيسي للمخطوطة. نظم زينغ ليو الهيكل الكامل للمخطوطة وكتب مقدمة نماذج اللغة الكبيرة. كتب شيو هوانغ القسم 8: نماذج اللغة الكبيرة كوكيل محادثة. كتب تشينغوانغ وو القسم 6: نماذج اللغة الكبيرة كمفسر. كتب تشي ليو وغانغوي جيانغ القسم 5: نماذج اللغة الكبيرة كمفسر محتوى. كتب يوانهاو بو ويوشوان لي وشياولونغ تشين القسم 9: تعلم الأدوات وتطبيقاته في التوصية. كتبت شينغمي وانغ القسم 7.2 التفكير للاختيار الآلي. قام كاي تشنغ وديفو ليان وإنهونغ تشين بمراجعة المقالة مرارًا وتكرارًا. راجع جميع المؤلفين المخطوطة.

تم توفير تمويل الوصول المفتوح من قبل جامعة هونغ كونغ للعلوم والتكنولوجيا. تم دعم هذا العمل من قبل المؤسسة الوطنية للعلوم الطبيعية في الصين (NSFC) رقم 62022077.

توفر البيانات لم يتم إنشاء أو تحليل أي مجموعات بيانات خلال الدراسة الحالية.

الإعلانات

تناقض المصالح يعلن المؤلفون عدم وجود تناقضات في المصالح.
الموافقة الأخلاقية لا تتضمن هذه البحث مشاركين بشريين أو تشكل خطرًا محتملاً على الأفراد. وبالتالي، لم تكن الموافقة الأخلاقية الرسمية مطلوبة. ومع ذلك، نود أن نؤكد التزامنا بممارسات البحث الأخلاقية ونؤكد أن الدراسة تتماشى مع الإرشادات والمبادئ الأخلاقية المنصوص عليها في مدونة الأخلاقيات التابعة لجمعية الحاسبات الآلية.

الوصول المفتوح هذه المقالة مرخصة بموجب رخصة المشاع الإبداعي النسب 4.0 الدولية، التي تسمح بالاستخدام والمشاركة والتكيف والتوزيع وإعادة الإنتاج بأي وسيلة أو صيغة، طالما أنك تعطي الائتمان المناسب للمؤلفين الأصليين والمصدر، وتوفر رابطًا لرخصة المشاع الإبداعي، وتوضح إذا ما تم إجراء تغييرات. الصور أو المواد الأخرى من طرف ثالث في هذه المقالة مشمولة في رخصة المشاع الإبداعي الخاصة بالمقالة، ما لم يُشار إلى خلاف ذلك في سطر الائتمان للمواد. إذا لم تكن المادة مشمولة في رخصة المشاع الإبداعي الخاصة بالمقالة وكان استخدامك المقصود غير مسموح به بموجب اللوائح القانونية أو يتجاوز الاستخدام المسموح به، فسيتعين عليك الحصول على إذن مباشرة من صاحب حقوق الطبع والنشر. لعرض نسخة من هذه الرخصة، قم بزيارةhttp://creativecommons.org/licenses/by/4.0/.

References

Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al.: A survey of large language models. arXiv:2303.18223 (2023)
Huang, J., Chang, K.C.-C.: Towards reasoning in large language models: a survey. arXiv:2212.10403 (2022)
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877-1901 (2020)
Salemi, A., Mysore, S., Bendersky, M., Zamani, H.: Lamp: When large language models meet personalization. arXiv:2304.11406 (2023)
Wu, L., Zheng, Z., Qiu, Z., Wang, H., Gu, H., Shen, T., Qin, C., Zhu, C., Zhu, H., Liu, Q., et al.: A survey on large language models for recommendation. arXiv:2305.19860 (2023)
Lin, J., Dai, X., Xi, Y., Liu, W., Chen, B., Li, X., Zhu, C., Guo, H., Yu, Y., Tang, R., et al.: How can recommender systems benefit from large language models: a survey. arXiv:2306.05817 (2023)
Fan, W., Zhao, Z., Li, J., Liu, Y., Mei, X., Wang, Y., Tang, J., Li, Q.: Recommender systems in the era of large language models (llms). arXiv:2307.02046 (2023)
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: Grouplens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pp. 175-186 (1994)
Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q.: One-class collaborative filtering. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 502-511 (2008). IEEE
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30-37 (2009)
Wang, J., De Vries, A.P., Reinders, M.J.: Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 501-508 (2006)
Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: The adaptive Web: Methods and Strategies of Web Personalization, pp. 325-341. Springer (2007)
Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448-456 (2011)
Zhang, F., Yuan, N.J., Lian, D., Xie, X., Ma, W.-Y.: Collaborative knowledge base embedding for recommender systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353-362 (2016)
Liu, H., Wu, F., Wang, W., Wang, X., Jiao, P., Wu, C., Xie, X.: Nrpa: neural recommendation with personalized attention. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1233-1236 (2019)
Wang, H., Wang, N., Yeung, D.-Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244 (2015)
Zhou, G., Zhu, X., Song, C., Fan, Y., Zhu, H., Ma, X., Yan, Y., Jin, J., Li, H., Gai, K.: Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1059-1068 (2018)
Zhou, G., Mou, N., Fan, Y., Pi, Q., Bian, W., Zhou, C., Zhu, X., Gai, K.: Deep interest evolution network for click-through rate prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5941-5948 (2019)
Wang, X., He, X., Wang, M., Feng, F., Chua, T.-S.: Neural graph collaborative filtering. In: Proceedings of the 42 nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 165-174 (2019)
Kang, W.-C., McAuley, J.: Self-attentive sequential recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 197-206 (2018). IEEE
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. arXiv:1511.06939 (2015)
Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., Jiang, P.: Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1441-1450 (2019)
Paschou, M., Sakkopoulos, E.: Personalized assistant apps in healthcare: a systematic review. In: 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1-8 (2019). IEEE
Sun, Y., Zhang, Y.: Conversational recommender system. In: The 41st International Acm Sigir Conference on Research & Development in Information Retrieval, pp. 235-244 (2018)
Jannach, D., Manzoor, A., Cai, W., Chen, L.: A survey on conversational recommender systems. ACM Computing Surveys (CSUR) 54(5), 1-36 (2021)
Bengio, Y., Ducharme, R., Vincent, P.: A neural probabilistic language model. Adv. Neural. Inf. Process. Syst. 13 (2000)
Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Interspeech, vol. 2, pp. 1045-1048 (2010). Makuhari
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30 (2017)
Kenton, J.D.M.-W.C., Toutanova, L.K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171-4186 (2019)
Shanahan, M.: Talking about large language models. Commun. ACM 67(2), 68-79 (2024)
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: scaling language modeling with pathways. J. Mach. Learn. Res. 24(240), 1-113 (2023)
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: open and efficient foundation language models. arXiv:2302.13971 (2023)
Le Scao, T., Wang, T., Hesslow, D., Bekman, S., Bari, M.S., Biderman, S., Elsahar, H., Muennighoff, N., Phang, J., Press, O., et al.: What language model to train if you have one million gpu hours?. In: Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 765-782 (2022)
Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling laws for neural language models. arXiv:2001.08361 (2020)
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D.d.L., Hendricks, L.A., Welbl, J., Clark, A., et al.: Training compute-optimal large language models. arXiv:2203.15556 (2022)
Sanh, V., Webson, A., Raffel, C., Bach, S., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Raja, A., Dey, M., et al.: Multitask prompted training enables zero-shot task generalization. In: International Conference on Learning Representations (2021)
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. Adv. Neural. Inf. Process. Syst. 35, 27730-27744 (2022)
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., Zhou, D.: Chain of thought prompting elicits reasoning in large language models. arXiv:2201.11903 (2022)
Fu, Y., Peng, H., Khot, T.: How does gpt obtain its ability? tracing emergent abilities of language models to their sources. Yao Fu’s Notion (2022)
Zhang, Y., Chen, X., Ai, Q., Yang, L., Croft, W.B.: Towards conversational search and recommendation: system ask, user respond. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 177-186 (2018)
Wang, H., Zhang, F., Xie, X., Guo, M.: Dkn: deep knowledge-aware network for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844 (2018)
Huang, J., Zhao, W.X., Dou, H., Wen, J.-R., Chang, E.Y.: Improving sequential recommendation with knowledge-enhanced memory networks. In: The 41st international ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 505-514 (2018)
Wang, H., Zhang, F., Hou, M., Xie, X., Guo, M., Liu, Q.: Shine: signed heterogeneous information network embedding for sentiment link prediction. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 592-600 (2018)
Yu, X., Ren, X., Gu, Q., Sun, Y., Han, J.: Collaborative filtering with entity similarity regularization in heterogeneous information networks. IJCAI HINA 27 (2013)
Shi, C., Zhang, Z., Luo, P., Yu, P.S., Yue, Y., Wu, B.: Semantic path based personalized recommendation on weighted heterogeneous information networks. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 453-462 (2015)
Ma, W., Zhang, M., Cao, Y., Jin, W., Wang, C., Liu, Y., Ma, S., Ren, X.: Jointly learning explainable rules for recommendation with knowledge graph. In: The World Wide Web Conference, pp. 1210-1221 (2019)
Huang, X., Fang, Q., Qian, S., Sang, J., Li, Y., Xu, C.: Explainable interaction-driven user modeling over knowledge graph for sequential recommendation. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 548-556 (2019)
Wang, H., Zhang, F., Wang, J., Zhao, M., Li, W., Xie, X., Guo, M.: Ripplenet: propagating user preferences on the knowledge graph for recommender systems. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 417-426 (2018)
Wang, H., Zhao, M., Xie, X., Li, W., Guo, M.: Knowledge graph convolutional networks for recommender systems. In: The World Wide Web Conference, pp. 3307-3313 (2019)
Wang, X., He, X., Cao, Y., Liu, M., Chua, T.-S.: Kgat: knowledge graph attention network for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950-958 (2019)
Tang, X., Wang, T., Yang, H., Song, H.: Akupm: attention-enhanced knowledge-aware user preference model for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1891-1899 (2019)
Zhao, J., Zhou, Z., Guan, Z., Zhao, W., Ning, W., Qiu, G., He, X.: Intentgc: a scalable graph convolution framework fusing heterogeneous information for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2347-2357 (2019)
Petroni, F., Rocktäschel, T., Lewis, P., Bakhtin, A., Wu, Y., Miller, A.H., Riedel, S.: Language models as knowledge bases?. arXiv:1909.01066 (2019)
Roberts, A., Raffel, C., Shazeer, N.: How much knowledge can you pack into the parameters of a language model?. arXiv:2002.08910 (2020)
Petroni, F., Lewis, P., Piktus, A., Rocktäschel, T., Wu, Y., Miller, A.H., Riedel, S.: How context affects language models’ factual predictions. In: Automated Knowledge Base Construction
Jiang, Z., Xu, F.F., Araki, J., Neubig, G.: How can we know what language models know? Trans. Assoc. Comput. Linguist 8, 423-438 (2020)
Wang, C., Liu, X., Song, D.: Language models are open knowledge graphs. arXiv:2010.11967 (2020)
Poerner, N., Waltinger, U., Schütze, H.: E-bert: efficient-yet-effective entity embeddings for bert. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 803-818 (2020)
Heinzerling, B., Inui, K.: Language models as knowledge bases: On entity representations, storage capacity, and paraphrased queries. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 1772-1791 (2021)
Wang, C., Liu, P., Zhang, Y.: Can generative pre-trained language models serve as knowledge bases for closed-book qa?. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: long papers), pp. 3241-3251 (2021)
Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: Retrieval augmented language model pre-training. In: International Conference on Machine Learning, pp. 3929-3938 (2020). PMLR
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. Adv. Neural. Inf. Process. Syst. 26 (2013)
Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., Zhang, N.: Llms for knowledge graph construction and reasoning: recent capabilities and future opportunities. arXiv:2305.13168 (2023)
Zhang, Z., Liu, X., Zhang, Y., Su, Q., Sun, X., He, B.: Pretrain-kge: learning knowledge representation from pretrained language models. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 259-266 (2020)
Kumar, A., Pandey, A., Gadia, R., Mishra, M.: Building knowledge graph using pre-trained language model for learning entity-aware relationships. In: 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), pp. 310-315 (2020). IEEE
Kim, B., Hong, T., Ko, Y., Seo, J.: Multi-task learning for knowledge graph completion with pre-trained language models. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 1737-1743 (2020)
Choi, B., Jang, D., Ko, Y.: Mem-kgc: masked entity model for knowledge graph completion with pretrained language model. IEEE Access 9, 132025-132032 (2021)
Wang, B., Shen, T., Long, G., Zhou, T., Wang, Y., Chang, Y.: Structure-augmented text representation learning for efficient knowledge graph completion. In: Proceedings of the Web Conference 2021, pp. 1737-1748 (2021)
Xie, X., Zhang, N., Li, Z., Deng, S., Chen, H., Xiong, F., Chen, M., Chen, H.: From discrimination to generation: knowledge graph completion with generative transformer. In: Companion Proceedings of the Web Conference 2022, pp. 162-165 (2022)
Jiang, P., Agarwal, S., Jin, B., Wang, X., Sun, J., Han, J.: Text-augmented open knowledge graph completion via pre-trained language models. arXiv:2305.15597 (2023)
Yan, H., Gui, T., Dai, J., Guo, Q., Zhang, Z., Qiu, X.: A unified generative framework for various ner subtasks. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: long papers), pp. 5808-5822 (2021)
Li, B., Yin, W., Chen, M.: Ultra-fine entity typing with indirect supervision from natural language inference. Trans. Assoc. Comput. Linguist. 10, 607-622 (2022)
Kirstain, Y., Ram, O., Levy, O.: Coreference resolution without span representations. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 2: short papers), pp. 14-19 (2021)
Cattan, A., Eirew, A., Stanovsky, G., Joshi, M., Dagan, I.: Cross-document coreference resolution over predicted mentions. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 5100-5107 (2021)
Lyu, S., Chen, H.: Relation classification with entity type restriction. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 390-395 (2021)
Wang, H., Focke, C., Sylvester, R., Mishra, N., Wang, W.: Fine-tune bert for docred with two-step process. arXiv:1909.11898 (2019)
Han, J., Collier, N., Buntine, W., Shareghi, E.: Pive: prompting with iterative verification improving graph-based generative capability of llms. arXiv:2305.12392 (2023)
Trajanoska, M., Stojanov, R., Trajanov, D.: Enhancing knowledge graph construction using large language models. arXiv:2305.04676 (2023)
West, P., Bhagavatula, C., Hessel, J., Hwang, J., Jiang, L., Le Bras, R., Lu, X., Welleck, S., Choi, Y.: Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4602-4625 (2022)
Xi, Y., Liu, W., Lin, J., Zhu, J., Chen, B., Tang, R., Zhang, W., Zhang, R., Yu, Y.: Towards open-world recommendation with knowledge augmentation from large language models. arXiv:2306.10933 (2023)
Wei, W., Ren, X., Tang, J., Wang, Q., Su, L., Cheng, S., Wang, J., Yin, D., Huang, C.: Llmrec: large language models with graph augmentation for recommendation. In: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp. 806-815 (2024)
Zhao, Q., Qian, H., Liu, Z., Zhang, G.-D., Gu, L.: Breaking the barrier: utilizing large language models for industrial recommendation systems through an inferential knowledge graph. arXiv:2402.13750 (2024)
Razniewski, S., Yates, A., Kassner, N., Weikum, G.: Language models as or for knowledge bases. arXiv:2110.04888 (2021)
Yu, J., Wang, X., Tu, S., Cao, S., Zhang-Li, D., Lv, X., Peng, H., Yao, Z., Zhang, X., Li, H., et al.: Kola: Carefully benchmarking world knowledge of large language models. arXiv:2306.09296 (2023)
Ye, D., Lin, Y., Li, P., Sun, M.: Packed levitated marker for entity and relation extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (vol. 1: long papers), pp. 4904-4917 (2022)
Lang, K.: Newsweeder: Learning to filter netnews. In: Machine Learning Proceedings 1995, pp. 331-339. Elsevier (1995)
Wang, H., Shi, X., Yeung, D.-Y.: Collaborative recurrent autoencoder: recommend while learning to fill in the blanks. Adv. Neural. Inf. Process. Syst. 29 (2016)
Dong, X., Yu, L., Wu, Z., Sun, Y., Yuan, L., Zhang, F.: A hybrid collaborative filtering model with deep structure for recommender systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
Li, X., She, J.: Collaborative variational autoencoder for recommender systems. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 305-314 (2017)
Wu, C., Wu, F., Huang, Y., Xie, X.: Personalized news recommendation: methods and challenges. ACM Trans. Inf. Syst. 41(1), 24-12450 (2023)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML. JMLR Workshop and Conference Proceedings, vol. 32, pp. 1188-1196. JMLR.org (2014)
Song, Y., Elkahky, A.M., He, X.: Multi-rate deep learning for temporal recommendation. In: SIGIR, pp. 909-912. ACM (2016)
Kumar, V., Khattar, D., Gupta, S., Gupta, M., Varma, V.: Deep neural architecture for news recommendation. In: CLEF (Working Notes). CEUR Workshop Proceedings, vol. 1866. CEUR-WS.org (2017)
Okura, S., Tagami, Y., Ono, S., Tajima, A.: Embedding-based news recommendation for millions of users. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1933-1942 (2017)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (Workshop Poster) (2013)
Wu, C., Wu, F., An, M., Huang, J., Huang, Y., Xie, X.: NPA: neural news recommendation with personalized attention. In: KDD, pp. 2576-2584. ACM (2019)
An, M., Wu, F., Wu, C., Zhang, K., Liu, Z., Xie, X.: Neural news recommendation with long- and short-term user representations. In: ACL (1), pp. 336-345. Association for Computational Linguistics (2019)
Wu, C., Wu, F., Ge, S., Qi, T., Huang, Y., Xie, X.: Neural news recommendation with multi-head selfattention. In: EMNLP/IJCNLP (1), pp. 6388-6393. Association for Computational Linguistics (2019)
Wu, C., Wu, F., Qi, T., Huang, Y.: User modeling with click preference and reading satisfaction for news recommendation. In: IJCAI, pp. 3023-3029. ijcai.org (2020)
Khattar, D., Kumar, V., Varma, V., Gupta, M.: Weave&rec: a word embedding based 3-d convolutional network for news recommendation. In: CIKM, pp. 1855-1858. ACM (2018)
Zhu, Q., Zhou, X., Song, Z., Tan, J., Guo, L.: DAN: deep attention neural network for news recommendation. In: AAAI, pp. 5973-5980. rAAAI Press (2019)
Qiu, Z., Wu, X., Gao, J., Fan, W.: U-bert: Pre-training user representations for improved recommendation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 4320-4327 (2021)
Zhang, Q., Li, J., Jia, Q., Wang, C., Zhu, J., Wang, Z., He, X.: Unbert: user-news matching bert for news recommendation. In: IJCAI, pp. 3356-3362 (2021)
Wu, C., Wu, F., Qi, T., Huang, Y.: Empowering news recommendation with pre-trained language models. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1652-1656 (2021)
Liu, Q., Zhu, J., Dai, Q., Wu, X.: Boosting deep ctr prediction with a plug-and-play pre-trainer for news recommendation. In: Proceedings of the 29th International Conference on Computational Linguistics, pp. 2823-2833 (2022)
Wu, C., Wu, F., Qi, T., Zhang, C., Huang, Y., Xu, T.: Mm-rec: visiolinguistic model empowered multimodal news recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2560-2564 (2022)
Yu, Y., Wu, F., Wu, C., Yi, J., Liu, Q.: Tiny-newsrec: effective and efficient plm-based news recommendation. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5478-5489 (2022)
Zou, L., Zhang, S., Cai, H., Ma, D., Cheng, S., Wang, S., Shi, D., Cheng, Z., Yin, D.: Pre-trained language model based ranking in baidu search. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 4014-4022 (2021)
Liu, Y., Lu, W., Cheng, S., Shi, D., Wang, S., Cheng, Z., Yin, D.: Pre-trained language model for webscale retrieval in baidu search. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3365-3375 (2021)
Muhamed, A., Keivanloo, I., Perera, S., Mracek, J., Xu, Y., Cui, Q., Rajagopalan, S., Zeng, B., Chilimbi, T.: Ctr-bert: cost-effective knowledge distillation for billion-parameter teacher models. In: NeurIPS Efficient Natural Language and Speech Processing Workshop (2021)
He, J., Xu, B., Yang, Z., Han, D., Yang, C., Lo, D.: Ptm4tag: sharpening tag recommendation of stack overflow posts with pre-trained models. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 1-11 (2022)
Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J., El-Kishky, A.: Twhin-bert: a socially-enriched pre-trained language model for multilingual tweet representations. arXiv:2209.07562 (2022)
Rahmani, S., Naghshzan, A., Guerrouj, L.: Improving code example recommendations on informal documentation using bert and query-aware 1sh: a comparative study. arXiv:2305.03017 (2023)
Ding, H., Ma, Y., Deoras, A., Wang, Y., Wang, H.: Zero-shot recommender systems. arXiv:2105.08318 (2021)
Hou, Y., Mu, S., Zhao, W.X., Li, Y., Ding, B., Wen, J.-R.: Towards universal sequence representation learning for recommender systems. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 585-593 (2022)
Hou, Y., He, Z., McAuley, J., Zhao, W.X.: Learning vector-quantized item representation for transferable sequential recommenders. In: Proceedings of the ACM Web Conference 2023, pp. 1162-1171 (2023)
Yuan, Z., Yuan, F., Song, Y., Li, Y., Fu, J., Yang, F., Pan, Y., Ni, Y.: Where to go next for recommender systems? id-vs. modality-based recommender models revisited. arXiv:2303.13835 (2023)
Fu, J., Yuan, F., Song, Y., Yuan, Z., Cheng, M., Cheng, S., Zhang, J., Wang, J., Pan, Y.: Exploring adapter-based transfer learning for recommender systems: empirical studies and practical insights. arXiv:2305.15036 (2023)
Bao, K., Zhang, J., Zhang, Y., Wang, W., Feng, F., He, X.: Tallrec: an effective and efficient tuning framework to align large language model with recommendation. arXiv:2305.00447 (2023)
Kang, W.-C., Ni, J., Mehta, N., Sathiamoorthy, M., Hong, L., Chi, E., Cheng, D.Z.: Do llms understand user preferences?. evaluating llms on user rating prediction. arXiv:2305.06474 (2023)
Chen, Z.: Palr: Personalization aware llms for recommendation. arXiv:2305.07622 (2023)
Zhang, J., Xie, R., Hou, Y., Zhao, W.X., Lin, L., Wen, J.-R.: Recommendation as instruction following: a large language model empowered recommendation approach. arXiv:2305.07001 (2023)
Li, R., Deng, W., Cheng, Y., Yuan, Z., Zhang, J., Yuan, F.: Exploring the upper limits of text-based collaborative filtering using large language models: discoveries and insights. arXiv:2305.11700 (2023)
Wang, X., Chen, Y., Yang, J., Wu, L., Wu, Z., Xie, X.: A reinforcement learning framework for explainable recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 587-596 (2018). IEEE
Gao, J., Wang, X., Wang, Y., Xie, X.: Explainable recommendation through attentive multi-view learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3622-3629 (2019)
Lee, S., Wang, X., Han, S., Yi, X., Xie, X., Cha, M.: Self-explaining deep models with logic rule reasoning. Adv. Neural. Inf. Process. Syst. (2022)
Nye, M., Andreassen, A.J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., et al.: Show your work: scratchpads for intermediate computation with language models. arXiv:2112.00114 (2021)
Lampinen, A.K., Dasgupta, I., Chan, S.C., Matthewson, K., Tessler, M.H., Creswell, A., McClelland, J.L., Wang, J.X., Hill, F.: Can language models learn from explanations in context?. arXiv:2204.02329 (2022)
Zelikman, E., Wu, Y., Mu, J., Goodman, N.: Star: bootstrapping reasoning with reasoning. Adv. Neural. Inf. Process. Syst. 35, 15476-15488 (2022)
Zhang, Y., Chen, X., et al.: Explainable recommendation: a survey and new perspectives. Found. Trends® in Inf. Retr. 14(1), 1-101 (2020)
Schafer, J.B., Konstan, J., Riedl, J.: Recommender systems in e-commerce. In: Proceedings of the 1st ACM Conference on Electronic Commerce, pp. 158-166 (1999)
Linden, G., Smith, B., York, J.: Amazon. com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7(1), 76-80 (2003)
Gomez-Uribe, C.A., Hunt, N.: The netflix recommender system: algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst. (TMIS) 6(4), 1-19 (2015)
Sinha, R., Swearingen, K.: The role of transparency in recommender systems. In: CHI’02 Extended Abstracts on Human Factors in Computing Systems, pp. 830-831 (2002)
Xian, Y., Zhao, T., Li, J., Chan, J., Kan, A., Ma, J., Dong, X.L., Faloutsos, C., Karypis, G., Muthukrishnan, S., et al.: Ex3: explainable attribute-aware item-set recommendations. In: Proceedings of the 15th ACM Conference on Recommender Systems, pp. 484-494 (2021)
Wang, X., Li, Q., Yu, D., Xu, G.: Reinforced path reasoning for counterfactual explainable recommendation. arXiv:2207.06674 (2022)
Verma, S., Beniwal, A., Sadagopan, N., Seshadri, A.: Recxplainer: post-hoc attribute-based explanations for recommender systems. arXiv:2211.14935 (2022)
Zhang, W., Yan, J., Wang, Z., Wang, J.: Neuro-symbolic interpretable collaborative filtering for attributebased recommendation. In: Proceedings of the ACM Web Conference 2022, pp. 3229-3238 (2022)
Li, P., Wang, Z., Ren, Z., Bing, L., Lam, W.: Neural rating regression with abstractive tips generation for recommendation. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 345-354 (2017)
Dong, L., Huang, S., Wei, F., Lapata, M., Zhou, M., Xu, K.: Learning to generate product reviews from attributes. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: volume 1, long papers, pp. 623-632 (2017)
Li, L., Zhang, Y., Chen, L.: Generate neural template explanations for recommendation. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 755-764 (2020)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735-1780 (1997)
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078 (2014)
Li, L., Zhang, Y., Chen, L.: Personalized transformer for explainable recommendation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: Long papers) (2021)
Zhan, H., Li, L., Li, S., Liu, W., Gupta, M., Kot, A.C.: Towards explainable recommendation via bertguided explanation generator. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5 (2023). IEEE
Ni, J., Li, J., McAuley, J.: Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 188-197 (2019)
Liu, Z., Ma, Y., Schubert, M., Ouyang, Y., Rong, W., Xiong, Z.: Multimodal contrastive transformer for explainable recommendation. IEEE Transactions on Computational Social Systems (2023)
Qu, Y., Nobuhara, H.: Explanation generated for sequential recommendation based on transformer model. In: 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS&ISIS), pp. 1-6 (2022). IEEE
Bai, P., Xia, Y., Xia, Y.: Fusing knowledge and aspect sentiment for explainable recommendation. IEEE Access 8, 137150-137160 (2020)
Wang, L., Zhang, S., Wang, Y., Lim, E.-P., Wang, Y.: Llm4vis: explainable visualization recommendation using chatgpt. arXiv:2310.07652 (2023)
Lei, Y., Lian, J., Yao, J., Huang, X., Lian, D., Xie, X.: Recexplainer: aligning large language models for recommendation model interpretability. arXiv:2311.10947 (2023)
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models. arXiv:2108.07258 (2021)
Li, L., Zhang, Y., Chen, L.: Personalized prompt learning for explainable recommendation. ACM Trans. Inf. Syst. 41(4), 1-26 (2023)
Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., Saunders, W.: Language models can explain neurons in language models (2023)
Wu, Z., Geiger, A., Potts, C., Goodman, N.D.: Interpretability at scale: identifying causal mechanisms in alpaca. arXiv:2305.08809 (2023)
Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., Chen, W.: Making large language models better reasoners with step-aware verifier (2023)
Turpin, M., Michael, J., Perez, E., Bowman, S.R.: Language models don’t always say what they think: unfaithful explanations in chain-of-thought prompting. arXiv:2305.04388 (2023)
Li, S., Liu, H., Dong, T., Zhao, B.Z.H., Xue, M., Zhu, H., Lu, J.: Hidden backdoors in human-centric language models. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 3123-3140 (2021)
Wang, J., Hu, X., Hou, W., Chen, H., Zheng, R., Wang, Y., Yang, L., Huang, H., Ye, W., Geng, X., et al.: On the robustness of chatgpt: an adversarial and out-of-distribution perspective. arXiv:2302.12095 (2023)
Han, R., Peng, T., Yang, C., Wang, B., Liu, L., Wan, X.: Is information extraction solved by chatgpt? an analysis of performance, evaluation criteria, robustness and errors. arXiv:2305.14450 (2023)
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., et al.: Emergent abilities of large language models. arXiv:2206.07682 (2022)
Liu, J., Liu, C., Lv, R., Zhou, K., Zhang, Y.: Is chatgpt a good recommender? a preliminary study. arXiv:2304.10149 (2023)
Dai, S., Shao, N., Zhao, H., Yu, W., Si, Z., Xu, C., Sun, Z., Zhang, X., Xu, J.: Uncovering chatgpt’s capabilities in recommender systems. arXiv:2305.02182 (2023)
Wang, L., Lim, E.-P.: Zero-shot next-item recommendation using large pretrained language models. arXiv:2304.03153 (2023)
Hou, Y., Zhang, J., Lin, Z., Lu, H., Xie, R., McAuley, J., Zhao, W.X.: Large language models are zero-shot rankers for recommender systems. arXiv:2305.08845 (2023)
Li, X., Zhang, Y., Malthouse, E.C.: A preliminary study of chatgpt on news recommendation: personalization, provider fairness, fake news. arXiv:2306.10702 (2023)
Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., Sui, Z.: A survey for in-context learning. arXiv:2301.00234 (2022)
Dai, D., Sun, Y., Dong, L., Hao, Y., Sui, Z., Wei, F.: Why can gpt learn in-context? language models secretly perform gradient descent as meta optimizers. arXiv:2212.10559 (2022)
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., Zettlemoyer, L.: Rethinking the role of demonstrations: What makes in-context learning work?. arXiv:2202.12837 (2022)
Levy, I., Bogin, B., Berant, J.: Diverse demonstrations improve in-context compositional generalization. arXiv:2212.06800 (2022)
Xie, S.M., Raghunathan, A., Liang, P., Ma, T.: An explanation of in-context learning as implicit bayesian inference. In: International Conference on Learning Representations
Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., et al.: In-context learning and induction heads. arXiv:2209.11895 (2022)
Akyürek, E., Schuurmans, D., Andreas, J., Ma, T., Zhou, D.: What learning algorithm is in-context learning? investigations with linear models. arXiv:2211.15661 (2022)
Geng, S., Liu, S., Fu, Z., Ge, Y., Zhang, Y.: Recommendation as language processing (rlp): a unified pretrain, personalized prompt & predict paradigm (p5). In: Proceedings of the 16th ACM Conference on Recommender Systems, pp. 299-315 (2022)
Cui, Z., Ma, J., Zhou, C., Zhou, J., Yang, H.: M6-rec: generative pretrained language models are openended recommender systems. arXiv:2205.08084 (2022)
Liu, S., Gao, C., Chen, Y., Jin, D., Li, Y.: Learnable embedding sizes for recommender systems. arXiv:2101.07577 (2021)
Liu, H., Zhao, X., Wang, C., Liu, X., Tang, J.: Automated embedding size search in deep recommender systems. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2307-2316 (2020)
Deng, W., Pan, J., Zhou, T., Kong, D., Flores, A., Lin, G.: Deeplight: deep lightweight feature interactions for accelerating ctr predictions in ad serving. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 922-930 (2021)
Ginart, A.A., Naumov, M., Mudigere, D., Yang, J., Zou, J.: Mixed dimension embeddings with application to memory-efficient recommendation systems. In: 2021 IEEE International Symposium on Information Theory (ISIT), pp. 2786-2791 (2021). IEEE
Wang, Y., Zhao, X., Xu, T., Wu, X.: Autofield: automating feature selection in deep recommender systems. In: Proceedings of the ACM Web Conference 2022, pp. 1977-1986 (2022)
Lin, W., Zhao, X., Wang, Y., Xu, T., Wu, X.: Adafs: adaptive feature selection in deep recommender system. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3309-3317 (2022)
Tsang, M., Cheng, D., Liu, H., Feng, X., Zhou, E., Liu, Y.: Feature interaction interpretability: a case for explaining ad-recommendation systems via neural interaction detection. arXiv:2006.10966 (2020)
Yuanfei, L., Mengshuo, W., Hao, Z., Quanming, Y., WeiWei, T., Yuqiang, C., Qiang, Y., Wenyuan, D.: Autocross: automatic feature crossing for tabular data in real-world applications. arXiv:1904.12857 (2019)
Liu, B., Zhu, C., Li, G., Zhang, W., Lai, J., Tang, R., He, X., Li, Z., Yu, Y.: Autofis: automatic feature interaction selection in factorization models for click-through rate prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 26362645 (2020)
Liu, B., Xue, N., Guo, H., Tang, R., Zafeiriou, S., He, X., Li, Z.: Autogroup: automatic feature grouping for modelling explicit high-order feature interactions in ctr prediction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 199208 (2020)
Chen, Y., Ren, P., Wang, Y., Rijke, M.: Bayesian personalized feature interaction selection for factorization machines. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665-674 (2019)
Xie, Y., Wang, Z., Li, Y., Ding, B., Gürel, N.M., Zhang, C., Huang, M., Lin, W., Zhou, J.: Fives: feature interaction via edge search for large-scale tabular data. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3795-3805 (2021)
Su, Y., Zhang, R., Erfani, S., Xu, Z.: Detecting beneficial feature interactions for recommender systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 4357-4365 (2021)
Song, Q., Cheng, D., Zhou, H., Yang, J., Tian, Y., Hu, X.: Towards automated neural interaction discovery for click-through rate prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 945-955 (2020)
Zhao, P., Xiao, K., Zhang, Y., Bian, K., Yan, W.: Ameir: automatic behavior modeling, interaction exploration and mlp investigation in the recommender system. In: IJCAI, pp. 2104-2110 (2021)
Wei, Z., Wang, X., Zhu, W.: Autoias: automatic integrated architecture searcher for click-trough rate prediction. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2101-2110 (2021)
Cheng, M., Liu, Z., Liu, Q., Ge, S., Chen, E.: Towards automatic discovering of deep hybrid network architecture for sequential recommendation. In: Proceedings of the ACM Web Conference 2022, pp. 1923-1932 (2022)
Yu, C., Liu, X., Tang, C., Feng, W., Lv, J.: Gpt-nas: Neural architecture search with the generative pre-trained model. arXiv:2305.05351 (2023)
Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: Nas-bench-101: Towards reproducible neural architecture search. In: International Conference on Machine Learning, pp. 7105-7114 (2019). PMLR
Zheng, M., Su, X., You, S., Wang, F., Qian, C., Xu, C., Albanie, S.: Can gpt-4 perform neural architecture search? arXiv:2304.10970 (2023)
Nasir, M.U., Earle, S., Togelius, J., James, S., Cleghorn, C.: Llmatic: Neural architecture search via large language models and quality-diversity optimization. arXiv:2306.01102 (2023)
Chen, A., Dohan, D.M., So, D.R.: Evoprompting: language models for code-level neural architecture search. arXiv:2302.14838 (2023)
Shang, L., Lu, Z., Li, H.: Neural responding machine for short-text conversation. arXiv:1503.02364 (2015)
Vinyals, O., Le, Q.: A neural conversational model. arXiv:1506.05869 (2015)
Sordoni, A., Galley, M., Auli, M., Brockett, C., Ji, Y., Mitchell, M., Nie, J.-Y., Gao, J., Dolan, B.: A neural network approach to context-sensitive generation of conversational responses. arXiv:1506.06714 (2015)
Wu, W., Yan, R.: Deep chit-chat: deep learning for chatbots. In: Proceedings of the 42 nd international ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1413-1414 (2019)
Qiu, X., Huang, X.: Convolutional neural tensor network architecture for community-based question answering. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
Wan, S., Lan, Y., Guo, J., Xu, J., Pang, L., Cheng, X.: A deep architecture for semantic matching with multiple positional sentence representations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
Greco, C., Suglia, A., Basile, P., Semeraro, G.: Converse-et-impera: exploiting deep learning and hierarchical reinforcement learning for conversational recommender systems. In: AI* IA 2017 Advances in Artificial Intelligence: XVIth International Conference of the Italian Association for Artificial Intelligence, Bari, Italy, November 14-17, 2017, Proceedings 16, pp. 372-386 (2017). Springer
Yao, K., Zweig, G., Hwang, M.-Y., Shi, Y., Yu, D.: Recurrent neural networks for language understanding. In: Interspeech, pp. 2524-2528 (2013)
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech, pp. 3771-3775 (2013)
Goddeau, D., Meng, H., Polifroni, J., Seneff, S., Busayapongchai, S.: A form-based dialogue manager for spoken language applications. In: Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP’96, vol. 2, pp. 701-704 (1996). IEEE
Henderson, M., Thomson, B., Young, S.: Deep neural network approach for the dialog state tracking challenge. In: Proceedings of the SIGDIAL 2013 Conference, pp. 467-471 (2013)
Mrkšić, N., Séaghdha, D.O., Wen, T.-H., Thomson, B., Young, S.: Neural belief tracker: data-driven dialogue state tracking. arXiv:1606.03777 (2016)
Cuayáhuitl, H., Keizer, S., Lemon, O.: Strategic dialogue management via deep reinforcement learning. arXiv:1511.08099 (2015)
Zhou, H., Huang, M., Zhu, X.: Context-aware natural language generation for spoken dialogue systems. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032-2041 (2016)
Dušek, O., Jurčíćek, F.: Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. arXiv:1606.05491 (2016)
Wen, T.-H., Vandyke, D., Mrksic, N., Gasic, M., Rojas-Barahona, L.M., Su, P.-H., Ultes, S., Young, S.: A network-based end-to-end trainable task-oriented dialogue system. arXiv:1604.04562 (2016)
Bordes, A., Boureau, Y.-L., Weston, J.: Learning end-to-end goal-oriented dialog. arXiv:1605.07683 (2016)
Zhang, Y., Sun, S., Galley, M., Chen, Y.-C., Brockett, C., Gao, X., Gao, J., Liu, J., Dolan, B.: Dialogpt: large-scale generative pre-training for conversational response generation. arXiv:1911.00536 (2019)
Lei, W., He, X., Miao, Y., Wu, Q., Hong, R., Kan, M.-Y., Chua, T.-S.: Estimation-action-reflection: towards deep interaction between conversational and recommender systems. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 304-312 (2020)
Lei, W., Zhang, G., He, X., Miao, Y., Wang, X., Chen, L., Chua, T.-S.: Interactive path reasoning on graph for conversational recommendation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2073-2083 (2020)
Deng, Y., Li, Y., Sun, F., Ding, B., Lam, W.: Unified conversational recommendation policy learning via graph-based reinforcement learning. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1431-1441 (2021)
Li, R., Ebrahimi Kahou, S., Schulz, H., Michalski, V., Charlin, L., Pal, C.: Towards deep conversational recommendations. Adv. Neural. Inf. Process. Syst. 31 (2018)
Wang, T.-C., Su, S.-Y., Chen, Y.-N.: Barcor: Towards a unified framework for conversational recommendation systems. arXiv:2203.14257 (2022)
Wang, X., Zhou, K., Wen, J.-R., Zhao, W.X.: Towards unified conversational recommender systems via knowledge-enhanced prompt learning. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1929-1937 (2022)
Wang, L., Hu, H., Sha, L., Xu, C., Jiang, D., Wong, K.-F.: Recindial: a unified framework for conversational recommendation with pretrained language models. In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, pp. 489-500 (2022)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training (2018)
Gao, Y., Sheng, T., Xiang, Y., Xiong, Y., Wang, H., Zhang, J.: Chat-rec: towards interactive and explainable llms-augmented recommender system. arXiv:2303.14524 (2023)
Friedman, L., Ahuja, S., Allen, D., Tan, T., Sidahmed, H., Long, C., Xie, J., Schubiner, G., Patel, A., Lara, H., et al.: Leveraging large language models in conversational recommender systems. arXiv:2305.07961 (2023)
Wang, X., Tang, X., Zhao, W.X., Wang, J., Wen, J.-R.: Rethinking the evaluation for conversational recommendation in the era of large language models. arXiv:2305.13112 (2023)
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T.L., Cao, Y., Narasimhan, K.: Tree of thoughts: deliberate problem solving with large language models. arXiv:2305.10601 (2023)
Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R.K.-W., Lim, E.-P.: Plan-and-solve prompting: improving zero-shot chain-of-thought reasoning by large language models. arXiv:2305.04091 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: React: synergizing reasoning and acting in language models. arXiv:2210.03629 (2022)
Madaan, A., Tandon, N., Clark, P., Yang, Y.: Memory-assisted prompt editing to improve gpt-3 after deployment. arXiv:2201.06009 (2022)
Qin, Y., Hu, S., Lin, Y., Chen, W., Ding, N., Cui, G., Zeng, Z., Huang, Y., Xiao, C., Han, C., et al.: Tool learning with foundation models. arXiv:2304.08354 (2023)
Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., et al.: Augmented language models: a survey. arXiv:2302.07842 (2023)
Yang, K., Peng, N., Tian, Y., Klein, D.: Re3: Generating longer stories with recursive reprompting and revision. arXiv:2210.06774 (2022)
Schick, T., Dwivedi-Yu, J., Jiang, Z., Petroni, F., Lewis, P., Izacard, G., You, Q., Nalmpantis, C., Grave, E., Riedel, S.: Peer: a collaborative language model. arXiv:2208.11663 (2022)
Hao, Y., Song, H., Dong, L., Huang, S., Chi, Z., Wang, W., Ma, S., Wei, F.: Language models are general-purpose interfaces. arXiv:2206.06336 (2022)
Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., Grave, E.: Few-shot learning with retrieval augmented language models. arXiv:2208.03299 (2022)
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H.-T., Jin, A., Bos, T., Baker, L., Du, Y., et al.: Lamda: language models for dialog applications. arXiv:2201.08239 (2022)
Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V., Saunders, W., et al.: Webgpt: browser-assisted question-answering with human feedback. arXiv:2112.09332 (2021)
Liu, R., Wei, J., Gu, S.S., Wu, T.-Y., Vosoughi, S., Cui, C., Zhou, D., Dai, A.M.: Mind’s eye: grounded language model reasoning through simulation. arXiv:2210.05359 (2022)
Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., Neubig, G.: Pal: Program-aided language models. arXiv:2211.10435 (2022)
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Fu, C., Gopalakrishnan, K., Hausman, K., et al.: Do as i can, not as i say: grounding language in robotic affordances. arXiv:2204.01691 (2022)
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: Hugginggpt: solving ai tasks with chatgpt and its friends in huggingface. arXiv:2303.17580 (2023)
Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., Duan, N.: Visual chatgpt: talking, drawing and editing with visual foundation models. arXiv:2303.04671 (2023)
Liang, Y., Wu, C., Song, T., Wu, W., Xia, Y., Liu, Y., Ou, Y., Lu, S., Ji, L., Mao, S., et al.: Taskmatrix. ai: completing tasks by connecting foundation models with millions of apis. arXiv:2303.16434 (2023)
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., Scialom, T.: Toolformer: language models can teach themselves to use tools. arXiv:2302.04761 (2023)
Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified visionlanguage understanding and generation. In: International Conference on Machine Learning, pp. 1288812900 (2022). PMLR
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684-10695 (2022)
Cai, T., Wang, X., Ma, T., Chen, X., Zhou, D.: Large language models as tool makers. arXiv:2305.17126 (2023)
Shuster, K., Xu, J., Komeili, M., Ju, D., Smith, E.M., Roller, S., Ung, M., Chen, M., Arora, K., Lane, J., et al.: Blenderbot 3: a deployed conversational agent that continually learns to responsibly engage. arXiv:2208.03188 (2022)
Liu, J., Jin, J., Wang, Z., Cheng, J., Dou, Z., Wen, J.-R.: Reta-llm: a retrieval-augmented large language model toolkit. arXiv:2306.05212 (2023)
Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., Wu, X.: Unifying large language models and knowledge graphs: a roadmap. arXiv:2306.08302 (2023)
Vempati, S., Malayil, K.T., Sruthi, V., Sandeep, R.: Enabling hyper-personalisation: automated ad creative generation and ranking for fashion e-commerce. In: Fashion Recommender Systems, pp. 25-48 (2020). Springer
Thomaidou, S., Lourentzou, I., Katsivelis-Perakis, P., Vazirgiannis, M.: Automated snippet generation for online advertising. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 1841-1844 (2013)
Zhang, X., Zou, Y., Zhang, H., Zhou, J., Diao, S., Chen, J., Ding, Z., He, Z., He, X., Xiao, Y., et al.: Automatic product copywriting for e-commerce. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 12423-12431 (2022)
Lei, Z., Zhang, C., Xu, X., Wu, W., Niu, Z.-Y., Wu, H., Wang, H., Yang, Y., Li, S.: Plato-ad: a unified advertisement text generation framework with multi-task prompt learning. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 512-520 (2022)
Bartz, K., Barr, C., Aijaz, A.: Natural language generation for sponsored-search advertisements. In: Proceedings of the 9th ACM Conference on Electronic Commerce, pp. 1-9 (2008)
Fujita, A., Ikushima, K., Sato, S., Kamite, R., Ishiyama, K., Tamachi, O.: Automatic generation of listing ads by reusing promotional texts. In: Proceedings of the 12th International Conference on Electronic Commerce: Roadmap for the Future of Electronic Business, pp. 179-188 (2010)
Hughes, J.W., Chang, K.-h., Zhang, R.: Generating better search engine text advertisements with deep reinforcement learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2269-2277 (2019)
Wang, X., Gu, X., Cao, J., Zhao, Z., Yan, Y., Middha, B., Xie, X.: Reinforcing pretrained models for generating attractive text advertisements. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3697-3707 (2021)
Chen, C., Wang, X., Yi, X., Wu, F., Xie, X., Yan, R.: Personalized chit-chat generation for recommendation using external chat corpora. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2721-2731 (2022)
Zhang, C., Zhou, J., Zang, X., Xu, Q., Yin, L., He, X., Liu, L., Xiong, H., Dou, D.: Chase: commonsenseenriched advertising on search engine with explicit knowledge. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4352-4361 (2021)
Kanungo, Y.S., Negi, S., Rajan, A.: Ad headline generation using self-critical masked language model. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, pp. 263-271 (2021)
Wei, P., Yang, X., Liu, S., Wang, L., Zheng, B.: Creater: ctr-driven advertising text generation with controlled pre-training and contrastive fine-tuning. arXiv:2205.08943 (2022)
Kanungo, Y.S., Das, G., Negi, S.: Cobart: controlled, optimized, bidirectional and auto-regressive transformer for ad headline generation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3127-3136 (2022)
Chen, Q., Lin, J., Zhang, Y., Yang, H., Zhou, J., Tang, J.: Towards knowledge-based personalized product description generation in e-commerce. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3040-3050 (2019)
Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P.S., Sun, L.: A comprehensive survey of ai-generated content (aigc): a history of generative ai from gan to chatgpt. arXiv:2303.04226 (2023)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27 (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv:1312.6114 (2013)
Dinh, L., Krueger, D., Bengio, Y.: Nice: non-linear independent components estimation. arXiv:1410.8516 (2014)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840-6851 (2020)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth words: transformers for image recognition at scale. In: International Conference on Learning Representations
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022 (2021)
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748-8763 (2021). PMLR
Chen, M., Tan, X., Li, B., Liu, Y., Qin, T., Liu, T.-Y., et al.: Adaspeech: adaptive text to speech for custom voice. In: International Conference on Learning Representations
Li, X., Taheri, A., Tu, L., Gimpel, K.: Commonsense knowledge base completion. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (vol. 1: long papers), pp. 1445-1455 (2016)
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., et al.: Codebert: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1536-1547 (2020)
OpenAI: ChatGPT: A Large-Scale Generative Model for Conversation. OpenAI Blog (2020)
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International Conference on Machine Learning, pp. 8821-8831 (2021). PMLR
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models trained on code. arXiv:2107.03374 (2021)
Midjourney: Midjourney. Retrieved from. https://midjourney.com (2022)
Wang, W., Lin, X., Feng, F., He, X., Chua, T.-S.: Generative recommendation: towards next-generation recommender paradigm. arXiv:2304.03516 (2023)
Borji, A.: A categorical archive of chatgpt failures. arXiv:2302.03494 (2023)
Carlini, N., Tramer, F., Wallace, E., et al.: Extracting training data from large language models. (2021)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners
Hu, E.J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: low-rank adaptation of large language models. In: International Conference on Learning Representations (2021)
Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: Qlora: Efficient finetuning of quantized llms. arXiv:2305.14314 (2023)
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al.: Constitutional ai: Harmlessness from ai feedback. arXiv:2212.08073 (2022)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

المؤلفون والانتماءات

جين تشينتشنغ ليوشو هوانغ تشينوانغ جانغوي جيانغيوانهاويوشوان ليشياولونغ تشينشينغمي وانغ كاي زينغ ديفو ليانإنهونغ تشين

جين تشين
جين تشين@ust.hk
تشنغ ليو
zhengliu1026@gmail.com
شو هوانغ
xuhuangcs@mail.ustc.edu.cn
تشينوانغ وو
wcw1996@mail.ustc.edu.cn
تشي ليو
qiliu67@mail.ustc.edu.cn
جانغوي جيانغ
gwjiang@mail.ustc.edu.cn
يوانهاو بو
puyuanhao@mail.ustc.edu.cn
يوشوان لي
lyx180812@mail.ustc.edu.cn
شياولونغ تشين
chenxiaolong@mail.ustc.edu.cn
شينغمي وانغ
xingmeiwang@mail.ustc.edu.cn
كاي زينغ
zhengkai@uestc.edu.cn
ديفو ليان
liandefu@ustc.edu.cn
إنهونغ تشين
cheneh@ustc.edu.cn
1 جامعة هونغ كونغ للعلوم والتكنولوجيا، هونغ كونغ، الصين
أكاديمية بكين للذكاء الاصطناعي، بكين، الصين
3 جامعة العلوم والتكنولوجيا في الصين، هيفي، الصين
4 جامعة علوم وتكنولوجيا الإلكترونيات في الصين، تشنغدو، الصين

جين تشين وزينغ ليو ساهموا بالتساوي في هذا العمل.
معلومات المؤلف الموسعة متاحة في الصفحة الأخيرة من المقال
https://www.apple.com/siri/
https://www.microsoft.com/ar-ae/cortana
https://huggingface.co
https://docs.langchain.com
https://github.com/Significant-Gravitas/Auto-GPT

Journal: World Wide Web, Volume: 27, Issue: 4
DOI: https://doi.org/10.1007/s11280-024-01276-1
Publication Date: 2024-06-28

When large language models meet personalization: perspectives of challenges and opportunities

Jin Chen Zheng Liu Xu Huang Chenwang Gangwei Jiang Yuanhao Yuxuan Lei Xiaolong Chen Xingmei Wang Kai Zheng Defu Lian . Enhong Chen

Received: 22 November 2023 / Revised: 15 March 2024 / Accepted: 14 May 2024 /
Published online: 28 June 2024
© The Author(s) 2024

Abstract

The advent of large language models marks a revolutionary breakthrough in artificial intelligence. With the unprecedented scale of training and model parameters, the capability of large language models has been dramatically improved, leading to human-like performances in understanding, language synthesizing, common-sense reasoning, etc. Such a major leap forward in general AI capacity will fundamentally change the pattern of how personalization is conducted. For one thing, it will reform the way of interaction between humans and personalization systems. Instead of being a passive medium of information filtering, like conventional recommender systems and search engines, large language models present the foundation for active user engagement. On top of such a new foundation, users’ requests can be proactively explored, and users’ required information can be delivered in a natural, interactable, and explainable way. For another thing, it will also considerably expand the scope of personalization, making it grow from the sole function of collecting personalized information to the compound function of providing personalized services. By leveraging large language models as a general-purpose interface, the personalization systems may compile user’s requests into plans, calls the functions of external tools (e.g., search engines, calculators, service APIs, etc.) to execute the plans, and integrate the tools’ outputs to complete the end-to-end personalization tasks. Today, large language models are still being rapidly developed, whereas the application in personalization is largely unexplored. Therefore, we consider it to be right the time to review the challenges in personalization and the opportunities to address them with large language models. In particular, we dedicate this perspective paper to the discussion of the following aspects: the development and challenges for the existing personalization system, the newly emerged capabilities of large language models, and the potential ways of making use of large language models for personalization.

Keywords Large language models

Personalization systems

Recommender systems

Tool-learning

AIGC

1 Introduction

The emergence of large language models [1], which have demonstrated remarkable progress in understanding human expression, is profoundly impacting the AI community. These models, equipped with vast amounts of data and large-scale neural networks, exhibit impressive capabilities in comprehending human language and generating text that closely resembles our own. Among these abilities are reasoning [2], few-shot learning [3], and the incorporation of extensive world knowledge within pre-trained models [1]. This marks a significant breakthrough in the field of artificial intelligence, leading to a revolution in our interactions with machines. Consequently, large language models have become indispensable across various applications, ranging from natural language processing and machine translation to creative content generation and chatbot development. The introduction of ChatGPT, in particular, has gained significant attention from the human community, prompting reflections on the transformative power of large language models and their potential to push the boundaries of what artificial intelligence (AI) can achieve. This disruptive technology holds the promise of transforming how we interact with and leverage AI in countless domains, opening up new possibilities and opportunities for innovation. As these language models continue to advance and evolve, they are likely to shape the future of artificial intelligence, empowering us to explore uncharted territories and unlock even greater potential in human-machine collaboration.

Personalization, the art of tailoring experiences to individual preferences, stands as an essential and dynamic connection that bridges the gap between humans and machines. In today’s technologically driven world, personalization plays a pivotal role in enhancing user interactions and engagements with a diverse array of digital platforms and services. By adapting to individual preferences, personalization systems empower machines to cater to each user’s unique needs, leading to more efficient and enjoyable interactions. Moreover, personalization goes beyond mere content recommendations; it encompasses various facets of user experiences, encompassing user interfaces, communication styles, and more. As artificial intelligence continues to advance, personalization becomes increasingly sophisticated in handling large volumes of interactions and diverse user intents. This calls for the development of more advanced techniques to tackle complex scenarios and provide even more enjoyable and satisfying experiences. The pursuit of improved personalization is driven by the desire to better understand users and cater to their ever-evolving needs. As technology evolves, personalization systems will likely continue to evolve, ultimately creating a future where human-machine interactions are seamlessly integrated into every aspect of our lives, offering personalized and tailored experiences that enrich daily routines.

Large language models, with their deep and broad capabilities, have the potential to revolutionize personalization systems, transforming the way humans interact and expanding the scope of personalization. The interaction between humans and machines can no longer be simply classified as active and passive, just like traditional search engines and recommendation systems. However, these large language models go beyond simple information filtering and they offer a diverse array of additional functionalities. Specifically, user intent will be actively and comprehensively explored, allowing for more direct and seamless communication between users and systems through natural language. Unlike traditional technologies that rely on abstract and less interpretable ID-based information representation, large language models enable a more profound understanding of users’ accurate demands and interests. This deeper comprehension paves the way for higher-quality personalized services, meeting users’ needs and preferences in a more refined and effective manner. Moreover, the integration of
various tools is greatly enhanced by the capabilities of large language models, significantly broadening the possibilities and scenarios for personalized systems. By transforming user requirements into plans, including understanding, generating, and executing them, users can access a diverse range of information and services. Importantly, users remain unaware of the intricate and complex transformations happening behind the scenes, as they experience a seamless end-to-end model. The potential of large language models in personal is largely unexplored.

This paper addresses the challenges in personalization and explores the potential solutions using large language models. In the existing related work, LaMP [4] introduces a novel benchmark for training and evaluating language models in producing personalized outputs for information retrieval systems. On the other hand, other related surveys [5-7] focus mainly on traditional personalization techniques, such as recommender systems. From the perspective of learning mechanisms, LLM4Rec [5] delves into both Discriminative LLM for Recommendation and Generative LLM for Recommendation. Regarding the adaptation of LLM for recommender systems in terms of ‘Where’ and ‘How’, Li et al [6] concentrate on the overall pipeline in industrial recommender phases. Fan et al [7], on the other hand, conduct a review with a focus on pre-training, fine-tuning, and prompting approaches. While these works discuss pre-trained language models like Bert for ease of analysis, they dedicate limited attention to the emergent capabilities of large language models. This paper aims to fill this gap by examining the unique and powerful abilities of large language models in personalization, and further expand the scope of personalization with tools.

The remaining of this survey is organized as follows: we review the personalization and large language models in Section 2 to overview the development and challenges. Then we carefully discuss the potential actors of large language models for personalization from Section 3, following the simple utilization of emergent capabilities and the complex integration with other tools. We also discuss the potential challenges when large language models are adapted for personalization. The whole architecture of this paper is shown in Figure 1.

Figure 1 Overview of this paper

2 Background overview

2.1 Personalization techniques

Personalization, a nuanced art that tailors experience to the unique preferences and needs of individual users, has become a cornerstone of modern artificial intelligence. In this section, we explore the captivating world of personalized techniques and their profound impact on user interactions with AI systems. We will delve into two key aspects of personalization: recommender systems and personalized assistance. These techniques not only enhance user satisfaction but also exemplify the evolution of AI, where machines seamlessly integrate with our lives, understanding us on a profound level. By tailoring recommendations, providing customized assistance, and delivering personalized search results, AI systems have the potential to create a truly immersive and individualized user experience.

2.1.1 Recommender systems

Recommender systems play a pivotal role in personalization, revolutionizing the way users discover and engage with content. These systems aim to predict and suggest items of interest to individual users, such as movies, products, or articles, based on their historical interactions and preferences.

Regarding the development of recommender systems, they have evolved significantly over the years, with collaborative filtering [8,9] being one of the earliest and most influential approaches. Collaborative filtering relies on user-item interaction data to identify patterns and make recommendations based on users with similar preferences. Traditional solutions, such as matrix factorization [10] and user/item-based approaches [11], extract potentially interesting items based on the idea that users who have shown similar preferences in the past are likely to have similar preferences in the future. While effective, collaborative filtering has limitations, such as the “cold start” problem for new users and items. To address these limitations, contentbased filtering [12] emerged, which considers the content of items to make recommendations. It leverages the features and attributes of items to find similarities and make personalized suggestions. These features can be grouped into user-side information, such as user profiles, item-side information [13, 14], such as item brands and item categories, and interactionbased information [15], such as reviews and comments. However, content-based filtering may struggle to capture complex user preferences and discover diverse recommendations restricted by the limited feature representations.

In recent years, deep learning has gained significant attention in the field of recommender systems due to its ability to model complex patterns and interactions in user-item data [16]. Deep learning-based methods have shown promising results in capturing sequential, temporal, and contextual information, as well as extracting meaningful representations from large-scale data. With the introduction of deep networks, high-order interactions between features of users and items are well captured to extract user interest. Deep learning-based methods offer approaches to capture high-order interactions by employing techniques like attention mechanisms [17, 18] and graph-based networks [19] to mine complex relationships between user and item. These methods have been shown to enhance recommendation performance by considering higher-order dependencies and inter-item relationships. Another area of deep learning-based recommender systems is sequential recommenders, specifically designed to handle sequential user-item interactions, such as user behavior sequences over time. Self-Attentions [20] and Gated Recurrent Units (GRUs) [21] are popular choices for
modeling sequential data in recommender systems. These models excel in capturing temporal dependencies and context, making them well-suited for tasks like next-item recommendation and session-based recommendation. Sequential-based models can take into account the order in which items are interacted with and learn patterns of user behavior that evolve. Furthermore, the rise of language models like BERT has further advanced recommender systems by enabling a better understanding of both natural language features and user sequential behaviors [22]. These language models can capture deep semantic representations and world knowledge, enriching the recommendation process and facilitating more personalized and context-aware recommendations. Overall, the application of deep learning techniques in recommender systems has opened new avenues for research and innovation, promising to revolutionize the field of personalized recommendations and enhance user experiences.

2.1.2 Personalized assistance

Personalization Assistance refers to the use of artificial intelligence and machine learning techniques to tailor and customize experiences, products, or content based on individual preferences, behavior, and characteristics of users [23]. By analyzing individual preferences, behaviors, and characteristics, it creates a personalized ecosystem that enhances user engagement and satisfaction. In contrast to traditional recommender systems, which rely on predicting user interests passively, personalized assistance takes a more proactive approach. It ventures into the realm of predicting users’ next intentions or actions by utilizing contextual information, such as historical instructions and speech signals. This deeper level of understanding enables the system to cater to users’ needs in a more anticipatory and intuitive manner. At the core of this capability lies the incorporation of cutting-edge technologies like natural language processing (NLP) and computer vision. These advanced tools empower the system to recognize and interpret user intentions, whether conveyed through spoken or written language, or even visual cues. Moreover, the potential of personalized assistance extends beyond static recommendations to dynamic and context-aware interactions. As the system becomes more and more familiar with a user’s preferences and patterns, it adapts and refines its recommendations in real-time, keeping pace with the ever-changing needs and preferences of the user.

Conversational Recommender Systems [24, 25] mark a remarkable stride forward in the realm of personalized assistance. By engaging users in interactive conversations, these systems delve deeper into their preferences and fine-tune their recommendations accordingly. Leveraging the power of natural language understanding, these conversational recommenders adeptly interpret user queries and responses, culminating in a seamless and engaging user experience. Notable instances of personalized assistance products, such as Siri

and Microsoft Cortana

, have already proven their effectiveness on mobile devices. Additionally, the integration of large language models like ChatGPT further elevates the capabilities of conversational recommenders, promising even more enhanced user experiences. As this technology continues to progress, we can anticipate its growing significance across diverse industries, including healthcare, education, finance, etc. While the growth of conversational recommenders and personalized assistance promises immense benefits, it is imperative to develop these products responsibly. Upholding user privacy and ensuring transparent data handling practices are essential to maintain user trust and safeguard sensitive information.

2.2 Large language models

Language models perform the probabilistic modeling for the generation of natural language [26,27], i.e., presented with one specific context, the language models make predictions for the words which are to be generated for the future steps. Nowadays, language models are mostly built upon deep neural networks, where two features need to be emphasized. First of all, the majority of language models are based on transformers or their close variations [28, 29]. Such types of neural networks are proficient at modeling context dependency within natural languages and exhibit superior and consistently improved performances when being scaled up. Secondly, the language models are pre-trained at scale with a massive amount of unlabeled corpus [3, 30-32]. The pre-trained models are further fine-tuned with task-oriented data to adapt to different downstream applications.

There has been tremendous progress in language models in recent years, where the emergence of large language models, represented by GPT-3 [3], marks an important milestone for the entire AI community. The large language models (LLMs), as the name suggests, are massively scaled-up derivatives of conventional language models. Particularly, the backbone networks and the training data have been largely magnified. For one thing, although there are no specific criteria for the minimum number, a typical LLM usually consists of no less than several billion and up to trillions of model parameters, which are orders larger than before [33]. For another thing, the pre-training is conducted based on much more unsupervised corpora, with hundreds of billions or trillions of tokens carefully filtered from sources like Common Crawl, GitHub, Wikipedia, Books, ArXiv, etc [1]. The impact of scaling is illustrated by the scaling laws [34, 35], which numerically uncover the power-law relationship between model size, data volume, training scale and the growth of the model’s performance.

The scaling up of network and training data lead to the leap-forward of large language models’ capability. They not only become more proficient at conventional skills, like understanding people’s intent and synthesising human-like languages, but also process capabilities which are rarely exhibited by those smaller models. Such a phenomenon is referred as the emergent abilities of LLMs, where three representatives capabilities are frequently discussed. One is the in-context learning capability [3], where LLMs may quickly learn from the fewshot examples provided in the prompt. Another one is the instruct following capability [36, 37]. After fine-tuned with diversified tasks in the form of instruction tuning, the LLMs are made proficient to follow the human’s instructions. Thus, they may handle different tasks presented in an ad-hoc manner. Last but not least, LLMs are found to be able to conduct step-by-step reasoning [38, 39]. With certain types of prompting strategies, like Chain-ofThought (CoT), LLMs may iteratively approach the final answer of some complex tasks, like mathematical word problems, by breaking down the tasks into sub-problems and figuring out the plausible intermediate answers for each of the sub-problems.

Thanks to the superior capabilities of understanding, reasoning, and generating, large language models, especially the chat models produced by instruction tuning, are presented as foundamental building blocks for many personalization services. One direct scenario is the conversational search and recommendation [40]. Once built upon large language models, the search and recommendation systems will be able to engage with users via interactions, present outputs in a verbalized and explainable way, receive feedback from the user and make adjustments on top of the feedback, etc. The above changes will bring about a paradigm shift for personalization services, from passively making search and recommendation, to proactively figuring out user’s need and seeking for user’s preferred items. In broader scopes, the LLMs may go beyond simply making personalized search and recommendation, but play as personalized assistants to help users with their task completions. The LLMs may take
notes of users’ important information within their memory, make personalized plans based on memorized information when new demands are raised, and execute plans by leveraging tools like search engines and recommendation systems.

Yet, we have to confront the reality that applying LLMs for personalization is not a trivial problem. To name a quite few of the open challenges. Firstly, personalization calls for the understanding of user preference, which is more domain-specific knowledge rather than the common-sense knowledge learned by LLMs. The effective and efficient adaptation of LLMs for personalized services remains to be resolved. Besides, the LLMs could memorize user’s confidential information while providing personalized services. Thus, it raises concerns for privacy protection. The LLMs are learned from Internet data; due to the exposure bias, it is almost inevitable to make unfair predictions for minorities. To address the above challenges, benchmarks and evaluation datasets are needed by the research communities. However, such resources are far from complete at present. To fully support personalization with LLMs, methodological and experimental frameworks need to be systematically established for all these perspectives.

3 LLMs for personalization

In the following sections, we delve into the potential of large language models for personalization, examining their evolution from simple use cases, like utilizing word knowledge as features, to more intricate integration with other tool modules to act as agents. Specifically, we focus on the progression of emergent capabilities, starting from basic world knowledge and understanding user intent, and advancing to high-level reasoning abilities. We explore how large language models can contribute to constructing a knowledge base that enriches common-sense knowledge about various items. Additionally, we discuss how the understanding capability of large language models can empower content interpreters and explainers for in-depth analysis of interactions. Furthermore, we observe attempts to leverage the reasoning ability of large language models for system reasoners to provide recommendation results. These increasingly sophisticated capabilities enable complex utilization of large language models with other tool modules, enabling them to better comprehend user intentions and fulfill user instructions. Consequently, we also explore the integration of large language models with other tools for personalization, including tool learning, conversational agents and personalized content creators. The overview of this chapter is depicted in Figure 2. Our comprehensive survey aims to provide a deeper understanding of the current landscape, shedding light on the opportunities and challenges associated with incorporating large language models into personalization.

Figure 2 The overview of LLM for personalization

4 LLMs as knowledge base

The knowledge base provides rich information with semantics, attracting increasing attention for the usage of the knowledge base in the recommender systems. Particularly, the knowledge graphs, where nodes represent entities and edges represent relations in the heterogeneous information graph, are the common format of knowledge bases and are introduced as side information to enhance the performance of recommenders. Knowledge graphs help understand the mutual relations between users and items and also provide better explainability for recommenders. Existing methods that incorporate knowledge graphs in recommender systems can be classified into three main groups: embedding-based methods, path-based methods and unified methods. Embedding-based methods, such as CKE [14] and DKN [41], KSR [42], SHINE [43], utilize semantic representations of users and items. These methods aim to capture the underlying semantic relationships between entities in the knowledge graph, which can improve the quality of recommendations. Path-based approaches, such as Hete-MF [44], SemRec [45], RuleRec [46], EIUM [47], exploit the semantic connectivity information present in the knowledge graph to regularize the user and item representations. These methods consider the paths between users and items in the graph and leverage them to incorporate interpretability into the recommendation process. Unified methods, such as RippleNet [48], KGCN [49], KGAT [50], AKUPM [51], IntentGC [52] refine the representations of entities in the knowledge graph by leveraging embedding propagation techniques. These methods propagate the embeddings of entities through the graph structure, allowing information to flow across connected entities and refining the representations accordingly.

However, the knowledge graphs adopted in recommender systems are limited and with low usability. Reviewing the various knowledge graph datasets for recommender systems, covering the domains of movies, books, news, products, etc., these datasets are still significantly sparse compared to the vast amount of human knowledge, particularly the lack of facts, due to the expensive supervision to construct the knowledge graph. Building a comprehensive and accurate knowledge graph would be a complex and resource-intensive task, which would include data collection, integration, and cleaning to assure data quality and consistency. Limited by the expensive cost of labelling the knowledge graphs, there would usually exist missing entities or relations. The user preferences for these entities or paths may be ignored, and the recommendation performance suffers.

The ability of Large Language Models to retrieve factual knowledge as explicit knowledge bases [53-56, 56-61] has been stirred discussed, which presents an opportunity to construct more comprehensive knowledge graphs within recommender systems. Tracing back to the work [53], large language models have shown their impressive power in storing factual information, such as entities and common sense, and then commonsense knowledge can be reliably transferred to downtown tasks. Existing methods in knowledge graphs fall short of handling incomplete KGs [62] and constructing KGs with text corpus [63] and many researchers attempt to leverage the power of LLM to solve the two tasks, i.e., the knowledge completion [64] and knowledge construction [65]. For knowledge graph completion, which refers to the task of missing facts in the given knowledge graph, recent efforts have been made to encode text or generate facts for knowledge graphs. MTL-KGC [66] encoders the text sequences to predict the possibility of the tuples. MEMKGC [67] predicts the masked entities of the triple. StAR [68] utilizes Siamese textual encoders to separately encode the entities. GenKGC [69] uses the decoder-only language models to directly generate the tail entity. TagReal [70] generates high-quality prompts from the external text corpora. AutoKG [63] directly adopts the LLMs, such as ChatGPT and GPT-4, and designs tailored
prompts to predict the tail entity. As for the another important task, i.e., knowledge graph construction, which refers to creating a structured representation of knowledge, LLMs can be applied in the process of constructing knowledge graphs, including entity discovery [71, 72], coreference resolution [73, 74] and relation extraction [75, 76]. LLMs can also achieve the end-to-end construction [

] to directly build KGs from raw text. LLMs enable the knowledge distillation to construct knowledge graphs. symbolic-kg [79] distills commonsense facts from GPT3 and then finetunes the small student model to generate knowledge graphs. These models have demonstrated the capacity to store large volumes of knowledge, providing a viable option for improving the scope and depth of knowledge graphs. Furthermore, these advancements have prompted research into the direct transfer of stored knowledge from LLMs to knowledge graphs, eliminating the need for human supervision. This interesting research throws light on the possibilities of automating knowledge graph completion utilizing cutting-edge big language models.

By leveraging the capabilities of LLMs, recommender systems would benefit from a more extensive and up-to-date knowledge base. Firstly, missing faculty information can be completed to construct more extensive knowledge graphs and thus the relations between entities can be extracted for better recommenders. Secondly, in contrast to the preceding exclusively in-domain data, the large language model itself contains plenty of cross-domain information that can help achieve cross-domain recommendations, such as recommending appropriate movies based on the user’s favourite music songs. To sum up, the stored knowledge can be utilized to enhance recommendation accuracy, relevance, and personalization, ultimately improving the overall performance of recommender systems. Existing work [80] prompts large language models to generate factual knowledge about movies to enhance the performance of CTR prediction models. To better utilize the factual knowledge, a Knowledge Adaptation module is adopted for better contextual information extraction. LLMRec [81] adopts LLM-based graph augmentation strategies to enrich the information of items and further designs denoise methods to ensure the augmentation. LLM-KRec [82] adopts the large language models to determine complementary relationships in each entity pair and construct a complementary knowledge graph, which enhances the industrial recommenders. The existing work leveraging the capabilities of knowledge base in personalization is summerized in Table 1.

It is worth noting that the phantom problem of large language models can be a challenge when applied to recommendation tasks. The inherent nature of large language models can introduce ambiguity or inaccurate provenance [83]. This issue can emerge as the introduction of extraneous information or even noise into the recommendation process. The large language models may generate responses that, while syntactically correct, lack informative context or relevance. According to the KoLA [84], a benchmark for evaluating word knowledge of LLMs, even the top-ranked GPT4 just achieves 0.012 in Precision and 0.013 in Recall on the task Named Entity Recognition, which falls far short of the performance ( 0.712 in Precision and 0.706 in Recall) of the task-specific models PL-Marker [85]. Such a finding suggests that common sense is still far from being sufficiently captured by LLM.

5 LLMs as content interpreter

Content-based recommenders provide an effective solution for mitigating the sparse feedback issue in recommender systems. By leveraging the attributes and characteristics of items, these systems achieve a more profound understanding of their properties, facilitating accurate

Table 1 LLMs as knowledge base

Approach	Knowledge	Task	LLM backbone	Datasets
KAR [80]	Factual knowledge	CTR prediction & Rerank	gpt-3.5-turbo	Ml-1M, Amazon Book
LLMRec [81]	User-item interactions, side information	TopK Recommendation	gpt-3.5-turbo	ML, Netflix
KRec [82]	Complementary relationship	CTR Prediction	ChatGPT 3.5, ChatGLM 2	Alipay Data

matching with user preferences. However, the content features used in content-based recommendation may also exhibit sparsity. Relying solely on the recommended supervision signal, such as clicking and browsing, might not fully exploit the potential benefits of these features. To overcome this challenge, language models emerge as powerful fundamental algorithms that act as content interpreters in processing textual features. Their utilization enhances the effectiveness of recommender systems by effectively understanding and interpreting textual content, leading to improved recommendations.

5.1 Conventional content interpreter

Conventional content interpreter includes statistical model, neural networks, and advanced NLP networks, as summarized in Figure 3. These approaches primarily focus on transforming content information, such as textual data, into feature embeddings to facilitate the recommendation process.

Statistical models like TF-IDF, Minimum Description Length (MDL) [86], and bag-ofwords have been traditionally used to encode textual data such as news articles and documents into continuous value vectors. However, with the advancement of deep learning techniques, researchers have explored various neural network architectures to learn more expressive content representations. Instead of relying solely on statistical embeddings, some approaches initialize the vectors with bag-of-words representations and then employ autoencoder-based models to learn more powerful representations. For example, CDL [16] combines the latent vectors obtained from autoencoders with the original ID embeddings to enhance content representations. CRAE [87] introduces a collaborative recurrent autoencoder that captures the word order in texts, enabling the modeling of content sequences in collaborative filtering scenarios. Dong et al. [88] propose a stacked denoising autoencoder that reconstructs item/user ratings and textual information simultaneously, allowing for the joint modeling of collaborative and textual knowledge. CVAE [89] introduces a collaborative variational autoencoder that learns probabilistic textual features. While autoencoders are effective in learning lowdimensional representations from text data, they may struggle to capture semantic information effectively [90]. In some cases, approaches like doc2vec [91] are used to construct content embeddings [92, 93] and learn hidden representations. Okura et al. [94] evaluate different network architectures, including word-models and GRU networks, for representing user states.

Figure 3 The development of content interpreter in recommendation

Following the advancements in neural natural language processing (NLP) models, more sophisticated architectures such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Neural Attention models have been employed as content interpreters to extract contextual information and capture user preferences. These models take sentence inputs, such as news titles, reviews, or comments, and transform them into word embedding matrices using random initialization or word2vec embeddings [95]. Various architectures, including CNNs, attention networks, and hybrid models, are utilized to learn representations of sentences. For example, NPA [96] and LSTUR [97] incorporate attention mechanisms to determine the importance of words after CNN layers. NRMS [98] and CPRS [99] utilize multi-head self-attention networks to learn word representations. These models are effective in capturing long-context dependencies and understanding the semantic information in the text. In addition to text modeling, language models are also used as content interpreters to capture user interests based on their historical interactions. For instance, WE3CN [100] employs a 3D CNN to extract temporal features from the historical data. DKN [41] utilizes an attention mechanism to aggregate historical information related to candidate items. DAN [101] proposes an attention-based LSTM to capture richer hidden sequence features. These models leverage different neural architectures to enhance the representation of text in the context of recommendation systems. It is worth noting that these models still have limitations in terms of depth and the ability to effectively generalize semantic information.

5.2 Language model based content interpreter

In recent years, there has been a growing interest in incorporating more powerful pre-trained language models, such as BERT and GPT, into recommendation systems. These language models have shown exceptional performance in various natural language processing tasks and have sparked researchers’ inspiration to leverage them for capturing deep semantic representations and incorporating world knowledge in recommendation systems. However, applying pre-trained language models to recommendation tasks presents two main challenges. Firstly, there is a misalignment of goals between general-purpose language models and the specific objectives of recommendation systems. To address this, researchers have proposed approaches that fine-tune the pre-trained models or design task-specific pre-training tasks to adapt them to recommendation tasks. For example, U-BERT [102] employs BERT as a content interpreter and introduces masked opinion token prediction and opinion rating prediction as pre-training tasks to better align BERT with recommendation objectives. Similarly, other works [103-107] have utilized pre-trained BERT to initialize the news encoder for news recommendation, enhancing the representation of textual features. The pre-trained model, ERNIE, is also utilized to enhance the representation ability of queries and documents [108, 109]. The second challenge is reducing the online inference latency caused by pre-trained language models, which can be computationally expensive. Researchers have explored techniques such as knowledge distillation and model optimization to obtain lightweight and efficient models suitable for online services. For instance, CTR-BERT [110] employs knowledge distillation to obtain a cache-friendly model for click-through rate prediction, addressing the latency issue.

Moreover, pre-trained language models have been applied beyond mainstream recommendation tasks. They have been integrated into various recommendation scenarios, including tag recommendation [111], tweet representations [112], and code example recommendation [113], to enhance the representation of textual features in those specific domains.

Additionally, some recent works [114-117] have explored using only textual features as inputs to recommendation models, leveraging pre-trained language models to alleviate coldstart problems and enable cross-domain recommendations. This paradigm offers advantages in alleviating cold-start problems and facilitating cross-domain recommendations based on the universality of natural language. ZESREC [114] uses BERT to obtain universal continuous representations of item descriptions for zero-shot recommendation. Unisrec [115] focuses on cross-domain sequential recommendation and employs a lightweight MoE-enhanced module to incorporate the fixed BERT representation into the recommendation task. VQ-Rec [116] further aligns the textual embeddings produced by pre-trained language models to the recommendation task with the help of vector quantization. Fu et al. [118] explore layerwise adaptor tuning to achieve parameter-efficient transferable recommendations.

While the pre-trained language models empower the text understanding with the benefit of capturing world knowledge first, the development of pre-trained large language models provides great emergency ability in the fields of reasoning and generalization, as shown in Table 2. TALLRe [119] explores the ability of large language models for the sequential recommendation. They observe that original language models perform poorly in zero-shot and few-shot scenarios, while recommendation-specific instruction-tuned language models demonstrate superior performance in few-shot learning and cross-domain generalization. Li et al. [123] adopt the large language models as context encoders and attempt to examine the upper limits of the large language models. Similarly, Kang et al. [120] propose a similar instruction tuning method for rating prediction recommendation tasks based on the T5 backbone. They find that the tuned language models, which leverage data efficiency, outperform traditional recommenders. PALR [121] further enhances the construction pipeline of recommendation-specific instruction tuning, which first employs large language models to generate reasoning as additional features based on the user’s behavior history. Next, a small set of candidates is retrieved using any existing model based on the user profile. Finally, to adapt general-purpose language models to the recommendation task, they convert the generated reasoning features, user interaction history, and retrieved candidates into natural language instruction data and fine-tune a language model. Existing instruction tuning methods of language models for recommendation scenarios typically focus on a single type of recommendation task, limiting the full utilization of language models’ strong generalization ability. InstructRec [122] addresses this limitation by formulating recommendation as an instruction-following procedure. They design various instruction templates to accommodate different recommendation tasks and employ GPT-3.5 to generate high-quality instruction data based on the historical data of users and templates. The language models fine-tuned with this instruction data can effectively handle a wide range of recommendation tasks and cater to diverse information requirements from different users.

6 LLMs as explainer

In addition to valuing the suggestions made by a recommendation model, users are also interested in the comprehensible justifications for these recommendations [124, 125]. This is crucial as most recommender systems are black boxes whose inner workings are inscrutable to human understanding [126], diminishing user trust. Taking drug recommendations, for instance, it is unacceptable to recommend drugs with good curative effects simply but fail to give reasons why they are effective. To this end, explainable recommendations aim to couple high-quality suggestions with accessible explanations. This not only helps to improve the

Table 2 LLMs for content interpreter

Approach	Task	LLM backbone	Tuning strategy	Datasets
TALLRe [119]	Sequential recommendation	LLaMA-7B	Instruct tuning & Fine tuning	MovieLens100k, BookCrossing
LLMs-Rec [120]	Rating prediction	Flan-T5-Base, Flan-T5-XXL	Fine tuning	MovieLens-1M, Amazon book
PALR [121]	Item recommendation	LLaMa-7B	Instruction tuning	MovieLens-1M, Amazon beauty
InstructRec [122]	Sequential recommendation personalized search	Flan-T5-XL	Instruction tuning	Amazon-Games, CDs

model’s transparency, persuasiveness, and reliability, but also facilitates the identification and rectification of potential errors through insightful explanations. These benefits have been extensively documented in recent work [38, 127-129]. For instance, [128] conducted a study that involved addressing 40 difficult tasks and evaluating the impact of explanations on zero-shot and few-shot scenarios. Their findings demonstrated that explanations have a positive effect on model performance by establishing a connection between examples and interpretation.

Traditional approaches mainly focus on template-based explanations, which can be broadly categorized into item-based, user-based, and attribute-based explanations [130]. Item-based explainable methods relate recommendations to familiar items [131], explaining that the recommended item bears similarity to others the user prefers, which are prevalent on platforms like Amazon [132] and Netflix [133]. However, due to its collaboration, it may underperform in personalized recommendations requiring diversity and can struggle to identify relevant items among industrial settings with vast items efficiently. In contrast, user-based explanations [134] leverage social relationships to make recommendations by explaining that users with similar interests also favor the recommended item. The user’s social property makes these explanations more persuasive, encouraging users to try the recommendations. However, the variance in user preferences may render this approach less impactful in gauging actual preference. Lastly, attribute-based explanations focus on highlighting the attributes of recommended items that users might find appealing, essentially conveying “these features might interest you”. This method demands customization according to each user’s interests, yielding higher accuracy and satisfaction. Thus, they are at the forefront of research [124, 135-138].

Obviously, such explanations typically employ pre-defined and formulaic formats, such as explanations based on similar items or friends. Although capable of conveying essential information, such inflexible formats may diminish the user experience and satisfaction by lacking adaptability and personalization [124]. For this reason, natural language generation approaches have received increasing attention. Early work [139-141] mainly relied on recurrent neural networks (e.g., LSTM [142], GRU [143]). Limited by the model’s expressiveness, they often suffer from the issue of insufficient diversity. With the excellent performance of Transformer-based models in various natural language tasks, some work attempts to integrate Transformer-based models into explainable recommendations. Li et al. [144] use the position vectors corresponding to the user (item) IDs to predict interpreted tokens. Subsequent work [145] has shown that the generated explanation cannot justify the user’s preference by synthesizing irrelevant descriptions. Therefore, Ni et al. [146] used such information as guided input to BERT to obtain a controllable justification. Considering that such auxiliary information is not always available in real-world scenarios, ExBERT [145] only requires historical explanations written by users, and utilizes a multi-head self-attention based encoder to capture the relevance between these explanations and user-item pairs. Recently, MMCT [147], EG4Rec [148], and KAER [149] have further carried out finer-grained modeling of information such as visual images, time series, and emotional tendencies to obtain high-quality interpretations. An early attempt, LLM4Vis [150], has been paid to the explainable visualization recommendation through ChatGPT. RecExplainer [151] proposes three alignment strategies for interpretability.

Due to the limited expressive power of traditional language models, natural language generation methods are prone to long-range dependence problems [145], that is, the input of long texts will appear to generate explanations that lack diversity and coherence in content. In addition, these explanation methods are tightly coupled with specific recommendation models (e.g., NETE [141]), or directly design a new recommendation model (e.g., NRT [139],

PETER [144]), and they are often powerless when faced with existing advanced recommendation models, which limits their generalizability. This is also a flaw in template-based methods. Notably, in industrial settings, recommendation algorithms frequently involve not just a single model but a cascade or integration of multiple models, and these elaborate combinations further exacerbate the difficulty of deciphering recommendations.

Thanks to LLMs’ remarkable generative ability in language tasks, they are ideal for tackling the aforementioned challenges [152]. Firstly, with the leverage of extensive training data, LLMs adeptly harness human language, encompassing context, metaphors, and complex syntax. This equips them to craft customized explanations that are precise, natural, and adaptable to various user preferences [141, 144, 153], mitigating the limitations of conventional, formulaic explanations. Secondly, the unique in-context learning capabilities of LLMs, such as zero-shot prompting, few-shot prompting, and chain-of-thought prompting, enable them to garner real-time user feedback during interactions, furnish recommendation outcomes, and their corresponding interpretations, fostering bidirectional human-machine alignment. Recent study [154] has demonstrated the potential of LLMs in elucidating the intricacies of complex models, as evidenced by GPT-4 autonomously interpreting the function of GPT-2’s each neuron by inputting appropriate prompts and the corresponding neuron activation. This showcases an innovative approach to interpreting deep learning-based recommendation models. It’s critical to highlight that this interpretation technique is agnostic to the model’s architecture, distinguishing it from traditional interpretations that are bound to specific algorithms. Thus, recommendation interpretations founded on LLMs pave the way for a versatile and scalable interpretational framework with broader applicability.

Although LLMs have inherently significant advantages in recommendation explanations, it is imperative to recognize potential issues. Firstly, akin to recommendation models, LLMs are essentially black boxes that are difficult for humans to understand. We cannot identify what concepts they give explanations based on [155]. Also, the explanation given may be insincere; that is, the explanations are inconsistent with their recommended behaviors. Some recent developments [38,156] involve utilizing chains of thought to prompt reasoning for improved interpretability; however, the opacity of the reasoning process of each step remains a concern, and [157] has questioned the possible unfaithful explanations of chain-of-thought prompting. Secondly, the extensive data utilized by LLMs may encompass human biases and erroneous content [158]. Consequently, even if the explanation aligns with the model’s recommendation behavior, both the explanation and recommendation could be flawed. Monitoring and calibrating these models to ensure fairness and accuracy in explainable recommendations is essential. Lastly, generative models exhibit varying levels of proficiency across different tasks, leading to inconsistencies in performance. Identical semantic cues could yield disparate recommendation explanations. This inconsistency has been substantiated by recent studies [159, 160] focusing on the LLMs’ robustness. Addressing these issues calls for exploring techniques to mitigate or even circumvent low-reliability explanatory behavior, and investigating how LLMs can be trained to consistently generate reliable recommendation explanations, especially under adversarial conditions, is a worthwhile avenue for further research.

7 LLMs as common system reasoner

With the development of large language models, there is an observation that LLMs exhibit reasoning abilities [ 2,161 ] when they are sufficiently large, which is fundamental for human
intelligence for decision-making and problem-solving. By providing the models with the ‘chain of thoughts’ [38], such as prompting with ‘let us think about it step by step’, the large language models exhibit emergent abilities for reasoning and can arrive at conclusions or judgments according to the evidence or logics. Accordingly, for recommender systems, large language models are capable of reasoning to help user interest mining, thus improving performance.

7.1 Making direct recommendations

In-context learning [167-173] is one of the emergent abilities of LLMs that differentiate LLMs from previous pre-trained language models, where, given a natural language instruction and task demonstrations, LLMs would generate the output by completing the word sequence without training or tuning [3]. As for in-context learning, the prompt follows by the task instruction and/or the several input-output pairs to demonstrate the task and a test input is added to require the LLM to make predictions. The input-output pair is called a shot. This emergent ability enables prediction on new cases without tuning unlike previous machine learning.

In the realm of recommender systems, numerous studies have explored the performance of zero-shot/few-shot learning using large language models, covering the common recommendation tasks such as rating prediction, and ranking prediction. These studies evaluate the ability of language models to provide recommendations without explicit tuning, as summarized in Table 3, where all methods adopt in-context learning for direct recommenders. The general process can be attached in Figure 4. Accordingly, we have the following findings:

The aforementioned studies primarily focused on evaluating zero-shot/few-shot recommenders using open-domain datasets, predominantly in domains such as movies and books. Large language models are trained on extensive open-domain datasets, enabling them to possess a significant amount of common-sense knowledge, including information about well-known movies. However, when it comes to private domain data, such as e-commerce products or specific locations, the ability of zero-shot recommenders lacks of validation, which is expected to be challenging.
Current testing methods necessitate the integration of additional modules to validate the performance of zero-shot recommenders for specific tasks. In particular, for ranking tasks that involve providing a list of items in order of preference, a candidate generation module is employed to narrow down the pool of items [164] and [165]. Generative-based models like gpt-3.5-turbo generate results in a generative manner rather than relying on recall from existing memories, thus requiring additional modules to implement ID-based item recommendations.
From the perspective of recommendation performance, zero-shot recommenders exhibit some capabilities and few-shot learners perform better. However, there still exists a substantial gap when compared to traditional recommendation models, particularly finetuned large language models designed specifically for recommenders, such as P5 [174] and M6-Rec [175]. This highlights that large language models do not possess a significant advantage in personalized modeling.

Another important emergent ability is the ‘step by step’ reasoning, where LLMs can solve complex tasks by utilizing prompts including previous intermediate reasoning steps, called the ‘chain of thoughts’ strategy [38]. Wang and Lim [164] design a three-step prompt, namely NIR, to capture user preferences, extract the most representative movies and rerank

Table 3 Zero/Few-shot learners of LLMs for RS

Approach

LLM backbone

Task

Metric

Datasets

ICL

COT

[162]

gpt-3.5-turbo

Rating prediction sequential recommendation direct recommendation explanation generation review summarization

RMSE,MAE

HR,NDCG

BLUE4,ROUGE,Human

Eval

BLUE4,ROUGE,Human

Eval

Amazon beauty

[163]

Text-davinci-002

text-davinci-003

gpt-3.5-turbo

Point-wise pairwise list-wise

NDCG,MRR

MovieLens-1M

Amazon-Book

Amazon-Music

MIND-small

[120]

Flan-U-PALM gpt-3.5-turbo text-davinci-003

Rating prediction ranking prediction

RMSE,MAE

ROC-AUC

MovieLens-1M

Amazon-Books

[164]

Text-davinci-003

Reranking

NDCG,HR

MovieLens 100 K

[165]

gpt-3.5-turbo

Reranking

NDCG

MovieLens-1M

Amazon-Games

[166]

gpt-3.5-turbo

Reranking

Precision

MIND

Figure 4 An example of zero/few-shot learning for direct recommenders

the items after item filtering. Such a multi-step reasoning strategy significantly improves recommendation performance.

7.2 Reasoning for automated selection

Automated Machine Learning (AutoML) is widely applied in recommender systems to eliminate the costly manual setup with trials and errors. The search space in recommender systems can be categorized in (1) Embedding size (2) Feature (3) Feature interaction (4) Model architecture. Embedding size search, such as [176-179] seeks for appropriate embedding size for each feature to avoid resources overconsumption. Searching for features consisting of raw feature search [180,181] and synthetic feature search [182,183], which selects a subset from the set of original or cross features to maintain informative features to reduce both computation and space cost. Feature interaction search, such as [184-188], automatically filters out feature interactions that are not helpful. Model architecture search, like [189-192], expands the search space to the integral architectures. The search strategy shifts from the discrete reinforcement learning process, which iteratively samples architectures for training and is time-consuming, into the differentiable searching, which adaptively selects architectures within one-shot learning to circumvent the computational burden, for more efficient convergence. The evaluation for each sampled architecture then acts as the signal to adjust the selections. That is, there is a decision maker who memorizes the prior results of previous architecture choices and analyzes the prior results to give the next recommended choice.

The emergent LLMs have excellent memorization and reasoning capabilities that would work for automated learning. Several works have attempted to validate the potential of automated machine learning with LLMs. Preliminarily, GPT-NAS [193] takes advantage of the generative capability of LLMs. The architecture of networks is formulated into sequential characters, and thus the generation of network architectures can be easily achieved through the generative pre-training models. NAS-Bench-101 [194] is utilized for pre-training and the state-of-the-art results are used for fine-tuning. The generative pre-training models produce reasonable architectures, which would reduce the search space for later genetic algorithms for searching optimal architectures. The relatively advanced reasoning ability is further evaluated in GENIUS [195], where GPT-4 is employed as a black-box agent to generate potential better-performing architectures according to previous trials including tried architectures with their evaluation performance. According to the results, GPT-4 can generate good architecture networks, showing the potential for more complicated tasks. Yet it is too difficult for LLMs to directly make decisions on challenging technical problems only by prompting. To balance
efficiency and interpretability, one approach is to integrate the LLMs into certain search strategies, where the genetic algorithm guides the search process and LLMs generate the candidate crossovers. LLMatic [196] and EvoPrompting [197] use code-LLMs as mutation and crossover operators for a genetic NAS algorithm. During the evolution process, each generation has a certain probability of deciding whether to perform crossover or mutation to produce new offspring. Crossover and mutation are generated by prompting LLMs. Such a solution integrates LLM into the genetic search algorithm, which would achieve better performances than direct reasoning over the whole space.

The research mentioned above brings valuable insights into the field of automated learning in recommender systems. However, several challenges need to be addressed. Firstly, the search space in recommender systems is considerably more complex, encompassing diverse types of search space and facing significant volume issues. This complexity poses a challenge in effectively exploring and optimizing the search space. Secondly, compared to the common architecture search in other domains, recommender systems lack a strong foundation of knowledge regarding the informative components within the search space, especially the effective high-order feature interactions. Unlike well-established network structures in other areas, recommender systems operate in various domains and scenarios, resulting in diverse and domain-specific components. Addressing these challenges and advancing the understanding of the search space and informative components in recommender systems will pave the way for significant improvements in automated learning approaches.

8 LLMs as conversational agent

Conversational recommender system (CRS) is a specialized type of recommendation tool that aims to uncover users’ interests and preferences through dialogue, enabling personalized recommendations and real-time adjustment of recommendation strategies based on user feedback. Compared to traditional recommender systems, conversational recommender systems have the advantage of real-time understanding of user intents and the ability to adapt recommendations based on user feedback. Typically, a conversational recommender system consists of two main components: a dialogue module and a recommendation module.

In this section, we will primarily focus on discussing the dialogue module, which plays a crucial role in facilitating effective user-system interactions and understanding user preferences (Figure 5).

In a conversational recommender system, the dialogue module typically takes the form of a dialogue system. Dialogue systems can generally be classified into two main categories: chit-chat and task-oriented. The former focuses on open-domain question answering, and two major methods are commonly employed: generative and retrieval-based methods. Generative methods [198-200] utilize a sequence-to-sequence model structure to generate responses, while retrieval-based methods [201-203] transform the task of generating responses into a retrieval problem by searching for the most relevant response in a response database based on the dialogue context. In conversational recommender systems, task-oriented dialogue systems are more often required, as they are specifically designed to assist users in accomplishing specific tasks. For task-oriented dialogue systems, a common approach [24,204] is to treat the response generation as a pipeline and handle it separately using four components: dialogue understanding [205, 206], dialogue state tracking [207-209], dialogue policy learning [24, 210], and natural language generation [211,212]. Another approach is to employ an end-toend method [213-215], training an encoder-decoder model to handle all the processing steps

Figure 5 Important approaches in conversational agent

collectively. The first approach suffers from scalability issues and lacks synergy between the components, while the second approach requires a substantial amount of supervised data for training.

Based on the classification of dialogue systems, common approaches in conversational recommender systems can also be divided into two categories: attribute-based QA (questionanswering) and generative methods. The attribute-based QA approach [24, 216-218] utilizes a pipeline method within the dialogue system. In each dialogue turn, the system needs to decide whether to ask the user a question or provide a recommendation. The decision-making process, particularly regarding which attribute to ask about, is typically handled by a policy network. On the other hand, generative methods do not explicitly model the decision-making process. Instead, they often employ an end-to-end training approach, where a sequence-to-sequence model generates output directly from a shared vocabulary of words and items. Whether the generated output is chit-chat, a question, or a recommendation is implicitly determined during the generation process. Compared to attribute-based QA methods, generative methods [219-222] appear to be simpler and more scalable. However, they require a large amount of supervised training data. With the advancement of pre-trained language models (PLMs) in the field of natural language processing, particularly models like BERT [223] and GPT [224], the capabilities of pre-trained models in language understanding and generation have become increasingly powerful. Researchers have found that fine-tuning pre-trained models with a small amount of supervised data can yield impressive results on specific tasks. This discovery has led to the application of PLMs in generative conversational recommender systems. For example, DialoGPT [215] achieved promising dialogue intelligence by fine-tuning GPT-2 on dialogue data collected from platforms like Reddit. Subsequently, BARCOR [220], RecInDial [222], and UniCRS [221] utilized DialoGPT for constructing conversational recommender systems, with variations in their action decision strategies. While PLMs reduce the dependency of generative dialogue models on extensive data, the fine-tuning process still incurs significant computational time and requires the collection of high-quality domainspecific training data due to the large parameter space.

With the increase in model parameters and training data, the intelligence and knowledge capacity of models continues to improve. OpenAI has been expanding the model parameters and training data while employing techniques such as RLHF (Reinforcement Learning from

Human Feedback) and Instruction Tuning to further fine-tune GPT-3 [3]. This has led to the emergent abilities of models like InstructGPT [37] and subsequent models like ChatGPT, which exhibit incredible intelligence and have opened the doors to new intelligent dialogue systems based on large language models (LLMs). Furthermore, Google’s BARD and META’s LLaMA [32] are also large language dialogue models that have been proposed and demonstrated remarkable performance in conversational abilities. The Vicuna model, for instance, utilizes dialogue corpora shared by users in using ChatGPT to fine-tune the open-source LLaMA model, with the team claiming it can achieve over

of ChatGPT’s capability. This series of successive LLM introductions has brought new insights to conversational recommender systems. Due to the utilization of extensive open-domain corpora during the training of LLM, it possesses inherent conversational recommendation capabilities and can provide reasonable recommendations in open domains such as movies, music, news, and games.

However, there are still significant challenges in building an enterprise-level CRS. The first challenge is the lack of awareness of large models about private domain data. It is well known that most of the training data for LLMs, such as GPT-3, comes from publicly available sources on the internet. As a result, these models may lack visibility into the data that resides within information platforms, making their modeling and understanding capabilities of such data relatively poor. To address this challenge, there are currently two approaches being explored: fine-tuning [215] and tool learning [225, 226]. Fine-tuning involves tuning LLM using private domain-specific dialogue data. There are two major concerns in the approach. First, massive high-quality domain-specific dialogue data is required to tune the extremely large model. However, in most recommendation scenarios, data primarily consists of explicit or implicit user-item interactions, which may lack conversational context. Therefore, generating high-quality dialogue data from interaction data is a key concern in the approach. In RecLLM [226] and iEvaLM [227], researchers have proposed using LLMs to construct a user simulator for generating conversational data. Besides, the fine-tuning technique plays a crucial role in determining the ultimate quality of LLMs. A well-designed and effective fine-tuning strategy can lead to significant improvements in the model’s performance and capabilities, such as instruction tuning and RLHF proposed in InstructGPT [3]. Tool learning is another approach to address this challenge, and its main idea is to treat traditional recommendation models as tools to be utilized, such as Matrix Factorization (MF) and DeepFM. For a more detailed explanation of tool learning, please refer to Section 9. Since recommendation models are domain-specific, LLM can leverage these models to obtain recommendation results and recommend them to the users in the response. In this approach, there are two main technical points: the construction of the tool model and the engineering of prompts to guide the LLM in the proper utilization of the tool. First of all, conventional recommendation models generally use id or categorical features as input, while users always give their requirements or preferences in natural language in conversations. Therefore, unstructured text features should be taken into consideration in tool construction. In Chat-Rec [225], a conventional recommendation model and a text embedding-based model(text-embedding-ada-002) are used as tools. RecLLM [226] adapted a language model enhanced dual-encoder model and several text retrieval methods as the recommendation engine. On the other hand, despite the strong intelligence and reasoning capabilities of LLMs, effectively harnessing these abilities requires well-crafted prompts for guidance. For instance, the Chain of Thought proposed by Jason [38] could trigger LLM to reason and engage in step-by-step thinking, which
benefits the tool-using capability. Subsequent studies like ToT [228], Plan-and-Solve [229] and ReAct [230] have proposed more advanced techniques for prompt design to assist in guiding LLM to engage in deeper thinking and tool planning.

The second challenge lies in the issue of memory and comprehension in long conversations. Due to the input constraints of LLMs, models like ChatGPT can support a maximum of 4096 tokens in a single call, including both input and output. In multi-turn dialogue scenarios, longer dialogue contexts often meet the risk of exceeding this token limit. The simplest approach to tackle this challenge is to trim the dialogue by discarding earlier turns. However, in conversational recommender systems, users may express a significant amount of personal information and interests in the early stages of the conversation. The omission of such information directly impacts the accuracy of recommendations. To address this issue, several relevant works have proposed solutions. MemPrompt [231] enhances the prompt by incorporating a memory module, enabling GPT-3 to possess stronger long-dialogue memory capability. Similarly, RecLLM [226] leverages LLM to extract user profiles and store them as factual statements in user memory. When processing user queries, relevant facts are retrieved based on text similarity.

9 Tool-learning and its applications in recommendation

9.1 LLM-based tool learning

Tool learning is an emerging research field that aims to enhance task-solving capabilities by combining specialized tools with foundational models, which has been understood by [232] as two perspectives:

Tool-augmented learning treats specialized tools as assistants in order for improving the quality and accuracy of tasks, or Tool for AI;
Tool-oriented learning focuses more on training models to effectively use tools, controlling and optimizing tool-applying processes, or AI for Tool.
Tool learning has found applications in various fields, and this section primarily focuses on tool learning paradigms based on large language models (LLMs). While recent works often involve a combination of these two perspectives, we do not specifically categorize each work into one type. LLMs, such as GPT, are well-suited for tool learning applications [233]. With their powerful natural language processing capabilities, LLMs can break down complex tasks into smaller sub-tasks and convert them into executable instructions. Specialized tools allow LLMs to access knowledge that is beyond their own understanding. By integrating specialized tools, LLMs can better understand and then address complex problems, offering more accurate and efficient solutions for personalized systems.

LLMs are commonly applied as controllers to select and manage various existing AI models to solve complex tasks, which rely on user input and language interfaces on making summarizations. They act as the central component, responsible for comprehending problem statements and making decisions regarding which actions to execute. Additionally, they aggregate the outcomes based on the results of the executed actions. In that case, HuggingGPT [243] leverages existing models from the Hugging Face community

to assist in

task-solving. Visual ChatGPT [244] combines visual foundation models like BLIP [247], Stable Diffusion [248], etc. with LangChain

to handle complex visual tasks, while the following TaskMatrix.AI [245] maintains a unified API Platform extending the capabilities of Visual ChatGPT, extends the capabilities of Visual ChatGPT by maintaining a unified API Platform, enabling input from multiple modalities and generating more complex task solutions. On the contrary, Auto-GPT

operates as an agent that autonomously understands specific targets through natural language and performs all processes in an automated loop, without requiring mandatory human input.

WebGPT [239] introduces a text-based Web browsing interactive environment, where LLMs learn to emulate the complete process of human interaction with a Web browser using behavior cloning and rejection sampling techniques. In ReAct [230], by leveraging an intuitive prompt, LLMs learn to generate both reasoning paths and task-specific actions alternately when solving a specific task. The execution of specific actions is delegated to corresponding tools, and external feedback obtained from these tools is utilized to validate and guide the reasoning process further. The motivation behind Toolformer [246] aligns closely with ReAct; however, it goes a step further by combining diverse tools within a single model. This integration provides the model with flexible decision-making abilities and improved generalization capabilities, achieved through a simple yet effective self-supervised method. In contrast to prior works, LATM [249] takes a novel approach by empowering LLMs to directly generate tools. It achieves a division of labor within the task-solving process by employing LLMs at different scales: the tool maker, tool user, and dispatcher. LATM is entirely composed of LLMs, enabling the self-generation and self-utilization of tools.

9.2 Applications in personalization scenarios

Recently, LLMs have demonstrated impressive abilities in leveraging internal world knowledge and common sense reasoning to accurately understand user intent from dialogues. Moreover, LLMs can communicate with users fluently in natural language, offering a seamless and delightful user experience. These advantages make LLMs an appealing choice as recommendation agents to enhance the personalized experience.

However, despite the impressive memory capacity of LLMs, they face challenges in memorizing specific knowledge in private and specialized domains without sufficient training. For instance, storing the item corpus and all user profiles in a recommender system can be challenging for LLMs. This limitation can result in LLMs generating inaccurate or incorrect responses and makes it difficult to control their behavior within a specific domain. Furthermore, LLMs face the challenge of the temporal generalization problem as external knowledge continues to evolve and change over time. To address these issues, various tools can be utilized to augment LLMs and enhance their effectiveness as recommendation agents. Table 4 shows examples of related tools.

Search engine Search engines are widely employed to provide external knowledge to LLMs, reducing LLMs’ memory burden and alleviating the occurrence of hallucinations in LLMs’ responses. BlenderBot 3 [250] uses specific datasets to fine-tune a series of modules, enabling LLMs to learn to invoke the search engine at the appropriate time and extract useful knowledge from the retrieval results. LaMDA [238] learns to use a toolset that includes an IR

Table 4 LLM-based tool learning approaches

Approach	Tool usage	LLM backbone	Task
Re3 [234]	LLM	gpt3-instruct-175B gpt3-instruct13B	Long stories generation
PEER [235]	LLM	LM-Adapted T5	Editions, citations, quotes
METALM [236]	Pretrained encoders with diverse modalities	Transformer (pretrained from scratch)	language-only tasks visionlanguage tasks
Atlas [237]	Dense retriever	T5	Knowledge-intensive language tasks massively-multitask language understanding question answering fact checking
LaMDA [238]	Retriever translator calculator	Decoder-only transformer	Dialog
WebGPT [239]	Web browser	gpt-3	Question answering
Mind’s Eye [240]	Physics engine text-to-code LM	gpt-3 PaLM	Reasoning
PAL [241]	Python interpreter	CODEX(code-davinci-002)	Mathematical symbolic algorithmic reasoning
SayCan [242]	Robots	PaLM	Real-world robotic tasks
HuggingGPT [243]	AI models in hugging face community	gpt-3.5-turbo text-davinci-003 gpt4	Image classification image captioning object detection etc.
Auto-GPT	Web browser	gpt-3.5-turbo text-davinci-003 gpt4	User-specified tasks
Visual ChatGPT [244] Taskmatrix.AI [245]	Visual foundation models customized models with unified API form	text-davinci-003	Visual customized tasks
ReAct [230]	Wikipedia API	PaLM-540B	Question answering face verificaiton
Toolformer [246]	Calculator Q&A system search engine translation system calendar	GPT-J	Downstream tasks

system, a translator, and a calculator through fine-tuning to generate more factual responses. RETA-LLM [251] is a toolkit for retrieval-augmented LLMs. It disentangles IR systems and LLMs entirely, facilitating the development of in-domain LLM-based systems.Thoppilan et al. [238] shows a case of applying LaMDA to content recommendation. Preconditioned on a few role-specific dialogues, LaMDA can play the role of a music recommendation agent.

Recommendation engine Some works have attempted to alleviate the memory burden of LLMs by equipping them with a recommendation engine as a tool, enabling LLMs to offer recommendations grounded on the item corpus. The recommendation engine in ChatREC [225] is further divided into two stages: retrieve and reranking, which aligns with typical recommendation system strategies. In the retrieval stage, LLMs utilize traditional recommendation systems as tools to retrieve 20 items from the item corpus as a candidate item set. Subsequently, LLMs employ themselves as tools to rerank the candidate item set. LLMs’ commonsense reasoning ability, coupled with the internal world knowledge within them, allow them to provide explanations for the sorting results. The recommendation engine tool used in RecLLM [226] is highly similar to it in Chat-REC, and it is also divided into retrieval and reranking stages. RecLLM provides several practical solutions for large-scale retrievals, such as Generalized Dual Encoder Model and Concept Based Search, and so on.

Database Databases are also utilized as tools to supplement additional information for LLMs. In order to better cope with the cold-start problem for new items and alleviate the temporal generalization problem of LLMs, a vector database is utilized to provide information for new items that the LLMs are unaware of in Chat-REC [225]. When encountering new items, LLMs can utilize this database to access information about them based on the similarity between the user’s request embedding and item embeddings in the database. User profiles can also help LLMs better understand the user’s intent. RecLLM [226] employs a user profile module as a tool to deposit meaningful and enduring facts about users exposed during historical conversations in user memory and retrieve a single fact related to the current dialogue when necessary.

Although some works have applied the concept of tool learning to personalization systems, there are still interesting and promising research topics that deserve exploration. 1) Fine-tuning models for better tool use. In-context learning has shown promise in teaching LLMs how to effectively use tools with a small number of demonstrations, as shown in Chat-REC and RecLLM. However, LLMs often struggled to learn strategies for handling complex contexts with limited demonstrations. Fine-tuning is a viable option for improving tool use, but it requires sufficient training data and effective techniques. RecLLM further fine-tunes some modules of it using synthetic data generated by a user simulator through RLHF [37] technique. Investigating methods to obtain sufficient training data and developing tailored fine-tuning techniques for recommendation systems is a worthwhile research direction. 2) Developing a more powerful recommendation engine. Traditional recommendation systems often rely on collaborative filtering signals and item-to-item transition relationships for recommendations. However, with the use of LLMs as the foundation models, user preferences can be reflected through natural language and even images. Therefore, developing a recommendation engine that supports multimodal data is a crucial research direction. Additionally, the recommendation engine should also be capable of adjusting the candidate set based on user preferences or feedback (such as querying movies of a specific genre or disliking an item in the recommendation set). 3) Building more tools. To provide LLMs with more authentic and personalized information, the development of additional tools is crucial. For example, APIs for querying knowledge graphs [252] or accessing users’ social
relationships can enhance the knowledge available to LLMs, enabling more accurate and tailored recommendations.

10 LLMs as personalized content creator

Traditional recommender systems focus on suggesting existing items based on user preferences and historical data, where displayed content is already generated for retrieval. However, with the advancements in techniques and platforms for content creators, personalized content creator has attracted more and more attention, where more appealing content is customized generated to match the user’s interests and preferences, especially in the realm of online advertising [253]. The common contents contain the visual and semantic contents [254-256], such as title, abstract, description, copywritings, ad banners, thumbnail, and videos. One more widely discussed topic is text ad generation, where the ad title and ad description are generated with personalized information. Earlier works adopt the pre-defined templates [254, 257, 258] to reduce the extensive human effort, which, however, often fail to fully meet the user’s interests and preferences. More recent data-driven methods have emerged, which incorporate user feedback as rewards in the reinforcement learning framework to guide the generation process [259-262]. Furthermore, the incorporation of pre-trained language models has played a significant role in improving the generation process for multiple content items [263-266]. This integration helps refine the content generation models and improve their ability to meet user preferences effectively.

As recommender systems and large language models continue to evolve, a promising technique that would bring new opportunities is the integration of AI Generated Content (AIGC). AIGC [267] involves the creation of digital content, such as images, music and natural language through AI models, with the aim of making the content creation process more efficient and accessible. Earlier efforts in this field focused on deep-learning-based generative models, including Generative Adversarial Networks (GANs) [268], Variational AutoEncoders (VAEs) [269], Normalizing Flows [270], and diffusion-based models [271] for high-quality image generation. As the generative model evolves, it eventually emerges as the transformer architecture [28], acting as the foundational blocks for BERT [29] and GPT [224] in the field of NLP, and for Vision Transformer (ViT) [272] and Swin Transformer [273] in the field of CV. Moreover, the scope of generation tasks expanded from uni-modal to multi-modal tasks, including the representative model CLIP [274], which can be used as image encoders with multi-modal prompting for generation. The multi-modal generation has become an essential aspect of AIGC, which learns the multimodal connection and interaction, typically including vision language generation [274], text audio generation [275], text graph generation [276], text Code Generation [277]. With the emergence of large language models, nowadays AIGC is achieved by extracting the human intention from instructions and generating the content according to its knowledge and intention. Representative products, including ChatGPT [278], DALL-E-2 [279], Codex [280] and Midjourney [281], have attaining significant attention from society. With the growth of data and model size, the model can learn more comprehensive information and thus leading to more realistic and high-quality content creators.

Recall to the personalized content creator, the large language models would bring opportunities from the following points (Figure 6). Large language models would further extend

Figure 6 Advantages of content creator with LLMs

the capabilities of the pre-trained model, allowing for better reasoning of user personalized intent and interest. Previous methods [264, 265] depending on tailored pre-training models may be enhanced to better improve the reasoning abilities and few-shot prompting. Secondly, Reinforcement Learning from Human Feedback (RLHF) strategy can be applied to fine-tune models to better capture the user intent information, similar to existing RL-based framework [260] for text ad generation. Last but not least, the powerful generative abilities of large language models empower realistic creation thanks to the availability of sufficient cross-modal knowledge bases. The work [282] more specifically proposes a recommendation paradigm based on ChatGPT, where the generation process receives feedback and multiple rounds of conversions to better capture the user explicit preferences. Compared to previous training paradigms, more explicit expressions of user interest can be understood by the large language models and converted into corresponding instructions to guide the generation of content, significantly alleviating the problem of extremely sparse feedback. The integration of AIGC (Artificial Intelligence and Generative Content) with recommender systems offers valuable opportunities in various business scenarios. In E-commerce, large language models can power chatbots for personalized product recommendations and create captivating content to attract users. In Customer Service, AIGC enables automated responses, FAQs, and personalized assistance, improving support efficiency and customer satisfaction. Overall, integrating AIGC with recommender systems enhances user experiences and drives business growth.

However, there are two major security and privacy risks for personalized content creators. One of the concerns is the reliability of models like ChatGPT in terms of factuality, as indicated in the work [283]. While these models generate content that appears reasonable, there is a risk of distributing misleading or inaccurate information, which can weaken the truthfulness of internet content. This concern becomes particularly crucial in personalized recommendations, where the model may inadvertently promote misleading information tailored to the user’s interests. The second concern revolves around data privacy, encompassing both user profiles and long-term human interaction histories. In the case of large language models, these interaction histories are collected or shared, potentially leading to the large
models memorizing sensitive user data. Previous work [284] has demonstrated that large language models, especially GPT-2 [285], memorize and leak individual training examples. This emphasizes the need for strict user approval and careful handling of annotator data to mitigate privacy risks. It is crucial to develop new techniques that prioritize privacy preservation during the training process.

11 Open challenges

11.1 Deployment difficulties

11.1.1 Industrial challenges

Personalization services, particularly with recommender systems, are complex industrial products that face numerous challenges when implemented in real-world scenarios. We will now summarize the key challenges as follows:

Scaling computational resources Existing large language models, such as BERT and GPT, demand significant computational power for training and inference. This includes high memory usage and time consumption. Fine-tuning these models to align them with personalization systems, which has shown promising results for improved personalization performance, can be computationally intensive. Several efficient finetuning strategies, e.g., option tuning in M6-Rec [175], Lora [286], QLora [287], have been developed to address this issue and pave the way for more efficient tuning.

Significant response time Achieving efficient response times is crucial for online serving and greatly impacts the personalized user experience. Response time includes both the inference phase of large language models and the concurrent user requests in large numbers. The introduction of large language models can result in considerable inference time, posing a challenge for real-world deployment. One approach is to pre-compute the embeddings of intermediate outputs from language models, storing and indexing them in a vector database, particularly for methods that utilize large language models as textual encoders. Other approaches, such as distillation and quantization, aim to strike a balance between performance and latency.

11.1.2 Laborious data collection

Large language models are widely known to leverage extensive amounts of open-domain knowledge during their training and fine-tuning processes. These knowledge sources include well-known references such as Wikipedia, books, and various websites [3]. Similarly, when applied in recommender systems, these models often rely on representative open-domain datasets such as MovieLens and Amazon Books. While this type of open-domain knowledge contains a wealth of common-sense information, personalized tasks require access to more domain-specific data that is not easily shareable. Additionally, the nature of user feedback in personalized tasks can be complex and sparse, often accompanied by noisy feedback. Collecting and filtering this data, in contrast to acquiring common-sense knowledge, presents challenges. It incurs higher labour costs and introduces additional training redundancy due to the need for extensive data processing and filtering. Furthermore, designing appropriate prompts to instruct or fine-tune large language models is crucial for aligning them with the distribution of in-domain inputs in personalization tasks. By carefully tailoring the prompts,
researchers and practitioners can guide the model to produce outputs that better cater to personalized applications, thereby maximizing performance and effectiveness.

11.2 Capacities of LLMs

11.2.1 Long text modeling

Large language models have a limitation on the maximum number of input tokens they can handle, typically constrained by the context window size, e.g., 4096 for ChatGPT. This poses challenges when dealing with long user behavior sequences, which are common in modern recommender systems. Careful design is necessary to generate effective and appropriate prompt inputs within this limited length. In the case of conversations with multiple rounds, accumulating several rounds of dialogue can easily exceed the token limit of models. The current approach in handling long conversations is to truncate the history, keeping only the most recent tokens. However, this truncation discards valuable historical information, potentially harming the performance of models. To address these challenges, several techniques can be employed. One approach is to prioritize and select the most relevant parts of the user behavior sequence or conversation history to include in the prompt. This selection can be based on various criteria such as recency, importance, or relevance to the task at hand. Another technique involves summarizing or compressing the lengthy input while preserving essential information. This can be achieved through techniques like extractive summarization or representing the long sequence in a condensed form. Moreover, architectural modifications, such as hierarchical or memory-augmented models, can be explored to better handle long sequences by incorporating mechanisms to store and retrieve relevant information efficiently.

In addition, collaborative modeling of long text data and recommendation tasks is an emerging and pressing challenge. In conventional personalization systems, item ID information along with other categorical information is commonly used for modeling feature interactions and user preferences. With the rise of large language models, there would be a growing trend toward leveraging textual information more extensively. Textual data provides unique insights about items or users, making it valuable for modeling purposes. From the perspective of modeling, dealing with long text data requires more attention and complexity compared to categorical data, not to mention the need to match the modeling of user interests. From the perspective of implementation, reforming the entire pipeline becomes necessary to accommodate the requirements of efficient latency. There lie the technical challenges of incorporating long text data into recommendation models and serving them in real time.

11.2.2 Interpretability and explainability

While large language models provide good reasoning capabilities, they are notorious for the nature of the ‘black box’, which is highly complex and non-linear in their enormous size and layered architecture, making it challenging to comprehend the internal workings and understand the generation process of recommendations. Without a deep understanding of how the model operates, it becomes challenging to detect and address biases or ensure fair and ethical recommendations. Once transparency about the internal mechanisms is lacking, users struggle to trust and accept the decisions made by the system. Users often desire understandable explanations for recommended choices. Addressing the challenge of model interpretability and explainability requires research involving natural language processing, explainable AI, human-computer interaction, and recommendation systems. The development of techniques
that unveil the inner workings of language models, facilitate the generation of meaningful and accurate interpretations, and enable robust evaluation methods is the main focus. By providing transparent and interpretable recommendations, users can establish trust, understand the reasoning behind the recommendations, and make informed decisions.

11.3 Evaluation

11.3.1 Tools and metrics

Conventional personalization systems typically rely on task-specific metrics such as rankingoriented metrics, NDCG, AUC, and Recall to evaluate model performance. However, with the integration of large language models into recommender systems, the evaluation tools and metrics undergo significant changes. Traditional metrics may not sufficiently capture the performance of recommender systems powered by large language models, which introduce novel capabilities and generate recommendations in a different manner and require the development of new evaluation tools.

One crucial aspect of evaluation is considering user preferences in large language modelpowered systems, which requires a user-centric approach. Metrics such as user satisfaction, engagement, and overall experience become essential considerations. For example, Liu’s work [162] proposes a crowdsourcing task to assess the quality of generated explanations and review summaries, providing a way to evaluate the effectiveness of the generated content. Additionally, user satisfaction surveys and feedback questionnaires can serve as valuable options.

Another perspective to consider is the health of the system, which involves evaluating novelty and assessing factors like diversity, novelty, serendipity, and user retention rates. These metrics help evaluate the freshness of recommendations and the long-term effects of large language models. Furthermore, it is crucial to assess the interpretability and fairness of recommendations. The interpretability assessment focuses on measuring the clarity, understandability, and transparency of recommendations. Simultaneously, the fairness evaluation aims to address potential biases in personalized results. By prioritizing fairness, we strive to create personalized experiences that are equitable and inclusive for all users. Both of these evaluations are essential to enhance the overall user experience and build confidence in the personalized recommendations delivered by the system.

11.3.2 Trade-off between helpfulness, honesty, harmlessness

When large language models are employed for personalization, some of their disadvantages would be magnified. Striving for a more honest and harmless system may come at the expense of system performance.

First of all, the accuracy and factuality of the system must be ensured. Although large language models can generate seemingly reasonable content, there is a risk of disseminating misleading or inaccurate information. This becomes even more critical when incorporating user feedback, as the model may mimic user behaviors in an attempt to appear honest. However, this imitation can result in biased guidance for users, offering no real benefits.

Secondly, in terms of harmlessness, concerns regarding privacy, discrimination, and ethics arise. While large language models have the potential to provide highly personalized recommendations by leveraging user data, privacy, and data security become paramount. Unlike open-domain datasets, the privacy of individual data used for training should be rigorously
protected, with strict user permissions for sharing their personal information. For discrimination, large language models may inevitably reflect biases inherent in the training data, leading to discriminatory recommendations. Considering the biased user and item distribution, which is much more significant in recommender systems with the long-tail effect, where biased user and item distribution can lead to decisions that favor majority choices, resulting in discrimination against certain users. The final concern revolves around ethical considerations. Harmful messages, if clicked by users unconsciously, can guide large language models toward generating similar harmful content. However, when assisting in personalized decision-making, it is essential for large language models to have the capability to minimize exposure to harmful messages and guide users in a responsible manner. Approaches like constructing a Constitutional AI [288], where critiques, revisions, and supervised Learning are adopted for better training models, may offer valuable insights.

By addressing these concerns, safeguarding privacy, mitigating discrimination, and adhering to ethical guidelines, recommender systems can leverage the power of large language models while ensuring user trust, fairness, and responsible recommendations.

12 Conclusion

In conclusion, the emergence of large language models represents a significant breakthrough in the field of artificial intelligence. Their enhanced abilities in understanding, language analysis, and common-sense reasoning have opened up new possibilities for personalization. In this paper, we provide several perspectives on when large language models adapt to personalization systems. We have observed a progression from utilizing low-level capabilities of large language models to enhance performance, to leveraging their potential in complex interactions with external tools for end-to-end tasks. This evolution promises to revolutionize the way personalized services are delivered. We also acknowledge the open challenges that come with the integration of large language models into personalization systems.

Author Contributions Jin Chen organized the main writing and wrote the main manuscript text. Zheng Liu organized the entire structure of the manuscript and wrote the introduction of large language models. Xu Huang wrote the Section 8: LLMs as Conversational Agent. Chenwang Wu wrote the Section 6: LLMs as Explainer. Qi Liu and Gangwei Jiang wrote the Section 5: LLMs as Content Interpreter. Yuanhao Pu, Yuxuan Lei and Xiaolong Chen wrote Section 9: Tool-Learning and its Applications in Recommendation. Xingmei Wang wrote the Section 7.2 Reasoning for Automated Selection. Kai zheng, Defu Lian, and Enhong Chen revised the article over and over again. All authors reviewed the manuscript.

Funding Open access funding provided by Hong Kong University of Science and Technology. This work was supported by the National Natural Science Foundation of China (NSFC) No. 62022077.

Data Availability No datasets were generated or analysed during the current study.

Declarations

Competing of interest The authors declare no competing interests.
Ethical Approval This research does not involve human participants or pose potential harm to individuals. As such, formal ethical approval was not required. However, we would like to emphasize our commitment to ethical research practices and confirm that the study adheres to the ethical guidelines and principles set forth in ACM Code of Ethics.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Zhao, W.X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., Min, Y., Zhang, B., Zhang, J., Dong, Z., et al.: A survey of large language models. arXiv:2303.18223 (2023)
Huang, J., Chang, K.C.-C.: Towards reasoning in large language models: a survey. arXiv:2212.10403 (2022)
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al.: Language models are few-shot learners. Adv. Neural. Inf. Process. Syst. 33, 1877-1901 (2020)
Salemi, A., Mysore, S., Bendersky, M., Zamani, H.: Lamp: When large language models meet personalization. arXiv:2304.11406 (2023)
Wu, L., Zheng, Z., Qiu, Z., Wang, H., Gu, H., Shen, T., Qin, C., Zhu, C., Zhu, H., Liu, Q., et al.: A survey on large language models for recommendation. arXiv:2305.19860 (2023)
Lin, J., Dai, X., Xi, Y., Liu, W., Chen, B., Li, X., Zhu, C., Guo, H., Yu, Y., Tang, R., et al.: How can recommender systems benefit from large language models: a survey. arXiv:2306.05817 (2023)
Fan, W., Zhao, Z., Li, J., Liu, Y., Mei, X., Wang, Y., Tang, J., Li, Q.: Recommender systems in the era of large language models (llms). arXiv:2307.02046 (2023)
Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., Riedl, J.: Grouplens: an open architecture for collaborative filtering of netnews. In: Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, pp. 175-186 (1994)
Pan, R., Zhou, Y., Cao, B., Liu, N.N., Lukose, R., Scholz, M., Yang, Q.: One-class collaborative filtering. In: 2008 Eighth IEEE International Conference on Data Mining, pp. 502-511 (2008). IEEE
Koren, Y., Bell, R., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30-37 (2009)
Wang, J., De Vries, A.P., Reinders, M.J.: Unifying user-based and item-based collaborative filtering approaches by similarity fusion. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 501-508 (2006)
Pazzani, M.J., Billsus, D.: Content-based recommendation systems. In: The adaptive Web: Methods and Strategies of Web Personalization, pp. 325-341. Springer (2007)
Wang, C., Blei, D.M.: Collaborative topic modeling for recommending scientific articles. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 448-456 (2011)
Zhang, F., Yuan, N.J., Lian, D., Xie, X., Ma, W.-Y.: Collaborative knowledge base embedding for recommender systems. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 353-362 (2016)
Liu, H., Wu, F., Wang, W., Wang, X., Jiao, P., Wu, C., Xie, X.: Nrpa: neural recommendation with personalized attention. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1233-1236 (2019)
Wang, H., Wang, N., Yeung, D.-Y.: Collaborative deep learning for recommender systems. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1235-1244 (2015)
Zhou, G., Zhu, X., Song, C., Fan, Y., Zhu, H., Ma, X., Yan, Y., Jin, J., Li, H., Gai, K.: Deep interest network for click-through rate prediction. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1059-1068 (2018)
Zhou, G., Mou, N., Fan, Y., Pi, Q., Bian, W., Zhou, C., Zhu, X., Gai, K.: Deep interest evolution network for click-through rate prediction. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 5941-5948 (2019)
Wang, X., He, X., Wang, M., Feng, F., Chua, T.-S.: Neural graph collaborative filtering. In: Proceedings of the 42 nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 165-174 (2019)
Kang, W.-C., McAuley, J.: Self-attentive sequential recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 197-206 (2018). IEEE
Hidasi, B., Karatzoglou, A., Baltrunas, L., Tikk, D.: Session-based recommendations with recurrent neural networks. arXiv:1511.06939 (2015)
Sun, F., Liu, J., Wu, J., Pei, C., Lin, X., Ou, W., Jiang, P.: Bert4rec: Sequential recommendation with bidirectional encoder representations from transformer. In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, pp. 1441-1450 (2019)
Paschou, M., Sakkopoulos, E.: Personalized assistant apps in healthcare: a systematic review. In: 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1-8 (2019). IEEE
Sun, Y., Zhang, Y.: Conversational recommender system. In: The 41st International Acm Sigir Conference on Research & Development in Information Retrieval, pp. 235-244 (2018)
Jannach, D., Manzoor, A., Cai, W., Chen, L.: A survey on conversational recommender systems. ACM Computing Surveys (CSUR) 54(5), 1-36 (2021)
Bengio, Y., Ducharme, R., Vincent, P.: A neural probabilistic language model. Adv. Neural. Inf. Process. Syst. 13 (2000)
Mikolov, T., Karafiát, M., Burget, L., Cernockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Interspeech, vol. 2, pp. 1045-1048 (2010). Makuhari
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Adv. Neural. Inf. Process. Syst. 30 (2017)
Kenton, J.D.M.-W.C., Toutanova, L.K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of NAACL-HLT, pp. 4171-4186 (2019)
Shanahan, M.: Talking about large language models. Commun. ACM 67(2), 68-79 (2024)
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., Barham, P., Chung, H.W., Sutton, C., Gehrmann, S., et al.: Palm: scaling language modeling with pathways. J. Mach. Learn. Res. 24(240), 1-113 (2023)
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., Rozière, B., Goyal, N., Hambro, E., Azhar, F., et al.: Llama: open and efficient foundation language models. arXiv:2302.13971 (2023)
Le Scao, T., Wang, T., Hesslow, D., Bekman, S., Bari, M.S., Biderman, S., Elsahar, H., Muennighoff, N., Phang, J., Press, O., et al.: What language model to train if you have one million gpu hours?. In: Findings of the Association for Computational Linguistics: EMNLP 2022, pp. 765-782 (2022)
Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling laws for neural language models. arXiv:2001.08361 (2020)
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., Casas, D.d.L., Hendricks, L.A., Welbl, J., Clark, A., et al.: Training compute-optimal large language models. arXiv:2203.15556 (2022)
Sanh, V., Webson, A., Raffel, C., Bach, S., Sutawika, L., Alyafeai, Z., Chaffin, A., Stiegler, A., Raja, A., Dey, M., et al.: Multitask prompted training enables zero-shot task generalization. In: International Conference on Learning Representations (2021)
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., et al.: Training language models to follow instructions with human feedback. Adv. Neural. Inf. Process. Syst. 35, 27730-27744 (2022)
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., Zhou, D.: Chain of thought prompting elicits reasoning in large language models. arXiv:2201.11903 (2022)
Fu, Y., Peng, H., Khot, T.: How does gpt obtain its ability? tracing emergent abilities of language models to their sources. Yao Fu’s Notion (2022)
Zhang, Y., Chen, X., Ai, Q., Yang, L., Croft, W.B.: Towards conversational search and recommendation: system ask, user respond. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 177-186 (2018)
Wang, H., Zhang, F., Xie, X., Guo, M.: Dkn: deep knowledge-aware network for news recommendation. In: Proceedings of the 2018 World Wide Web Conference, pp. 1835-1844 (2018)
Huang, J., Zhao, W.X., Dou, H., Wen, J.-R., Chang, E.Y.: Improving sequential recommendation with knowledge-enhanced memory networks. In: The 41st international ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 505-514 (2018)
Wang, H., Zhang, F., Hou, M., Xie, X., Guo, M., Liu, Q.: Shine: signed heterogeneous information network embedding for sentiment link prediction. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 592-600 (2018)
Yu, X., Ren, X., Gu, Q., Sun, Y., Han, J.: Collaborative filtering with entity similarity regularization in heterogeneous information networks. IJCAI HINA 27 (2013)
Shi, C., Zhang, Z., Luo, P., Yu, P.S., Yue, Y., Wu, B.: Semantic path based personalized recommendation on weighted heterogeneous information networks. In: Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, pp. 453-462 (2015)
Ma, W., Zhang, M., Cao, Y., Jin, W., Wang, C., Liu, Y., Ma, S., Ren, X.: Jointly learning explainable rules for recommendation with knowledge graph. In: The World Wide Web Conference, pp. 1210-1221 (2019)
Huang, X., Fang, Q., Qian, S., Sang, J., Li, Y., Xu, C.: Explainable interaction-driven user modeling over knowledge graph for sequential recommendation. In: Proceedings of the 27th ACM International Conference on Multimedia, pp. 548-556 (2019)
Wang, H., Zhang, F., Wang, J., Zhao, M., Li, W., Xie, X., Guo, M.: Ripplenet: propagating user preferences on the knowledge graph for recommender systems. In: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, pp. 417-426 (2018)
Wang, H., Zhao, M., Xie, X., Li, W., Guo, M.: Knowledge graph convolutional networks for recommender systems. In: The World Wide Web Conference, pp. 3307-3313 (2019)
Wang, X., He, X., Cao, Y., Liu, M., Chua, T.-S.: Kgat: knowledge graph attention network for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 950-958 (2019)
Tang, X., Wang, T., Yang, H., Song, H.: Akupm: attention-enhanced knowledge-aware user preference model for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1891-1899 (2019)
Zhao, J., Zhou, Z., Guan, Z., Zhao, W., Ning, W., Qiu, G., He, X.: Intentgc: a scalable graph convolution framework fusing heterogeneous information for recommendation. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2347-2357 (2019)
Petroni, F., Rocktäschel, T., Lewis, P., Bakhtin, A., Wu, Y., Miller, A.H., Riedel, S.: Language models as knowledge bases?. arXiv:1909.01066 (2019)
Roberts, A., Raffel, C., Shazeer, N.: How much knowledge can you pack into the parameters of a language model?. arXiv:2002.08910 (2020)
Petroni, F., Lewis, P., Piktus, A., Rocktäschel, T., Wu, Y., Miller, A.H., Riedel, S.: How context affects language models’ factual predictions. In: Automated Knowledge Base Construction
Jiang, Z., Xu, F.F., Araki, J., Neubig, G.: How can we know what language models know? Trans. Assoc. Comput. Linguist 8, 423-438 (2020)
Wang, C., Liu, X., Song, D.: Language models are open knowledge graphs. arXiv:2010.11967 (2020)
Poerner, N., Waltinger, U., Schütze, H.: E-bert: efficient-yet-effective entity embeddings for bert. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 803-818 (2020)
Heinzerling, B., Inui, K.: Language models as knowledge bases: On entity representations, storage capacity, and paraphrased queries. In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pp. 1772-1791 (2021)
Wang, C., Liu, P., Zhang, Y.: Can generative pre-trained language models serve as knowledge bases for closed-book qa?. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: long papers), pp. 3241-3251 (2021)
Guu, K., Lee, K., Tung, Z., Pasupat, P., Chang, M.: Retrieval augmented language model pre-training. In: International Conference on Machine Learning, pp. 3929-3938 (2020). PMLR
Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J., Yakhnenko, O.: Translating embeddings for modeling multi-relational data. Adv. Neural. Inf. Process. Syst. 26 (2013)
Zhu, Y., Wang, X., Chen, J., Qiao, S., Ou, Y., Yao, Y., Deng, S., Chen, H., Zhang, N.: Llms for knowledge graph construction and reasoning: recent capabilities and future opportunities. arXiv:2305.13168 (2023)
Zhang, Z., Liu, X., Zhang, Y., Su, Q., Sun, X., He, B.: Pretrain-kge: learning knowledge representation from pretrained language models. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 259-266 (2020)
Kumar, A., Pandey, A., Gadia, R., Mishra, M.: Building knowledge graph using pre-trained language model for learning entity-aware relationships. In: 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), pp. 310-315 (2020). IEEE
Kim, B., Hong, T., Ko, Y., Seo, J.: Multi-task learning for knowledge graph completion with pre-trained language models. In: Proceedings of the 28th International Conference on Computational Linguistics, pp. 1737-1743 (2020)
Choi, B., Jang, D., Ko, Y.: Mem-kgc: masked entity model for knowledge graph completion with pretrained language model. IEEE Access 9, 132025-132032 (2021)
Wang, B., Shen, T., Long, G., Zhou, T., Wang, Y., Chang, Y.: Structure-augmented text representation learning for efficient knowledge graph completion. In: Proceedings of the Web Conference 2021, pp. 1737-1748 (2021)
Xie, X., Zhang, N., Li, Z., Deng, S., Chen, H., Xiong, F., Chen, M., Chen, H.: From discrimination to generation: knowledge graph completion with generative transformer. In: Companion Proceedings of the Web Conference 2022, pp. 162-165 (2022)
Jiang, P., Agarwal, S., Jin, B., Wang, X., Sun, J., Han, J.: Text-augmented open knowledge graph completion via pre-trained language models. arXiv:2305.15597 (2023)
Yan, H., Gui, T., Dai, J., Guo, Q., Zhang, Z., Qiu, X.: A unified generative framework for various ner subtasks. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: long papers), pp. 5808-5822 (2021)
Li, B., Yin, W., Chen, M.: Ultra-fine entity typing with indirect supervision from natural language inference. Trans. Assoc. Comput. Linguist. 10, 607-622 (2022)
Kirstain, Y., Ram, O., Levy, O.: Coreference resolution without span representations. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 2: short papers), pp. 14-19 (2021)
Cattan, A., Eirew, A., Stanovsky, G., Joshi, M., Dagan, I.: Cross-document coreference resolution over predicted mentions. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 5100-5107 (2021)
Lyu, S., Chen, H.: Relation classification with entity type restriction. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 390-395 (2021)
Wang, H., Focke, C., Sylvester, R., Mishra, N., Wang, W.: Fine-tune bert for docred with two-step process. arXiv:1909.11898 (2019)
Han, J., Collier, N., Buntine, W., Shareghi, E.: Pive: prompting with iterative verification improving graph-based generative capability of llms. arXiv:2305.12392 (2023)
Trajanoska, M., Stojanov, R., Trajanov, D.: Enhancing knowledge graph construction using large language models. arXiv:2305.04676 (2023)
West, P., Bhagavatula, C., Hessel, J., Hwang, J., Jiang, L., Le Bras, R., Lu, X., Welleck, S., Choi, Y.: Symbolic knowledge distillation: from general language models to commonsense models. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 4602-4625 (2022)
Xi, Y., Liu, W., Lin, J., Zhu, J., Chen, B., Tang, R., Zhang, W., Zhang, R., Yu, Y.: Towards open-world recommendation with knowledge augmentation from large language models. arXiv:2306.10933 (2023)
Wei, W., Ren, X., Tang, J., Wang, Q., Su, L., Cheng, S., Wang, J., Yin, D., Huang, C.: Llmrec: large language models with graph augmentation for recommendation. In: Proceedings of the 17th ACM International Conference on Web Search and Data Mining, pp. 806-815 (2024)
Zhao, Q., Qian, H., Liu, Z., Zhang, G.-D., Gu, L.: Breaking the barrier: utilizing large language models for industrial recommendation systems through an inferential knowledge graph. arXiv:2402.13750 (2024)
Razniewski, S., Yates, A., Kassner, N., Weikum, G.: Language models as or for knowledge bases. arXiv:2110.04888 (2021)
Yu, J., Wang, X., Tu, S., Cao, S., Zhang-Li, D., Lv, X., Peng, H., Yao, Z., Zhang, X., Li, H., et al.: Kola: Carefully benchmarking world knowledge of large language models. arXiv:2306.09296 (2023)
Ye, D., Lin, Y., Li, P., Sun, M.: Packed levitated marker for entity and relation extraction. In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (vol. 1: long papers), pp. 4904-4917 (2022)
Lang, K.: Newsweeder: Learning to filter netnews. In: Machine Learning Proceedings 1995, pp. 331-339. Elsevier (1995)
Wang, H., Shi, X., Yeung, D.-Y.: Collaborative recurrent autoencoder: recommend while learning to fill in the blanks. Adv. Neural. Inf. Process. Syst. 29 (2016)
Dong, X., Yu, L., Wu, Z., Sun, Y., Yuan, L., Zhang, F.: A hybrid collaborative filtering model with deep structure for recommender systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)
Li, X., She, J.: Collaborative variational autoencoder for recommender systems. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 305-314 (2017)
Wu, C., Wu, F., Huang, Y., Xie, X.: Personalized news recommendation: methods and challenges. ACM Trans. Inf. Syst. 41(1), 24-12450 (2023)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. In: ICML. JMLR Workshop and Conference Proceedings, vol. 32, pp. 1188-1196. JMLR.org (2014)
Song, Y., Elkahky, A.M., He, X.: Multi-rate deep learning for temporal recommendation. In: SIGIR, pp. 909-912. ACM (2016)
Kumar, V., Khattar, D., Gupta, S., Gupta, M., Varma, V.: Deep neural architecture for news recommendation. In: CLEF (Working Notes). CEUR Workshop Proceedings, vol. 1866. CEUR-WS.org (2017)
Okura, S., Tagami, Y., Ono, S., Tajima, A.: Embedding-based news recommendation for millions of users. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1933-1942 (2017)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. In: ICLR (Workshop Poster) (2013)
Wu, C., Wu, F., An, M., Huang, J., Huang, Y., Xie, X.: NPA: neural news recommendation with personalized attention. In: KDD, pp. 2576-2584. ACM (2019)
An, M., Wu, F., Wu, C., Zhang, K., Liu, Z., Xie, X.: Neural news recommendation with long- and short-term user representations. In: ACL (1), pp. 336-345. Association for Computational Linguistics (2019)
Wu, C., Wu, F., Ge, S., Qi, T., Huang, Y., Xie, X.: Neural news recommendation with multi-head selfattention. In: EMNLP/IJCNLP (1), pp. 6388-6393. Association for Computational Linguistics (2019)
Wu, C., Wu, F., Qi, T., Huang, Y.: User modeling with click preference and reading satisfaction for news recommendation. In: IJCAI, pp. 3023-3029. ijcai.org (2020)
Khattar, D., Kumar, V., Varma, V., Gupta, M.: Weave&rec: a word embedding based 3-d convolutional network for news recommendation. In: CIKM, pp. 1855-1858. ACM (2018)
Zhu, Q., Zhou, X., Song, Z., Tan, J., Guo, L.: DAN: deep attention neural network for news recommendation. In: AAAI, pp. 5973-5980. rAAAI Press (2019)
Qiu, Z., Wu, X., Gao, J., Fan, W.: U-bert: Pre-training user representations for improved recommendation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 4320-4327 (2021)
Zhang, Q., Li, J., Jia, Q., Wang, C., Zhu, J., Wang, Z., He, X.: Unbert: user-news matching bert for news recommendation. In: IJCAI, pp. 3356-3362 (2021)
Wu, C., Wu, F., Qi, T., Huang, Y.: Empowering news recommendation with pre-trained language models. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1652-1656 (2021)
Liu, Q., Zhu, J., Dai, Q., Wu, X.: Boosting deep ctr prediction with a plug-and-play pre-trainer for news recommendation. In: Proceedings of the 29th International Conference on Computational Linguistics, pp. 2823-2833 (2022)
Wu, C., Wu, F., Qi, T., Zhang, C., Huang, Y., Xu, T.: Mm-rec: visiolinguistic model empowered multimodal news recommendation. In: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2560-2564 (2022)
Yu, Y., Wu, F., Wu, C., Yi, J., Liu, Q.: Tiny-newsrec: effective and efficient plm-based news recommendation. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pp. 5478-5489 (2022)
Zou, L., Zhang, S., Cai, H., Ma, D., Cheng, S., Wang, S., Shi, D., Cheng, Z., Yin, D.: Pre-trained language model based ranking in baidu search. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 4014-4022 (2021)
Liu, Y., Lu, W., Cheng, S., Shi, D., Wang, S., Cheng, Z., Yin, D.: Pre-trained language model for webscale retrieval in baidu search. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3365-3375 (2021)
Muhamed, A., Keivanloo, I., Perera, S., Mracek, J., Xu, Y., Cui, Q., Rajagopalan, S., Zeng, B., Chilimbi, T.: Ctr-bert: cost-effective knowledge distillation for billion-parameter teacher models. In: NeurIPS Efficient Natural Language and Speech Processing Workshop (2021)
He, J., Xu, B., Yang, Z., Han, D., Yang, C., Lo, D.: Ptm4tag: sharpening tag recommendation of stack overflow posts with pre-trained models. In: Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, pp. 1-11 (2022)
Zhang, X., Malkov, Y., Florez, O., Park, S., McWilliams, B., Han, J., El-Kishky, A.: Twhin-bert: a socially-enriched pre-trained language model for multilingual tweet representations. arXiv:2209.07562 (2022)
Rahmani, S., Naghshzan, A., Guerrouj, L.: Improving code example recommendations on informal documentation using bert and query-aware 1sh: a comparative study. arXiv:2305.03017 (2023)
Ding, H., Ma, Y., Deoras, A., Wang, Y., Wang, H.: Zero-shot recommender systems. arXiv:2105.08318 (2021)
Hou, Y., Mu, S., Zhao, W.X., Li, Y., Ding, B., Wen, J.-R.: Towards universal sequence representation learning for recommender systems. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 585-593 (2022)
Hou, Y., He, Z., McAuley, J., Zhao, W.X.: Learning vector-quantized item representation for transferable sequential recommenders. In: Proceedings of the ACM Web Conference 2023, pp. 1162-1171 (2023)
Yuan, Z., Yuan, F., Song, Y., Li, Y., Fu, J., Yang, F., Pan, Y., Ni, Y.: Where to go next for recommender systems? id-vs. modality-based recommender models revisited. arXiv:2303.13835 (2023)
Fu, J., Yuan, F., Song, Y., Yuan, Z., Cheng, M., Cheng, S., Zhang, J., Wang, J., Pan, Y.: Exploring adapter-based transfer learning for recommender systems: empirical studies and practical insights. arXiv:2305.15036 (2023)
Bao, K., Zhang, J., Zhang, Y., Wang, W., Feng, F., He, X.: Tallrec: an effective and efficient tuning framework to align large language model with recommendation. arXiv:2305.00447 (2023)
Kang, W.-C., Ni, J., Mehta, N., Sathiamoorthy, M., Hong, L., Chi, E., Cheng, D.Z.: Do llms understand user preferences?. evaluating llms on user rating prediction. arXiv:2305.06474 (2023)
Chen, Z.: Palr: Personalization aware llms for recommendation. arXiv:2305.07622 (2023)
Zhang, J., Xie, R., Hou, Y., Zhao, W.X., Lin, L., Wen, J.-R.: Recommendation as instruction following: a large language model empowered recommendation approach. arXiv:2305.07001 (2023)
Li, R., Deng, W., Cheng, Y., Yuan, Z., Zhang, J., Yuan, F.: Exploring the upper limits of text-based collaborative filtering using large language models: discoveries and insights. arXiv:2305.11700 (2023)
Wang, X., Chen, Y., Yang, J., Wu, L., Wu, Z., Xie, X.: A reinforcement learning framework for explainable recommendation. In: 2018 IEEE International Conference on Data Mining (ICDM), pp. 587-596 (2018). IEEE
Gao, J., Wang, X., Wang, Y., Xie, X.: Explainable recommendation through attentive multi-view learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 3622-3629 (2019)
Lee, S., Wang, X., Han, S., Yi, X., Xie, X., Cha, M.: Self-explaining deep models with logic rule reasoning. Adv. Neural. Inf. Process. Syst. (2022)
Nye, M., Andreassen, A.J., Gur-Ari, G., Michalewski, H., Austin, J., Bieber, D., Dohan, D., Lewkowycz, A., Bosma, M., Luan, D., et al.: Show your work: scratchpads for intermediate computation with language models. arXiv:2112.00114 (2021)
Lampinen, A.K., Dasgupta, I., Chan, S.C., Matthewson, K., Tessler, M.H., Creswell, A., McClelland, J.L., Wang, J.X., Hill, F.: Can language models learn from explanations in context?. arXiv:2204.02329 (2022)
Zelikman, E., Wu, Y., Mu, J., Goodman, N.: Star: bootstrapping reasoning with reasoning. Adv. Neural. Inf. Process. Syst. 35, 15476-15488 (2022)
Zhang, Y., Chen, X., et al.: Explainable recommendation: a survey and new perspectives. Found. Trends® in Inf. Retr. 14(1), 1-101 (2020)
Schafer, J.B., Konstan, J., Riedl, J.: Recommender systems in e-commerce. In: Proceedings of the 1st ACM Conference on Electronic Commerce, pp. 158-166 (1999)
Linden, G., Smith, B., York, J.: Amazon. com recommendations: item-to-item collaborative filtering. IEEE Internet Comput. 7(1), 76-80 (2003)
Gomez-Uribe, C.A., Hunt, N.: The netflix recommender system: algorithms, business value, and innovation. ACM Trans. Manage. Inf. Syst. (TMIS) 6(4), 1-19 (2015)
Sinha, R., Swearingen, K.: The role of transparency in recommender systems. In: CHI’02 Extended Abstracts on Human Factors in Computing Systems, pp. 830-831 (2002)
Xian, Y., Zhao, T., Li, J., Chan, J., Kan, A., Ma, J., Dong, X.L., Faloutsos, C., Karypis, G., Muthukrishnan, S., et al.: Ex3: explainable attribute-aware item-set recommendations. In: Proceedings of the 15th ACM Conference on Recommender Systems, pp. 484-494 (2021)
Wang, X., Li, Q., Yu, D., Xu, G.: Reinforced path reasoning for counterfactual explainable recommendation. arXiv:2207.06674 (2022)
Verma, S., Beniwal, A., Sadagopan, N., Seshadri, A.: Recxplainer: post-hoc attribute-based explanations for recommender systems. arXiv:2211.14935 (2022)
Zhang, W., Yan, J., Wang, Z., Wang, J.: Neuro-symbolic interpretable collaborative filtering for attributebased recommendation. In: Proceedings of the ACM Web Conference 2022, pp. 3229-3238 (2022)
Li, P., Wang, Z., Ren, Z., Bing, L., Lam, W.: Neural rating regression with abstractive tips generation for recommendation. In: Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 345-354 (2017)
Dong, L., Huang, S., Wei, F., Lapata, M., Zhou, M., Xu, K.: Learning to generate product reviews from attributes. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: volume 1, long papers, pp. 623-632 (2017)
Li, L., Zhang, Y., Chen, L.: Generate neural template explanations for recommendation. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 755-764 (2020)
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735-1780 (1997)
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078 (2014)
Li, L., Zhang, Y., Chen, L.: Personalized transformer for explainable recommendation. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (vol. 1: Long papers) (2021)
Zhan, H., Li, L., Li, S., Liu, W., Gupta, M., Kot, A.C.: Towards explainable recommendation via bertguided explanation generator. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5 (2023). IEEE
Ni, J., Li, J., McAuley, J.: Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pp. 188-197 (2019)
Liu, Z., Ma, Y., Schubert, M., Ouyang, Y., Rong, W., Xiong, Z.: Multimodal contrastive transformer for explainable recommendation. IEEE Transactions on Computational Social Systems (2023)
Qu, Y., Nobuhara, H.: Explanation generated for sequential recommendation based on transformer model. In: 2022 Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on Advanced Intelligent Systems (SCIS&ISIS), pp. 1-6 (2022). IEEE
Bai, P., Xia, Y., Xia, Y.: Fusing knowledge and aspect sentiment for explainable recommendation. IEEE Access 8, 137150-137160 (2020)
Wang, L., Zhang, S., Wang, Y., Lim, E.-P., Wang, Y.: Llm4vis: explainable visualization recommendation using chatgpt. arXiv:2310.07652 (2023)
Lei, Y., Lian, J., Yao, J., Huang, X., Lian, D., Xie, X.: Recexplainer: aligning large language models for recommendation model interpretability. arXiv:2311.10947 (2023)
Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E., et al.: On the opportunities and risks of foundation models. arXiv:2108.07258 (2021)
Li, L., Zhang, Y., Chen, L.: Personalized prompt learning for explainable recommendation. ACM Trans. Inf. Syst. 41(4), 1-26 (2023)
Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., Saunders, W.: Language models can explain neurons in language models (2023)
Wu, Z., Geiger, A., Potts, C., Goodman, N.D.: Interpretability at scale: identifying causal mechanisms in alpaca. arXiv:2305.08809 (2023)
Li, Y., Lin, Z., Zhang, S., Fu, Q., Chen, B., Lou, J.-G., Chen, W.: Making large language models better reasoners with step-aware verifier (2023)
Turpin, M., Michael, J., Perez, E., Bowman, S.R.: Language models don’t always say what they think: unfaithful explanations in chain-of-thought prompting. arXiv:2305.04388 (2023)
Li, S., Liu, H., Dong, T., Zhao, B.Z.H., Xue, M., Zhu, H., Lu, J.: Hidden backdoors in human-centric language models. In: Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pp. 3123-3140 (2021)
Wang, J., Hu, X., Hou, W., Chen, H., Zheng, R., Wang, Y., Yang, L., Huang, H., Ye, W., Geng, X., et al.: On the robustness of chatgpt: an adversarial and out-of-distribution perspective. arXiv:2302.12095 (2023)
Han, R., Peng, T., Yang, C., Wang, B., Liu, L., Wan, X.: Is information extraction solved by chatgpt? an analysis of performance, evaluation criteria, robustness and errors. arXiv:2305.14450 (2023)
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., Yogatama, D., Bosma, M., Zhou, D., Metzler, D., et al.: Emergent abilities of large language models. arXiv:2206.07682 (2022)
Liu, J., Liu, C., Lv, R., Zhou, K., Zhang, Y.: Is chatgpt a good recommender? a preliminary study. arXiv:2304.10149 (2023)
Dai, S., Shao, N., Zhao, H., Yu, W., Si, Z., Xu, C., Sun, Z., Zhang, X., Xu, J.: Uncovering chatgpt’s capabilities in recommender systems. arXiv:2305.02182 (2023)
Wang, L., Lim, E.-P.: Zero-shot next-item recommendation using large pretrained language models. arXiv:2304.03153 (2023)
Hou, Y., Zhang, J., Lin, Z., Lu, H., Xie, R., McAuley, J., Zhao, W.X.: Large language models are zero-shot rankers for recommender systems. arXiv:2305.08845 (2023)
Li, X., Zhang, Y., Malthouse, E.C.: A preliminary study of chatgpt on news recommendation: personalization, provider fairness, fake news. arXiv:2306.10702 (2023)
Dong, Q., Li, L., Dai, D., Zheng, C., Wu, Z., Chang, B., Sun, X., Xu, J., Sui, Z.: A survey for in-context learning. arXiv:2301.00234 (2022)
Dai, D., Sun, Y., Dong, L., Hao, Y., Sui, Z., Wei, F.: Why can gpt learn in-context? language models secretly perform gradient descent as meta optimizers. arXiv:2212.10559 (2022)
Min, S., Lyu, X., Holtzman, A., Artetxe, M., Lewis, M., Hajishirzi, H., Zettlemoyer, L.: Rethinking the role of demonstrations: What makes in-context learning work?. arXiv:2202.12837 (2022)
Levy, I., Bogin, B., Berant, J.: Diverse demonstrations improve in-context compositional generalization. arXiv:2212.06800 (2022)
Xie, S.M., Raghunathan, A., Liang, P., Ma, T.: An explanation of in-context learning as implicit bayesian inference. In: International Conference on Learning Representations
Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., et al.: In-context learning and induction heads. arXiv:2209.11895 (2022)
Akyürek, E., Schuurmans, D., Andreas, J., Ma, T., Zhou, D.: What learning algorithm is in-context learning? investigations with linear models. arXiv:2211.15661 (2022)
Geng, S., Liu, S., Fu, Z., Ge, Y., Zhang, Y.: Recommendation as language processing (rlp): a unified pretrain, personalized prompt & predict paradigm (p5). In: Proceedings of the 16th ACM Conference on Recommender Systems, pp. 299-315 (2022)
Cui, Z., Ma, J., Zhou, C., Zhou, J., Yang, H.: M6-rec: generative pretrained language models are openended recommender systems. arXiv:2205.08084 (2022)
Liu, S., Gao, C., Chen, Y., Jin, D., Li, Y.: Learnable embedding sizes for recommender systems. arXiv:2101.07577 (2021)
Liu, H., Zhao, X., Wang, C., Liu, X., Tang, J.: Automated embedding size search in deep recommender systems. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2307-2316 (2020)
Deng, W., Pan, J., Zhou, T., Kong, D., Flores, A., Lin, G.: Deeplight: deep lightweight feature interactions for accelerating ctr predictions in ad serving. In: Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pp. 922-930 (2021)
Ginart, A.A., Naumov, M., Mudigere, D., Yang, J., Zou, J.: Mixed dimension embeddings with application to memory-efficient recommendation systems. In: 2021 IEEE International Symposium on Information Theory (ISIT), pp. 2786-2791 (2021). IEEE
Wang, Y., Zhao, X., Xu, T., Wu, X.: Autofield: automating feature selection in deep recommender systems. In: Proceedings of the ACM Web Conference 2022, pp. 1977-1986 (2022)
Lin, W., Zhao, X., Wang, Y., Xu, T., Wu, X.: Adafs: adaptive feature selection in deep recommender system. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3309-3317 (2022)
Tsang, M., Cheng, D., Liu, H., Feng, X., Zhou, E., Liu, Y.: Feature interaction interpretability: a case for explaining ad-recommendation systems via neural interaction detection. arXiv:2006.10966 (2020)
Yuanfei, L., Mengshuo, W., Hao, Z., Quanming, Y., WeiWei, T., Yuqiang, C., Qiang, Y., Wenyuan, D.: Autocross: automatic feature crossing for tabular data in real-world applications. arXiv:1904.12857 (2019)
Liu, B., Zhu, C., Li, G., Zhang, W., Lai, J., Tang, R., He, X., Li, Z., Yu, Y.: Autofis: automatic feature interaction selection in factorization models for click-through rate prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 26362645 (2020)
Liu, B., Xue, N., Guo, H., Tang, R., Zafeiriou, S., He, X., Li, Z.: Autogroup: automatic feature grouping for modelling explicit high-order feature interactions in ctr prediction. In: Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 199208 (2020)
Chen, Y., Ren, P., Wang, Y., Rijke, M.: Bayesian personalized feature interaction selection for factorization machines. In: Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 665-674 (2019)
Xie, Y., Wang, Z., Li, Y., Ding, B., Gürel, N.M., Zhang, C., Huang, M., Lin, W., Zhou, J.: Fives: feature interaction via edge search for large-scale tabular data. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3795-3805 (2021)
Su, Y., Zhang, R., Erfani, S., Xu, Z.: Detecting beneficial feature interactions for recommender systems. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 35, pp. 4357-4365 (2021)
Song, Q., Cheng, D., Zhou, H., Yang, J., Tian, Y., Hu, X.: Towards automated neural interaction discovery for click-through rate prediction. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 945-955 (2020)
Zhao, P., Xiao, K., Zhang, Y., Bian, K., Yan, W.: Ameir: automatic behavior modeling, interaction exploration and mlp investigation in the recommender system. In: IJCAI, pp. 2104-2110 (2021)
Wei, Z., Wang, X., Zhu, W.: Autoias: automatic integrated architecture searcher for click-trough rate prediction. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 2101-2110 (2021)
Cheng, M., Liu, Z., Liu, Q., Ge, S., Chen, E.: Towards automatic discovering of deep hybrid network architecture for sequential recommendation. In: Proceedings of the ACM Web Conference 2022, pp. 1923-1932 (2022)
Yu, C., Liu, X., Tang, C., Feng, W., Lv, J.: Gpt-nas: Neural architecture search with the generative pre-trained model. arXiv:2305.05351 (2023)
Ying, C., Klein, A., Christiansen, E., Real, E., Murphy, K., Hutter, F.: Nas-bench-101: Towards reproducible neural architecture search. In: International Conference on Machine Learning, pp. 7105-7114 (2019). PMLR
Zheng, M., Su, X., You, S., Wang, F., Qian, C., Xu, C., Albanie, S.: Can gpt-4 perform neural architecture search? arXiv:2304.10970 (2023)
Nasir, M.U., Earle, S., Togelius, J., James, S., Cleghorn, C.: Llmatic: Neural architecture search via large language models and quality-diversity optimization. arXiv:2306.01102 (2023)
Chen, A., Dohan, D.M., So, D.R.: Evoprompting: language models for code-level neural architecture search. arXiv:2302.14838 (2023)
Shang, L., Lu, Z., Li, H.: Neural responding machine for short-text conversation. arXiv:1503.02364 (2015)
Vinyals, O., Le, Q.: A neural conversational model. arXiv:1506.05869 (2015)
Sordoni, A., Galley, M., Auli, M., Brockett, C., Ji, Y., Mitchell, M., Nie, J.-Y., Gao, J., Dolan, B.: A neural network approach to context-sensitive generation of conversational responses. arXiv:1506.06714 (2015)
Wu, W., Yan, R.: Deep chit-chat: deep learning for chatbots. In: Proceedings of the 42 nd international ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1413-1414 (2019)
Qiu, X., Huang, X.: Convolutional neural tensor network architecture for community-based question answering. In: Twenty-Fourth International Joint Conference on Artificial Intelligence (2015)
Wan, S., Lan, Y., Guo, J., Xu, J., Pang, L., Cheng, X.: A deep architecture for semantic matching with multiple positional sentence representations. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 30 (2016)
Greco, C., Suglia, A., Basile, P., Semeraro, G.: Converse-et-impera: exploiting deep learning and hierarchical reinforcement learning for conversational recommender systems. In: AI* IA 2017 Advances in Artificial Intelligence: XVIth International Conference of the Italian Association for Artificial Intelligence, Bari, Italy, November 14-17, 2017, Proceedings 16, pp. 372-386 (2017). Springer
Yao, K., Zweig, G., Hwang, M.-Y., Shi, Y., Yu, D.: Recurrent neural networks for language understanding. In: Interspeech, pp. 2524-2528 (2013)
Mesnil, G., He, X., Deng, L., Bengio, Y.: Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In: Interspeech, pp. 3771-3775 (2013)
Goddeau, D., Meng, H., Polifroni, J., Seneff, S., Busayapongchai, S.: A form-based dialogue manager for spoken language applications. In: Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP’96, vol. 2, pp. 701-704 (1996). IEEE
Henderson, M., Thomson, B., Young, S.: Deep neural network approach for the dialog state tracking challenge. In: Proceedings of the SIGDIAL 2013 Conference, pp. 467-471 (2013)
Mrkšić, N., Séaghdha, D.O., Wen, T.-H., Thomson, B., Young, S.: Neural belief tracker: data-driven dialogue state tracking. arXiv:1606.03777 (2016)
Cuayáhuitl, H., Keizer, S., Lemon, O.: Strategic dialogue management via deep reinforcement learning. arXiv:1511.08099 (2015)
Zhou, H., Huang, M., Zhu, X.: Context-aware natural language generation for spoken dialogue systems. In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032-2041 (2016)
Dušek, O., Jurčíćek, F.: Sequence-to-sequence generation for spoken dialogue via deep syntax trees and strings. arXiv:1606.05491 (2016)
Wen, T.-H., Vandyke, D., Mrksic, N., Gasic, M., Rojas-Barahona, L.M., Su, P.-H., Ultes, S., Young, S.: A network-based end-to-end trainable task-oriented dialogue system. arXiv:1604.04562 (2016)
Bordes, A., Boureau, Y.-L., Weston, J.: Learning end-to-end goal-oriented dialog. arXiv:1605.07683 (2016)
Zhang, Y., Sun, S., Galley, M., Chen, Y.-C., Brockett, C., Gao, X., Gao, J., Liu, J., Dolan, B.: Dialogpt: large-scale generative pre-training for conversational response generation. arXiv:1911.00536 (2019)
Lei, W., He, X., Miao, Y., Wu, Q., Hong, R., Kan, M.-Y., Chua, T.-S.: Estimation-action-reflection: towards deep interaction between conversational and recommender systems. In: Proceedings of the 13th International Conference on Web Search and Data Mining, pp. 304-312 (2020)
Lei, W., Zhang, G., He, X., Miao, Y., Wang, X., Chen, L., Chua, T.-S.: Interactive path reasoning on graph for conversational recommendation. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2073-2083 (2020)
Deng, Y., Li, Y., Sun, F., Ding, B., Lam, W.: Unified conversational recommendation policy learning via graph-based reinforcement learning. In: Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 1431-1441 (2021)
Li, R., Ebrahimi Kahou, S., Schulz, H., Michalski, V., Charlin, L., Pal, C.: Towards deep conversational recommendations. Adv. Neural. Inf. Process. Syst. 31 (2018)
Wang, T.-C., Su, S.-Y., Chen, Y.-N.: Barcor: Towards a unified framework for conversational recommendation systems. arXiv:2203.14257 (2022)
Wang, X., Zhou, K., Wen, J.-R., Zhao, W.X.: Towards unified conversational recommender systems via knowledge-enhanced prompt learning. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 1929-1937 (2022)
Wang, L., Hu, H., Sha, L., Xu, C., Jiang, D., Wong, K.-F.: Recindial: a unified framework for conversational recommendation with pretrained language models. In: Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing, pp. 489-500 (2022)
Devlin, J., Chang, M.-W., Lee, K., Toutanova, K.: Bert: pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 (2018)
Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al.: Improving language understanding by generative pre-training (2018)
Gao, Y., Sheng, T., Xiang, Y., Xiong, Y., Wang, H., Zhang, J.: Chat-rec: towards interactive and explainable llms-augmented recommender system. arXiv:2303.14524 (2023)
Friedman, L., Ahuja, S., Allen, D., Tan, T., Sidahmed, H., Long, C., Xie, J., Schubiner, G., Patel, A., Lara, H., et al.: Leveraging large language models in conversational recommender systems. arXiv:2305.07961 (2023)
Wang, X., Tang, X., Zhao, W.X., Wang, J., Wen, J.-R.: Rethinking the evaluation for conversational recommendation in the era of large language models. arXiv:2305.13112 (2023)
Yao, S., Yu, D., Zhao, J., Shafran, I., Griffiths, T.L., Cao, Y., Narasimhan, K.: Tree of thoughts: deliberate problem solving with large language models. arXiv:2305.10601 (2023)
Wang, L., Xu, W., Lan, Y., Hu, Z., Lan, Y., Lee, R.K.-W., Lim, E.-P.: Plan-and-solve prompting: improving zero-shot chain-of-thought reasoning by large language models. arXiv:2305.04091 (2023)
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y.: React: synergizing reasoning and acting in language models. arXiv:2210.03629 (2022)
Madaan, A., Tandon, N., Clark, P., Yang, Y.: Memory-assisted prompt editing to improve gpt-3 after deployment. arXiv:2201.06009 (2022)
Qin, Y., Hu, S., Lin, Y., Chen, W., Ding, N., Cui, G., Zeng, Z., Huang, Y., Xiao, C., Han, C., et al.: Tool learning with foundation models. arXiv:2304.08354 (2023)
Mialon, G., Dessì, R., Lomeli, M., Nalmpantis, C., Pasunuru, R., Raileanu, R., Rozière, B., Schick, T., Dwivedi-Yu, J., Celikyilmaz, A., et al.: Augmented language models: a survey. arXiv:2302.07842 (2023)
Yang, K., Peng, N., Tian, Y., Klein, D.: Re3: Generating longer stories with recursive reprompting and revision. arXiv:2210.06774 (2022)
Schick, T., Dwivedi-Yu, J., Jiang, Z., Petroni, F., Lewis, P., Izacard, G., You, Q., Nalmpantis, C., Grave, E., Riedel, S.: Peer: a collaborative language model. arXiv:2208.11663 (2022)
Hao, Y., Song, H., Dong, L., Huang, S., Chi, Z., Wang, W., Ma, S., Wei, F.: Language models are general-purpose interfaces. arXiv:2206.06336 (2022)
Izacard, G., Lewis, P., Lomeli, M., Hosseini, L., Petroni, F., Schick, T., Dwivedi-Yu, J., Joulin, A., Riedel, S., Grave, E.: Few-shot learning with retrieval augmented language models. arXiv:2208.03299 (2022)
Thoppilan, R., De Freitas, D., Hall, J., Shazeer, N., Kulshreshtha, A., Cheng, H.-T., Jin, A., Bos, T., Baker, L., Du, Y., et al.: Lamda: language models for dialog applications. arXiv:2201.08239 (2022)
Nakano, R., Hilton, J., Balaji, S., Wu, J., Ouyang, L., Kim, C., Hesse, C., Jain, S., Kosaraju, V., Saunders, W., et al.: Webgpt: browser-assisted question-answering with human feedback. arXiv:2112.09332 (2021)
Liu, R., Wei, J., Gu, S.S., Wu, T.-Y., Vosoughi, S., Cui, C., Zhou, D., Dai, A.M.: Mind’s eye: grounded language model reasoning through simulation. arXiv:2210.05359 (2022)
Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., Neubig, G.: Pal: Program-aided language models. arXiv:2211.10435 (2022)
Ahn, M., Brohan, A., Brown, N., Chebotar, Y., Cortes, O., David, B., Finn, C., Fu, C., Gopalakrishnan, K., Hausman, K., et al.: Do as i can, not as i say: grounding language in robotic affordances. arXiv:2204.01691 (2022)
Shen, Y., Song, K., Tan, X., Li, D., Lu, W., Zhuang, Y.: Hugginggpt: solving ai tasks with chatgpt and its friends in huggingface. arXiv:2303.17580 (2023)
Wu, C., Yin, S., Qi, W., Wang, X., Tang, Z., Duan, N.: Visual chatgpt: talking, drawing and editing with visual foundation models. arXiv:2303.04671 (2023)
Liang, Y., Wu, C., Song, T., Wu, W., Xia, Y., Liu, Y., Ou, Y., Lu, S., Ji, L., Mao, S., et al.: Taskmatrix. ai: completing tasks by connecting foundation models with millions of apis. arXiv:2303.16434 (2023)
Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Zettlemoyer, L., Cancedda, N., Scialom, T.: Toolformer: language models can teach themselves to use tools. arXiv:2302.04761 (2023)
Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified visionlanguage understanding and generation. In: International Conference on Machine Learning, pp. 1288812900 (2022). PMLR
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10684-10695 (2022)
Cai, T., Wang, X., Ma, T., Chen, X., Zhou, D.: Large language models as tool makers. arXiv:2305.17126 (2023)
Shuster, K., Xu, J., Komeili, M., Ju, D., Smith, E.M., Roller, S., Ung, M., Chen, M., Arora, K., Lane, J., et al.: Blenderbot 3: a deployed conversational agent that continually learns to responsibly engage. arXiv:2208.03188 (2022)
Liu, J., Jin, J., Wang, Z., Cheng, J., Dou, Z., Wen, J.-R.: Reta-llm: a retrieval-augmented large language model toolkit. arXiv:2306.05212 (2023)
Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., Wu, X.: Unifying large language models and knowledge graphs: a roadmap. arXiv:2306.08302 (2023)
Vempati, S., Malayil, K.T., Sruthi, V., Sandeep, R.: Enabling hyper-personalisation: automated ad creative generation and ranking for fashion e-commerce. In: Fashion Recommender Systems, pp. 25-48 (2020). Springer
Thomaidou, S., Lourentzou, I., Katsivelis-Perakis, P., Vazirgiannis, M.: Automated snippet generation for online advertising. In: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, pp. 1841-1844 (2013)
Zhang, X., Zou, Y., Zhang, H., Zhou, J., Diao, S., Chen, J., Ding, Z., He, Z., He, X., Xiao, Y., et al.: Automatic product copywriting for e-commerce. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, pp. 12423-12431 (2022)
Lei, Z., Zhang, C., Xu, X., Wu, W., Niu, Z.-Y., Wu, H., Wang, H., Yang, Y., Li, S.: Plato-ad: a unified advertisement text generation framework with multi-task prompt learning. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp. 512-520 (2022)
Bartz, K., Barr, C., Aijaz, A.: Natural language generation for sponsored-search advertisements. In: Proceedings of the 9th ACM Conference on Electronic Commerce, pp. 1-9 (2008)
Fujita, A., Ikushima, K., Sato, S., Kamite, R., Ishiyama, K., Tamachi, O.: Automatic generation of listing ads by reusing promotional texts. In: Proceedings of the 12th International Conference on Electronic Commerce: Roadmap for the Future of Electronic Business, pp. 179-188 (2010)
Hughes, J.W., Chang, K.-h., Zhang, R.: Generating better search engine text advertisements with deep reinforcement learning. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2269-2277 (2019)
Wang, X., Gu, X., Cao, J., Zhao, Z., Yan, Y., Middha, B., Xie, X.: Reinforcing pretrained models for generating attractive text advertisements. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp. 3697-3707 (2021)
Chen, C., Wang, X., Yi, X., Wu, F., Xie, X., Yan, R.: Personalized chit-chat generation for recommendation using external chat corpora. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 2721-2731 (2022)
Zhang, C., Zhou, J., Zang, X., Xu, Q., Yin, L., He, X., Liu, L., Xiong, H., Dou, D.: Chase: commonsenseenriched advertising on search engine with explicit knowledge. In: Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 4352-4361 (2021)
Kanungo, Y.S., Negi, S., Rajan, A.: Ad headline generation using self-critical masked language model. In: Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers, pp. 263-271 (2021)
Wei, P., Yang, X., Liu, S., Wang, L., Zheng, B.: Creater: ctr-driven advertising text generation with controlled pre-training and contrastive fine-tuning. arXiv:2205.08943 (2022)
Kanungo, Y.S., Das, G., Negi, S.: Cobart: controlled, optimized, bidirectional and auto-regressive transformer for ad headline generation. In: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 3127-3136 (2022)
Chen, Q., Lin, J., Zhang, Y., Yang, H., Zhou, J., Tang, J.: Towards knowledge-based personalized product description generation in e-commerce. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3040-3050 (2019)
Cao, Y., Li, S., Liu, Y., Yan, Z., Dai, Y., Yu, P.S., Sun, L.: A comprehensive survey of ai-generated content (aigc): a history of generative ai from gan to chatgpt. arXiv:2303.04226 (2023)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. Adv. Neural. Inf. Process. Syst. 27 (2014)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. arXiv:1312.6114 (2013)
Dinh, L., Krueger, D., Bengio, Y.: Nice: non-linear independent components estimation. arXiv:1410.8516 (2014)
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural. Inf. Process. Syst. 33, 6840-6851 (2020)
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth words: transformers for image recognition at scale. In: International Conference on Learning Representations
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., Lin, S., Guo, B.: Swin transformer: hierarchical vision transformer using shifted windows. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 10012-10022 (2021)
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748-8763 (2021). PMLR
Chen, M., Tan, X., Li, B., Liu, Y., Qin, T., Liu, T.-Y., et al.: Adaspeech: adaptive text to speech for custom voice. In: International Conference on Learning Representations
Li, X., Taheri, A., Tu, L., Gimpel, K.: Commonsense knowledge base completion. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (vol. 1: long papers), pp. 1445-1455 (2016)
Feng, Z., Guo, D., Tang, D., Duan, N., Feng, X., Gong, M., Shou, L., Qin, B., Liu, T., Jiang, D., et al.: Codebert: a pre-trained model for programming and natural languages. In: Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 1536-1547 (2020)
OpenAI: ChatGPT: A Large-Scale Generative Model for Conversation. OpenAI Blog (2020)
Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., Chen, M., Sutskever, I.: Zero-shot text-to-image generation. In: International Conference on Machine Learning, pp. 8821-8831 (2021). PMLR
Chen, M., Tworek, J., Jun, H., Yuan, Q., Pinto, H.P.d.O., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., et al.: Evaluating large language models trained on code. arXiv:2107.03374 (2021)
Midjourney: Midjourney. Retrieved from. https://midjourney.com (2022)
Wang, W., Lin, X., Feng, F., He, X., Chua, T.-S.: Generative recommendation: towards next-generation recommender paradigm. arXiv:2304.03516 (2023)
Borji, A.: A categorical archive of chatgpt failures. arXiv:2302.03494 (2023)
Carlini, N., Tramer, F., Wallace, E., et al.: Extracting training data from large language models. (2021)
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I., et al.: Language models are unsupervised multitask learners
Hu, E.J., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W., et al.: Lora: low-rank adaptation of large language models. In: International Conference on Learning Representations (2021)
Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: Qlora: Efficient finetuning of quantized llms. arXiv:2305.14314 (2023)
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., Chen, A., Goldie, A., Mirhoseini, A., McKinnon, C., et al.: Constitutional ai: Harmlessness from ai feedback. arXiv:2212.08073 (2022)

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Authors and Affiliations

Jin Chen Zheng Liu Xu Huang Chenwang Gangwei Jiang Yuanhao Yuxuan Lei Xiaolong Chen Xingmei Wang Kai Zheng Defu Lian . Enhong Chen

Jin Chen
jinchen@ust.hk
Zheng Liu
zhengliu1026@gmail.com
Xu Huang
xuhuangcs@mail.ustc.edu.cn
Chenwang Wu
wcw1996@mail.ustc.edu.cn
Qi Liu
qiliu67@mail.ustc.edu.cn
Gangwei Jiang
gwjiang@mail.ustc.edu.cn
Yuanhao Pu
puyuanhao@mail.ustc.edu.cn
Yuxuan Lei
lyx180812@mail.ustc.edu.cn
Xiaolong Chen
chenxiaolong@mail.ustc.edu.cn
Xingmei Wang
xingmeiwang@mail.ustc.edu.cn
Kai Zheng
zhengkai@uestc.edu.cn
Defu Lian
liandefu@ustc.edu.cn
Enhong Chen
cheneh@ustc.edu.cn
1 Hong Kong University of Science and Technology, Hong Kong, China
2 Beijing Academy of Artificial Intelligence, Beijing, China
3 University of Science and Technology of China, Hefei, China
4 University of Electronic Science and Technology of China, Chengdu, China

Jin Chen and Zheng Liu contributed equally to this work.
Extended author information available on the last page of the article
https://www.apple.com/siri/
https://www.microsoft.com/en-us/cortana
https://huggingface.co
https://docs.langchain.com
https://github.com/Significant-Gravitas/Auto-GPT