Marketing-AutoM3L: domain-aware automated machine learning for financial customer analytics

Journal: Frontiers in Artificial Intelligence, Volume: 9
DOI: https://doi.org/10.3389/frai.2026.1726900
PMID: https://pubmed.ncbi.nlm.nih.gov/41675582
Publication Date: 2026-01-27
Author(s): Ye Tian et al.
Primary Topic: Customer churn and segmentation

Overview

The paper presents Marketing-AutoM3L, an automated framework that enhances financial customer analytics by integrating domain-specific feature engineering into machine learning pipelines. General-purpose automated machine learning (AutoML) systems often fail to generate marketing-relevant features, which are crucial for predicting customer behavior. The proposed framework takes raw customer data and natural language instructions as input and autonomously performs data modality recognition, feature engineering, model selection, and pipeline assembly. Key marketing indicators such as Recency, Frequency, and Monetary (RFM) metrics, Customer Lifetime Value (CLV), and engagement scores are computed automatically. The paper reports ROC-AUC improvements of 1.4% to 5.4% over the strongest baseline, while average pipeline development time drops from 156.9 minutes for manual construction to 23.4 minutes.
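To make the RFM indicators mentioned above concrete, here is a minimal sketch of how they could be derived from a raw transaction log. It is not the framework's implementation: the function name `rfm_features`, the tuple layout, and the toy data are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date


def rfm_features(transactions, as_of):
    """Compute per-customer Recency, Frequency, and Monetary values.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
                  (hypothetical layout, not taken from the paper).
    as_of: reference date from which recency is measured.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Keep the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: {
            "recency_days": (as_of - last_purchase[cid]).days,
            "frequency": frequency[cid],
            "monetary": round(monetary[cid], 2),
        }
        for cid in last_purchase
    }


# Hypothetical toy transaction log.
toy_log = [
    ("c1", date(2025, 1, 5), 120.0),
    ("c1", date(2025, 3, 1), 80.0),
    ("c2", date(2024, 11, 20), 40.0),
]
rfm = rfm_features(toy_log, as_of=date(2025, 3, 31))
```

In this toy log, customer `c1` bought twice and recently, while `c2` bought once and long ago. That contrast is the kind of signal a churn model can pick up from RFM features.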

The contributions of this work are threefold: it introduces domain-aware feature engineering components that enhance predictive performance, implements LLM-based automation to expedite pipeline development, and provides a natural language interface that allows non-experts to configure analytics pipelines. Future research directions include incorporating sentiment analysis from customer communications to capture attitudinal signals, employing causal inference techniques for actionable insights, and developing automated model interpretation modules that translate feature importance into business recommendations. These advancements aim to democratize access to sophisticated customer analytics and improve decision-making in marketing strategies.

Introduction

The introduction of the research paper highlights the pressing challenges faced by financial institutions in predicting customer behavior and reducing churn amidst rising competition and customer acquisition costs. It emphasizes the critical role of advanced customer analytics in enhancing retention, optimizing revenue, and facilitating targeted marketing across various sectors, including banking and telecommunications. Despite the significance of these analytics, traditional predictive modeling remains largely manual, requiring extensive domain-specific feature engineering and hyperparameter tuning, which creates bottlenecks and necessitates specialized expertise that many institutions struggle to maintain.

To address these inefficiencies, the paper proposes an automated pipeline construction framework tailored for financial customer analytics, leveraging recent advances in large language models (LLMs). The framework aims to bridge the gap between business stakeholders and technical systems by automatically generating executable training pipelines from raw customer datasets and natural language directives. Its key features are data modality recognition, domain-aware feature engineering, model selection, and hyperparameter optimization, all guided by business objectives. The framework incorporates established marketing analytics methodologies, such as Recency-Frequency-Monetary (RFM) analysis and customer lifetime value (CLV) modeling, to improve predictive performance while significantly reducing development time. Experimental validation across multiple datasets shows ROC-AUC improvements of 1.4% to 5.4% over the strongest baseline, underscoring the framework's potential to streamline and enhance financial customer analytics.

Methods

In this section, the authors discuss the evolution of customer churn prediction methodologies, highlighting the transition from traditional statistical techniques to advanced machine learning approaches. Initially, logistic regression models were favored for their interpretability and straightforward feature importance analysis. However, ensemble methods such as Random Forest and Gradient Boosting Machines have become more prevalent due to their effectiveness in capturing non-linear relationships among customer attributes. Additionally, deep learning architectures, particularly hybrid models like BiLSTM-CNN, have demonstrated enhanced performance by integrating sequential dependencies and spatial feature extraction. RFM (Recency, Frequency, Monetary) analysis remains a foundational technique in customer segmentation, now augmented by automated feature engineering that enhances customer lifetime value projections and engagement scoring.
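As a rough illustration of a customer lifetime value projection, the sketch below uses a common textbook simplification: average order value times purchase frequency times expected relationship length times margin. This is not the paper's CLV model; the function name `simple_clv` and all input values are hypothetical.

```python
def simple_clv(avg_order_value, purchases_per_year, expected_years, gross_margin):
    """Heuristic CLV: margin-adjusted revenue over the expected relationship.

    This is a textbook simplification, assumed here for illustration only.
    """
    if min(avg_order_value, purchases_per_year, expected_years) < 0:
        raise ValueError("monetary, frequency, and duration inputs must be non-negative")
    if not 0.0 <= gross_margin <= 1.0:
        raise ValueError("gross_margin must lie in [0, 1]")
    return avg_order_value * purchases_per_year * expected_years * gross_margin


# Hypothetical customer: 100 per order, 4 orders a year, 3 years, 25% margin.
clv = simple_clv(
    avg_order_value=100.0,
    purchases_per_year=4,
    expected_years=3,
    gross_margin=0.25,
)
```

A production CLV model would typically also discount future cash flows and estimate retention probabilistically. The heuristic only shows how RFM-style inputs feed a value projection.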

The authors also outline their comparison of Marketing-AutoM3L against several leading AutoML frameworks and a manual workflow. The original AutoM3L serves as the baseline for general-purpose multimodal AutoML without marketing-specific customizations. The comparison also includes TPOT, which uses genetic programming for automated pipeline construction; AutoGluon, which offers tabular prediction with model stacking; and Google AutoML via Vertex AI, known for its neural architecture search capabilities. The Manual ML Pipeline represents conventional data science practice, where feature design and model selection are performed by hand. To ensure a fair evaluation, all methods receive identical preprocessed datasets, and method-specific optimizations are disabled. Consistent training time limits are applied across all approaches to assess their practical applicability in business contexts.

Results

The results presented in this section highlight the superior performance of the Marketing-AutoM3L framework across various datasets and evaluation metrics. As shown in Table 2, Marketing-AutoM3L consistently outperforms baseline methods, achieving the highest ROC-AUC scores on all five datasets, with improvements ranging from 1.4% to 5.4% over the strongest baseline. Notably, the framework achieved a ROC-AUC of 0.941 on the Bank Customer Churn dataset, attributed to effective domain-specific feature engineering that captures structured customer attributes and behavioral patterns. In contrast, the E-commerce Customer dataset posed significant challenges due to the dynamic nature of online behavior, yet Marketing-AutoM3L still demonstrated substantial performance gains.
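For readers less familiar with the metric, ROC-AUC can be read as the probability that a randomly chosen churner receives a higher risk score than a randomly chosen non-churner. The sketch below computes it directly from that definition and shows two ways an "improvement over a baseline" can be expressed. The summary does not state whether the reported 1.4% to 5.4% gains are absolute or relative, so both are shown; all labels, scores, and AUC values in the example are made up.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def improvement(new_auc, baseline_auc):
    """Return (absolute difference, relative difference) between two AUC values."""
    absolute = new_auc - baseline_auc
    return absolute, absolute / baseline_auc


# Hypothetical churn labels (1 = churned) and model risk scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
auc = roc_auc(toy_labels, toy_scores)

# Hypothetical AUC values, not taken from the paper's tables.
abs_gain, rel_gain = improvement(0.90, 0.875)
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration. A sort-based rank computation would be preferable on large datasets.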

Statistical significance testing confirms that the improvements of Marketing-AutoM3L over baseline methods are robust (p < 0.05), with the framework's performance gains particularly pronounced in datasets with diverse feature types. The practical implications of these findings are significant; for instance, a 5.4% improvement in the E-commerce dataset translates to identifying approximately 380 additional at-risk customers in a base of 10,000, which can lead to substantial revenue retention. Furthermore, even when baseline methods were provided with pre-computed domain features, Marketing-AutoM3L maintained statistically significant advantages, indicating that its value extends beyond feature engineering to include intelligent model selection and contextual hyperparameter optimization. The framework's efficiency is also noteworthy, requiring an average of 23.4 minutes for pipeline construction, representing a 6.7× speedup compared to manual methods. Overall, these results affirm that domain-aware automation tailored for financial customer analytics significantly outperforms generic approaches, underscoring the importance of integrating marketing domain knowledge into predictive modeling.
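The reported speedup can be checked with a few lines of arithmetic. This assumes that the 156.9-minute figure quoted in the overview is the manual baseline against which the 6.7× speedup is measured; the variable names are illustrative.

```python
# Timing figures as quoted in this summary.
manual_minutes = 156.9     # assumed to be the manual pipeline baseline
automated_minutes = 23.4   # average Marketing-AutoM3L construction time

speedup = manual_minutes / automated_minutes
minutes_saved = manual_minutes - automated_minutes
time_reduction = minutes_saved / manual_minutes

print(f"speedup: {speedup:.1f}x")
print(f"time saved: {minutes_saved:.1f} min ({time_reduction:.1%})")
```

Under that assumption, the two timing figures are consistent with the stated 6.7× speedup and correspond to roughly an 85% reduction in development time.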

Discussion

The discussion section of the research paper highlights advancements in Automated Machine Learning (AutoML) systems, emphasizing their practical applications rather than theoretical underpinnings. It notes the evolution from early systems like TPOT, which utilized genetic programming, to modern cloud-based platforms such as Google Cloud AutoML and Amazon SageMaker Autopilot, which offer enhanced scalability through distributed computing. Key challenges persist, including transparency in neural architecture search, computational scalability for large datasets, and bias mitigation. The section also discusses the integration of meta-learning and specialized AutoML frameworks that cater to specific domains, such as medical diagnostics, achieving notable accuracy rates.

Furthermore, the paper introduces the Marketing-AutoM3L framework, designed for automating machine learning pipeline construction in customer analytics. This framework employs large language models (LLMs) to process raw customer data and natural language directives, facilitating the generation of training pipelines for various marketing tasks. The architecture comprises five interconnected stages, including data modality recognition and domain-specific feature engineering, ensuring that the system adapts to both data characteristics and business objectives. The framework’s methodologies, such as RFM analysis and customer lifetime value projections, are grounded in established marketing analytics literature, ensuring that the automated processes capture relevant patterns while maintaining interpretability and scalability.
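The data modality recognition stage can be pictured with a small rule-based stand-in. The paper describes an LLM-driven approach; the sketch below replaces it with simple heuristics purely to show the input and output of such a stage. The function names, the threshold values, and the toy columns are assumptions.

```python
from datetime import datetime


def _is_iso_date(value):
    """Return True if the string parses as a YYYY-MM-DD date."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def infer_modality(values, max_categories=10):
    """Guess a column's modality from sample values using simple rules.

    A heuristic stand-in for LLM-based recognition, for illustration only.
    """
    non_missing = [v for v in values if v not in (None, "")]
    if not non_missing:
        return "unknown"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_missing):
        return "numeric"
    if all(isinstance(v, str) for v in non_missing):
        if all(_is_iso_date(v) for v in non_missing):
            return "datetime"
        # Long strings containing spaces are treated as free text.
        if any(" " in v.strip() and len(v) > 30 for v in non_missing):
            return "text"
        if len(set(non_missing)) <= max_categories:
            return "categorical"
        return "text"
    return "unknown"


# Hypothetical customer table, stored column-wise.
toy_columns = {
    "balance": [1200.5, 300.0, None],
    "segment": ["retail", "premium", "retail"],
    "signup_date": ["2024-01-15", "2023-07-02", "2025-02-28"],
    "last_complaint": [
        "The mobile app keeps logging me out during transfers.",
        "",
        "No issues so far, support was helpful and quick.",
    ],
}
modalities = {name: infer_modality(vals) for name, vals in toy_columns.items()}
```

The resulting mapping from column name to modality is what a downstream feature engineering stage would consume, for example to route date columns into recency features and text columns into a language model encoder.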