On the persistent mischaracterization of Google and Facebook A/B tests: How to conduct and report online platform studies

Journal: International Journal of Research in Marketing, Volume: 42, Issue: 3
DOI: https://doi.org/10.1016/j.ijresmar.2024.12.004
Publication Date: 2025-01-02
Author(s): Johannes Boegershausen et al.
Primary Topic: Social Media in Health Education

Overview

The manuscript provides a critical review of 133 published online platform studies in marketing research, highlighting the reliance on A/B testing tools from platforms like Facebook and Google Ads. While these studies offer insights into real consumer behavior, such as ad clicks, they suffer from a lack of true random assignment, which undermines causal inference. The authors note that many studies are inaccurately presented as randomized experiments, leading to erroneous claims of causality. The review reveals a persistent lack of awareness regarding the confounding factors inherent in these studies, particularly the interplay between ad creatives and platform targeting algorithms.

Despite their ecological validity, online platform studies have low internal validity because treatment delivery is not random, which makes them unsuitable for theory testing. The authors argue that these studies should complement, rather than replace, traditional lab-based experiments, serving as a “proof of concept” for interventions developed in controlled settings. They emphasize the importance of positioning these studies accurately and reporting them transparently, particularly with respect to divergent delivery, a phenomenon that warrants further investigation. The authors also call on advertising platforms to communicate more clearly that their tools are non-experimental and to consider offering truly random user assignment for future research.

Introduction

The introduction highlights the significant shift in consumer behavior towards digital platforms, with individuals spending nearly 7 hours daily on screen-based activities. This trend has led to the rise of online survey-based experiments and online platform studies in marketing research, which rely on cloud-based software and crowdsourcing platforms. Online platform studies, particularly those using A/B testing on platforms like Facebook and Google Ads, aim to assess the effectiveness of different advertisements by comparing metrics such as click-through rates (CTRs). However, the authors caution that these studies often lack true randomization: targeting algorithms operate after randomization, which skews ad delivery and calls the studies’ internal validity into question.

The manuscript aims to critically review the use and misuse of online platform studies in academic research, addressing their methodological limitations and providing guidance for researchers. It outlines the structure of the paper, which includes an analysis of 133 published studies to evaluate how internal validity issues have been acknowledged, a case study demonstrating the impact of divergent delivery on causal inference, and recommendations for ethical reporting and appropriate usage of these studies. The authors emphasize the need for careful consideration of study design and the implications of using online platform studies in marketing research.

Methods

The methods section describes a case study on user preferences for services provided by robots versus humans in embarrassing contexts, run through Facebook advertisements for weight loss advice. The authors conducted three separate A/B tests over a three-day period, each optimizing ad delivery for a different objective: “link clicks,” “impressions,” and “reach” with a frequency cap. Each test assessed how its optimization strategy influenced user engagement, with a budget of USD 200 per test. The results are summarized in Table 1, which compares the click-through rates of the robot and human ads.
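As a rough illustration of the kind of comparison such an A/B test reports, the sketch below computes the CTR of two ad arms and a pooled two-proportion z statistic. The click and impression counts are invented for illustration and are not taken from Table 1; the function names are hypothetical. Note that the z statistic only has a causal reading under random assignment, which is the assumption the authors argue these platform tools do not satisfy.

```python
import math


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions."""
    return clicks / impressions


def two_proportion_z(clicks_a: int, n_a: int, clicks_b: int, n_b: int) -> float:
    """Pooled two-proportion z statistic for the difference in CTRs."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


# Hypothetical counts (assumptions, not data from the paper).
robot_clicks, robot_impressions = 120, 10_000
human_clicks, human_impressions = 90, 10_000

robot_ctr = ctr(robot_clicks, robot_impressions)
human_ctr = ctr(human_clicks, human_impressions)
z = two_proportion_z(robot_clicks, robot_impressions,
                     human_clicks, human_impressions)

print(f"robot CTR: {robot_ctr:.4f}, human CTR: {human_ctr:.4f}, z = {z:.2f}")
```

A large z value here says only that the two observed CTRs differ; it does not say whether the ad content or the audience each ad was delivered to produced the difference.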

In addition to the case study, the authors used a multi-stage search to compile a database of studies that rely on A/B testing tools on platforms like Meta and Google. The process began with an initial list of 344 articles, from which 26 studies were selected for inclusion based on specific criteria. The database was then expanded through targeted searches across academic platforms and citation tracking, ultimately yielding a final collection of 133 studies. The authors detail the exclusion criteria applied to studies that did not meet the defined parameters, keeping the analysis focused on relevant online platform studies, and summarize the core characteristics of the included studies in Tables 2 and 3.

Discussion

The discussion section of the research paper highlights significant internal validity issues in online platform studies, particularly concerning the randomization process and the phenomenon of “divergent delivery.” Researchers often cede control over randomization to digital platforms, which utilize machine-learning algorithms to optimize ad delivery based on user characteristics. This results in different user groups being exposed to distinct ads, leading to confounding variables that compromise causal inference. For instance, an ad’s performance may be influenced not solely by its content but also by the demographic characteristics of the users it reaches, raising concerns about the validity of conclusions drawn from such studies.
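The confounding mechanism described above can be made concrete with a small deterministic sketch. It assumes two hypothetical user segments with different baseline click propensities and two ads whose content has no effect at all; every number and name is an assumption chosen for illustration, not a figure from the paper. Under these assumptions, an observed CTR gap appears only when the platform delivers the two ads to different audience mixes.

```python
# Hypothetical illustration: all numbers are assumptions, not data from the paper.

# Baseline click propensity of two user segments. By construction, the ad
# creative has NO effect on clicking.
BASE_CTR = {"segment_a": 0.02, "segment_b": 0.01}


def expected_ctr(delivery_mix: dict[str, float]) -> float:
    """Expected CTR of an ad given its share of impressions per segment."""
    return sum(share * BASE_CTR[seg] for seg, share in delivery_mix.items())


# True random assignment: both ads reach the same audience mix.
random_mix = {"segment_a": 0.5, "segment_b": 0.5}
ctr_ad1_random = expected_ctr(random_mix)
ctr_ad2_random = expected_ctr(random_mix)

# Divergent delivery: the optimizer steers each ad toward a different mix.
ctr_ad1_divergent = expected_ctr({"segment_a": 0.8, "segment_b": 0.2})
ctr_ad2_divergent = expected_ctr({"segment_a": 0.3, "segment_b": 0.7})

print(f"random assignment:  {ctr_ad1_random:.4f} vs {ctr_ad2_random:.4f}")
print(f"divergent delivery: {ctr_ad1_divergent:.4f} vs {ctr_ad2_divergent:.4f}")
```

In this toy setup the two ads are identical in effect, so any CTR difference under divergent delivery is attributable entirely to audience composition, which is the kind of confound the authors warn about.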

The paper argues against categorizing online platform studies as true field experiments, as they lack the necessary control over randomization. Instead, they more closely resemble organic data collection methods, such as web scraping. The authors also distinguish online platform studies from lift tests: lift tests allow causal inference by comparing exposed and unexposed users, but they do not support comparisons between different ads. The case study presented in the paper further illustrates how divergent delivery affects click-through rates (CTRs) and underscores the need for transparency about optimization methods and user reactions. Ultimately, the authors call for a reevaluation of how online platform studies are conducted and reported, with clearer acknowledgment of their limitations and of the complexities of measuring ad effectiveness.