Machine learning in the prediction of human wellbeing

Journal: Scientific Reports, Volume: 15, Issue: 1
DOI: https://doi.org/10.1038/s41598-024-84137-1
PMID: https://pubmed.ncbi.nlm.nih.gov/39794488
Publication Date: 2025-01-10
Author(s): Ekaterina Oparina et al.
Primary Topic: Psychological Well-being and Life Satisfaction

Overview

The paper applies tree-based machine learning (ML) algorithms to the predictive modeling of subjective wellbeing data, which are increasingly used in the social sciences. The study analyzes over one million respondents from Germany, the UK, and the United States, using data collected between 2010 and 2018.

The authors make three contributions. First, they show that ML algorithms outperform traditional modeling approaches in predicting wellbeing scores, which gives an upper limit on how predictable these scores are from survey data. Second, they identify the most important drivers of evaluative wellbeing and confirm that the variables highlighted in the existing theoretical literature also matter in ML analyses. Third, they use ML to assess functional forms, finding satiation points in the effect of income and a U-shaped relationship between age and wellbeing.

Methods

The authors state that their methods follow the relevant guidelines and regulations. Because the study uses secondary observational data, formal ethical approval was not required. The methods are intended to keep the findings reliable while complying with the ethical standards that apply to the use of existing data.

Results

The results draw on three nationally representative surveys: the German Socio-Economic Panel (SOEP), the UK Household Longitudinal Study (UKHLS), and the US Gallup Daily Poll (Gallup). The authors analyze each survey with two sets of explanatory variables. The restricted set, informed by prior research, includes age, sex, health status, household composition, and various socioeconomic and demographic factors. The extended set includes all available variables in each survey except those that measure subjective wellbeing or mental health. This dual approach allows machine learning (ML) algorithms to be compared against Ordinary Least Squares (OLS) regression within a conventional estimation framework.
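The split into a restricted and an extended variable set can be sketched as a simple column filter. This is only an illustration: the column names, the `EXCLUDED` and `RESTRICTED` lists, and the helper `build_feature_sets` are hypothetical and do not correspond to actual SOEP, UKHLS, or Gallup variable names.

```python
# Hypothetical column names; the real survey variable names differ.
columns = [
    "life_satisfaction", "age", "sex", "health_status", "household_size",
    "income", "employment", "mental_health_index", "commute_time", "region",
]

# Measures of subjective wellbeing and mental health are never used as predictors.
EXCLUDED = {"life_satisfaction", "mental_health_index"}

# Illustrative stand-in for the literature-informed restricted set.
RESTRICTED = ["age", "sex", "health_status", "household_size", "income", "employment"]


def build_feature_sets(all_columns):
    """Return (restricted, extended) predictor lists for one survey."""
    extended = [c for c in all_columns if c not in EXCLUDED]
    restricted = [c for c in RESTRICTED if c in all_columns]
    return restricted, extended


restricted_set, extended_set = build_feature_sets(columns)
print(len(restricted_set), "restricted predictors;", len(extended_set), "extended predictors")
```

In this sketch the restricted set is always a subset of the extended set, so any gain from the extended set comes from the additional variables.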

The authors use four estimation methods: OLS regression; the Least Absolute Shrinkage and Selection Operator (LASSO), which performs variable selection; and two regression-tree-based ML algorithms, Random Forests (RF) and Gradient Boosting (GB). The extended variable sets exploit the ability of ML algorithms to handle many explanatory variables, but the authors note that differences in the data available in each survey may limit direct comparability of findings across the three countries. Further details on the data sources, variable selection, and algorithms are provided in the paper's Methods section.
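To make the contrast between OLS and a tree-based learner concrete, the sketch below fits both to synthetic data with a U-shaped signal. It is not the authors' code or data: the data-generating process, the parameter values, and all function names (`fit_ols`, `fit_stump`, `fit_boosting`, `r_squared`) are assumptions made for illustration, and the booster is a minimal least-squares gradient boosting over depth-1 trees written with the standard library only.

```python
import random


def r_squared(y, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def fit_ols(x, y):
    """Single-predictor least squares; returns a prediction function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_stump(x, residual):
    """Best single-split regression tree (depth 1) under squared error."""
    pairs = sorted(zip(x, residual))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal x values
        right_sum = total - left_sum
        # Maximising this score is equivalent to minimising squared error.
        score = left_sum ** 2 / i + right_sum ** 2 / (n - i)
        if best is None or score > best[0]:
            threshold = (pairs[i][0] + pairs[i - 1][0]) / 2
            best = (score, threshold, left_sum / i, right_sum / (n - i))
    _, threshold, left_mean, right_mean = best
    return lambda v: left_mean if v <= threshold else right_mean


def fit_boosting(x, y, rounds=100, learning_rate=0.1):
    """Least-squares gradient boosting: each stump fits the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(rounds):
        residual = [a - p for a, p in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [p + learning_rate * stump(v) for p, v in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


# Synthetic data (assumed, not survey data): a U-shape in age plus noise.
rng = random.Random(0)
age = [rng.uniform(18, 80) for _ in range(1000)]
score = [6.0 + 0.0025 * (a - 50) ** 2 + rng.gauss(0, 1.0) for a in age]
train_x, test_x = age[:700], age[700:]
train_y, test_y = score[:700], score[700:]

ols = fit_ols(train_x, train_y)
gb = fit_boosting(train_x, train_y)
r2_ols = r_squared(test_y, [ols(v) for v in test_x])
r2_gb = r_squared(test_y, [gb(v) for v in test_x])
print(f"out-of-sample R-squared  OLS: {r2_ols:.3f}  GB: {r2_gb:.3f}")
```

A straight line cannot follow a U-shape, so the linear fit explains little of the held-out variance here, while the boosted stumps recover the curve without the shape being specified in advance. In practice one would use an established library implementation rather than this hand-rolled booster.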

Discussion

The authors compare the performance of machine learning (ML) algorithms in predicting wellbeing with that of traditional Ordinary Least Squares (OLS) regression. ML methods, particularly Gradient Boosting (GB), consistently outperform OLS across the datasets, and the higher R-squared values indicate that ML models capture non-linear relationships among wellbeing predictors more effectively than OLS does. Specifically, R-squared for the ML algorithms exceeds that of OLS by 0.024 to 0.034, a gain the authors consider meaningful when compared to the predictive power of health information. Even with the extended set of variables, the maximum R-squared remains around 0.3, so a substantial share of the variance in wellbeing remains unexplained.
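For reference, the R-squared figures above can be read through the standard definition of the coefficient of determination. The notation is assumed here rather than taken from the paper: $y_i$ is a respondent's reported wellbeing score, $\hat{y}_i$ the model's prediction, and $\bar{y}$ the sample mean.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

Under this definition, an R-squared of about 0.3 leaves roughly 70% of the variation in reported wellbeing unaccounted for, and a gain of 0.024 to 0.034 corresponds to 2.4 to 3.4 percentage points of additional explained variance.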

The analysis also shows that ML and OLS identify similar key determinants of wellbeing, such as health and interpersonal relationships, in line with previous findings in the literature. Rank correlations between the variable importances from ML and OLS indicate strong agreement, although some differences exist across countries, particularly for financial factors. The authors also examine how age and income relate to wellbeing, confirming a U-shaped relationship for age and suggesting a satiation point for income beyond which wellbeing does not increase significantly. They conclude that ML approaches improve predictive performance but require significantly more computational resources, a trade-off between accuracy and efficiency. Future research directions include applying ML techniques in more diverse global contexts and exploring causal relationships among wellbeing predictors.
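The agreement between two importance rankings can be quantified with a rank correlation. The sketch below assumes Spearman's coefficient, since the summary does not say which rank correlation the authors used. The variable names and importance scores are invented for illustration and are not results from the paper.

```python
def ranks(values):
    """Rank values from 1 (largest) downward, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def spearman(a, b):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


# Invented importance scores for six hypothetical predictors.
variables = ["health", "partner", "income", "employment", "age", "education"]
importance_ols = [0.30, 0.20, 0.15, 0.12, 0.08, 0.05]
importance_ml = [0.32, 0.18, 0.10, 0.14, 0.09, 0.04]

rho = spearman(importance_ols, importance_ml)
print(f"Spearman rank correlation: {rho:.3f}")
```

In this invented example only income and employment swap places between the two rankings, so the coefficient stays close to 1. A larger reshuffle, such as the cross-country differences in financial factors noted above, would pull it lower.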