Logistic Regression: Limitations in the Estimation of Measures of Association with Binary Health Outcomes

Journal: Acta Médica Portuguesa, Volume: 37, Issue: 10
DOI: https://doi.org/10.20344/amp.21435
PMID: https://pubmed.ncbi.nlm.nih.gov/39366365
Publication Date: 2024-10-01
Author(s): Lara Pinheiro-Guedes et al.
Primary Topic: Advanced Causal Inference Techniques

Overview

This study investigates the appropriateness of logistic regression, log-binomial, and robust Poisson models for estimating associations in cross-sectional studies with frequent binary outcomes (prevalence > 10%). The research comprises two studies: a national survey on the impact of air pollution on mental health and a local study on immigrants' access to emergency services. The findings show that odds ratios (ORs) from logistic regression tend to overstate the association relative to the prevalence ratios (PRs) from log-binomial and robust Poisson models, and that the gap widens as the outcome's prevalence increases. In the first study, the OR was 1.015 (95% CI: 0.970 – 1.063), while the robust Poisson model yielded a PR of 1.012 (95% CI: 0.979 – 1.045). In the second study, the OR was 1.584 (95% CI: 1.026 – 2.446), compared with PRs of 1.217 (95% CI: 0.978 – 1.515) from the log-binomial model and 1.130 (95% CI: 1.013 – 1.261) from the robust Poisson model.
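
Each estimate above is reported with its 95% CI. Assuming these are Wald intervals computed on the log scale, that is, estimate = exp(b) and CI = exp(b ± 1.96·SE), both the point estimate and the log-scale standard error can be recovered from the interval limits, which makes the precision of the three Study 2 models directly comparable. This is a sketch under that assumption, not the authors' code; the function names are hypothetical.

```python
import math

# 97.5th percentile of the standard normal distribution (the "1.96" of a 95% CI).
Z_975 = 1.959963984540054


def point_from_ci(lower: float, upper: float) -> float:
    """Ratio estimate implied by a log-scale Wald CI: the geometric mean of its limits."""
    return math.sqrt(lower * upper)


def log_se_from_ci(lower: float, upper: float, z: float = Z_975) -> float:
    """Standard error of log(estimate) implied by a log-scale Wald CI."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Study 2 estimates as reported above: (estimate, lower limit, upper limit).
reported = {
    "logistic OR": (1.584, 1.026, 2.446),
    "log-binomial PR": (1.217, 0.978, 1.515),
    "robust Poisson PR": (1.130, 1.013, 1.261),
}

for label, (estimate, lower, upper) in reported.items():
    print(
        f"{label}: reported {estimate:.3f}, "
        f"implied {point_from_ci(lower, upper):.3f}, "
        f"log-SE {log_se_from_ci(lower, upper):.3f}"
    )
```

Under this assumption the implied point estimates reproduce the reported values (1.584, 1.217 and 1.130), and the implied log-scale standard errors are roughly 0.22 for the OR, 0.11 for the log-binomial PR and 0.06 for the robust Poisson PR.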

The study concludes that, although both ORs and PRs indicate the direction of an association, relying on ORs for non-rare outcomes may lead to misinterpretation. Robust Poisson models emerge as a viable alternative to logistic regression and, unlike log-binomial models, avoid convergence problems. The choice of the most suitable statistical model should consider several factors, including the nature of the outcome variable, the study's objectives, and the assumptions underlying causal inference. The authors advocate a comprehensive approach to model selection informed by multiple statistical measures, such as standard errors, confidence intervals, and the Akaike Information Criterion (AIC). Future research is encouraged to explore alternative models and to establish standardized guidelines for estimating associations in studies with frequent binary outcomes.

Introduction

In medical research, logistic regression models are commonly used to assess the association between exposures or treatments and binary outcomes, primarily in cross-sectional and case-control studies. These models estimate odds ratios (ORs), which compare the odds of the outcome (for example, disease) among exposed versus unexposed individuals. Logistic regression is favored for its simplicity, but it has limitations: when the outcome is frequent (prevalence above 10%), the OR no longer approximates the prevalence ratio (PR) or risk ratio (RR), and reading it as one gives a biased, exaggerated measure of association. Studies indicate that a substantial proportion of research using logistic regression reports ORs that deviate markedly from the corresponding RRs, and that these ORs are often misinterpreted as if odds were risks.
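
The divergence between the OR and the PR follows directly from their definitions. With outcome prevalences p1 (exposed) and p0 (unexposed), PR = p1/p0 and OR = [p1/(1 - p1)] / [p0/(1 - p0)] = PR × (1 - p0)/(1 - p1), so the OR stays close to the PR only while both prevalences are small. The sketch below holds the PR fixed at 2 and raises the baseline prevalence; the prevalence values are illustrative, not taken from the study.

```python
def prevalence_ratio(p1: float, p0: float) -> float:
    """Ratio of outcome prevalence in the exposed (p1) to that in the unexposed (p0)."""
    return p1 / p0


def odds_ratio(p1: float, p0: float) -> float:
    """Ratio of the odds of the outcome in the exposed to the odds in the unexposed."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))


# Hold the true PR at 2 and increase the baseline prevalence p0.
for p0 in (0.01, 0.05, 0.10, 0.20, 0.30, 0.40):
    p1 = 2 * p0
    print(f"p0={p0:.2f}  PR={prevalence_ratio(p1, p0):.2f}  OR={odds_ratio(p1, p0):.2f}")
```

With p0 = 0.01 the OR is about 2.02, but at p0 = 0.30 it reaches 3.5 and at p0 = 0.40 it reaches 6, although the PR is 2 throughout.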

To address these challenges, alternative methods such as log-binomial and modified Poisson regression have been proposed for estimating valid adjusted PRs/RRs for frequent binary outcomes. Modified Poisson regression fits a Poisson model with a log link to the binary outcome and uses robust (sandwich) standard errors to correct the variance, which the Poisson assumption overestimates for binary data; it has shown promise in providing accurate estimates even when the data do not fully meet the model's assumptions. Given the implications of these methodological choices for clinical practice and public health, this study evaluates the estimation accuracy and goodness of fit of logistic, log-binomial, and robust Poisson regression models specifically in cross-sectional studies with frequent binary outcomes.
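
A minimal sketch of modified Poisson regression, written in plain Python so that each step is visible: a Poisson model with a log link is fitted to the 0/1 outcome by Newton-Raphson, and the robust (HC0 sandwich) covariance A^-1 B A^-1 replaces the model-based one, where A is the sum of mu_i x_i x_i' and B is the sum of (y_i - mu_i)^2 x_i x_i'. The function names and toy data are hypothetical; in practice a statistical package would be used (in Stata, for example, `poisson` with `vce(robust)`), and its small-sample scaling of the robust variance may differ slightly from plain HC0.

```python
import math


def _solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b for a small square system (Gauss-Jordan with partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def _cross(X: list[list[float]], w: list[float]) -> list[list[float]]:
    """Weighted cross-product matrix: sum over rows of w_i * x_i x_i'."""
    k = len(X[0])
    return [[sum(wi * row[i] * row[j] for row, wi in zip(X, w)) for j in range(k)] for i in range(k)]


def modified_poisson(X: list[list[float]], y: list[int], max_iter: int = 50, tol: float = 1e-10):
    """Log-link Poisson fit to a 0/1 outcome with robust (HC0 sandwich) standard errors.

    X: covariate rows whose first column is all 1s (intercept); y: 0/1 outcomes.
    Returns (coefficients, robust SEs); exp(coefficient) is a prevalence ratio.
    """
    k = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        score = [sum((yi - mi) * row[j] for row, yi, mi in zip(X, y, mu)) for j in range(k)]
        step = _solve(_cross(X, mu), score)  # Newton step: A^-1 times the score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    bread = _cross(X, mu)  # Poisson information matrix A
    meat = _cross(X, [(yi - mi) ** 2 for yi, mi in zip(y, mu)])  # empirical B
    # Columns of A^-1 (A is symmetric, so this list is A^-1 itself).
    a_inv = [_solve(bread, [float(i == j) for i in range(k)]) for j in range(k)]
    var = [
        sum(a_inv[j][p] * meat[p][q] * a_inv[q][j] for p in range(k) for q in range(k))
        for j in range(k)
    ]
    return beta, [math.sqrt(v) for v in var]


# Hypothetical data: 60/100 exposed and 40/100 unexposed have the outcome.
X = [[1.0, 1.0]] * 100 + [[1.0, 0.0]] * 100
y = [1] * 60 + [0] * 40 + [1] * 40 + [0] * 60
beta, se = modified_poisson(X, y)
lower, upper = math.exp(beta[1] - 1.96 * se[1]), math.exp(beta[1] + 1.96 * se[1])
print(f"PR = {math.exp(beta[1]):.3f} (95% CI {lower:.3f} - {upper:.3f})")
```

With a single binary exposure, the fitted PR equals the crude ratio 0.6/0.4 = 1.5, and the robust standard error of log(PR) reduces to the familiar sqrt(1/a - 1/n1 + 1/c - 1/n0), with a and c the numbers of outcomes among the n1 exposed and n0 unexposed, which gives a convenient check on the implementation.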

Methods

The authors conducted two cross-sectional studies, each estimating the association between a distinct exposure and a binary outcome with a prevalence exceeding 10%. Potential confounders were identified with directed acyclic graphs (DAGs) developed from a comprehensive literature review. Logistic regression was used to estimate ORs, and log-binomial or robust Poisson regression to estimate PRs; the estimates were then compared in terms of magnitude, statistical significance, confidence intervals, interval ranges, and standard errors.
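
One simple way to turn a DAG into an adjustment set is to adjust for all parents (direct causes) of the exposure. If the DAG is correct, has no unmeasured variables, and does not draw the outcome as a cause of the exposure, this set blocks every back-door path, although it is not necessarily the smallest sufficient set; dedicated tools such as DAGitty can compute minimal sets. The sketch below is an illustration of that rule only: the graph and variable names are hypothetical and do not reproduce the paper's DAGs.

```python
def parents(dag: dict[str, set[str]], node: str) -> set[str]:
    """Direct causes of `node`; `dag` maps each variable to the set of its children."""
    return {u for u, children in dag.items() if node in children}


def parent_adjustment_set(dag: dict[str, set[str]], exposure: str, outcome: str) -> set[str]:
    """Adjust for all parents of the exposure (a valid, not always minimal, back-door set)."""
    pa = parents(dag, exposure)
    if outcome in pa:
        raise ValueError("the outcome is drawn as a cause of the exposure")
    return pa


# Hypothetical DAG: arrows point from each key to the variables in its set.
dag = {
    "age": {"pm10", "cmd"},
    "income": {"pm10", "cmd"},
    "urbanicity": {"pm10"},
    "pm10": {"cmd"},
    "smoking": {"cmd"},
}
print(sorted(parent_adjustment_set(dag, "pm10", "cmd")))
```

Here "urbanicity" is selected even though, in this hypothetical graph, it affects only the exposure, which shows why the parent set is sufficient but not necessarily minimal; "smoking", a cause of the outcome only, is not required.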

Maximum likelihood estimation was used for all models, including those fitted to the complex survey data of Study 1, where sampling weights were incorporated into the analysis. For Study 2, which did not involve a complex survey design, the Akaike Information Criterion (AIC) was used for model selection: the model with the lowest AIC was considered optimal, and models with AIC values within two units of the minimum were also considered viable candidates. All statistical analyses were performed in Stata® version 15, with the significance level set at 5%.
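
The AIC rule described above can be sketched as follows. AIC = 2k - 2 ln L, where ln L is the maximized log-likelihood and k the number of estimated parameters; the model with the smallest AIC is preferred, and any model within two units of it is kept as a candidate. The model labels and log-likelihood values are hypothetical, and AIC comparisons are only meaningful between models fitted to the same observations and the same outcome.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def aic_candidates(models: dict[str, tuple[float, int]], delta: float = 2.0) -> list[str]:
    """Names of models whose AIC is within `delta` of the minimum, best first."""
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return sorted((name for name, s in scores.items() if s - best <= delta), key=scores.get)


# Hypothetical fits: name -> (maximized log-likelihood, number of parameters).
models = {
    "crude": (-275.0, 2),
    "DAG-based adjustment": (-270.0, 5),
    "DAG-based + quadratic age": (-269.6, 6),
    "extended adjustment": (-268.9, 8),
}
print(aic_candidates(models))
```

In this example the AIC values are 554.0, 550.0, 551.2 and 553.8, so the two DAG-based models are retained and the crude and extended models are not.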

Results

In Study 1, a nationally representative sample of 2398 individuals, long-term PM10 exposure was not significantly associated with probable common mental disorders (CMD): logistic regression gave an OR of 1.015 (95% CI: 0.970 – 1.063) and the robust Poisson model a PR of 1.012 (95% CI: 0.979 – 1.045). Both intervals include 1, and because the association was weak the two estimates were almost identical.

In Study 2, a local sample of 410 children, 48% had used urgent care services, with a higher proportion among immigrant children. Here the estimates diverged markedly: the OR was 1.584 (95% CI: 1.026 – 2.446), whereas the log-binomial and robust Poisson models gave PRs of 1.217 (95% CI: 0.978 – 1.515) and 1.130 (95% CI: 1.013 – 1.261), respectively. The logistic and robust Poisson intervals exclude 1, while the log-binomial interval does not.

Discussion

The discussion evaluates two studies of associations between exposures and health outcomes: the impact of long-term exposure to particulate matter (PM10) on common mental disorders (CMD), and the relationship between immigration status and urgent care use in a pediatric population. Study 1, which analyzed a nationally representative sample of 2398 individuals, found no statistically significant association between PM10 exposure and the frequency of probable CMD diagnoses, regardless of the regression model used. It also illustrated the problem with logistic regression for frequent outcomes: the OR it produces tends to lie further from 1 than the prevalence ratio (PR) from a robust Poisson model, and therefore overstates the association when read as a PR.

In contrast, Study 2, which focused on a local sample of 410 children, found that 48% had used urgent care services, with a higher proportion among immigrant children. The analysis indicated a significant association between immigration status and urgent care use, although the log-binomial model failed to converge, underscoring the practical limitations of that approach. The discussion emphasizes the need for careful model selection in epidemiological studies and recommends the robust Poisson model as a preferable alternative to logistic regression for estimating measures of association with frequent binary outcomes. The authors call for a standardized approach to model selection that considers the type of outcome variable and the study's objectives, to improve the accuracy and reliability of epidemiological research findings.
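
In a log-binomial model the linear predictor is the log of a probability, so every fitted value exp(x'b) must be at most 1. When the maximum-likelihood estimate lies on or near this boundary, which tends to happen with common outcomes and continuous covariates, standard iterative algorithms can fail to converge. The sketch below, with hypothetical coefficients and covariate values, flags the observations for which a candidate coefficient vector leaves the admissible region; the same check is useful after a robust Poisson fit, whose fitted values are not constrained to be at most 1.

```python
import math


def fitted_probabilities(beta: list[float], X: list[list[float]]) -> list[float]:
    """Fitted values exp(x'beta) of a log-link model for each covariate row."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


def rows_above_one(beta: list[float], X: list[list[float]]) -> list[int]:
    """Indices of rows whose fitted 'probability' exceeds 1 (outside the log-binomial space)."""
    return [i for i, p in enumerate(fitted_probabilities(beta, X)) if p > 1.0]


# Hypothetical model: intercept, binary exposure, continuous covariate z.
beta = [math.log(0.30), math.log(1.5), 0.04]
X = [
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 5.0],
    [1.0, 0.0, 25.0],
    [1.0, 1.0, 25.0],
]
print([round(p, 3) for p in fitted_probabilities(beta, X)])
print(rows_above_one(beta, X))
```

Only the last row, an exposed observation with a high covariate value, is flagged: its fitted value is 0.45 × e, about 1.22, which no probability can take.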

Strengths and Limitations

This study has notable strengths, including its role in reviving the debate on the validity of the odds ratio (OR) as a measure for inferring the relative risk (RR) and prevalence ratio (PR) of frequent binary outcomes in cross-sectional studies. By using real-life datasets from different contexts, namely the health effects of air pollution and access to healthcare services, the research makes the comparison relevant to several scientific fields. Furthermore, both studies followed established methodological frameworks, which helped mitigate bias and confounding and strengthens their contribution to the existing literature.

However, several limitations must be acknowledged. The study did not include a simulation in which the true relative risk is known, which would have allowed the accuracy of each model's estimates to be assessed directly. In addition, the associations examined were of limited magnitude, so the estimates were close to one and varied little across models. The comparison of logistic regression with log-binomial and robust Poisson regression was also selective, focusing on models chosen for their appropriateness and ease of use rather than covering all potential alternatives. Each model's assumptions and limitations should be considered when interpreting the findings.
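
The missing simulation can be sketched in its simplest form: generate data from known outcome prevalences in the unexposed and exposed groups, so that the true PR and OR are known, and compare the estimates with them. The settings are hypothetical (baseline prevalence 40%, exposed prevalence 60%, so the true PR is 1.5 and the true OR is 2.25), and the sketch uses crude 2x2 estimates rather than the paper's regression models; a fuller simulation would add confounders and fit the logistic, log-binomial, and robust Poisson models to each replicate.

```python
import random


def simulate_crude_estimates(p0: float, p1: float, n: int, reps: int, seed: int = 2024):
    """Average crude PR and OR over `reps` simulated cross-sectional samples of size n.

    Exposure is Bernoulli(0.5); the outcome is Bernoulli(p1) if exposed, else Bernoulli(p0).
    """
    rng = random.Random(seed)
    prs, ors = [], []
    for _ in range(reps):
        a = b = c = d = 0  # a, b: exposed with/without outcome; c, d: unexposed with/without
        for _ in range(n):
            exposed = rng.random() < 0.5
            outcome = rng.random() < (p1 if exposed else p0)
            if exposed and outcome:
                a += 1
            elif exposed:
                b += 1
            elif outcome:
                c += 1
            else:
                d += 1
        prs.append((a / (a + b)) / (c / (c + d)))
        ors.append((a * d) / (b * c))
    return sum(prs) / reps, sum(ors) / reps


p0, p1 = 0.40, 0.60
true_pr = p1 / p0
true_or = (p1 / (1 - p1)) / (p0 / (1 - p0))
mean_pr, mean_or = simulate_crude_estimates(p0, p1, n=10_000, reps=20)
print(f"true PR {true_pr:.2f}, mean estimate {mean_pr:.3f}")
print(f"true OR {true_or:.2f}, mean estimate {mean_or:.3f}")
```

Both estimators land close to their own true values; the point the simulation makes is interpretive, since an OR of about 2.25 read as a prevalence ratio would overstate the true PR of 1.5 by half.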