The Devil Is in the Tail: Manchester Short Assessment of Quality of Life (MANSA) test percentiles and normalized T-scores

Journal: Journal of Psychopathology and Behavioral Assessment, Volume: 48, Issue: 1
DOI: https://doi.org/10.1007/s10862-025-10266-0
PMID: https://pubmed.ncbi.nlm.nih.gov/41613632
Publication Date: 2026-01-28
Author(s): Edwin de Beurs et al.
Primary Topic: Health Systems, Economic Evaluations, Quality of Life

Overview

This paper examines the value of common metrics, specifically T-scores and percentile rank (PR) scores, in measurement-based care and routine outcome monitoring, where they make self-report questionnaire results easier to interpret during treatment. The study analyzed data from the Manchester Short Assessment of Quality of Life (MANSA) in two normative samples: one from the Dutch general population (N = 11,789) and one clinical sample of patients with substance use disorder (N = 9,987). T-scores derived from a simple linear conversion were biased at the lower end of the scale, whereas normalized T-scores obtained through Rankit normalization or Item Response Theory (IRT) methods represented respondents' positions more accurately. The study also established cut-off values for the reliable change index (RCI) and for clinically significant change, for both raw scores and T-scores.
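
To make the idea of a reliable change cut-off concrete, the following is a minimal sketch that assumes the widely used Jacobson and Truax formulation of the RCI; the summary does not state which formulation the authors applied. The function names, the example scores, and the reliability value of 0.85 are all hypothetical and are not taken from the study.

```python
import math


def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax style RCI: observed change divided by the
    standard error of a difference score (assumed formulation)."""
    sem = sd * math.sqrt(1.0 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2.0) * sem             # standard error of the difference
    return (post - pre) / s_diff


def minimal_reliable_change(sd, reliability, z_crit=1.96):
    """Smallest change that exceeds measurement error at z_crit."""
    return z_crit * sd * math.sqrt(2.0 * (1.0 - reliability))


# T-scores have SD = 10 in the reference group by definition.
# The reliability of 0.85 and both scores are made-up illustration values.
rci = reliable_change_index(pre=38.0, post=47.0, sd=10.0, reliability=0.85)
threshold = minimal_reliable_change(sd=10.0, reliability=0.85)
print(f"RCI = {rci:.2f}; minimal reliable change = {threshold:.2f} T-score points")
```

Under this formulation a change counts as reliable when the absolute RCI exceeds the chosen critical value (1.96 here), so the cut-off in score points depends only on the scale's SD and its reliability.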

The findings emphasize that normalized T-scores and PR scores can help patients understand their test results, bridging the gap between clinical expertise and patient experience. This supports tailoring therapeutic interventions and judging when treatment goals have been met. The study recommends non-adjusted T-scores and PR scores over age-adjusted ones, because age-adjusted scores average around 50 in both age groups. It also highlights the need for standardized assessments in therapy, which remain underused, and calls for the adoption of standard measurement scales to improve the reliability of clinical judgments and strengthen evidence-based care.

Introduction

The introduction outlines the significance of common metrics in health care and preventive medicine, emphasizing their role in improving health outcomes and quality of life (Kaplan & Hays, 2022). It stresses that standardized scores, such as IQ scores and T-scores, are needed to interpret test results consistently across contexts, and notes that such scores are already well established in educational research. Clinical psychology, however, has not fully adopted these common metrics, which hinders the implementation of measurement-based care (Fortney et al., 2017). The introduction also acknowledges recent interest in common metrics, including percentiles and T-scores, particularly in initiatives such as the Patient-Reported Outcomes Measurement Information System (PROMIS) (Rothrock et al., 2020).

The study uses data from the Manchester Short Assessment of Quality of Life (MANSA) (Priebe et al., 1999) in two samples: the Dutch general population and patients with substance use disorders (SUDs). The MANSA is widely used in psychiatry to assess well-being, reflecting a view of recovery that is measured by improvement in quality of life rather than by symptom reduction alone (Kilbourne et al., 2018). The authors aim to present the psychometric properties of the MANSA, its common metrics and norms, and cut-off values for reliable change and recovery, and to discuss how to select appropriate reference groups for these metrics.

Results

The results section first examines the unidimensionality of the Manchester Short Assessment of Quality of Life (MANSA) in the context of Item Response Theory (IRT) modeling. A parallel analysis indicated a primary factor with an eigenvalue of 4.49, supporting the unidimensionality hypothesis. A single-factor model fit the data acceptably, with $\chi^2(54) = 3325.20$, $p < .001$, CFI = 0.98, and TLI = 0.97, although the RMSEA was somewhat elevated at 0.07. The analysis also considered a two-factor model, as suggested by Petkari et al. (2020), in which items 11-13 form a separate factor.

Bland-Altman plots revealed discrepancies between the different methods of obtaining T-scores. T Linear scores were consistently lower than T IRT scores, particularly in the lower score range, with 4.88% of cases differing by more than 5 T-score points. T Rankit scores corresponded better with T IRT scores, with only 1.5% of cases exceeding the 5-point difference threshold.

MANSA scores also varied by age, with quality of life generally increasing after age 45. Normative data were established for both the general and the clinical population; in the general population, a raw score of 63 corresponds to a T-score of approximately 50.
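
The agreement statistics behind a Bland-Altman comparison of two T-score methods, as described above, can be sketched as follows. This is an illustration only: the function name, the dictionary keys, and the six score pairs are hypothetical, and the 1.96 multiplier for the limits of agreement is the conventional choice rather than something the summary specifies.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b, threshold=5.0):
    """Summarize agreement between two sets of paired scores."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)                 # mean difference (systematic offset)
    spread = stdev(diffs)              # sample SD of the differences
    share = sum(abs(d) > threshold for d in diffs) / len(diffs)
    return {
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * spread, bias + 1.96 * spread),
        "share_over_threshold": share,  # fraction of pairs differing by > threshold
    }


# Made-up paired T-scores for six respondents (not study data).
t_linear = [22.0, 31.0, 40.0, 48.0, 55.0, 61.0]
t_irt = [29.0, 35.0, 42.0, 49.0, 55.0, 60.0]

result = bland_altman(t_linear, t_irt)
print(result)
```

In this toy data the differences grow toward the low end of the scale, which is the kind of pattern a Bland-Altman plot makes visible when differences are plotted against the mean of the two methods.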

Discussion

In the discussion section, the paper contrasts percentiles and T-scores as ways of interpreting test results, highlighting the advantages and disadvantages of each. Percentiles give a straightforward reading of a respondent's score relative to a reference group, but their scale points are not equidistant, which can distort the perceived size of differences at the extremes of the distribution. T-scores, which are derived from Z-scores, offer a consistent interval scale that mitigates this problem and allows more accurate comparisons across the entire score range. The choice of reference group (general population versus clinical sample) also strongly affects how scores are interpreted, with clinical norms often providing a more comprehensive view of patient severity and treatment outcomes.
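
The non-equidistance of percentiles can be illustrated with a short sketch that converts between percentile ranks and T-scores under a normal distribution (mean 50, SD 10). The function names are hypothetical, and the code assumes normalized T-scores; it is not the authors' conversion table.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, SD 1


def percentile_to_t(pr):
    """Map a percentile rank (strictly between 0 and 100) to a T-score."""
    if not 0.0 < pr < 100.0:
        raise ValueError("percentile rank must lie strictly between 0 and 100")
    return 50.0 + 10.0 * STD_NORMAL.inv_cdf(pr / 100.0)


def t_to_percentile(t):
    """Map a T-score to the percentage of the reference group scoring below it."""
    return 100.0 * STD_NORMAL.cdf((t - 50.0) / 10.0)


# The same 10-point percentile gap covers very different T-score distances
# in the middle of the distribution and in the tail.
for low, high in [(50, 60), (89, 99)]:
    gap = percentile_to_t(high) - percentile_to_t(low)
    print(f"PR {low} -> {high}: {gap:.1f} T-score points")

print(f"Share below T = 40: {t_to_percentile(40.0):.1f}%")
```

The loop shows why equal percentile steps should not be read as equal differences in the underlying trait: a step near the median spans far fewer T-score points than the same step in the tail.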

The paper further discusses the implications of non-normal raw score distributions in clinical measures, noting that skewed distributions can yield inaccurate T-scores when a simple linear transformation is used. It proposes alternative normalization methods, namely Rankit normalization and Item Response Theory (IRT), to obtain more accurate T-scores. The authors aim to investigate the bias introduced by linear transformations and to compare Rankit normalization with IRT-based transformations, while also establishing cut-off values for reliable change and clinical significance. The findings underscore the importance of selecting appropriate scoring methods and reference groups to enhance the validity and reliability of psychological assessments.
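
The contrast between a linear conversion and Rankit normalization can be sketched as below. This is a simplified illustration, not the authors' procedure: the function names and the twelve raw scores are made up, the linear version assumes the population SD of the sample, and ties receive their average rank. The Rankit step uses the standard proportion (rank - 0.5) / n.

```python
from statistics import NormalDist, mean, pstdev


def linear_t(scores):
    """Linear T-scores: rescale raw scores to mean 50, SD 10."""
    m, s = mean(scores), pstdev(scores)
    return [50.0 + 10.0 * (x - m) / s for x in scores]


def rankit_t(scores):
    """Normalized T-scores via Rankit: rank -> proportion -> normal quantile."""
    n = len(scores)
    first_rank, count = {}, {}
    for rank, x in enumerate(sorted(scores), start=1):
        first_rank.setdefault(x, rank)       # lowest 1-based rank of each value
        count[x] = count.get(x, 0) + 1       # number of tied observations
    inv_cdf = NormalDist().inv_cdf
    result = []
    for x in scores:
        avg_rank = first_rank[x] + (count[x] - 1) / 2.0   # average rank for ties
        result.append(50.0 + 10.0 * inv_cdf((avg_rank - 0.5) / n))
    return result


# Made-up, left-skewed raw scores (not study data).
raw = [20, 35, 48, 55, 60, 63, 65, 67, 68, 70, 72, 75]
for x, lin, rk in zip(raw, linear_t(raw), rankit_t(raw)):
    print(f"raw {x:2d}: linear T = {lin:5.1f}, Rankit T = {rk:5.1f}")
```

With a long lower tail like this one, the linear conversion assigns the lowest raw scores more extreme T-scores than their rank position in the sample would justify, while the Rankit version keeps each T-score tied to the respondent's percentile position.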

Limitations

The study has notable strengths, particularly the use of large samples to fit an Item Response Theory (IRT) model and to establish norms for two distinct age groups (18-44 and 45+). Its limitations include an overrepresentation of women in the general population sample and an underrepresentation of women in the clinical sample of patients with substance use disorder. Despite this imbalance, no significant gender differences in MANSA scores were observed in either sample, suggesting that gender adjustments to T-scores and PR scores may not be necessary.

The study also acknowledges that it did not address other variables that could influence test scores, such as socioeconomic status and educational attainment. Because the patient data were anonymized, the researchers could not explore these factors in the clinical sample. Transforming raw scale scores into T-scores facilitates interpretation: scores below 50 indicate below-average satisfaction with quality of life, and approximately 16% of the reference population scores below 40 and approximately 16% scores above 60.