A practical guide to evaluating sensitivity of literature search strings for systematic reviews using relative recall

Journal: Research Synthesis Methods, Volume: 16, Issue: 1
DOI: https://doi.org/10.1017/rsm.2024.6
PMID: https://pubmed.ncbi.nlm.nih.gov/41626904
Publication Date: 2025-01-01
Author(s): Malgorzata Lagisz et al.
Primary Topic: Meta-analysis and systematic reviews

Overview

This paper addresses the critical role of systematic literature searches in systematic reviews, emphasizing that insensitive search strings can miss relevant studies and thereby bias the evidence synthesis. The authors highlight a concerning pattern: the sensitivity of search strings is rarely evaluated or reported in the existing literature, possibly because sensitivity evaluation methods are complex and hard to access. To address this gap, the authors clarify key concepts in search string evaluation and propose a straightforward procedure for estimating the relative recall of a search string, that is, the proportion of a benchmark set of relevant publications that the string retrieves, which serves as a measure of its sensitivity.

The proposed benchmarking approach lets researchers assess the overlap between the records retrieved by their search strings and a predefined set of relevant studies, so that low recall signals the need to refine the search strategy. The authors provide practical guidance through five tutorials for commonly used online literature sources, aiming to improve evaluation and reporting practices in systematic reviews. Overall, the work argues for more transparent and robust evidence synthesis through better evaluation of search strings.
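The overlap calculation described above can be sketched in a few lines. This is a minimal illustration, not the authors' tutorial code: it assumes that both the benchmark set and the search results have been exported as lists of record identifiers, and the function name and identifiers below are hypothetical.

```python
def relative_recall(retrieved_ids, benchmark_ids):
    """Return (recall, missed) for a search result against a benchmark set.

    recall is the fraction of benchmark records that the search retrieved;
    missed lists the benchmark records that the search did not retrieve.
    """
    benchmark = set(benchmark_ids)
    if not benchmark:
        raise ValueError("benchmark set must not be empty")
    found = benchmark & set(retrieved_ids)
    missed = benchmark - found
    return len(found) / len(benchmark), sorted(missed)


# Hypothetical identifiers standing in for DOIs or database accession numbers.
benchmark = ["doi:A", "doi:B", "doi:C", "doi:D", "doi:E"]
retrieved = ["doi:A", "doi:C", "doi:D", "doi:X", "doi:Y", "doi:Z"]

recall, missed = relative_recall(retrieved, benchmark)
print(f"relative recall = {recall:.0%}; missed: {missed}")
# prints: relative recall = 60%; missed: ['doi:B', 'doi:E']
```

Listing the missed benchmark records alongside the recall value is a deliberate choice here: inspecting their titles and keywords is usually what suggests which terms to add when the search string is refined.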

Introduction

The introduction outlines the significance of the study within its field and establishes the context and rationale for the investigation. It highlights existing gaps in the literature and states the primary research questions that the study aims to address. The authors emphasize the relevance of their findings to both theoretical frameworks and practical applications, setting the stage for the subsequent sections of the paper.

Furthermore, the introduction may briefly summarize the methodology employed in the study, providing insight into how the research was conducted. This section serves to engage the reader and underscore the contribution of the work to advancing knowledge in the area of interest. Overall, the introduction effectively lays the groundwork for the detailed analysis and discussions that follow.

Methods

In this section, the authors describe two surveys of how search strategy development and evaluation are reported, one covering systematic reviews across various disciplines and one covering Cochrane Review protocols. The first survey used a representative sample of 100 systematic reviews published in 2022 and assessed how often, and by what methods, search string evaluation procedures were reported; such reporting is often considered inadequate. The second survey used 100 Cochrane Review protocols from the same year, because protocols typically contain detailed information on the development and evaluation of search strategies, whereas final reports may lack such specifics.

The authors expected the Cochrane protocols to offer insight into the rationale behind search strategy choices, given the organization's reputation for rigor, whereas a preliminary sample of published Cochrane reviews had shown little information on how search strategies were developed. Detailed descriptions of the sampling methods, extracted variables, and data validation procedures are provided in Supplementary File 2, supporting the transparency and reproducibility of the approach.

Results

The results indicate substantial gaps in the reporting of search strategy development and evaluation in both systematic reviews and Cochrane protocols. A majority of these documents do not describe how search strings were formulated or whether they were assessed for sensitivity, which points to a widespread problem in the literature. Notably, Cochrane review protocols are more likely than cross-disciplinary systematic reviews to involve information specialists (Fisher's exact test, $p < 0.001$, odds ratio = 18.25, 95% CI = [8.65, 40.65]) and to present a final search string for at least one literature source ($p < 0.001$, odds ratio = 7.22, 95% CI = [8.65, 31.95]). Both general systematic reviews and Cochrane protocols typically use multiple sources for literature searches, with a median of five sources. The most frequently used databases differ between the two types of reviews, although Ovid Embase, PubMed, Web of Science, and Scopus appear in at least 40% of cross-disciplinary systematic reviews. The limited reporting of search sensitivity evaluations may stem from inadequate documentation of the search development process, which is not mandated by Cochrane's Methodological Expectations of Cochrane Intervention Reviews (MECIR) and has only recently been addressed in the PRISMA 2020 reporting guidelines.
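A reported point estimate should lie inside its own confidence interval, which gives a quick way to sanity-check transcribed statistics such as the two odds ratios above. The sketch below is an illustrative check and not part of the authors' analysis; the helper name and the dictionary labels are hypothetical. As transcribed here, the lower bound of the second interval (8.65) exceeds its odds ratio (7.22) and repeats the lower bound of the first interval, so that value is worth verifying against the original paper.

```python
def estimate_within_interval(estimate, lower, upper):
    """Return True if a point estimate lies inside its reported interval."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return lower <= estimate <= upper


# (odds ratio, CI lower bound, CI upper bound) exactly as given in the text.
reported = {
    "information specialist involved": (18.25, 8.65, 40.65),
    "final search string presented": (7.22, 8.65, 31.95),
}

checks = {
    label: estimate_within_interval(*values)
    for label, values in reported.items()
}
for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'estimate outside interval'}")
# prints:
# information specialist involved: consistent
# final search string presented: estimate outside interval
```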

Discussion

The discussion emphasizes the critical role of systematic searching in systematic reviews, particularly the development and evaluation of the search strings used to retrieve relevant academic literature. No single database covers all published research, so multiple databases are needed for comprehensiveness. The effectiveness of the search nonetheless depends on well-formulated search strings, which are logical (Boolean) expressions that filter bibliographic records. Inadequate search strings can yield non-representative samples of studies, exacerbating biases such as publication bias. The paper identifies a gap in guidance on testing search strings and proposes a practical approach to evaluating them during development.
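To make the idea of a search string as a logical filter concrete, the following toy sketch applies a search of the form (term OR term) AND (term OR term) to a handful of titles. It is an assumption-laden simplification: real databases add field tags, truncation symbols, and proximity operators that are not modelled here, and the function name, search terms, and records are all hypothetical.

```python
def matches(record_text, concept_blocks):
    """Return True if the text contains a term from every concept block.

    Each block is a list of synonyms joined by OR; the blocks themselves
    are joined by AND. Matching is case-insensitive substring matching,
    so "bird" also matches "songbirds" (similar to a truncated term).
    """
    text = record_text.lower()
    return all(
        any(term.lower() in text for term in block)
        for block in concept_blocks
    )


# Hypothetical search: (bird OR avian) AND (migration OR dispersal)
search = [["bird", "avian"], ["migration", "dispersal"]]

# Hypothetical bibliographic records (identifier -> title).
records = {
    "rec1": "Avian dispersal across fragmented landscapes",
    "rec2": "Bird song learning in captivity",
    "rec3": "Seasonal migration of songbirds",
}

hits = [rec_id for rec_id, title in records.items() if matches(title, search)]
print(hits)
# prints: ['rec1', 'rec3']
```

The second record is excluded because it satisfies only one of the two concept blocks, which illustrates how an overly narrow block can silently drop relevant records.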

The authors outline the challenges of implementing objective evaluations of search string sensitivity, including discrepancies between how sensitivity is defined in theory and how it can be calculated in practice, and inconsistent terminology across disciplines. They present two main types of evaluation: conceptual evaluation, which relies on expert peer review, and objective evaluation, which quantitatively assesses the performance of search strings. The paper also reports findings from surveys of systematic reviews, which reveal a concerning lack of transparency in the reporting of search string development and evaluation. To address these issues, the authors provide methodological recommendations for conducting and reporting search sensitivity evaluations, emphasizing the importance of involving information specialists and of documenting the search strategy development process thoroughly. These recommendations aim to make systematic reviews more robust and transparent, ultimately improving the validity of their conclusions.

Limitations

The limitations section outlines five key constraints of the proposed benchmarking workflow for search sensitivity evaluations. First, the workflow relies on relative recall rather than absolute recall, and there is currently no objective way to determine how representative the benchmark papers are or how accurate the relative recall estimates are. This limitation could be circumvented only if all relevant evidence were known, which is rarely the case. Second, assembling benchmark studies requires additional effort, including expert review and checks of database indexing, although this work can be integrated into the initial scoping of a systematic review.

Third, the precision of search string sensitivity estimates decreases as the benchmarking set gets smaller, and the number of benchmark studies used in existing publications varies widely (from 15 to 1,347). This raises concerns about potential bias if the benchmark studies do not adequately represent the broader evidence base. Fourth, the evaluated databases must support complex search strings; workarounds exist for databases that do not, but they are often more time-consuming and less accessible. Finally, there is no universally accepted sensitivity threshold for refining search strategies, with guidelines ranging from full comprehensiveness to a more pragmatic aim of capturing the majority of relevant evidence. Deciding when to stop refining remains a subjective judgment for each review team.
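The effect of benchmark set size on precision can be illustrated with a confidence interval for a proportion. The sketch below uses the Wilson score interval, which is a standard choice for binomial proportions but is an assumption of this example rather than a method taken from the paper. The set sizes 15 and 1,347 come from the range quoted above; the counts of retrieved studies (chosen to give roughly 80% relative recall) are hypothetical.

```python
import math


def wilson_interval(found, total, z=1.96):
    """Wilson score interval for a proportion found/total.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if total <= 0 or not 0 <= found <= total:
        raise ValueError("need 0 <= found <= total and total > 0")
    p = found / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    )
    return centre - half_width, centre + half_width


# (benchmark studies retrieved, benchmark set size); retrieved counts are
# hypothetical and correspond to about 80% relative recall in both cases.
scenarios = {"small benchmark": (12, 15), "large benchmark": (1078, 1347)}

intervals = {name: wilson_interval(*counts) for name, counts in scenarios.items()}
for name, (low, high) in intervals.items():
    found, total = scenarios[name]
    print(f"{name}: recall {found / total:.2f}, "
          f"95% interval {low:.2f} to {high:.2f}")
```

With the same observed recall, the interval for the 15-study benchmark is several times wider than the one for the 1,347-study benchmark, which is the sense in which small benchmark sets give imprecise sensitivity estimates. The interval addresses sampling precision only and says nothing about whether the benchmark set is representative.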