Identifying interdisciplinary emergence in the science of science: combination of network analysis and BERTopic


Journal: Humanities and Social Sciences Communications, Volume: 11, Issue: 1
DOI: https://doi.org/10.1057/s41599-024-03044-y
Published: 2024-05-10


Keungoui Kim, Dieter F. Kogler & Sira Maliphol

Abstract

Global scientific output is expanding at an accelerating pace, which calls for a better understanding of the science of science, and especially of how the boundaries of scientific fields expand through processes of emergence. The present study proposes applying embedded topic modeling techniques to identify newly emerging science through knowledge recombination activities, as evidenced by the analysis of research publication data. First, a dataset is constructed from metadata derived from the Web of Science Core Collection database. The dataset is then used to generate a global map representing a categorical scientific co-occurrence network. A research field is defined as interdisciplinary when multiple science categories are listed in its description. Second, the co-occurrence networks are compared between periods to determine changing patterns of influence in light of interdisciplinarity. Third, embedded topic modeling enables the unsupervised association of interdisciplinary classifications. We present the results of the analysis to demonstrate the emergence of global interdisciplinary sciences, and we further perform a qualitative validation of the results to identify the sources of the emerging fields. Based on these results, we discuss potential applications for identifying emergence through the merging of global interdisciplinary domains.

Introduction

Science-driven research productivity and the associated innovation processes have become more complex for a number of reasons (Bloom et al. 2020; Boyack et al. 2017; Chen 2006; Chu and Evans 2021; Jones 2009; Kozlov 2023). Globally, more than 2.6 million scientific articles were published in 2018 alone (White 2019). As scientific output has increased over time, the sources of emerging topics have also diversified as topics and fields recombine. Emerging topics that cross science fields are expected to be less path-dependent than earlier patterns of scientific knowledge production. In line with this, Fortunato et al. (2018) stress the need to understand the science of science, especially as disciplinary boundaries break down.

Contemporary science is a dynamical system of undertakings driven by complex interactions among social structures, knowledge representations, and the natural world. Scientific knowledge is constituted by concepts and relations embodied in research papers, books, patents, software, and other scholarly artifacts, organized into scientific disciplines and broader fields. These social, conceptual, and material elements are connected through formal and informal flows of information, ideas, research practices, tools, and samples. Science can thus be described as a complex, self-organizing, and constantly evolving multiscale network. (Fortunato et al. 2018, p. 1)

While research output has risen, the productivity of science, or the value derived from that output, has declined across fields (Bloom et al. 2020). The rate of innovation has slowed because the level of specialization (Jones 2009) and the size of the teams (Kozlov 2023) needed to conduct science have increased. Tied to specialization and team size, research and development costs have risen sharply, reducing the rate of scientific productivity (Bloom et al. 2020). Another reason is how emergence is measured. For example, as the volume of scientific output grows, the ability to assess emerging research topics diminishes because canonical literature is more likely to be cited (Chu and Evans 2021). "Could we be missing fertile new paradigms because we are locked into overworked areas of study?" (Chu and Evans 2021, p. 5). Moreover, could we be misidentifying the source of emergent value from science?

This has important implications given the importance of science forecasting for understanding and developing effective science, technology, and innovation (STI) policy initiatives that aim to support science and anticipate innovation trajectories (Börner et al. 2018). In essence, innovative outcomes are frequently the result of converging technologies that rely heavily on interdisciplinary scientific inputs (Kogler et al. 2022). Thus, and perhaps not surprisingly, contemporary attempts to address global grand challenges are geared towards interdisciplinary research, because the deep integration of disciplines that combine different kinds of scientific and technological paradigms in genomics/biotechnology, nanotechnology, and information technology (e.g., blockchain, sensors, artificial intelligence, and big data) is believed to offer the most promising avenues to pursue (Petersen et al. 2021). Recent examples, such as the COVID-19 mRNA vaccine, confirm this notion: they are usually the result of several decades of scientific research that may become highly effective only once advances in different scientific fields are combined into a single technological solution or viable innovation. Past convergence comes from emerging interdisciplinary fields, such as biotechnology, which stimulate further innovations in other sectors (Feldman et al. 2015). Changes at shifting interdisciplinary boundaries may therefore provide additional insights into potential future convergence activities.

New discoveries, especially those with interdisciplinary roots, are usually difficult to attribute to existing classification systems (Fagerberg et al. 2012), but equally they define the frontier of the innovation process as they combine existing forms of knowledge into something entirely new (Eisenhardt and Martin 2000; Lee et al. 2015; Schumpeter 1934; 1942). Interdisciplinary science fields can therefore be used to define the emergence of new topics (Chakraborty 2018; Khan and Wood 2015; Lee et al. 2015). Using bibliometric network analysis on publication data, the present study proposes an approach capable of identifying where interdisciplinary science fields originate, based on a global science map that also indicates changes in the growth of influence.

Specifically, the investigation uses topic modeling to classify scientific research topics from a large volume of data with unsupervised algorithms. The proposed embedded topic modeling approach then enables the identification of emerging science topics in line with Schumpeterian notions of knowledge recombination, where it is possible to observe how a combination of disciplines or science categories unfolds over time. Unlike technological convergence, which has been studied more systematically (Lee et al. 2019), few studies, to the best of our knowledge, have directed similar research efforts towards interdisciplinary knowledge recombination processes and how these may affect the overall evolution of the entire scientific knowledge landscape and subsequent innovation outcomes. Furthermore, applying topic modeling in natural language processing (NLP) settings to studies of emerging interdisciplinary science has the potential to deliver important insights. The new approach, which combines embedded topic modeling with co-occurrence network analysis across global science maps, can help identify emerging science topics before they coalesce into fields and anticipate those that hold potential value for knowledge recombination leading to global convergence.

The overarching objective is to analyze the complexity, self-organization, and evolution of scientific knowledge production while sifting through a large volume of scientific publications, and to understand how scientific innovations might be anticipated when they emerge from converging research fields. The main aim of this study is to introduce a new approach to bibliometric analysis tools by combining network analysis and embedded topic modeling techniques to identify emerging scientific topics in interdisciplinary research.

Furthermore, a new measure of emerging topics is developed and applied, using a network centrality index. In addition, we leverage an embedded topic modeling technique, specifically BERTopic, which builds on BERT (Bidirectional Encoder Representations from Transformers), to gain insights into globally emerging and cross-field profiles within interdisciplinary science fields. Through this comprehensive approach, we aim to shed light on the evolution of the science of science by investigating the shifting frontiers of interdisciplinary research.

In the following sections, we provide an overview of the relevant literature in this line of inquiry, present the methodology followed by the general and detailed empirical results, and finally offer a detailed discussion and some concluding thoughts.

Literature review

Science maps have been developed to understand patterns concerning the science of science, which include identifying topics of interest (Zahedi and van Eck 2018), determining the growth rates of science (Bornmann and Mutz 2015), identifying topic emergence (Jung and Segev 2022a), and detecting patterns and trends in the scientific literature (Kim and Chen 2015), especially through new combinations of interdisciplinary science and technology fields (Blei and Lafferty 2007; Eum and Maliphol 2023; Khan and Wood 2015; Lee et al. 2015). Science maps are network representations of the scientific literature whose research methods have evolved over time (Chen 2006). Under these earlier methods, the focus is on finding radically new innovations within a specialized field of science.

The literature on emergence began with citation analysis and currently combines methods that identify network patterns with topic modeling techniques (Rotolo et al. 2015). Network analysis is commonly used to map trends and patterns in the scientific literature, for example as linked through citations, including the emergence of groundbreaking discoveries that change the course of a scientific discipline (Chen 2006). Science mapping that links research literature through citations can be used to show different stages of scientific development over time, allowing transformative contributions to be identified through predictive analysis (Chen 2017). Models have been designed to capture different aspects of the science of science. Overlay science maps represent clusters or networks of publications from global base maps, distinguishing different levels of research field classification (Sjögårde 2022).

Emerging technologies from science can be defined by attributes measured through bibliometric indicators and text analysis (Rotolo et al. 2015). By combining full-text analysis and bibliometric indicators, Glenisson et al. (2005) conducted a study demonstrating the usefulness of data mining and bibliometric techniques for mapping fields of science. Patterns of scientific emergence have been modeled through clustering (Glänzel and Thijs 2012; Yau et al. 2014), national output (Suominen and Toivanen 2016), and the use of networks to show emergence (Khan and Wood 2015).

Emerging topics are expected to grow rapidly from uncertain and ambiguous research areas and to converge to produce new impact (Rotolo et al. 2015). Previous studies of emergence focus on local maps or predefined fields of study, e.g., Curran and Leker (2011) on the dietary supplement industry; Rey-Martí et al. (2016) on social entrepreneurship; and Song et al. (2017) on personalized medicine. Existing studies that demonstrate emergence through bibliometric analyses have used frequency-based topic modeling techniques that identified scientific topics (Griffith et al. 2004), topic coherence (Newman et al. 2011), topic "bursts" (Mane and Börner 2004), and patterns of scientific breakthroughs (Winnink et al. 2019). Emergence is often determined by a measure of diversity within the local map, e.g., Rao-Stirling diversity and relative variety (Leydesdorff and Rafols 2011; Leydesdorff et al. 2019; Rafols and Meyer 2010).

Studies of emerging science are limited in scope because they restrict the fields of study to specific journals, articles, or authors. Once the science map is created, topic modeling is analyzed based on the network values derived from the map. The terms with the highest frequency in the text are identified as emerging topic clusters. These studies therefore examine the science of science produced within a scientific topic, category, or set of journals based on frequency and diversity measures within a local map. These methods define interdisciplinary distance through relative measures within a field of science. By relying on frequency, earlier methods are more susceptible to canonical bias and may disregard context. The influence or importance of an interdisciplinary science pair in a science map thus offers an alternative approach to identifying emergence.

Novelty is also essential for defining emergence (Rotolo et al. 2015). Novelty can be identified through the merging of "research streams" or science fields that were previously separate (Day and Schoemaker 2000; Shin et al. 2022; Small et al. 2014). Thus, another measure of emergent organization is the rapidly growing overlap between multiple fields or technologies (Bornmann 2013; Bornmann and Marx 2014; Lee et al. 2021; Leydesdorff et al. 2013). Over time, research has become more interdisciplinary (Chakraborty 2018). Research fields pass through three stages: growth, maturity, and interdisciplinarity (Chakraborty 2018).

However, the classification and differentiation of disciplines remains unsettled and still needs to be operationalized (Sugimoto and Weingart 2015). One way to define disciplines is to use data-driven publication indicators such as Web of Science (WoS) categories (Sugimoto and Weingart 2015). Interdisciplinarity can then be modeled using keywords, authors' fields of study, and citations that cross multiple fields (Chakraborty 2018; Xu et al. 2018, 2019). Topic prediction using network analysis has been used to find emerging patterns across fields that are predefined and linked through co-occurrence frequency (Jung and Segev 2022b).

A measure of interdisciplinarity must balance diversity and similarity (Leydesdorff 2018). Compared with global data, restricting topic detection to a single discipline ignores the increasingly interdisciplinary way in which science is conducted (Boyack 2017). Using global maps yields more accurate partitions and higher textual coherence of topics because the full context is preserved (Klavans and Boyack 2011). Moreover, longer distances between interdisciplinary topics tend to be associated with greater scientific impact (Larivière et al. 2015). When scientific research incorporates new technological ideas, convergent science tends to have greater impact (Kwon et al. 2019). Furthermore, research in the humanities and social sciences tends to have lower citation density, which leads to lower measures of interdisciplinarity (Larivière et al. 2015).

While many investigations use interdisciplinary measures of emergence, previous studies are often limited to analyzing local science maps that focus on a narrow field of science using relative measures of emergence. In addition, the formation of interdisciplinary research has been modeled in the related literature mainly through the evolution of keyword co-occurrence (Xu et al. 2018). Consequently, one significant limitation of existing studies on identifying thematic structures and dynamic patterns is that researchers have built science maps around predefined topics (Gläser et al. 2017). By restricting the topic scope, these methods have resorted to frequency-based measures of relative novelty and speed to define emergence. However, frequency-based keyword evolution can constrain our understanding of interdisciplinarity, ignore context, and reinforce canonical bias. By contrast, global science maps can provide unbiased results if the volume of documents is sufficiently large (Rafols et al. 2010). While some studies distinguish among multidisciplinarity, interdisciplinarity, and transdisciplinarity (Chakraborty 2018; Leydesdorff et al. 2018), operationalization of these distinctions remains limited. This study therefore distinguishes the concepts of growing and dominant science, focusing on determining the importance of interdisciplinarity across networks of science, technology, engineering, and mathematics fields.

Methodology

The present study combines network analysis and BERTopic and applies them to understand cross-field topic areas. BERTopic is an embedded topic modeling technique that uses embedding vectors and c-TF-IDF to create dense clusters from which interpretable topics can be derived from text data. Traditional text analysis is a labor-intensive activity that limits sample sizes to the pace at which human researchers can read; even ambitious studies are limited to a few hundred texts. For this reason, topic modeling techniques relying on frequency-based approaches (e.g., Latent Semantic Analysis, Latent Dirichlet Allocation, Dynamic Topic Model) have been introduced to derive unseen topics from a very large number of texts. However, frequency-based methods strip away context by relying solely on term frequencies. Newer embedding-based methods such as BERTopic allow us to take the contextual knowledge of large text datasets into account. Web of Science (WoS) data, with more than 63 million publication records in 12,500 high-quality journals, is a common target for bibliometric analysis.

Fig. 1 General research process. The general research process proceeds in two phases: (i) defining a network of documents based on science subject pairs and (ii) identifying topics from the network data.

The data and methods used in the empirical analysis are presented following the general research process, which is described in two phases (Fig. 1) and covers data collection and preprocessing, network analysis of the interdisciplinary science dataset, and topic modeling of the newly generated dataset. The first phase collects and prepares the journal publication data for network analysis and topic modeling. In Phase 1, a science category-subject network analysis is conducted to build an interdisciplinary science network. In this network, the science category-subjects with greater network centrality are identified, i.e., those with greater potential value in terms of knowledge recombination. Here, the dataset is split into two consecutive periods to create two interdisciplinary science networks. By comparing the network values of the two periods, the science category-subjects that are likely to grow (growing science fields) and those that are likely to have greater frequency (dominant science fields) in the next period are selected to filter the final text dataset for topic modeling. Through this step, more accurate and clearer publication data can be extracted by keeping only the publications that include such science category-subjects, which restricts the data to 'emerging science fields'. Using the filtered list of publications, topic modeling is conducted in the next subsection (Fig. 1, Phase 2) to explore the emerging topics in each interdisciplinary science field. This phase covers all the processes required to run the BERTopic model analysis. Through this process, the latent topics that represent each interdisciplinary science are derived. For qualitative validation, the publications representing the most emergent topics, identified through the unsupervised learning process, are analyzed to determine which topics are of interest for the given interdisciplinary categories.

Data collection. For the empirical analysis, metadata are collected from the Web of Science database. The database provides bibliometric information on scientific publications, including publication title, year, journal title, author, institution, institution address, general category, subject area, funding, citations, etc. The metadata must also include fields that allow discrimination by document type (e.g., article, editorial material, review, biographical item, letter, bibliography, correction, book review, meeting abstract, or proceedings paper) and publication type (journal, book in series, or book). These criteria allow our sample to be restricted to publications written for the same purpose, to maintain article quality, and to avoid duplication. The dataset used here is limited to journal articles by filtering on document and publication types.

The list of publications that meet the definition of interdisciplinary science is then selected and divided into three-year periods, which helps stabilize the dataset's classifications (Archambault et al. 2009). By definition, interdisciplinary science refers to cases in which scientific output draws on different research fields. In the WoS database, research fields are defined by science categories, subheadings, and subjects. The general global science category ('subheading' in WoS) refers to the highest classification of scientific fields, comprising Life Sciences & Biomedicine (LSB), Technology (TE), Physical Sciences (PS), Arts & Humanities, and Social Sciences. These categories are mutually exclusive. The subject area refers to a lower-level classification of science that is assigned to a corresponding subheading. Here, all classifications are provided by WoS, under which all journals and books indexed in WoS are classified accordingly. In this study, an interdisciplinary science field is defined as scientific output based on at least two subheadings, i.e., science categories.

In our WoS publication dataset, publications with technology- and science-based subheadings (LSB, TE, and PS) are used to keep the scientific fields consistent. A total of 7,453,987 publications (from 10,138 journals) with 226 subjects were first collected over the reference period from 2012 to 2017. From this dataset, global interdisciplinary science publications are filtered, giving us […] publications (from 1,137 journals) with 172 subjects. Our final sample is restricted to publications classified as journal articles (doc_type = 'Article' and pub_type = 'Journal') with no missing abstracts. Table 1 presents the basic descriptive statistics on the number of publications, subjects, and journals for each interdisciplinary science field included in our final sample. Among all interdisciplinary sciences, PS-TE has the largest number of publications, subjects, and journals, showing that it is the most active interdisciplinary science field. The increases in publications across all interdisciplinary science activities reflect the global trend of technology convergence, in which more diverse technologies and industrial fields are used together over time.
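The sample restriction described above can be illustrated with a minimal sketch, assuming each publication record is a plain dictionary. Only doc_type and pub_type are named in the text; the other field names (year, subheadings, abstract), the value spellings 'Article' and 'Journal', and all helper names are hypothetical. Counting only the three retained subheadings towards the "at least two subheadings" rule is also an assumption.

```python
# Subheadings retained in the study (LSB, PS, TE).
SCIENCE_SUBHEADINGS = {"LSB", "PS", "TE"}

# The two three-year periods of the 2012-2017 reference window.
PERIODS = {"2012-2014": range(2012, 2015), "2015-2017": range(2015, 2018)}


def interdisciplinary_field(subheadings):
    """Return a label such as 'LSB-TE' when a record spans at least two of
    the retained subheadings, otherwise None."""
    kept = sorted(SCIENCE_SUBHEADINGS & set(subheadings))
    return "-".join(kept) if len(kept) >= 2 else None


def period_of(year):
    """Map a publication year to its three-year period, or None if outside."""
    for label, years in PERIODS.items():
        if year in years:
            return label
    return None


def keep_record(record):
    """Journal articles with an abstract, inside the reference window,
    that span at least two retained subheadings."""
    return (
        record.get("doc_type") == "Article"
        and record.get("pub_type") == "Journal"
        and bool(record.get("abstract"))
        and period_of(record.get("year")) is not None
        and interdisciplinary_field(record.get("subheadings", [])) is not None
    )
```

Sorting the subheadings before joining them gives one stable label per combination, which happens to match the field names used in Table 1 (LSB-TE, LSB-PS, PS-TE, LSB-PS-TE).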

Science category-subject co-occurrence network analysis

Science category-subject pair set. Before conducting the science category-subject co-occurrence network analysis, a set of science category-subject co-occurrence pairs is created. In the interdisciplinary science dataset, each publication is assigned a list of science category-subjects associated with its subheadings. Each science category-subject represents a node in the network linked to publications. To conduct the co-occurrence network analysis, the combinations of category-subjects for each publication are transformed into a pair-form dataset for each interdisciplinary science field, which defines the edges between nodes. We denote each science category-subject by a capital letter for its category (A, B, or C) and a number that distinguishes the science category-subjects within that category. If publication X has three science category-subjects, A3, B6, and C9, it will have three rows of pair sets: A-B, B-C, and A-C. If publication Y has three science category-subjects, A1, A2, and B5, it will have two duplicate rows of interdisciplinary pair sets: A-B and A-B. Once the dataset is transformed, the science category-subject pairs are aggregated by counting the number of publications that include each pair. The aggregated science category-subject pair set thus shows the number of publications for each science category-subject pair in each interdisciplinary science in the respective period.

Table 1 Descriptive statistics of the interdisciplinary science exploration sets.

	2012-2014			2015-2017
	Publications	Subjects	Journals	Publications	Subjects	Journals
LSB-TE	68,768	80	162	79,112	81	175
LSB-PS	115,499	67	228	120,161	67	248
PS-TE	345,520	85	584	414,010	86	637
LSB-PS-TE	25,447	43	40	25,805	43	43

Fig. 2 Science category-subject co-occurrence network. The science category-subject co-occurrence network shows an example of a network of publication nodes, such as Publication 1, linked to the listed subjects, such as A1.
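The pair construction and counting described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes toy labels in which the first character is the category letter and the rest is the subject number, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cross_category_pairs(subjects):
    """Return the subject pairs of one publication that span two different
    categories. Pairs inside the same category (e.g. A1 and A2) are skipped."""
    pairs = []
    for first, second in combinations(sorted(set(subjects)), 2):
        if first[0] != second[0]:  # first character = category letter (toy convention)
            pairs.append((first, second))
    return pairs


def count_pairs(publications):
    """Count, for every cross-category subject pair, the number of
    publications that contain it. `publications` maps an id to its subjects."""
    counts = Counter()
    for subjects in publications.values():
        counts.update(cross_category_pairs(subjects))
    return counts


# Publication X yields three pairs (A-B, A-C, B-C); publication Y yields two A-B pairs.
example = {"X": ["A3", "B6", "C9"], "Y": ["A1", "A2", "B5"], "Z": ["A3", "B6"]}
pair_counts = count_pairs(example)
```

The resulting counts serve as edge weights of the undirected co-occurrence network: in the toy data, the pair (A3, B6) appears in publications X and Z and therefore has weight 2.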

Science category-subject co-occurrence network analysis. Using the subject pair sets, a subject co-occurrence network analysis is conducted for the interdisciplinary science fields in each period. A co-occurrence network is an effective means of analyzing the structural relationship between elements. A similar approach has been used with patent data to analyze technology convergence (Curran and Leker 2011; Kogler et al. 2017; Kim et al. 2018, 2019). In this regard, a co-occurrence network built on publication data can offer greater understanding of how science category-subjects are used and how they relate to each other across interdisciplinary science fields. In the subject co-occurrence network, science category-subjects are used as nodes and publications as edges. As for linkage rules, undirected and weighted networks are adopted. As shown in Fig. 2, science category-subjects are linked only if they are used in the same publication. For example, subjects A and C have a total of two edges because they are both used in Publications 1 and 2.

Once the global interdisciplinary co-occurrence network map is constructed, Eigenvector centrality (EIG) values are measured for all nodes (in this network, science category-subjects). In the science category co-occurrence network, a more important or influential science category can be regarded as a key science category in the interdisciplinary science field, and those with greater network value should be highlighted because they are the ones driving the co-occurrence of science categories. Here, EIG measures the influence of network nodes beyond a simple frequency count by considering the centrality of the connected nodes (West et al. 2013). For example, a science category connected to important science categories is considered to have greater influence in the network. Rather than assuming equal importance, this measure differentiates the weight of edges by the importance of the connected nodes. Unlike degree centrality, which focuses only on the number of connections, EIG assesses a node's importance by evaluating the importance of its connections. This method captures the qualitative aspect of network relationships. Moreover, while PageRank was designed specifically for directed networks, EIG's flexibility allows it to be applied effectively to undirected networks as well. In this respect, EIG can be used as an indicator to measure the importance or influence of emerging interdisciplinarity (Heo and Lee 2019; Qian et al. 2017; Rapach et al. 2015). With EIG, a network index that measures the influence of a node by assigning weights to each connection based on the centrality of the connected node (Bonacich 2007), the key science category-subjects can be isolated as being relatively more important.

Fig. 3 Concept of growing and dominant interdisciplinarity. The graphs illustrate how emerging science differs from dominant science as measured by Eigenvector centrality and the growth rate of Eigenvector centrality.
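To make the EIG measure concrete, here is a minimal pure-Python sketch of eigenvector centrality on a weighted, undirected network, computed by power iteration. It is an illustration under assumptions rather than the authors' implementation: the function name, the edge-dictionary input format, and the stopping rule are all hypothetical. Because a network that only links subjects from two different categories is bipartite, the update uses A + I instead of A, a common device that keeps the same eigenvectors while preventing the iteration from oscillating.

```python
import math


def eigenvector_centrality(edge_weights, max_iter=1000, tol=1e-10):
    """Eigenvector centrality for an undirected weighted graph.

    edge_weights maps (node_u, node_v) to a positive weight, for instance
    the number of publications in which two subjects co-occur.
    """
    # Build a symmetric adjacency structure.
    adjacency = {}
    for (u, v), weight in edge_weights.items():
        adjacency.setdefault(u, {})[v] = weight
        adjacency.setdefault(v, {})[u] = weight

    scores = {node: 1.0 for node in adjacency}
    for _ in range(max_iter):
        # One step of x <- (A + I) x: a node's new score is its own score
        # plus the weighted scores of its neighbours.
        updated = {
            node: scores[node] + sum(w * scores[nb] for nb, w in neighbours.items())
            for node, neighbours in adjacency.items()
        }
        norm = math.sqrt(sum(value * value for value in updated.values()))
        updated = {node: value / norm for node, value in updated.items()}
        if max(abs(updated[node] - scores[node]) for node in adjacency) < tol:
            return updated
        scores = updated
    return scores


# Toy star network: subject B1 co-occurs once with each of A1, A2 and A3.
star = {("A1", "B1"): 1, ("A2", "B1"): 1, ("A3", "B1"): 1}
centrality = eigenvector_centrality(star)
```

In the toy star network the hub B1 receives the highest score and the three leaves receive equal scores, which reflects the idea that a subject gains influence from the influence of the subjects it is connected to.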

Using EIG, we propose a conceptual framework of dominant and growing science fields. By using EIG and its growth rate (EIG.GR), the dominant or growing sciences can be identified in terms of knowledge recombination. The threshold for dominant and growing interdisciplinary sciences is set at the top 10% of science category-subjects. In essence, only those ranked in the top 10% on each measure are selected and labeled as dominant and growing sciences, respectively. Choosing the top 10% threshold for EIG and EIG.GR as the criterion for identifying dominant or growing science subjects is a deliberate methodological decision. This threshold is designed to selectively highlight the most influential or most rapidly evolving fields, taking into account the skewed distribution of scientific networks, in which a few nodes gather the vast majority of connections. It allows both established and emerging fields to be identified, reflecting the dynamic nature of scientific research. Such a conservative approach reduces false positives due to statistical fluctuations, ensuring that only subjects with consistently high metrics are considered. Moreover, setting a clear criterion facilitates comparative analysis over time and across disciplines, providing a consistent and reliable way to track changes in the scientific landscape. This choice reflects a strategic approach to recognizing significant trends and shifts within scientific research, and it highlights the importance of both sustained influence and notable growth in determining the prominence of scientific subjects.

As shown in Fig. 3, if the EIG (or EIG.GR) value of a science category-subject falls within the top 10%, it is considered a dominant (or growing) science. If the values of both EIG and EIG.GR are within the top 10%, the science category-subject can be classified as dominant and growing at the same time, indicating both its current influence and a substantial increase in its influence. Conversely, if neither value falls within the top 10%, the science category-subject is considered neither dominant nor growing. This allows us to focus on the specific list of publications that are most valuable in interdisciplinary scientific activity. It also reduces the computational cost of the text analysis by reducing the sample size. Instead of running text analysis on the full sample, focusing on selected publications that can be assumed to have greater potential and to be consistent in terms of science subjects can improve the accuracy of our analysis. In this regard, the selected growing interdisciplinary science subjects can be used as a reference for potential future topics. Given the path-dependent nature of knowledge, a strong tendency or preference to follow such a path is often observed, especially in knowledge-based activities. In other words, it is highly likely that either the current network position or the current network growth will persist in the next period. This is discussed in more detail with the empirical results in the next section.

Fig. 4 BERTopic modeling process. The BERTopic modeling process involves converting document data into vector data, reducing dimensionality, and organizing the data into clusters and topics.

Since the main interest of this study is to explore newly emerging topics in interdisciplinary science fields, we focus on growing sciences rather than dominant sciences. In the next step, the publications that represent growing interdisciplinary science category-subjects are filtered.
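The dominant/growing labeling can be sketched as below. Several details are assumptions made only for illustration, since the text does not spell them out: EIG.GR is taken to be the relative change in EIG between two consecutive periods, dominance is judged on the later period's EIG, ties are broken alphabetically, and the function names are hypothetical.

```python
import math


def top_share(scores, share=0.10):
    """Return the labels whose score ranks within the top `share` of all labels."""
    # round() guards against floating-point noise such as 0.1 * 30 = 3.0000000000000004
    k = max(1, math.ceil(round(share * len(scores), 9)))
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return set(ranked[:k])


def classify_subjects(eig_prev, eig_curr, share=0.10):
    """Label each subject as dominant, growing, both, or neither."""
    # Growth can only be computed for subjects with a positive earlier EIG.
    shared = [s for s in eig_curr if eig_prev.get(s, 0) > 0]
    eig = {s: eig_curr[s] for s in shared}
    eig_gr = {s: (eig_curr[s] - eig_prev[s]) / eig_prev[s] for s in shared}

    dominant = top_share(eig, share)
    growing = top_share(eig_gr, share)

    labels = {}
    for s in shared:
        if s in dominant and s in growing:
            labels[s] = "dominant and growing"
        elif s in dominant:
            labels[s] = "dominant"
        elif s in growing:
            labels[s] = "growing"
        else:
            labels[s] = "neither"
    return labels


# Ten toy subjects: S0 is large but stable, S1 is small but triples its EIG.
eig_2012_2014 = {f"S{i}": 0.1 for i in range(10)}
eig_2012_2014["S0"] = 0.5
eig_2015_2017 = {f"S{i}": 0.1 for i in range(10)}
eig_2015_2017.update({"S0": 0.55, "S1": 0.3})
labels = classify_subjects(eig_2012_2014, eig_2015_2017)
```

With ten subjects, the top 10% on each measure is a single subject, so S0 is labeled dominant and S1 growing. Publications containing the growing subjects would then be passed on to topic modeling.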

Embedded topic modeling

BERTopic. To derive the topics of growing sciences from the interdisciplinary science documents, the BERTopic model is used. BERT (Bidirectional Encoder Representations from Transformers) is a deep-learning language model built on the Transformer architecture and developed by Google (Devlin et al. 2019). As shown in Fig. 4, BERTopic is an embedded topic modeling technique that combines BERT embeddings, Uniform Manifold Approximation and Projection (UMAP), Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and class-based term frequency-inverse document frequency (c-TF-IDF) (Grootendorst 2022).

The first step is embedding vectorization, which converts the target documents into vectors. Unlike traditional topic modeling methods that rely on a bag-of-words (BoW) approach and focus only on term frequency, BERTopic uses embedding vectors. These embeddings represent documents in a space that, while lower in dimension than the potentially vast BoW vocabulary, captures the deep semantic information latent in the text. This allows a higher contextual understanding of the documents. By leveraging pre-trained representations, BERTopic enables documents to be analyzed with nuanced insight into their contextual meanings, overcoming the limitations of traditional vectorization methods. Here, we used the default text embedding model, "all-MiniLM-L6-v2", for our analysis. Designed as a general-purpose model, it maps sentences and paragraphs to a 384-dimensional dense vector space. It is versatile and suitable for tasks such as clustering or semantic search, especially for English-language text. Compared with the "all-mpnet-base-v2" model, which is known for providing the best quality, it runs five times faster while still offering good quality, and its effectiveness has led to its adoption in several related studies (Samsir et al. 2023; Wang et al. 2023).

The second step in BERTopic is dimensionality reduction. This step is crucial because clustering algorithms, which are integral to topic modeling, work better with low-dimensional data. The main challenge addressed here is the 'curse of dimensionality', whereby high-dimensional spaces can harm the efficiency and effectiveness of clustering algorithms. By reducing the dimensionality of the embedding space, BERTopic mitigates this problem, making it easier to form more coherent and accurate topic clusters. This approach highlights the importance of tailoring data processing steps to the specific algorithms used in the topic modeling process. For this reason, the UMAP algorithm is used to reduce the complexity of the embedding vectors while preserving their essential structure. Assuming that high-dimensional data lie on a lower-dimensional manifold, UMAP efficiently maps highly complex data to a simpler space by preserving relative distance and density, which makes it easier to identify groups of similar documents (McInnes et al. 2016).

The next step is to cluster the documents using HDBSCAN, which generates clusters based on the density of data points using a hierarchical tree method. One of HDBSCAN's strengths is that it can identify and handle noise effectively, which helps derive more meaningful clusters. In addition, the combination of UMAP and HDBSCAN shows better performance in text clustering (Asyaky and Mandala 2021), and clustering results can be adjusted by tuning the hyperparameters related to cluster generation.
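The three steps described so far (embedding, dimensionality reduction, clustering) might be wired together as in the sketch below, which assumes the bertopic, umap-learn, and hdbscan packages are installed. Only the embedding model name "all-MiniLM-L6-v2" comes from the text; the UMAP and HDBSCAN parameter values, the random seed, and the function name are illustrative assumptions and should not be read as the settings used in the study.

```python
def build_topic_model(min_cluster_size=10, random_state=42):
    """Assemble a BERTopic model from explicit UMAP and HDBSCAN components."""
    # Imported inside the function so that this file can still be loaded
    # where the optional packages are not installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from umap import UMAP

    # Step 2: reduce the 384-dimensional sentence embeddings to a few dimensions.
    umap_model = UMAP(
        n_neighbors=15,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
        random_state=random_state,  # fixed seed for repeatable reductions
    )

    # Step 3: density-based clustering; documents in no dense region become noise.
    hdbscan_model = HDBSCAN(
        min_cluster_size=min_cluster_size,
        metric="euclidean",
        cluster_selection_method="eom",
        prediction_data=True,
    )

    # Step 1 (embedding) is delegated to the named sentence-transformers model.
    return BERTopic(
        embedding_model="all-MiniLM-L6-v2",
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
    )
```

A typical call would be `topics, probs = build_topic_model().fit_transform(abstracts)`, where `abstracts` is a list of abstract strings (a hypothetical variable). Documents that HDBSCAN treats as noise are assigned to topic -1, and raising `min_cluster_size` generally produces fewer, larger topics.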

The final step is topic generation using c-TF-IDF. c-TF-IDF is a modification of TF-IDF designed to capture the representative terms from the documents of each topic. TF-IDF is known as an effective measure for finding representative terms by combining term frequency and inverse document frequency (Salton and Buckley 1988). Based on the assumption that a term representative of a document should be distinctive of that document, this measure captures terms that not only occur frequently in the document but also occur less frequently in other documents. Using c-TF-IDF (Eq. 1), the importance of a term within a given class can be found.
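As an illustration of the class-based weighting, the sketch below assumes the form of c-TF-IDF given in Grootendorst (2022), W(t, c) = tf(t, c) · log(1 + A / tf(t)), where tf(t, c) is the frequency of term t in class c, tf(t) is its frequency across all classes, and A is the average number of words per class. Whether Eq. 1 uses exactly this notation is an assumption, the function names are hypothetical, and the BERTopic library itself applies additional normalization.

```python
import math
from collections import Counter


def c_tf_idf(class_documents):
    """Compute c-TF-IDF scores.

    class_documents maps a class (topic) label to a list of tokenized
    documents; all documents of a class are treated as one long document.
    """
    # tf(t, c): term counts per class.
    term_counts = {
        label: Counter(token for document in documents for token in document)
        for label, documents in class_documents.items()
    }
    # tf(t): term counts across all classes.
    overall = Counter()
    for counts in term_counts.values():
        overall.update(counts)
    # A: average number of words per class.
    average_words = sum(overall.values()) / len(term_counts)

    return {
        label: {
            term: count * math.log(1 + average_words / overall[term])
            for term, count in counts.items()
        }
        for label, counts in term_counts.items()
    }


def top_terms(scores, k=3):
    """Return the k highest-scoring terms per class (ties broken alphabetically)."""
    return {
        label: [term for term, _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for label, weights in scores.items()
    }


toy_classes = {
    "bio": [["gene", "cell", "gene"], ["cell", "protein", "data"]],
    "ml": [["model", "data", "model"], ["data", "network", "model"]],
}
scores = c_tf_idf(toy_classes)
```

In the toy data, "data" appears in both classes and is therefore down-weighted relative to class-specific terms such as "protein", even though both occur once in the "bio" class.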

Qualitative validation of results. Once the interdisciplinary science maps are analyzed, a list of representative publications can be generated for each interdisciplinary category based on the topics identified through BERTopic. However, reliance on machine learning can lead to misclassification (Lyutov et al. 2021), so we examine the topic modeling results to determine where new emerging topics originate and to describe them. Many recent studies applying BERTopic have conducted qualitative or manual validation of results (Balcı et al. 2023; Capra 2024; de Lima et al. 2023; Kasperiuniene et al. 2020; Wang et al. 2023). Using qualitative analysis, we review the results of the BERTopic process to validate them. First, the topic words are considered to determine whether they provide a common theme for the articles under each topic. A qualitative approach is used to examine the topics in order to identify the characteristics of emerging topics. After running BERTopic on the datasets, a list of topic words and representative articles emerges from the unsupervised process, e.g., Topic-1. In addition, tractability requires simplicity, such that representations are not unnecessarily complex and even non-experts should be able to interpret them (Rafols et al. 2010). The results are compared to verify that they are reasonable or "make sense" to non-experts. Journal lists are also evaluated to distinguish the characteristics of the topics. Nonsensical topics are expected to be random or not to fit our definition of global interdisciplinary science.

Fig. 5 Results of the subject co-occurrence network analysis. a LSB-TE. b LSB-PS. c PS-TE. d LSB-PS-TE. Note: growing interdisciplinary science subjects are shown in bold.

Case study on interdisciplinary science in the Web of Science

Preparing the interdisciplinary science dataset. Following previous bibliometric studies that used topic modeling techniques (Suominen and Toivanen 2016; Velden et al. 2017; Yau et al. 2014), we use the Web of Science (WoS) Core Collection dataset, a database of peer-reviewed scientific journals published worldwide. The WoS database provides the metadata required for preprocessing, such as selecting peer-reviewed journal articles.

Results of the science category-subject co-occurrence network analysis. This section presents the results of the science category-subject co-occurrence network analysis. Fig. 5 illustrates the dominant and growing interdisciplinary sciences using the conceptual framework presented in Fig. 3, and Table 2 provides the complete list of dominant and growing sciences. All nodes represent the science category-subjects included in each interdisciplinary science field, and the dominant sciences (located further to the right on the x-axis) and the growing sciences (located higher on the y-axis) are labeled. An interesting point is that a clear distinction between dominant and growing interdisciplinary sciences is observed in all cases. Given the path-dependent nature of knowledge, dominant sciences are likely to remain dominant in the next period. However, forecasts of major emerging trends focus on combinations of new interdisciplinary science category-subjects that are expected to have greater impact, rather than on those already well known. The gap between the two types of science category-subjects justifies our approach of distinguishing promising future science subjects from those already dominant and, more importantly, indicates that focusing on emerging subjects better suits the aim of this research.

This study focuses on the growing influence of interdisciplinary sciences to investigate the key topics that are likely to increase in the near future. In this regard, publications that include growing interdisciplinary sciences are used for the next step of the analysis. As shown in Table 2 and Fig. 6, the eigenvector centrality (EIG) values of growing interdisciplinary science category subjects in the next period tend to be larger than those of other science fields. This reflects that the subjects identified as growing in the current period have the largest increases in the next period. With some exceptions, these subjects differ from those in the prevalent science fields. Therefore, the set of interdisciplinary publications that include growing sciences is used for the BERTopic model.
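To make the centrality step concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that each publication is reduced to its list of science categories, that edge weights are simple co-occurrence counts, and that eigenvector centrality is obtained by power iteration; the function names and the toy publication list are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_weights(publications):
    """Count how often two science categories are listed on the same publication."""
    weights = defaultdict(float)
    for categories in publications:
        for a, b in combinations(sorted(set(categories)), 2):
            weights[(a, b)] += 1.0
    return weights


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Power iteration on the symmetric, weighted co-occurrence matrix."""
    neighbors = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbors[a][b] = w
        neighbors[b][a] = w
    nodes = sorted(neighbors)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Iterating with (A + I) keeps the eigenvectors of A but avoids
        # oscillation on bipartite-like graphs.
        new = {n: score[n] + sum(w * score[m] for m, w in neighbors[n].items())
               for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Hypothetical toy data: each inner list is one publication's categories.
publications = [
    ["Forestry", "Materials Science"],
    ["Forestry", "Ecology"],
    ["Forestry", "Ecology", "Materials Science"],
    ["Ecology", "Remote Sensing"],
]
centrality = eigenvector_centrality(cooccurrence_weights(publications))
for name in sorted(centrality, key=centrality.get, reverse=True):
    print(f"{name}: {centrality[name]:.3f}")
```

Running the same computation on the networks of two consecutive periods and comparing the scores per category would give one possible reading of "growing" subjects, although the exact comparison used in the study is not restated here.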

Unsupervised classification of emerging interdisciplinary science topics

BERTopic setup. While conventional topic modeling approaches treat the number of topics as an important hyperparameter for conducting the analysis, BERTopic does not necessarily require it, because UMAP and HDBSCAN facilitate optimization of the clustering process and automatically generate a list of topics. Nevertheless, setting the number of topics remains important, because a fully automated machine learning process may produce an incomprehensible result. For example, if BERTopic is run with its default settings and the HDBSCAN optimization algorithms, it will automatically generate a list of topics, but this does not guarantee that the result is also acceptable in terms of application and gaining insights.

Table 2 List of prevalent and growing science category subjects in the interdisciplinary science fields.

Science	Prevalent science category subjects	Growing science category subjects
LSB-TE	Environmental Sciences	Forestry
	Engineering, Environmental	Materials Science, Textiles
	Green & Sustainable Science & Technology	Instruments & Instrumentation
	Energy & Fuels	Pharmacology & Pharmacy
	Engineering, Chemical	Green & Sustainable Science & Technology
	Ecology	Medicine, Research & Experimental
	Public, Environmental & Occupational Health	Engineering, Environmental
	Radiology, Nuclear Medicine & Medical Imaging	Ecology
LSB-PS	Chemistry, Applied	Neurosciences
	Biochemistry & Molecular Biology	Health Care Sciences & Services
	Food Science & Technology	Immunology
	Chemistry, Analytical	Polymer Science
	Biochemical Research Methods	Paleontology
	Chemistry, Multidisciplinary	Microbiology
	Chemistry, Medicinal	Fisheries
PS-TE	Materials Science, Multidisciplinary	Engineering, Aerospace
	Physics, Applied	Green & Sustainable Science & Technology
	Nanoscience & Nanotechnology	Engineering, Marine
	Chemistry, Physical	Geography, Physical
	Physics, Condensed Matter	Water Resources
	Chemistry, Multidisciplinary	Engineering, Mechanical
	Engineering, Electrical & Electronic	Acoustics
	Energy & Fuels	Engineering, Ocean
	Materials Science, Coatings & Films	Automation & Control Systems
LSB-PS-TE	Environmental Sciences	Remote Sensing
	Water Resources	Imaging Science & Photographic Technology
	Engineering, Environmental	Geosciences, Multidisciplinary
	Computer Science, Interdisciplinary Applications	Crystallography
	Statistics & Probability	

Note: The list of science category subjects is sorted in descending order.

Fig. 6 Comparison of EIG in the next period between growing interdisciplinary science category subjects and others. Note: On average, the eigenvector centrality in the next period of growing interdisciplinary science category subjects (0.348) is higher than that of the others (0.093).

Table 3 Hyperparameter test for BERTopic.

	Publications	n-gram range	Number of topics	Minimum topic size
LSB-TE growing science	26,164	unigram, bigram, or trigram	50 ~ 1000	130-780
LSB-PS growing science	10,577	unigram, bigram, or trigram	50 ~ 1000	50-300
PS-TE growing science	49,042	unigram, bigram, or trigram	50 ~ 1000	240-1440
LSB-PS-TE growing science	904	unigram, bigram, or trigram	50 ~ 1000	5-50

For this reason, three hyperparameters, namely the n-gram range, the number of topics, and the minimum topic size, were tested within ranges to find the best BERTopic model results (Table 3). The n-gram range determines whether terms should cover unigrams, bigrams, or trigrams; the number of topics sets the initial number of topics when running BERTopic; and the minimum topic size sets the minimum number of documents that each topic must contain. While the values of the first two hyperparameters were tested over the same range in every case (n-gram range: unigram, bigram, trigram; number of topics: 5-1000), the minimum topic size values were set in proportion to the total number of publications. Minimum topic size values can be strongly affected by the volume of documents, which may lead to topic sizes that are too broad or too narrow in different cases. This particularly affects the generation of outlier topics and can yield an uninterpretable number of topics. Applying a proportional minimum topic size can therefore help reduce the size of outlier topics and maintain an interpretable number of topics. For this reason, an integer minimum topic size is used for each case, set as a proportion of the total publications (the tested ranges are listed in Table 3). To help us consider a set of different hyperparameters with wide ranges, a random search method is used to find an optimized parameter set from random combinations, limited to no more than 100 iterations.
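The random search described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the proportional bounds `low_share` and `high_share` are hypothetical placeholder values, and `score_fn` is a hypothetical stand-in for "fit a topic model with these settings and return a score to minimize". The dictionary keys mirror the hyperparameter names used in the text.

```python
import random


def proportional_min_topic_sizes(n_publications, low_share, high_share):
    """Integer bounds for the minimum topic size as shares of the corpus size."""
    low = max(2, int(n_publications * low_share))
    high = max(low, int(n_publications * high_share))
    return low, high


def random_search(n_publications, score_fn, n_iter=100, seed=0,
                  low_share=0.005, high_share=0.03):
    """Sample random hyperparameter combinations and keep the lowest score.

    low_share and high_share are assumed values for illustration only.
    """
    rng = random.Random(seed)
    low, high = proportional_min_topic_sizes(n_publications, low_share, high_share)
    best = None
    for _ in range(n_iter):  # capped at 100 iterations by default
        params = {
            "n_gram_range": (1, rng.choice([1, 2, 3])),  # uni-, bi-, or trigram
            "nr_topics": rng.randint(5, 1000),
            "min_topic_size": rng.randint(low, high),
        }
        score = score_fn(params)
        if best is None or score < best[0]:
            best = (score, params)
    return best


# Hypothetical scoring function used only to make the sketch runnable.
def toy_score(params):
    return abs(params["nr_topics"] - 20) + params["min_topic_size"] / 1000


best_score, best_params = random_search(26164, toy_score)
print(best_score, best_params)
```

Scaling the minimum topic size with corpus size keeps the search space comparable across corpora that differ by orders of magnitude, which is the motivation given in the paragraph above.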

For each iteration, the information entropy value (Eq. (2)) is measured (MacKay 2003). By finding cases with an uneven distribution of words within topics, a set of topics with clear semantic expression can be found (Wang et al. 2023). Information entropy is known as a measure of uncertainty and provides a means of determining whether topics can be clearly distinguished. In this regard, the model with the lowest information entropy value (Eq. (2)) is selected as the best model.
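As an illustration of this selection rule, the sketch below computes Shannon entropy, H = -Σ p log2 p, over the normalized keyword weights of each topic and picks the candidate model with the lowest mean entropy. How the study aggregates entropy across topics is not restated here, so averaging over topics is an assumption, and all names and toy weights are hypothetical.

```python
import math


def information_entropy(weights):
    """Shannon entropy (in bits) of a non-negative weight vector."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probabilities)


def mean_topic_entropy(topic_keyword_weights):
    """Average entropy over the keyword-weight lists of a model's topics."""
    entropies = [information_entropy(w) for w in topic_keyword_weights]
    return sum(entropies) / len(entropies)


def select_lowest_entropy(models):
    """Return the name of the candidate model with the lowest mean entropy."""
    return min(models, key=lambda name: mean_topic_entropy(models[name]))


# Hypothetical candidates: each model is a list of per-topic keyword weights.
candidates = {
    "flat_topics": [[1, 1, 1, 1], [2, 2, 2, 2]],
    "peaked_topics": [[8, 1, 1], [9, 1]],
}
print(select_lowest_entropy(candidates))
```

A topic whose weight is concentrated on a few keywords has low entropy, which matches the "uneven distribution of words" criterion described above.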

BERTopic results. Once the dataset is divided into interdisciplinary sciences, the BERTopic process identifies articles that address similar topics, limited by the specified number of topics. Topics are defined by an unsupervised algorithm that identifies common lists of keywords describing the topics.

Table 4 presents the topic clusters that appear in the largest number of articles for each of the subheading pairs: LSB-TE, LSB-PS, PS-TE, and LSB-PS-TE. The list of keywords for the topics identified in the interdisciplinary text corpus is used to define the topics. Outlier clusters are used to prevent the formation of nonsensical or isolated topic clusters.

Qualitative validation of results. Following recent studies applying BERTopic (Balcı et al. 2023; Capra 2024; de Lima et al. 2023; Kasperiuniene et al. 2020; Wang et al. 2023), this study conducted a qualitative, manual validation of the results. While topic modeling allows a large dataset to be analyzed, its results should remain understandable to non-experts (Rafols et al. 2010). We therefore conduct a small-scale qualitative analysis to verify that this condition holds.

While every article is matched to the topic it most likely fits, not all articles under a topic are equally representative of it. Representative articles are identified by the topic modeling technique, meaning that they have the highest probability of matching the topic. The top three representative articles for the topics identified through topic modeling are presented in Table 5. All representative articles can be readily matched to their assigned topics.
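The selection of representative articles can be pictured with a short sketch. It assumes that the topic model's output has already been reduced to (document, topic, probability) triples; the function name and the toy rows are hypothetical, and the sketch does not reproduce how any particular library computes these probabilities.

```python
from collections import defaultdict


def representative_documents(assignments, top_n=3):
    """Return the top_n documents per topic, ranked by assignment probability.

    assignments: iterable of (doc_id, topic_id, probability) triples.
    """
    by_topic = defaultdict(list)
    for doc_id, topic_id, probability in assignments:
        by_topic[topic_id].append((probability, doc_id))
    return {
        topic: [doc for _, doc in sorted(docs, key=lambda item: -item[0])[:top_n]]
        for topic, docs in by_topic.items()
    }


# Hypothetical toy assignments.
rows = [
    ("doc_a", 0, 0.90),
    ("doc_b", 0, 0.40),
    ("doc_c", 0, 0.70),
    ("doc_d", 0, 0.95),
    ("doc_e", 1, 0.60),
]
print(representative_documents(rows))
```

The titles and journals of the documents returned for each topic are then what a human reviewer would inspect in the qualitative check.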

In the case of LSB-TE, "Mechanical properties and composition of natural fiber materials" is best represented by articles LSB-TE-0-A to C. The article titles contain phrases that clearly fit the emerging topic: "tree bark", "insulation material", "manufacturing", "green-glued plywood panel", "resistance of thermally modified", "under steam pressure", and "ash wood". Furthermore, the journal titles also represent the topic: Forest Products Journal and European Journal of Wood and Wood Products (which appears twice). Similar patterns exist for the other emerging topics listed in Table 5. We therefore find that the emerging topics that have been defined each represent a readily recognizable theme. More broadly, many of the emerging topics relate to green technologies and sciences and, to a lesser extent, to health-related technologies.

The journals with the largest number of publications on emerging interdisciplinary topics can be identified from the list of identified topics (Table 6). However, publications on these topics are concentrated in a small portion of the journals: the distribution of publications containing emerging interdisciplinary topics is skewed towards a small share of all journals in the dataset. Half of all publications were published in the top fifth of journals in each interdisciplinary category group: the 14th percentile (LSB-TE), the [...] percentile (LSB-PS), the 10th percentile (PS-TE), and the 18th percentile (LSB-PS-TE). In addition, when considering the leading journals that emerge from classifying the interdisciplinary co-occurrence results, the categories become clearer once the emerging topics are taken into account. For PS-TE, the emerging topics can be seen only in Desalination and Water Treatment and the International Journal of Hydrogen Energy. The other titles indicate the sciences and technologies involved: physical chemistry, sensors, and materials.
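The concentration statement above can be checked with a small calculation: rank journals by publication count and find the smallest share of journals that accounts for half of all publications. The sketch below uses hypothetical counts and a hypothetical function name; it is not derived from the study's data.

```python
def share_of_journals_covering(counts, target=0.5):
    """Smallest share of journals (largest first) covering `target` of publications."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for rank, count in enumerate(ordered, start=1):
        running += count
        if running >= target * total:
            return rank / len(ordered)
    return 1.0


# Hypothetical publication counts per journal.
journal_counts = [50, 20, 10, 10, 5, 5]
print(share_of_journals_covering(journal_counts))
```

A value well below 0.5 indicates a skewed distribution, in which a minority of journals carries half of the output.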

Table 4 Topics and keyword lists for the science category co-occurrence pairs.

Science categories	Topic	Keywords	Generated label	Number of documents
LSB-TE (26,164 publications)	Outlier	cutting, machining, leather, grinding, radon, asbestos, lubrication, tanning, lead, MQL	–	272
	0	wood, lignin, properties, strength, mechanical, bamboo, moisture, cellulose, modulus, samples	Mechanical properties and composition of natural fiber materials	22,314
	1	water, study, energy, results, environmental, waste, model, using, production, based	Sustainable environmental technologies and resource management	1,339
	2	patients, group, is, groups, clinical, control, cancer, expression, cells	Cancer tumor marker expression in clinical patient groups	2,239
LSB-PS (10,577 publications)	Outlier	brain, fNIRS, imaging, optical, neurons, cortex, cortical, neural, ratios, stimulation	–	176
	0	species, water, sea, early, data, late, marine, formation, climate, new	Marine biodiversity and climate impact studies	5,129
	1	model, data, models, proposed, methods, trial, regression, method, simulation, clinical	Clinical trial simulation modeling techniques	397
	2	showed, activity, properties, protein, cells, chitosan, cell, pH, acid, drug	Chitosan bioactivity and drug delivery applications	4,785
PS-TE (49,042 publications)	Outlier	forum, journal, opinions, readers, articles, speculation, editorial, exciting, foundations, founder	–	11
	0	adsorption, removal, membrane, pH, process, concentration, water, treatment, mg, acid	Adsorption and membrane processes for water treatment	10,873
	1	heat, model, flow, results, data, based, water, method, temperature, transfer	Modeling and analysis of heat transfer in fluid systems	38,158
LSB-PS-TE (905 publications)	0	data, study, land, area, using, flood, spatial, based, model, used	Flood risk assessment and spatial modeling	334
	1	binding, molecular, protein, energy, interactions, docking, structure, dynamics, molecules, results	Protein molecule binding and interaction dynamics	570

Table 5 Representative articles for each emerging interdisciplinary topic.

Science categories – emerging topic	Representative article
LSB-TE
Mechanical properties and composition of natural fiber materials	0-1: Kain et al. (2015)
	0-2: Lavalette et al. (2016)
	0-3: Candelier et al. (2017)
Sustainable environmental technologies and resource management	1-1: Egle et al. (2015)
	1-2: Palma-Rojas et al. (2017)
	1-3: Harijani et al. (2017)
Cancer tumor marker expression in clinical patient groups	2-1: Liu et al. (2017)
	2-2: Liu and Li (2017)
	2-3: Qi et al. (2017)
LSB-PS
Marine biodiversity and climate impact studies	0-1: Chen et al. (2016)
	0-2: Bataille et al. (2016)
	0-3: Lowery et al. (2017)
Clinical trial simulation modeling techniques	1-1: French et al. (2016)
	1-2: Luo et al. (2016)
	1-3: Lu (2017)
Chitosan bioactivity and drug delivery applications	2-1: Zhao et al. (2016)
	2-2: Berah et al. (2017)
	2-3: Gomes et al. (2017)
PS-TE
Adsorption and membrane processes for water treatment	0-1: Ahmed (2016)
	0-2: Zhang et al. (2016)
	0-3: Saadati et al. (2017)
Modeling and analysis of heat transfer in fluid systems	1-1: Colombo and Fairweather (2016)
	1-2: Wu et al. (2017)
	1-3: Daabo et al. (2017)
LSB-PS-TE
Flood risk assessment and spatial modeling	0-1: Ding et al. (2017)
	0-2: Chian and Wilkinson (2015)
	0-3: Rezaei et al. (2016)
Protein molecule binding and interaction dynamics	1-1: Shamim et al. (2015)
	1-2: Khan et al. (2017)
	1-3: Bobovská et al. (2016)
Note: Representative articles are preceded by the topic number and an index number, e.g., "0-1".

Discussion and conclusion

As science continues to expand its research output, the science of science emergence provides an opportunity to understand the source of new knowledge, that is, the source of innovation, by studying global interdisciplinarity. Most previous studies have focused on detecting or identifying popular trends within narrow fields of study, measured by frequency. These earlier approaches apply the logic of identifying frequency-based prevalent topic patterns within a specific scientific field. By contrast, the present study offers an alternative perspective for understanding the science of science emergence, focusing on the influence of shifting boundaries in the combination of sciences across categories. The main contributions of our research are (i) expanding the definition of interdisciplinarity to include global cross-science categories, (ii) using eigenvector centrality as a measure of influence on emerging topics, and (iii) demonstrating the use of embedded topic modeling on a dataset representing a global science map. This study provides an early attempt to apply unsupervised classification using BERTopic modeling to interdisciplinary science datasets. It is one of the few contemporary studies that apply text-embedding-based topic modeling techniques to the science of science emergence, and the only one that focuses on the influence of existing science subjects on emergence.

Furthermore, the present investigation provides a simple model for carrying out the desired analysis and shows that the originating subjects of interdisciplinary topics can be identified using embedded topic modeling. Using Schumpeter's definition of knowledge creation based on recombination processes, the model examines the intersection of interdisciplinary sciences to identify the most influential subjects related to emerging scientific knowledge, based on science subjects projected onto the global science map. The results can be used to identify trend profiles for the sources of emerging interdisciplinary topics over time.

Since prevalent sciences are subject to size bias and canonical fields, emerging sciences based on the influence of co-occurring science fields provide an alternative measure. The eigenvector centrality value can be used as a measure of growing interdisciplinarity, which differs from approaches that focus on prevalent sciences in the co-occurrence network for interdisciplinary emergence. Prevalent science subjects differ from the subjects related to growing interdisciplinary sciences, which distinguishes the results of this study from previous studies that emphasize frequency-based prevalent sciences. The method we used allows us to retain contextual knowledge in text analysis. However, science subjects that appear in both prevalent and growing interdisciplinary sciences, such as "Green & Sustainable Science & Technology", may indicate a greater impact on research for society and greater potential for applications.

This study suggests that identifying emerging topics may help us better understand how to direct and use innovative research. It finds that environment- and health-related topics are emerging across many global interdisciplinary science categories. As global challenges arise, more efficient and effective means of identifying emerging research are needed to address them; however, achieving this goal has become increasingly difficult (Petersen et al. 2021). Bloom et al. (2020) argue that if firms are shifting towards defensive research activities, government policy should reconsider how research is publicly funded. To increase economic productivity, the sources of (and barriers to) innovation within sectors and individuals must be uncovered. Although this may help when focusing on economic challenges, additional measures of research productivity may be needed when considering the demands of social innovation. Thus, an alternative explanation for declining science productivity is that social innovation, rather than economic motives, may be driving research.

Although the present study departs from previous studies in several respects, further research is needed to address its limitations. First, the number of automatically generated topics was small, which means that additional emerging topics are likely to be identified in subsequent studies. However, the present investigation adopted a conservative approach to ensure that the identified topics were meaningful, especially considering that the distributions are heavily skewed. Future research should also consider how to optimize the level at which emerging topics remain acceptably identified, for example, through recursive clustering on large-scale bibliometric data (see Mejia and Kajikawa 2020), while balancing the diversity of fields and the similarity of emerging topics. In addition, the natural language processing approach adopted here requires a large amount of computing power, which may pose a challenge for everyday global applications and policy purposes.

Another limitation is that our data are restricted to scientific journal articles in WoS. Not all innovations, especially social innovations, derive from science and technology fields. This approach may also overlook disciplines that tend to produce other types of publications. A broader approach that takes these types of interdisciplinarity into account may provide alternative sources for identifying social innovation. Finally, while this study focused on specific characteristics of emergence identified through interdisciplinarity in WoS, future research assessments should consider "the value and impact of all research outputs" and "consider a broad range of impact measures", as stated in the San Francisco Declaration on Research Assessment (Cagan 2013). Rather than redefining emergence through science maps, this study aimed to explore a different approach to understanding emergence by offering an alternative perspective on it.

The science of science can link existing knowledge reservoirs for technology development, especially as global challenges influence the direction of science emergence that can be applied to the innovation of new technologies. A better understanding of existing subjects that cross fields, and thereby generate novel outcomes and solutions, can help in applying the science of science to viable and effective STI policy initiatives that also incorporate social innovation goals.

Table 6 Top 10 journals by interdisciplinary category pair.

Interdisciplinary category	Journal title	Number of documents	Share
LSB-TE	Journal of Cleaner Production	5589	21.4%
	Environmental Science & Technology	4524	17.3%
	Journal of Hazardous Materials	2417	9.2%
	Biomedical Research-India	1933	7.4%
	Ecological Engineering	1615	6.2%
	Waste Management	1368	5.2%
	Environmental Modelling & Software	735	2.8%
	Environmental Progress & Sustainable Energy	652	2.5%
	Resources, Conservation and Recycling	541	2.1%
	Clean Technologies and Environmental Policy	512	2.0%
LSB-PS	International Journal of Biological Macromolecules	3347	31.6%
	Palaeogeography, Palaeoclimatology, Palaeoecology	1257	11.9%
	Biomacromolecules	1250	11.8%
	ICES Journal of Marine Science	698	6.6%
	Cretaceous Research	586	5.5%
	Marine and Freshwater Research	501	4.7%
	Statistical Methods in Medical Research	399	3.8%
	Journal of Water and Health	282	2.7%
	Paleoceanography	268	2.5%
	Food and Agricultural Immunology	259	2.4%
PS-TE	Desalination and Water Treatment	5622	11.5%
	Applied Thermal Engineering	5040	10.3%
	International Journal of Heat and Mass Transfer	3791	7.7%
	ACS Sustainable Chemistry & Engineering	2468	5.0%
	Journal of Hydrology	2206	4.5%
	Advances in Mechanical Engineering	1931	3.9%
	Ocean Engineering	1639	3.3%
	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing	1447	3.0%
	Combustion and Flame	1093	2.2%
	Ultrasonics Sonochemistry	1089	2.2%
LSB-PS-TE	Journal of Molecular Graphics and Modelling	570	63.0%
	Geocarto International	212	23.4%
	Natural Hazards Review	115	12.7%
	Geocarto International	8	0.9%

Data availability

The data that support the findings of this study are available from Web of Science, but restrictions apply to the availability of these data, which were used under license for the current study, and so they are not publicly available. The data are, however, available from the authors upon reasonable request and with permission of Web of Science.

Received: 22 June 2023; Accepted: 12 April 2024;
Published online: 10 May 2024

Notes

1 For clarity, in this paper we refer to "convergence" as technological convergence with respect to the realization of new technologies unless stated otherwise.
2 clarivate.libguides.com/c.php?g
3 https://www.sbert.net/docs/pretrained_models.html.
4 The symbols in the equation denote, in order: the term; the class; the frequency of the term extracted from the class; the total number of terms from the class; and the total number of documents.
5 The full list of science classifications: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-List-of-Subject-Classifications-for-All-Databases?language=en_US.
6 The following prompt was used with ChatGPT (GPT-4): I have a topic that contains scientific publications related to ["name of the interdisciplinary science"]. The topic is described by the following keywords: ["list of keywords"]. Based on the information above, can you give a short label for the topic?

References

Ahmed SA (2016) Removal of lead and sodium ions from aqueous media using natural wastes for desalination and water purification. Desalination Water Treat. 57(19):8911-8926
Archambault É, Campbell D, Gingras Y, Larivière V (2009) Comparing bibliometric statistics obtained from the Web of Science and Scopus. J Am Soc Inf Sci Technol. 60(7):1320-1326
Asyaky MS, Mandala R (2021) Improving the Performance of HDBSCAN on Short Text Clustering by Using Word Embedding and UMAP. Proc 2021 8th Int Conf Adv Inform Concepts Theory Appl 2021:1-6. https://doi.org/10.1109/ICAICTA53211.2021.9640285
Balcı U, Sirivianos M, Blackburn J (2023) A data-driven understanding of left-wing extremists on social media. Preprint. arXiv:2307.06981
Bataille CP, Watford D, Ruegg S, Lowe A, Bowen GJ (2016) Chemostratigraphic age model for the Tornillo Group: A possible link between fluvial stratigraphy and climate. Palaeogeogr Palaeoclimatol Palaeoecol 457:277-289
Berah R, Ghorbani M, Moghadamnia AA (2017) Synthesis of a smart pH responsive magnetic nanocomposite as high loading carrier of pharmaceutical agents. Int J Biol Macromol 99:731-738
Blei DM, Lafferty J (2007) A correlated topic model of science. Annals Appl Stat 1(1). https://doi.org/10.1214/07-aoas114
Bloom N, Jones CI, Van Reenen J, Webb M (2020) Are ideas getting harder to find? Am Econ Rev 110(4):1104-1144
Bobovská A, Tvaroška I, Kóňa J (2016) Using DFT methodology for more reliable predictive models: Design of inhibitors of Golgi α-mannosidase II. J Mol Graph Model 66:47-57
Bonacich P (2007) Some unique properties of eigenvector centrality. Soc Netw 29(4):555-564. https://doi.org/10.1016/j.socnet.2007.04.002
Börner K, Rouse WB, Trunfio P, Stanley HE (2018) Forecasting innovations in science, technology, and education. Proc Natl Acad Sci 115(50):12573-12581
Bornmann L (2013) What is societal impact of research and how can it be assessed? A literature survey. J Am Soc Inf Sci Technol 64(2):217-233

Bornmann L, Marx W (2014) How should the societal impact of research be generated and measured? A proposal for a simple and practicable approach to allow interdisciplinary comparisons. Scientometrics 98:211-219
Bornmann L, Mutz R (2015) Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. J Assoc Inf Sci Technol 66(11):2215-2222
Boyack K, Glänzel W, Gläser J, Havemann F, Scharnhorst A, Thijs B, van Eck NJ, Velden T, Waltmann L (2017) Topic identification challenge. Scientometrics 111:1223-1224
Boyack KW (2017) Investigating the effect of global data on topic detection. Scientometrics 111(2):999-1015
Cagan R (2013) The San Francisco declaration on research assessment. Dis Models Mech 6(4):869-870
Candelier K, Hannouz S, Thévenon MF, Guibal D, Gérardin P, Pétrissans M, Collet R (2017) Resistance of thermally modified ash (Fraxinus excelsior L.) wood under steam pressure against rot fungi, soil-inhabiting micro-organisms and termites. Eur J Wood Wood Prod 75:249-262
Capra L (2024) A computational linguistic approach to study border theory at scale. ACM Trans Comput-Hum Interaction 37(4):1-23
Chakraborty T (2018) Role of interdisciplinarity in computer sciences: quantification, impact and life trajectory. Scientometrics 114:1011-1029
Chen C (2006) CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. J Am Soc Inf Sci Technol 57(3):359-377
Chen C (2017) Science mapping: a systematic review of the literature. J Data Inf Sci 2(2):1-40
Chen J, Shen SZ, Li XH, Xu YG, Joachimski MM, Bowring SA, Mu L (2016) High-resolution SIMS oxygen isotope analysis on conodont apatite from South China and implications for the end-Permian mass extinction. Palaeogeogr Palaeoclimatol Palaeoecol 448:26-38
Chian SC, Wilkinson SM (2015) Feasibility of remote sensing for multihazard analysis of landslides in Padang Pariaman during the 2009 Padang earthquake. Nat Hazards Rev 16(1):05014004
Chu JS, Evans JA (2021) Slowed canonical progress in large fields of science. Proc Natl Acad Sci 118(41):e2021636118
Colombo M, Fairweather M (2016) Accuracy of Eulerian-Eulerian, two-fluid CFD boiling models of subcooled boiling flows. Int J Heat Mass Transf 103:28-44
Curran CS, Leker J (2011) Patent indicators for monitoring convergence – examples from NFF and ICT. Technol Forecast Soc Change 78(2):256-273. https://doi.org/10.1016/j.techfore.2010.06.021
Daabo AM, Al Jubori A, Mahmoud S, Al-Dadah RK (2017) Development of three-dimensional optimization of a small-scale radial turbine for solar powered Brayton cycle application. Appl Therm Eng 111:718-733
Day GS, Schoemaker PJ (2000) Avoiding the pitfalls of emerging technologies. Calif Manag Rev 42(2):8-33
de Lima BC, Baracho RMA, Mandl T, Porto PB (2023) Reactions to science communication: discovering social network topics using word embeddings and semantic knowledge. Soc Netw Anal Min 13(1):119
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. NAACL HLT 2019 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies – Proceedings of the Conference, 1(Mlm), 4171-4186
Ding Q, Chen W, Hong H (2017) Application of frequency ratio, weights of evidence and evidential belief function models in landslide susceptibility mapping. Geocarto Int. 32(6):619-639
Egle L, Rechberger H, Zessner M (2015) Overview and description of technologies for recovering phosphorus from municipal wastewater. Resour Conserv Recycl 105:325-346
Eisenhardt KM, Martin JA (2000) Dynamic capabilities: What are they? Strategic Manag J 21(10):1105-1121
Eum W, Maliphol S (2023) Southeast Asian catch-up through the convergence of trade structures. Asian J Technol Innov 31(2):422-446
Fagerberg J, Landström H, Martin BR (2012) Exploring the emerging knowledge base of "the knowledge society". Res Policy 41(7):1121-1131. https://doi.org/10.1016/j.respol.2012.03.007

Feldman MP, Kogler DF, Rigby DL (2015) rKnowledge: The spatial diffusion and adoption of rDNA methods. Regional Stud 49(5):798-817. https://doi.org/10.1080/00343404.2014.980799
Fortunato S, Bergstrom CT, Börner K, Evans JA, Helbing D, Milojević S, Petersen AM, Radicchi F, Sinatra R, Uzzi B, Vespignani A, Waltman L, Wang D, Barabási AL (2018) Science of science. Science 359(6379). https://doi.org/10.1126/science.aao0185
French B, Saha-Chaudhuri P, Ky B, Cappola TP, Heagerty PJ (2016) Development and evaluation of multi-marker risk scores for clinical prognosis. Stat Methods Med Res 25(1):255-271
Glänzel W, Thijs B (2012) Using "core documents" for detecting and labelling new emerging topics. Scientometrics 91(2):399-416. https://doi.org/10.1007/s11192-011-0591-7

Gläser J, Glänzel W, Scharnhorst A (2017) Same data-different results? Towards a comparative approach to the identification of thematic structures in science. Scientometrics 111:981-998
Glenisson P, Glänzel W, Janssens F, De Moor B (2005) Combining full text and bibliometric information in mapping scientific disciplines. Inf Process Manag 41(6):1548-1572. https://doi.org/10.1016/j.ipm.2005.03.021
Gomes S, Rodrigues G, Martins G, Henriques C, Silva JC (2017) Evaluation of nanofibrous scaffolds obtained from blends of chitosan, gelatin and polycaprolactone for skin tissue engineering. Int J Biol Macromol 102:1174-1185
Griffith R, Redding S, Van Reenen J (2004) Mapping the two faces of R&D: Productivity growth in a panel of OECD industries. Rev Econ Stat 86(4):883-895
Grootendorst M (2022) BERTopic: Neural topic modeling with a class-based TF-IDF procedure. http://arxiv.org/abs/2203.05794
Harijani AM, Mansour S, Karimi B, Lee CG (2017) Multi-period sustainable and integrated recycling network for municipal solid waste-A case study in Tehran. J Clean Prod 151:96-108
Heo PS, Lee DH (2019) Evolution patterns and network structural characteristics of industry convergence. Struct Change Econ Dyn 51:405-426. https://doi.org/10.1016/j.strueco.2019.02.004
Jones BF (2009) The burden of knowledge and the “death of the renaissance man”: Is innovation getting harder? Rev. Econ Stud. 76(1):283-317
Jung S, Segev A (2022a) Analyzing the generalizability of the network-based topic emergence identification method. Semantic Web 13(3):423-439
Jung S, Segev A (2022b) Identifying a common pattern within ancestors of emerging topics for pan-domain topic emergence prediction. Knowl Based Syst 258:110020
Kain G, Barbu MC, Richter K, Plank B, Tondi G, Petutschnigg A (2015) Use of tree bark as insulation material. For Products J 65(3-4):S16-S16
Kasperiuniene J, Briediene M, Zydziunaite V (2020) Automatic content analysis of social media short texts: scoping review of methods and tools. In: Costa AP, Reis LP, Moreira A (eds) Computer Supported Qualitative Research: New Trends on Qualitative Research (WCQR2019) 4, 89-101
Khan AM, Shawon J, Halim MA (2017) Multiple receptor conformers based molecular docking study of fluorine enhanced ethionamide with mycobacterium enoyl ACP reductase (InhA). J Mol Graph Model 77:386-398
Khan GF, Wood J (2015) Information technology management domain: emerging themes and keyword analysis. Scientometrics 105(2):959-972. https://doi.org/10.1007/s11192-015-1712-5
Kim K, Jung S, Hwang J (2019) Technology convergence capability and firm innovation in the manufacturing sector: an approach based on patent network analysis. RD Manag 49(4):595-606. https://doi.org/10.1111/radm.12350
Kim K, Jung S, Hwang J, Hong A (2018) A dynamic framework for analyzing technology standardisation using network analysis and game theory. Technol Anal Strat Manag 30(5):540-555. https://doi.org/10.1080/09537325.2017.1340639
Kim MC, Chen C (2015) A scientometric review of emerging trends and new developments in recommendation systems. Scientometrics 104:239-263
Klavans R, Boyack KW (2011) Using global mapping to create more accurate document-level maps of research fields. J Am Soc Inf Sci Technol 62(1):1-18
Kogler DF, Essletzbichler J, Rigby DL (2017) The evolution of specialization in the EU15 knowledge space. J Econ Geogr 17(2):345-373. https://doi.org/10.1093/jeg/lbw024
Kogler DF, Whittle A, Buarque B (2022) The Science Space of Artificial Intelligence Knowledge Production. In: Kurz HD, Schütz M, Strohmaier R, Zilian SS (eds) The Routledge Handbook of Smart Technologies: An Economic and Social Perspective. Routledge, London, pp 241-268 https://doi.org/10.4324/9780429351921
Kozlow M (2023) "Disruptive" science has declined-even as papers proliferate. Nature 613:225
Kwon S, Liu X, Porter AL, Youtie J (2019) Research addressing emerging technological ideas has greater scientific impact. Res Policy 48(9):103834. https://doi.org/10.1016/j.respol.2019.103834
Larivière V, Haustein S, Börner K (2015) Long-distance interdisciplinarity leads to higher scientific impact. Plos One 10(3):e0122565
Lavalette A, Cointe A, Pommier R, Danis M, Delisée C, Legrand G (2016) Experimental design to determine the manufacturing parameters of a green-glued plywood panel. Eur J Wood Prod 74:543-551
Lee C, Kogler DF, Lee D (2019) Capturing information on technology convergence, international collaboration, and knowledge flow from patent documents: A case of information and communication technology. Inf Process Manag 56:1576-1591
Lee C, Hong S, Kim J (2021) Anticipating multi-technology convergence: a machine learning approach using patent information. Scientometrics 126(3):1867-1896. https://doi.org/10.1007/s11192-020-03842-6
Lee WS, Han EJ, Sohn SY (2015) Predicting the pattern of technology convergence using big-data technology on large-scale triadic patents. Technol Forecast Soc Change 100:317-329. https://doi.org/10.1016/j.techfore.2015.07.022

Leydesdorff L (2018) Diversity and interdisciplinarity: how can one distinguish and recombine disparity, variety, and balance? Scientometrics 116:2113-2121
Leydesdorff L, Rafols I (2011) Indicators of the interdisciplinarity of journals: Diversity, centrality, and citations. J Informetr 5(1):87-100. https://doi.org/10.1016/j.joi.2010.09.002

Leydesdorff L, Rafols I, Chen C (2013) Interactive overlays of journals and the measurement of interdisciplinarity on the basis of aggregated journal-journal citations. J Am Soc Inf Sci Technol 64(12):2573-2586
Leydesdorff L, Wagner CS, Bornmann L (2018) Betweenness and diversity in journal citation networks as measures of interdisciplinarity-A tribute to Eugene Garfield. Scientometrics 114:567-592
Leydesdorff L, Wagner CS, Bornmann L (2019) Interdisciplinarity as diversity in citation patterns among journals: Rao-Stirling diversity, relative variety, and the Gini coefficient. J Informetr 13(1):255-269
Liu HQ, Li XL (2017) Effect of nursing intervention on liver cancer patients undergoing interventional therapy. Biomed Res 28(12):5285-5288
Liu D, Zhao H, Liu B, Zhang X, Ma Q (2017) Analysis on the expression level of serum MMP-7 in patients with abdominal aortic aneurysm accompanied by hypertension and clinical efficacy of endovascular graft exclusion. Biomed Res 28(3)
Lowery CM, Cunningham R, Barrie CD, Bralower T, Snedden JW (2017) The northern Gulf of Mexico during OAE2 and the relationship between water depth and black shale development. Paleoceanography 32(12):1316-1335
Lu T (2017) Bayesian nonparametric mixed-effects joint model for longitudinal-competing risks data analysis in presence of multiple data features. Stat Methods Med Res 26(5):2407-2423
Luo S, Lawson AB, He B, Elm JJ, Tilley BC (2016) Bayesian multiple imputation for missing multivariate longitudinal data from a Parkinson’s disease clinical trial. Stat Methods Med Res 25(2):821-837
Lyutov A, Uygun Y, Hütt MT (2021) Machine learning misclassification of academic publications reveals non-trivial interdependencies of scientific disciplines. Scientometrics 126(2):1173-1186. https://doi.org/10.1007/s11192-020-03789-8
MacKay DJ (2003) Information theory, inference and learning algorithms. Cambridge University Press
Mane KK, Börner K (2004) Mapping topics and topic bursts in PNAS. Proc Natl Acad Sci USA 101(SUPPL. 1):5287-5290. https://doi.org/10.1073/pnas.0307626100
McInnes L, Healy J, Melville J (2016) UMAP: Uniform manifold approximation and projection for dimension reduction. http://arxiv.org/abs/1802.03426
Mejia C, Kajikawa Y (2020) Emerging topics in energy storage based on a large-scale analysis of academic articles and patents. Appl Energy 263:114625. https://doi.org/10.1016/j.apenergy.2020.114625
Newman D, Bonilla EV, Buntine W (2011) Improving topic coherence with regularized topic models. Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011:1-9
Palma-Rojas S, Caldeira-Pires A, Nogueira JM (2017) Environmental and economic hybrid life cycle assessment of bagasse-derived ethanol produced in Brazil. Int J Life Cycle Assess 22:317-327
Petersen AM, Ahmed ME, Pavlidis I (2021) Grand challenges and emergent modes of convergence science. Human Soc Sci Commun 8(1):1-15
Qian Y, Härdle WK, Chen C (2017) Industry Interdependency Dynamics in a Network Context. SFB 649 Discussion Paper 2017-012, Humboldt University of Berlin. https://doi.org/10.2139/ssrn.2961703
Qi Y, Hao S, Zhang J, Zhao C, Lian Y (2017) Effects of comprehensive nursing on the pain and joint functional recovery of patients with hip replacements. Biomed Res India 28:12
Rafols I, Meyer M (2010) Diversity and network coherence as indicators of interdisciplinarity: case studies in bionanoscience. Scientometrics 82(2):263-287. https://doi.org/10.1007/s11192-009-0041-y
Rafols I, Porter AL, Leydesdorff L (2010) Science overlay maps: A new tool for research policy and library management. J Am Soc Inf Sci Technol 61(9):1871-1887
Rapach DE, Strauss JK, Tu J, Zhou G (2015) Industry interdependencies and cross-industry return predictability. Working paper 12-2015. Singapore Management University, Lee Kong Chian School of Business
Rey-Martí A, Ribeiro-Soriano D, Palacios-Marqués D (2016) A bibliometric analysis of social entrepreneurship. J Bus Res 69(5):1651-1655. https://doi.org/10.1016/j.jbusres.2015.10.033

Rizeei HM, Saharkhiz MA, Pradhan B, Ahmad N (2016) Soil erosion prediction based on land cover dynamics at the Semenyih watershed in Malaysia using LTM and USLE models. Geocarto Int 31(10):1158-1177
Rotolo D, Hicks D, Martin BR (2015) What is an emerging technology? Res Policy 44(10):1827-1843
Saadati F, Rahmani M, Ghahramani F, Piri F, Shayani-Jam H, Yaftian MR (2017) Synthesis of a novel ion-imprinted polyaniline/hyper-cross-linked polystyrene nanocomposite for selective removal of lead (II) ions from aqueous solutions. Desalination Water Treat 82:210-218

Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513-523. https://doi.org/10.1163/187631286X00251
Samsir S, Saragih RS, Subagio S, Aditiya R, Watrianthos R (2023) BERTopic modeling of natural language processing abstracts: Thematic structure and trajectory. J Media Inform Budidarma 7(3):1514-1520
Schumpeter JA (1942) Capitalism, socialism and democracy. Harper and Row, New York
Schumpeter JA (1934) The Theory of Economic Development. Harvard University Press
Shamim A, Abbasi SW, Azam SS (2015) Structural and dynamical aspects of Streptococcus gordonii FabH through molecular docking and MD simulations. J Mol Graph Model 60:180-196
Shin H, Kim K, Kogler DF (2022) Scientific collaboration, research funding, and novelty in scientific knowledge. PLoS ONE 17(7):e0271678. https://doi.org/10.1371/journal.pone.0271678

Sjögårde P (2022) Improving overlay maps of science: Combining overview and detail. Quant Sci Stud 3(4):1097-1118
Small H, Boyack KW, Klavans R (2014) Identifying emerging topics in science and technology. Res Policy 43(8):1450-1467
Song CH, Han JW, Jeong B, Yoon J (2017) Mapping the patent landscape in the field of personalized medicine. J Pharm Innov 12(3):238-248. https://doi.org/10.1007/s12247-017-9283-z

Sugimoto CR, Weingart S (2015) The kaleidoscope of disciplinarity. J Documentation 71(4):775-794. https://doi.org/10.1108/JD-06-2014-0082
Suominen A, Toivanen H (2016) Map of science with topic modeling: Comparison of unsupervised learning and human-assigned subject classification. J Assoc Inf Sci Technol 67(10):2464-2476. https://doi.org/10.1002/asi
Velden T, Boyack KW, Gläser J, Koopman R, Scharnhorst A, Wang S (2017) Comparison of topic extraction approaches and their results. Scientometrics 111(2):1169-1221. https://doi.org/10.1007/s11192-017-2306-1
Wang Y, Bashar MA, Chandramohan M, Nayak R (2023) Exploring topic models to discern cyber threats on Twitter: A case study on Log4Shell. Intell Syst Appl 20:200280
Wang Z, Chen J, Chen J, Chen H (2023) Identifying interdisciplinary topics and their evolution based on BERTopic. Scientometrics, 0123456789. https://doi.org/10.1007/s11192-023-04776-5
West JD, Jensen MC, Dandrea RJ, Gordon GJ, Bergstrom CT (2013) Author-level Eigenfactor metrics: Evaluating the influence of authors, institutions, and countries within the social science research network community. J Am Soc Inf Sci Technol 64(4):787-801
White K (2019) Publications Output: U.S. Trends and International Comparisons. In Nsb-2020-6. https://ncses.nsf.gov/pubs/nsb20206/
Winnink JJ, Tijssen RJW, van Raan AFJ (2019) Searching for new breakthroughs in science: How effective are computerised detection algorithms? Technol Forecast Soc Change 146:673-686. https://doi.org/10.1016/j.techfore.2018.05.018
Wu W, Zhang S, Wang S (2017) A novel lattice Boltzmann model for the solid-liquid phase change with the convection heat transfer in the porous media. Int J Heat Mass Transf 104:675-687
Xu J, Bu Y, Ding Y, Yang S, Zhang H, Yu C, Sun L (2018) Understanding the formation of interdisciplinary research from the perspective of keyword evolution: A case study on joint attention. Scientometrics 117:973-995
Xu J, Ding Y, Bu Y, Deng S, Yu C, Zou Y, Madden A (2019) Interdisciplinary scholarly communication: an exploratory study for the field of joint attention. Scientometrics 119:1597-1619
Yau CK, Porter A, Newman N, Suominen A (2014) Clustering scientific documents with topic modeling. Scientometrics 100(3):767-786. https://doi.org/10.1007/s11192-014-1321-8
Zahedi Z, van Eck NJ (2018) Exploring topics of interest of Mendeley users. J Altmetrics 1(1):1-12. https://doi.org/10.29024/joa.7
Zhang J, Zhang G, Zhou Q, Ou L (2016) Thermodynamics, kinetics and isotherm studies on the removal of methylene blue from aqueous solution by calcium alginate. J Water Reuse Desalination 6(2):301-309
Zhao YM, Wang J, Wu ZG, Yang JM, Li W, Shen LX (2016) Extraction, purification and anti-proliferative activities of polysaccharides from Lentinus edodes. Int J Biol Macromol 93:136-144

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1G1A1006464) and by Handong Global University Research Grants (No. 202300710001). Dieter F. Kogler would like to acknowledge funding from the European Research Council (https://erc.europa.eu/) under the European Union's research and innovation programme (grant agreement No. 715631, ERC TechEvo). Furthermore, the authors Keungoui Kim and Dieter F. Kogler would like to acknowledge funding from the Science Foundation Ireland (SFI; https://www.sfi.ie/) under the SFI Science Policy Research Programme (grant agreement No. 17/SPR/5324, SciTechSpace). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions

The authors confirm their contribution to the paper as follows: conception or design of the work: Keungoui Kim, Sira Maliphol; acquisition, analysis, or interpretation of data: Keungoui Kim, Dieter F. Kogler, Sira Maliphol; creation of new software used in the work: Keungoui Kim; drafting the work or substantive revisions: Keungoui Kim, Dieter F. Kogler, Sira Maliphol; corresponding author: Sira Maliphol, correspondence to: sira.maliphol@sunykorea.ac.kr

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

Informed consent was not required as the study did not involve human participants.

Additional information

Correspondence and requests for materials should be addressed to Sira Maliphol.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons licence, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2024

School of Applied Artificial Intelligence, Handong Global University, Pohang, South Korea. Spatial Dynamics Lab, School of Architecture, Planning and Environmental Policy and Centre for Data Analytics, University College Dublin, Dublin, Ireland. Department of Technology and Society, State University of New York, Songdo, South Korea. email: sira.maliphol@sunykorea.ac.kr

Journal: Humanities and Social Sciences Communications, Volume: 11, Issue: 1
DOI: https://doi.org/10.1057/s41599-024-03044-y
Publication Date: 2024-05-10

Identifying interdisciplinary emergence in the science of science: combination of network analysis and BERTopic

Keungoui Kim, Dieter F. Kogler & Sira Maliphol

Abstract

Global scientific output is expanding exponentially, which in turn calls for a better understanding of the science of science and especially how the boundaries of scientific fields expand through processes of emergence. The present study proposes the application of embedded topic modeling techniques to identify new emerging science via knowledge recombination activities as evidenced through the analysis of research publication metadata. First, a dataset is constructed from metadata derived from the Web of Science Core Collection database. The dataset is then used to generate a global map representing a categorical scientific co-occurrence network. A research field is defined as interdisciplinary when multiple science categories are listed in its description. Second, the co-occurrence networks are subsequently compared between periods to determine changing patterns of influence in light of interdisciplinarity. Third, embedded topic modeling enables unsupervised association of interdisciplinary classification. We present the results of the analysis to demonstrate the emergence of global interdisciplinary sciences and further we perform qualitative validation on the results to identify what the sources of the emergent areas are. Based on these results, we discuss potential applications for identifying emergence through the merging of global interdisciplinary domains.

Introduction

Science-driven research productivity and associated innovation processes have become increasingly complex for a number of reasons (Bloom et al. 2020; Boyack et al. 2017; Chen 2006; Chu and Evans 2021; Jones 2009; Kozlow 2023). Globally, over 2.6 million scientific articles were published in 2018 alone (White 2019). As scientific output increases over time, there has also been an increasing variety of sources of emergent topics as a result of the recombination of subjects and fields. Emergent topics that cross science fields are expected to be less path dependent than past patterns of scientific knowledge production. In line with this, Fortunato et al. (2018) emphasize the need to understand the science of science, especially as disciplinary boundaries break down.

Contemporary science is a dynamical system of undertakings driven by complex interactions among social structures, knowledge representations, and the natural world. Scientific knowledge is constituted by concepts and relations embodied in research papers, books, patents, software and other scholarly artifacts, organized into scientific disciplines and broader fields. These social, conceptual, and material elements are connected through formal and informal flows of information, ideas, research practices, tools, and samples. Science can thus be described as a complex, self-organizing, and constantly evolving multiscale network. (Fortunato et al. 2018, p. 1)
While research output has risen, scientific productivity, or the value derived from that output, has fallen across fields (Bloom et al. 2020). The rate of innovation has slowed because the level of specialization (Jones 2009) and the size of teams (Kozlow 2023) needed to conduct science have increased. Intertwined with specialization and team size, the costs of research and development have risen sharply, reducing the rate of science productivity (Bloom et al. 2020). Another reason is how emergence has been measured. For instance, as the volume of scientific output increases, the ability to evaluate emerging research topics decreases because canonical literature is more likely to be cited (Chu and Evans 2021). “Could we be missing fertile new paradigms because we are locked into overworked areas of study?” (Chu and Evans 2021, p. 5). Moreover, could we be misidentifying where emerging value is derived from science?

This has important implications considering the importance of scientific forecasting for understanding and developing effective science, technology, and innovation (STI) policy initiatives that aim to support science and to predict innovation trajectories (Börner et al. 2018). Essentially, innovative outcomes are frequently the result of converging technologies that often heavily depend on interdisciplinary scientific inputs (Kogler et al. 2022). Thus, and perhaps not surprisingly, contemporary attempts to address and to meet global grand challenges are directed toward interdisciplinary research where a deep integration of disciplines that combine different types of scientific and technological paradigms in genomic/biotechnology, nanotechnology, and information technology (e.g., blockchain, sensors, AI, and Big Data) are often believed to be the most promising avenues to pursue (Petersen et al. 2021). Recent examples, such as the mRNA vaccine for COVID-19, confirm this notion as they are usually the result of several decades of scientific research that might only become highly effective once the advances in various scientific fields are combined in a single applicable technological solution or innovation. Past convergence stems from emergent interdisciplinary fields, e.g., biotechnology, which further catalyze innovations from other sectors (Feldman et al. 2015). Thus, changes at the interdisciplinary boundaries that are in flux may provide further insights into potential future convergence activities.

New discoveries, especially those with multi-disciplinary roots, are usually difficult to attribute to existing classification schemas (Fagerberg et al. 2012), but equally, they define the frontier of the innovation process as they combine existing forms of knowledge into something entirely novel (Eisenhardt and Martin 2000; Lee et al. 2015; Schumpeter 1934; 1942). Thus, interdisciplinary fields of science can be used to define the emergence of new topics (Chakraborty 2018; Khan and Wood 2015; Lee et al. 2015). Utilizing bibliometric network analysis on publication metadata, the present study proposes an approach capable of identifying from where interdisciplinary science fields emerge based on a global scientific map that indicates also changes in the growth of influence.

Specifically, the investigation employs topic modeling to classify scientific research topics from a large amount of data using unsupervised algorithms. The suggested embedded topic modeling approach then enables identification of emerging science topics in line with Schumpeterian notions of knowledge recombination processes where it is possible to observe how the combination of multiple disciplines or science categories unfolds over time. Unlike technology convergence, which has been studied more systematically (Lee et al. 2019), few studies, to the best of our knowledge, have directed similar research efforts towards interdisciplinary knowledge recombination processes and how these might impact the overall evolution of the entire scientific knowledge landscape and subsequent innovation outcomes. Moreover, the application of topic modeling from natural language processing (NLP) to emerging interdisciplinary science studies holds the potential to provide important insights. The novel approach of combining embedded topic modeling and co-occurrence network analysis methods across global science maps can help with identifying emerging science topics before they consolidate into fields and predict those with potential value for knowledge recombination leading to global convergence.

The overarching goal is to analyze the complexity, self-organization, and evolution of scientific knowledge production while sifting through a large volume of scientific publications, and to understand how it might be possible to anticipate scientific innovations as they emerge from converging areas of research. The main objective of the present study is then to provide a novel approach to the bibliometric analyses toolkit by combining network analysis and embedded topic modeling techniques for the identification of emergent scientific topics of research interdisciplinarity.

Further, a novel measure for emergent topics is developed and employed, utilizing the network centrality index. Additionally, we leverage an embedded topic modeling technique, specifically BERTopic, which builds on Bidirectional Encoder Representations from Transformers (BERT), to gain insights into the emergent and globally domain-crossing profiles within interdisciplinary science fields. Through this comprehensive approach, we aim to illuminate the evolution of the science of science by investigating the changing boundaries of interdisciplinary research.

In the following sections, we provide an overview of the relevant literature in this line of inquiry, introduce the methodology followed by overall and detailed empirical findings, and finally offer a detailed discussion and some concluding thoughts.

Literature review

Science maps were developed to understand patterns related to the science of science, which include identifying topics of interest (Zahedi and van Eck 2018), identifying growth rates of science (Bornmann and Mutz 2015), identifying topic emergence (Jung and Segev 2022a), and detecting patterns and trends in the scientific literature (Kim and Chen 2015), especially through new combinations of interdisciplinary fields of science and technologies (Blei and Lafferty 2007; Eum and Maliphol 2023; Khan and Wood 2015; Lee et al. 2015). Science maps are network representations of the scientific literature that have evolved in research approaches (Chen 2006). Underlying these past approaches is an emphasis on finding radically new innovations within a specialized domain of science.

The evolution of the literature on emergence began with citation analysis and currently combines methods that identify network patterns using topic modeling techniques (Rotolo et al. 2015). Network analysis is commonly used to map the trends and patterns in the scientific literature, e.g., linked through citations, including the emergence of new seminal discoveries that change the course of a science specialization (Chen 2006). Science mapping linking research literature through citations can be used to demonstrate different evolutionary stages of scientific development over time, allowing the identification of transformative contributions through predictive analysis (Chen 2017). Models have been designed to include different aspects of the science of science. Science overlay maps represent subsets or networks of publications of global base maps, distinguishing different levels of research field categorization (Sjögårde 2022).

Emerging technologies from science can be defined by characteristics measured through bibliometric indicators and text analysis (Rotolo et al. 2015). By combining full-text analysis and bibliometric indicators, Glenisson et al. (2005) piloted a study that demonstrated the usefulness of data mining and bibliometric techniques that facilitate mapping fields of science. Patterns of scientific emergence have been modeled through clustering (Glänzel and Thijs 2012; Yau et al. 2014), national output (Suominen and Toivanen 2016), and using networks to demonstrate emergence (Khan and Wood 2015).

The emergent topics are expected to grow rapidly out of uncertain and ambiguous areas of research and converge to make a novel impact (Rotolo et al. 2015). Past studies on emergence focus on local maps or predefined areas of study, e.g. Curran and Leker (2011) on the nutraceuticals industry; Rey-Martí et al. (2016) on social entrepreneurship; and Song et al. (2017) on personalized medicine. Existing studies that demonstrate emergence have been carried out through bibliometric analyses using frequency-based topic modeling techniques that identified science topics (Griffith et al. 2004), topic coherence (Newman et al. 2011), topic “bursts” (Mane and Börner 2004), and patterns of scientific breakthrough (Winnink et al. 2019). Emergence is often identified through a measure of diversity within the local map, e.g., Rao-Stirling diversity and relative variety (Leydesdorff and Rafols 2011; Leydesdorff et al. 2019; Rafols and Meyer 2010).

The studies of emergent science are limited in scope by constraining fields of study through specific journals, articles, or authors. Once the science map is generated, topic modeling is analyzed based on network values generated from the map. The terms with higher frequency in the corpus are identified as emergent topic clusters. Thus, these studies examine the science of science generated within a science subject, category, or journal group based on measures of frequency and diversity within a local map. These approaches define the distance of interdisciplinarity through relative measures within the field of science. By relying on frequency, past approaches are more subject to canonical bias and may ignore context. Thus, the influence or importance of an interdisciplinary science pair in a science map offers an alternative approach to identifying emergence.

Novelty is also necessary to define emergence (Rotolo et al. 2015). Novelty can be identified through the merging of previously separate “streams of research” or fields of science (Day and Schoemaker 2000; Shin et al. 2022; Small et al. 2014). Thus, another measure of emergent organization is fast-growing multiple field or technology interdisciplinarity (Bornmann 2013; Bornmann and Marx 2014; Lee et al. 2021; Leydesdorff et al. 2013). Over time, research has become increasingly interdisciplinary (Chakraborty 2018). Research fields go through three stages: growth, maturity, and interdisciplinarity (Chakraborty 2018).

How disciplines are classified and differentiated, however, is still unsettled and still needs to be operationalized (Sugimoto and Weingart 2015). One method of defining disciplines is by using data-based publication indices such as Web of Science (WoS) categories (Sugimoto and Weingart 2015). Following this, interdisciplinarity can be modeled using keywords, authors’ fields of study, and citations that cross multiple disciplines (Chakraborty 2018; Xu et al. 2018, 2019). Topic prediction using network analysis has been used to find emergent patterns across domains that are pre-defined and linked through co-occurrence frequency (Jung and Segev 2022b).

The measure of interdisciplinarity must balance variety and similarity (Leydesdorff 2018). When comparing against global data, limiting topic detection to a single discipline neglects the increasingly interdisciplinary nature in which science is conducted (Boyack 2017). Using global maps leads to more accurate partitions and higher textual coherence of topics because the entire context is preserved (Klavans and Boyack 2011). Moreover, long distances between interdisciplinary topics tend to have a higher scientific impact (Larivière et al. 2015). When scientific research incorporates new technological ideas, the convergent science tends to have a greater impact (Kwon et al. 2019). Further, humanities and social science research tends to have lower citation density, which leads to lower measures of interdisciplinarity (Larivière et al. 2015).

While many investigations use interdisciplinary measures of emergence, past studies frequently restricted the analysis to local science maps that focus on a narrow field of science using relative measures for emergence. Furthermore, the formation of interdisciplinary research in the relevant literature has been mainly modeled through the evolution of keyword co-occurrence (Xu et al. 2018). Thus, one of the significant limitations of existing studies concerning the identification of thematic structures and dynamic patterns is that researchers constructed scientific maps around pre-defined topics (Gläser et al. 2017). By limiting the topic scope, the approaches resorted to using frequency-based measures of variety to determine relative novelty, and speed to define emergence. Frequency-based keyword evolution, however, can constrain our understanding of interdisciplinarity, disregard context, and intensify canonical bias. In contrast, global science maps can provide unbiased results if the size of the documents is sufficiently large (Rafols et al. 2010). While some studies differentiate between multi-, inter-, and trans-disciplinary (Chakraborty 2018; Leydesdorff et al. 2018), the operationalization of these distinctions remains limited. Thus, this study distinguishes the concept of growing and dominant sciences focused on broadly identifying the importance of interdisciplinarity across networks of STEM domains.

Methodology

Fig. 1 Overall research process. The overall research process is performed in two stages: (i) defining a network of documents based on science-subject pairs and (ii) identifying topics from the network data.

The present study combines network analysis and BERTopic and applies them to understand cross-domain topic areas. BERTopic is an integrated topic modeling technique that uses embedding vectors and c-TF-IDF to create dense clusters, allowing interpretable topics to be derived from text data. Traditional text analysis is a labor-intensive activity that limits sample sizes to the speeds at which human researchers are capable of reading; even ambitious studies are limited to a few hundred documents. For this reason, topic modeling techniques based on the frequency-based approach (e.g., Latent Semantic Analysis, Latent Dirichlet Allocation, Dynamic Topic Model) were introduced to derive unobserved topics from a very large number of texts. However, frequency-based approaches remove context by relying only on term frequencies. New embedding-based approaches such as BERTopic allow us to consider the contextual knowledge of large text data sets. The Web of Science Raw Data (WoS), with over 63 million publication records found in 12,500 high-quality journals, is a common target of bibliometric analysis.
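
The class-based TF-IDF (c-TF-IDF) weighting mentioned above can be illustrated with a minimal sketch. It assumes that documents have already been grouped into clusters (in BERTopic this grouping comes from clustering embedding vectors, which is not reproduced here), and it uses a simplified form of the weighting, tf × log(1 + A / f_t), where A is the average number of words per cluster and f_t is the frequency of term t across all clusters. The function names and the toy documents are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def c_tf_idf(clusters):
    """Simplified class-based TF-IDF.

    clusters: dict mapping a cluster id to a list of documents (strings).
    Returns a dict: cluster id -> {term: weight}.
    """
    # Treat all documents of a cluster as one concatenated "class document".
    term_counts = {
        c: Counter(" ".join(docs).lower().split()) for c, docs in clusters.items()
    }
    # Frequency of each term across all clusters.
    total_per_term = Counter()
    for counts in term_counts.values():
        total_per_term.update(counts)
    # Average number of words per cluster.
    avg_words = sum(sum(c.values()) for c in term_counts.values()) / len(term_counts)
    return {
        c: {t: tf * math.log(1 + avg_words / total_per_term[t]) for t, tf in counts.items()}
        for c, counts in term_counts.items()
    }


def top_terms(weights, cluster, n=3):
    """Return the n highest-weighted terms of a cluster (ties broken alphabetically)."""
    ranked = sorted(weights[cluster].items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


# Hypothetical toy clusters, standing in for clusters of publication abstracts.
clusters = {
    "a": ["polymer membrane water", "membrane water filtration"],
    "b": ["patient nursing water", "nursing patient care"],
}
weights = c_tf_idf(clusters)
```

In this toy case, a term that is frequent within one cluster but rare elsewhere ("membrane") receives a higher weight than a term of equal in-cluster frequency that also appears in the other cluster ("water").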

The data and methods used for the empirical analysis are introduced in accordance with the overall research process (Fig. 1), which covers data collection and pre-processing, network analysis of an interdisciplinary science dataset, and topic modeling of the newly constructed dataset. The first step gathers and prepares the data from the journal publication metadata for network analysis and topic modeling. In stage 1, science category-subject network analysis is conducted to construct an interdisciplinary science network. In this constructed interdisciplinary science network, the science category-subjects that have greater network centrality, i.e., those that have greater potential value in terms of knowledge recombination, are identified. Here, the dataset is divided into two consecutive periods to create two interdisciplinary science networks. By comparing network values across the two periods, science category-subjects that are more likely to grow (emerging science field) and those that are more likely to have greater frequency (dominant science field) in the following period are selected to filter the final text dataset for topic modeling. Through this step, more precise and accurate publication data can be extracted by retaining only the publications that include such science category-subjects, thereby restricting the data to the 'emerging science fields'. Utilizing the filtered list of publications, in the following subsection (Fig. 1, stage 2), topic modeling is conducted to explore the emerging topics in each interdisciplinary science field. This stage includes all the required processes for running the BERTopic model analysis. Through this process, latent topics representing each interdisciplinary science are derived. For qualitative validation, the publications that are the most representative of the emergent topics, which have been identified through the unsupervised learning process, are analyzed to identify what the topics of interest are for the given interdisciplinary categories.
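
The period-to-period comparison of network values described above can be sketched as follows. The centrality index is not specified at this point in the text, so degree centrality is used here purely as an assumption; the subject names, the toy records, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(records):
    """Degree centrality of subjects in a co-occurrence network.

    records: iterable of subject lists, one list per publication.
    Two subjects are linked when they are listed on the same publication.
    """
    neighbors = defaultdict(set)
    for subjects in records:
        for a, b in combinations(sorted(set(subjects)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {s: 0.0 for s in neighbors}
    # Share of the other nodes that each subject is connected to.
    return {s: len(nb) / (n - 1) for s, nb in neighbors.items()}


def centrality_growth(period1, period2):
    """Change in centrality between two consecutive periods, per subject."""
    c1, c2 = degree_centrality(period1), degree_centrality(period2)
    return {s: c2[s] - c1.get(s, 0.0) for s in c2}


# Hypothetical toy data: each inner list holds the subjects of one publication.
period1 = [
    ["Chemistry", "Materials Science"],
    ["Physics", "Materials Science"],
]
period2 = [
    ["Chemistry", "Materials Science"],
    ["Physics", "Materials Science"],
    ["Chemistry", "Physics"],
    ["Chemistry", "Computer Science"],
]
growth = centrality_growth(period1, period2)
fastest_growing = max(growth, key=growth.get)
```

Under these assumptions, subjects with the largest positive growth would be candidates for the "emerging" label, while subjects with the highest values in the later period would be candidates for the "dominant" label; the study's own index and thresholds may differ.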

Data collection. For the empirical analysis, the metadata is collected from the Web of Science Database. The database provides bibliometric information of scientific publications including the publication title, year, journal title, author, institution, institution’s address, broad category, subject field, funding, citations, etc. The metadata should also include fields that enable differentiation by document type (e.g., article, editorial material, review, biographical item, letter, bibliography, correction, book review, meeting abstract, or proceedings paper) and publication type (journal, book in series, or book). These criteria allow us to restrict our sample to publications that are written for the same purpose, to maintain the quality of articles, and to avoid duplication. The dataset employed here is limited to journal articles by filtering its document and publication types.
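
A minimal sketch of this filtering step is given below. The field names doc_type and pub_type follow the labels used later in this section for the final sample, and the requirement of a non-missing abstract is also taken from that description; the record layout (a list of dictionaries with an "abstract" key) and the toy records are assumptions made for illustration.

```python
def is_journal_article(record):
    """Keep records classified as journal articles that have a non-empty abstract."""
    abstract = (record.get("abstract") or "").strip()
    return (
        record.get("doc_type") == "Article"
        and record.get("pub_type") == "Journal"
        and bool(abstract)
    )


# Hypothetical metadata records.
records = [
    {"id": 1, "doc_type": "Article", "pub_type": "Journal", "abstract": "We study ..."},
    {"id": 2, "doc_type": "Review", "pub_type": "Journal", "abstract": "A survey ..."},
    {"id": 3, "doc_type": "Article", "pub_type": "Book in series", "abstract": "Chapter ..."},
    {"id": 4, "doc_type": "Article", "pub_type": "Journal", "abstract": None},
]
kept_ids = [r["id"] for r in records if is_journal_article(r)]
```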

Then, the list of publications that meet the definition of interdisciplinary science is selected and divided into three-year periods, which helps to stabilize dataset rankings (Archambault et al. 2009). By definition, interdisciplinary science refers to cases where the scientific outcome is based on different research areas. In the WoS database, the research areas are defined by the scientific classifications, subheadings, and subjects. The broad global science category (‘subheading’ in WoS) indicates the top-level classification of the scientific fields, including life sciences & biomedicine (LSB), technology (TE), physical sciences (PS), arts & humanities, and social sciences. These categories are mutually exclusive. The subject field refers to a lower-tier classification of science that is assigned to a corresponding category subheading. Here, all classifications are provided by WoS, as all journals and books included in WoS are categorized accordingly. In this study, an interdisciplinary science field is defined as a scientific outcome based on at least two subheadings, i.e. science categories.

In our WoS publication sample dataset, publications with technology- and science-based subheadings (LSB, TE, and PS) are used to maintain the consistency of the scientific fields. A total of 7,453,987 publications (from 10,138 journals) with 226 subjects are first collected over the reference period of 2012 to 2017. From this dataset, global interdisciplinary science publications are filtered, which gives us

publications (from 1,137 journals) with 172 subjects. Our final sample is restricted to publications that are classified as journal articles (doc_type = ‘Article’ and pub_type = ‘Journal’) without missing abstracts. Table 1 presents the basic descriptive statistics on the number of publications, subjects, and journals for each interdisciplinary science field included in our final sample. Among all the interdisciplinary sciences, PS-TE has the greatest number of publications, subjects, and journals, showing that it is the most active interdisciplinary science field. The increase in publications across all interdisciplinary science activities reflects the global trend of technology convergence, as more heterogeneous technologies and industrial fields are used together over time.
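The selection rules described above can be illustrated with a minimal sketch. The `Record` fields, the three-letter codes, and the decision to ignore non-science subheadings when labeling a field are assumptions made for illustration only; they are not the actual WoS export schema or the exact procedure used in this study.

```python
from dataclasses import dataclass

# Assumed short codes for the three science-based subheadings named in the text.
SCIENCE_SUBHEADINGS = {"LSB", "TE", "PS"}


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified view of one publication's metadata."""
    title: str
    year: int
    doc_type: str
    pub_type: str
    abstract: str
    subheadings: frozenset  # e.g. frozenset({"PS", "TE"})


def is_journal_article(rec):
    """Journal article with a non-empty abstract."""
    return (rec.doc_type == "Article"
            and rec.pub_type == "Journal"
            and bool(rec.abstract.strip()))


def interdisciplinary_field(rec):
    """Return a label such as 'PS-TE' when at least two science subheadings are listed."""
    cats = sorted(rec.subheadings & SCIENCE_SUBHEADINGS)
    return "-".join(cats) if len(cats) >= 2 else None


def period_of(year):
    """Map a publication year to one of the two three-year periods."""
    if 2012 <= year <= 2014:
        return "2012-2014"
    if 2015 <= year <= 2017:
        return "2015-2017"
    return None


def build_sample(records):
    """Group qualifying records by (interdisciplinary field, period)."""
    sample = {}
    for rec in records:
        field, period = interdisciplinary_field(rec), period_of(rec.year)
        if is_journal_article(rec) and field and period:
            sample.setdefault((field, period), []).append(rec)
    return sample
```

Sorting the codes alphabetically happens to reproduce the field labels used in Table 1 (LSB-TE, LSB-PS, PS-TE, LSB-PS-TE).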

Science category-subject co-occurrence network analysis

Science category-subject pair set. Prior to the science category-subject co-occurrence network analysis, a science category-subject co-occurrence pair set is constructed. In the interdisciplinary science dataset, a list of science category-subjects that are relevant to the category subheadings is assigned to each publication. Each science category-subject represents a node in the network, and nodes are connected by publications. To conduct co-occurrence network analysis, the combinations of category-subjects for each publication are transformed into a pair-form dataset for each interdisciplinary science field that defines the edges between nodes. We illustrate science category-subjects by signifying their categories with a capital letter (A, B, or C) and a number (

) to differentiate the science category-subjects within the categories. If publication X contains three science category-subjects A3, B6, and C9, it will have three rows of pair sets: A-B, B-C, and A-C. If publication Y contains three science category-subjects A1, A2, and B5, it will have two duplicate rows of interdisciplinary pair sets: A-B, A-B. Once the dataset is transformed, the numbers of science category-subject pairs are aggregated by counting the number of publications including such science category-subject pairs. The aggregated science category-subject pair set, therefore, presents the number of publications of science category-subject pairs in each interdisciplinary science in the respective period.

Table 1 Descriptive Statistics of Interdisciplinary Science Exploration Sets.

	Publications (2012-2014)	Subjects (2012-2014)	Journals (2012-2014)	Publications (2015-2017)	Subjects (2015-2017)	Journals (2015-2017)
LSB-TE	68,768	80	162	79,112	81	175
LSB-PS	115,499	67	228	120,161	67	248
PS-TE	345,520	85	584	414,010	86	637
LSB-PS-TE	25,447	43	40	25,805	43	43

Fig. 2 Science category-subject co-occurrence network. The Science category-subject co-occurrence network shows an example network of publication nodes, e.g., Publication 1, linked by listed subjects, e.g., A1.
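The pair-set transformation for publications X and Y can be sketched as follows. The convention that the leading letter of a label encodes the category is taken from the illustrative notation in the text; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def category_of(subject):
    """Toy notation from the text: the leading letter is the science category."""
    return subject[0]


def cross_category_pairs(subjects):
    """Return all subject pairs of one publication that span two different categories."""
    pairs = []
    for a, b in combinations(sorted(set(subjects)), 2):
        if category_of(a) != category_of(b):
            pairs.append((a, b))
    return pairs


def aggregate_pairs(publications):
    """Count, for each subject pair, the number of publications that contain it.

    publications: dict mapping a publication id to its list of category-subjects.
    """
    counts = Counter()
    for subjects in publications.values():
        counts.update(cross_category_pairs(subjects))
    return counts


publications = {"X": ["A3", "B6", "C9"], "Y": ["A1", "A2", "B5"]}
pair_counts = aggregate_pairs(publications)
```

Publication X yields three cross-category pairs, while publication Y yields only two, because the A1-A2 pair stays within a single category and is therefore not interdisciplinary.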

Science category-subject co-occurrence network analysis. Using subject pair sets, subject co-occurrence network analysis is conducted for interdisciplinary science fields in each period. A co-occurrence network is an effective method for analyzing the structural relationship between elements. A similar approach has been used with patent data for technology convergence analysis (Curran and Leker 2011; Kogler et al. 2017; Kim et al. 2018, 2019). In this regard, a co-occurrence network using publication data can provide greater understanding of how science category-subjects are being used and related to each other across interdisciplinary science fields. In a subject co-occurrence network, science category-subjects are used as nodes, and publications are used as edges. For the linkage rule, undirected and weighted networks are adopted. As shown in Fig. 2, science category-subjects are connected only if they were used in the same publication. For instance, subjects A and C have a total of two edges because they are used in publications 1 and 2.

Once the global network map is constructed for interdisciplinarity, the Eigenvector centrality (EIG) values of all nodes (in this network, science category-subjects) are measured. In this science category-subject co-occurrence network of interdisciplinary science, a science category-subject that is more important or influential can be regarded as a key science category-subject in an interdisciplinary science field, and those with a greater network value should be highlighted as they are the ones leading science category interdisciplinarity. Here, EIG measures the influence of network nodes beyond mere frequency counts by considering the centrality of connected nodes (West et al. 2013). For instance, a science category-subject connected to important science category-subjects is considered to have greater influence in the network. Rather than assuming equal importance, this measure differentiates the weight of edges by the importance of connected nodes. Unlike degree centrality, which focuses solely on the number of connections, EIG assesses a node’s importance by evaluating the significance of its connections. This approach captures the qualitative aspect of network relationships. Furthermore, while PageRank is specifically tailored for directed networks, EIG can also be applied effectively to undirected networks. In this respect, EIG can be used as an indicator for measuring the importance or influence of the emergent field interdisciplinarity (Heo and Lee 2019; Qian et al. 2017; Rapach et al. 2015). With EIG, a network index that measures the influence of a node by weighting each connection according to the centrality of the connected node (Bonacich 2007), the comparatively more important key science category-subjects can be isolated.

Fig. 3 Concept of growing and dominant interdisciplinary subjects. The graphs demonstrate how emerging science differs from dominant science as measured by Eigenvector centrality and the growth rate of Eigenvector centrality.
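For readers who wish to reproduce the EIG measure on a weighted, undirected pair set, a minimal power-iteration sketch is given below. It is an illustrative implementation written for this text, not the software used in the study; the toy edge weights are invented, and adding each node's previous score in every step is a standard device (assumed here) to keep the iteration from oscillating on bipartite-like networks.

```python
import math


def eigenvector_centrality(edge_weights, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of an undirected, weighted graph by power iteration.

    edge_weights: dict mapping (u, v) to the number of publications linking u and v.
    Returns a dict of scores normalized to unit Euclidean length.
    """
    neighbors = {}
    for (u, v), w in edge_weights.items():
        neighbors.setdefault(u, {})[v] = w
        neighbors.setdefault(v, {})[u] = w

    scores = {node: 1.0 for node in neighbors}
    for _ in range(max_iter):
        # Iterate with (A + I): same eigenvectors as A, but no oscillation.
        new = {
            node: scores[node] + sum(w * scores[m] for m, w in nbrs.items())
            for node, nbrs in neighbors.items()
        }
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {node: x / norm for node, x in new.items()}
        if max(abs(new[n] - scores[n]) for n in new) < tol:
            return new
        scores = new
    return scores


# Invented toy pair set: A1 and B1 co-occur in three publications.
edges = {("A1", "B1"): 3, ("A1", "B2"): 1, ("A2", "B1"): 1}
eig = eigenvector_centrality(edges)
```

In this toy network, A1 and B1 receive the highest scores because each is strongly tied to the other, which in turn is well connected.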

Using EIG, the conceptual framework of dominant and emerging science fields is proposed for the following purposes. First, by using EIG and its growth rate (EIG.GR), either dominant- or growing-sciences in terms of knowledge recombination can be determined. The threshold for dominant and growing interdisciplinary science is set to the top 10% of science category-subjects. Essentially, only those that are ranked in the top 10% in each measure are selected and named as dominant- and growing-sciences, respectively. Choosing the top 10% threshold for EIG and EIG.GR as the criterion for identifying dominant or emerging science subjects is a deliberate methodological decision. This threshold is designed to selectively highlight the most influential or rapidly evolving fields, accounting for the skewed distribution of scientific networks, where a few nodes accumulate the majority of connections. It allows for the identification of both established and emerging fields, reflecting the dynamic nature of scientific research. A conservative approach like this minimizes false positives due to statistical fluctuations, ensuring that only subjects with consistently high metrics are considered. Furthermore, setting a clear benchmark facilitates comparative analysis over time and across disciplines, providing a consistent method for tracking changes in the scientific landscape. This choice emphasizes the importance of both sustained influence and notable growth in determining the prominence of science subjects.

As illustrated in Fig. 3, if the EIG (or EIG.GR) value of a science category-subject falls within the top 10%, it is considered to be a dominant (or emerging) science. If the values of both EIG and EIG.GR are within the top 10%, then the science category-subject can be classified as both dominant and emerging, signifying not only its current influence but also a significant increase in its impact. Conversely, if neither value falls within the top 10%, the science category-subject is not considered either dominant or emerging. This allows us to focus on the specific list of publications that are more valuable in interdisciplinary science activity. It also reduces the computational cost of the text analysis by reducing the sample size. Rather than running text analysis on the whole sample, focusing on the selected publications, which can be assumed to have more potential and to be consistent in terms of science subjects, can improve the precision of our analysis. In this regard, the selected growing interdisciplinary science category-subjects can be used as a reference for potential ones in the future. Due to the path-dependent nature of knowledge, a strong tendency or preference to follow such a trajectory is often observed, especially in knowledge-intensive activities. In other words, either a present network position or current network growth is very likely to persist in the following period. This will be discussed in more detail with empirical findings in the following section.

Fig. 4 Process of BERTopic modeling. The process of BERTopic modeling involves transforming document data into vectorized data, reducing the dimensionality, organizing the data into clusters and topics.

Since the main interest of this study is exploring newly rising topics in interdisciplinary science fields, we focus on growing-sciences rather than dominant-sciences. For the following step, publications representing growing interdisciplinary science category-subjects are filtered.
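The classification rule of Fig. 3 and the subsequent filtering step can be sketched as follows. Two points are assumptions made for illustration: the growth rate is computed as the relative change in EIG between the two periods, and dominance is judged on the later period's EIG. The function names and toy values are hypothetical.

```python
import math


def top_share(values, share=0.10):
    """Return the keys ranked within the top `share` of all keys (at least one key)."""
    count = max(1, math.ceil(round(len(values) * share, 9)))
    ranked = sorted(values, key=values.get, reverse=True)
    return set(ranked[:count])


def classify_subjects(eig_first, eig_second, share=0.10):
    """Label each subject as dominant, emerging, both, or neither."""
    # Assumed definition of EIG.GR: relative change between the two periods,
    # computed only for subjects with a positive EIG in the first period.
    growth = {
        s: (eig_second[s] - eig_first[s]) / eig_first[s]
        for s in eig_second
        if eig_first.get(s, 0.0) > 0.0
    }
    dominant = top_share(eig_second, share)
    emerging = top_share(growth, share) if growth else set()

    labels = {}
    for s in eig_second:
        if s in dominant and s in emerging:
            labels[s] = "dominant and emerging"
        elif s in dominant:
            labels[s] = "dominant"
        elif s in emerging:
            labels[s] = "emerging"
        else:
            labels[s] = "neither"
    return labels


def filter_growing(publications, labels):
    """Keep publications that list at least one emerging (growing) subject."""
    growing = {s for s, label in labels.items() if "emerging" in label}
    return {pid: subs for pid, subs in publications.items() if growing & set(subs)}
```

With ten subjects and a 10% threshold, exactly one subject is selected on each measure, so a subject can be dominant without being emerging and vice versa.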

Embedded topic modeling

BERTopic. To derive topics from the documents of the growing-sciences in each interdisciplinary science field, the BERTopic model is used. BERT (Bidirectional Encoder Representations from Transformers) is a deep learning-based language model built on the Transformer architecture and developed by Google (Devlin et al. 2019). As presented in Fig. 4, BERTopic is an integrated topic modeling technique that incorporates BERT embeddings, Uniform Manifold Approximation and Projection (UMAP), Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), and a class-based Term Frequency-Inverse Document Frequency (c-TF-IDF) (Grootendorst 2022).

The first step is embedding vectorization, which transforms target documents into vectors. Unlike conventional topic modeling methods that rely on Bag-of-Words (BoW) approaches, which focus solely on the frequency of terms, BERTopic utilizes embedding vectors. These embeddings represent documents in a space that, while lower in dimension than the vast potential vocabulary of BoW, captures the semantic information inherent in the text. This allows for a higher contextual understanding of documents. By leveraging pre-trained embeddings, BERTopic enables the analysis of documents with nuanced insights into their contextual meanings, surpassing the limitations of traditional encoding vectorization methods. Here, we utilized the default text representation model, “all-MiniLM-L6-v2”, for our analysis. This model, designed as an all-purpose model, converts sentences and paragraphs into a 384-dimensional dense vector space. It is versatile and suitable for tasks like clustering or semantic search, especially for English-language text. Compared to the “all-mpnet-base-v2” model, which is known to provide the best quality, it operates five times faster while still offering good quality, and its effectiveness has led to its adoption in various relevant studies (Samsir et al. 2023; Wang et al. 2023).

The second step in BERTopic involves dimensionality reduction. This is crucial because clustering algorithms, which are integral to topic modeling, perform better with lower-dimensional data. The primary challenge addressed here is the ‘curse of dimensionality,’ where high-dimensional spaces can negatively impact the efficiency and effectiveness of clustering algorithms. By reducing the dimensionality of the embedding space, BERTopic mitigates this issue, facilitating more coherent and accurate topic clusters. For this reason, the UMAP algorithm is used to reduce the complexity of the embedding vector while preserving its essential structure. Assuming that high-dimensional data lies on a lower-dimensional manifold, UMAP maps highly complex data onto a simpler space efficiently by preserving comparative distance and density, which makes it easier to identify clusters of similar documents (McInnes et al. 2016).

The following step is document clustering using HDBSCAN, which generates clusters based on the density of data points by using the hierarchical tree method. One of the strengths of HDBSCAN is that it can effectively identify and handle noise, which can help to derive more meaningful clusters. In addition, the combination of UMAP and HDBSCAN shows better performance in text clustering (Asyaky and Mandala 2021), and the clustering results can be modified by adjusting the hyperparameters regarding cluster generation.
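The embedding, dimensionality-reduction, and clustering steps described so far can be wired together with the open-source `bertopic` package, whose built-in representation step then applies c-TF-IDF. The sketch below is an assumption-laden illustration rather than the configuration used in this study: the UMAP and HDBSCAN values mirror what we understand to be the library's documented defaults, `random_state=42` is an arbitrary choice for reproducibility, and `build_topic_model` is a hypothetical helper name. The third-party packages are imported inside the function so that the definition itself has no dependencies.

```python
def build_topic_model(min_topic_size=10, n_gram_range=(1, 1), nr_topics=None):
    """Assemble a BERTopic model from the components named in the text.

    Requires bertopic, sentence-transformers, umap-learn, hdbscan and
    scikit-learn, which are imported lazily when the function is called.
    """
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    # Step 1: 384-dimensional sentence embeddings.
    embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
    # Step 2: reduce the embeddings to a few dimensions before clustering.
    umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                      metric="cosine", random_state=42)
    # Step 3: density-based clustering; unassigned documents become outliers.
    hdbscan_model = HDBSCAN(min_cluster_size=min_topic_size, metric="euclidean",
                            cluster_selection_method="eom", prediction_data=True)
    # Tokenization for the topic representation (unigrams up to n-grams).
    vectorizer_model = CountVectorizer(ngram_range=n_gram_range, stop_words="english")

    return BERTopic(embedding_model=embedding_model, umap_model=umap_model,
                    hdbscan_model=hdbscan_model, vectorizer_model=vectorizer_model,
                    nr_topics=nr_topics)


# Usage, assuming `abstracts` is a list of abstract strings:
# topic_model = build_topic_model(min_topic_size=130, n_gram_range=(1, 2), nr_topics=50)
# topics, probabilities = topic_model.fit_transform(abstracts)
```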

The last step is topic generation with c-TF-IDF. c-TF-IDF is an adaptation of TF-IDF, which is designed to capture the representative terms from documents for each topic. TF-IDF is known as an effective measure for finding representative terms by combining term frequency and inverse document frequency (Salton and Buckley 1988). Under the assumption that a representative term of a document should be a distinctive one that represents the document, this measure simply captures the terms that not only occur more frequently in a document but also occur less frequently in other documents. By using c-TF-IDF (Eq. 1), the importance of a term within a specific class can be found.
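As a worked illustration of the class-based weighting, the sketch below follows the form given in Grootendorst (2022), W(t, c) = tf(t, c) · log(1 + A / tf(t)), where tf(t, c) is the frequency of term t in class c, tf(t) is its frequency across all classes, and A is the average number of words per class. The function names and the toy documents are invented, and the `bertopic` library's own implementation may differ in details such as normalization.

```python
import math
from collections import Counter


def c_tf_idf(class_docs):
    """Compute class-based TF-IDF weights.

    class_docs: dict mapping a topic label to a list of tokenized documents.
    Returns a dict: topic label -> {term: weight}.
    """
    # Term frequencies per class (all documents of a class are pooled).
    tf = {
        c: Counter(token for doc in docs for token in doc)
        for c, docs in class_docs.items()
    }
    # Term frequencies across all classes.
    total_tf = Counter()
    for counts in tf.values():
        total_tf.update(counts)
    # A: average number of words per class.
    avg_words = sum(total_tf.values()) / len(tf)

    return {
        c: {t: n * math.log(1 + avg_words / total_tf[t]) for t, n in counts.items()}
        for c, counts in tf.items()
    }


def top_terms(weights, k=3):
    """Return the k highest-weighted terms of one class."""
    return sorted(weights, key=weights.get, reverse=True)[:k]


# Invented toy corpus with two classes.
toy = {
    "wood": [["wood", "lignin", "water"], ["wood", "strength"]],
    "water": [["water", "membrane", "adsorption"], ["water", "membrane"]],
}
weights = c_tf_idf(toy)
```

In the toy corpus, "water" occurs in both classes and is therefore down-weighted, so that "membrane" rather than "water" ranks first in the second class.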

Qualitative validation of results. Once the interdisciplinary science maps have been analyzed, a list of representative publications for each interdisciplinary category can be generated based on the topics defined through BERTopic. Reliance on machine learning, however, can lead to misclassification (Lyutov et al. 2021), so we examine the results of the topic modeling to identify where the newly emergent topics stem from and to describe them. Many recent studies that apply BERTopic have performed qualitative or manual validation of the results (Balcı et al. 2023; Capra 2024; de Lima et al. 2023; Kasperiuniene et al. 2020; Wang et al. 2023). Using qualitative analysis, we review the results of the BERTopic process to validate them. First, the topic keywords are considered to determine whether they provide a common theme for the articles under the topics. A qualitative approach is used to examine the topics to identify characteristics of emergent topics. After BERTopic is performed on the datasets, a list of topic keywords and representative articles emerges through the unsupervised process, e.g. topic-1. Additionally, traceability requires parsimony: the representations should not be unnecessarily complex, so that even non-experts are able to interpret them (Rafols et al. 2010). The results are compared to check that they are rational or “make sense” to non-experts. Additionally, the journal lists are evaluated to discern the characteristics of the topics. Nonsensical topics would be expected to be random or not to fit our definition of global interdisciplinarity.

Fig. 5 Subject co-occurrence network analysis result. a LSB-TE. b LSB-PS. c PS-TE. d LSB-PS-TE. Note: The growing interdisciplinary science subjects are in bold.

Case Study on Interdisciplinary Science in the Web of Science

Preparing the interdisciplinary science dataset. Following previous bibliometric studies using topic modeling techniques (Suominen and Toivanen 2016; Velden et al. 2017; Yau et al. 2014), we use the Web of Science Core Collection (WoS), which is a database of peer-reviewed scholarly journals published worldwide. The WoS database provides the necessary metadata required for pre-processing, e.g. selecting peer-reviewed journal articles.

Results of science category-subject co-occurrence network analysis. In this section, the results of science category-subject co-occurrence network analysis are presented. Figure 5 illustrates the dominant- and growing-interdisciplinary science using the conceptual framework presented in Fig. 3, and Table 2 presents the full list of dominant- and growing-sciences. All nodes represent the science category-subjects included in each interdisciplinary science field, and dominant- (located further to the right on the x-axis) and growing-science (located higher on the y-axis) are labeled. One interesting point is that a clear distinction between dominant- and growing-interdisciplinary science is observed in all cases. Considering the path-dependent nature of knowledge, the dominant-sciences are likely to remain dominant in the following period. The prediction of key emergence trends, however, focuses on new interdisciplinary science category-subject merging that is expected to become more influential, rather than on those that are already well known. The gap between the two types of science category-subjects justifies our approach of distinguishing science category-subjects that are promising for the future from those that already prevail and, more importantly, indicates that focusing on the emerging topics better fits the purpose of this research.

This study focuses on the growing influence of interdisciplinary science to investigate the key topics that are likely to rise in the near future. In this regard, the publications including growing-interdisciplinary science are used for the following step of analysis. As shown in Table 2 and Fig. 6, EIG values of growing cross-domain science category-subjects in the following period tend to be greater than those of other science fields. This reflects that growing interdisciplinary science category-subjects in the current period have the greatest increases in the following period. With few exceptions, these subjects are different from those in the dominant-science fields. For BERTopic modeling, therefore, a set of cross-domain publications including growing-science is used.

Unsupervised classification of the emergent interdisciplinary science topics

BERTopic setting. While conventional topic modeling approaches consider the number of topics as an important hyperparameter to run analysis, BERTopic does not necessarily require it because UMAP and HDBSCAN ease the optimization of the clustering process, and automatically generate the list of topics. However, setting the number of topics is still important because a fully automated learning process may end up with an incomprehensible result. For instance, if BERTopic is conducted with its default settings and HDBSCAN optimization algorithms, it will automatically generate a list of topics, but this does not guarantee that the result is also acceptable in terms of application and obtaining insights.

Table 2 List of dominant and growing science category-subjects in interdisciplinary science fields.

Science	Dominant science category-subjects	Growing science category-subjects
LSB-TE	Environmental Sciences	Forestry
	Engineering, Environmental	Materials Science, Textiles
	Green & Sustainable Science & Technology	Instruments & Instrumentation
	Energy & Fuels	Pharmacology & Pharmacy
	Engineering, Chemical	Green & Sustainable Science & Technology
	Ecology	Medicine, Research & Experimental
	Public, Environmental & Occupational Health	Engineering, Environmental
	Radiology, Nuclear Medicine & Medical Imaging	Ecology
LSB-PS	Chemistry, Applied	Neurosciences
	Biochemistry & Molecular Biology	Health Care Sciences & Services
	Food Science & Technology	Immunology
	Chemistry, Analytical	Polymer Science
	Biochemical Research Methods	Paleontology
	Chemistry, Multidisciplinary	Microbiology
	Chemistry, Medicinal	Fisheries
PS-TE	Materials Science, Multidisciplinary	Engineering, Aerospace
	Physics, Applied	Green & Sustainable Science & Technology
	Nanoscience & Nanotechnology	Engineering, Marine
	Chemistry, Physical	Geography, Physical
	Physics, Condensed Matter	Water Resources
	Chemistry, Multidisciplinary	Engineering, Mechanical
	Engineering, Electrical & Electronic	Acoustics
	Energy & Fuels	Engineering, Ocean
	Materials Science, Coatings & Films	Automation & Control Systems
LSB-PS-TE	Environmental Sciences	Remote Sensing
	Water Resources	Imaging Science & Photographic Technology
	Engineering, Environmental	Geosciences, Multidisciplinary
	Computer Science, Interdisciplinary Applications	Crystallography
	Statistics & Probability

Note: The list of science category-subjects is arranged in descending order.

Fig. 6 Comparison of the EIG in following period between Growing-Interdisciplinary Science category-subjects and others. Note: On average, Eigenvector centrality in the following period of Growing-Interdisciplinary Science category-subjects (0.348) is higher than others (0.093).

Table 3 Hyperparameter testing of BERTopic.

	Publications	n-gram range	Number of topics	Minimum topic size
Growing-science of LSB-TE	26,164	unigram, bigram, or trigram	50-1000	130-780
Growing-science of LSB-PS	10,577	unigram, bigram, or trigram	50-1000	50-300
Growing-science of PS-TE	49,042	unigram, bigram, or trigram	50-1000	240-1440
Growing-science of LSB-PS-TE	904	unigram, bigram, or trigram	50-1000	5-50

For this reason, the three hyperparameters of n-gram range, number of topics, and minimum topic size are tested within ranges to find the best BERTopic model results (Table 3). The n-gram range determines whether terms should cover unigrams, bigrams, or trigrams; the number of topics sets the initial number of topics when running BERTopic; and the minimum topic size sets the minimum number of documents that each topic should contain. While the first two hyperparameters were tested over the same range in every case (n-gram range: unigram, bigram, trigram; number of topics: 5-1000), minimum topic size values proportional to the total number of publications were used. Minimum topic size values can be strongly affected by the size of the document set, which may lead to topics that are too broad or too narrow in different cases. This strongly influences the creation of outlier topics and can produce an unexplainable number of topics. Thus, applying a proportional minimum topic size can help us minimize the size of outlier topics and maintain an explainable number of topics. For this reason, an integer value is used for the minimum topic size for each case that represents

of total publications. To help us consider a combination of different hyperparameters with wide ranges, a random search method is used to find an optimized parameter with random combinations, limited to no more than 100 iterations.
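The random search over the three hyperparameters can be sketched as follows. The scoring callback `score_fn` is a hypothetical placeholder: in practice it would fit a BERTopic model with the sampled configuration and return a score for which lower is better. The n-gram tuples follow the usual `(min_n, max_n)` convention, which is assumed here.

```python
import random

# Assumed encoding of the three tested n-gram settings.
N_GRAM_RANGES = [(1, 1), (1, 2), (1, 3)]  # unigram, bigram, trigram


def sample_config(rng, topics_range, min_size_range):
    """Draw one random hyperparameter combination from the given integer ranges."""
    return {
        "n_gram_range": rng.choice(N_GRAM_RANGES),
        "nr_topics": rng.randint(*topics_range),
        "min_topic_size": rng.randint(*min_size_range),
    }


def random_search(score_fn, topics_range, min_size_range, n_iter=100, seed=0):
    """Return (best_config, best_score) over n_iter random draws; lower is better."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_iter):
        config = sample_config(rng, topics_range, min_size_range)
        score = score_fn(config)
        if best is None or score < best[1]:
            best = (config, score)
    return best
```

For example, the growing-science set of LSB-TE in Table 3 would correspond to a call such as `random_search(score_fn, (50, 1000), (130, 780))`.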

For each iteration, the information entropy value is measured (Eq. (2)) (MacKay 2003). By finding cases with uneven distribution of words in the topic, a set of topics with explicit semantic expression can be found (Wang et al. 2023). Known as a measurement of uncertainty, information entropy provides a means to determine whether topics can be clearly distinguished. In this regard, a model with the lowest information entropy value (Eq. (2)) is selected as the best model.
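A minimal sketch of the selection criterion is given below, using the standard Shannon form H = -Σ p_i log p_i (MacKay 2003). Three choices are assumptions for illustration: the term weights of a topic are normalized into a probability distribution, base-2 logarithms are used, and a model is summarized by the mean entropy across its topics. The function names are hypothetical.

```python
import math


def shannon_entropy(weights):
    """Entropy in bits of a non-negative weight vector normalized to sum to one."""
    total = sum(weights)
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probabilities)


def model_entropy(topic_term_weights):
    """Mean entropy across topics (assumed aggregation).

    topic_term_weights: dict mapping a topic to {term: weight}.
    """
    entropies = [
        shannon_entropy(list(term_weights.values()))
        for term_weights in topic_term_weights.values()
    ]
    return sum(entropies) / len(entropies)
```

A topic whose weight is concentrated on a few terms has low entropy, whereas a topic with evenly spread weights has high entropy, which is why the lowest value is preferred.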

BERTopic results. Once the dataset has been divided into different interdisciplinary sciences, the BERTopic process identifies articles that have similar topics, limited to the number of topics defined. The topics are defined through an unsupervised algorithm that identifies common lists of keywords that describe the topics.

Table 4 presents the groups of topics that appear in the greatest number of articles for each pairing of the subheadings: LSB-TE, LSB-PS, PS-TE, and LSB-PS-TE. The list of topic keywords identified in the interdisciplinary text set is used to define the topics. Outlier groups are used to prevent the formation of nonsensical or isolated topic groups.

Qualitative validation of results. Following recent studies that apply BERTopic (Balcı et al. 2023; Capra 2024; de Lima et al. 2023; Kasperiuniene et al. 2020; Wang et al. 2023), this study performed qualitative or manual validation of the results. While topic modeling may allow for the analysis of a large corpus of data, the results of the topic modeling should remain decipherable to non-experts (Rafols et al. 2010). Thus, we perform a small-scale, qualitative analysis to verify that this condition holds.

While all articles are matched with the topic that is the most likely fit, not all articles that fall under the topic are equally representative of the topic. The representative articles are identified through the topic modeling technique, which means that they have the highest probability of matching the topic. The top 3 representative articles that fit the topics defined through topic modeling are provided in Table 5. All of the representative articles can be readily fit with the topics with which they are matched.

When considering the LSB-TE case, “Mechanical Properties and Composition of Natural Fibrous Materials” is most represented by articles LSB-TE-0-A through C. The article titles contain the phrases that are recognizably appropriate for the emergent topic: “tree bark,” “insulation material,” “manufacturing,” “green-glued plywood panel,” “resistance of thermally modified,” “under extreme pressure,” and “ash wood.” Moreover, the journal titles are also representative of the topic: Forest Products Journal and European Journal of Wood and Wood Products (appears twice). Similar patterns are found for the other emergent topics listed in Table 5. Therefore, we find that the emergent topics that have been defined represent an easily recognizable theme. More broadly, many of the emergent topics are related to green technologies and sciences and to a lesser extent health-related technologies.

The journals with the greatest number of emergent interdisciplinary topic publications can be identified from the list of identified topics (Table 6). Yet, the journals in which the topics appear are clustered among a small portion of all publications; the distribution of publications with emergent interdisciplinary topics is skewed towards a small share of all journals in the dataset. Half of all publications were published in the top quintile of all journals for each interdisciplinary category group: 14th percentile (LSB-TE),

percentile (LSB-PS), 10th percentile (PS-TE), and 18th percentile (LSB-PS-TE). Additionally, when considering the top journals that emerge from the ranking of interdisciplinarity results, the categories become clearer when considering the emergent topics. For PS-TE, the emergent topics can only be seen in Desalination and Water Treatment and International Journal of Hydrogen Energy. The other titles are suggestive of the science and technologies involved: physical chemistry, sensors, and materials.
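One way to compute such a concentration figure is sketched below. It assumes that the reported percentile is the smallest share of journals, ranked by publication count, that together account for half of the publications; this reading, the function name, and the toy counts are assumptions for illustration.

```python
def journal_share_for_half(pubs_per_journal, target=0.5):
    """Smallest share of journals (largest first) covering `target` of all publications.

    pubs_per_journal: dict mapping a journal name to its number of publications.
    """
    counts = sorted(pubs_per_journal.values(), reverse=True)
    total = sum(counts)
    running = 0
    for rank, n in enumerate(counts, start=1):
        running += n
        if running >= target * total:
            return rank / len(counts)
    return 1.0


# Invented toy counts: one journal alone holds half of the publications.
toy_counts = {"J1": 50, "J2": 20, "J3": 10, "J4": 10, "J5": 10}
share = journal_share_for_half(toy_counts)
```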

Discussion and conclusion

As science continues to expand its research output, the science of science emergence provides an opportunity to understand where new knowledge-the source of innovation-originates from by examining global interdisciplinarity. Most previous studies have focused on breakthroughs or identifying popular directions within narrow fields of study measured by frequency size. These past approaches apply the logic of identifying patterns of frequency-based dominant topics within a specific field of science. In contrast, the present study provides an alternative perspective in understanding the science of science emergence with a focus on the influence of the changing boundaries of conjoining science across categories. The main contributions of our research are (i) to expand the definition of interdisciplinarity to include global

Table 4 Topics and keyword lists for science category co-occurrence pairs

Science Categories	Topic	Keywords	Generated Labels	Number of Documents
LSB-TE (26,164 publications)	Outlier	cutting, machining, leather, grinding, radon, asbestos, lubrication, tanning, lead, mql	–	272
	0	wood, lignin, properties, strength, mechanical, bamboo, moisture, cellulose, modulus, specimens	Mechanical Properties and Composition of Natural Fibrous Materials	22314
	1	water, study, energy, results, environmental, waste, model, using, production, based	Sustainable Environmental Technologies and Resource Management	1339
	2	patients, group, It, groups, clinical, control, cancer, expression, cells	Cancer Biomarker Expression in Clinical Patient Groups	2239
LSB-PS (10,577 publications)	Outlier	brain, fnirs, imaging, optical, neurons, cortex, cortical, neural, attribution, stimulation	–	176
	0	species, water, sea, early, data, late, marine, formation, climate, new	Marine Biodiversity and Climate Impact Studies	5129
	1	model, data, models, proposed, methods, trial, regression, method, simulation, clinical	Clinical Trial Modeling and Simulation Techniques	397
	2	showed, activity, properties, protein, cells, chitosan, cell, ph, acid, drug	Chitosan Bioactivity and Drug Delivery Applications	4785
PS-TE (49,042 publications)	Outlier	forum, journal, views, readership, essays, speculation, editorial, provoking, asce, founded	–	11
	0	adsorption, removal, membrane, ph, process, concentration, water, treatment, mg, acid	Adsorption and Membrane Processes for Water Treatment	10,873
	1	heat, model, flow, results, data, based, water, method, temperature, transfer	Heat Transfer Modeling and Analysis in Fluid Systems	38,158
LSB-PS-TE (905 publications)	0	data, study, land, area, using, flood, spatial, based, model, used	Flood Risk Assessment and Spatial Modeling	334
LSB-PS-TE (905 publications)	1	binding, molecular, protein, energy, interactions, docking, structure, dynamics, molecules, results	Protein-Molecule Docking and Interaction Dynamics	570

Table 5 Representative articles for each interdisciplinary emergent topic.
Science Categories	Emergent Topic	Representative Article
LSB-TE	Mechanical Properties and Composition of Natural Fibrous Materials	0-1: Kain et al. (2015)
		0-2: Lavalette et al. (2016)
		0-3: Candelier et al. (2017)
	Sustainable Environmental Technologies and Resource Management	1-1: Egle et al. (2015)
		1-2: Palma-Rojas et al. (2017)
		1-3: Harijani et al. (2017)
	Cancer Biomarker Expression in Clinical Patient Groups	2-1: Liu et al. (2017)
		2-2: Liu and Li (2017)
		2-3: Qi et al. (2017)
LSB-PS	Marine Biodiversity and Climate Impact Studies	0-1: Chen et al. (2016)
		0-2: Bataille et al. (2016)
		0-3: Lowery et al. (2017)
	Clinical Trial Modeling and Simulation Techniques	1-1: French et al. (2016)
		1-2: Luo et al. (2016)
		1-3: Lu (2017)
	Chitosan Bioactivity and Drug Delivery Applications	2-1: Zhao et al. (2016)
		2-2: Berah et al. (2017)
		2-3: Gomes et al. (2017)
PS-TE	Adsorption and Membrane Processes for Water Treatment	0-1: Ahmed (2016)
		0-2: Zhang et al. (2016)
		0-3: Saadati et al. (2017)
	Heat Transfer Modeling and Analysis in Fluid Systems	1-1: Colombo and Fairweather (2016)
		1-2: Wu et al. (2017)
		1-3: Daabo et al. (2017)
LSB-PS-TE	Flood Risk Assessment and Spatial Modeling	0-1: Ding et al. (2017)
		0-2: Chian and Wilkinson (2015)
		0-3: Rizeei et al. (2016)
	Protein-Molecule Docking and Interaction Dynamics	1-1: Shamim et al. (2015)
		1-2: Khan et al. (2017)
		1-3: Bobovská et al. (2016)
Note: The representative articles are preceded by the topic number and a number index, e.g., “O.

domain-crossing science categories, (ii) to use Eigenvector centrality as a measure of influence on emergent topics, and (iii) to demonstrate the use of embedded topic modeling over a dataset that represents a global science map. This study provides an early foray into applying unsupervised classification using BERTopic modeling on interdisciplinary science datasets. It is one of the few contemporary studies that apply text-embedding-based topic modeling techniques to the science of science emergence, and the only one to focus on the influence of existing science topics on emergence.

Furthermore, the present investigation provides a simple model to achieve the desired analysis and, in addition, demonstrates that the originating subjects of interdisciplinary topics can be identified using embedded topic modeling. Using the Schumpeterian definition of knowledge creation based on recombination processes, the model examines the intersection of interdisciplinary sciences to identify the most influential topics related to emergent scientific knowledge based on science topics that are projected onto a global science map. The results can be used to identify trend profiles of the interdisciplinary sources of emergent topics over time.

Since dominant science is subject to the bias of size and canonical fields, emergent science based on the influence of co-occurring science domains provides an alternative measure. The Eigenvector centrality value can be used as a measure for the growth of interdisciplinarity that is different from approaches that focus on dominant science in a co-occurrence network of interdisciplinary emergence. Dominant science subjects are different from the topics related to growing interdisciplinary science, differentiating the results of this study from prior studies that emphasize frequency-based, dominant science. The approach that we used allows us to retain contextual knowledge in text analysis. Nonetheless, those science subjects that appear in both emergent growing and dominant interdisciplinary sciences, such as “Green & Sustainable Science & Technology”, may indicate greater influence on research for society and have greater potential for applications.
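To make the contrast between influence-based and frequency-based measures concrete, the following is a minimal sketch of Eigenvector centrality on a category co-occurrence network. It is not the authors' implementation: the records are invented toy data, the function names `cooccurrence_weights` and `eigenvector_centrality` are hypothetical, and the shifted power iteration is one common way to compute the leading eigenvector, assumed here for illustration.

```python
from collections import defaultdict
from itertools import combinations
import math


def cooccurrence_weights(records):
    """Count how often each pair of categories is listed on the same record."""
    weights = defaultdict(float)
    for categories in records:
        for a, b in combinations(sorted(set(categories)), 2):
            weights[(a, b)] += 1.0
    return dict(weights)


def eigenvector_centrality(weights, iterations=200, tol=1e-10):
    """Leading eigenvector of the symmetric weighted adjacency matrix."""
    neighbours = defaultdict(dict)
    for (a, b), w in weights.items():
        neighbours[a][b] = w
        neighbours[b][a] = w
    nodes = sorted(neighbours)
    score = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Iterating with (A + I) keeps the same eigenvectors as A but
        # avoids oscillation on bipartite-like graphs.
        new = {
            n: score[n] + sum(w * score[m] for m, w in neighbours[n].items())
            for n in nodes
        }
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in nodes) < tol:
            return new
        score = new
    return score


# Toy records (invented): each lists the categories assigned to one article.
records = [
    ["Ecology", "Water Resources"],
    ["Ecology", "Engineering, Environmental"],
    ["Water Resources", "Engineering, Environmental"],
    ["Engineering, Environmental", "Green & Sustainable Science & Technology"],
    ["Ecology", "Water Resources", "Engineering, Environmental"],
]

weights = cooccurrence_weights(records)
centrality = eigenvector_centrality(weights)

# Rank categories by influence in the co-occurrence network.
for name, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

In this sketch a category scores highly when it co-occurs with other well-connected categories, not merely when it appears often. Comparing such scores between two periods would give one possible reading of "growing" interdisciplinary influence.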

This study suggests that identifying emergent topics may help us better understand how to direct and use innovative research. This study detected that green- and health-related topics are emergent across many of the global interdisciplinary science categories. As global challenges emerge, more efficient and effective means to identify emergent research to address them are necessary; yet, it has become increasingly difficult to meet this aim (Petersen et al. 2021). Bloom et al. (2020) posit that if firms are shifting towards defensive research activities, then government policy must reconsider how research is publicly funded. In order to increase economic productivity, the sources of (and barriers to) innovation need to be detected within sectors and individuals. Although this may help when focusing on economic-related challenges, additional measures of research productivity may be needed when considering socially oriented innovation demands. Thus, an alternative explanation for the decline in science productivity is that social innovation, rather than economic imperatives, may be driving research.

Although the present study has departed from prior studies in several aspects, further research is needed to address its limitations. First, the number of topics that were automatically generated was small, which means that there are likely additional emergent topics that can be identified in follow-up studies. Nevertheless, the current investigation adopted a conservative approach to ensure that the topics identified were meaningful, especially when considering that the distributions are highly skewed. Future research should also consider how to refine the level at which emergent topics are still acceptably defined, e.g., recursive clustering on large-scale bibliometric data (cf. Mejia and Kajikawa 2020), while balancing the diversity of domains and similarity of emergent topics. Additionally, the NLP approach adopted here requires a comparatively large amount of computing power, which, in turn, might pose a challenge for universal day-to-day applications and policy purposes.

Another limitation is that our data are constrained to scientific journal articles in the WoS. Not all innovations, especially social innovations, may be derived from science and technology fields. This approach may also ignore disciplines that tend to produce other types of publications. A broader approach that considers these types of interdisciplinarity may provide alternative sources for identifying social innovation. Lastly, while this study focused on specific characteristics of emergence defined through interdisciplinarity in the WoS, future research assessments should “consider the value and impact of all research outputs” and “consider a broad range of impact measures,” as stated in the San Francisco Declaration on Research Assessment (Cagan 2013). Rather than redefine emergence through science maps, this study aimed to provide an alternative perspective on emergence.

The science of science can link existing knowledge reservoirs for technology development, especially as global challenges influence the direction of science emergence that can be applied to the innovation of new technologies. A better understanding of the existing topics that are cross-domain and, as such, generate new innovative outcomes and solutions can help to apply the science of science to applicable and effective STI policy initiatives that incorporate social innovation objectives as well.

Table 6 Top 10 journals by interdisciplinary category pairs.

Interdisciplinary categories	Journal Title	Number of documents	Share
LSB-TE	Journal of Cleaner Production	5589	21.4%
	Environmental Science & Technology	4524	17.3%
	Journal of Hazardous Materials	2417	9.2%
	Biomedical Research-India	1933	7.4%
	Ecological Engineering	1615	6.2%
	Waste Management	1368	5.2%
	Environmental Modeling & Software	735	2.8%
	Environmental Progress & Sustainable Energy	652	2.5%
	Resources Conservation and Recycling	541	2.1%
	Clean Technologies and Environmental Policy	512	2.0%
LSB-PS	International Journal of Biological Macromolecules	3347	31.6%
	Paleogeography Paleoclimatology Paleoecology	1257	11.9%
	Biomacromolecules	1250	11.8%
	ICES Journal of Marine Science	698	6.6%
	Cretaceous Research	586	5.5%
	Marine and Freshwater Research	501	4.7%
	Statistical Methods in Medical Research	399	3.8%
	Journal of Water and Health	282	2.7%
	Paleoceanography	268	2.5%
	Food and Agricultural Immunology	259	2.4%
PS-TE	Desalination and Water Treatment	5622	11.5%
	Applied Thermal Engineering	5040	10.3%
	International Journal of Heat and Mass Transfer	3791	7.7%
	ACS Sustainable Chemistry & Engineering	2468	5.0%
	Journal of Hydrology	2206	4.5%
	Advances in Mechanical Engineering	1931	3.9%
	Ocean Engineering	1639	3.3%
	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing	1447	3.0%
	Combustion and Flame	1093	2.2%
	Ultrasonics Sonochemistry	1089	2.2%
LSB-PS-TE	Journal of Molecular Graphics & Modeling	570	63.0%
	Geocarto International	212	23.4%
	Natural Hazards Review	115	12.7%
	Geocarto International	8	0.9%

Data availability

The data that support the findings of this study are available from the Web of Science but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Web of Science.

Received: 22 June 2023; Accepted: 12 April 2024;
Published online: 10 May 2024

Notes

1 For clarity, in this paper we refer to ‘convergence’ as technological convergence with respect to the realization of new technologies unless otherwise stated.

2 clarivate.libguides.com/c.php?g
3 https://www.sbert.net/docs/pretrained_models.html.

4 Here, the symbols denote, in order, the term, the class, the frequency of the term extracted from the class, the total number of terms from the class, and the total number of documents.
5 Full list of science classifications: https://support.clarivate.com/ScientificandAcademicResearch/s/article/Web-of-Science-List-of-Subject-Classifications-for-All-Databases?language=en_US.
6 The following prompt was used with ChatGPT (GPT-4): “I have topic that contains the scientific publications related to [“Name of Interdisciplinary Science”]. The topic is described by the following keywords: [“List of keywords”] Based on the above information, can you give a short label of the topic?”

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1G1A1006464) and by Handong Global University Research Grants (No. 202300710001). Dieter F. Kogler would like to acknowledge funding from the European Research Council (https://erc.europa.eu/) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 715631, ERC TechEvo). Further, the authors, Keungoui Kim & Dieter F. Kogler, would also like to acknowledge funding from the Science Foundation Ireland (SFI; https://www.sfi.ie/) under the SFI Science Policy Research Programme (grant agreement No. 17/SPR/5324, SciTechSpace). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author contributions

The authors confirm contribution to the paper as follows: Conception or design of the work: Keungoui Kim, Sira Maliphol; Acquisition, analysis, or interpretation of data: Keungoui Kim, Dieter F. Kogler, Sira Maliphol; Creation of new software used in the work: Keungoui Kim; Drafted the work or substantively revised it: Keungoui Kim, Dieter F. Kogler, Sira Maliphol; Corresponding author: Sira Maliphol, correspondence to: sira.maliphol@sunykorea.ac.kr

Competing interests

The authors declare no competing interests.

Ethical approval

Ethical approval was not required as the study did not involve human participants.

Informed consent

Informed consent was not required as the study did not involve human participants.

Additional information

Correspondence and requests for materials should be addressed to Sira Maliphol.
Reprints and permission information is available at http://www.nature.com/reprints
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
© The Author(s) 2024

School of Applied Artificial Intelligence, Handong Global University, Pohang, South Korea. Spatial Dynamics Lab, School of Architecture, Planning & Environmental Policy & Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland. Dept. of Technology & Society, the State University of New York, Songdo, South Korea. email: sira.maliphol@sunykorea.ac.kr