GCL_FCS30: a global coastline dataset with 30-m resolution and a fine classification system from 2010 to 2020

Journal: Scientific Data, Volume: 12, Issue: 1
DOI: https://doi.org/10.1038/s41597-025-04430-0
PMID: https://pubmed.ncbi.nlm.nih.gov/39843467
Publication Date: 2025-01-22
Author(s): Jian Zuo et al.
Primary Topic: Coastal and Marine Dynamics

Overview

This paper presents the Global CoastLine Dataset (GCL_FCS30), which addresses the lack of detailed coastline typology in existing global datasets. Coastlines are extracted with a novel method that combines the Modified Normalized Difference Water Index (MNDWI) with adaptive threshold segmentation. Classification uses a hybrid transect classifier that integrates a random forest algorithm with stable training samples drawn from diverse geophysical data sources.
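For reference, the MNDWI is conventionally defined from green and shortwave-infrared reflectance. The band symbols below are generic; this summary does not state which Landsat bands the authors used, so treat the mapping to specific bands as an assumption.

```latex
\mathrm{MNDWI} = \frac{\rho_{\mathrm{green}} - \rho_{\mathrm{SWIR1}}}{\rho_{\mathrm{green}} + \rho_{\mathrm{SWIR1}}}
```

The index ranges from -1 to 1, and open water usually takes higher values than land, which is what makes a single threshold a workable land-water separator.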

The dataset is particularly strong at capturing artificial coastlines, with an overall classification accuracy above 85% and a Kappa coefficient above 0.75. Each coastline category covers the majority of the corresponding areas indicated in third-party datasets, showing a high degree of spatial agreement. Notably, GCL_FCS30 is the first global coastline category dataset to provide a continuous and smooth line vector format for high latitudes, a substantial advance in how coastal environments are represented.
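Overall accuracy and the Kappa coefficient are both derived from a confusion matrix. The following is a minimal sketch of that arithmetic; the function name and the two-class example matrix are hypothetical and are not taken from the paper.

```python
def overall_accuracy_and_kappa(matrix):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    matrix[i][j] is the number of samples whose reference class is i
    and whose mapped class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")

    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")

    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class example (artificial vs. natural); not data from the paper.
example_matrix = [[45, 5],
                  [10, 40]]
oa, kappa = overall_accuracy_and_kappa(example_matrix)
```

Kappa is lower than overall accuracy here because it discounts the agreement that would be expected by chance alone.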

Introduction

The introduction highlights the vulnerability of coastal zones, which are complex coupled human-environment systems facing erosion, sea level rise, and human encroachment. Because natural coastlines are increasingly replaced by artificial structures, sustainable management requires a comprehensive understanding of coastline dynamics. Recent advances in high-resolution Earth observation data and cloud computing have produced various coastline monitoring datasets; however, these focus primarily on morphology rather than classification and often lack long-term categorical information.

To address these gaps, the authors developed GCL_FCS30, which offers a detailed classification system at 30-meter resolution using long-term Landsat imagery from 2010 to 2020. The dataset categorizes coastlines into six types: sandy, biogenic, rocky, muddy, estuary, and artificial. By generating high-confidence training samples from geophysical datasets, GCL_FCS30 aims to improve global coastline classification and to provide an open-access resource for understanding the impacts of climate change and urbanization on coastal systems. It is intended to support coastal management and to contribute to achieving sustainable development goals in rapidly changing coastal environments.

Methods

The study maps global coastlines from time-series Landsat imagery using a three-part workflow, depicted in Figure 2 of the paper: pre-processing, coastline extraction, and classification. Pre-processing comprises radiometric calibration, atmospheric correction, cloud masking, and mean-value compositing to improve image quality.
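Cloud masking and mean-value compositing can be illustrated per pixel: average only the observations that passed the mask. This is a simplified sketch on small nested lists, not the authors' pipeline; the function name and the sample values are hypothetical.

```python
def mean_composite(observations, valid_masks):
    """Per-pixel mean over time, ignoring masked observations.

    observations: list of 2-D grids (lists of rows) of reflectance values.
    valid_masks:  grids of the same shape; True where the pixel is usable.
    Pixels with no valid observation are returned as None.
    """
    rows, cols = len(observations[0]), len(observations[0][0])
    composite = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [obs[r][c]
                      for obs, mask in zip(observations, valid_masks)
                      if mask[r][c]]
            if values:
                composite[r][c] = sum(values) / len(values)
    return composite


# Two hypothetical 1 x 3 scenes; the third pixel is cloudy in both.
scenes = [[[0.2, 0.4, 0.9]],
          [[0.4, 0.9, 0.8]]]
masks = [[[True, True, False]],
         [[True, False, False]]]
composite = mean_composite(scenes, masks)
```

Returning None for pixels that are never clear keeps data gaps explicit instead of silently filling them with a misleading value.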

For coastline extraction, a novel threshold segmentation method based on exponential time series was implemented to delineate coastlines from the processed imagery. A hybrid transect classifier, trained on globally distributed and temporally stable samples, then categorizes the extracted coastlines. The paper explains each step in detail in its subsequent sections.
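A transect-based classifier needs transects to work on. The summary does not describe how the authors construct theirs, so the following is only one common construction: lines perpendicular to the coastline at regular intervals. The function name, the spacing, and the half-length are hypothetical, and coordinates are assumed to be in a projected system measured in meters.

```python
import math


def make_transects(coastline, spacing, half_length):
    """Return transects ((x1, y1), (x2, y2)) perpendicular to a polyline.

    coastline:   list of (x, y) vertices in projected coordinates.
    spacing:     along-line distance between transect centers.
    half_length: distance each transect extends to either side of the line.
    """
    transects = []
    next_at = spacing / 2.0  # along-line position of the next transect center
    travelled = 0.0          # length of the segments already processed
    for (x0, y0), (x1, y1) in zip(coastline, coastline[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0.0:
            continue  # skip duplicate vertices
        ux, uy = (x1 - x0) / seg_len, (y1 - y0) / seg_len  # unit tangent
        nx, ny = -uy, ux                                   # unit normal
        while next_at <= travelled + seg_len:
            d = next_at - travelled
            cx, cy = x0 + ux * d, y0 + uy * d
            transects.append(((cx - nx * half_length, cy - ny * half_length),
                              (cx + nx * half_length, cy + ny * half_length)))
            next_at += spacing
        travelled += seg_len
    return transects


# Hypothetical straight 300 m coastline with a transect every 100 m.
transects = make_transects([(0.0, 0.0), (300.0, 0.0)], spacing=100.0, half_length=50.0)
```

Each transect crosses from one side of the line to the other, so spectral and geospatial features can be sampled on both the landward and seaward sides before classification.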

Discussion

The study used Landsat Collection 2 surface reflectance data to analyze coastline dynamics, drawing on images from the Thematic Mapper (TM), Enhanced Thematic Mapper Plus (ETM+), and Operational Land Imager (OLI) sensors acquired between 2010 and 2020. The data, with 30-meter spatial resolution and global coverage, allowed effective monitoring of coastal features; restricting the selection to images from April to October minimized the influence of tides. The CFmask algorithm, with a reported accuracy of 96.4%, was used to mask low-quality pixels. A global transect system was established to support the classification of coastlines into categories, and multiple global coastal geospatial datasets were incorporated to strengthen the identification of coastline types.
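In Landsat Collection 2, the CFmask results are stored as bit flags in the QA_PIXEL band. The sketch below shows one plausible way to apply the seasonal filter and the pixel mask; which flags the authors actually rejected is not stated in this summary, so the choice of bits and both function names are assumptions.

```python
from datetime import date

# Bit positions in the Landsat Collection 2 QA_PIXEL band.
FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW = 0, 1, 2, 3, 4

# Assumption: reject fill, clouds (including dilated and cirrus), and shadows.
REJECT_MASK = sum(1 << bit for bit in (FILL, DILATED_CLOUD, CIRRUS, CLOUD, CLOUD_SHADOW))


def is_usable_pixel(qa_value):
    """True if none of the rejected QA_PIXEL flags is set."""
    return (qa_value & REJECT_MASK) == 0


def in_season(acquired, first_month=4, last_month=10):
    """True if the acquisition date falls within April to October."""
    return first_month <= acquired.month <= last_month


# Hypothetical acquisition dates and QA values for illustration.
summer_scene = in_season(date(2015, 7, 14))
winter_scene = in_season(date(2015, 12, 3))
clear_pixel = is_usable_pixel(1 << 6)      # only the "clear" flag set
cloudy_pixel = is_usable_pixel(1 << CLOUD)  # cloud flag set
```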

The coastline extraction algorithm combined the MNDWI with the Otsu (maximum between-class variance) method and Canny edge detection. The hybrid transect classifier, based on random forest algorithms, achieved an overall classification accuracy of 87.5%. It separates coastlines into artificial and natural types, with the latter further divided into sandy, biogenic, rocky, muddy, and estuary coastlines. The dataset was validated with a stratified random sampling scheme that yielded 964 sample points for accuracy assessment. The methodology, including the use of high-resolution imagery for validation and the integration of multispectral and geospatial features, contributes to advancing coastline mapping and classification techniques.
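The Otsu method picks the threshold that maximizes the between-class variance of a histogram. The following is a minimal pure-Python sketch applied to a handful of made-up MNDWI values; it is not the authors' implementation, and the Canny edge detection step is omitted.

```python
def otsu_threshold(values, bins=256):
    """Return the threshold that maximizes between-class variance."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; no threshold exists")
    width = (hi - lo) / bins

    # Histogram of the input values.
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1

    centers = [lo + (i + 0.5) * width for i in range(bins)]
    total = len(values)
    sum_all = sum(h * c for h, c in zip(hist, centers))

    best_variance, best_threshold = -1.0, lo
    weight_low, sum_low = 0, 0.0
    for i in range(bins - 1):
        weight_low += hist[i]
        sum_low += hist[i] * centers[i]
        weight_high = total - weight_low
        if weight_low == 0 or weight_high == 0:
            continue
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance, up to a constant factor of 1 / total**2.
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance = variance
            best_threshold = lo + (i + 1) * width  # upper edge of bin i
    return best_threshold


# Hypothetical MNDWI samples: four land pixels, then four water pixels.
mndwi = [-0.5, -0.4, -0.45, -0.35, 0.4, 0.5, 0.55, 0.6]
threshold = otsu_threshold(mndwi)
water = [v > threshold for v in mndwi]
```

Because the threshold is derived from each histogram rather than fixed in advance, it adapts to local conditions, which is why the approach is described as adaptive.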

Limitations

The GCL_FCS30 coastline product has several limitations. A primary concern is the absence of in-situ measurements for training data and result evaluation, which limits the precision and reliability of the coastline labels. Samples were generated by combining geospatial data with visual interpretation, a common practice in remote sensing and machine learning that nevertheless introduces uncertainty and complicates performance assessment. In addition, the uneven spatial and temporal coverage of Landsat imagery, particularly at high latitudes and in cloud-prone areas, further constrains the product. Future research should improve temporal coverage by incorporating complementary sources such as the Sentinel-2 and Sentinel-1 satellites.

Moreover, despite the globally distributed training samples and the hybrid transect classifier, omission and commission errors remain prevalent because of the intricate spatial and spectral characteristics of the coastline categories. Some categories had producer's accuracies below 80%, indicating a need for further refinement of the classification method. The extraction method is also sensitive to environmental complexity and threshold selection, particularly in heterogeneous coastal regions where dense vegetation complicates the distinction between land and water. The Otsu adaptive thresholding method, while effective for global applications, may not be optimal in these varied environments. Furthermore, post-processing accuracy depends on the chosen buffer size, which can lead to significant variability in results, especially in areas with abrupt coastal transitions.
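Omission and commission errors are the complements of producer's and user's accuracy, respectively. The sketch below computes both per class from a confusion matrix; the function name, the class pair, and the counts are hypothetical and are not results from the paper.

```python
def per_class_accuracy(matrix, labels):
    """Return {label: (producer's accuracy, user's accuracy)}.

    matrix[i][j] is the number of samples whose reference class is i
    and whose mapped class is j. A value is None when its denominator is zero.
    """
    n = len(labels)
    result = {}
    for k, name in enumerate(labels):
        reference_total = sum(matrix[k])                     # row sum
        mapped_total = sum(matrix[i][k] for i in range(n))   # column sum
        producer = matrix[k][k] / reference_total if reference_total else None
        user = matrix[k][k] / mapped_total if mapped_total else None
        result[name] = (producer, user)
    return result


# Hypothetical counts for two classes; not data from the paper.
accuracy = per_class_accuracy([[30, 10],
                               [5, 55]], ["sandy", "muddy"])
sandy_producer, sandy_user = accuracy["sandy"]
omission_error = 1.0 - sandy_producer    # reference sandy samples that were missed
commission_error = 1.0 - sandy_user      # mapped sandy samples that were wrong
```

In this example the sandy class has a producer's accuracy of 0.75, the kind of below-80% value that signals a class the classifier frequently misses.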