Labelling, in the context of machine learning, is the process of associating meaningful, descriptive tags, commonly referred to as labels or annotations, with input data (Mjolsness and DeCoste 2001). These labels serve as the ground truth, or target output, for a machine learning model during training. From the labelled data the model learns to identify patterns, enabling it to make accurate predictions or classifications on new, unseen data (Dietterich 2002). Traditionally, labelling has been carried out manually or semi-automatically, with human annotators or domain experts assigning labels to individual data instances by hand. These approaches are time-consuming and labour-intensive, particularly for large datasets. Automated labelling techniques address this challenge by offering faster, more efficient processing, allowing machine learning models to be trained on larger volumes of data. Unlike human annotators, who may introduce inconsistencies due to varying interpretations or biases, automated labelling ensures greater consistency across the entire dataset, leading to more reliable model training (Ng et al. 2001; Sebastian 1998; Russakovsky et al. 2015).
Labelling is particularly important in geospatial datasets, where it underpins the effectiveness and dependability of geospatial analyses. Accurate labels ensure that an algorithm learns from precise, representative examples, improving its ability to generalize and to make reliable predictions on unseen data. Labelled geospatial data is indispensable for a wide range of applications, from training machine learning models to supporting decision-making in domains such as emergency response, navigation, infrastructure planning, and environmental monitoring. The quality and precision of the labels directly determine the reliability and effectiveness of the resulting analyses and applications.
The use of Unmanned Aerial Vehicles (UAVs), commonly known as drones, has risen significantly in geospatial applications, and this trend is projected to continue. UAVs offer a cost-effective alternative to manned aerial surveys or satellite imagery, particularly for smaller-scale projects or areas that require frequent updates. Processing the ultra-high-resolution images obtained from UAVs involves multiple steps aimed at extracting meaningful information and generating accurate outputs. The success of the classification process depends on the quality of the training data, the appropriateness of the chosen model, and careful parameter tuning throughout the workflow.
The quality of the training data is paramount to the success of machine learning models applied to UAV data. Training data is the foundation from which a model learns patterns and relationships, and its quality directly affects the model's accuracy, generalization, and ability to make meaningful predictions. High-quality training data ensures that the model learns precise representations of the land cover or land use classes present in the UAV imagery. Accurate annotations and labels enable the model to differentiate between classes, improving classification accuracy and, in turn, decision-making. Automated labelling tools rooted in machine learning methodologies have made it possible to automate much of this process (Aksoy et al. 2012; Ghamisi and Benediktsson 2014).
The Segment Anything Model (SAM) is an AI platform that offers automated labelling capabilities. Developed by Meta AI, SAM serves as a foundation model for image segmentation (Kirillov et al. 2023). It has been trained on a vast dataset of images and masks, enabling it to segment a wide variety of objects accurately. SAM builds on deep learning techniques such as Convolutional Neural Networks (CNNs) and transformers, an architecture originally developed for natural language processing; the attention mechanism in its architecture allows the model to learn relationships between objects in images and to focus on the relevant parts of an image when making predictions. One of SAM's notable strengths is zero-shot learning: it adapts to diverse prompts, including clicks, bounding boxes, and textual descriptions, without explicit training on the specific data, instead leveraging knowledge from previously learned tasks and concepts (Palatucci et al. 2009). By contrast, one-shot learning aims to train models with minimal data exposure, ideally a single instance per class, for accurate classification or task recognition (Duan et al. 2017). SAM's segmentation ability is supported by self-supervised learning, in which a model learns from the intrinsic structure and relationships within the data itself, eliminating the need for manually labelled examples (Shurrab and Duwairi 2022); in SAM's case, the image encoder is pre-trained by reconstructing masked portions of images. Although initially developed for segmenting RGB datasets, SAM's accurate results have led to its adoption for geospatial datasets as well (Wang et al. 2023). The architecture of SAM consists of an image encoder, a prompt encoder, and a mask decoder; the image encoder is a Masked Autoencoder (MAE) pre-trained Vision Transformer (ViT) that handles high-resolution images effectively (Kirillov et al. 2023). Together, SAM's automated labelling capabilities and its adaptability to different prompts make it a valuable tool for a wide range of image segmentation applications.
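As an illustration of this prompt-driven workflow, the sketch below queries SAM with a single foreground click through Meta AI's segment-anything package; the checkpoint file name, image path, and click coordinates are placeholders for illustration only.

```python
# Minimal sketch of prompt-based segmentation with SAM, assuming the
# segment-anything package and a locally downloaded ViT-H checkpoint
# (file names and coordinates here are placeholders).
import cv2
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load the ViT-H variant of SAM from a local checkpoint.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# SAM expects an RGB image of shape (H, W, 3).
image = cv2.cvtColor(cv2.imread("uav_tile.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# Prompt with a single foreground click (label 1 = foreground).
point = np.array([[500, 375]])
label = np.array([1])
masks, scores, _ = predictor.predict(
    point_coords=point,
    point_labels=label,
    multimask_output=True,  # return several candidate masks
)
best_mask = masks[np.argmax(scores)]  # boolean (H, W) mask
```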
The Vision Transformers (ViTs) employed in SAM process high-resolution images more efficiently than Convolutional Neural Networks (CNNs), which frequently encounter memory constraints and heavy computational burdens (Dosovitskiy et al. 2020). ViTs leverage self-attention mechanisms to capture long-range dependencies across pixels, enabling the model to prioritize crucial segments of an input sequence by modelling the relationships between its elements (Vaswani et al. 2017). Consequently, SAM proves highly suitable for processing high-resolution geospatial datasets.
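Since the argument above hinges on self-attention, a minimal NumPy sketch of the scaled dot-product formulation from Vaswani et al. (2017) may help; this is a generic illustration, not SAM's actual implementation, and all array names are hypothetical.

```python
# Generic sketch of scaled dot-product self-attention (Vaswani et al. 2017);
# illustrative only, not SAM's actual implementation.
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """x: (n_tokens, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Attention scores: each token scores its relation to every other token.
    scores = q @ k.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    # Each output token mixes all value vectors, so long-range
    # dependencies are captured in a single step.
    return weights @ v

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 64))            # e.g. 16 image-patch embeddings
w = [rng.normal(size=(64, 32)) for _ in range(3)]
out = self_attention(tokens, *w)              # result: (16, 32)
```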
Prominent earlier work applying SAM in remote sensing tested it across multiple datasets using input prompts such as bounding boxes, individual points, and text descriptors (Osco et al. 2023). Another study evaluates SAM's efficacy for image segmentation in agricultural and urban green space (UGS) contexts (Gui et al. 2024).
In our geospatial analysis, we relied on SamGeo, a module that wraps SAM with geospatial functionality and serves as a crucial component of our workflow. The module offers a selection of models, namely ViT-H, ViT-L, and ViT-B, each with distinct computational and architectural characteristics. Of these, ViT-H SAM stands out as the most capable, and we therefore adopted it in our tests to exploit SamGeo's capabilities fully (Kirillov et al. 2023).
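A minimal usage sketch of SamGeo's automatic mask generation follows, assuming the segment-geospatial package; the file paths and checkpoint name are placeholders.

```python
# Minimal sketch of automatic mask generation with SamGeo
# (segment-geospatial package); file paths are placeholders.
from samgeo import SamGeo

sam = SamGeo(
    model_type="vit_h",                 # ViT-H variant used in our tests
    checkpoint="sam_vit_h_4b8939.pth",  # local SAM checkpoint
)

# Segment a georeferenced UAV orthomosaic; each detected object is
# written to a single-band GeoTIFF with a unique integer per mask.
sam.generate("uav_orthomosaic.tif", output="segments.tif", unique=True)
```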
In this study, the training pixels produced by the SAM pipeline are used to train well-known machine learning models, namely Random Forest, Support Vector Machines (SVM), and XGBoost, and the classification outcomes of these algorithms, each trained on the SAM-labelled datasets, are compared.
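A sketch of this comparison is given below, assuming the SAM-derived labels have already been flattened into a feature matrix X (pixels by spectral bands) and a label vector y; the synthetic arrays stand in for real UAV data and all variable names are hypothetical.

```python
# Sketch: comparing pixel classifiers trained on SAM-derived labels.
# X and y are synthetic placeholders standing in for real UAV data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))     # 1000 pixels x 4 spectral bands
y = rng.integers(0, 3, size=1000)  # 3 land cover classes from SAM masks

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "SVM": SVC(kernel="rbf"),
    "XGBoost": XGBClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the SAM-labelled pixels and report test accuracy.
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```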