Efficient ML Techniques for Spatial and Spatio-Temporal Analysis of Remote Sensing Data

Usman Nazir
5 min read · Jul 9, 2023

The following research questions are explored and addressed in this article:
1. Reducing unnecessary computations from CNN architectures

2. Combining low compute and high compute CNNs for increased efficiency

3. Combining low resolution with high resolution

4. Improving spatio-temporal analysis in non-Euclidean domains

Reducing unnecessary computations from CNN architectures

Most earlier work in remote sensing analyzes low-resolution multi-spectral imagery from Sentinel-2 and Landsat. More fine-grained analysis, such as object localization, requires processing high-resolution imagery. Such analysis faces two key challenges: (i) the availability of an annotated dataset and (ii) an efficient algorithm that can produce accurate results with limited compute. I addressed these concerns by producing Asia14, a first-of-its-kind 14-class dataset for the South Asian region. The dataset includes Digital Globe RGB images of brick kilns, houses, roads, farms, sparse trees, dense trees, orchards, parking lots, parks, ground and barren lands, and it is publicly available for further research. I applied my model to identifying brick kilns within the “Brick-Kiln-Belt” of South Asia. Existing methods, particularly those developed for the ImageNet Challenge, typically have a large number of parameters and cannot be used directly for remote sensing via fine-tuning. I therefore proposed Tiny-Inception-ResNet-v2, inspired by Inception-ResNet-v2, for Land-use Land-cover (LULC) classification. The framework is developed by training a network on satellite imagery covering 11 different classes from the South Asian region. Tiny-Inception-ResNet-v2 has 2.84x fewer parameters than vanilla Inception-ResNet-v2. Despite having fewer parameters and running faster, it outperforms all state-of-the-art architectures employed for recognition of brick kilns. The proposed solution would enable regional monitoring and evaluation mechanisms for the Sustainable Development Goals.

Paper Link: https://arxiv.org/abs/1907.05552

Combining low compute and high compute CNNs for increased efficiency

Modern machine learning techniques have achieved high accuracy in a wide variety of applications, including large-scale analysis of high-resolution satellite imagery. Such analysis requires both accuracy and computational efficiency. The Tiny-Inception-ResNet-v2 architecture provides a compute-efficient solution for LULC classification, but it does not handle intra-class variations well in large-scale spatial analysis. To solve this problem, I proposed a coarse-to-fine strategy consisting of an inexpensive classifier and a detector that work in tandem to achieve high accuracy at low computational cost. More specifically, I proposed a two-stage gated neural network architecture called Kiln-Net. In the first stage, imagery is classified using the ResNet-152 model, which filters out over 99% of irrelevant data. In the second stage, a YOLOv3-based object detector is applied to find the precise location of each brick kiln in the candidate regions. The Asia14 dataset, consisting of 14,000 Digital Globe RGB images in 14 categories, was also developed to train the proposed Kiln-Net architecture. The network is evaluated on a region of approximately 3,300 square km (337,723 image patches) from 14 different cities in five countries of South Asia. It outperforms state-of-the-art methods employed for the recognition of brick kilns, achieving an average accuracy of 99.96% and an average F1 score of 0.91. To the best of my knowledge, it is also 20x faster than existing methods.
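The gating idea behind the two stages can be sketched in a few lines of Python. This is a minimal illustration and not the Kiln-Net implementation: `gated_detect`, the `threshold` value, the box format and the toy classifier and detector are all hypothetical stand-ins for the ResNet-152 and YOLOv3 stages described above.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Patch = TypeVar("Patch")
Box = Tuple[float, float, float, float]  # hypothetical (x_min, y_min, x_max, y_max)


def gated_detect(
    patches: Sequence[Patch],
    classify: Callable[[Patch], float],
    detect: Callable[[Patch], List[Box]],
    threshold: float = 0.5,  # assumed gate value, not taken from the paper
) -> Tuple[Dict[int, List[Box]], int]:
    """Run the detector only on patches that pass the stage-1 gate.

    Returns the detections keyed by patch index and the number of
    detector calls that were actually made.
    """
    detections: Dict[int, List[Box]] = {}
    detector_calls = 0
    for index, patch in enumerate(patches):
        # Stage 1: the classifier discards patches scored below the threshold.
        if classify(patch) < threshold:
            continue
        # Stage 2: the detector runs only on the surviving candidates.
        detector_calls += 1
        boxes = detect(patch)
        if boxes:
            detections[index] = boxes
    return detections, detector_calls


# Toy stand-ins (assumptions): each "patch" is just a score in [0, 1].
def toy_classifier(patch: float) -> float:
    return patch


def toy_detector(patch: float) -> List[Box]:
    return [(0.0, 0.0, 1.0, 1.0)] if patch > 0.8 else []


patches = [0.1, 0.9, 0.2, 0.6]
detections, detector_calls = gated_detect(patches, toy_classifier, toy_detector)
print(detections, detector_calls)
```

In this toy run the detector is called on two of the four patches. The saving grows with the share of patches the first stage rejects, which is where a gated design gets its speed.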

Paper Link: https://ieeexplore.ieee.org/document/9115879

Code repository: https://github.com/usmanweb/Codes/tree/main/KilnNet
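A note on reading the two reported metrics together: when kilns are rare among the patches, accuracy can sit very close to 100% while F1 stays noticeably lower, because accuracy is dominated by the many correctly rejected negatives. The sketch below uses made-up confusion-matrix counts purely for illustration; they are not the counts from the Kiln-Net evaluation.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 100 kiln patches among 100,000 patches in total.
metrics = classification_metrics(tp=91, fp=9, fn=9, tn=99_891)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these hypothetical counts the 18 errors barely move accuracy, yet they weigh heavily against the 100 positives that F1 is computed over.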

Combining low resolution with high resolution

Small-scale industries, particularly bull-trench brick kilns, are one of the key sources of air pollution in South Asia, often creating hazardous levels of smog that are injurious to human health. To mitigate the climate and health impact of the kiln industry, fine-grained kiln localization at different geographic locations is needed. Kiln localization using multi-spectral remote sensing data such as vegetation indices can result in noisy estimates, whereas relying solely on high-resolution imagery is infeasible due to cost and compute complexity. The paper proposes a fusion of spatio-temporal multi-spectral data with high-resolution imagery for detection of brick kilns within the “Brick-Kiln-Belt” of South Asia. We first perform classification using low-resolution spatio-temporal multi-spectral data from Sentinel-2 imagery by combining vegetation, burn, built-up and moisture indices. Next, an orientation-aware YOLOv3 object detector (with a theta value) is applied to remove false detections and perform fine-grained localization. Compared with other benchmarks, the proposed technique is 21 times faster with comparable or higher accuracy when tested over multiple countries.
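To make the index step concrete, here is a minimal sketch of four common normalized-difference indices. The text above does not name the exact indices used, so treating them as NDVI, NBR, NDBI and NDMI is an assumption, as are the function names and the pixel values. The band mapping in the comments follows the usual Sentinel-2 convention (B4 red, B8 near-infrared, B11 and B12 short-wave infrared).

```python
from typing import Dict


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def spectral_indices(red: float, nir: float, swir1: float, swir2: float) -> Dict[str, float]:
    """Compute four standard indices from per-pixel reflectances.

    Assumed Sentinel-2 bands: red = B4, nir = B8, swir1 = B11, swir2 = B12.
    """
    return {
        "ndvi": normalized_difference(nir, red),    # vegetation
        "nbr": normalized_difference(nir, swir2),   # burn
        "ndbi": normalized_difference(swir1, nir),  # built-up
        "ndmi": normalized_difference(nir, swir1),  # moisture
    }


# Hypothetical reflectances for a single pixel (not real data).
pixel = spectral_indices(red=0.10, nir=0.30, swir1=0.20, swir2=0.10)
for name, value in pixel.items():
    print(f"{name}: {value:.3f}")
```

Stacking such per-pixel indices over several acquisition dates gives the kind of low-resolution spatio-temporal feature set that a first-pass classifier can consume before the detector runs on high-resolution imagery.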

Paper Link: https://arxiv.org/abs/2303.11654

Improving spatio-temporal analysis in non-Euclidean domains

Despite their significant computational overhead, 3D convolutional networks fail to address the problems of non-Euclidean domains. We therefore turn to Graph Neural Networks, which are considered to perform efficiently in non-Euclidean domains as well. We propose a generic framework, the Spatio-Temporal driven Attention Graph Neural Network (STAG-NN), to learn the interactions between dense spatial and sparse temporal data. To apply graph neural networks to images, the images first need to be represented as graphs. We used SLIC to generate superpixels and then constructed region adjacency graphs (RAGs) from these superpixels. As part of STAG-NN, we propose a novel method to represent a series of images as spatio-temporal region adjacency graphs called Temporal-RAGs (T-RAGs). We proposed three approaches for transition classification. In the first approach, we trained a STAG-NN model on the Asia14 dataset to predict one of 14 land-use classes. To determine the transition, we ran inference on the RAG of each geo-spatial image separately and used those inferences to label the transition based on a simple voting policy. In the second approach, we fed the T-RAGs to the spatio-temporal STAG-NN (ST-STAG-NN) directly and changed the number of output units of the MLP from 14 to 4. In the third approach, we modified the ST-STAG-NN to give a separate readout for each of the RAGs inside a T-RAG and concatenated the readouts into an embedding three times the size for the T-RAG. We also discuss the qualitative and quantitative analysis of our approaches and provide a comparative analysis of computation and performance against previous models.
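The step from superpixels to a region adjacency graph can be sketched as follows. The example assumes a superpixel label map is already available (for instance from any SLIC implementation) and links two regions whenever they share a horizontal or vertical pixel border. The function name and the 4-connectivity choice are assumptions for illustration, not details taken from the STAG-NN code.

```python
from typing import List, Set, Tuple

Edge = Tuple[int, int]


def region_adjacency_graph(labels: List[List[int]]) -> Set[Edge]:
    """Return undirected edges between superpixel labels that touch.

    `labels` is a 2D grid holding one superpixel id per pixel. Two regions
    are adjacent if any of their pixels are 4-connected neighbours.
    """
    edges: Set[Edge] = set()
    rows = len(labels)
    cols = len(labels[0]) if rows else 0
    for r in range(rows):
        for c in range(cols):
            current = labels[r][c]
            # Checking only the right and down neighbours covers every
            # 4-connected pixel pair exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    other = labels[rr][cc]
                    if other != current:
                        edges.add((min(current, other), max(current, other)))
    return edges


# Hypothetical 2 x 4 label map with four superpixels.
label_map = [
    [0, 0, 1, 1],
    [2, 2, 3, 3],
]
rag_edges = region_adjacency_graph(label_map)
print(sorted(rag_edges))
```

In a full pipeline each node would also carry features computed from its superpixel, and a T-RAG would relate the RAGs of the same location across time. How those temporal links are defined is not specified here, so the sketch stops at the single-image graph.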

Paper Link: https://arxiv.org/abs/2303.14322
