Peer Reviewed Research

Postdoc 

Plant detection from ultra high resolution remote sensing images: A Semantic Segmentation approach based on fuzzy loss

Abstract: In this study, we tackle the challenge of identifying plant species from ultra high resolution (UHR) remote sensing images. Our approach involves introducing an RGB remote sensing dataset, characterized by millimeter-level spatial resolution, meticulously curated through several field expeditions across a mountainous region in France covering various landscapes. The task of plant species identification is framed as a semantic segmentation problem for its practical and efficient implementation across vast geographical areas. However, when dealing with segmentation masks, we confront instances where distinguishing boundaries between plant species and their background is challenging. We tackle this issue by introducing a fuzzy loss within the segmentation model. Instead of utilizing one-hot encoded ground truth (GT), our model incorporates Gaussian filter refined GT, introducing stochasticity during training. First experimental results obtained on both our UHR dataset and a public dataset are presented, showing the relevance of the proposed methodology, as well as the need for future improvement.
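
The core idea here is to replace hard one-hot segmentation targets with Gaussian-smoothed soft targets. Below is a minimal sketch of that mechanism, assuming a PyTorch/SciPy setting; the function names, kernel width, and loss form are illustrative assumptions (the stochastic refinement described in the abstract is omitted for brevity), not the authors' exact implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import gaussian_filter

def fuzzy_targets(mask, num_classes, sigma=2.0):
    """Turn an integer label mask (H, W) into Gaussian-blurred soft targets."""
    one_hot = np.eye(num_classes)[mask]                       # (H, W, C)
    soft = np.stack([gaussian_filter(one_hot[..., c], sigma=sigma)
                     for c in range(num_classes)], axis=-1)
    soft /= soft.sum(axis=-1, keepdims=True)                  # renormalise per pixel
    return torch.from_numpy(soft).permute(2, 0, 1).float()    # (C, H, W)

def fuzzy_loss(logits, soft_target):
    """Cross-entropy against soft (fuzzy) targets instead of one-hot labels."""
    log_probs = F.log_softmax(logits, dim=0)
    return -(soft_target * log_probs).sum(dim=0).mean()

# usage on a toy 4-class mask
mask = np.random.randint(0, 4, (64, 64))
target = fuzzy_targets(mask, num_classes=4)
logits = torch.randn(4, 64, 64, requires_grad=True)
fuzzy_loss(logits, target).backward()
```

Near class boundaries the blurred target spreads probability mass across neighbouring classes, so the model is no longer penalised as harshly for ambiguous boundary pixels.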

Mapping Earth Mounds From Space

Abstract: Regular patterns of vegetation are considered widespread landscapes, although their global extent has never been estimated. Among them, spotted landscapes are of particular interest in the context of climate change. Indeed, regularly spaced vegetation spots in semi-arid shrublands result from extreme resource depletion and prefigure a catastrophic shift of the ecosystem to a homogeneous desert, while termite mounds, which also produce spotted landscapes, were shown to increase robustness to climate change. Yet, their identification at large scale calls for automatic methods, for instance from the popular deep learning framework, that are able to cope with vast amounts of remote sensing data, e.g., optical satellite imagery. In this paper, we tackle this problem and benchmark several state-of-the-art deep networks on multiple landscapes and geographical areas. Despite the promising results we obtained, we found that more research is needed to automatically map these earth mounds from space.

Ph.D.

PhD Thesis: Hyperspectral Image Analysis in Single-Modal and Multimodal Settings using Deep Learning Techniques

Abstract: Hyperspectral imaging provides precise classification for land use and cover due to its exceptional spectral resolution. However, the challenges of high dimensionality and limited spatial resolution hinder its effectiveness. This study addresses these challenges by employing deep learning techniques to efficiently process, extract features, and classify data in an integrated manner. To enhance spatial resolution, we integrate information from complementary modalities such as LiDAR and SAR data through multimodal learning. Moreover, adversarial learning and knowledge distillation are utilized to overcome issues stemming from domain disparities and missing modalities. We also tailor deep learning architectures to suit the unique characteristics of HSI data, utilizing 1D convolutional and recurrent neural networks to handle its continuous spectral dimension. Techniques like visual attention and feedback connections within the architecture bolster the robustness of feature extraction. Additionally, we tackle the issue of limited training samples through self-supervised learning methods, employing autoencoders for dimensionality reduction and exploring semi-supervised learning techniques that leverage unlabeled data. Our proposed approaches are evaluated across various HSI datasets, consistently outperforming existing state-of-the-art techniques.

Domain Adaptive 3D Shape Retrieval from Monocular Images

Abstract: In this work, we address the novel and challenging problem of domain adaptive 3D shape retrieval from single 2D images (DA-IBSR). While the existing image-based 3D shape retrieval (IBSR) problem focuses on modality alignment for retrieving a matchable 3D shape from a shape repository given a 2D image query, it does not consider any distribution shift between the training and testing image-shape pairs, making the performance of off-the-shelf IBSR methods subpar. In contrast, the proposed DA-IBSR addresses the non-trivial problem of modality shift as well as distribution shift across training and test sets. To address these issues, we propose an end-to-end trainable model called DAIS-NET. Our objective is to align the images and shapes separately across both domains while simultaneously learning a shared embedding space for the 2D and 3D modalities. The former problem is addressed by separately employing a maximum mean discrepancy loss across the 2D images and 3D shapes of the two domains. To address the modality alignment, we incorporate the notion of negative sample mining and employ a triplet loss to bridge the gap between positive 2D-3D pairs (of the same class) and increase the separation between negative 2D-3D pairs (of different classes). Additionally, we employ an entropy minimization strategy to align the unlabeled target domain data in the semantic space. To evaluate our proposed approach, we define the experimental setting of DA-IBSR on the following benchmarks: SHREC’14 ↔ Pix3D and ShapeNet ↔ SHREC’14. Considering the novelty of the problem statement, we demonstrate that the issue of domain gap is prevalent by comparing our method with the existing literature. Additionally, through extensive evaluations, we demonstrate the capability of DAIS-NET to successfully mitigate this domain gap in image-based 3D shape retrieval.
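
The abstract names two alignment losses: an MMD term for the domain shift and a triplet term for the modality gap. The sketch below shows both in isolation, assuming an RBF kernel for the MMD and pre-extracted 128-dimensional embeddings; all tensor names, the kernel bandwidth, and the margin are illustrative assumptions.

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Squared MMD between two feature batches under an RBF kernel."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

# domain alignment: pull source- and target-domain image features together
src_img, tgt_img = torch.randn(8, 128), torch.randn(8, 128)
domain_loss = mmd_rbf(src_img, tgt_img)

# modality alignment: triplet loss over image anchors and 3D shape embeddings
anchor    = torch.randn(8, 128)   # 2D image embedding
pos_shape = torch.randn(8, 128)   # 3D shape of the same class
neg_shape = torch.randn(8, 128)   # mined 3D shape of a different class
modality_loss = torch.nn.TripletMarginLoss(margin=0.5)(anchor, pos_shape, neg_shape)

loss = domain_loss + modality_loss
```

In the full method an analogous MMD term would also be applied to the 3D shape features of the two domains, plus the entropy minimization term on unlabeled target data.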

Semi-Supervised Learning for Hyperspectral Images by Non Parametrically Predicting View Assignment

Abstract: Hyperspectral image (HSI) classification is currently gaining a lot of momentum because of the high inherent spectral information within the images. However, these images suffer from the curse of dimensionality and usually require a large number of samples for tasks such as classification, especially in a supervised setting. Recently, to effectively train deep learning models with minimal labelled samples, unlabeled samples are also being leveraged in self-supervised and semi-supervised settings. In this work, we leverage the idea of semi-supervised learning to assist the discriminative self-supervised pretraining of the models. The proposed method takes different augmented views of the unlabeled samples as input and assigns them the same pseudo-label corresponding to the labelled sample from the downstream task. We train our model on two HSI datasets, namely the Houston dataset (from the 2013 data fusion contest) and the Pavia University dataset, and show that the proposed approach performs better than both the purely self-supervised approach and supervised training.
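
A minimal sketch of the pseudo-labelling step follows: an unlabelled sample's augmented views receive the label of the nearest labelled embedding (a non-parametric assignment), and the model is trained so both views agree on it. The encoder shape, similarity measure, and dataset sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def assign_pseudo_labels(unlab_feats, lab_feats, lab_labels):
    """Non-parametric assignment: label of the nearest labelled embedding."""
    sims = F.normalize(unlab_feats, dim=1) @ F.normalize(lab_feats, dim=1).T
    return lab_labels[sims.argmax(dim=1)]

encoder = torch.nn.Sequential(torch.nn.Linear(144, 64), torch.nn.ReLU(),
                              torch.nn.Linear(64, 32))   # toy spectral encoder
head = torch.nn.Linear(32, 5)

lab_x, lab_y = torch.randn(20, 144), torch.randint(0, 5, (20,))
view1, view2 = torch.randn(16, 144), torch.randn(16, 144)  # two augmented views

with torch.no_grad():   # pseudo-labels come from the current (frozen) features
    pseudo = assign_pseudo_labels(encoder(view1), encoder(lab_x), lab_y)

# both views are pushed toward the same pseudo-label
loss = F.cross_entropy(head(encoder(view1)), pseudo) \
     + F.cross_entropy(head(encoder(view2)), pseudo)
```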

Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck

Abstract: In this research, we deal with the problem of visual question answering (VQA) in remote sensing. While remotely sensed images contain information significant for the tasks of identification and object detection, they pose a great challenge in processing because of their high dimensionality, volume and redundancy. Furthermore, processing image information jointly with language features adds additional constraints, such as mapping the corresponding image and language features to each other. To handle this problem, we propose a cross-attention based approach combined with information maximization. The CNN-LSTM based cross-attention highlights the information in the image and language modalities and establishes a connection between the two, while information maximization learns a low-dimensional bottleneck layer that retains all the relevant information required to carry out the VQA task. We evaluate our method on two VQA remote sensing datasets of different resolutions. On the high resolution dataset, we achieve overall accuracies of 79.11% and 73.87% on the two test sets, while on the low resolution dataset, we achieve an overall accuracy of 85.98%.
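
A minimal sketch of the cross-attention plus bottleneck idea, assuming question tokens attend over CNN image-grid features and the fused representation is squeezed through a low-dimensional layer. All layer sizes, the pooling choice, and the use of `nn.MultiheadAttention` are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionVQA(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256, fused_dim=64, n_answers=100):
        super().__init__()
        # question features (queries) attend over image-region features
        self.attn = nn.MultiheadAttention(embed_dim=txt_dim, kdim=img_dim,
                                          vdim=img_dim, num_heads=4,
                                          batch_first=True)
        self.bottleneck = nn.Linear(txt_dim, fused_dim)  # low-dim bottleneck code
        self.classifier = nn.Linear(fused_dim, n_answers)

    def forward(self, img_grid, question):
        # img_grid: (B, HW, img_dim); question: (B, Tq, txt_dim)
        fused, _ = self.attn(question, img_grid, img_grid)
        z = self.bottleneck(fused.mean(dim=1))           # pooled bottleneck code
        return self.classifier(z), z

model = CrossAttentionVQA()
logits, z = model(torch.randn(2, 49, 512), torch.randn(2, 12, 256))
```

The information-maximization objective of the paper would then act on `z`, constraining it to keep only task-relevant content.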

Self-supervision assisted multimodal remote sensing image classification with coupled self-looping convolution networks

Abstract: Recently, the remote sensing community has seen a surge in the use of multimodal data for tasks such as land cover classification, change detection and many more. However, handling multimodal data requires synergistically using the information from different sources. Currently, deep learning (DL) techniques are widely used for multimodal data fusion owing to their superior feature extraction capabilities, but they come with their share of challenges. Firstly, DL models are mostly constructed in a forward fashion, limiting their feature extraction capability. Secondly, multimodal learning is generally addressed in a supervised setting, which leads to a high labelled-data requirement. Thirdly, the models generally handle each modality separately, thus preventing any cross-modal interaction. Hence, we propose a novel self-supervision oriented method for multimodal remote sensing data fusion. For effective cross-modal learning, our model solves a self-supervised auxiliary task that reconstructs the input features of one modality from the extracted features of another, thus enabling more representative pre-fusion features. To counter the forward architecture, our model is composed of convolutions in both backward and forward directions, creating self-looping connections that lead to a self-correcting framework. To facilitate cross-modal communication, we incorporate coupling across the modality-specific extractors using shared parameters. We evaluate our approach on three remote sensing datasets, namely Houston 2013 and Houston 2018, which are HSI-LiDAR datasets, and TU Berlin, which is an HSI-SAR dataset, where we achieve respective accuracies of 93.08%, 84.59% and 73.21%, beating the state of the art by a minimum of 3.02%, 2.23% and 2.84%.
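
The self-supervised auxiliary task is the most transferable piece: reconstruct one modality's input from the other's features. Below is a minimal per-pixel sketch with dense layers standing in for the convolutional extractors; the feature sizes, band counts, and module names are illustrative assumptions.

```python
import torch
import torch.nn as nn

hsi_enc   = nn.Sequential(nn.Linear(144, 64), nn.ReLU())  # HSI extractor
lidar_enc = nn.Sequential(nn.Linear(1, 64), nn.ReLU())    # LiDAR extractor
# auxiliary decoder: rebuild the HSI input from LiDAR features
lidar_to_hsi = nn.Linear(64, 144)

hsi, lidar = torch.randn(32, 144), torch.randn(32, 1)     # 32 pixels

# self-supervised cross-modal reconstruction loss (pre-fusion)
recon_loss = nn.functional.mse_loss(lidar_to_hsi(lidar_enc(lidar)), hsi)

# fusion for the main (supervised) classification task
fused = torch.cat([hsi_enc(hsi), lidar_enc(lidar)], dim=1)
logits = nn.Linear(128, 15)(fused)                        # 15 land-cover classes
```

The self-looping convolutions and parameter coupling of the full model are sketched separately under the HyperLoopNet entry below.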

Feedback convolution based autoencoder for dimensionality reduction in hyperspectral images

Abstract: Hyperspectral images (HSI) possess a very high spectral resolution (due to their innumerable bands), which makes them invaluable in the remote sensing community for land use/land cover classification. However, the multitude of bands forces algorithms to consume more data for better performance. To tackle this, techniques from deep learning are often explored, most prominently convolutional neural network (CNN) based autoencoders. However, one of the main limitations of conventional CNNs is that they only have forward connections. This prevents them from generating robust representations, since the information from later layers is not used to refine the earlier layers. Therefore, we introduce a 1D-convolutional autoencoder based on feedback connections for hyperspectral dimensionality reduction. Feedback connections create self-updating loops within the network, which enable it to use future information to refine past layers. Hence, the low-dimensional code carries more refined information for efficient classification. The performance of our method is evaluated on the Indian Pines 2010 and Indian Pines 1992 HSI datasets, where it surpasses existing approaches.
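
A minimal sketch of the feedback mechanism: the encoder is unrolled for a few steps and the code produced at one step is fed back to refine the next encoding pass. For brevity, dense layers stand in for the paper's 1D convolutions; the layer sizes and number of unroll steps are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeedbackAE1D(nn.Module):
    def __init__(self, bands=200, code=32, steps=3):
        super().__init__()
        self.code_dim, self.steps = code, steps
        self.encode = nn.Sequential(nn.Linear(bands + code, 64),
                                    nn.ReLU(), nn.Linear(64, code))
        self.decode = nn.Linear(code, bands)

    def forward(self, x):
        code = x.new_zeros(x.size(0), self.code_dim)
        for _ in range(self.steps):        # feedback loop: the current code is
            code = self.encode(torch.cat([x, code], dim=1))  # re-fed as input
        return self.decode(code), code

model = FeedbackAE1D()
x = torch.randn(8, 200)                    # 8 pixels, 200 spectral bands
recon, low_dim_code = model(x)
loss = nn.functional.mse_loss(recon, x)    # reconstruction objective
```

The refined `low_dim_code` is what a downstream classifier would consume in place of the raw spectrum.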

RSINet: inpainting remotely sensed images using triple GAN framework

Abstract: We tackle the problem of image inpainting in the remote sensing domain. Remote sensing images possess high resolution and geographical variations that render conventional inpainting methods less effective. This further entails the requirement of models with high complexity to sufficiently capture the spectral, spatial and textural nuances within an image, emerging from its high spatial variability. To this end, we propose a novel inpainting method that individually focuses on each aspect of an image, such as edges, colour and texture, using a task-specific GAN. Moreover, each individual GAN also incorporates an attention mechanism that explicitly extracts the spectral and spatial features. To ensure consistent gradient flow, the model uses the residual learning paradigm, thus simultaneously working with high and low level features. We evaluate our model, along with previous state-of-the-art models, on two well known remote sensing datasets, Open Cities AI and Earth on Canvas, and achieve competitive performance. The code is available at: https://github.com/advaitkumar3107/RSINet.

HyperLoopNet: Hyperspectral image classification using multiscale self-looping convolutional networks

Abstract: Hyperspectral image (HSI) classification using convolutional neural networks (CNNs) has always been a hot topic in the field of remote sensing, owing to the high level feature extraction offered by CNNs, which enables efficient encoding of features at several stages. However, the drawback of CNNs is that for exceptional performance they need a deeper and wider architecture along with huge amounts of training data, which is often impractical and infeasible. Furthermore, the reliance on forward connections alone leads to inefficient information flow that further limits classification. Hence, to mitigate these issues, we propose a self-looping convolution network for more efficient HSI classification. In our method, each layer in a self-looping block has both forward and backward connections, which means that each layer is the input and the output of every other layer, thus forming a loop. These loopy connections within the network allow for maximum information flow, giving us high level feature extraction. The self-looping connections enable us to efficiently control the network parameters, further allowing us to adopt a wider architecture with a multiscale setting, thus giving us abstract representations at different spatial levels. We test our method on four benchmark hyperspectral datasets: two Houston hyperspectral datasets (DFC 2013 and DFC 2018), the Salinas Valley dataset and the combined Pavia University and Centre datasets, where our method achieves state-of-the-art performance (highest percentage kappa of 87.28%, 71.08%, 99.24% and 68.44% respectively for the four datasets).
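
One way to read "each layer is the input and the output of every other layer" is as an iterated block where every layer recomputes its state from the concatenated states of all layers. The sketch below implements that reading with 2D convolutions; channel counts, layer counts, and the number of loop iterations are illustrative assumptions, not the published design.

```python
import torch
import torch.nn as nn

class SelfLoopingBlock(nn.Module):
    def __init__(self, channels=32, n_layers=3, iters=2):
        super().__init__()
        # each layer reads the concatenated states of all layers in the block
        self.layers = nn.ModuleList(
            nn.Conv2d(channels * n_layers, channels, 3, padding=1)
            for _ in range(n_layers))
        self.iters = iters

    def forward(self, x):
        states = [x for _ in self.layers]            # initialise every state with x
        for _ in range(self.iters):                  # loopy updates: every layer
            joint = torch.cat(states, dim=1)         # sees every other layer
            states = [torch.relu(layer(joint)) for layer in self.layers]
        return states[-1]

block = SelfLoopingBlock()
out = block(torch.randn(2, 32, 11, 11))              # e.g., 11x11 HSI patches
```

A multiscale variant would run such blocks at several patch sizes in parallel and merge the outputs before classification.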

Adaptive hybrid attention network for hyperspectral image classification

Abstract: Hyperspectral images (HSIs) have the specific characteristic of very high spectral resolution and relatively lower spatial resolution. In this research, we simultaneously and explicitly learn the spectral and spatial characteristics of HSIs by using 1D CNN and 2D CNN based attention modules, respectively. Furthermore, the spectrally and spatially enhanced features are combined using a learning mechanism that governs the contribution of each module. The resulting features are jointly sent to a 3D CNN based classifier. In addition, a Wasserstein metric based class discrimination constraint is applied to ensure more accurate classification. This is the first time such a loss has been used for class discrimination, and the results show its efficacy in doing so.
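
A minimal sketch of the hybrid scheme: a 1D (spectral) attention branch and a 2D (spatial) attention branch each reweight the input cube, and a learnable scalar gates their contributions before the classifier. Kernel sizes, the single mixing scalar, and the band count (103, as in Pavia University) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridAttention(nn.Module):
    def __init__(self, bands=103):
        super().__init__()
        self.spectral = nn.Sequential(nn.Conv1d(1, 1, 7, padding=3), nn.Sigmoid())
        self.spatial  = nn.Sequential(nn.Conv2d(bands, 1, 3, padding=1), nn.Sigmoid())
        self.alpha = nn.Parameter(torch.tensor(0.5))   # learned mixing weight

    def forward(self, x):                              # x: (B, bands, H, W)
        # 1D attention over the mean spectrum -> per-band weights
        spec_w = self.spectral(x.mean(dim=(2, 3)).unsqueeze(1))  # (B, 1, bands)
        spec = x * spec_w.squeeze(1)[..., None, None]
        # 2D attention -> per-pixel weights, broadcast over bands
        spat = x * self.spatial(x)                               # (B, 1, H, W) mask
        return self.alpha * spec + (1 - self.alpha) * spat

out = HybridAttention()(torch.randn(4, 103, 9, 9))
```

The gated output would then feed the 3D CNN classifier, with the Wasserstein-based discrimination term added to the training objective.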

Dimensionality reduction using 3D residual autoencoder for hyperspectral image classification

Abstract: Hyperspectral images (HSIs) are actively used for land-use/land-cover classification. However, HSIs suffer from the problems of high dimensionality and high spectral-spatial variability, leading to the requirement of a large number of training samples. Deep learning offers several approaches to handle the aforementioned problems, but is limited by its own problem of vanishing gradients, which creep into deeper networks (especially CNNs). In this paper, we propose an autoencoder (3D ResAE) that uses 3D convolutions and residual blocks to project high-dimensional HSI features to a low-dimensional space. 3D convolutions effectively handle the spectral-spatial characteristics, whereas the residual block adds an identity mapping, thereby tackling the issue of vanishing gradients. Furthermore, 3D deconvolutions are used to reconstruct the original features, while the network is trained in a semi-supervised manner. Our proposed method is tested on the Indian Pines and Salinas hyperspectral datasets, and the results clearly demonstrate its effectiveness for classification.
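
A minimal sketch of the building blocks: a 3D-convolutional residual unit whose identity shortcut counteracts vanishing gradients, inside an encoder/decoder pair that compresses and reconstructs spectral-spatial patches. Channel counts, strides, and patch size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Res3D(nn.Module):
    def __init__(self, ch=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(ch, ch, 3, padding=1))

    def forward(self, x):
        return torch.relu(x + self.body(x))   # identity shortcut

encoder = nn.Sequential(nn.Conv3d(1, 16, 3, padding=1), Res3D(),
                        nn.Conv3d(16, 4, 3, stride=2, padding=1))   # compress
decoder = nn.Sequential(nn.ConvTranspose3d(4, 16, 3, stride=2,
                                           padding=1, output_padding=1),
                        nn.Conv3d(16, 1, 3, padding=1))             # reconstruct

patch = torch.randn(2, 1, 16, 8, 8)            # (B, 1, bands, H, W)
recon = decoder(encoder(patch))
loss = nn.functional.mse_loss(recon, patch)    # unsupervised reconstruction term
```

In a semi-supervised regime, a classification head on the encoder output would be trained on the labelled subset while this reconstruction term uses all samples.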

FusAtNet: Dual attention based spectrospatial multimodal fusion network for hyperspectral and lidar classification

Abstract: With recent advances in sensing, multimodal data is becoming easily available for various applications, especially in remote sensing (RS), where many data types like multispectral imagery (MSI), hyperspectral imagery (HSI), LiDAR etc. are available. Effective fusion of these multisource datasets is becoming important, for these multimodality features have been shown to generate highly accurate land-cover maps. However, fusion in the context of RS is non-trivial considering the redundancy involved in the data and the large domain differences among multiple modalities. In addition, the feature extraction modules for different modalities hardly interact among themselves, which further limits their semantic relatedness. As a remedy, we propose a feature fusion and extraction framework, namely FusAtNet, for collective land-cover classification of HSIs and LiDAR data in this paper. The proposed framework effectively utilizes the HSI modality to generate an attention map using a "self-attention" mechanism that highlights its own spectral features. Similarly, a "cross-attention" approach is simultaneously used to harness the LiDAR-derived attention map that accentuates the spatial features of the HSI. These attentive spectral and spatial representations are then explored further along with the original data to obtain modality-specific feature embeddings. The modality-oriented joint spectro-spatial information thus obtained is subsequently utilized to carry out the land-cover classification task. Experimental evaluations on three HSI-LiDAR datasets show that the proposed method achieves state-of-the-art classification performance, including on the largest HSI-LiDAR dataset available, University of Houston (Data Fusion Contest - 2013), opening new avenues in multimodal feature fusion for classification.
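
A minimal sketch of the two attention paths: the HSI generates a mask over its own features ("self-attention"), while the LiDAR generates a mask that reweights the HSI features spatially ("cross-attention"). Layer widths, the 144-band input (as in Houston 2013), and the toy classifier are illustrative assumptions.

```python
import torch
import torch.nn as nn

hsi_feat   = nn.Conv2d(144, 64, 3, padding=1)                       # HSI extractor
self_attn  = nn.Sequential(nn.Conv2d(144, 64, 3, padding=1), nn.Sigmoid())
cross_attn = nn.Sequential(nn.Conv2d(1, 64, 3, padding=1), nn.Sigmoid())

hsi, lidar = torch.randn(4, 144, 11, 11), torch.randn(4, 1, 11, 11)

f = hsi_feat(hsi)
spectral_enhanced = f * self_attn(hsi)     # HSI highlights its own spectra
spatial_enhanced  = f * cross_attn(lidar)  # LiDAR accentuates HSI's spatial cues

fused = torch.cat([spectral_enhanced, spatial_enhanced], dim=1)
logits = nn.Conv2d(128, 15, 1)(fused).mean(dim=(2, 3))   # per-patch class scores
```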

An adversarial approach to discriminative modality distillation for remote sensing image classification

Abstract: We deal with the problem of modality distillation for the purpose of remote sensing (RS) image classification by exploring deep generative models. From the remote sensing perspective, this problem can also be considered in line with the missing-bands problem frequently encountered due to sensor abnormality. Different modalities are expected to provide useful complementary information regarding a given task, thus leading to the training of a robust prediction model. However, although training data may be collected from different sensor modalities, it is often the case that not all of this information is readily available during the model inference phase. This paper tackles the problem by proposing a novel adversarial-training driven hallucination architecture which is capable of learning discriminative feature representations corresponding to the missing modalities from the available ones at test time. To this end, we follow a teacher-student model where the teacher is trained on the multimodal data (learning with privileged information) and the student model learns to subsequently distill the feature descriptors corresponding to the missing modality. Experimental results obtained on benchmark hyperspectral (HSI) datasets and another dataset of multispectral (MS)-panchromatic (PAN) image pairs confirm the efficacy of the proposed approach. In particular, we find that the student model is consistently able to surpass the performance of the teacher model on the HSI datasets.
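
A minimal sketch of the distillation step: a frozen teacher branch for the privileged modality provides target features, and a student "hallucination" branch learns to produce them from the modality that remains available. The adversarial discriminator of the paper is reduced here to a simple feature-matching loss, and all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

teacher_ms  = nn.Linear(4, 64)   # teacher branch for the modality that will be
                                 # missing at test time (e.g., 4-band MS)
student_hal = nn.Linear(1, 64)   # student hallucinates those features from the
                                 # available modality (e.g., PAN)

ms, pan = torch.randn(32, 4), torch.randn(32, 1)

with torch.no_grad():            # teacher is frozen (privileged information)
    target_feat = teacher_ms(ms)

# in the paper this matching is driven adversarially; MSE is the simplest proxy
distill_loss = nn.functional.mse_loss(student_hal(pan), target_feat)
```

At inference, only `student_hal(pan)` is needed, so the missing modality never has to be observed.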

Class reconstruction driven adversarial domain adaptation for hyperspectral image classification

Abstract: We address the problem of cross-domain classification of hyperspectral image (HSI) pairs under the notion of unsupervised domain adaptation (UDA). The UDA problem aims at classifying the test samples of a target domain by exploiting the labeled training samples from a related but different source domain. In this respect, adversarial-training driven domain classifiers, which seek to learn a shared feature space for both domains, are popular. However, such a formalism apparently fails to ensure the i) discriminativeness and ii) non-redundancy of the learned space.

In general, the feature space learned by a domain classifier does not convey any meaningful insight regarding the data. We are instead interested in constraining the space so that it is simultaneously discriminative and reconstructive at the class scale. In particular, the reconstructive constraint enables the learning of category-specific, meaningful feature abstractions, and UDA in such a latent space is expected to better associate the domains. Additionally, we consider an orthogonality constraint to ensure the non-redundancy of the learned space.

Experimental results obtained on benchmark HSI datasets (Botswana and Pavia) confirm the efficacy of the proposed approach.
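
A minimal sketch combining the three ingredients the abstract names: a domain classifier trained adversarially via gradient reversal, a class-conditioned decoder that keeps the shared space reconstructive, and an orthogonality penalty against redundancy. All names, sizes (145 bands, as in Botswana), and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x): return x.view_as(x)
    @staticmethod
    def backward(ctx, g): return -g        # flip gradients for the encoder

encoder    = nn.Linear(145, 64)
domain_clf = nn.Linear(64, 2)
decoder    = nn.Linear(64 + 10, 145)       # class-conditioned reconstruction

src, tgt = torch.randn(16, 145), torch.randn(16, 145)
src_y = nn.functional.one_hot(torch.randint(0, 10, (16,)), 10).float()

z_src, z_tgt = encoder(src), encoder(tgt)

# adversarial domain alignment through gradient reversal
dom_logits = domain_clf(GradReverse.apply(torch.cat([z_src, z_tgt])))
dom_loss = nn.functional.cross_entropy(
    dom_logits, torch.cat([torch.zeros(16), torch.ones(16)]).long())

# class-scale reconstructive constraint on the source features
recon_loss = nn.functional.mse_loss(decoder(torch.cat([z_src, src_y], 1)), src)

# orthogonality: penalise correlation between feature dimensions
gram = z_src.T @ z_src
ortho_loss = ((gram - torch.diag(torch.diag(gram))) ** 2).mean()

loss = dom_loss + recon_loss + 0.1 * ortho_loss
```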

Master of Technology (M. Tech.)

Thesis: Land use/land cover classification of fused Sentinel-1 and Sentinel-2 imageries using ensembles of Random Forests

Abstract: The information obtained from imageries generated by Synthetic Aperture Radar (SAR) and Visible-Near Infrared-Short Wave Infrared (VNIR-SWIR) sensors can be synergistically combined through image fusion and used for land use/land cover (LULC) classification. One objective of this thesis is to study the effect of fusing SAR (in the form of a texture band) and VNIR-SWIR imageries on LULC classification. Image fusion is performed using Bayesian fusion, while random forests, one of the most popular supervised classification techniques, are used for classification. However, random forests have limitations, such as an inability to perform well with a small number of features and stagnation of accuracy after a certain number of decision trees. In addition, randomization leads to different predictions on the same test set for the same classifier with the same parameters. Therefore, the other objective of this thesis is to address these limitations by creating ensembles of random forests (RFE) after introducing random rotations into the training set (based on the Forest-RC algorithm). Three approaches are used for rotation: principal component analysis (PCA), a sparse random rotation (SRP) matrix, and a complete random rotation (CRP) matrix. To train and test these classifiers, SAR data from Sentinel-1 and VNIR-SWIR data from Sentinel-2 have been used for the study area of IIT Kanpur and the surrounding region. Five kinds of training datasets are created: i) SAR, ii) SAR stacked with texture, iii) VNIR-SWIR, iv) VNIR-SWIR stacked with texture, and v) VNIR-SWIR fused with texture. Using these datasets, not only is the efficacy of the classifiers studied, but the effect of fusing SAR and VNIR-SWIR data on classification is also investigated. In addition, the execution speed of the Bayesian fusion code is increased by a factor of up to 3000 for a 700 × 700 image. The SRP-based RFE performs best among the ensembles for the first two datasets, giving average overall kappa values of 61.80% and 68.18% respectively, while the CRP-based RFE performs best for the last three datasets, with respective average overall kappa values of 95.99%, 96.93% and 96.30%. Among the datasets, the highest overall kappa of 96.93% is observed for the fourth dataset. In addition, using texture with SAR bands leads to a maximum increment of 10.00% in overall kappa, while a maximum increment of about 3.45% is observed by adding texture to VNIR-SWIR bands.
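
A minimal sketch of the ensemble idea: each random forest is trained on a randomly rotated copy of the feature space, and the forests vote at prediction time. Shown here with a QR-factorised Gaussian matrix (a complete random rotation, in the spirit of the CRP variant); the PCA and sparse rotations would slot in the same way. The data, ensemble size, and forest parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X, y = rng.normal(size=(500, 12)), rng.integers(0, 5, 500)  # toy bands / labels

forests, rotations = [], []
for _ in range(10):                                   # 10 rotated forests
    Q, _ = np.linalg.qr(rng.normal(size=(12, 12)))    # random orthogonal rotation
    rf = RandomForestClassifier(n_estimators=50).fit(X @ Q, y)
    forests.append(rf)
    rotations.append(Q)

X_test = rng.normal(size=(50, 12))
# each forest predicts in its own rotated space; combine by majority vote
votes = np.stack([rf.predict(X_test @ Q) for rf, Q in zip(forests, rotations)])
pred = np.array([np.bincount(col).argmax() for col in votes.T])
```

Rotating the features diversifies the axis-aligned splits each forest can make, which is what lets the ensemble improve on a single random forest.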
