Research Domain
I primarily work at the intersection of deep learning and remote sensing, with experience in image classification, dimensionality reduction, multimodal learning and image fusion, hyperspectral unmixing, visual question answering, and shape retrieval. Some of my research works are highlighted below:
Cross-Domain 3D Shape Retrieval: 3D shape retrieval from images and sketches is the task of finding corresponding 3D shapes in a database based on 2D queries such as images or sketches. A significant challenge in this area is the cross-domain gap between these 2D queries and the 3D shapes, which includes variations in style, abstraction, and representation across domains. These differences make it difficult for standard retrieval methods to accurately match 2D queries with 3D shapes, as such methods typically do not account for the distribution shifts that occur between training and testing data. Addressing these challenges requires techniques that effectively align the different modalities and domains, ensuring accurate retrieval despite the cross-domain variations. The figure below shows a schematic where I have addressed domain alignment in cross-domain 3D shape retrieval.
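As a rough illustration of how such alignment can be set up, the sketch below pairs a retrieval loss in a shared embedding space with a simple distribution-matching penalty. The embedding dimension, the linear-kernel MMD term, and the InfoNCE-style retrieval loss are illustrative assumptions, not the exact method from my work.

```python
# Minimal sketch: align 2D query embeddings with 3D shape embeddings in a
# shared space (retrieval loss) while reducing the domain gap (MMD penalty).
import torch
import torch.nn.functional as F

def mmd_loss(x, y):
    """Simple linear-kernel MMD between two embedding batches."""
    return (x.mean(dim=0) - y.mean(dim=0)).pow(2).sum()

def retrieval_loss(query_emb, shape_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss: matching query/shape pairs attract."""
    query_emb = F.normalize(query_emb, dim=-1)
    shape_emb = F.normalize(shape_emb, dim=-1)
    logits = query_emb @ shape_emb.t() / temperature
    targets = torch.arange(query_emb.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# toy batch: 8 paired sketch/image embeddings and 3D shape embeddings (dim 128)
query_emb = torch.randn(8, 128, requires_grad=True)
shape_emb = torch.randn(8, 128, requires_grad=True)
loss = retrieval_loss(query_emb, shape_emb) + 0.1 * mmd_loss(query_emb, shape_emb)
loss.backward()
```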
Visual Question Answering in Remote Sensing: Visual Question Answering (VQA) in remote sensing, often referred to as RSVQA, is a task that leverages natural language queries to extract and interpret information from satellite or aerial images. Unlike traditional remote sensing methods, which are usually tailored to specific tasks like land cover classification or object detection, RSVQA aims to provide a more accessible way to interact with complex remote sensing data. The main challenge lies in bridging the gap between the high-level semantic understanding required to answer questions and the low-level pixel data in the images. Additionally, the diversity in image resolution and the specialized nature of the information make it difficult to develop models that generalize across different datasets. Moreover, accurate responses require integrating contextual knowledge, such as geographic or relational data, which further complicates the task. A schematic and results from one of my research works, which uses cross-attention with mutual information maximization, are presented below.
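The sketch below illustrates the cross-attention idea in isolation: question tokens attend to image patch features before answer classification. The layer sizes, the mean pooling, and the classification head are illustrative assumptions, and the mutual information maximization component is omitted.

```python
# Minimal sketch of text-to-image cross-attention for RSVQA-style answering.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, heads=4, num_answers=100):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, num_answers)

    def forward(self, question_tokens, image_patches):
        # queries = question tokens, keys/values = image patch features
        fused, _ = self.attn(question_tokens, image_patches, image_patches)
        return self.classifier(fused.mean(dim=1))  # pool over tokens, predict answer

model = CrossAttentionFusion()
q = torch.randn(2, 12, 256)    # batch of 2 questions, 12 tokens each
v = torch.randn(2, 196, 256)   # 14x14 grid of image patch features
logits = model(q, v)           # (2, 100) answer scores
```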
Semi-supervised Learning: Semi-supervised learning plays a crucial role in remote sensing, where limited labeled data is often a significant hurdle. Obtaining labeled data, whether satellite images or aerial photos, is typically an expensive and labor-intensive process. Semi-supervised learning offers a practical solution by combining the small amount of labeled data with the large volumes of unlabeled data that are readily available. The inclusion of vast amounts of unlabeled data enables the models to learn the intrinsic data distribution, while the few available annotations guide the models in a task-specific manner. The schematic presented here shows semi-supervised learning in a contrastive learning setting.
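A minimal sketch of this setting is given below: a supervised loss on the few labeled samples is combined with a contrastive loss on two augmented views of the unlabeled samples. The encoder, the projection head, and the loss weighting are illustrative assumptions.

```python
# Minimal sketch of semi-supervised training with a contrastive auxiliary loss.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    """Simplified contrastive loss between two augmented views of a batch."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))   # matching views are positives
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU())
classifier = torch.nn.Linear(128, 10)    # supervised head (few labels)
projector = torch.nn.Linear(128, 32)     # contrastive head (many unlabeled samples)

x_lab, y_lab = torch.randn(16, 64), torch.randint(0, 10, (16,))   # labeled batch
x_u1, x_u2 = torch.randn(64, 64), torch.randn(64, 64)             # two views of unlabeled batch

sup = F.cross_entropy(classifier(encoder(x_lab)), y_lab)
con = nt_xent(projector(encoder(x_u1)), projector(encoder(x_u2)))
loss = sup + 0.5 * con
loss.backward()
```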
Image Enhancement: Image enhancement is the process of improving image quality by removing defects, which may result from sensor problems, low-resolution cameras, poor lighting, and so on. In remote sensing, mitigating image defects is equally crucial; defects may creep in as noise, missing regions, poor resolution, night-time acquisition, and other causes. Deep learning has become a very effective tool for image restoration. However, with ever-growing volumes of image data, there is a need for faster and more efficient restoration frameworks. In addition, the restored images, besides being visually accurate, should also perform well on machine-based interpretation tasks. I am also actively working in this area to create efficient and generalizable image enhancement models.
Hyperspectral Dimensionality Reduction: Hyperspectral images have become one of the most popular imaging modalities, particularly in the deep learning domain, owing to their high spectral resolution. This makes them very significant for tasks such as vegetation monitoring, where spectral information is dominant and classes are spectrally similar. However, these images have a large number of channels (even up to 2500), and training learning-based models with them requires a large amount of annotation. To this end, dimensionality reduction, either through feature selection or feature extraction, is one of the most sought-after techniques, especially for classification and segmentation tasks. The former selects the most informative subset of channels, while the latter projects the data from a high-dimensional to a low-dimensional space. The low-dimensional features are then used for downstream tasks like classification with limited annotations. The schematic here presents an autoencoder, based on a bidirectional GRU, that projects the high-dimensional HSI from 360 to 10 dimensions.
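A minimal sketch of such a bidirectional-GRU autoencoder is shown below: each spectrum is treated as a sequence of bands, the final GRU states are compressed to a 10-dimensional code, and a decoder reconstructs the original 360 bands. The hidden sizes and the simple linear decoder are illustrative assumptions, not the exact configuration from the schematic.

```python
# Minimal sketch of a bi-GRU autoencoder compressing 360-band spectra to 10-D codes.
import torch
import torch.nn as nn

class BiGRUAutoencoder(nn.Module):
    def __init__(self, n_bands=360, hidden=64, latent=10):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden,
                              bidirectional=True, batch_first=True)
        self.to_latent = nn.Linear(2 * hidden, latent)
        self.decoder = nn.Linear(latent, n_bands)

    def forward(self, x):                    # x: (batch, n_bands)
        seq = x.unsqueeze(-1)                # treat each band as one sequence step
        _, h = self.encoder(seq)             # h: (2, batch, hidden)
        h = torch.cat([h[0], h[1]], dim=-1)  # concat forward/backward states
        z = self.to_latent(h)                # 10-D code for downstream classifiers
        return self.decoder(z), z

model = BiGRUAutoencoder()
pixels = torch.randn(32, 360)                # 32 hyperspectral pixel spectra
recon, codes = model(pixels)
loss = nn.functional.mse_loss(recon, pixels)
```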
Hyperspectral Unmixing: Hyperspectral images, owing to their large number of channels, suffer from relatively lower spatial resolution, since the energy from the electromagnetic spectrum is divided among many channels. As a result, the images often contain mixed pixels, where multiple classes are combined in a single pixel (a representation of this is shown below). HSI unmixing techniques are thus quite popular for estimating the abundances of the classes in these so-called "mixels". Deep learning techniques are widely used for this task, owing to their ability to model non-linear relationships, combine spatial and spectral information, and follow a data-driven approach with limited assumptions, among other advantages. However, like other DL models, these techniques require large amounts of data and can have high computational costs leading to scalability issues, making this yet another prominent area of research.
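The sketch below shows one common autoencoder-style formulation of unmixing under the linear mixing model: an encoder predicts non-negative, sum-to-one abundances, and the decoder weights play the role of endmember spectra. The numbers of bands and endmembers are illustrative assumptions.

```python
# Minimal sketch of autoencoder-based linear unmixing.
import torch
import torch.nn as nn

class UnmixingAE(nn.Module):
    def __init__(self, n_bands=200, n_endmembers=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                                     nn.Linear(64, n_endmembers))
        # learnable endmember matrix: one spectrum per endmember
        self.endmembers = nn.Parameter(torch.rand(n_endmembers, n_bands))

    def forward(self, x):
        abundances = torch.softmax(self.encoder(x), dim=-1)  # non-negative, sum to one
        recon = abundances @ self.endmembers                  # linear mixing model
        return recon, abundances

model = UnmixingAE()
pixels = torch.rand(128, 200)                # 128 mixed pixels, 200 bands
recon, abundances = model(pixels)
loss = nn.functional.mse_loss(recon, pixels)
```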
Image Classification: Image classification is one of the hottest topics in the remote sensing domain. The idea is to develop techniques that can correctly identify the land use/land cover classes present in remotely sensed images. However, the problem remains challenging from several perspectives, such as identifying the most suitable features/bands, obtaining good classification results with limited training samples, and ensuring that models are trained within a limited time. As a researcher, I am working on mitigating these problems to create more robust classification models. The figure below shows HSI classification maps from a recent work that uses an attention mechanism for classification.
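As a toy illustration of attention in this context, the sketch below applies a band-wise attention module that re-weights the spectral channels before a small classifier; the sizes and the attention design are assumptions for illustration only, not the network from the work mentioned above.

```python
# Minimal sketch of band-wise attention for per-pixel HSI classification.
import torch
import torch.nn as nn

class BandAttentionClassifier(nn.Module):
    def __init__(self, n_bands=200, n_classes=16):
        super().__init__()
        self.attention = nn.Sequential(nn.Linear(n_bands, n_bands // 4), nn.ReLU(),
                                       nn.Linear(n_bands // 4, n_bands), nn.Sigmoid())
        self.classifier = nn.Sequential(nn.Linear(n_bands, 128), nn.ReLU(),
                                        nn.Linear(128, n_classes))

    def forward(self, x):               # x: (batch, n_bands)
        weights = self.attention(x)     # per-band importance in [0, 1]
        return self.classifier(x * weights)

model = BandAttentionClassifier()
logits = model(torch.randn(32, 200))    # (32, 16) class scores
```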
Missing Modality Prediction: In the remote sensing domain, it sometimes happens that all the modalities are available while training a classification model, but during deployment a few of them are missing. In such a scenario, it becomes important that the model is robust enough to handle the missing modality and still give accurate predictions. This is another aspect of hyperspectral images that is a focus of my PhD research. Here, I am working on knowledge distillation, through which the features of the absent modality can be mimicked during the deployment phase, thus compensating for its absence. The figure here presents the components of the modality prediction/distillation framework (from my research) that compensates for the missing bands.
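A minimal sketch of the distillation idea is given below: a teacher that sees both modalities produces target features, and a student that sees only HSI is trained to mimic them while also classifying. The encoder sizes and the equal loss weighting are illustrative assumptions, not the framework from my research.

```python
# Minimal sketch of feature-level distillation to compensate for a missing modality.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(200 + 1, 128)   # sees HSI bands + LiDAR elevation (training only)
student = nn.Linear(200, 128)       # sees HSI bands only (deployment)
head = nn.Linear(128, 16)           # shared classification head

hsi = torch.randn(32, 200)
lidar = torch.randn(32, 1)
labels = torch.randint(0, 16, (32,))

with torch.no_grad():
    t_feat = teacher(torch.cat([hsi, lidar], dim=-1))  # "full modality" target features

s_feat = student(hsi)
loss = F.cross_entropy(head(s_feat), labels) + F.mse_loss(s_feat, t_feat)
loss.backward()
```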
Multimodal Fusion: The goal here is to work simultaneously with multiple sources of remote sensing data, such as hyperspectral, multispectral, synthetic aperture radar (SAR), and light detection and ranging (LiDAR). This is a challenging task because different modalities have different sets of unique characteristics, which make their simultaneous processing difficult. Furthermore, acquiring data from different sensors also makes it difficult to create a mapping between the modalities. I am researching efficient ways to combine HSI data with other modalities (such as LiDAR, for its inherent elevation information, and SAR, for phase information, active sensing, and better penetration) for better classification performance. A successful deep learning based HSI-LiDAR fusion framework is proposed and shown below.
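For intuition, the sketch below shows a generic two-branch fusion network: each modality is encoded separately and the features are concatenated before classification. The layer sizes are illustrative assumptions, and this is not the proposed framework itself.

```python
# Minimal sketch of two-branch HSI-LiDAR feature-level fusion.
import torch
import torch.nn as nn

class HSILiDARFusion(nn.Module):
    def __init__(self, n_bands=144, n_classes=15):
        super().__init__()
        self.hsi_branch = nn.Sequential(nn.Linear(n_bands, 128), nn.ReLU())
        self.lidar_branch = nn.Sequential(nn.Linear(1, 16), nn.ReLU())
        self.classifier = nn.Linear(128 + 16, n_classes)

    def forward(self, hsi, lidar):
        fused = torch.cat([self.hsi_branch(hsi), self.lidar_branch(lidar)], dim=-1)
        return self.classifier(fused)

model = HSILiDARFusion()
logits = model(torch.randn(8, 144), torch.randn(8, 1))   # (8, 15) class scores
```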
Key Areas
Remote Sensing, Machine Learning, Deep Learning, Hyperspectral Images, LiDAR, Image Processing