
Showing papers by "Santanu Chaudhury published in 2022"


Journal ArticleDOI
TL;DR: In this article, the authors presented the COVID-19 Multi-Task Network (COMiT-Net), an automated end-to-end network for COVID-19 screening.

27 citations


Proceedings ArticleDOI
08 Dec 2022
TL;DR: In this paper, the authors propose a novel attention-based framework that combines feature attention, topological loss, and residual learning for root segmentation, reaching state-of-the-art performance on the Arabidopsis Root Segmentation Challenge 2021 dataset from Computer Vision in Plant Phenotyping and Agriculture (CVPPA).
Abstract: Root morphological traits are key to monitoring plant growth and development. Traditionally, plant biologists relied on manual or semi-automatic approaches to estimate these traits accurately. With high-throughput acquisition of root image data, these root traits are now computed through automatic image analysis, and in this context root segmentation is an important pre-processing step. However, this is a challenging task because of (1) diverse root characteristics, i.e., orientation, size, and shape, (2) complex image backgrounds, (3) low contrast, and (4) varying degrees of self-occlusion. Deep learning methods proposed for root segmentation have mainly relied on conventional pixel-wise losses and have neglected the relationships between deep features, which are crucial for segmenting thin root structures against complex backgrounds such as water droplets and leaves. In this paper, we propose a novel attention-based framework that combines the strengths of feature attention, topological loss, and residual learning for root segmentation. The proposed framework achieves state-of-the-art performance on the Arabidopsis Root Segmentation Challenge 2021 dataset from Computer Vision in Plant Phenotyping and Agriculture (CVPPA). An ablation study has also been conducted to evaluate the contribution of each module to the proposed framework.
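The feature-attention and residual-learning ideas in the abstract can be illustrated with a minimal sketch. This is a hedged, hypothetical example (the function name, shapes, and the identity "excitation" are illustrative assumptions, not the paper's actual design): channel descriptors gate the feature map, and a residual connection keeps the original signal flowing.

```python
import numpy as np

def channel_attention_residual(features):
    """Illustrative sketch of a feature-attention residual block.
    `features` has shape (channels, height, width). The real framework
    learns the attention weights; here a sigmoid of the channel means
    stands in for that learned gate."""
    # Squeeze: global average pooling gives one descriptor per channel.
    descriptors = features.mean(axis=(1, 2))
    # Excite: a sigmoid gate re-weights each channel (a learned MLP in
    # practice; raw descriptors are used here purely for illustration).
    gates = 1.0 / (1.0 + np.exp(-descriptors))
    attended = features * gates[:, None, None]
    # Residual connection: add the attended features back to the input.
    return features + attended

x = np.random.rand(4, 8, 8)
y = channel_attention_residual(x)
print(y.shape)  # (4, 8, 8)
```

The residual addition means the block can never discard the input entirely, which is one reason residual attention is attractive for thin structures like roots.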

Journal ArticleDOI
TL;DR: In this paper, 25 participants provided gaze data while wearing Tobii Pro Glasses 2 at a museum; the corresponding video stream was clipped into 20 videos, one per museum exhibit, and compensated for the users' unwanted head movements.
Abstract: Egocentric vision data captures the first-person perspective of a visual stimulus and helps study gaze behavior in more natural contexts. In this work, we propose a new dataset collected in a free-viewing style with an end-to-end data processing pipeline. A group of 25 participants provided their gaze information while wearing Tobii Pro Glasses 2 set up at a museum. The gaze stream is post-processed to handle missing or incoherent information. The corresponding video stream is clipped into 20 videos, one per museum exhibit, and compensated for the users' unwanted head movements. Based on the velocity of directional shifts of the eye, the I-VT algorithm classifies the eye movements into either fixations or saccades. Representative scanpaths are built by generalizing multiple viewers' gazing styles for all exhibits. The dataset therefore captures both the individual gazing styles of many viewers and the generic trend they all follow towards a museum exhibit. The application of our dataset is demonstrated by characterizing the inherent gaze dynamics using the state trajectory estimator based on ancestor sampling (STEAS) model to solve gaze data classification and retrieval problems. This dataset can also be used to address problems like segmentation and summarization using both conventional machine learning and deep learning approaches.
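The I-VT (velocity-threshold identification) step mentioned in the abstract is simple enough to sketch. The following is an illustrative toy implementation, not the paper's code: it labels each inter-sample movement by angular velocity, and the 100 deg/s threshold is a common default, not necessarily the value the authors used.

```python
import math

def ivt_classify(gaze, threshold_deg_per_s=100.0):
    """Toy I-VT sketch: label each inter-sample eye movement as a
    fixation ('F') or saccade ('S') by its angular velocity.
    `gaze` is a list of (t_seconds, x_deg, y_deg) samples."""
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(gaze, gaze[1:]):
        dt = t1 - t0
        # Angular distance covered between consecutive samples, per second.
        velocity = math.hypot(x1 - x0, y1 - y0) / dt
        labels.append('S' if velocity > threshold_deg_per_s else 'F')
    return labels

# 50 Hz samples: slow drift (fixation) then a fast jump (saccade).
samples = [(0.00, 0.0, 0.0), (0.02, 0.1, 0.0),
           (0.04, 0.2, 0.1), (0.06, 8.0, 6.0)]
print(ivt_classify(samples))  # ['F', 'F', 'S']
```

Production pipelines (including Tobii's own I-VT filter) add noise reduction, gap filling, and merging of adjacent fixations on top of this basic thresholding.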



Proceedings ArticleDOI
16 Oct 2022
TL;DR: In this paper, the authors propose a two-pathway CMRNet (TP-CMRNet) with effective feature integration of the spatial and temporal domains at multiple scales for video saliency prediction.
Abstract: Existing dynamic saliency prediction models face challenges such as inefficient spatio-temporal feature integration, ineffective multi-scale feature extraction, and poor domain adaptation due to huge pre-trained backbone networks. In this paper, we propose a two-pathway architecture with effective feature integration of the spatial and temporal domains at multiple scales for video saliency prediction. The frame and optical-flow pathways extract features from video frames and optical flow maps, respectively, using a series of cross-concatenated multi-scale residual (CMR) blocks. We name this network the two-pathway CMRNet (TP-CMRNet). Every CMR block is followed by a feature fusion and attention module that merges features from the two pathways and guides the network to weigh salient regions. A bi-directional LSTM module learns the task by looking at the previous and next video frames. We build a simple decoder to reconstruct the features into the final attention map. TP-CMRNet is comprehensively evaluated on three benchmark datasets: DHF1K, Hollywood-2, and UCF Sports. We observe that our model performs on par with other deep dynamic models; in particular, it outperforms all the other models with fewer model parameters and lower inference time.
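The two-pathway fusion step described above can be sketched in a few lines. This is a hypothetical stand-in for the paper's fusion-and-attention module (the function name, concatenation order, and the softmax spatial attention are all illustrative assumptions): frame and flow features are concatenated channel-wise and re-weighted by a spatial attention map.

```python
import numpy as np

def fuse_pathways(frame_feat, flow_feat):
    """Illustrative two-pathway fusion: concatenate along the channel
    axis, then weight spatial positions with a softmax attention map.
    Both inputs have shape (channels, height, width)."""
    fused = np.concatenate([frame_feat, flow_feat], axis=0)  # (2C, H, W)
    # Spatial attention: average over channels, normalize to sum to 1,
    # so salient positions receive larger weights.
    energy = fused.mean(axis=0)
    attention = np.exp(energy) / np.exp(energy).sum()
    return fused * attention[None, :, :]

frame = np.random.rand(3, 4, 4)
flow = np.random.rand(3, 4, 4)
out = fuse_pathways(frame, flow)
print(out.shape)  # (6, 4, 4)
```

In the actual network this fusion is learned and repeated after each CMR block at several scales, with the bi-directional LSTM and decoder operating on the fused features.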