Author

Kristin J. Dana

Bio: Kristin J. Dana is an academic researcher from Rutgers University. The author has contributed to research on topics including the bidirectional texture function and image texture. The author has an h-index of 36 and has co-authored 101 publications receiving 6257 citations. Previous affiliations of Kristin J. Dana include Sarnoff Corporation and Columbia University.


Papers
Journal ArticleDOI
TL;DR: A new texture representation called the BTF (bidirectional texture function) which captures the variation in texture with illumination and viewing direction is discussed, and a BTF database with image textures from over 60 different samples, each observed with over 200 different combinations of viewing and illumination directions is presented.
Abstract: In this work, we investigate the visual appearance of real-world surfaces and the dependence of appearance on the geometry of imaging conditions. We discuss a new texture representation called the BTF (bidirectional texture function) which captures the variation in texture with illumination and viewing direction. We present a BTF database with image textures from over 60 different samples, each observed with over 200 different combinations of viewing and illumination directions. We describe the methods involved in collecting the database as well as the importance and uniqueness of this database for computer graphics. A related quantity to the BTF is the familiar BRDF (bidirectional reflectance distribution function). The measurement methods involved in the BTF database are conducive to simultaneous measurement of the BRDF. Accordingly, we also present a BRDF database with reflectance measurements for over 60 different samples, each observed with over 200 different combinations of viewing and illumination directions. Both of these unique databases are publicly available and have important implications for computer graphics.

1,370 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN, and can improve the feature representation of relatively shallow networks for the image classification on CIFAR-10 dataset.
Abstract: Recent work has made significant progress in improving spatial resolution for pixelwise labeling with the Fully Convolutional Network (FCN) framework by employing Dilated/Atrous convolution, utilizing multi-scale features and refining boundaries. In this paper, we explore the impact of global contextual information in semantic segmentation by introducing the Context Encoding Module, which captures the semantic context of scenes and selectively highlights class-dependent featuremaps. The proposed Context Encoding Module significantly improves semantic segmentation results with only marginal extra computation cost over FCN. Our approach has achieved new state-of-the-art results: 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. Our single model achieves a final score of 0.5567 on the ADE20K test set, which surpasses the winning entry of COCO-Place Challenge 2017. In addition, we also explore how the Context Encoding Module can improve the feature representation of relatively shallow networks for image classification on the CIFAR-10 dataset. Our 14 layer network has achieved an error rate of 3.45%, which is comparable with state-of-the-art approaches with over 10× more layers. The source code for the complete system is publicly available.

1,235 citations

Proceedings ArticleDOI
17 Jun 1997
TL;DR: The visual appearance of real-world surfaces and the dependence of appearance on imaging conditions is investigated and a BRDF (bidirectional reflectance distribution function) database with reflectance measurements for over 60 different samples, each observed with over 200 different combinations of viewing and source directions is presented.
Abstract: In this work, we investigate the visual appearance of real-world surfaces and the dependence of appearance on imaging conditions. We present a BRDF (bidirectional reflectance distribution function) database with reflectance measurements for over 60 different samples, each observed with over 200 different combinations of viewing and source directions. We fit the BRDF measurements to two recent models to obtain a BRDF parameter database. These BRDF parameters can be directly used for both image analysis and image synthesis. Finally, we present a BTF (bidirectional texture function) database with image textures from over 60 different samples, each observed with over 200 different combinations of viewing and source directions. Each of these unique databases has important implications for a variety of vision algorithms and each is made publicly available.

576 citations

Journal ArticleDOI
TL;DR: A novel automated crack detection algorithm, the STRUM (spatially tuned robust multifeature) classifier, is presented, and results on real bridge data using a state-of-the-art robotic bridge scanning system are demonstrated.
Abstract: Detection of cracks on bridge decks is a vital task for maintaining the structural health and reliability of concrete bridges. Robotic imaging can be used to obtain bridge surface image sets for automated on-site analysis. We present a novel automated crack detection algorithm, the STRUM (spatially tuned robust multifeature) classifier, and demonstrate results on real bridge data using a state-of-the-art robotic bridge scanning system. By using machine learning classification, we eliminate the need for manually tuning threshold parameters. The algorithm uses robust curve fitting to spatially localize potential crack regions even in the presence of noise. Multiple visual features that are spatially tuned to these regions are computed. Feature computation includes examining the scale-space of the local feature in order to represent the information and the unknown salient scale of the crack. The classification results are obtained with real bridge data from hundreds of crack regions over two bridges. This comprehensive analysis shows a peak STRUM classifier performance of 95% compared with 69% accuracy from a more typical image-based approach. In order to create a composite global view of a large bridge span, an image sequence from the robot is aligned computationally to create a continuous mosaic. A crack density map for the bridge mosaic provides a computational description as well as a global view of the spatial patterns of bridge deck cracking. The bridges surveyed for data collection and testing include Long-Term Bridge Performance program's (LTBP) pilot project bridges at Haymarket, VA, USA, and Sacramento, CA, USA.

292 citations

Proceedings ArticleDOI
05 Dec 1994
TL;DR: A real-time system designed to construct a stable view of a scene through aligning images of an incoming video stream and dynamically constructing an image mosaic through the use of the multiresolution coarse-to-fine image registration strategy.
Abstract: We describe a real-time system designed to construct a stable view of a scene through aligning images of an incoming video stream and dynamically constructing an image mosaic. This system uses a video processing unit developed by the David Sarnoff Research Center called the Vision Front End (VFE-100) for the pyramid-based image processing tasks required to implement this process. This paper includes a description of the multiresolution coarse-to-fine image registration strategy, the techniques used for mosaic construction, the implementation of this process on the VFE-100 system, and experimental results showing image mosaics constructed with the VFE-100.

245 citations


Cited by
Journal ArticleDOI
TL;DR: A generalized gray-scale and rotation invariant operator presentation is derived that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution, and a method for combining multiple operators for multiresolution analysis is presented.
Abstract: Presents a theoretically very simple, yet efficient, multiresolution approach to gray-scale and rotation invariant texture classification based on local binary patterns and nonparametric discrimination of sample and prototype distributions. The method is based on recognizing that certain local binary patterns, termed "uniform," are fundamental properties of local image texture and their occurrence histogram is proven to be a very powerful texture feature. We derive a generalized gray-scale and rotation invariant operator presentation that allows for detecting the "uniform" patterns for any quantization of the angular space and for any spatial resolution and present a method for combining multiple operators for multiresolution analysis. The proposed approach is very robust in terms of gray-scale variations since the operator is, by definition, invariant against any monotonic transformation of the gray scale. Another advantage is computational simplicity as the operator can be realized with a few operations in a small neighborhood and a lookup table. Experimental results demonstrate that good discrimination can be achieved with the occurrence statistics of simple rotation invariant local binary patterns.

14,245 citations
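
The local binary pattern operator described in the abstract above can be illustrated with a minimal sketch. It assumes 8 neighbours at radius 1 taken from the square 3x3 ring (no circular interpolation), and the function names `lbp_riu2` and `lbp_histogram` are hypothetical. It is an illustration of the "uniform" rotation invariant idea under those assumptions, not the authors' implementation.

```python
def lbp_riu2(image, row, col):
    """Rotation invariant 'uniform' LBP code for one interior pixel."""
    # Eight neighbours in circular order. Assumption: the square 3x3 ring
    # stands in for an interpolated circular neighbourhood.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = image[row][col]
    # Threshold each neighbour against the centre; only the sign matters,
    # so any monotonic gray-scale transformation leaves the bits unchanged.
    bits = [1 if image[row + dr][col + dc] >= center else 0
            for dr, dc in offsets]
    # Count 0/1 transitions around the circle.
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    # Uniform patterns (at most 2 transitions) are labelled by their number
    # of ones; every other pattern shares the single label 9.
    return sum(bits) if transitions <= 2 else 9


def lbp_histogram(image):
    """Occurrence histogram of the 10 possible codes over interior pixels."""
    hist = [0] * 10
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_riu2(image, r, c)] += 1
    return hist
```

Because the code depends only on the count of ones and the number of transitions, rotating the neighbourhood by a multiple of 90 degrees leaves it unchanged in this simplified 3x3 setting.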

Christopher M. Bishop
01 Jan 2006
TL;DR: This work covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence that exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories.
Abstract: This paper presents a method for recognizing scene categories based on approximate global geometric correspondence. This technique works by partitioning the image into increasingly fine sub-regions and computing histograms of local features found inside each sub-region. The resulting "spatial pyramid" is a simple and computationally efficient extension of an orderless bag-of-features image representation, and it shows significantly improved performance on challenging scene categorization tasks. Specifically, our proposed method exceeds the state of the art on the Caltech-101 database and achieves high accuracy on a large database of fifteen natural scene categories. The spatial pyramid framework also offers insights into the success of several recently proposed image descriptions, including Torralba’s "gist" and Lowe’s SIFT descriptors.

8,736 citations
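
The partitioning step described in the spatial pyramid abstract above can be sketched as follows. The sketch assumes each pixel already carries a quantized local-feature label (a "visual word" index), and it concatenates the per-cell histograms without any per-level weighting. The function name `spatial_pyramid` and its parameters are hypothetical.

```python
def spatial_pyramid(labels, num_words, levels):
    """Concatenate visual-word histograms over increasingly fine grids.

    labels:    2D list of integer word indices in [0, num_words)
    num_words: size of the (assumed) visual vocabulary
    levels:    finest pyramid level; level l uses a 2**l x 2**l grid
    """
    height, width = len(labels), len(labels[0])
    feature = []
    for level in range(levels + 1):
        cells = 2 ** level
        # One histogram per grid cell at this level, in row-major order.
        hists = [[0] * num_words for _ in range(cells * cells)]
        for r in range(height):
            for c in range(width):
                cell_r = r * cells // height
                cell_c = c * cells // width
                hists[cell_r * cells + cell_c][labels[r][c]] += 1
        for hist in hists:
            feature.extend(hist)
    return feature
```

Level 0 is the ordinary orderless bag-of-features histogram; the finer levels add coarse spatial layout on top of it.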

Posted Content
TL;DR: This work uses new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combines some of them to achieve state-of-the-art results: 43.5% AP for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.
Abstract: There are a huge number of features which are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large datasets, and theoretical justification of the result, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale datasets; while some features, such as batch-normalization and residual-connections, are applicable to the majority of models, tasks, and datasets. We assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT) and Mish-activation. We use new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, CmBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP (65.7% AP50) for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100. Source code is at this https URL

5,709 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: New state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset is achieved without using coarse data.
Abstract: In this paper, we address the scene segmentation task by capturing rich contextual dependencies based on the self-attention mechanism. Unlike previous works that capture contexts by multi-scale feature fusion, we propose a Dual Attention Network (DANet) to adaptively integrate local features with their global dependencies. Specifically, we append two types of attention modules on top of a traditional dilated FCN, which model the semantic interdependencies in spatial and channel dimensions respectively. The position attention module selectively aggregates the features at each position by a weighted sum of the features at all positions. Similar features would be related to each other regardless of their distances. Meanwhile, the channel attention module selectively emphasizes interdependent channel maps by integrating associated features among all channel maps. We sum the outputs of the two attention modules to further improve feature representation which contributes to more precise segmentation results. We achieve new state-of-the-art segmentation performance on three challenging scene segmentation datasets, i.e., Cityscapes, PASCAL Context and COCO Stuff dataset. In particular, a Mean IoU score of 81.5% on Cityscapes test set is achieved without using coarse data.

4,327 citations
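
The weighted-sum aggregation that the position attention module performs, as described in the abstract above, can be sketched in a few lines. This is a simplified illustration: it assumes the weights are a softmax over raw dot-product similarities and omits any learned projections, scaling, or residual connection that a full network would use. The function name `position_attention` is hypothetical.

```python
import math


def position_attention(features):
    """Aggregate each position's feature as a weighted sum over all positions.

    features: list of N feature vectors, one per spatial position
    returns:  list of N aggregated vectors of the same dimension
    """
    out = []
    for query in features:
        # Similarity of this position to every position (dot product).
        scores = [sum(a * b for a, b in zip(query, key)) for key in features]
        # Numerically stable softmax turns similarities into weights.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the features at all positions.
        dim = len(query)
        out.append([sum(w * f[d] for w, f in zip(weights, features))
                    for d in range(dim)])
    return out
```

Positions with similar features receive larger mutual weights regardless of where they sit in the image, since the spatial distance between positions never enters the computation.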