
Sheng Tang

Researcher at Chinese Academy of Sciences

Publications -  143
Citations -  3507

Sheng Tang is an academic researcher at the Chinese Academy of Sciences. His research focuses on topics including Visual Word and TRECVID. He has an h-index of 25 and has co-authored 131 publications receiving 2,431 citations. His previous affiliations include the National University of Singapore and the Dalian University of Technology.

Papers
Proceedings ArticleDOI

Large visual words for large scale image classification

TL;DR: This paper presents an efficient approach for generating large visual words from a very compact vocabulary, namely two dictionaries learned with sparse non-negative matrix factorization (NMF). By incorporating fast KNN search based on large visual words into the SVM-KNN method, images can be classified very efficiently.
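At the core of this approach is NMF dictionary learning. As a rough illustration only (the paper uses *sparse* NMF and two dictionaries, which this sketch omits; adding an L1 penalty to the updates would encourage sparsity), here is a minimal plain NMF via Lee–Seung multiplicative updates:

```python
import numpy as np

def nmf(V, k, n_iter=300, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (features x samples) as V ~ W @ H
    using Lee-Seung multiplicative updates; W's columns act as a dictionary."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update codes
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update dictionary
    return W, H

# Toy stand-in for local image descriptors: 6 features x 8 samples, non-negative.
V = np.abs(np.random.default_rng(1).normal(size=(6, 8)))
W, H = nmf(V, k=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The multiplicative form guarantees that W and H stay non-negative throughout, which is what makes the learned atoms interpretable as (visual-word-like) parts.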
Book ChapterDOI

Spatiotemporal Breast Mass Detection Network (MD-Net) in 4D DCE-MRI Images

TL;DR: This work leverages recent deep learning techniques for breast lesion detection and proposes the Spatiotemporal Breast Mass Detection Network (MD-Net) to detect masses in 4D DCE-MRI images automatically.
Book ChapterDOI

Document Clustering Based on Spectral Clustering and Non-negative Matrix Factorization

TL;DR: This work applies a novel non-negative matrix factorization to the affinity matrix for document clustering, enforcing non-negativity and orthogonality constraints simultaneously, and yields a much more reasonable clustering interpretation than previous NMF-based clustering methods.
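The basic step the paper builds on, factoring the document affinity matrix and reading cluster memberships off the non-negative factor, can be sketched as follows. This uses scikit-learn's plain NMF and does not enforce the paper's orthogonality constraint; the affinity matrix here is a hypothetical toy example:

```python
import numpy as np
from sklearn.decomposition import NMF

# Toy affinity matrix: two well-separated groups of 3 "documents" each.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

# Factor the affinity matrix; each row of W is a soft cluster membership vector.
model = NMF(n_components=2, init='nndsvda', random_state=0, max_iter=500)
W = model.fit_transform(A)
labels = W.argmax(axis=1)  # hard assignment: largest factor weight per document
```

With an orthogonality constraint on the factor (as in the paper), the rows of W move closer to a true cluster indicator matrix, which is what gives the cleaner clustering interpretation.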
Proceedings ArticleDOI

A Novel Anchorperson Detection Algorithm Based on Spatio-temporal Slice

TL;DR: This paper presents a novel anchorperson detection algorithm based on the spatio-temporal slice (STS). Through STS pattern analysis, clustering, and decision fusion, anchorperson shots can be detected for browsing news video.
Proceedings ArticleDOI

A distribution based video representation for human action recognition

TL;DR: The proposed representation encodes the visual and motion information of an ensemble of local spatio-temporal features of a video into a distribution estimated by a generative probabilistic model, such as the Gaussian mixture model, which naturally gives rise to an information-theoretic distance metric between videos.
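A minimal sketch of this idea, under assumptions not taken from the paper (random vectors standing in for local spatio-temporal descriptors, a symmetrized Monte Carlo KL divergence standing in for whatever information-theoretic metric the authors use): each video's descriptor set is summarized by a fitted GMM, and videos are compared by a divergence between their GMMs.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_video_model(descriptors, k=3, seed=0):
    """Summarize a video's local descriptors (n_samples x n_dims) by a GMM."""
    return GaussianMixture(n_components=k, covariance_type='diag',
                           random_state=seed).fit(descriptors)

def kl_mc(p, q, n=2000):
    """Monte Carlo estimate of KL(p || q) between two fitted GMMs."""
    X, _ = p.sample(n)
    return float(np.mean(p.score_samples(X) - q.score_samples(X)))

rng = np.random.default_rng(0)
vid_a = rng.normal(0.0, 1.0, size=(500, 8))  # stand-in for video A's descriptors
vid_b = rng.normal(3.0, 1.0, size=(500, 8))  # video B: a shifted distribution
gmm_a, gmm_b = fit_video_model(vid_a), fit_video_model(vid_b)

# Symmetrized KL as a distance-like score between the two videos.
d_ab = 0.5 * (kl_mc(gmm_a, gmm_b) + kl_mc(gmm_b, gmm_a))
d_aa = kl_mc(gmm_a, gmm_a)  # self-comparison: exactly zero
```

Modeling each video as a distribution, rather than a single pooled vector, is what lets divergences between the fitted models serve directly as a video distance.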