We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. By a carefully crafted design, we increased the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22 layers deep network, the quality of which is assessed in the context of classification and detection.

/pdf/going-deeper-with-convolutions-1yobw2o2ds.pdf

Going deeper with convolutions

Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously. During the past decade, significant amount of progresses have been made toward this emerging machine learning paradigm. This paper aims to provide a timely review on this area with emphasis on state-of-the-art multi-label learning algorithms. Firstly, fundamentals on multi-label learning including formal definition and evaluation metrics are given. Secondly and primarily, eight representative multi-label learning algorithms are scrutinized under common notations with relevant analyses and discussions. Thirdly, several related learning settings are briefly summarized. As a conclusion, online resources and open research problems on multi-label learning are outlined for reference purposes.

https://romisatriawahono.net/lecture/rm/survey/machine%20learning/Zhang%20-%20Multi-Label%20Learning%20Algorithms%20-%202013.pdf

A Review On Multi-Label Learning Algorithms

As robots and other computer-assisted technologies take over tasks previously performed by labor, there is increasing concern about the future of jobs and wages. We analyze the effect of the increase in industrial robot usage between 1990 and 2007 on US local labor markets. Using a model in which robots compete against human labor in the production of different tasks, we show that robots may reduce employment and wages, and that the local labor market effects of robots can be estimated by regressing the change in employment and wages on the exposure to robots in each local labor marketâ€”defined from the national penetration of robots into each industry and the local distribution of employment across industries. Using this approach, we estimate large and robust negative effects of robots on employment and wages across commuting zones. We bolster this evidence by showing that the commuting zones most exposed to robots in the post-1990 era do not exhibit any differential trends before 1990. The impact of robots is distinct from the impact of imports from China and Mexico, the decline of routine jobs, offshoring, other types of IT capital, and the total capital stock (in fact, exposure to robots is only weakly correlated with these other variables). According to our estimates, one more robot per thousand workers reduces the employment to population ratio by about 0.18-0.34 percentage points and wages by 0.25-0.5 percent.

Robots and Jobs: Evidence from US Labor Markets

While deep convolutional neural networks (CNNs) have shown a great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although working well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification models.

/pdf/cnn-rnn-a-unified-framework-for-multi-label-image-4ckssxbf6c.pdf

CNN-RNN: A Unified Framework for Multi-label Image Classification

We provide evidence that democracy has a significant and robust positive effect on GDP per capita. Our empirical strategy controls for country fixed effects and the rich dynamics of GDP, which otherwise confound the effect of democracy on economic growth. To reduce measurement error, we introduce a new dichotomous measure of democracy that consolidates the information from several sources. Our baseline results use a dynamic panel model for GDP, and show that democratizations increase GDP per capita by about 20% in the long run. We find similar effects of democratizations on annual GDP when we control for the estimated propensity of a country to democratize based on past GDP dynamics. We obtain comparable estimates when we instrument democracy using regional waves of democratizations and reversals. Our results suggest that democracy increases GDP by encouraging investment, increasing schooling, inducing economic reforms, improving the provision of public goods, and reducing social unrest. We find little support for the view that democracy is a constraint on economic growth for less developed economies.

Democracy Does Cause Growth

Discriminative methods (e.g., kernel regression, SVM) have been extensively used to solve problems such as object recognition, image alignment and pose estimation from images. These methods typically map image features ( ${\mathbf X}$ ) to continuous (e.g., pose) or discrete (e.g., object category) values. A major drawback of existing discriminative methods is that samples are directly projected onto a subspace and hence fail to account for outliers common in realistic training sets due to occlusion, specular reflections or noise. It is important to notice that existing discriminative approaches assume the input variables ${\mathbf X}$ to be noise free. Thus, discriminative methods experience significant performance degradation when gross outliers are present. Despite its obvious importance, the problem of robust discriminative learning has been relatively unexplored in computer vision. This paper develops the theory of robust regression (RR) and presents an effective convex approach that uses recent advances on rank minimization. The framework applies to a variety of problems in computer vision including robust linear discriminant analysis, regression with missing data, and multi-label classification. Several synthetic and real examples with applications to head pose estimation from images, image and video classification and facial attribute classification with missing data are used to illustrate the benefits of RR.

Robust Regression

Low rank models have been widely used for the representation of shape, appearance or motion in computer vision problems. Traditional approaches to fit low rank models make use of an explicit bilinear factorization. These approaches benefit from fast numerical methods for optimization and easy kernelization. However, they suffer from serious local minima problems depending on the loss function and the amount/type of missing data. Recently, these low-rank models have alternatively been formulated as convex problems using the nuclear norm regularizer, unlike factorization methods, their numerical solvers are slow and it is unclear how to kernelize them or to impose a rank a priori. This paper proposes a unified approach to bilinear factorization and nuclear norm regularization, that inherits the benefits of both. We analyze the conditions under which these approaches are equivalent. Moreover, based on this analysis, we propose a new optimization algorithm and a "rank continuation'' strategy that outperform state-of-the-art approaches for Robust PCA, Structure from Motion and Photometric Stereo with outliers and missing data.

/pdf/unifying-nuclear-norm-and-bilinear-factorization-approaches-27mtp64f5c.pdf

Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition

Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes.

/pdf/matrix-completion-for-multi-label-image-classification-2i80rkde11.pdf

Matrix Completion for Multi-label Image Classification

In the last few years, image classification has become an incredibly active research topic, with widespread applications. Most methods for visual recognition are fully supervised, as they make use of bounding boxes or pixelwise segmentations to locate objects of interest. However, this type of manual labeling is time consuming, error prone and it has been shown that manual segmentations are not necessarily the optimal spatial enclosure for object classifiers. This paper proposes a weakly-supervised system for multi-label image classification. In this setting, training images are annotated with a set of keywords describing their contents, but the visual concepts are not explicitly segmented in the images. We formulate the weakly-supervised image classification as a low-rank matrix completion problem. Compared to previous work, our proposed framework has three advantages: (1) Unlike existing solutions based on multiple-instance learning methods, our model is convex. We propose two alternative algorithms for matrix completion specifically tailored to visual data, and prove their convergence. (2) Unlike existing discriminative methods, our algorithm is robust to labeling errors, background noise and partial occlusions. (3) Our method can potentially be used for semantic segmentation. Experimental validation on several data sets shows that our method outperforms state-of-the-art classification algorithms, while effectively capturing each class appearance.

Matrix Completion for Weakly-Supervised Multi-Label Image Classification

This paper presents a system to reconstruct piecewise planar and compact floorplans from images, which are then converted to high quality texture-mapped models for free- viewpoint visualization. There are two main challenges in image-based floorplan reconstruction. The first is the lack of 3D information that can be extracted from images by Structure from Motion and Multi-View Stereo, as indoor scenes abound with non-diffuse and homogeneous surfaces plus clutter. The second challenge is the need of a sophisti- cated regularization technique that enforces piecewise pla- narity, to suppress clutter and yield high quality texture mapped models. Our technical contributions are twofold. First, we propose a novel structure classification technique to classify each pixel to three regions (floor, ceiling, and wall), which provide 3D cues even from a single image. Second, we cast floorplan reconstruction as a shortest path problem on a specially crafted graph, which enables us to enforce piecewise planarity. Besides producing compact piecewise planar models, this formulation allows us to di- rectly control the number of vertices (i.e., density) of the output mesh. We evaluate our system on real indoor scenes, and show that our texture mapped mesh models provide compelling free-viewpoint visualization experiences, when compared against the state-of-the-art and ground truth.

Ricardo Cabral

Papers

Robust Regression

Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition

Matrix Completion for Multi-label Image Classification

Matrix Completion for Weakly-Supervised Multi-Label Image Classification

Piecewise Planar and Compact Floorplan Reconstruction from Images