Author
Shih-Fu Chang
Other affiliations: Eastman Kodak Company, Nanyang Technological University, Hunter College
Bio: Shih-Fu Chang is an academic researcher from Columbia University. The author has contributed to research in topics: Large Hadron Collider & Image retrieval. The author has an h-index of 130, has co-authored 917 publications, and has received 72,346 citations. Previous affiliations of Shih-Fu Chang include Eastman Kodak Company & Nanyang Technological University.
Papers published on a yearly basis
Papers
TL;DR: In this paper, results from searches for the standard model Higgs boson in proton-proton collisions at 7 and 8 TeV in the CMS experiment at the LHC are presented, using data samples corresponding to integrated luminosities of up to 5.1 inverse femtobarns at 7 TeV and 5.3 inverse femtobarns at 8 TeV; an excess of events with a local significance of 5.0 standard deviations is observed at a mass near 125 GeV, signalling the production of a new particle.
Abstract: Results are presented from searches for the standard model Higgs boson in proton-proton collisions at sqrt(s)=7 and 8 TeV in the CMS experiment at the LHC, using data samples corresponding to integrated luminosities of up to 5.1 inverse femtobarns at 7 TeV and 5.3 inverse femtobarns at 8 TeV. The search is performed in five decay modes: gamma gamma, ZZ, WW, tau tau, and b b-bar. An excess of events is observed above the expected background, a local significance of 5.0 standard deviations, at a mass near 125 GeV, signalling the production of a new particle. The expected significance for a standard model Higgs boson of that mass is 5.8 standard deviations. The excess is most significant in the two decay modes with the best mass resolution, gamma gamma and ZZ; a fit to these signals gives a mass of 125.3 +/- 0.4 (stat.) +/- 0.5 (syst.) GeV. The decay to two photons indicates that the new particle is a boson with spin different from one.
8,857 citations
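A side note on the statistics language in the abstract above: a "local significance of Z standard deviations" corresponds to a one-sided Gaussian tail probability. The short sketch below (purely illustrative, not part of the paper) converts the quoted 5.0 and 5.8 sigma figures to p-values.

```python
from math import erf, sqrt

def one_sided_p_value(z_sigma: float) -> float:
    """Probability that a standard normal fluctuation exceeds z_sigma."""
    return 0.5 * (1.0 - erf(z_sigma / sqrt(2.0)))

print(f"observed 5.0 sigma -> p ~ {one_sided_p_value(5.0):.1e}")  # ~2.9e-07
print(f"expected 5.8 sigma -> p ~ {one_sided_p_value(5.8):.1e}")  # ~3.3e-09
```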
01 Feb 1997
TL;DR: The VisualSEEk system is novel in that the user forms the queries by diagramming spatial arrangements of color regions by utilizing color information, region sizes and absolute and relative spatial locations.
Abstract: We describe a highly functional prototype system for searching by visual features in an image database. The VisualSEEk system is novel in that the user forms the queries by diagramming spatial arrangements of color regions. The system finds the images that contain the most similar arrangements of similar regions. Prior to the queries, the system automatically extracts and indexes salient color regions from the images. By utilizing efficient indexing techniques for color information, region sizes and absolute and relative spatial locations, a wide variety of complex joint color/spatial queries may be computed.
2,084 citations
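To make the joint color/spatial querying described above concrete, here is a minimal sketch of region-based matching. The region representation (dominant color, normalized centroid, area) and the distance weights are illustrative assumptions, not VisualSEEk's actual indexing scheme.

```python
from dataclasses import dataclass
from math import dist

@dataclass
class Region:
    color: tuple[float, float, float]  # dominant color of the region in some color space
    centroid: tuple[float, float]      # normalized (x, y) location in the image
    area: float                        # normalized region size

def region_distance(q: Region, t: Region, w_color=1.0, w_pos=1.0, w_area=0.5) -> float:
    """Weighted distance combining color, absolute position, and size."""
    return (w_color * dist(q.color, t.color)
            + w_pos * dist(q.centroid, t.centroid)
            + w_area * abs(q.area - t.area))

def image_score(query_regions: list[Region], image_regions: list[Region]) -> float:
    """Score an image by matching each diagrammed query region to its closest region."""
    if not image_regions:
        return float("inf")
    return sum(min(region_distance(q, t) for t in image_regions) for q in query_regions)
```

Ranking images by ascending image_score approximates "most similar arrangements of similar regions"; the actual system also indexes relative spatial relationships so that such queries can be answered efficiently.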
16 Jun 2012
TL;DR: A novel kernel-based supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing, and significantly outperforms the state-of-the-arts in searching both metric distance neighbors and semantically similar neighbors is proposed.
Abstract: Recent years have witnessed the growing popularity of hashing in large-scale vision problems. It has been shown that the hashing quality could be boosted by leveraging supervised information into hash function learning. However, the existing supervised methods either lack adequate performance or often incur cumbersome model training. In this paper, we propose a novel kernel-based supervised hashing model which requires a limited amount of supervised information, i.e., similar and dissimilar data pairs, and a feasible training cost in achieving high quality hashing. The idea is to map the data to compact binary codes whose Hamming distances are minimized on similar pairs and simultaneously maximized on dissimilar pairs. Our approach is distinct from prior works by utilizing the equivalence between optimizing the code inner products and the Hamming distances. This enables us to sequentially and efficiently train the hash functions one bit at a time, yielding very short yet discriminative codes. We carry out extensive experiments on two image benchmarks with up to one million samples, demonstrating that our approach significantly outperforms the state-of-the-arts in searching both metric distance neighbors and semantically similar neighbors, with accuracy gains ranging from 13% to 46%.
1,461 citations
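The pivotal observation in the abstract, optimizing code inner products instead of Hamming distances, rests on the identity Hamming(x, y) = (r - <x, y>) / 2 for r-bit codes in {-1, +1}^r, so the target Gram matrix of the codes is r times the pairwise label matrix. Below is a rough sketch of a greedy, one-bit-at-a-time training loop built on that view; the spectral relaxation, the feature/kernel matrix Phi, and the variable names are assumptions for illustration, not the authors' released implementation.

```python
import numpy as np

def train_bits_sequentially(Phi, S, num_bits):
    """Greedy bit-by-bit hashing sketch.

    Phi : (n, m) feature (or empirical kernel) matrix of n training points.
    S   : (n, n) symmetric supervision, +1 for similar pairs, -1 for dissimilar.
    """
    _, m = Phi.shape
    R = num_bits * S.astype(float)       # residual target for the code inner products
    W = np.zeros((m, num_bits))
    for k in range(num_bits):
        # Spectral relaxation of  max_w  sign(Phi w)^T R sign(Phi w).
        A = Phi.T @ R @ Phi
        _, vecs = np.linalg.eigh(A)
        w = vecs[:, -1]                  # eigenvector of the largest eigenvalue
        b = np.sign(Phi @ w)
        b[b == 0] = 1.0
        W[:, k] = w
        R -= np.outer(b, b)              # remove what this bit already explains
    return W

def encode(Phi, W):
    """Binary codes in {-1, +1}; Hamming distance is (num_bits - code inner product) / 2."""
    B = np.sign(Phi @ W)
    B[B == 0] = 1.0
    return B
```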
Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes and describes a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy.
Abstract: Hashing is becoming increasingly popular for efficient nearest neighbor search in massive databases. However, learning short codes that yield good search performance is still a challenge. Moreover, in many cases real-world data lives on a low-dimensional manifold, which should be taken into account to capture meaningful nearest neighbors. In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. To make such an approach computationally feasible, we utilize Anchor Graphs to obtain tractable low-rank adjacency matrices. Our formulation allows constant time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Finally, we describe a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy. Experimental comparison with the other state-of-the-art methods on two large datasets demonstrates the efficacy of the proposed method.
1,058 citations
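A compact sketch of the anchor-graph construction the abstract relies on: the full n-by-n adjacency is approximated by a low-rank product built from each point's affinities to a small set of anchors, and spectral embedding of the resulting graph supplies the (relaxed) hash bits. The Gaussian affinities, anchor choice, and single-threshold binarization below are simplifying assumptions; the paper's hierarchical thresholding that extracts multiple bits per eigenfunction is omitted.

```python
import numpy as np

def anchor_graph_hashing(X, anchors, num_bits, sigma=1.0, s=3):
    """Graph hashing sketch via a low-rank anchor graph.

    X       : (n, d) data points.
    anchors : (m, d) anchor points (e.g. k-means centers), with num_bits < m << n.
    Returns codes in {0, 1} of shape (n, num_bits).
    """
    # Truncated affinity matrix Z: each point keeps only its s nearest anchors.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    Z = np.exp(-d2 / (2.0 * sigma ** 2))
    mask = np.zeros_like(Z, dtype=bool)
    np.put_along_axis(mask, np.argsort(-Z, axis=1)[:, :s], True, axis=1)
    Z = np.where(mask, Z, 0.0)
    Z /= Z.sum(axis=1, keepdims=True)                           # row-stochastic

    # Adjacency is approximated by Z diag(lam)^-1 Z^T; its spectrum comes from
    # the small m x m matrix  M = diag(lam)^-1/2 Z^T Z diag(lam)^-1/2.
    lam = Z.sum(axis=0) + 1e-12
    D = np.diag(1.0 / np.sqrt(lam))
    vals, vecs = np.linalg.eigh(D @ (Z.T @ Z) @ D)
    idx = np.argsort(-vals)[1:num_bits + 1]                     # skip the trivial eigenvector
    Y = Z @ D @ vecs[:, idx]                                    # (n, num_bits) spectral embedding
    return (Y > Y.mean(axis=0)).astype(np.uint8)                # one threshold per dimension
```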
TL;DR: This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets and presents three different semi- supervised hashing methods, including orthogonal hashing, nonorthogonal hash, and sequential hashing.
Abstract: Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.
834 citations
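The SSH objective sketched in the abstract combines a supervised fit over labeled similar/dissimilar pairs with a regularizer over all data that favors high-variance, informative projections. A minimal relaxed version of the "orthogonal" variant is shown below; the matrix form of the regularizer, the eta weight, and the variable names are assumptions for illustration.

```python
import numpy as np

def ssh_projections(X, similar_pairs, dissimilar_pairs, num_bits, eta=1.0):
    """Semi-supervised hashing sketch (relaxed, orthogonal variant).

    X                : (n, d) zero-centered data, labeled and unlabeled together.
    similar_pairs    : (i, j) index pairs that should hash alike.
    dissimilar_pairs : (i, j) index pairs that should hash apart.
    Returns a (d, num_bits) projection matrix W; codes are sign(X @ W).
    """
    n, _ = X.shape
    S = np.zeros((n, n))                       # pairwise supervision, zero where unlabeled
    for i, j in similar_pairs:
        S[i, j] = S[j, i] = 1.0
    for i, j in dissimilar_pairs:
        S[i, j] = S[j, i] = -1.0
    # Relaxed empirical fit  tr(W^T X^T S X W)  plus variance regularizer  tr(W^T X^T X W).
    M = X.T @ S @ X + eta * (X.T @ X)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argsort(-vals)[:num_bits]]   # top eigenvectors, mutually orthogonal

# Usage sketch:
#   Xc = X - X.mean(axis=0)
#   W = ssh_projections(Xc, sim_pairs, dis_pairs, num_bits=32)
#   codes = (Xc @ W > 0).astype(np.uint8)
```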
Cited by
TL;DR: There is, I think, something ethereal about i —the square root of minus one, which seems an odd beast at that time—an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality.
Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …
33,785 citations
Journal Article
28,685 citations
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
13,246 citations
TL;DR: In this paper, a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) is presented.
Abstract: Deposits of clastic carbonate-dominated (calciclastic) sedimentary slope systems in the rock record have been identified mostly as linearly-consistent carbonate apron deposits, even though most ancient clastic carbonate slope deposits fit the submarine fan systems better. Calciclastic submarine fans are consequently rarely described and are poorly understood. Subsequently, very little is known especially in mud-dominated calciclastic submarine fan systems. Presented in this study are a sedimentological core and petrographic characterisation of samples from eleven boreholes from the Lower Carboniferous of Bowland Basin (Northwest England) that reveals a >250 m thick calciturbidite complex deposited in a calciclastic submarine fan setting. Seven facies are recognised from core and thin section characterisation and are grouped into three carbonate turbidite sequences. They include: 1) Calciturbidites, comprising mostly of high- to low-density, wavy-laminated bioclast-rich facies; 2) low-density densite mudstones which are characterised by planar laminated and unlaminated mud-dominated facies; and 3) Calcidebrites which are muddy or hyper-concentrated debris-flow deposits occurring as poorly-sorted, chaotic, mud-supported floatstones. These …
9,929 citations
Journal Article
9,185 citations