Institution
IBM
Company • Armonk, New York, United States
About: IBM is a company based in Armonk, New York, United States. It is known for its research contributions on the topics of Layer (electronics) & Signal. The organization has 134,567 authors who have published 253,905 publications receiving 7,458,795 citations. It is also known as International Business Machines Corporation & Big Blue.
[Chart: papers published on a yearly basis]
Papers
TL;DR: An empirical interatomic potential for covalent systems is proposed, incorporating bond order in an intuitive way; a model for Si accurately describes bonding and geometry for many structures, including highly rebonded surfaces.
Abstract: An empirical interatomic potential for covalent systems is proposed, incorporating bond order in an intuitive way. The potential has the form of a Morse pair potential, but with the bond-strength parameter depending upon local environment. A model for Si accurately describes bonding and geometry for many structures, including highly rebonded surfaces.
1,134 citations
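The potential described above couples a Morse-like pair interaction to a bond-strength term set by the local environment. Below is a minimal Python sketch of that structure; the parameter values, the cutoff, and the coordination-count bond-order factor are illustrative assumptions, not the paper's Si parametrization.

```python
# Sketch of a bond-order pair potential: repulsive + attractive Morse-like
# terms, with the attractive (bond-strength) term weighted by a factor that
# depends on the local environment of each atom. All numbers are placeholders.
import numpy as np

def pair_energy(r, b_ij, A=1.0, B=1.0, lam1=2.0, lam2=1.0):
    """Repulsive term plus bond-order-weighted attractive term for one pair."""
    return A * np.exp(-lam1 * r) - b_ij * B * np.exp(-lam2 * r)

def bond_order(coordination, n=1.0):
    """Toy bond-order factor: more neighbors -> each bond is weaker."""
    return (1.0 + coordination) ** (-0.5 * n)

def total_energy(positions, cutoff=3.0):
    """Half-sum over ordered pairs, each weighted by atom i's environment."""
    pos = np.asarray(positions, dtype=float)
    n_atoms = len(pos)
    energy = 0.0
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i == j:
                continue
            r = np.linalg.norm(pos[i] - pos[j])
            if r >= cutoff:
                continue
            # Neighbors of i other than j define its local environment.
            coord = sum(
                1 for k in range(n_atoms)
                if k != i and k != j and np.linalg.norm(pos[i] - pos[k]) < cutoff
            )
            energy += 0.5 * pair_energy(r, bond_order(coord))
    return energy

# Small trimer example; returns a scalar total energy.
print(total_energy([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [0.0, 1.5, 0.0]]))
```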
TL;DR: New techniques are discussed for outlier detection that find outliers by studying the behavior of projections from the data set.
Abstract: The outlier detection problem has important applications in the fields of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.
1,132 citations
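The approach sketched below follows the abstract's idea in its simplest form: bin each attribute into equi-depth ranges, scan low-dimensional projections, and flag points that fall in grid cells far sparser than an independence assumption predicts. The brute-force scan over 2-D projections, the binning parameter, and the threshold are illustrative assumptions; the paper itself searches the space of projections rather than enumerating it.

```python
# Sketch of projection-based outlier detection: equi-depth bins per attribute,
# then flag points in k-dimensional grid cells whose occupancy is far below
# the count N * (1/phi)^k expected if attributes were independent.
import itertools
import numpy as np

def sparsity_coefficient(count, n_points, f_k):
    """Standardized deviation of a cell count from its expected value."""
    expected = n_points * f_k
    return (count - expected) / np.sqrt(n_points * f_k * (1.0 - f_k))

def sparse_cell_outliers(data, phi=4, k=2, threshold=-2.5):
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    # Equi-depth binning: each 1-D range holds roughly 1/phi of the points.
    edges = [np.quantile(data[:, j], np.linspace(0, 1, phi + 1)[1:-1])
             for j in range(d)]
    bins = np.stack([np.searchsorted(edges[j], data[:, j]) for j in range(d)],
                    axis=1)
    f_k = (1.0 / phi) ** k
    flagged = set()
    for dims in itertools.combinations(range(d), k):  # every k-D projection
        cells = {}
        for i in range(n):
            cells.setdefault(tuple(bins[i, list(dims)]), []).append(i)
        for members in cells.values():
            if sparsity_coefficient(len(members), n, f_k) < threshold:
                flagged.update(members)
    return sorted(flagged)

# Two correlated attributes plus two independent ones; the planted point
# breaks the correlation, so it lands in a near-empty cell of that projection.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, base + rng.normal(scale=0.1, size=(200, 1)),
               rng.normal(size=(200, 2))])
X = np.vstack([X, [[2.0, -2.0, 0.0, 0.0]]])
print(sparse_cell_outliers(X))  # typically includes the planted index 200
```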
TL;DR: This paper systematically enumerates over 100,000 emerging communities from a Web crawl, motivates a graph-theoretic approach to locating such communities, and describes the algorithms and algorithmic engineering necessary to find structures that subscribe to this notion.
Abstract: The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newsgroups and commercial Web directories together contain on the order of 20,000 such communities; our particular interest here is in emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
1,126 citations
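The graph-theoretic signature trawling looks for is a complete bipartite core: i "fan" pages that all link to the same j "center" pages. Below is a minimal brute-force sketch of that enumeration on a hypothetical toy link graph; the i, j values are illustrative, and the paper's real contribution is the pruning and out-of-core engineering needed to run this over an entire Web crawl.

```python
# Sketch of bipartite-core enumeration: an emerging community is indicated by
# i "fan" pages that each link to all of the same j "center" pages, i.e. a
# complete bipartite subgraph K_{i,j} in the Web link graph.
from itertools import combinations

def bipartite_cores(outlinks, i=3, j=2):
    """Yield (fans, centers) pairs forming a complete K_{i,j} core."""
    # Invert the link graph: center page -> set of fans pointing at it.
    inlinks = {}
    for fan, targets in outlinks.items():
        for t in targets:
            inlinks.setdefault(t, set()).add(fan)
    # Any j centers co-cited by at least i common fans yield cores.
    for centers in combinations(sorted(inlinks), j):
        common_fans = set.intersection(*(inlinks[c] for c in centers))
        for fans in combinations(sorted(common_fans), i):
            yield fans, centers

# Toy link graph: pages a, b, c all point at both x and y -> one (3, 2) core.
links = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"x", "y", "z"}, "d": {"z"}}
for fans, centers in bipartite_cores(links, i=3, j=2):
    print(fans, "->", centers)
```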
01 Jun 1998
TL;DR: Robust statistical models and a relaxation labeling technique improve hypertext classification by exploiting link information in a small neighborhood around documents; the technique adapts gracefully to the fraction of neighboring documents having known topics and cuts error well below that of a purely term-based classifier.
Abstract: A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented with pre-classified samples from Yahoo! and the US Patent Database. In previous work, we developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained. This classifier misclassified 36% of the patents, indicating that classifying hypertext can be more difficult than classifying text. Naively using terms in neighboring documents increased error to 38%; our hypertext classifier reduced it to 21%. Results with the Yahoo! sample were more dramatic: the text classifier showed 68% error, whereas our hypertext classifier reduced this to only 21%.
1,124 citations
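One way to picture the relaxation labeling step is as an iteration that re-estimates each page's class distribution by mixing its text-only prior with the current distributions of its link neighbors. The sketch below is a toy version under that reading; the mixing weight alpha and the priors are assumptions, and the paper instead learns statistical models for both the text and link terms.

```python
# Toy relaxation labeling over a link graph: each page's class distribution is
# repeatedly re-estimated as a mixture of its text-only prior and the mean of
# its neighbors' current distributions, then renormalized.
import numpy as np

def relaxation_labeling(text_prior, neighbors, alpha=0.6, iters=20):
    """text_prior: (n_docs, n_classes) text-only P(class|doc).
    neighbors: adjacency lists over document indices."""
    prior = np.asarray(text_prior, dtype=float)
    p = prior.copy()
    for _ in range(iters):
        new_p = p.copy()
        for d, nbrs in enumerate(neighbors):
            if not nbrs:
                continue  # an isolated page keeps its text-only estimate
            mixed = alpha * prior[d] + (1.0 - alpha) * p[nbrs].mean(axis=0)
            new_p[d] = mixed / mixed.sum()
        p = new_p
    return p

# Pages 0 and 2 have confident text evidence; page 1 is ambiguous on text
# alone but links to both, so its estimate is pulled toward its neighborhood.
prior = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
adj = [[1], [0, 2], [1]]
print(relaxation_labeling(prior, adj).round(3))
```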
TL;DR: Several algorithms achieving logarithmic regret are proposed; besides being more general, they are also much more efficient to implement, and the main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Abstract: In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich (ICML 2003) introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$, for an arbitrary sequence of T convex cost functions (of bounded gradients), with respect to the best single decision in hindsight.
In this paper, we give algorithms that achieve regret $O(\log T)$ for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth (EuroCOLT 1999), and Universal Portfolios by Cover (Math. Finance 1:1–19, 1991). We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement.
The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection between the natural follow-the-leader approach and the Newton method. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.
1,124 citations
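One way to see the Newton-flavored method the abstract refers to: keep a running sum of gradient outer products and precondition each step by its inverse, so steps shrink fastest in directions where curvature has already been observed. The sketch below makes simplifying assumptions: toy quadratic losses, illustrative parameters, and box clipping in place of the paper's exact projection in the norm induced by the accumulated matrix.

```python
# Sketch of an Online-Newton-Step-style update: precondition each gradient
# step by the inverse of a regularized running sum of gradient outer products.
import numpy as np

def online_newton_step(grad_fns, dim, gamma=0.5, eps=1.0, radius=1.0):
    """Play x_t, observe the gradient of f_t at x_t, take a preconditioned step."""
    x = np.zeros(dim)
    A = eps * np.eye(dim)  # regularized sum of gradient outer products
    plays = []
    for grad_fn in grad_fns:
        plays.append(x.copy())
        g = grad_fn(x)
        A += np.outer(g, g)
        x = x - (1.0 / gamma) * np.linalg.solve(A, g)
        x = np.clip(x, -radius, radius)  # crude stand-in for the exact projection
    return plays

# Toy stream of strictly convex losses f_t(x) = 0.5 * ||x - c_t||^2, whose
# gradient at x is (x - c_t); the targets c_t cluster around [0.3, 0.3].
rng = np.random.default_rng(1)
targets = rng.normal(0.3, 0.05, size=(100, 2))
grads = [lambda x, c=c: x - c for c in targets]
plays = online_newton_step(grads, dim=2)
print(plays[-1])  # iterates settle near the common minimizer ~[0.3, 0.3]
```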
Authors
Showing all 134,658 results
| Name | H-index | Papers | Citations |
| --- | --- | --- | --- |
| Zhong Lin Wang | 245 | 2529 | 259003 |
| Anil K. Jain | 183 | 1016 | 192151 |
| Hyun-Chul Kim | 176 | 4076 | 183227 |
| Rodney S. Ruoff | 164 | 666 | 194902 |
| Tobin J. Marks | 159 | 1621 | 111604 |
| Jean M. J. Fréchet | 154 | 726 | 90295 |
| Albert-László Barabási | 152 | 438 | 200119 |
| György Buzsáki | 150 | 446 | 96433 |
| Stanislas Dehaene | 149 | 456 | 86539 |
| Philip S. Yu | 148 | 1914 | 107374 |
| James M. Tour | 143 | 859 | 91364 |
| Thomas P. Russell | 141 | 1012 | 80055 |
| Naomi J. Halas | 140 | 435 | 82040 |
| Steven G. Louie | 137 | 777 | 88794 |
| Daphne Koller | 135 | 367 | 71073 |