scispace - formally typeset
Search or ask a question
Institution

IBM

CompanyArmonk, New York, United States
About: IBM is a company organization based out in Armonk, New York, United States. It is known for research contribution in the topics: Layer (electronics) & Signal. The organization has 134567 authors who have published 253905 publications receiving 7458795 citations. The organization is also known as: International Business Machines Corporation & Big Blue.


Papers
More filters
Journal ArticleDOI
Jerry Tersoff1
TL;DR: An empirical interatomic potential for covalent systems is proposed, incorporating bond order in an intuitive way, and a model for Si accurately describes bonding and geometry for may structures, including highly rebonded surfaces.
Abstract: An empirical interatomic potential for covalent systems is proposed, incorporating bond order in an intuitive way. The potential has the form of a Morse pair potential, but with the bond-strength parameter depending upon local environment. A model for Si accurately describes bonding and geometry for may structures, including highly rebonded surfaces.

1,134 citations

Proceedings ArticleDOI
Charu C. Aggarwal1, Philip S. Yu1
01 May 2001
TL;DR: New techniques for outlier detection which find the outliers by studying the behavior of projections from the data set are discussed.
Abstract: The outlier detection problem has important applications in the field of fraud detection, network robustness analysis, and intrusion detection. Most such applications are high dimensional domains in which the data can contain hundreds of dimensions. Many recent algorithms use concepts of proximity in order to find outliers based on their relationship to the rest of the data. However, in high dimensional space, the data is sparse and the notion of proximity fails to retain its meaningfulness. In fact, the sparsity of high dimensional data implies that every point is an almost equally good outlier from the perspective of proximity-based definitions. Consequently, for high dimensional data, the notion of finding meaningful outliers becomes substantially more complex and non-obvious. In this paper, we discuss new techniques for outlier detection which find the outliers by studying the behavior of projections from the data set.

1,132 citations

Journal ArticleDOI
17 May 1999
TL;DR: The subject of this paper is the systematic enumeration of over 100,000 emerging communities from a Web crawl, motivating a graph-theoretic approach to locating such communities, and describing the algorithms and algorithmic engineering necessary to find structures that subscribe to this notion.
Abstract: The Web harbors a large number of communities — groups of content-creators sharing a common interest — each of which manifests itself as a set of interlinked Web pages. Newgroups and commercial Web directories together contain of the order of 20,000 such communities; our particular interest here is on emerging communities — those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a Web crawl: we call our process trawling. We motivate a graph-theoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment. © 1999 Published by Elsevier Science B.V. All rights reserved.

1,126 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: This work has developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained and its technique also adapts gracefully to the fraction of neighboring documents having known topics.
Abstract: A major challenge in indexing unstructured hypertext databases is to automatically extract meta-data that enables structured search using topic taxonomies, circumvents keyword ambiguity, and improves the quality of search and profile-based routing and filtering. Therefore, an accurate classifier is an essential component of a hypertext database. Hyperlinks pose new problems not addressed in the extensive text classification literature. Links clearly contain high-quality semantic clues that are lost upon a purely term-based classifier, but exploiting link information is non-trivial because it is noisy. Naive use of terms in the link neighborhood of a document can even degrade accuracy. Our contribution is to propose robust statistical models and a relaxation labeling technique for better classification by exploiting link information in a small neighborhood around documents. Our technique also adapts gracefully to the fraction of neighboring documents having known topics. We experimented with pre-classified samples from Yahoo!1 and the US Patent Database2. In previous work, we developed a text classifier that misclassified only 13% of the documents in the well-known Reuters benchmark; this was comparable to the best results ever obtained. This classifier misclassified 36% of the patents, indicating that classifying hypertext can be more difficult than classifying text. Naively using terms in neighboring documents increased error to 38%; our hypertext classifier reduced it to 21%. Results with the Yahoo! sample were more dramatic: the text classifier showed 68% error, whereas our hypertext classifier reduced this to only 21%.

1,124 citations

Journal ArticleDOI
TL;DR: Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.
Abstract: In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich (ICML 2003) introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$ , for an arbitrary sequence of T convex cost functions (of bounded gradients), with respect to the best single decision in hindsight. In this paper, we give algorithms that achieve regret O(log?(T)) for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth (EuroCOLT 1999), and Universal Portfolios by Cover (Math. Finance 1:1---19, 1991). We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement. The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection between the natural follow-the-leader approach and the Newton method. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.

1,124 citations


Authors

Showing all 134658 results

NameH-indexPapersCitations
Zhong Lin Wang2452529259003
Anil K. Jain1831016192151
Hyun-Chul Kim1764076183227
Rodney S. Ruoff164666194902
Tobin J. Marks1591621111604
Jean M. J. Fréchet15472690295
Albert-László Barabási152438200119
György Buzsáki15044696433
Stanislas Dehaene14945686539
Philip S. Yu1481914107374
James M. Tour14385991364
Thomas P. Russell141101280055
Naomi J. Halas14043582040
Steven G. Louie13777788794
Daphne Koller13536771073
Network Information
Related Institutions (5)
Carnegie Mellon University
104.3K papers, 5.9M citations

93% related

Georgia Institute of Technology
119K papers, 4.6M citations

92% related

Bell Labs
59.8K papers, 3.1M citations

90% related

Microsoft
86.9K papers, 4.1M citations

89% related

Massachusetts Institute of Technology
268K papers, 18.2M citations

88% related

Performance
Metrics
No. of papers from the Institution in previous years
YearPapers
202330
2022137
20213,163
20206,336
20196,427
20186,278