Institution

Amazon.com

Company · Seattle, Washington, United States
About: Amazon.com is a company based in Seattle, Washington, United States. It is known for research contributions in the topics of Computer science and Service (business). The organization has 13363 authors who have published 17317 publications receiving 266589 citations.


Papers
Patent
29 Mar 2006
TL;DR: In this article, user interaction with the web site is collected in a database for a plurality of users, and the database is mined to extract relationships between selected attributes and probabilities, which are captured in probability models.
Abstract: An apparatus and methods advantageously select content items for dynamically-generated web pages in an intelligent and virtually autonomous manner. This permits the operator of the web site to rapidly identify and respond to trends, thereby updating the web site relatively quickly and efficiently with little or none of the time-consuming and expensive manual labor otherwise required. User interaction with the web site is collected in a database for a plurality of users. For various content items, the database is mined to extract relationships between selected attributes and probabilities, and these relationships are captured in probability models. When a new web page is requested, attributes, which can include attributes associated with a user, are used as references into the applicable probability models of selected content items; the resulting probabilities are combined with value weightings to generate expected values, and content items are selected for use in the web page at least partially based on those expected values.
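As a rough sketch of the selection step described in this abstract, the following Python fragment scores candidate content items by expected value. All names, the attribute keys, and the lookup-table form of the probability models are hypothetical; the patent does not specify an implementation.

# Hypothetical illustration of expected-value content selection.
# A "probability model" here is just a lookup table from attribute
# tuples to an estimated probability of user interest.

def expected_value(prob_model, attributes, value_weight):
    """Probability of interest for the given attributes, weighted by
    the value of showing this content item."""
    p = prob_model.get(attributes, prob_model.get("default", 0.0))
    return p * value_weight

def select_content(candidates, attributes, num_slots):
    """Rank candidate content items by expected value and keep the
    best ones for the dynamically generated page."""
    scored = sorted(
        ((expected_value(c["model"], attributes, c["value"]), c["id"])
         for c in candidates),
        reverse=True,
    )
    return [item_id for _, item_id in scored[:num_slots]]

# Example: a returning user is shown the item with the higher expected value.
candidates = [
    {"id": "banner-1", "value": 1.0,
     "model": {("returning",): 0.12, "default": 0.05}},
    {"id": "promo-7", "value": 2.5,
     "model": {("returning",): 0.03, "default": 0.02}},
]
print(select_content(candidates, ("returning",), num_slots=1))  # ['banner-1']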

155 citations

Journal Article
TL;DR: rRNA depletion is found to be responsible for substantial, unappreciated coverage biases introduced during library preparation, which suggests that exon-level expression analysis may be inadvisable; the results also demonstrate the utility of IVT-seq for promoting a better understanding of the bias introduced by RNA-seq.
Abstract: Background: RNA-seq is a powerful technique for identifying and quantifying transcription and splicing events, both known and novel. However, given its recent development and the proliferation of library construction methods, understanding the bias it introduces is incomplete but critical to realizing its value. Results: We present a method, in vitro transcription sequencing (IVT-seq), for identifying and assessing the technical biases in RNA-seq library generation and sequencing at scale. We created a pool of over 1,000 in vitro transcribed RNAs from a full-length human cDNA library and sequenced them with polyA and total RNA-seq, the most common protocols. Because each cDNA is full length, and we show in vitro transcription is incredibly processive, each base in each transcript should be equivalently represented. However, with common RNA-seq applications and platforms, we find 50% of transcripts have more than two-fold and 10% have more than 10-fold differences in within-transcript sequence coverage. We also find greater than 6% of transcripts have regions of dramatically unpredictable sequencing coverage between samples, confounding accurate determination of their expression. We use a combination of experimental and computational approaches to show rRNA depletion is responsible for the most significant variability in coverage, and several sequence determinants also strongly influence representation. Conclusions: These results show the utility of IVT-seq for promoting better understanding of bias introduced by RNA-seq. We find rRNA depletion is responsible for substantial, unappreciated biases in coverage introduced during library preparation. These biases suggest exon-level expression analysis may be inadvisable, and we recommend caution when interpreting RNA-seq results.
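To make the within-transcript coverage statistic concrete, here is a minimal sketch of how a fold-difference in per-base coverage could be computed. The function, the percentile trimming, and the toy data are illustrative assumptions, not the paper's actual pipeline, which would derive per-base coverage from aligned reads.

import numpy as np

# Illustrative computation of within-transcript coverage fold-difference,
# the statistic behind the 2-fold / 10-fold summaries above.

def fold_difference(coverage, trim=0.05):
    """Ratio of high to low per-base coverage within one transcript,
    trimming extreme percentiles so single-base spikes don't dominate."""
    cov = np.asarray(coverage, dtype=float)
    lo = np.percentile(cov, 100 * trim)
    hi = np.percentile(cov, 100 * (1 - trim))
    return float("inf") if lo == 0 else hi / lo

uniform = np.full(1000, 200.0)           # evenly represented transcript
ramped = np.linspace(10.0, 400.0, 1000)  # strong coverage ramp along the transcript
print(fold_difference(uniform))  # ~1.0
print(fold_difference(ramped))   # >10, i.e. dramatically uneven coverage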

155 citations

Proceedings Article
22 Apr 2006
TL;DR: This paper describes a study of forty-eight academics and the techniques and tools they use to manage their digital and material archiving of papers, emails, documents, internet bookmarks, correspondence, and other artifacts, and presents two sets of results.
Abstract: The personal archive is not only about efficient storage and retrieval of information. This paper describes a study of forty-eight academics and the techniques and tools they use to manage their digital and material archiving of papers, emails, documents, internet bookmarks, correspondence, and other artifacts. We present two sets of results: we first discuss rationales behind subjects' archiving, which go beyond information retrieval to include creating a legacy, sharing resources, confronting fears and anxieties, and identity construction. We then show how these rationales were mapped into our subjects' physical, social and electronic spaces, and discuss implications for development of digital tools that allow for personal archiving.

155 citations

Proceedings Article
01 Jan 2019
TL;DR: This paper evaluates classification-based training on several standard retrieval datasets, such as CAR-196, CUB-200-2011, Stanford Online Product, and In-Shop, for image retrieval and clustering, and establishes that the approach is competitive across different feature dimensions and base feature networks.
Abstract: Deep metric learning aims to learn a function mapping image pixels to embedding feature vectors that model the similarity between images. Two major applications of metric learning are content-based image retrieval and face verification. For the retrieval tasks, the majority of current state-of-the-art (SOTA) approaches are triplet-based non-parametric training. For the face verification tasks, however, recent SOTA approaches have adopted classification-based parametric training. In this paper, we look into the effectiveness of classification-based approaches on image retrieval datasets. We evaluate on several standard retrieval datasets such as CAR-196, CUB-200-2011, Stanford Online Product, and In-Shop datasets for image retrieval and clustering, and establish that our classification-based approach is competitive across different feature dimensions and base feature networks. We further provide insights into the performance effects of subsampling classes for scalable classification-based training, and the effects of binarization, enabling efficient storage and computation for practical applications.
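As a rough sketch of the classification-based training idea (normalized embeddings trained with an ordinary softmax classifier, then reused directly for retrieval), the following PyTorch fragment may help. The layer sizes, the temperature, and the normalized-softmax head are assumptions for illustration, not the paper's exact recipe.

import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch: train an embedding with a class-label softmax, then reuse
# the (optionally binarized) embedding for retrieval.

class EmbeddingClassifier(nn.Module):
    def __init__(self, backbone_dim=512, embed_dim=128, num_classes=196):
        super().__init__()
        self.project = nn.Linear(backbone_dim, embed_dim)  # embedding head
        self.class_weights = nn.Parameter(torch.randn(num_classes, embed_dim))

    def forward(self, features, temperature=0.05):
        # Normalize embeddings and class weights so the logits are
        # cosine similarities scaled by a temperature.
        emb = F.normalize(self.project(features), dim=1)
        w = F.normalize(self.class_weights, dim=1)
        return emb, emb @ w.t() / temperature

model = EmbeddingClassifier()
features = torch.randn(8, 512)  # stand-in for base feature network outputs
emb, logits = model(features)
loss = F.cross_entropy(logits, torch.randint(0, 196, (8,)))
loss.backward()

# At retrieval time, gallery items are ranked by cosine similarity of emb;
# binarizing the embedding (sign of each dimension) trades some accuracy
# for cheaper storage and Hamming-distance search.
binary_codes = (emb > 0).to(torch.uint8)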

155 citations

Posted Content
TL;DR: It is shown that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates, and l2 normalization with a single scale parameter (ScaleNorm) is proposed for faster training and better performance.
Abstract: We evaluate three simple, normalization-centric changes to improve Transformer training. First, we show that pre-norm residual connections (PreNorm) and smaller initializations enable warmup-free, validation-based training with large learning rates. Second, we propose $\ell_2$ normalization with a single scale parameter (ScaleNorm) for faster training and better performance. Finally, we reaffirm the effectiveness of normalizing word embeddings to a fixed length (FixNorm). On five low-resource translation pairs from TED Talks-based corpora, these changes always converge, giving an average +1.1 BLEU over state-of-the-art bilingual baselines and a new 32.8 BLEU on IWSLT'15 English-Vietnamese. We observe sharper performance curves, more consistent gradient norms, and a linear relationship between activation scaling and decoder depth. Surprisingly, in the high-resource setting (WMT'14 English-German), ScaleNorm and FixNorm remain competitive but PreNorm degrades performance.
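A minimal PyTorch sketch of the two main ideas, ScaleNorm (l2 normalization with a single learned scale g) and a pre-norm residual connection, is given below; the epsilon and the sqrt(d_model) initialization for g are illustrative choices rather than guaranteed details of the paper.

import torch
import torch.nn as nn

class ScaleNorm(nn.Module):
    """ScaleNorm(x) = g * x / ||x||: l2 normalization over the feature
    dimension with a single learned scale g."""
    def __init__(self, scale, eps=1e-5):
        super().__init__()
        self.g = nn.Parameter(torch.tensor(float(scale)))
        self.eps = eps

    def forward(self, x):
        norm = x.norm(dim=-1, keepdim=True).clamp(min=self.eps)
        return self.g * x / norm

def prenorm_residual(x, sublayer, norm):
    # PreNorm: normalize before the sublayer, inside the residual branch,
    # i.e. x + sublayer(norm(x)) rather than norm(x + sublayer(x)).
    return x + sublayer(norm(x))

d_model = 512
norm = ScaleNorm(scale=d_model ** 0.5)  # g initialized to sqrt(d_model)
x = torch.randn(2, 10, d_model)
out = prenorm_residual(x, nn.Linear(d_model, d_model), norm)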

154 citations


Authors

Showing all 13498 results

Name | H-index | Papers | Citations
Jiawei Han | 168 | 1233 | 143427
Bernhard Schölkopf | 148 | 1092 | 149492
Christos Faloutsos | 127 | 789 | 77746
Alexander J. Smola | 122 | 434 | 110222
Rama Chellappa | 120 | 1031 | 62865
William F. Laurance | 118 | 470 | 56464
Andrew McCallum | 113 | 472 | 78240
Michael J. Black | 112 | 429 | 51810
David Heckerman | 109 | 483 | 62668
Larry S. Davis | 107 | 693 | 49714
Chris M. Wood | 102 | 795 | 43076
Pietro Perona | 102 | 414 | 94870
Guido W. Imbens | 97 | 352 | 64430
W. Bruce Croft | 97 | 426 | 39918
Chunhua Shen | 93 | 681 | 37468
Network Information
Related Institutions (5)
Microsoft
86.9K papers, 4.1M citations

89% related

Google
39.8K papers, 2.1M citations

88% related

Carnegie Mellon University
104.3K papers, 5.9M citations

87% related

ETH Zurich
122.4K papers, 5.1M citations

82% related

University of Maryland, College Park
155.9K papers, 7.2M citations

82% related

Performance Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 4
2022 | 168
2021 | 2,015
2020 | 2,596
2019 | 2,002
2018 | 1,189