Author

J. Shane Culpepper

Other affiliations: University of Melbourne
Bio: J. Shane Culpepper is an academic researcher from RMIT University. The author has contributed to research in topics: Ranking & Ranking (information retrieval). The author has an h-index of 24 and has co-authored 122 publications receiving 1803 citations. Previous affiliations of J. Shane Culpepper include University of Melbourne.


Papers
Journal ArticleDOI
TL;DR: This article investigates intersection techniques that make use of both uncompressed “integer” representations, as well as compressed arrangements, and proposes a simple hybrid method that provides both compact storage and faster intersection computations for conjunctive querying than is possible even with uncompressed representations.
Abstract: Conjunctive Boolean queries are a key component of modern information retrieval systems, especially when Web-scale repositories are being searched. A conjunctive query q is equivalent to a |q|-way intersection over ordered sets of integers, where each set represents the documents containing one of the terms, and each integer in each set is an ordinal document identifier. As is the case with many computing applications, there is tension between the way in which the data is represented, and the ways in which it is to be manipulated. In particular, the sets representing index data for typical document collections are highly compressible, but are processed using random access techniques, meaning that methods for carrying out set intersections must be alert to issues to do with access patterns and data representation. Our purpose in this article is to explore these trade-offs, by investigating intersection techniques that make use of both uncompressed “integer” representations, as well as compressed arrangements. We also propose a simple hybrid method that provides both compact storage, and also faster intersection computations for conjunctive querying than is possible even with uncompressed representations.

173 citations
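
The multi-way intersection of sorted document-identifier lists described in the abstract above can be illustrated with a minimal sketch. This is not the hybrid method the article proposes; it is a plain shortest-list-first intersection over uncompressed sorted lists using binary search, and the function names `intersect_pair` and `conjunctive_query` are hypothetical names chosen for this illustration.

```python
from bisect import bisect_left
from typing import List, Sequence


def intersect_pair(small: Sequence[int], large: Sequence[int]) -> List[int]:
    """Intersect two sorted lists by binary-searching `large` for each id in `small`."""
    result: List[int] = []
    lo = 0  # search only to the right of the previous match position
    for doc_id in small:
        lo = bisect_left(large, doc_id, lo)
        if lo == len(large):
            break  # every remaining id in `small` exceeds the end of `large`
        if large[lo] == doc_id:
            result.append(doc_id)
    return result


def conjunctive_query(postings: List[Sequence[int]]) -> List[int]:
    """Return the ids present in every list (one sorted list per query term)."""
    if not postings:
        return []
    # Assumption for this sketch: process the shortest list first so the
    # candidate set stays as small as possible.
    ordered = sorted(postings, key=len)
    candidates = list(ordered[0])
    for plist in ordered[1:]:
        candidates = intersect_pair(candidates, plist)
        if not candidates:
            break  # an empty intersection cannot grow again
    return candidates


if __name__ == "__main__":
    lists = [[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [2, 3, 9, 11]]
    print(conjunctive_query(lists))
```

Because the lists here are held as uncompressed integers, each binary search is a random-access operation, which is the access pattern the abstract contrasts with compressed representations.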

Journal ArticleDOI
31 Aug 2018
TL;DR: The intent is that this description of open problems will help to inspire researchers and graduate students to address the questions, and will provide funding agencies data to focus and coordinate support for information retrieval research.
Abstract: The purpose of the Strategic Workshop in Information Retrieval in Lorne is to explore the long-range issues of the Information Retrieval field, to recognize challenges that are on, or even over, the horizon, to build consensus on some of the key challenges, and to disseminate the resulting information to the research community. The intent is that this description of open problems will help to inspire researchers and graduate students to address the questions, and will provide funding agencies data to focus and coordinate support for information retrieval research.

116 citations

Posted Content
TL;DR: This survey comprehensively reviews recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering, and explores four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing.
Abstract: Recent advances in sensor and mobile devices have enabled an unprecedented increase in the availability and collection of urban trajectory data, thus increasing the demand for more efficient ways to manage and analyze the data being produced. In this survey, we comprehensively review recent research trends in trajectory data management, ranging from trajectory pre-processing, storage, common trajectory analytic tools, such as querying spatial-only and spatial-textual trajectory data, and trajectory clustering. We also explore four closely related analytical tasks commonly used with trajectory data in interactive or real-time processing. Deep trajectory learning is also reviewed for the first time. Finally, we outline the essential qualities that a trajectory data management system should possess in order to maximize flexibility.

85 citations

Book ChapterDOI
06 Sep 2010
TL;DR: This paper presents two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text, significantly faster than existing methods in RAM and even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.
Abstract: Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing optimizations of data structures and algorithms for ranking. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of words and traditional indexing approaches are not so easily adapted, or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions on the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.

81 citations

Proceedings ArticleDOI
27 Jun 2018
TL;DR: This paper proposes a general end-to-end query performance prediction framework based on neural networks, called NeuralQPP, which significantly outperforms state-of-the-art baselines, in nearly every case.
Abstract: Predicting the performance of a search engine for a given query is a fundamental and challenging task in information retrieval. Accurate performance predictors can be used in various ways, such as triggering an action, choosing the most effective ranking function per query, or selecting the best variant from multiple query formulations. In this paper, we propose a general end-to-end query performance prediction framework based on neural networks, called NeuralQPP. Our framework consists of multiple components, each learning a representation suitable for performance prediction. These representations are then aggregated and fed into a prediction sub-network. We train our models with multiple weak supervision signals, which is an unsupervised learning approach that uses the existing unsupervised performance predictors using weak labels. We also propose a simple yet effective component dropout technique to regularize our model. Our experiments on four newswire and web collections demonstrate that NeuralQPP significantly outperforms state-of-the-art baselines, in nearly every case. Furthermore, we thoroughly analyze the effectiveness of each component, each weak supervision signal, and all resulting combinations in our experiments.

80 citations


Cited by
Journal ArticleDOI
TL;DR: This review covers the literature published in 2014 for marine natural products, with 1116 citations referring to compounds isolated from marine microorganisms and phytoplankton, green, brown and red algae, sponges, cnidarians, bryozoans, molluscs, tunicates, echinoderms, mangroves and other intertidal plants and microorganisms.

4,649 citations

Journal ArticleDOI
TL;DR: In this Review, the fundamental characteristics of azide chemistry and current developments are presented and the focus will be placed on cycloadditions (Huisgen reaction), aza ylide chemistry, and the synthesis of heterocycles.
Abstract: Since the discovery of organic azides by Peter Griess more than 140 years ago, numerous syntheses of these energy-rich molecules have been developed. In more recent times in particular, completely new perspectives have been developed for their use in peptide chemistry, combinatorial chemistry, and heterocyclic synthesis. Organic azides have assumed an important position at the interface between chemistry, biology, medicine, and materials science. In this Review, the fundamental characteristics of azide chemistry and current developments are presented. The focus will be placed on cycloadditions (Huisgen reaction), aza ylide chemistry, and the synthesis of heterocycles. Further reactions such as the aza-Wittig reaction, the Sundberg rearrangement, the Staudinger ligation, the Boyer and Boyer-Aube rearrangements, the Curtius rearrangement, the Schmidt rearrangement, and the Hemetsberger rearrangement bear witness to the versatility of modern azide chemistry.

1,766 citations

Journal ArticleDOI
TL;DR: This tutorial introduces the key techniques in the area of text indexing, describing both a core implementation and how the core can be enhanced through a range of extensions.
Abstract: The technology underlying text search engines has advanced dramatically in the past decade. The development of a family of new index representations has led to a wide range of innovations in index storage, index construction, and query evaluation. While some of these developments have been consolidated in textbooks, many specific techniques are not widely known or the textbook descriptions are out of date. In this tutorial, we introduce the key techniques in the area, describing both a core implementation and how the core can be enhanced through a range of extensions. We conclude with a comprehensive bibliography of text indexing literature.

1,218 citations