
Showing papers by "Jun Yang published in 2014"


Journal ArticleDOI
01 Mar 2014
TL;DR: Proposes a framework that models claims based on structured data as parameterized queries, along with an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks.
Abstract: Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate practical fact-checking tasks---reverse-engineering (often intentionally) vague claims, and countering questionable claims---as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.

128 citations
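The perturbation idea above can be illustrated with a toy sketch: model a claim as a parameterized window query and check how often nearby parameter choices still support the same conclusion. The data, the averaging query, and the robustness measure here are illustrative assumptions, not the paper's actual formulation.

```python
def claim_result(values, start, end):
    """Parameterized query: average of values over the window [start, end)."""
    window = values[start:end]
    return sum(window) / len(window)

def perturbation_sensitivity(values, start, end, delta=2):
    """Evaluate the claim under nearby parameter settings and report the
    fraction of perturbed windows that score at least as high as the
    original claim. A low fraction hints at a cherry-picked window."""
    original = claim_result(values, start, end)
    supporting = total = 0
    for s in range(max(0, start - delta), start + delta + 1):
        for e in range(end - delta, min(len(values), end + delta) + 1):
            if e - s < 1:
                continue  # skip empty windows
            total += 1
            if claim_result(values, s, e) >= original:
                supporting += 1
    return supporting / total

# A single spike makes the claim fragile under perturbation:
data = [1, 1, 1, 9, 1, 1, 1, 1]
print(round(perturbation_sensitivity(data, 3, 4), 2))  # 0.07
```

Only the one-element window containing the spike supports the claimed average; almost every perturbation weakens it, which is exactly the "cherry-picking" signal the framework formalizes.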


Journal ArticleDOI
01 Aug 2014
TL;DR: FactWatcher is a system that helps journalists identify data-backed, attention-seizing facts which serve as leads to news stories, including situational facts, one-of-the-few facts, and prominent streaks, through a unified suite of data model, algorithm framework, and fact ranking measure.
Abstract: Towards computational journalism, we present FactWatcher, a system that helps journalists identify data-backed, attention-seizing facts which serve as leads to news stories. FactWatcher discovers three types of facts, including situational facts, one-of-the-few facts, and prominent streaks, through a unified suite of data model, algorithm framework, and fact ranking measure. Given an append-only database, upon the arrival of a new tuple, FactWatcher monitors whether the tuple triggers any new facts. Its algorithms efficiently search for facts without exhaustively testing all possible ones. Furthermore, FactWatcher provides multiple features in striving for an end-to-end system, including fact ranking, fact-to-statement translation, and keyword-based fact search.

45 citations
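The monitoring step described above can be sketched as a skyline membership test: a new tuple "triggers a fact" in a context if no historical tuple in that context dominates it on the measure attributes. This is an illustrative simplification, not FactWatcher's actual pruning algorithms, and the toy measures are assumptions.

```python
def dominates(a, b):
    """a dominates b if a is >= b on every measure and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def triggers_fact(history, new_measures):
    """True if new_measures enters the skyline of the historical tuples."""
    return not any(dominates(old, new_measures) for old in history)

# e.g., (points, rebounds) of past performances in some context
history = [(30, 10), (25, 12)]
print(triggers_fact(history, (28, 13)))  # True: no past tuple dominates it
print(triggers_fact(history, (20, 9)))   # False: dominated by both
```

The real system avoids re-scanning all of history per arrival; the point here is only the dominance test at the core of the fact definition.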


Proceedings ArticleDOI
19 May 2014
TL;DR: Designs algorithms for the novel problem of finding new, prominent situational facts (emerging statements about objects that stand out within certain contexts), using three corresponding ideas: tuple reduction, constraint pruning, and sharing computation across measure subspaces.
Abstract: We study the novel problem of finding new, prominent situational facts, which are emerging statements about objects that stand out within certain contexts. Many such facts are newsworthy—e.g., an athlete's outstanding performance in a game, or a viral video's impressive popularity. Effective and efficient identification of these facts assists journalists in reporting, one of the main goals of computational journalism. Technically, we consider an ever-growing table of objects with dimension and measure attributes. A situational fact is a “contextual” skyline tuple that stands out against historical tuples in a context, specified by a conjunctive constraint involving dimension attributes, when a set of measure attributes are compared. New tuples are constantly added to the table, reflecting events happening in the real world. Our goal is to discover constraint-measure pairs that qualify a new tuple as a contextual skyline tuple, and discover them quickly before the event becomes yesterday's news. A brute-force approach requires exhaustive comparison with every tuple, under every constraint, and in every measure subspace. We design algorithms in response to these challenges using three corresponding ideas—tuple reduction, constraint pruning, and sharing computation across measure subspaces. We also adopt a simple prominence measure to rank the discovered facts when they are numerous. Experiments over two real datasets validate the effectiveness and efficiency of our techniques.

16 citations
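To make the problem concrete, here is the brute-force baseline the paper's techniques accelerate: for a new tuple, enumerate every conjunctive constraint consistent with its dimension values and every non-empty measure subspace, and keep the pairs under which it is a contextual skyline point. The toy schema and names are assumptions.

```python
from itertools import combinations

def contextual_skyline_pairs(table, new_tuple, dims, measures):
    """Brute-force discovery of constraint-measure pairs qualifying
    new_tuple as a contextual skyline tuple."""
    found = []
    dim_items = [(d, new_tuple[d]) for d in dims]
    # every conjunctive constraint consistent with the new tuple
    for r in range(len(dims) + 1):
        for constraint in combinations(dim_items, r):
            context = [t for t in table
                       if all(t[d] == v for d, v in constraint)]
            # every non-empty measure subspace
            for k in range(1, len(measures) + 1):
                for sub in combinations(measures, k):
                    dominated = any(
                        all(t[m] >= new_tuple[m] for m in sub) and
                        any(t[m] > new_tuple[m] for m in sub)
                        for t in context)
                    if not dominated:
                        found.append((constraint, sub))
    return found

table = [{"team": "A", "season": 2013, "pts": 40, "ast": 5},
         {"team": "B", "season": 2014, "pts": 35, "ast": 9}]
new = {"team": "A", "season": 2014, "pts": 38, "ast": 8}
pairs = contextual_skyline_pairs(table, new, ["team", "season"], ["pts", "ast"])
print(len(pairs))  # 8 qualifying constraint-measure pairs
```

The cost of this baseline grows with the number of constraints times the number of measure subspaces per arriving tuple, which is what the paper's tuple reduction, constraint pruning, and shared computation attack.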


Proceedings ArticleDOI
19 May 2014
TL;DR: This paper develops a solution to top-k and reverse top-k queries in much higher dimensions (up to high tens), provided many preferences exhibit sparsity, i.e., each specifies non-zero weights for only a handful of attributes (though the subsets of such attributes and their weights can vary greatly).
Abstract: Given a set of objects O, each with d numeric attributes, a top-k preference scores these objects using a linear combination of their attribute values, where the weight on each attribute reflects the interest in this attribute. Given a query preference q, a top-k query finds the k objects in O with highest scores with respect to q. Given a query object o and a set of preferences Q, a reverse top-k query finds all preferences q ∈ Q for which o becomes one of the top k objects with respect to q. Previous solutions to these problems are effective only in low dimensions. In this paper, we develop a solution for much higher dimensions (up to high tens), if many preferences exhibit sparsity—i.e., each specifies non-zero weights for only a handful (say 5–7) of attributes (though the subsets of such attributes and their weights can vary greatly). Our idea is to select carefully a set of low-dimensional core subspaces to “cover” the sparse preferences in a workload. These subspaces allow us to index them more effectively than the full-dimensional space. Being multi-dimensional, each subspace covers many possible preferences; furthermore, multiple subspaces can jointly cover a preference, thereby expanding the coverage beyond each subspace's dimensionality. Experimental evaluation validates our solution's effectiveness and advantages over previous solutions.

14 citations
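For reference, the brute-force reverse top-k computation the paper's core-subspace index avoids looks like the following. Sparse preferences are modeled as dictionaries mapping a few attributes to non-zero weights; the data and names are illustrative assumptions.

```python
def score(obj, pref):
    """Linear score of an object under a sparse preference
    (missing attributes carry zero weight)."""
    return sum(w * obj.get(a, 0.0) for a, w in pref.items())

def reverse_top_k(objects, prefs, target, k):
    """All preferences under which target ranks among the top k objects."""
    result = []
    for q in prefs:
        target_score = score(target, q)
        better = sum(1 for o in objects if score(o, q) > target_score)
        if better < k:
            result.append(q)
    return result

objects = [{"x": 1.0, "y": 0.0}, {"x": 0.0, "y": 1.0}]
target = {"x": 0.6, "y": 0.6}
prefs = [{"x": 1.0}, {"x": 0.5, "y": 0.5}]  # sparse: few non-zero weights
print(len(reverse_top_k(objects, prefs, target, k=1)))  # 1
```

This scan scores every object under every preference; the paper's contribution is to index low-dimensional subspaces that jointly cover the sparse preference workload so most of this work is pruned.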


Proceedings ArticleDOI
18 Jun 2014
TL;DR: Presents a system that automatically assesses the quality of claims and counters misleading claims that cherry-pick data to advance their conclusions.
Abstract: Are you fed up with "lies, d---ned lies, and statistics" made up from data in our media? For claims based on structured data, we present a system to automatically assess the quality of claims (beyond their correctness) and counter misleading claims that cherry-pick data to advance their conclusions. The key insight is to model such claims as parameterized queries and consider how parameter perturbations affect their results. We demonstrate our system on claims drawn from U.S. congressional voting records, sports statistics, and publication records of database researchers.

10 citations


Patent
Jun Yang1
10 Feb 2014
TL;DR: Each input microphone signal can be calibrated to a reference by comparing the signal's energy to an energy indicated by the reference; each gain is calculated as the ratio of the reference energy to the energy of the input microphone signal.
Abstract: An audio-based system may perform audio beamforming and/or sound source localization based on multiple input microphone signals. Each input microphone signal can be calibrated to a reference based on the energy of the microphone signal in comparison to an energy indicated by the reference. Specifically, respective gains may be applied to each input microphone signal, wherein each gain is calculated as a ratio of a reference energy to the energy of the input microphone signal.

6 citations
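A minimal sketch of the calibration described above: each channel's gain is the ratio of a reference energy to the measured signal energy, scaling all microphones to comparable levels. Frame sizes, the energy estimator, and the reference value here are assumptions, not the patent's exact method.

```python
def signal_energy(samples):
    """Mean-square energy of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def calibration_gain(reference_energy, samples, eps=1e-12):
    """Gain as the ratio of the reference energy to the signal energy;
    eps guards against division by zero on silent channels."""
    return reference_energy / max(signal_energy(samples), eps)

mic = [0.1, -0.1, 0.1, -0.1]       # quiet channel, energy 0.01
gain = calibration_gain(0.04, mic)  # reference energy 0.04
print(gain)  # about 4.0: boosts the quiet channel toward the reference
```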


Journal Article
TL;DR: Presents an overview of Cumulon, a system aimed at simplifying the development and deployment of statistical analysis of big data on public clouds, and the challenges encountered in building it.
Abstract: Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data on public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and software platforms. Given user-specified requirements in terms of time, money, and risk tolerance, Cumulon finds the optimal implementation alternatives, execution parameters, as well as hardware provisioning and configuration settings—such as what type of machines and how many of them to acquire. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests best bidding strategies for such resources. This paper presents an overview of Cumulon and the challenges encountered in building this system.

3 citations


Patent
Jun Yang1
25 Aug 2014
TL;DR: Discloses features for performing noise compensation so that a level of noise remains audible in an output signal; a noise signal based on a first noise level can be determined, wherein the noise level of the noise signal is configured to be above a hearing threshold.
Abstract: Features are disclosed for performing noise compensation so a level of noise may be audible in an output signal. For example, a first noise level of a first signal can be estimated. The first signal can be processed (e.g., by residual echo suppression) to determine a second signal, and a second noise level of the second signal can be estimated. Residual echo suppression can sometimes cause background noise to be eliminated, causing silence. If the second noise level is less than a product of the first noise level and a noise threshold, then a noise signal based on the first noise level can be determined, wherein the noise level of the noise signal is configured to be above a hearing threshold. The noise signal can be combined with the second signal to generate an output signal.

2 citations
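The decision logic in the abstract can be sketched as follows: if residual echo suppression has removed too much background noise (second noise level below a fraction of the first), inject comfort noise derived from the pre-suppression noise level. The threshold value, hearing floor, and uniform noise generator are placeholder assumptions.

```python
import random

def compensate(first_noise, second_noise, second_signal,
               noise_threshold=0.5, hearing_floor=1e-4):
    """Add comfort noise when suppression left the signal unnaturally quiet."""
    if second_noise < first_noise * noise_threshold:
        # target a noise level based on the first estimate, kept audible
        level = max(first_noise, hearing_floor)
        noise = [random.uniform(-level, level) for _ in second_signal]
        return [s + n for s, n in zip(second_signal, noise)]
    return second_signal  # enough noise survived; pass through unchanged

random.seed(0)  # deterministic for the example
out = compensate(first_noise=0.02, second_noise=0.001,
                 second_signal=[0.0] * 8)
print(any(abs(x) > 0 for x in out))  # True: comfort noise was injected
```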