
Showing papers by "Jun Yang published in 2014"


Journal ArticleDOI
01 Mar 2014
TL;DR: Proposes a framework that models claims based on structured data as parameterized queries, along with an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks.
Abstract: Our news is saturated with claims of "facts" made from data. Database research has in the past focused on how to answer queries, but has not devoted much attention to discerning more subtle qualities of the resulting claims, e.g., is a claim "cherry-picking"? This paper proposes a framework that models claims based on structured data as parameterized queries. A key insight is that we can learn a lot about a claim by perturbing its parameters and seeing how its conclusion changes. This framework lets us formulate practical fact-checking tasks---reverse-engineering (often intentionally) vague claims, and countering questionable claims---as computational problems. Along with the modeling framework, we develop an algorithmic framework that enables efficient instantiations of "meta" algorithms by supplying appropriate algorithmic building blocks. We present real-world examples and experiments that demonstrate the power of our model, efficiency of our algorithms, and usefulness of their results.

128 citations
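The perturbation idea above can be illustrated with a toy sketch: model a claim as a parameterized window query and check how often nearby parameter choices still support the same conclusion. The data, the averaging query, and the robustness measure here are illustrative assumptions, not the paper's actual formulation.

```python
def claim_result(values, start, end):
    """Parameterized query: average of values over the window [start, end)."""
    window = values[start:end]
    return sum(window) / len(window)

def perturbation_sensitivity(values, start, end, delta=2):
    """Evaluate the claim under nearby parameter settings and report the
    fraction of perturbed windows that score at least as high as the
    original claim. A low fraction hints at a cherry-picked window."""
    original = claim_result(values, start, end)
    supporting = total = 0
    for s in range(max(0, start - delta), start + delta + 1):
        for e in range(end - delta, min(len(values), end + delta) + 1):
            if e - s < 1:
                continue  # skip empty windows
            total += 1
            if claim_result(values, s, e) >= original:
                supporting += 1
    return supporting / total

# A single spike makes the claim fragile under perturbation:
data = [1, 1, 1, 9, 1, 1, 1, 1]
print(round(perturbation_sensitivity(data, 3, 4), 2))  # 0.07
```

Only the one-element window containing the spike supports the claimed average; almost every perturbation weakens it, which is exactly the "cherry-picking" signal the framework formalizes.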


Journal ArticleDOI
01 Aug 2014
TL;DR: FactWatcher is a system that helps journalists identify data-backed, attention-seizing facts which serve as leads to news stories, including situational facts, one-of-the-few facts, and prominent streaks, through a unified suite of data model, algorithm framework, and fact ranking measure.
Abstract: Towards computational journalism, we present FactWatcher, a system that helps journalists identify data-backed, attention-seizing facts which serve as leads to news stories. FactWatcher discovers three types of facts, including situational facts, one-of-the-few facts, and prominent streaks, through a unified suite of data model, algorithm framework, and fact ranking measure. Given an append-only database, upon the arrival of a new tuple, FactWatcher monitors whether the tuple triggers any new facts. Its algorithms efficiently search for facts without exhaustively testing all possible ones. Furthermore, FactWatcher provides multiple features in striving for an end-to-end system, including fact ranking, fact-to-statement translation, and keyword-based fact search.

45 citations
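The monitoring step described above can be sketched as a skyline membership test: a new tuple "triggers a fact" in a context if no historical tuple in that context dominates it on the measure attributes. This is an illustrative simplification, not FactWatcher's actual pruning algorithms, and the toy measures are assumptions.

```python
def dominates(a, b):
    """a dominates b if a is >= b on every measure and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def triggers_fact(history, new_measures):
    """True if new_measures enters the skyline of the historical tuples."""
    return not any(dominates(old, new_measures) for old in history)

# e.g., (points, rebounds) of past performances in some context
history = [(30, 10), (25, 12)]
print(triggers_fact(history, (28, 13)))  # True: no past tuple dominates it
print(triggers_fact(history, (20, 9)))   # False: dominated by both
```

The real system avoids re-scanning all of history per arrival; the point here is only the dominance test at the core of the fact definition.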


Proceedings ArticleDOI
19 May 2014
TL;DR: Designs algorithms for the novel problem of finding new, prominent situational facts (emerging statements about objects that stand out within certain contexts), using three corresponding ideas: tuple reduction, constraint pruning, and sharing computation across measure subspaces.
Abstract: We study the novel problem of finding new, prominent situational facts, which are emerging statements about objects that stand out within certain contexts. Many such facts are newsworthy—e.g., an athlete's outstanding performance in a game, or a viral video's impressive popularity. Effective and efficient identification of these facts assists journalists in reporting, one of the main goals of computational journalism. Technically, we consider an ever-growing table of objects with dimension and measure attributes. A situational fact is a “contextual” skyline tuple that stands out against historical tuples in a context, specified by a conjunctive constraint involving dimension attributes, when a set of measure attributes are compared. New tuples are constantly added to the table, reflecting events happening in the real world. Our goal is to discover constraint-measure pairs that qualify a new tuple as a contextual skyline tuple, and discover them quickly before the event becomes yesterday's news. A brute-force approach requires exhaustive comparison with every tuple, under every constraint, and in every measure subspace. We design algorithms in response to these challenges using three corresponding ideas—tuple reduction, constraint pruning, and sharing computation across measure subspaces. We also adopt a simple prominence measure to rank the discovered facts when they are numerous. Experiments over two real datasets validate the effectiveness and efficiency of our techniques.

16 citations
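To make the problem concrete, here is the brute-force baseline the paper's techniques accelerate: for a new tuple, enumerate every conjunctive constraint consistent with its dimension values and every non-empty measure subspace, and keep the pairs under which it is a contextual skyline point. The toy schema and names are assumptions.

```python
from itertools import combinations

def contextual_skyline_pairs(table, new_tuple, dims, measures):
    """Brute-force discovery of constraint-measure pairs qualifying
    new_tuple as a contextual skyline tuple."""
    found = []
    dim_items = [(d, new_tuple[d]) for d in dims]
    # every conjunctive constraint consistent with the new tuple
    for r in range(len(dims) + 1):
        for constraint in combinations(dim_items, r):
            context = [t for t in table
                       if all(t[d] == v for d, v in constraint)]
            # every non-empty measure subspace
            for k in range(1, len(measures) + 1):
                for sub in combinations(measures, k):
                    dominated = any(
                        all(t[m] >= new_tuple[m] for m in sub) and
                        any(t[m] > new_tuple[m] for m in sub)
                        for t in context)
                    if not dominated:
                        found.append((constraint, sub))
    return found

table = [{"team": "A", "season": 2013, "pts": 40, "ast": 5},
         {"team": "B", "season": 2014, "pts": 35, "ast": 9}]
new = {"team": "A", "season": 2014, "pts": 38, "ast": 8}
pairs = contextual_skyline_pairs(table, new, ["team", "season"], ["pts", "ast"])
print(len(pairs))  # 8 qualifying constraint-measure pairs
```

The cost of this baseline grows with the number of constraints times the number of measure subspaces per arriving tuple, which is what the paper's tuple reduction, constraint pruning, and shared computation attack.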


Proceedings ArticleDOI
19 May 2014
TL;DR: This paper develops a solution to top-k and reverse top-k queries in much higher dimensions (up to high tens), provided many preferences exhibit sparsity, i.e., each specifies non-zero weights for only a handful of attributes (though the subsets of such attributes and their weights can vary greatly).
Abstract: Given a set of objects O, each with d numeric attributes, a top-k preference scores these objects using a linear combination of their attribute values, where the weight on each attribute reflects the interest in this attribute. Given a query preference q, a top-k query finds the k objects in O with highest scores with respect to q. Given a query object o and a set of preferences Q, a reverse top-k query finds all preferences q ∈ Q for which o becomes one of the top k objects with respect to q. Previous solutions to these problems are effective only in low dimensions. In this paper, we develop a solution for much higher dimensions (up to high tens), if many preferences exhibit sparsity—i.e., each specifies non-zero weights for only a handful (say 5–7) of attributes (though the subsets of such attributes and their weights can vary greatly). Our idea is to select carefully a set of low-dimensional core subspaces to “cover” the sparse preferences in a workload. These subspaces allow us to index them more effectively than the full-dimensional space. Being multi-dimensional, each subspace covers many possible preferences; furthermore, multiple subspaces can jointly cover a preference, thereby expanding the coverage beyond each subspace's dimensionality. Experimental evaluation validates our solution's effectiveness and advantages over previous solutions.

14 citations
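For reference, the brute-force reverse top-k computation the paper's core-subspace index avoids looks like the following. Sparse preferences are modeled as dictionaries mapping a few attributes to non-zero weights; the data and names are illustrative assumptions.

```python
def score(obj, pref):
    """Linear score of an object under a sparse preference
    (missing attributes carry zero weight)."""
    return sum(w * obj.get(a, 0.0) for a, w in pref.items())

def reverse_top_k(objects, prefs, target, k):
    """All preferences under which target ranks among the top k objects."""
    result = []
    for q in prefs:
        target_score = score(target, q)
        better = sum(1 for o in objects if score(o, q) > target_score)
        if better < k:
            result.append(q)
    return result

objects = [{"x": 1.0, "y": 0.0}, {"x": 0.0, "y": 1.0}]
target = {"x": 0.6, "y": 0.6}
prefs = [{"x": 1.0}, {"x": 0.5, "y": 0.5}]  # sparse: few non-zero weights
print(len(reverse_top_k(objects, prefs, target, k=1)))  # 1
```

This scan scores every object under every preference; the paper's contribution is to index low-dimensional subspaces that jointly cover the sparse preference workload so most of this work is pruned.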


Proceedings ArticleDOI
18 Jun 2014
TL;DR: Presents a system that automatically assesses the quality of claims and counters misleading claims that cherry-pick data to advance their conclusions.
Abstract: Are you fed up with "lies, d---ned lies, and statistics" made up from data in our media? For claims based on structured data, we present a system to automatically assess the quality of claims (beyond their correctness) and counter misleading claims that cherry-pick data to advance their conclusions. The key insight is to model such claims as parameterized queries and consider how parameter perturbations affect their results. We demonstrate our system on claims drawn from U.S. congressional voting records, sports statistics, and publication records of database researchers.

10 citations


Patent
Jun Yang1
10 Feb 2014
TL;DR: Each input microphone signal can be calibrated to a reference by comparing the signal's energy to an energy indicated by the reference; each gain is calculated as the ratio of the reference energy to the energy of the input microphone signal.
Abstract: An audio-based system may perform audio beamforming and/or sound source localization based on multiple input microphone signals. Each input microphone signal can be calibrated to a reference based on the energy of the microphone signal in comparison to an energy indicated by the reference. Specifically, respective gains may be applied to each input microphone signal, wherein each gain is calculated as a ratio of a reference energy to the energy of the input microphone signal.

6 citations
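A minimal sketch of the calibration described above: each channel's gain is the ratio of a reference energy to the measured signal energy, scaling all microphones to comparable levels. Frame sizes, the energy estimator, and the reference value here are assumptions, not the patent's exact method.

```python
def signal_energy(samples):
    """Mean-square energy of a block of samples."""
    return sum(s * s for s in samples) / len(samples)

def calibration_gain(reference_energy, samples, eps=1e-12):
    """Gain as the ratio of the reference energy to the signal energy;
    eps guards against division by zero on silent channels."""
    return reference_energy / max(signal_energy(samples), eps)

mic = [0.1, -0.1, 0.1, -0.1]       # quiet channel, energy 0.01
gain = calibration_gain(0.04, mic)  # reference energy 0.04
print(gain)  # about 4.0: boosts the quiet channel toward the reference
```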


Journal Article
TL;DR: Presents an overview of Cumulon, a system aimed at simplifying the development and deployment of statistical analysis of big data on public clouds, and the challenges encountered in building it.
Abstract: Cumulon is a system aimed at simplifying the development and deployment of statistical analysis of big data on public clouds. Cumulon allows users to program in their familiar language of matrices and linear algebra, without worrying about how to map data and computation to specific hardware and software platforms. Given user-specified requirements in terms of time, money, and risk tolerance, Cumulon finds the optimal implementation alternatives, execution parameters, as well as hardware provisioning and configuration settings—such as what type of machines and how many of them to acquire. Cumulon also supports clouds with auction-based markets: it effectively utilizes computing resources whose availability varies according to market conditions, and suggests best bidding strategies for such resources. This paper presents an overview of Cumulon and the challenges encountered in building this system.

3 citations


Patent
Jun Yang1
25 Aug 2014
TL;DR: Discloses features for performing noise compensation so that a level of noise remains audible in an output signal; a noise signal based on a first noise level can be determined, wherein the noise level of the noise signal is configured to be above a hearing threshold.
Abstract: Features are disclosed for performing noise compensation so a level of noise may be audible in an output signal. For example, a first noise level of a first signal can be estimated. The first signal can be processed (e.g., by residual echo suppression) to determine a second signal, and a second noise level of the second signal can be estimated. Residual echo suppression can sometimes cause background noise to be eliminated, causing silence. If the second noise level is less than a product of the first noise level and a noise threshold, then a noise signal based on the first noise level can be determined, wherein the noise level of the noise signal is configured to be above a hearing threshold. The noise signal can be combined with the second signal to generate an output signal.

2 citations
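The decision logic in the abstract can be sketched as follows: if residual echo suppression has removed too much background noise (second noise level below a fraction of the first), inject comfort noise derived from the pre-suppression noise level. The threshold value, hearing floor, and uniform noise generator are placeholder assumptions.

```python
import random

def compensate(first_noise, second_noise, second_signal,
               noise_threshold=0.5, hearing_floor=1e-4):
    """Add comfort noise when suppression left the signal unnaturally quiet."""
    if second_noise < first_noise * noise_threshold:
        # target a noise level based on the first estimate, kept audible
        level = max(first_noise, hearing_floor)
        noise = [random.uniform(-level, level) for _ in second_signal]
        return [s + n for s, n in zip(second_signal, noise)]
    return second_signal  # enough noise survived; pass through unchanged

random.seed(0)  # deterministic for the example
out = compensate(first_noise=0.02, second_noise=0.001,
                 second_signal=[0.0] * 8)
print(any(abs(x) > 0 for x in out))  # True: comfort noise was injected
```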