
Showing papers by "AT&T Labs" published in 2012


Proceedings ArticleDOI
25 Jun 2012
TL;DR: This paper develops the first empirically derived comprehensive power model of a commercial LTE network, with less than 6% error rate and state transitions matching the specifications, and identifies that for web-based applications the performance bottleneck lies less in the network than it did in a previous 3G study.
Abstract: With the recent advent of 4G LTE networks, there has been increasing interest in better understanding their performance and power characteristics compared with 3G/WiFi networks. In this paper, we take one of the first steps in this direction. Using 4GTest, a publicly deployed tool we designed for Android that attracted more than 3000 users within 2 months, together with extensive local experiments, we study the network performance of LTE networks and compare it with other types of mobile networks. We observe that LTE generally has significantly higher downlink and uplink throughput than 3G and even WiFi, with median values of 13 Mbps and 6 Mbps, respectively. We develop the first empirically derived comprehensive power model of a commercial LTE network, with less than 6% error rate and state transitions matching the specifications. Using a comprehensive data set consisting of 5-month traces of 20 smartphone users, we carefully investigate the energy usage in 3G, LTE, and WiFi networks and evaluate the impact of configuring LTE-related parameters. Despite several new power saving improvements, we find that LTE is as much as 23 times less power efficient than WiFi, and even less power efficient than 3G, based on the user traces; the long high-power tail is found to be a key contributor. In addition, we perform case studies of several popular Android applications on LTE and identify that the performance bottleneck for web-based applications lies less in the network, compared to our previous study in 3G [24]. Instead, the device's processing power, despite significant improvement compared to our analysis two years ago, becomes more of a bottleneck.
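The role of the "long high-power tail" can be illustrated with a toy radio-state energy model. All state powers, the tail timer, and the burst duration below are invented placeholder values, not the parameters measured in the paper; this is a sketch of the accounting idea only.

```python
# Toy LTE-style radio power model with a high-power tail.
# POWER values, TAIL_TIMER, and BURST are hypothetical placeholders.
TAIL_TIMER = 10.0   # seconds the radio lingers at tail power after a burst
BURST = 0.1         # seconds of active transmission per packet
POWER = {"IDLE": 0.03, "ACTIVE": 1.2, "TAIL": 1.0}  # watts (illustrative)

def energy_joules(packet_times, horizon):
    """Integrate radio power over [0, horizon] for a list of packet timestamps."""
    energy, prev = 0.0, None
    for t in sorted(packet_times):
        if prev is None:
            energy += t * POWER["IDLE"]          # idle before the first packet
        else:
            gap = t - prev
            tail = min(gap, TAIL_TIMER)          # high-power tail, then idle
            energy += tail * POWER["TAIL"] + (gap - tail) * POWER["IDLE"]
        energy += BURST * POWER["ACTIVE"]        # the transmission itself
        prev = t + BURST
    gap = horizon - (prev if prev is not None else 0.0)
    if prev is not None:
        tail = min(gap, TAIL_TIMER)
        energy += tail * POWER["TAIL"] + (gap - tail) * POWER["IDLE"]
    else:
        energy += gap * POWER["IDLE"]
    return energy
```

Under this model, two transfers spaced far apart pay the tail twice, which is why the tail is a key contributor and why tuning tail-related parameters matters.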

1,029 citations


Proceedings ArticleDOI
08 Oct 2012
TL;DR: The prototype of COMET (Code Offload by Migrating Execution Transparently), a realization of this design built on top of the Dalvik Virtual Machine, leverages the underlying memory model of the runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible.
Abstract: In this paper we introduce a runtime system to allow unmodified multi-threaded applications to use multiple machines. The system allows threads to migrate freely between machines depending on the workload. Our prototype, COMET (Code Offload by Migrating Execution Transparently), is a realization of this design built on top of the Dalvik Virtual Machine. COMET leverages the underlying memory model of our runtime to implement distributed shared memory (DSM) with as few interactions between machines as possible. Making use of a new VM-synchronization primitive, COMET imposes little restriction on when migration can occur. Additionally, enough information is maintained so one machine may resume computation after a network failure. We target our efforts towards augmenting smartphones or tablets with machines available in the network. We demonstrate the effectiveness of COMET on several real applications available on Google Play. These applications include image editors, turn-based games, a trip planner, and math tools. Utilizing a server-class machine, COMET can offer significant speed-ups on these real applications when run on a modern smartphone. With WiFi and 3G networks, we observe geometric mean speed-ups of 2.88× and 1.27× relative to the Dalvik interpreter across the set of applications, with speed-ups as high as 15× on some applications.

399 citations


Book
16 Jan 2012
TL;DR: Approximate Query Processing (AQP) computes a lossy, compact synopsis of the data and then executes the query of interest against the synopsis rather than the entire dataset.
Abstract: Methods for Approximate Query Processing (AQP) are essential for dealing with massive data. They are often the only means of providing interactive response times when exploring massive datasets, and are also needed to handle high speed data streams. These methods proceed by computing a lossy, compact synopsis of the data, and then executing the query of interest against the synopsis rather than the entire dataset. We describe basic principles and recent developments in AQP. We focus on four key synopses: random samples, histograms, wavelets, and sketches. We consider issues such as accuracy, space and time efficiency, optimality, practicality, range of applicability, error bounds on query answers, and incremental maintenance. We also discuss the trade-offs between the different synopsis types.
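The synopsis idea can be sketched in a few lines using the simplest of the four synopsis types, a uniform random sample. This illustrative example (not code from the book) maintains a reservoir sample in one pass and answers an approximate SUM query against it.

```python
import random

def build_sample(stream, k, seed=0):
    """Reservoir sampling: a uniform random sample of size k in one pass."""
    rng = random.Random(seed)
    sample = []
    for i, x in enumerate(stream):
        if i < k:
            sample.append(x)
        else:
            j = rng.randrange(i + 1)   # x replaces a sampled item w.p. k/(i+1)
            if j < k:
                sample[j] = x
    return sample

def approx_sum(sample, n):
    """Estimate SUM over the full data by scaling the sample mean by n."""
    return n * sum(sample) / len(sample)
```

The same sample can serve many aggregate queries, which is the appeal of synopses: pay the space once, answer interactively thereafter.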

361 citations


Proceedings ArticleDOI
25 Jun 2012
TL;DR: The location of screen taps on modern smartphones and tablets can be identified from accelerometer and gyroscope readings; the paper presents TapPrints, a framework for inferring the location of taps on mobile device touchscreens using motion sensor data combined with machine learning analysis.
Abstract: This paper shows that the location of screen taps on modern smartphones and tablets can be identified from accelerometer and gyroscope readings. Our findings have serious implications, as we demonstrate that an attacker can launch a background process on commodity smartphones and tablets, and silently monitor the user's inputs, such as keyboard presses and icon taps. While precise tap detection is nontrivial, requiring machine learning algorithms to identify fingerprints of closely spaced keys, sensitive sensors on modern devices aid the process. We present TapPrints, a framework for inferring the location of taps on mobile device touch-screens using motion sensor data combined with machine learning analysis. By running tests on two different off-the-shelf smartphones and a tablet computer we show that identifying tap locations on the screen and inferring English letters could be done with up to 90% and 80% accuracy, respectively. By optimizing the core tap detection capability with additional information, such as contextual priors, we are able to further magnify the core threat.

358 citations


Proceedings ArticleDOI
25 Jun 2012
TL;DR: A novel approach to modeling how large populations move within different metropolitan areas, which takes as input spatial and temporal probability distributions drawn from empirical data, such as Call Detail Records from a cellular telephone network, and produces synthetic CDRs for a synthetic population.
Abstract: Models of human mobility have broad applicability in fields such as mobile computing, urban planning, and ecology. This paper proposes and evaluates WHERE, a novel approach to modeling how large populations move within different metropolitan areas. WHERE takes as input spatial and temporal probability distributions drawn from empirical data, such as Call Detail Records (CDRs) from a cellular telephone network, and produces synthetic CDRs for a synthetic population. We have validated WHERE against billions of anonymous location samples for hundreds of thousands of phones in the New York and Los Angeles metropolitan areas. We found that WHERE offers significantly higher fidelity than other modeling approaches. For example, daily range of travel statistics fall within one mile of their true values, an improvement of more than 14 times over a Weighted Random Waypoint model. Our modeling techniques and synthetic CDRs can be applied to a wide range of problems while avoiding many of the privacy concerns surrounding real CDRs.
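The input/output contract described above (empirical spatial and temporal distributions in, synthetic CDRs out) can be sketched in a drastically simplified form. The home/work schedule below is an invented stand-in, not the actual WHERE model, and the distribution format is hypothetical.

```python
import random

def synth_cdrs(home_dist, work_dist, hours, n_users, seed=1):
    """Toy synthetic-CDR generator: each user draws a home and a work location
    from empirical spatial distributions (dicts mapping location -> probability),
    then emits one record per hour, at work during business hours, else at home."""
    rng = random.Random(seed)

    def draw(dist):
        r, acc = rng.random(), 0.0
        for loc, p in dist.items():
            acc += p
            if r <= acc:
                return loc
        return loc  # guard against floating-point shortfall

    records = []
    for user in range(n_users):
        home, work = draw(home_dist), draw(work_dist)
        for h in hours:
            loc = work if 9 <= h < 17 else home
            records.append((user, h, loc))
    return records
```

A real model would drive both the location and the timing from the empirical distributions; the point here is only that the output is a stream of (user, time, location) records that preserves aggregate statistics without exposing any real individual's trace.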

286 citations


Proceedings ArticleDOI
13 Aug 2012
TL;DR: A first-of-its-kind and in-depth analysis of one of the largest IXPs worldwide based on nine months' worth of sFlow records collected at that IXP in 2011 suggests that these large IXPs can be viewed as a microcosm of the Internet ecosystem itself and argues for a re-assessment of the mental picture the community has about this ecosystem.
Abstract: The largest IXPs carry on a daily basis traffic volumes in the petabyte range, similar to what some of the largest global ISPs reportedly handle. This little-known fact is due to a few hundred member ASes exchanging traffic with one another over the IXP's infrastructure. This paper reports on a first-of-its-kind and in-depth analysis of one of the largest IXPs worldwide based on nine months' worth of sFlow records collected at that IXP in 2011. A main finding of our study is that the number of actual peering links at this single IXP exceeds the number of total AS links of the peer-peer type in the entire Internet known as of 2010! To explain such a surprisingly rich peering fabric, we examine in detail this IXP's ecosystem and highlight the diversity of networks that are members at this IXP and connect there with other member ASes for reasons that are similarly diverse, but can be partially inferred from their business types and observed traffic patterns. In the process, we investigate this IXP's traffic matrix and illustrate what its temporal and structural properties can tell us about the member ASes that generated the traffic in the first place. While our results suggest that these large IXPs can be viewed as a microcosm of the Internet ecosystem itself, they also argue for a re-assessment of the mental picture that our community has about this ecosystem.

278 citations


Proceedings ArticleDOI
11 Jun 2012
TL;DR: These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.
Abstract: Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a week-long traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance. Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and they are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.

274 citations


Proceedings ArticleDOI
08 Jan 2012
TL;DR: In this paper, the authors present a verifiable-computation prover that runs in time O(S(n) log S(n)), where S(n) is the size of an arithmetic circuit computing the function of interest; this compares favorably to the poly(S(n)) prover runtime promised in [19].
Abstract: When delegating computation to a service provider, as in the cloud computing paradigm, we seek some reassurance that the output is correct and complete. Yet recomputing the output as a check is inefficient and expensive, and it may not even be feasible to store all the data locally. We are therefore interested in what can be validated by a streaming (sublinear space) user, who cannot store the full input, or perform the full computation herself. Our aim in this work is to advance a recent line of work on "proof systems" in which the service provider proves the correctness of its output to a user. The goal is to minimize the time and space costs of both parties in generating and checking the proof. Only very recently have there been attempts to implement such proof systems, and thus far these have been quite limited in functionality. Here, our approach is two-fold. First, we describe a carefully chosen instantiation of one of the most efficient general-purpose constructions for arbitrary computations (streaming or otherwise), due to Goldwasser, Kalai, and Rothblum [19]. This requires several new insights and enhancements to move the methodology from a theoretical result to a practical possibility. Our main contribution is in achieving a prover that runs in time O(S(n) log S(n)), where S(n) is the size of an arithmetic circuit computing the function of interest; this compares favorably to the poly(S(n)) runtime for the prover promised in [19]. Our experimental results demonstrate that a practical general-purpose protocol for verifiable computation may be significantly closer to reality than previously realized. Second, we describe a set of techniques that achieve genuine scalability for protocols fine-tuned for specific important problems in streaming and database processing.
Focusing in particular on non-interactive protocols for problems ranging from matrix-vector multiplication to bipartite perfect matching, we build on prior work [8, 5] to achieve a prover that runs in nearly linear-time, while obtaining optimal tradeoffs between communication cost and the user's working memory. Existing techniques required (substantially) superlinear time for the prover. Finally, we develop improved interactive protocols for specific problems based on a linearization technique originally due to Shen [33]. We argue that even if general-purpose methods improve, fine-tuned protocols will remain valuable in real-world settings for key problems, and hence special attention to specific problems is warranted.
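The paper's protocols are far more sophisticated, but the underlying intuition that checking can be much cheaper than recomputing is captured by the classic Freivalds test, shown below as an unrelated textbook illustration (not the paper's construction): a claimed matrix product is verified in O(n²) time per trial instead of the O(n³) needed to recompute it.

```python
import random

def freivalds_check(A, B, C, trials=20, seed=0):
    """Probabilistically check the claim C == A @ B (square n x n matrices).
    Each trial multiplies by a random 0/1 vector, costing O(n^2); a wrong
    claim survives all trials with probability at most 2**-trials."""
    rng = random.Random(seed)
    n = len(A)
    for _ in range(trials):
        r = [rng.randrange(2) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False   # caught a wrong claim
    return True            # claim is correct with high probability
```

Here the "proof" is simply the claimed answer C; the verifiable-computation protocols in the paper generalize this flavor of cheap randomized checking to arbitrary circuits and to streaming verifiers.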

227 citations


Journal ArticleDOI
01 Dec 2012
TL;DR: This paper studies the truthfulness of Deep Web data in two domains where data are fairly clean and data quality is important to people's lives, Stock and Flight, and observes a large amount of inconsistency among data from different sources as well as some sources with quite low accuracy.
Abstract: The amount of useful information available on the Web has been growing at a dramatic pace in recent years and people rely more and more on the Web to fulfill their information needs. In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people's lives: Stock and Flight. To our surprise, we observed a large amount of inconsistency on data from different sources and also some sources with quite low accuracy. We further applied on these two data sets state-of-the-art data fusion methods that aim at resolving conflicts and finding the truth, analyzed their strengths and limitations, and suggested promising research directions. We wish our study can increase awareness of the seriousness of conflicting data on the Web and in turn inspire more research in our community to tackle this problem.
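Two toy baselines for the conflict-resolution task give a feel for what the evaluated fusion methods improve upon: naive majority voting, and voting weighted by (estimated) source accuracy. These are simplified illustrations, not the state-of-the-art methods the study analyzes, and the accuracies in the example are invented.

```python
from collections import Counter

def majority_vote(claims):
    """claims: source -> claimed value for one data item; pick the most common."""
    return Counter(claims.values()).most_common(1)[0][0]

def weighted_vote(claims, accuracy):
    """Weight each claim by its source's estimated accuracy (default 0.5)."""
    scores = {}
    for src, val in claims.items():
        scores[val] = scores.get(val, 0.0) + accuracy.get(src, 0.5)
    return max(scores, key=scores.get)
```

The gap between the two voters is exactly the paper's point: when many low-accuracy sources copy the same wrong value, unweighted voting amplifies the error, while accuracy-aware fusion can still recover the truth.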

227 citations


Journal ArticleDOI
TL;DR: This paper presents an algorithm that has at its heart the same ideas espoused in compressive sensing, but adapted to the problem of network datasets, and shows how this algorithm can be used in a variety of ways to solve problems such as simple interpolation of missing values, traffic matrix inference from link data, prediction, and anomaly detection.
Abstract: Despite advances in measurement technology, it is still challenging to reliably compile large-scale network datasets. For example, because of flaws in the measurement systems or difficulties posed by the measurement problem itself, missing, ambiguous, or indirect data are common. In the case where such data have spatio-temporal structure, it is natural to try to leverage this structure to deal with the challenges posed by the problematic nature of the data. Our work involving network datasets draws on ideas from the area of compressive sensing and matrix completion, where sparsity is exploited in estimating quantities of interest. However, the standard results on compressive sensing are: 1) reliant on conditions that generally do not hold for network datasets; and 2) do not allow us to exploit all we know about their spatio-temporal structure. In this paper, we overcome these limitations with an algorithm that has at its heart the same ideas espoused in compressive sensing, but adapted to the problem of network datasets. We show how this algorithm can be used in a variety of ways, in particular on traffic data, to solve problems such as simple interpolation of missing values, traffic matrix inference from link data, prediction, and anomaly detection. The elegance of the approach lies in the fact that it unifies all of these tasks and allows them to be performed even when as much as 98% of the data is missing.
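A bare-bones sketch of the low-rank idea behind such interpolation: fit a rank-r factorization to the observed entries with stochastic gradient steps, then read the missing entries off the factors. This is generic matrix factorization under an assumed mask format, not the paper's spatio-temporal algorithm, and the hyperparameters are arbitrary.

```python
import random

def complete_matrix(M, mask, rank=1, steps=8000, lr=0.05, seed=0):
    """Fill missing entries of M (mask[i][j] is True where observed) by fitting
    a rank-`rank` factorization L @ R^T to the observed entries via SGD."""
    rng = random.Random(seed)
    n, m = len(M), len(M[0])
    L = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(n)]
    R = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(m)]
    obs = [(i, j) for i in range(n) for j in range(m) if mask[i][j]]
    for _ in range(steps):
        i, j = obs[rng.randrange(len(obs))]
        pred = sum(L[i][k] * R[j][k] for k in range(rank))
        err = M[i][j] - pred
        for k in range(rank):   # simultaneous update of both factors
            L[i][k], R[j][k] = (L[i][k] + lr * err * R[j][k],
                                R[j][k] + lr * err * L[i][k])
    return [[sum(L[i][k] * R[j][k] for k in range(rank)) for j in range(m)]
            for i in range(n)]
```

The unifying view in the paper is that interpolation, inference, prediction, and anomaly detection all reduce to estimating such a structured low-rank model under different observation patterns.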

211 citations


Journal ArticleDOI
TL;DR: This paper proposes an analytical model for evaluating Strict FFR and Soft Frequency Reuse (SFR) deployments based on the spatial Poisson point process; the results both capture the non-uniformity of heterogeneous deployments and produce tractable expressions which can be used for system design with Strict FFR and SFR.
Abstract: Interference management techniques are critical to the performance of heterogeneous cellular networks, which will have dense and overlapping coverage areas and experience high levels of interference. Fractional frequency reuse (FFR) is an attractive interference management technique due to its low complexity and overhead, and its significant coverage improvement for low-percentile (cell-edge) users. Instead of relying on system simulations based on deterministic access point locations, this paper proposes an analytical model for evaluating Strict FFR and Soft Frequency Reuse (SFR) deployments based on the spatial Poisson point process. Our results both capture the non-uniformity of heterogeneous deployments and produce tractable expressions which can be used for system design with Strict FFR and SFR. We observe that reserving Strict FFR bands for the users of each tier with the lowest average SINR provides the highest gains in terms of coverage and rate, while SFR allows more efficient use of shared spectrum between the tiers while still mitigating much of the interference. Additionally, in the context of multi-tier networks with closed access in some tiers, the proposed framework shows the impact of cross-tier interference on closed-access FFR and informs the selection of key FFR parameters in open access.

Journal ArticleDOI
01 Dec 2012
TL;DR: This paper proposes a randomized solution for selecting sources for fusion and shows empirically its effectiveness and scalability on both real-world data and synthetic data.
Abstract: We are often thrilled by the abundance of information surrounding us and wish to integrate data from as many sources as possible. However, understanding, analyzing, and using these data are often hard. Too much data can introduce a huge integration cost, such as expenses for purchasing data and resources for integration and cleaning. Furthermore, including low-quality data can even deteriorate the quality of integration results instead of bringing the desired quality gain. Thus, "the more the better" does not always hold for data integration and often "less is more". In this paper, we study how to select a subset of sources before integration such that we can balance the quality of integrated data and integration cost. Inspired by the Marginalism principle in economic theory, we wish to integrate a new source only if its marginal gain, often a function of improved integration quality, is higher than the marginal cost, associated with data-purchase expense and integration resources. As a first step towards this goal, we focus on data fusion tasks, where the goal is to resolve conflicts from different sources. We propose a randomized solution for selecting sources for fusion and show empirically its effectiveness and scalability on both real-world data and synthetic data.
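The Marginalism principle can be sketched as a greedy stopping rule: keep adding the source with the best net benefit while its marginal gain still exceeds its marginal cost. Note the paper's actual solution is randomized, not this greedy loop, and the gain/cost functions in the example are invented.

```python
def select_sources(candidates, marginal_gain, cost):
    """Greedy marginalism sketch: repeatedly add the source with the highest
    (marginal gain - cost), stopping once no candidate's marginal gain
    exceeds its marginal cost."""
    selected, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda s: marginal_gain(selected, s) - cost[s])
        if marginal_gain(selected, best) <= cost[best]:
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

With diminishing returns (each added source improves fused quality less than the last), the loop stops early, which is exactly the "less is more" behavior the paper argues for.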

Proceedings ArticleDOI
08 Feb 2012
TL;DR: This work proposes novel models which approximately optimize NDCG for the recommendation task, essentially variations on matrix factorization models where the features associated with the users and the items for the ranking task are learned.
Abstract: Typical recommender systems use the root mean squared error (RMSE) between the predicted and actual ratings as the evaluation metric. We argue that RMSE is not an optimal choice for this task, especially when we will only recommend a few (top) items to any user. Instead, we propose using a ranking metric, namely normalized discounted cumulative gain (NDCG), as a better evaluation metric for this task. Borrowing ideas from the learning to rank community for web search, we propose novel models which approximately optimize NDCG for the recommendation task. Our models are essentially variations on matrix factorization models where we also additionally learn the features associated with the users and the items for the ranking task. Experimental results on a number of standard collaborative filtering data sets validate our claims. The results also show the accuracy and efficiency of our models and the benefits of learning features for ranking.
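For reference, the NDCG metric itself is straightforward to compute. The sketch below uses linear gain (some formulations use 2^rel − 1 instead) and is an illustration of the metric, not code from the paper.

```python
import math

def dcg(rels):
    """Discounted cumulative gain of relevance scores in ranked order:
    position i (0-based) is discounted by log2(i + 2)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(rels):
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

Because the log discount concentrates almost all the weight on the top positions, optimizing NDCG rewards getting the few recommended items right, unlike RMSE, which weights every rating equally.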

Proceedings ArticleDOI
25 Mar 2012
TL;DR: A minimum energy sensing scheduling problem is formally defined and a polynomial-time algorithm to obtain optimal solutions is presented, which can be used to show energy savings that can potentially be achieved by using collaborative sensing in mobile phone sensing applications, and can also serve as a benchmark for performance evaluation.
Abstract: Mobile phones with a rich set of embedded sensors enable sensing applications in various domains. In this paper, we propose to leverage cloud-assisted collaborative sensing to reduce sensing energy consumption for mobile phone sensing applications. We formally define a minimum energy sensing scheduling problem and present a polynomial-time algorithm to obtain optimal solutions, which can be used to show energy savings that can potentially be achieved by using collaborative sensing in mobile phone sensing applications, and can also serve as a benchmark for performance evaluation. We also address individual energy consumption and fairness by presenting an algorithm to find fair energy-efficient sensing schedules. Under realistic assumptions, we present two practical and effective heuristic algorithms to find energy-efficient sensing schedules. It has been shown by simulation results based on real energy consumption (measured by the Monsoon power monitor) and location (collected from the Google Map) data that collaborative sensing significantly reduces energy consumption compared to a traditional approach without collaborations, and the proposed heuristic algorithm performs well in terms of both total energy consumption and fairness.

Journal ArticleDOI
01 Feb 2012
TL;DR: This is the first work to study the efficient maintenance of dense subgraphs under streaming edge weight updates, the core challenge in real-time story identification; it proposes a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting and yields meaningful results.
Abstract: Recent years have witnessed an unprecedented proliferation of social media. People around the globe author, every day, millions of blog posts, micro-blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly-coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly-coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting, and yields meaningful results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.

Proceedings ArticleDOI
21 May 2012
TL;DR: This paper demonstrates that the MG and the SpaceSaving summaries for heavy hitters are indeed mergeable or can be made mergeable after appropriate modifications, and provides the best known randomized streaming bound for ε-approximate quantiles that depends only on ε: a summary of size O((1/ε) log^{3/2}(1/ε)).
Abstract: We study the mergeability of data summaries. Informally speaking, mergeability requires that, given two summaries on two data sets, there is a way to merge the two summaries into a single summary on the union of the two data sets, while preserving the error and size guarantees. This property means that the summaries can be merged in a way like other algebraic operators such as sum and max, which is especially useful for computing summaries on massive distributed data. Several data summaries are trivially mergeable by construction, most notably all the sketches that are linear functions of the data sets. But some other fundamental ones, like those for heavy hitters and quantiles, are not (known to be) mergeable. In this paper, we demonstrate that these summaries are indeed mergeable or can be made mergeable after appropriate modifications. Specifically, we show that for ε-approximate heavy hitters, there is a deterministic mergeable summary of size O(1/ε); for ε-approximate quantiles, there is a deterministic summary of size O((1/ε) log(εn)) that has a restricted form of mergeability, and a randomized one of size O((1/ε) log^{3/2}(1/ε)) with full mergeability. We also extend our results to geometric summaries such as ε-approximations and ε-kernels. We also achieve two results of independent interest: (1) we provide the best known randomized streaming bound for ε-approximate quantiles that depends only on ε, of size O((1/ε) log^{3/2}(1/ε)), and (2) we demonstrate that the MG and the SpaceSaving summaries for heavy hitters are isomorphic.
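The merge operation for the Misra-Gries (MG) heavy-hitters summary is short enough to sketch directly: add the two counter sets, then subtract the (k+1)-th largest count from every counter and drop the non-positive ones, which restores the O(1/ε) size while preserving the error guarantee. A minimal version (not the paper's code):

```python
def mg_update(counters, x, k):
    """Misra-Gries update: keep at most k counters; each item's true count is
    underestimated by at most n/(k+1) over a stream of n items."""
    if x in counters:
        counters[x] += 1
    elif len(counters) < k:
        counters[x] = 1
    else:
        for key in list(counters):   # decrement all; drop zeros
            counters[key] -= 1
            if counters[key] == 0:
                del counters[key]

def mg_merge(c1, c2, k):
    """Merge two MG summaries: sum counts, then subtract the (k+1)-th largest
    count from everything so at most k counters survive."""
    merged = dict(c1)
    for key, v in c2.items():
        merged[key] = merged.get(key, 0) + v
    if len(merged) > k:
        kth = sorted(merged.values(), reverse=True)[k]
        merged = {key: v - kth for key, v in merged.items() if v - kth > 0}
    return merged
```

Because the merged summary is again a valid MG summary with the same size and error bounds, summaries built on shards of a distributed data set can be combined in any order, which is exactly the mergeability property the paper formalizes.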

Journal ArticleDOI
TL;DR: A multi-population biased random-key genetic algorithm (BRKGA) for the single container loading problem (3D-CLP), in which several rectangular boxes of different sizes are loaded into a single rectangular container, using a maximal-space representation to manage the free spaces in the container.

Journal ArticleDOI
TL;DR: In this paper, the authors show that the simplest possible tabulation hashing provides unexpectedly strong guarantees, such as Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
Abstract: Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees. The scheme itself dates back to Zobrist in 1970, who used it for game-playing programs. Keys are viewed as consisting of c characters. We initialize c tables H1, ..., Hc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to H1[x1] ⊕ H2[x2] ⊕ ... ⊕ Hc[xc], where ⊕ denotes bit-wise exclusive-or. While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, for example, Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.
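The scheme really is only a few lines. The sketch below instantiates it for 32-bit integer keys viewed as c = 4 characters of 8 bits each (these parameter choices are illustrative):

```python
import random

def make_tabulation_hash(c=4, char_bits=8, seed=0):
    """Simple tabulation hashing: one table of random 32-bit codes per
    character position; hash(x) = H1[x1] xor ... xor Hc[xc]."""
    rng = random.Random(seed)
    tables = [[rng.getrandbits(32) for _ in range(1 << char_bits)]
              for _ in range(c)]
    mask = (1 << char_bits) - 1

    def h(key):
        """Hash a (c * char_bits)-bit integer key."""
        out = 0
        for i in range(c):
            out ^= tables[i][(key >> (i * char_bits)) & mask]
        return out

    return h
```

Evaluation is c table lookups and XORs, with no multiplications, which is why the scheme is so fast in practice despite offering the concentration guarantees described above.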

Journal ArticleDOI
TL;DR: Two fast local search routines are introduced that can determine in linear time whether a maximal solution can be improved by replacing a single vertex with two others and a more elaborate heuristic that successfully applies local search to find near-optimum solutions to a wide variety of instances is presented.
Abstract: Given a graph G=(V,E), the independent set problem is that of finding a maximum-cardinality subset S of V such that no two vertices in S are adjacent. We introduce two fast local search routines for this problem. The first can determine in linear time whether a maximal solution can be improved by replacing a single vertex with two others. The second routine can determine in O(mΔ) time (where Δ is the highest degree in the graph) whether there are two solution vertices that can be replaced by a set of three. We also present a more elaborate heuristic that successfully applies local search to find near-optimum solutions to a wide variety of instances. We test our algorithms on instances from the literature as well as on new ones proposed in this paper.
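A naive version of the first routine, replacing one solution vertex with two free, mutually non-adjacent vertices (a (1,2)-swap), can be written as follows. This quadratic-time sketch is for clarity only; the paper's routine achieves linear time with more careful bookkeeping.

```python
def one_two_swap(adj, S):
    """Try to improve an independent set S by a (1,2)-swap.
    adj: dict mapping each vertex to its set of neighbors.
    Returns a larger independent set, or None if no such swap exists."""
    S = set(S)
    for v in list(S):
        # Candidates: vertices outside S whose only neighbor in S is v,
        # so they become free once v is removed.
        cand = [u for u in adj if u not in S and adj[u] & S == {v}]
        for i in range(len(cand)):
            for j in range(i + 1, len(cand)):
                if cand[j] not in adj[cand[i]]:      # pair must be non-adjacent
                    return (S - {v}) | {cand[i], cand[j]}
    return None
```

Repeating the swap until it fails yields a solution that is locally optimal with respect to this neighborhood, the building block of the paper's larger heuristic.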

Journal ArticleDOI
TL;DR: In this paper, the advantages of spatial superchannels for future terabit networks based on space-division multiplexing (SDM) are discussed, and a coherent receiver utilizing joint digital signal processing (DSP) is demonstrated.
Abstract: We discuss the advantages of spatial superchannels for future terabit networks based on space-division multiplexing (SDM), and demonstrate reception of spatial superchannels by a coherent receiver utilizing joint digital signal processing (DSP). In a spatial superchannel, the SDM modes at a given wavelength are routed together, allowing a simplified design of both transponders and optical routing equipment. For example, common-mode impairments can be exploited to streamline the receiver's DSP. Our laboratory measurements reveal that the phase fluctuations between the cores of a multicore fiber are strongly correlated, and therefore constitute such a common-mode impairment. We implement master-slave phase recovery of two simultaneous 112-Gbps subchannels in a seven-core fiber, demonstrating reduced processing complexity with no increase in the bit-error ratio. Furthermore, we investigate the feasibility of applying this technique to subchannels carried on separate single-mode fibers, a potential transition strategy to evolve today's fiber networks toward future networks using multicore fibers.

Journal ArticleDOI
TL;DR: In this paper, a hole-assisted few-mode multicore fiber in which each core supports both the LP01 mode and the two degenerate LP11 modes has been designed and fabricated for the first time.
Abstract: A seven-core few-mode multicore fiber in which each core supports both the LP01 mode and the two degenerate LP11 modes has been designed and fabricated for the first time, to the best of our knowledge. The hole-assisted structure enables low inter-core crosstalk and high mode density at the same time. LP01 inter-core crosstalk has been measured to be lower than -60 dB/km. LP11 inter-core crosstalk has been measured to be around -40 dB/km using a different setup. The LP11 free-space excitation-induced crosstalk is simulated and analyzed. This fiber allows multiplexed transmission of 21 spatial modes per polarization per wavelength. Data transmission in LP01/LP11 mode over 1 km of this fiber has been demonstrated with negligible penalty.

Proceedings ArticleDOI
16 Apr 2012
TL;DR: A Mutual Reinforcement-based Label Propagation (MRLP) algorithm is proposed to predict question quality in CQA and it is found that the interaction between askers and topics results in the differences of question quality.
Abstract: Users ask and answer questions in community question answering (CQA) services to seek information and share knowledge. A corollary is that a myriad of questions and answers appear in CQA services. Accordingly, numerous studies have explored answer quality so as to provide a preliminary screening for better answers. However, to our knowledge, far less attention has been paid to question quality in CQA. Knowing question quality enables finding and recommending good questions, together with identifying bad ones that hinder the CQA service. In this paper, we conduct two studies to investigate the question quality issue. The first study analyzes the factors of question quality and finds that the interaction between askers and topics results in the differences in question quality. Based on this finding, in the second study we propose a Mutual Reinforcement-based Label Propagation (MRLP) algorithm to predict question quality. We experiment with Yahoo! Answers data, and the results demonstrate the effectiveness of our algorithm in distinguishing high-quality questions from low-quality ones.
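The abstract does not spell out MRLP's update rules, but the general pattern — quality scores propagating back and forth between askers and their questions, with labeled seeds held fixed — can be sketched generically. Everything below (the toy graph, the mean-based updates) is a hypothetical illustration of bipartite label propagation, not the paper's algorithm:

```python
import numpy as np

# Toy asker-question incidence matrix: A[i, j] = 1 if asker i posted question j.
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# Seed quality labels for some questions (NaN = unlabeled).
seeds = np.array([1.0, np.nan, 0.0, np.nan])
q = np.where(np.isnan(seeds), 0.5, seeds)
known = ~np.isnan(seeds)

for _ in range(100):
    # Mutual reinforcement: an asker's reputation is the mean quality of
    # their questions; a question inherits its asker's reputation.
    asker = (A @ q) / A.sum(axis=1)
    q = (A.T @ asker) / A.sum(axis=0)
    q[known] = seeds[known]  # keep labeled questions clamped
```

In this toy run the unlabeled question of the high-reputation asker converges toward 1, while an asker with no labeled questions keeps the neutral prior.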

Proceedings ArticleDOI
25 Jun 2012
TL;DR: This work performs the first network-wide study of the redundant transfers caused by inefficient web caching on handsets, using a dataset collected from 3 million smartphone users of a large commercial cellular carrier, as well as another five-month-long trace contributed by 20 smartphone users.
Abstract: Web caching in mobile networks is critical due to the unprecedented cellular traffic growth, which far exceeds the deployment of cellular infrastructure. Caching on handsets is particularly important, as it eliminates all network-related overheads. We perform the first network-wide study of the redundant transfers caused by inefficient web caching on handsets, using a dataset collected from 3 million smartphone users of a large commercial cellular carrier, as well as another five-month-long trace contributed by 20 smartphone users. Our findings suggest that redundant transfers contribute 18% and 20% of the total HTTP traffic volume in the two datasets. They are also responsible for 17% of the bytes, 7% of the radio energy consumption, 6% of the signaling load, and 9% of the radio resource utilization of all cellular data traffic in the second dataset. Most such redundant transfers are caused by smartphone web caching implementations that do not fully support or strictly follow the protocol specification, or by developers not fully utilizing the caching support provided by libraries. This is further confirmed by our caching tests of 10 popular HTTP libraries and mobile browsers. Improving the cache implementation would considerably reduce network traffic volume, cellular resource consumption, handset energy consumption, and user-perceived latency, benefiting both cellular carriers and customers.
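The redundant transfers in question arise when a client re-fetches a response that standard HTTP caching headers already mark as fresh. A simplified freshness check in the spirit of RFC 7234 (max-age, then Expires; heuristic freshness omitted) that a handset cache could apply before going to the network:

```python
import email.utils
import time

def is_fresh(headers, fetched_at, now=None):
    """Return True if a cached response is still fresh per its
    Cache-Control: max-age or Expires header (simplified RFC 7234 logic)."""
    now = now or time.time()
    for directive in headers.get("Cache-Control", "").split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            return now - fetched_at < int(directive.split("=", 1)[1])
    if "Expires" in headers:
        expires = email.utils.parsedate_to_datetime(headers["Expires"])
        return now < expires.timestamp()
    return False  # No explicit lifetime: revalidate with a conditional GET.

# A response cached 60 s ago with max-age=300 needs no network transfer.
print(is_fresh({"Cache-Control": "max-age=300"}, time.time() - 60))
```

A library that skips this check, or never stores the response at all, pays the full transfer again — the behavior the study measures at scale.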

Proceedings ArticleDOI
25 Jun 2012
TL;DR: A vision of a future in which mobile devices become a core component of mobile cloud computing architectures, and a world where mobile devices will be capable of forming mobile clouds, or mClouds, to accomplish tasks locally without relying, when possible, on costly and, sometimes, inefficient backend communication.
Abstract: When we think of mobile cloud computing today, we typically refer to empowering mobile devices - in particular smartphones and tablets - with the capabilities of stationary resources residing in giant data centers. But what happens when these mobile devices become as powerful as our personal computers, or more so? This paper presents our vision of a future in which mobile devices become a core component of mobile cloud computing architectures. We envision a world where mobile devices will be capable of forming mobile clouds, or mClouds, to accomplish tasks locally without relying, where possible, on costly and sometimes inefficient backend communication. We discuss a possible mClouds architecture, its benefits and tradeoffs, and a user incentive scheme to support the mCloud design.

Journal ArticleDOI
TL;DR: The different multiple antenna techniques introduced in LTE-Advanced are discussed, and the main enabling solutions introduced for downlink and uplink transmissions are presented.
Abstract: In this article we discuss the different multiple antenna techniques introduced in LTE-Advanced. Rather than describing the technical details of the adopted solutions, we approach the problem starting from the design targets and the antenna deployments prioritized by the operators. Then we present the main enabling solutions introduced for downlink and uplink transmissions, and subsequently assess the performance of these solutions in different scenarios. Finally, we discuss some possible future developments.

Proceedings ArticleDOI
16 Apr 2012
TL;DR: This work performs the first network-wide, large-scale investigation of a particular type of application traffic pattern called periodic transfers where a handset periodically exchanges some data with a remote server every t seconds, found that periodic transfers are very prevalent in today's smartphone traffic.
Abstract: Cellular networks employ a specific radio resource management policy that distinguishes them from wired and Wi-Fi networks. A lack of awareness of this important mechanism potentially leads to resource-inefficient mobile applications. We perform the first network-wide, large-scale investigation of a particular type of application traffic pattern called periodic transfers, in which a handset periodically exchanges some data with a remote server every t seconds. Using packet traces containing 1.5 billion packets collected from a commercial cellular carrier, we found that periodic transfers are very prevalent in today's smartphone traffic. However, they are extremely resource-inefficient for both the network and end-user devices, even though they predominantly generate very little traffic. This somewhat counter-intuitive behavior is a direct consequence of the adverse interaction between such periodic transfer patterns and the cellular network's radio resource management policy. For example, for popular smartphone applications such as Facebook, periodic transfers account for only 1.7% of the overall traffic volume but contribute 30% of the total handset radio energy consumption. We found that periodic transfers are generated for various reasons, such as keep-alives, polling, and user behavior measurements. We further investigate the potential of various traffic shaping and resource control algorithms. Depending on their traffic patterns, applications exhibit disparate responses to optimization strategies. Jointly using several strategies with moderate aggressiveness can eliminate almost all energy impact of periodic transfers for popular applications such as Facebook and Pandora.
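The inefficiency comes from the radio lingering in a high-power state for an inactivity "tail" after every transfer, so tiny periodic transfers pay a disproportionate energy cost. A toy model (the 10-second tail and the batching scenario are hypothetical, not carrier-measured figures) showing why batching helps:

```python
def radio_on_time(transfer_times, tail_s=10.0):
    """Seconds the radio spends in the high-power state, assuming each
    transfer holds it there for a fixed inactivity tail (toy model)."""
    on = 0.0
    high_until = None
    for t in sorted(transfer_times):
        if high_until is None or t > high_until:
            on += tail_s                    # new promotion: pay a full tail
        else:
            on += t + tail_s - high_until   # extend the current tail
        high_until = t + tail_s
    return on

# One keep-alive every 60 s for 10 min: ten isolated tails (100 s on-time).
periodic = [60.0 * k for k in range(10)]
# The same ten transfers batched into one burst share a single tail (~11 s).
batched = [0.1 * k for k in range(10)]
print(radio_on_time(periodic), radio_on_time(batched))
```

This is the intuition behind the traffic shaping strategies the paper evaluates: moving transfers closer together in time lets them amortize one tail instead of triggering many.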

Proceedings ArticleDOI
16 Oct 2012
TL;DR: A new self-service cloud (SSC) computing model that splits administrative privileges between a system-wide domain and per-client administrative domains, and allows providers and clients to establish mutually trusted services that can check regulatory compliance while respecting client privacy.
Abstract: Modern cloud computing infrastructures use virtual machine monitors (VMMs) that often include a large and complex administrative domain with privileges to inspect client VM state. Attacks against, or misuse of, the administrative domain can compromise client security and privacy. Moreover, these VMMs give clients only inflexible control over their own VMs, as a result of which clients have to rely on the cloud provider to deploy useful services, such as VM introspection-based security tools. We introduce a new self-service cloud (SSC) computing model that addresses these two shortcomings. SSC splits administrative privileges between a system-wide domain and per-client administrative domains. Each client can manage and perform privileged system tasks on its own VMs, thereby providing flexibility. The system-wide administrative domain cannot inspect the code, data, or computation of client VMs, thereby ensuring security and privacy. SSC also allows providers and clients to establish mutually trusted services that can check regulatory compliance while respecting client privacy. We have implemented SSC by modifying the Xen hypervisor. We demonstrate its utility by building user domains that perform privileged tasks such as memory introspection, storage intrusion detection, and anomaly detection.

Journal ArticleDOI
TL;DR: This paper addresses the major innovations developed in Phase 1 of the program by the team led by Telcordia and AT&T, with the ultimate goal of transferring the technology to commercial and government networks for deployment in the next few years.
Abstract: The Core Optical Networks (CORONET) program addresses the development of architectures, protocols, and network control and management to support the future advanced requirements of both commercial and government networks, with a focus on highly dynamic and highly resilient multi-terabit core networks. CORONET encompasses a global network supporting a combination of IP and wavelength services. Satisfying the aggressive requirements of the program required a comprehensive approach addressing connection setup, restoration, quality of service, network design, and nodal architecture. This paper addresses the major innovations developed in Phase 1 of the program by the team led by Telcordia and AT&T. The ultimate goal is to transfer the technology to commercial and government networks for deployment in the next few years.

Proceedings ArticleDOI
25 Mar 2012
TL;DR: This paper provides the first fine-grained characterization of the geospatial dynamics of application usage in a 3G cellular data network and presents cellular network operators with fine- grained insights that can be leveraged to tune network parameter settings.
Abstract: Recent studies on cellular network measurement have provided the evidence that significant geospatial correlations, in terms of traffic volume and application access, exist in cellular network usage. Such geospatial correlation patterns provide local optimization opportunities to cellular network operators for handling the explosive growth in the traffic volume observed in recent years. To the best of our knowledge, in this paper, we provide the first fine-grained characterization of the geospatial dynamics of application usage in a 3G cellular data network. Our analysis is based on two simultaneously collected traces from the radio access network (containing location records) and the core network (containing traffic records) of a tier-1 cellular network in the United States. To better understand the application usage in our data, we first cluster cell locations based on their application distributions and then study the geospatial dynamics of application usage across different geographical regions. The results of our measurement study present cellular network operators with fine-grained insights that can be leveraged to tune network parameter settings.
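Clustering cell locations by their application distributions, as the abstract describes, can be sketched with plain k-means over per-cell application-share vectors. The data, the k=2 setting, and the three application categories below are illustrative assumptions, not the study's actual features or methodology:

```python
import numpy as np

rng = np.random.default_rng(1)

# Per-cell application usage shares (e.g. web, video, social); rows sum to 1.
cells = np.vstack([
    rng.dirichlet([8, 1, 1], size=20),   # web-heavy cells
    rng.dirichlet([1, 8, 1], size=20),   # video-heavy cells
])

# Plain k-means (k=2) over the share vectors; centers seeded from the data.
centers = cells[[0, 20]].copy()
for _ in range(20):
    dist = np.linalg.norm(cells[:, None, :] - centers[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    centers = np.array([cells[labels == k].mean(axis=0) for k in range(2)])
```

Grouping cells this way is what lets an operator study how application mixes vary across regions and tune parameters per cluster rather than per cell.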

Journal ArticleDOI
Pamela Zave
29 Mar 2012
TL;DR: By combining the right selection of pseudocode and textual hints from several papers, and fixing flaws revealed by analysis, it is possible to get a version of Chord that may be correct.
Abstract: Correctness of the Chord ring-maintenance protocol would mean that the protocol can eventually repair all disruptions in the ring structure, given ample time and no further disruptions while it is working. In other words, it is "eventual reachability." Under the same assumptions about failure behavior as made in the Chord papers, no published version of Chord is correct. This result is based on modeling the protocol in Alloy and analyzing it with the Alloy Analyzer. By combining the right selection of pseudocode and textual hints from several papers, and fixing flaws revealed by analysis, it is possible to get a version that may be correct. The paper also discusses the significance of these results, describes briefly how Alloy is used to model and reason about Chord, and compares Alloy analysis to model-checking.
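The ring maintenance the abstract analyzes centers on Chord's stabilize/notify pair, in which each node periodically checks its successor's predecessor and adopts it if it falls between them on the identifier ring. A toy sketch (successor/predecessor pointers only, no failures or finger tables) of how a naive join is repaired, following the well-known pseudocode rather than any particular fixed version from the paper:

```python
class Node:
    def __init__(self, ident):
        self.id, self.succ, self.pred = ident, None, None

    def stabilize(self):
        # Adopt our successor's predecessor if it sits between us and succ.
        x = self.succ.pred
        if x and between(self.id, x.id, self.succ.id):
            self.succ = x
        self.succ.notify(self)

    def notify(self, n):
        if self.pred is None or between(self.pred.id, n.id, self.id):
            self.pred = n

def between(a, b, c):
    # Is b in the open ring interval (a, c), accounting for wraparound?
    return (a < b < c) if a < c else (b > a or b < c)

# Two-node ring; node 30 joins by pointing only its successor at node 50.
a, b, c = Node(10), Node(50), Node(30)
a.succ, a.pred = b, b
b.succ, b.pred = a, a
c.succ = b  # naive join: no one points at c yet
for _ in range(3):
    for n in (c, a, b):
        n.stabilize()
# After a few rounds the ring is 10 -> 30 -> 50 -> 10 again.
```

"Eventual reachability" in the abstract is precisely the claim that, absent further disruptions, enough such rounds restore a correct ring; the Alloy analysis shows the published protocols do not always guarantee it.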