Showing papers by "AT&T Labs published in 2009"

PDF

Open Access

Journal Article•DOI•

The Case for VM-Based Cloudlets in Mobile Computing

[...]

Mahadev Satyanarayanan¹, Paramvir Bahl², Ramón Cáceres³, Nigel Davies⁴•Institutions (4)

Carnegie Mellon University¹, Microsoft², AT&T Labs³, Lancaster University⁴

01 Oct 2009-IEEE Pervasive Computing

TL;DR: The results from a proof-of-concept prototype suggest that VM technology can indeed help meet the need for rapid customization of infrastructure for diverse applications, and this article discusses the technical obstacles to these transformations and proposes a new architecture for overcoming them.

...read moreread less

Abstract: Mobile computing continuously evolve through the sustained effort of many researchers. It seamlessly augments users' cognitive abilities via compute-intensive capabilities such as speech recognition, natural language processing, etc. By thus empowering mobile users, we could transform many areas of human activity. This article discusses the technical obstacles to these transformations and proposes a new architecture for overcoming them. In this architecture, a mobile user exploits virtual machine (VM) technology to rapidly instantiate customized service software on a nearby cloudlet and then uses that service over a wireless LAN; the mobile device typically functions as a thin client with respect to the service. A cloudlet is a trusted, resource-rich computer or cluster of computers that's well-connected to the Internet and available for use by nearby mobile devices. Our strategy of leveraging transiently customized proximate infrastructure as a mobile device moves with its user through the physical world is called cloudlet-based, resource-rich, mobile computing. Crisp interactive response, which is essential for seamless augmentation of human cognition, is easily achieved in this architecture because of the cloudlet's physical proximity and one-hop network latency. Using a cloudlet also simplifies the challenge of meeting the peak bandwidth demand of multiple users interactively generating and receiving media such as high-definition video and high-resolution images. Rapid customization of infrastructure for diverse applications emerges as a critical requirement, and our results from a proof-of-concept prototype suggest that VM technology can indeed help meet this requirement.

...read moreread less

3,599 citations

Journal Article•

Determining what individual SUS scores mean: adding an adjective rating scale

[...]

Aaron Bangor¹, Philip Kortum², James T. Miller¹•Institutions (2)

AT&T Labs¹, Rice University²

01 May 2009-Journal of Usability Studies archive

TL;DR: A seven-point adjective-anchored Likert scale was added as an eleventh question to nearly 1,000 System Usability Scale (SUS) surveys as mentioned in this paper.

...read moreread less

Abstract: The System Usability Scale (SUS) is an inexpensive, yet effective tool for assessing the usability of a product, including Web sites, cell phones, interactive voice response systems, TV applications, and more. It provides an easy-to-understand score from 0 (negative) to 100 (positive). While a 100-point scale is intuitive in many respects and allows for relative judgments, information describing how the numeric score translates into an absolute judgment of usability is not known. To help answer that question, a seven-point adjective-anchored Likert scale was added as an eleventh question to nearly 1,000 SUS surveys. Results show that the Likert scale scores correlate extremely well with the SUS scores (r=0.822). The addition of the adjective rating scale to the SUS may help practitioners interpret individual SUS scores and aid in explaining the results to non-human factors professionals.

...read moreread less

2,592 citations

Journal Article•DOI•

Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data

[...]

Steven J. Phillips¹, Miroslav Dudík², Jane Elith³, Catherine H. Graham⁴, Anthony Lehmann⁵, John R. Leathwick⁶, Simon Ferrier - Show less +3 more•Institutions (6)

AT&T Labs¹, Princeton University², University of Melbourne³, Stony Brook University⁴, University of Geneva⁵, National Institute of Water and Atmospheric Research⁶

01 Jan 2009-Ecological Applications

TL;DR: It is argued that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions and as large an effect on predictive performance as the choice of modeling method.

...read moreread less

Abstract: Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.

...read moreread less

2,307 citations

Journal Article•DOI•

Networked MIMO with clustered linear precoding

[...]

Jun Zhang¹, Chen Runhua², Jeffrey G. Andrews¹, Arunabha Ghosh³, Robert W. Heath¹ - Show less +1 more•Institutions (3)

University of Texas at Austin¹, Texas Instruments², AT&T Labs³

01 Apr 2009-IEEE Transactions on Wireless Communications

TL;DR: In this article, a clustered base transceiver station (BTS) coordination strategy is proposed for a large cellular MIMO network, which includes full intra-cluster coordination to enhance the sum rate and limited inter-clusters coordination to reduce interference for the cluster edge users.

...read moreread less

Abstract: A clustered base transceiver station (BTS) coordination strategy is proposed for a large cellular MIMO network, which includes full intra-cluster coordination-to enhance the sum rate-and limited inter-cluster coordination-to reduce interference for the cluster edge users. Multi-cell block diagonalization is used to coordinate the transmissions across multiple BTSs in the same cluster. To satisfy per-BTS power constraints, three combined precoder and power allocation algorithms are proposed with different performance and complexity tradeoffs. For inter-cluster coordination, the coordination area is chosen to balance fairness for edge users and the achievable sum rate. It is shown that a small cluster size (about 7 cells) is sufficient to obtain most of the sum rate benefits from clustered coordination while greatly relieving channel feedback requirement. Simulations show that the proposed coordination strategy efficiently reduces interference and provides a considerable sum rate gain for cellular MIMO networks.

...read moreread less

592 citations

Journal Article•DOI•

Integrating conflicting data: the role of source dependence

[...]

Xin Luna Dong¹, Laure Berti-Equille², Divesh Srivastava¹•Institutions (2)

AT&T Labs¹, University of Rennes²

01 Aug 2009

TL;DR: This paper applies Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information and extends the model by considering accuracy of data sources and similarity between values.

...read moreread less

Abstract: Many data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, require integrating data from multiple sources. Each of these sources provides a set of values and different sources can often provide conflicting values. To present quality data to users, it is critical that data integration systems can resolve conflicts and discover true values. Typically, we expect a true value to be provided by more sources than any particular false one, so we can take the value provided by the majority of the sources as the truth. Unfortunately, a false value can be spread through copying and that makes truth discovery extremely tricky. In this paper, we consider how to find true values from conflicting information when there are a large number of sources, among which some may copy from others.We present a novel approach that considers dependence between data sources in truth discovery. Intuitively, if two data sources provide a large number of common values and many of these values are rarely provided by other sources (e.g., particular false values), it is very likely that one copies from the other. We apply Bayesian analysis to decide dependence between sources and design an algorithm that iteratively detects dependence and discovers truth from conflicting information. We also extend our model by considering accuracy of data sources and similarity between values. Our experiments on synthetic data as well as real-world data show that our algorithm can significantly improve accuracy of truth discovery and is scalable when there are a large number of data sources.

...read moreread less

439 citations

Proceedings Article•DOI•

On the leakage of personally identifiable information via online social networks

[...]

Balachander Krishnamurthy¹, Craig E. Wills²•Institutions (2)

AT&T Labs¹, Worcester Polytechnic Institute²

17 Aug 2009

TL;DR: This research shows that it is possible for third-parties to link PII, which is leaked via OSNs, with user actions both within OSN sites and else-where on non-OSN sites.

...read moreread less

Abstract: For purposes of this paper, we define "Personally identifiable information" (PII) as information which can be used to distinguish or trace an individual's identity either alone or when combined with other information that is linkable to a specific individual. The popularity of Online Social Networks (OSN) has accelerated the appearance of vast amounts of personal information on the Internet. Our research shows that it is possible for third-parties to link PII, which is leaked via OSNs, with user actions both within OSN sites and else-where on non-OSN sites. We refer to this ability to link PII and combine it with other information as "leakage". We have identified multiple ways by which such leakage occurs and discuss measures to prevent it.

...read moreread less

389 citations

Proceedings Article•DOI•

Spatio-temporal compressive sensing and internet traffic matrices

[...]

Yin Zhang¹, Matthew Roughan², Walter Willinger³, Lili Qiu¹•Institutions (3)

University of Texas at Austin¹, University of Adelaide², AT&T Labs³

16 Aug 2009

TL;DR: This work develops a novel spatio-temporal compressive sensing framework with two key components: a new technique called Sparsity Regularized Matrix Factorization (SRMF) that leverages the sparse or low-rank nature of real-world traffic matrices and their spatio/temporal properties, and a mechanism for combining low- rank approximations with local interpolation procedures.

...read moreread less

Abstract: Many basic network engineering tasks (e.g., traffic engineering, capacity planning, anomaly detection) rely heavily on the availability and accuracy of traffic matrices. However, in practice it is challenging to reliably measure traffic matrices. Missing values are common. This observation brings us into the realm of compressive sensing, a generic technique for dealing with missing values that exploits the presence of structure and redundancy in many real-world systems. Despite much recent progress made in compressive sensing, existing compressive-sensing solutions often perform poorly for traffic matrix interpolation, because real traffic matrices rarely satisfy the technical conditions required for these solutions.To address this problem, we develop a novel spatio-temporal compressive sensing framework with two key components: (i) a new technique called Sparsity Regularized Matrix Factorization (SRMF) that leverages the sparse or low-rank nature of real-world traffic matrices and their spatio-temporal properties, and (ii) a mechanism for combining low-rank approximations with local interpolation procedures. We illustrate our new framework and demonstrate its superior performance in problems involving interpolation with real traffic matrices where we can successfully replace up to 98% of the values. Evaluation in applications such as network tomography, traffic prediction, and anomaly detection confirms the flexibility and effectiveness of our approach.

...read moreread less

364 citations

Proceedings Article•DOI•

Understanding online social network usage from a network perspective

[...]

Fabian Schneider¹, Anja Feldmann¹, Balachander Krishnamurthy², Walter Willinger²•Institutions (2)

Technical University of Berlin¹, AT&T Labs²

04 Nov 2009

TL;DR: This paper characterization of user interactions within the OSN for four different OSNs (Facebook, LinkedIn, Hi5, and StudiVZ) focuses on feature popularity, session characteristics, and the dynamics within OSN sessions.

...read moreread less

Abstract: Online Social Networks (OSNs) have already attracted more than half a billion users. However, our understanding of which OSN features attract and keep the attention of these users is poor. Studies thus far have relied on surveys or interviews of OSN users or focused on static properties, e. g., the friendship graph, gathered via sampled crawls. In this paper, we study how users actually interact with OSNs by extracting clickstreams from passively monitored network traffic. Our characterization of user interactions within the OSN for four different OSNs (Facebook, LinkedIn, Hi5, and StudiVZ) focuses on feature popularity, session characteristics, and the dynamics within OSN sessions. We find, for example, that users commonly spend more than half an hour interacting with the OSNs while the byte contributions per OSN session are relatively small.

...read moreread less

314 citations

Journal Article•DOI•

Truth discovery and copying detection in a dynamic world

[...]

Xin Luna Dong¹, Laure Berti-Equille², Divesh Srivastava¹•Institutions (2)

AT&T Labs¹, University of Rennes²

01 Aug 2009

TL;DR: A Hidden Markov Model that decides whether a source is a copier of another source and identifies the specific moments at which it copies is developed, and a Bayesian model that aggregates information from the sources to decide the true value for a data item, and the evolution of the true values over time is developed.

...read moreread less

Abstract: Modern information management applications often require integrating data from a variety of data sources, some of which may copy or buy data from other sources. When these data sources model a dynamically changing world (e.g., people's contact information changes over time, restaurants open and go out of business), sources often provide out-of-date data. Errors can also creep into data when sources are updated often. Given out-of-date and erroneous data provided by different, possibly dependent, sources, it is challenging for data integration systems to provide the true values. Straightforward ways to resolve such inconsistencies (e.g., voting) may lead to noisy results, often with detrimental consequences.In this paper, we study the problem of finding true values and determining the copying relationship between sources, when the update history of the sources is known. We model the quality of sources over time by their coverage, exactness and freshness. Based on these measures, we conduct a probabilistic analysis. First, we develop a Hidden Markov Model that decides whether a source is a copier of another source and identifies the specific moments at which it copies. Second, we develop a Bayesian model that aggregates information from the sources to decide the true value for a data item, and the evolution of the true values over time. Experimental results on both real-world and synthetic data show high accuracy and scalability of our techniques.

...read moreread less

248 citations

Journal Article•DOI•

Class-based graph anonymization for social network data

[...]

Smriti Bhagat¹, Graham Cormode², Balachander Krishnamurthy², Divesh Srivastava²•Institutions (2)

Rutgers University¹, AT&T Labs²

01 Aug 2009

TL;DR: This work presents a new set of techniques for anonymizing social network data based on grouping the entities into classes, and masking the mapping between entities and the nodes that represent them in the anonymized graph.

...read moreread less

Abstract: The recent rise in popularity of social networks, such as Facebook and MySpace, has created large quantities of data about interactions within these networks. Such data contains many private details about individuals so anonymization is required prior to attempts to make the data more widely available for scientific research. Prior work has considered simple graph data to be anonymized by removing all non-graph information and adding or deleting some edges. Since social network data is richer in details about the users and their interactions, loss of details due to anonymization limits the possibility for analysis. We present a new set of techniques for anonymizing social network data based on grouping the entities into classes, and masking the mapping between entities and the nodes that represent them in the anonymized graph. Our techniques allow queries over the rich data to be evaluated with high accuracy while guaranteeing resilience to certain types of attack. To prevent inference of interactions, we rely on a critical "safety condition" when forming these classes. We demonstrate utility via empirical data from social networking settings. We give examples of complex queries that may be posed and show that they can be answered over the anonymized data efficiently and accurately.

...read moreread less

242 citations

Journal Article•DOI•

On unbiased sampling for unstructured peer-to-peer networks

[...]

Daniel Stutzbach, Reza Rejaie¹, Nick Duffield², Subhabrata Sen², Walter Willinger² - Show less +1 more•Institutions (2)

University of Oregon¹, AT&T Labs²

01 Apr 2009-IEEE ACM Transactions on Networking

TL;DR: The Metropolized Random Walk with Backtracking (MRWB) is proposed as a viable and promising technique for collecting nearly unbiased samples and an extensive simulation study is conducted to demonstrate that the technique works well for a wide variety of commonly-encountered peer-to-peer network conditions.

...read moreread less

Abstract: This paper presents a detailed examination of how the dynamic and heterogeneous nature of real-world peer-to-peer systems can introduce bias into the selection of representative samples of peer properties (e.g., degree, link bandwidth, number of files shared). We propose the Metropolized Random Walk with Backtracking (MRWB) as a viable and promising technique for collecting nearly unbiased samples and conduct an extensive simulation study to demonstrate that our technique works well for a wide variety of commonly-encountered peer-to-peer network conditions. We have implemented the MRWB algorithm for selecting peer addresses uniformly at random into a tool called ion-sampler. Using the Gnutella network, we empirically show that ion-sampler. yields more accurate samples than tools that rely on commonly-used sampling techniques and results in dramatic improvements in efficiency and scalability compared to performing a full crawl.

...read moreread less

Journal Article•DOI•

A random key based genetic algorithm for the resource constrained project scheduling problem

[...]

Jorge José de Magalhães Mendes¹, José Fernando Gonçalves², Mauricio G. C. Resende³•Institutions (3)

Instituto Superior de Engenharia do Porto¹, University of Porto², AT&T Labs³

01 Jan 2009-Computers & Operations Research

TL;DR: This paper presents a genetic algorithm for the Resource Constrained Project Scheduling Problem (RCPSP) using a heuristic priority rule in which the priorities of the activities are defined by the genetic algorithm.

...read moreread less

Proceedings Article•DOI•

Semantics of Ranking Queries for Probabilistic Data and Expected Ranks

[...]

Graham Cormode¹, Feifei Li², Ke Yi³•Institutions (3)

AT&T Labs¹, Florida State University², Hong Kong University of Science and Technology³

29 Mar 2009

TL;DR: This work is able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query, and provides efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty.

...read moreread less

Abstract: When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N logN)—no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.

...read moreread less

Journal Article•DOI•

Data fusion: resolving data conflicts for integration

[...]

Xin Luna Dong¹, Felix Naumann²•Institutions (2)

AT&T Labs¹, Hasso Plattner Institute²

01 Aug 2009

TL;DR: Modern data management applications often require integrating available data sources and providing a uniform interface for users to access data from different sources, and such requirements have been driving fruitful research on data integration over the last two decades.

...read moreread less

Abstract: The amount of information produced in the world increases by 30% every year and this rate will only go up. With advanced network technology, more and more sources are available either over the Internet or in enterprise intranets. Modern data management applications, such as setting up Web portals, managing enterprise data, managing community data, and sharing scientific data, often require integrating available data sources and providing a uniform interface for users to access data from different sources; such requirements have been driving fruitful research on data integration over the last two decades [11, 13].

...read moreread less

Journal Article•DOI•

An annotated bibliography of GRASP – Part I: Algorithms

[...]

Paola Festa, Mauricio G. C. Resende¹•Institutions (1)

AT&T Labs¹

01 Jan 2009-International Transactions in Operational Research

TL;DR: Algorithmic aspects of GRASP, a greedy randomized adaptive search procedure for combinatorial optimization, are covered, including construction phase and local search phase.

...read moreread less

Proceedings Article•DOI•

IXPs: mapped?

[...]

Brice Augustin¹, Balachander Krishnamurthy², Walter Willinger²•Institutions (2)

Pierre-and-Marie-Curie University¹, AT&T Labs²

04 Nov 2009

TL;DR: An Internet-wide traceroute study that was specifically designed to shed light on the unknown IXP-specific peering matrices and involves targeted traceroutes from publicly available and geographically dispersed vantage points is reported on.

...read moreread less

Abstract: Internet exchange points (IXPs) are an important ingredient of the Internet AS-level ecosystem - a logical fabric of the Internet made up of about 30,000 ASes and their mutual business relationships whose primary purpose is to control and manage the flow of traffic. Despite the IXPs' critical role in this fabric, little is known about them in terms of their peering matrices (i.e., who peers with whom at which IXP) and corresponding traffic matrices (i.e., how much traffic do the different ASes that peer at an IXP exchange with one another). In this paper, we report on an Internet-wide traceroute study that was specifically designed to shed light on the unknown IXP-specific peering matrices and involves targeted traceroutes from publicly available and geographically dispersed vantage points. Based on our method, we were able to discover and validate the existence of about 44K IXP-specific peering links - nearly 18K more links than were previously known. In the process, we also classified all known IXPs depending on the type of information required to detect them. Moreover, in view of the currently used inferred AS-level maps of the Internet that are known to miss a significant portion of the actual AS relationships of the peer-to-peer type, our study provides a new method for augmenting these maps with IXP-related peering links in a systematic and informed manner.

...read moreread less

Proceedings Article•

Making routers last longer with ViAggre

[...]

Hitesh Ballani¹, Paul Francis¹, Tuan Cao¹, Jia Wang²•Institutions (2)

Cornell University¹, AT&T Labs²

22 Apr 2009

TL;DR: The application of ViAggre to a few tier- 1 and tier-2 ISPs is evaluated and it is shown that it can reduce the routing table on routers by an order of magnitude while imposing almost no traffic stretch and negligible load increase across the routers.

...read moreread less

Abstract: This paper presents ViAggre (Virtual Aggregation), a "configuration-only" approach to shrinking the routing table on routers. ViAggre does not require any changes to router software and routing protocols and can be deployed independently and autonomously by any ISP. ViAggre is effectively a scalability technique that allows an ISP to modify its internal routing such that individual routers in the ISP's network only maintain a part of the global routing table. We evaluate the application of ViAggre to a few tier- 1 and tier-2 ISPs and show that it can reduce the routing table on routers by an order of magnitude while imposing almost no traffic stretch and negligible load increase across the routers. We also deploy Virtual Aggregation on a testbed comprising of Cisco routers and benchmark this deployment. Finally, to understand and address concerns regarding the configuration overhead that our proposal entails, we implement a configuration tool that automates ViAggre configuration. While it remains to be seen whether most, if not all, of the management concerns can be eliminated through such automated tools, we believe that the simplicity of the proposal and its possible short-term impact on routing scalability suggest that it is an alternative worth considering.

...read moreread less

Journal Article•DOI•

Multi-Level, Multi-Dimensional Coding for High-Speed and High-Spectral-Efficiency Optical Transmission

[...]

Xiang Zhou¹, Jianjun Yu²•Institutions (2)

AT&T Labs¹, Princeton University²

12 May 2009-Journal of Lightwave Technology

TL;DR: In this article, single carrier based multi-level and multi-dimensional coding (ML-MDC) technologies have been demonstrated for spectrally efficient 100-Gb/s transmission.

...read moreread less

Abstract: We review and study several single carrier based multi-level and multi-dimensional coding (ML-MDC) technologies recently demonstrated for spectrally-efficient 100-Gb/s transmission. These include 16-ary PDM-QPSK, 64-ary PDM-8PSK, 64-ary PDM-8QAM as well as 256-ary PDM-16 QAM. We show that high-speed QPSK, 8PSK, 8QAM, and 16QAM can all be generated using commercially available optical modulators using only binary electrical drive signals through novel synthesis methods, and that all of these modulation formats can be detected using a universal receiver front-end and digital coherent detection. We show that the constant modulus algorithm (CMA), which is highly effective for blind polarization recovery of PDM-QPSK and PDM-8PSK signals, is much less effective for PDM-8QAM and PDM-16 QAM. We then present a recently proposed, cascaded multi-modulus algorithm for these cases. In addition to the DSP algorithms used for constellation recovery, we also describe a DSP algorithm to improve the performance of a coherent receiver using single-ended photo-detection. The system impact of ASE noise, laser phase noise, narrowband optical filtering and fiber nonlinear effects has been investigated. For high-level modulation formats using full receiver-side digital compensation, it is shown that the requirement on LO phase noise is more stringent than the signal laser. We also show that RZ pulse shaping significantly improves filter- and fiber-nonlinear tolerance. Finally we present three high-spectral-efficiency and high-speed DWDM transmission experiments implementing these ML-MDC technologies.

...read moreread less

Journal Article•DOI•

An annotated bibliography of GRASP–Part II: Applications

[...]

Paola Festa¹, Mauricio G. C. Resende²•Institutions (2)

University of Naples Federico II¹, AT&T Labs²

01 Mar 2009-International Transactions in Operational Research

TL;DR: The literature where GRASP is applied to scheduling, routing, logic, partitioning, location, graph theory, assignment, manufacturing, transportation, telecommunications, biology and related fields, automatic drawing, power systems, and VLSI design is covered.

...read moreread less

Proceedings Article•DOI•

Using static analysis for Ajax intrusion detection

[...]

Arjun Guha¹, Shriram Krishnamurthi¹, Trevor Jim²•Institutions (2)

Brown University¹, AT&T Labs²

20 Apr 2009

TL;DR: This analysis tackles numerous challenges posed by modern web applications including asynchronous communication, frameworks, and dynamic code generation, and builds an intrusion-prevention proxy for the server that intercepts client requests and disables those that do not meet the expected behavior.

...read moreread less

Abstract: We present a static control-flow analysis for JavaScript programs running in a web browser. Our analysis tackles numerous challenges posed by modern web applications including asynchronous communication, frameworks, and dynamic code generation. We use our analysis to extract a model of expected client behavior as seen from the server, and build an intrusion-prevention proxy for the server: the proxy intercepts client requests and disables those that do not meet the expected behavior. We insert random asynchronous requests to foil mimicry attacks. Finally, we evaluate our technique against several real applications and show that it protects against an attack in a widely-used web application.

...read moreread less

Proceedings Article•

The case for enterprise-ready virtual private clouds

[...]

Timothy Wood¹, Alexandre Gerber², Kadangode K. Ramakrishnan², Prashant Shenoy¹, Jacobus Van der Merwe² - Show less +1 more•Institutions (2)

University of Massachusetts Amherst¹, AT&T Labs²

15 Jun 2009

TL;DR: This work argues that cloud computing has a great potential to change how enterprises run and manage their IT systems, but that to achieve this, more comprehensive control over network resources and security need to be provided for users.

...read moreread less

Abstract: Cloud computing platforms such as Amazon EC2 provide customers with flexible, on demand resources at low cost. However, while existing offerings are useful for providing basic computation and storage resources, they fail to provide the security and network controls that many customers would like. In this work we argue that cloud computing has a great potential to change how enterprises run and manage their IT systems, but that to achieve this, more comprehensive control over network resources and security need to be provided for users. Towards this goal, we propose CloudNet, a cloud platform architecture which utilizes virtual private networks to securely and seamlessly link cloud and enterprise sites.

...read moreread less

Proceedings Article•DOI•

Towards automated performance diagnosis in a large IPTV network

[...]

Ajay Mahimkar¹, Zihui Ge², Aman Shaikh², Jia Wang², Jennifer Yates², Yin Zhang¹, Qi Zhao² - Show less +3 more•Institutions (2)

University of Texas at Austin¹, AT&T Labs²

16 Aug 2009

TL;DR: This paper focuses on characterizing and troubleshooting performance issues in one of the largest IPTV networks in North America, and develops a novel diagnosis tool called Giza that is specifically tailored to the enormous scale and hierarchical structure of the IPTV network.

...read moreread less

Abstract: IPTV is increasingly being deployed and offered as a commercial service to residential broadband customers. Compared with traditional ISP networks, an IPTV distribution network (i) typically adopts a hierarchical instead of mesh-like structure, (ii) imposes more stringent requirements on both reliability and performance, (iii) has different distribution protocols (which make heavy use of IP multicast) and traffic patterns, and (iv) faces more serious scalability challenges in managing millions of network elements. These unique characteristics impose tremendous challenges in the effective management of IPTV network and service.In this paper, we focus on characterizing and troubleshooting performance issues in one of the largest IPTV networks in North America. We collect a large amount of measurement data from a wide range of sources, including device usage and error logs, user activity logs, video quality alarms, and customer trouble tickets. We develop a novel diagnosis tool called Giza that is specifically tailored to the enormous scale and hierarchical structure of the IPTV network. Giza applies multi-resolution data analysis to quickly detect and localize regions in the IPTV distribution hierarchy that are experiencing serious performance problems. Giza then uses several statistical data mining techniques to troubleshoot the identified problems and diagnose their root causes. Validation against operational experiences demonstrates the effectiveness of Giza in detecting important performance issues and identifying interesting dependencies. The methodology and algorithms in Giza promise to be of great use in IPTV network operations.

...read moreread less

Proceedings Article•DOI•

Geo-Centric Language Models for Local Business Voice Search

[...]

Amanda Stent¹, Ilija Zeljkovic¹, Diamantino Caseiro¹, Jay Wilpon¹•Institutions (1)

AT&T Labs¹

31 May 2009

TL;DR: A new algorithm for geo-centric language model generation for local business voice search for mobile users that provides a language model for any user in any location, and the geographic area covered by the language model is adapted to the local business density, giving high recognition accuracy.

...read moreread less

Abstract: Voice search is increasingly popular, especially for local business directory assistance. However, speech recognition accuracy on business listing names is still low, leading to user frustration. In this paper, we present a new algorithm for geo-centric language model generation for local business voice search for mobile users. Our algorithm has several advantages: it provides a language model for any user in any location; the geographic area covered by the language model is adapted to the local business density, giving high recognition accuracy; and the language models can be pre-compiled, giving fast recognition time. In an experiment using spoken business listing name queries from a business directory assistance service, we achieve a 16.8% absolute improvement in recognition accuracy and a 3-fold speedup in recognition time with geocentric language models when compared with a nationwide language model.

...read moreread less

Journal Article•DOI•

Minimizing movement

[...]

Erik D. Demaine¹, MohammadTaghi Hajiaghayi², Hamid Mahini³, Amin S. Sayedi-Roshkhar³, Shayan Oveisgharan³, Morteza Zadimoghaddam³ - Show less +2 more•Institutions (3)

Massachusetts Institute of Technology¹, AT&T Labs², Sharif University of Technology³

04 Jul 2009-ACM Transactions on Algorithms

TL;DR: This work gives approximation algorithms and inapproximability results for a class of movement problems that involve planning the coordinated motion of a large collection of objects to achieve a global property of the network while minimizing the maximum or average movement.

...read moreread less

Abstract: We give approximation algorithms and inapproximability results for a class of movement problems. In general, these problems involve planning the coordinated motion of a large collection of objects (representing anything from a robot swarm or firefighter team to map labels or network messages) to achieve a global property of the network while minimizing the maximum or average movement. In particular, we consider the goals of achieving connectivity (undirected and directed), achieving connectivity between a given pair of vertices, achieving independence (a dispersion problem), and achieving a perfect matching (with applications to multicasting). This general family of movement problems encompasses an intriguing range of graph and geometric algorithms, with several real-world applications and a surprising range of approximability. In some cases, we obtain tight approximation and inapproximability results using direct techniques (without use of PCP), assuming just that P ≠ NP.

...read moreread less

Journal Article•DOI•

Ricean $K$ -Factors in Narrow-Band Fixed Wireless Channels: Theory, Experiments, and Statistical Models

[...]

Larry J. Greenstein¹, S.S. Ghassemzadeh², V. Erceg³, David G. Michelson⁴•Institutions (4)

Rutgers University¹, AT&T Labs², Broadcom³, University of British Columbia⁴

24 Mar 2009-IEEE Transactions on Vehicular Technology

TL;DR: K is found to be lognormal, with the median being a simple function of season, antenna height, antenna beamwidth, and distance and with a standard deviation of 8 dB, and plausible physical arguments to explain these observations are presented.

...read moreread less

Abstract: Fixed wireless channels in suburban macrocells are subject to fading due to scattering by moving objects such as windblown trees and foliage in the environment. When, as is often the case, the fading follows a Ricean distribution, the first-order statistics of fading are completely described by the corresponding average path gain and Ricean K-factor. Because such fading has important implications for the design of both narrow-band and wideband multipoint communication systems that are deployed in such environments, it must be well characterized. We conducted a set of 1.9-GHz experiments in suburban macrocell environments to generate a collective database from which we could construct a simple model for the probability distribution of K as experienced by fixed wireless users. Specifically, we find K to be lognormal, with the median being a simple function of season, antenna height, antenna beamwidth, and distance and with a standard deviation of 8 dB. We also present plausible physical arguments to explain these observations, elaborate on the variability of K with time, frequency, and location, and show the strong influence of wind conditions on K.

...read moreread less

Journal Article•DOI•

Certification of an optimal TSP tour through 85,900 cities

[...]

David Applegate¹, Robert E. Bixby², Vašek Chvátal³, William J. Cook⁴, Daniel Espinoza⁵, Marcos Goycoolea⁶, Keld Helsgaun⁷ - Show less +3 more•Institutions (7)

AT&T Labs¹, Saint Petersburg State University², Concordia University³, Georgia Institute of Technology⁴, University of Chile⁵, Adolfo Ibáñez University⁶, Roskilde University⁷

01 Jan 2009-Operations Research Letters

TL;DR: A computer code and data are described that together certify the optimality of a solution to the 85,900-city traveling salesman problem pla85900, the largest instance in the TSPLIB collection of challenge problems.

...read moreread less

Monograph•DOI•

The Shortest Path Problem: Ninth DIMACS Implementation Challenge

[...]

Camil Demetrescu¹, Andrew V. Goldberg², David S. Johnson³•Institutions (3)

Sapienza University of Rome¹, Microsoft², AT&T Labs³

28 Jul 2009

TL;DR: The purpose of the shortest-path algorithm is to calculate the distance of the longest path from v to each of the reachable nodes of the graph, which is the sum of the weights of the edges on the path.

...read moreread less

Abstract: Shortest path problems are among the most fundamental combinatorial optimization problems with many applications, both direct and as subroutines. They arise naturally in a remarkable number of real-world settings. A limited list includes transportation planning, network optimization, packet routing, image segmentation, speech recognition, document formatting, robotics, compilers, traffic information systems, and dataflow analysis. Shortest path algorithms have been studied since the 1950's and still remain an active area of research. This volume reports on the research carried out by participants during the Ninth DIMACS Implementation Challenge, which led to several improvements of the state of the art in shortest path algorithms. The infrastructure developed during the Challenge facilitated further research in the area, leading to substantial follow-up work as well as to better and more uniform experimental standards. The results of the Challenge included new cutting-edge techniques for emerging applications such as GPS navigation systems, providing experimental evidence of the most effective algorithms in several real-world settings.

...read moreread less

Book Chapter•DOI•

A cost-sensitive adaptation engine for server consolidation of multitier applications

[...]

Gueyoung Jung¹, Kaustubh Joshi², Matti Hiltunen², Richard D. Schlichting², Calton Pu¹ - Show less +1 more•Institutions (2)

Georgia Institute of Technology¹, AT&T Labs²

30 Nov 2009

TL;DR: Extensive experimental results show that the technique is able to maximize SLA fulfillment under typical time-of-day workload variations as well as flash crowds, and that it exhibits significantly improved transient behavior compared to approaches that do not account for adaptation costs.

...read moreread less

Abstract: Virtualization-based server consolidation requires runtime resource reconfiguration to ensure adequate application isolation and performance, especially for multitier services that have dynamic, rapidly changing workloads and responsiveness requirements. While virtualization makes reconfiguration easy, indiscriminate use of adaptations such as VM replication, VM migration, and capacity controls has performance implications. This paper demonstrates that ignoring these costs can have significant impacts on the ability to satisfy response-time-based SLAs, and proposes a solution in the form of a cost-sensitive adaptation engine that weighs the potential benefits of runtime reconfiguration decisions against their costs. Extensive experimental results based on live workload traces show that the technique is able to maximize SLA fulfillment under typical time-of-day workload variations as well as flash crowds, and that it exhibits significantly improved transient behavior compared to approaches that do not account for adaptation costs.

...read moreread less

Proceedings Article•DOI•

Network-aware forward caching

[...]

Jeffrey Erman¹, Alexandre Gerber¹, MohammadTaghi Hajiaghayi¹, Dan Pei¹, Oliver Spatscheck¹ - Show less +1 more•Institutions (1)

AT&T Labs¹

20 Apr 2009

TL;DR: It is shown in the simulation that a 37% increase to net benefits could be achieved over the standard method of full cache deployment to cache all POPs traffic, and that CDN traffic is much more efficient than P2P content and that there is large skew in the Air Miles between POP in a typical network.

...read moreread less

Abstract: This paper proposes and evaluates a Network Aware Forward Caching approach for determining the optimal deployment strategy of forward caches to a network. A key advantage of this approach is that we can reduce the network costs associated with forward caching to maximize the benefit obtained from their deployment. We show in our simulation that a 37% increase to net benefits could be achieved over the standard method of full cache deployment to cache all POPs traffic. In addition, we show that this maximal point occurs when only 68% of the total traffic is cached.Another contribution of this paper is the analysis we use to motivate and evaluate this problem. We characterize the Internet traffic of 100K subscribers of a US residential broadband provider. We use both layer 4 and layer 7 analysis to investigate the traffic volumes of the flows as well as study the general characteristics of the applications used. We show that HTTP is a dominant protocol and account for 68% of the total downstream traffic and that 34% of that traffic is multimedia. In addition, we show that multimedia content using HTTP exhibits a 83% annualized growth rate and other HTTP traffic has a 53% growth rate versus the 26% over all annual growth rate of broadband traffic. This shows that HTTP traffic will become ever more dominent and increase the potential caching opportunities. Furthermore, we characterize the core backbone traffic of this broadband provider to measure the distance travelled by content and traffic. We find that CDN traffic is much more efficient than P2P content and that there is large skew in the Air Miles between POP in a typical network. Our findings show that there are many opportunties in broadband provider networks to optimize how traffic is delivered and cached.

...read moreread less

Proceedings Article•DOI•

Modeling channel popularity dynamics in a large IPTV system

[...]

Tongqing Qiu¹, Zihui Ge², Seungjoon Lee², Jia Wang², Qi Zhao², Jun Xu¹ - Show less +2 more•Institutions (2)

Georgia Institute of Technology¹, AT&T Labs²

15 Jun 2009

TL;DR: This paper conducts in-depth analysis on channel popularity on a large collection of user channel access data from a nation-wide commercial IPTV network and chooses a stochastic model that finds good matches in all attributes of interest with respect to the channel popularity.

...read moreread less

Abstract: Understanding the channel popularity or content popularity is an important step in the workload characterization for modern information distribution systems (e.g., World Wide Web, peer-to-peer file-sharing systems, video-on-demand systems).In this paper, we focus on analyzing the channel popularity in the context of Internet Protocol Television (IPTV). In particular, we aim at capturing two important aspects of channel popularity - the distribution and temporal dynamics of the channel popularity. We conduct in-depth analysis on channel popularity on a large collection of user channel access data from a nation-wide commercial IPTV network. Based on the findings in our analysis, we choose a stochastic model that finds good matches in all attributes of interest with respect to the channel popularity. Furthermore, we propose a method to identify subsets of user population with inherently different channel interest.By tracking the change of population mixtures among different user classes, we extend our model to a multi-class population model, which enables us to capture the moderate diurnal popularity patterns exhibited in some channels. We also validate our channel popularity model using real user channel access data from commercial IPTV network.

...read moreread less

Collapse