
Showing papers by "Carnegie Mellon University" published in 2005


Book
01 Jan 2005
TL;DR: This book presents planning and navigation algorithms that exploit statistics gleaned from uncertain, imperfect real-world environments to guide robots toward their goals and around obstacles.
Abstract: Planning and navigation algorithms exploit statistics gleaned from uncertain, imperfect real-world environments to guide robots toward their goals and around obstacles.

6,425 citations


Proceedings Article
01 Jun 2005
TL;DR: METEOR is described, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations and can be easily extended to include more advanced matching strategies.
Abstract: We describe METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations. Unigrams can be matched based on their surface forms, stemmed forms, and meanings; furthermore, METEOR can be easily extended to include more advanced matching strategies. Once all generalized unigram matches between the two strings have been found, METEOR computes a score for this matching using a combination of unigram-precision, unigram-recall, and a measure of fragmentation that is designed to directly capture how well-ordered the matched words in the machine translation are in relation to the reference. We evaluate METEOR by measuring the correlation between the metric scores and human judgments of translation quality. We compute the Pearson R correlation value between its scores and human quality assessments of the LDC TIDES 2003 Arabic-to-English and Chinese-to-English datasets. We perform segment-by-segment correlation, and show that METEOR gets an R correlation value of 0.347 on the Arabic data and 0.331 on the Chinese data. This is shown to be an improvement on using simply unigram-precision, unigram-recall and their harmonic F1 combination. We also perform experiments to show the relative contributions of the various mapping modules.
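
As a rough illustration of the scoring just described, the sketch below combines unigram precision, recall, and a fragmentation penalty for exact-form matches only; the greedy alignment and the parameter values are illustrative choices, not the paper's tuned configuration.

```python
def meteor_like_score(candidate, reference, alpha=0.9, gamma=0.5, beta=3.0):
    """Toy METEOR-style score using exact-form unigram matches only.

    alpha weights recall over precision; gamma and beta shape the
    fragmentation penalty.  Values here are illustrative defaults.
    """
    cand, ref = candidate.lower().split(), reference.lower().split()

    # Greedy exact-match alignment: each reference token used at most once.
    matches, used = [], set()
    for i, tok in enumerate(cand):
        for j, rtok in enumerate(ref):
            if j not in used and tok == rtok:
                matches.append((i, j))
                used.add(j)
                break
    m = len(matches)
    if m == 0:
        return 0.0

    precision, recall = m / len(cand), m / len(ref)
    # Harmonic-style mean weighted toward recall.
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)

    # Fragmentation: number of "chunks" of contiguous, in-order matches.
    chunks = 1
    for (i1, j1), (i2, j2) in zip(matches, matches[1:]):
        if not (i2 == i1 + 1 and j2 == j1 + 1):
            chunks += 1
    penalty = gamma * (chunks / m) ** beta
    return fmean * (1 - penalty)


print(meteor_like_score("the cat sat on the mat",
                        "the cat was sitting on the mat"))
```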

3,911 citations


Journal ArticleDOI
01 Jun 2005-Proteins
TL;DR: This paper describes the development of a set of software applications that use the Data Model and its associated libraries, thus validating the approach and providing a pipeline for high-throughput analysis of NMR data.
Abstract: To address data management and data exchange problems in the nuclear magnetic resonance (NMR) community, the Collaborative Computing Project for the NMR community (CCPN) created a "Data Model" that describes all the different types of information needed in an NMR structural study, from molecular structure and NMR parameters to coordinates. This paper describes the development of a set of software applications that use the Data Model and its associated libraries, thus validating the approach. These applications are freely available and provide a pipeline for high-throughput analysis of NMR data. Three programs work directly with the Data Model: CcpNmr Analysis, an entirely new analysis and interactive display program, the CcpNmr FormatConverter, which allows transfer of data from programs commonly used in NMR to and from the Data Model, and the CLOUDS software for automated structure calculation and assignment (Carnegie Mellon University), which was rewritten to interact directly with the Data Model. The ARIA 2.0 software for structure calculation (Institut Pasteur) and the QUEEN program for validation of restraints (University of Nijmegen) were extended to provide conversion of their data to the Data Model. During these developments the Data Model has been thoroughly tested and used, demonstrating that applications can successfully exchange data via the Data Model. The software architecture developed by CCPN is now ready for new developments, such as integration with additional software applications and extensions of the Data Model into other areas of research.

2,906 citations


Journal ArticleDOI
Joseph Adams1, Madan M. Aggarwal2, Zubayer Ahammed3, J. Amonett4  +363 moreInstitutions (46)
TL;DR: In this paper, the most important experimental results from the first three years of nucleus-nucleus collision studies at RHIC are reviewed, with emphasis on results of the STAR experiment.

2,750 citations


Proceedings ArticleDOI
21 Aug 2005
TL;DR: A new graph generator is provided, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
Abstract: How do real graphs evolve over time? What are "normal" growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing super-linearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a "forest fire" spreading process, that has a simple, intuitive justification, requires very few parameters (like the "flammability" of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
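
The burning process lends itself to a compact simulation. The sketch below is a simplified, directed variant using a single forward-burning probability (the paper's model also burns along in-links); the parameter value and the plain-dictionary representation are assumptions for illustration.

```python
import random

def forest_fire_graph(n, p_forward=0.37, seed=0):
    """Toy directed forest-fire generator: each new node picks an
    'ambassador' and recursively burns (links to) neighbors, with the
    number burned at each step geometrically distributed via p_forward."""
    rng = random.Random(seed)
    out_edges = {0: set()}                 # node -> set of its out-neighbors
    for v in range(1, n):
        out_edges[v] = set()
        ambassador = rng.randrange(v)      # uniformly random existing node
        visited, frontier = {ambassador}, [ambassador]
        out_edges[v].add(ambassador)
        while frontier:
            w = frontier.pop()
            candidates = [u for u in out_edges[w] if u not in visited]
            rng.shuffle(candidates)
            k = 0                          # geometric "burn" count
            while rng.random() < p_forward:
                k += 1
            for u in candidates[:k]:
                visited.add(u)
                out_edges[v].add(u)
                frontier.append(u)
    return out_edges

g = forest_fire_graph(500)
print(sum(len(s) for s in g.values()), "edges among", len(g), "nodes")
```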

2,548 citations


Proceedings ArticleDOI
25 Jun 2005
TL;DR: A meta-algorithm is applied, based on a metric labeling formulation of the rating-inference problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels.
Abstract: We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star". We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.
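
To make the meta-algorithm concrete, the sketch below approximately minimizes a metric-labeling objective with an iterated-conditional-modes pass: each item's label trades off the base classifier's preference against agreement with similar items. The preference matrix, similarity matrix, trade-off weight, and |l - l'| label metric are hypothetical stand-ins, not the paper's learned components.

```python
import numpy as np

def metric_label_smooth(pref, sim, lam=0.5, n_iters=10):
    """Approximate metric labeling by iterated conditional modes.

    pref[i, l]: cost of giving item i label l (e.g. negative classifier score).
    sim[i, j]:  similarity between items i and j (zero diagonal assumed).
    lam:        trade-off between classifier preference and label smoothness.
    Labels are integers 0..L-1 and |l - l'| serves as the label metric."""
    n, L = pref.shape
    labels = pref.argmin(axis=1)        # start from the classifier's choice
    for _ in range(n_iters):
        for i in range(n):
            smooth = np.array([np.sum(sim[i] * np.abs(l - labels)) for l in range(L)])
            labels[i] = np.argmin(pref[i] + lam * smooth)
    return labels

# Toy example: items 0 and 1 clearly deserve label 0, item 2 label 2, and
# item 3 is ambiguous but similar to items 0 and 1.
pref = np.array([[0.1, 1.0, 2.0],
                 [0.2, 1.0, 2.0],
                 [2.0, 1.0, 0.1],
                 [1.0, 0.9, 0.8]])
sim = np.array([[0, 1, 0, 1],
                [1, 0, 0, 1],
                [0, 0, 0, 0],
                [1, 1, 0, 0]], dtype=float)
print("classifier alone:", pref.argmin(axis=1))             # [0 0 2 2]
print("after smoothing :", metric_label_smooth(pref, sim))  # item 3 moves to 0
```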

2,544 citations


Proceedings ArticleDOI
07 Nov 2005
TL;DR: This paper analyzes the online behavior of more than 4,000 Carnegie Mellon University students who have joined a popular social networking site catered to colleges and evaluates the amount of information they disclose and study their usage of the site's privacy settings.
Abstract: Participation in social networking sites has dramatically increased in recent years. Services such as Friendster, Tribe, or the Facebook allow millions of individuals to create online profiles and share personal information with vast networks of friends - and, often, unknown numbers of strangers. In this paper we study patterns of information revelation in online social networks and their privacy implications. We analyze the online behavior of more than 4,000 Carnegie Mellon University students who have joined a popular social networking site catered to colleges. We evaluate the amount of information they disclose and study their usage of the site's privacy settings. We highlight potential attacks on various aspects of their privacy, and we show that only a minimal percentage of users changes the highly permeable privacy preferences.

2,405 citations


Journal ArticleDOI
TL;DR: This review highlights consistent patterns in the literature associating positive affect (PA) and physical health and raises serious conceptual and methodological reservations, but suggests an association of trait PA and lower morbidity and of state and trait PA and decreased symptoms and pain.
Abstract: This review highlights consistent patterns in the literature associating positive affect (PA) and physical health. However, it also raises serious conceptual and methodological reservations. Evidence suggests an association of trait PA and lower morbidity and of state and trait PA and decreased symptoms and pain. Trait PA is also associated with increased longevity among older community-dwelling individuals. The literature on PA and surviving serious illness is inconsistent. Experimentally inducing intense bouts of activated state PA triggers short-term rises in physiological arousal and associated (potentially harmful) effects on immune, cardiovascular, and pulmonary function. However, arousing effects of state PA are not generally found in naturalistic ambulatory studies in which bouts of PA are typically less intense and often associated with health protective responses. A theoretical framework to guide further study is proposed.

1,890 citations


Journal ArticleDOI
TL;DR: In this article, the authors present an evolutionary model for starbursts, quasars, and spheroidal galaxies, in which mergers between gas-rich galaxies drive nuclear inflows of gas, producing intense starbursts and feeding the buried growth of supermassive black holes (BHs) until feedback expels gas and renders a briefly visible optical quasar.
Abstract: We present an evolutionary model for starbursts, quasars, and spheroidal galaxies in which mergers between gas-rich galaxies drive nuclear inflows of gas, producing intense starbursts and feeding the buried growth of supermassive black holes (BHs) until feedback expels gas and renders a briefly visible optical quasar. The quasar lifetime and obscuring column density depend on both the instantaneous and peak luminosity of the quasar, and we determine this dependence using a large set of simulations of galaxy mergers varying host galaxy properties, orbital geometry, and gas physics. We use these fits to deconvolve observed quasar luminosity functions (LFs) and obtain the evolution of the formation rate of quasars with a certain peak luminosity, n(L_peak,z). Quasars spend extended periods of time at luminosities well below peak, and so n(L_peak) has a maximum corresponding to the 'break' in the observed LF, falling off at both brighter and fainter luminosities. From n(L_peak) and our simulation results, we obtain self-consistent fits to hard and soft X-ray and optical quasar LFs and predict many observables, including: column density distributions of optical and X-ray samples, the LF of broad-line quasars in X-ray samples and the broad-line fraction as a function of luminosity, active BH mass functions, the distribution of Eddington ratios at z~0-2, the z=0 mass function of relic BHs and total mass density of BHs, and the cosmic X-ray background. In every case, our predictions agree well with observed estimates, and unlike previous modeling attempts, we are able to reproduce them without invoking any ad hoc assumptions about source properties or distributions. We provide a library of Monte Carlo realizations of our models for comparison with observations. (Abridged)

1,820 citations


ReportDOI
TL;DR: Under this framework, it becomes clear why and where the “usual” volatility estimator fails when the returns are sampled at the highest frequencies, and the authors provide a way of finding the optimal sampling frequency for any size of the noise.
Abstract: It is a common financial practice to estimate volatility from the sum of frequently-sampled squared returns. However, market microstructure poses a challenge to this estimation approach, as evidenced by recent empirical studies in finance. This work attempts to lay out theoretical grounds that reconcile continuous-time modeling and discrete-time samples. We propose an estimation approach that takes advantage of the rich sources in tick-by-tick data while preserving the continuous-time assumption on the underlying returns. Under our framework, it becomes clear why and where the "usual" volatility estimator fails when the returns are sampled at the highest frequency.
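
A small simulation makes the failure mode concrete: with additive microstructure noise, the sum of squared returns is biased upward by roughly twice the noise variance per sampled return, so the bias grows as sampling gets finer. All parameter values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# One trading day of "efficient" log prices observed every second,
# contaminated by additive microstructure noise (all values hypothetical).
n = 23400                        # seconds in a 6.5-hour session
sigma = 0.2 / np.sqrt(252)       # daily volatility of the efficient price
true_iv = sigma ** 2             # integrated variance to be estimated
efficient = np.cumsum(rng.normal(0.0, sigma / np.sqrt(n), n))
observed = efficient + rng.normal(0.0, 0.0005, n)   # bid-ask / recording noise

def realized_variance(logp, every):
    """Sum of squared log returns, sampling every `every`-th observation."""
    r = np.diff(logp[::every])
    return float(np.sum(r ** 2))

print(f"true integrated variance : {true_iv:.6f}")
for every, label in [(1, "1 sec"), (60, "1 min"), (300, "5 min")]:
    print(f"RV from {label:>5} sampling   : {realized_variance(observed, every):.6f}")
```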

1,724 citations


Proceedings Article
01 Jan 2005
TL;DR: TaintCheck, presented in this paper, performs dynamic taint analysis by rewriting binaries at run time; it reliably detects most types of exploits and produced no false positives for any of the many different programs that were tested.
Abstract: Software vulnerabilities have had a devastating effect on the Internet. Worms such as CodeRed and Slammer can compromise hundreds of thousands of hosts within hours or even minutes, and cause millions of dollars of damage [26, 43]. To successfully combat these fast automatic Internet attacks, we need fast automatic attack detection and filtering mechanisms. In this paper we propose dynamic taint analysis for automatic detection of overwrite attacks, which include most types of exploits. This approach does not need source code or special compilation for the monitored program, and hence works on commodity software. To demonstrate this idea, we have implemented TaintCheck, a mechanism that can perform dynamic taint analysis by performing binary rewriting at run time. We show that TaintCheck reliably detects most types of exploits. We found that TaintCheck produced no false positives for any of the many different programs that we tested. Further, we describe how TaintCheck could improve automatic signature generation in
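
The core idea of dynamic taint analysis can be shown on a toy register machine: values derived from untrusted input carry a taint bit, and using a tainted value as a control-transfer target raises an alert. The instruction format below is invented for illustration and is unrelated to TaintCheck's actual binary-rewriting implementation.

```python
def run_with_taint(program, untrusted_input):
    """Toy dynamic taint analysis over a tiny register machine.

    Instructions (a hypothetical format for illustration only):
      ("input", dst)          -> dst receives untrusted input (tainted)
      ("const", dst, value)   -> dst receives a constant (untainted)
      ("add", dst, a, b)      -> dst = a + b, tainted if either source is
      ("jump", target_reg)    -> control transfer to the value in target_reg
    """
    regs, taint = {}, {}
    for instr in program:
        op = instr[0]
        if op == "input":
            regs[instr[1]], taint[instr[1]] = untrusted_input, True
        elif op == "const":
            regs[instr[1]], taint[instr[1]] = instr[2], False
        elif op == "add":
            _, dst, a, b = instr
            regs[dst] = regs[a] + regs[b]
            taint[dst] = taint[a] or taint[b]   # propagate taint through data flow
        elif op == "jump":
            if taint[instr[1]]:
                raise RuntimeError(
                    f"ALERT: tainted value {regs[instr[1]]!r} used as jump target")
    return regs

# An "overwrite attack" in miniature: attacker-controlled data reaches the
# jump target, which the taint check flags before the transfer happens.
program = [("input", "r1"), ("const", "r2", 0x1000), ("add", "r3", "r1", "r2"),
           ("jump", "r3")]
try:
    run_with_taint(program, 0xdeadbeef)
except RuntimeError as e:
    print(e)
```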

Proceedings ArticleDOI
17 Oct 2005
TL;DR: An efficient spectral method for finding consistent correspondences between two sets of features, which recovers correct assignments by using the principal eigenvector of the pairwise-agreement matrix M and imposing the mapping constraints required by the overall correspondence mapping.
Abstract: We present an efficient spectral method for finding consistent correspondences between two sets of features. We build the adjacency matrix M of a graph whose nodes represent the potential correspondences and the weights on the links represent pairwise agreements between potential correspondences. Correct assignments are likely to establish links among each other and thus form a strongly connected cluster. Incorrect correspondences establish links with the other correspondences only accidentally, so they are unlikely to belong to strongly connected clusters. We recover the correct assignments based on how strongly they belong to the main cluster of M, by using the principal eigenvector of M and imposing the mapping constraints required by the overall correspondence mapping (one-to-one or one-to-many). The experimental evaluation shows that our method is robust to outliers, accurate in terms of matching rate, while being much faster than existing methods.
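
A minimal sketch of the spectral idea: take the principal eigenvector of the pairwise-agreement matrix M (here by power iteration) and greedily accept candidates in decreasing eigenvector order while enforcing one-to-one constraints. The toy affinity values are made up for illustration.

```python
import numpy as np

def spectral_match(M, candidates, n_iters=100):
    """Rank candidate correspondences by the principal eigenvector of the
    pairwise-agreement matrix M, then greedily enforce one-to-one mapping.

    M:          (n x n) symmetric, non-negative affinity between candidates
                (the diagonal may hold unary scores).
    candidates: list of (feature_in_set1, feature_in_set2) pairs, length n.
    """
    # Principal eigenvector by power iteration.
    x = np.ones(M.shape[0])
    for _ in range(n_iters):
        x = M @ x
        x /= np.linalg.norm(x)

    # Greedy discretization: accept candidates in decreasing eigenvector
    # order, skipping any that conflict with an already accepted match.
    accepted, used1, used2 = [], set(), set()
    for idx in np.argsort(-x):
        a, b = candidates[idx]
        if a not in used1 and b not in used2:
            accepted.append((a, b))
            used1.add(a)
            used2.add(b)
    return accepted

# Toy example: three candidate correspondences, two of which agree strongly
# while the third conflicts with the first (both claim feature p1).
M = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(spectral_match(M, [("p1", "q1"), ("p2", "q2"), ("p1", "q3")]))
```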

Journal ArticleDOI
TL;DR: This paper presents an online feature selection mechanism for evaluating multiple features while tracking and adjusting the set of features used to improve tracking performance, and notes susceptibility of the variance ratio feature selection method to distraction by spatially correlated background clutter.
Abstract: This paper presents an online feature selection mechanism for evaluating multiple features while tracking and adjusting the set of features used to improve tracking performance. Our hypothesis is that the features that best discriminate between object and background are also best for tracking the object. Given a set of seed features, we compute log likelihood ratios of class conditional sample densities from object and background to form a new set of candidate features tailored to the local object/background discrimination task. The two-class variance ratio is used to rank these new features according to how well they separate sample distributions of object and background pixels. This feature evaluation mechanism is embedded in a mean-shift tracking system that adaptively selects the top-ranked discriminative features for tracking. Examples are presented that demonstrate how this method adapts to changing appearances of both tracked object and scene background. We note susceptibility of the variance ratio feature selection method to distraction by spatially correlated background clutter and develop an additional approach that seeks to minimize the likelihood of distraction.
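
The feature-ranking step can be sketched as follows: build object and background histograms of a candidate feature, form the log likelihood ratio, and score the feature by the two-class variance ratio of that ratio. The bin count, smoothing floor, and synthetic samples are illustrative assumptions.

```python
import numpy as np

def variance_ratio(obj_vals, bg_vals, n_bins=32, eps=1e-3):
    """Score a candidate feature by the two-class variance ratio of its
    object/background log likelihood ratio (higher = more discriminative)."""
    lo = min(obj_vals.min(), bg_vals.min())
    hi = max(obj_vals.max(), bg_vals.max())
    bins = np.linspace(lo, hi, n_bins + 1)
    p_obj, _ = np.histogram(obj_vals, bins=bins, density=True)
    p_bg, _ = np.histogram(bg_vals, bins=bins, density=True)

    # The "tuned" feature: log likelihood ratio of object vs background,
    # with a small floor to avoid log(0).
    L = np.log((p_obj + eps) / (p_bg + eps))

    def variance_under(p):
        p = p / p.sum()
        return np.sum(p * L ** 2) - np.sum(p * L) ** 2

    between = variance_under(0.5 * (p_obj + p_bg))     # classes pooled
    within = variance_under(p_obj) + variance_under(p_bg)
    return between / (within + 1e-12)

rng = np.random.default_rng(0)
separable = variance_ratio(rng.normal(2.0, 0.5, 1000), rng.normal(-2.0, 0.5, 1000))
overlapping = variance_ratio(rng.normal(0.0, 1.0, 1000), rng.normal(0.1, 1.0, 1000))
print(f"variance ratio, separable feature:   {separable:.2f}")
print(f"variance ratio, overlapping feature: {overlapping:.2f}")
```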

Journal ArticleDOI
TL;DR: This paper presents attacks against routing in ad hoc networks, and the design and performance evaluation of a new secure on-demand ad hoc network routing protocol, called Ariadne, which prevents attackers or compromised nodes from tampering with uncompromised routes consisting of uncompromised nodes.
Abstract: An ad hoc network is a group of wireless mobile computers (or nodes), in which individual nodes cooperate by forwarding packets for each other to allow nodes to communicate beyond direct wireless transmission range. Prior research in ad hoc networking has generally studied the routing problem in a non-adversarial setting, assuming a trusted environment. In this paper, we present attacks against routing in ad hoc networks, and we present the design and performance evaluation of a new secure on-demand ad hoc network routing protocol, called Ariadne. Ariadne prevents attackers or compromised nodes from tampering with uncompromised routes consisting of uncompromised nodes, and also prevents many types of Denial-of-Service attacks. In addition, Ariadne is efficient, using only highly efficient symmetric cryptographic primitives.
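
As a rough illustration of per-hop authentication with symmetric primitives (the shared-key variant; the protocol also has TESLA- and signature-based variants), the sketch below has each forwarding node append a MAC over the accumulated route request, which the destination then verifies. The node names, keys, and message format are hypothetical.

```python
import hmac, hashlib

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

# Hypothetical pairwise keys that each intermediate node shares with the
# destination D.
keys_with_D = {"A": b"kA", "B": b"kB", "C": b"kC"}

def forward_route_request(request: bytes, path):
    """Each forwarding node appends its id and a MAC over everything so far,
    so a node cannot be silently inserted into or removed from the route."""
    accumulated, macs = request, []
    for node in path:
        accumulated += node.encode()
        macs.append(mac(keys_with_D[node], accumulated))
    return accumulated, macs

def destination_verifies(request: bytes, path, macs) -> bool:
    accumulated, ok = request, True
    for node, m in zip(path, macs):
        accumulated += node.encode()
        ok &= hmac.compare_digest(m, mac(keys_with_D[node], accumulated))
    return ok and len(macs) == len(path)

req = b"ROUTE_REQUEST:S->D:id=42"
acc, macs = forward_route_request(req, ["A", "B", "C"])
print(destination_verifies(req, ["A", "B", "C"], macs))   # True
print(destination_verifies(req, ["A", "C"], macs[:2]))    # False: node B removed
```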

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work treats object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics, and applies a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA).
Abstract: We seek to discover the object categories depicted in a set of unlabelled images. We achieve this using a model developed in the statistical text literature: probabilistic latent semantic analysis (pLSA). In text analysis, this is used to discover topics in a corpus using the bag-of-words document representation. Here we treat object categories as topics, so that an image containing instances of several categories is modeled as a mixture of topics. The model is applied to images by using a visual analogue of a word, formed by vector quantizing SIFT-like region descriptors. The topic discovery approach successfully translates to the visual domain: for a small set of objects, we show that both the object categories and their approximate spatial layout are found without supervision. Performance of this unsupervised method is compared to the supervised approach of Fergus et al. (2003) on a set of unseen images containing only one object per image. We also extend the bag-of-words vocabulary to include 'doublets' which encode spatially local co-occurring regions. It is demonstrated that this extended vocabulary gives a cleaner image segmentation. Finally, the classification and segmentation methods are applied to a set of images containing multiple objects per image. These results demonstrate that we can successfully build object class models from an unsupervised analysis of images.
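
A minimal EM implementation of pLSA on a document-word count matrix shows the topic-discovery step in isolation; in the image setting the "words" would be vector-quantized SIFT-like descriptors, which this sketch abstracts away. The toy corpus and topic count are made up.

```python
import numpy as np

def plsa(counts, n_topics, n_iters=100, seed=0):
    """Minimal EM for pLSA on a (n_docs x n_words) count matrix.

    Returns P(word|topic) and P(topic|doc)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics));  p_z_d /= p_z_d.sum(1, keepdims=True)

    for _ in range(n_iters):
        # E-step: P(z | d, w) for every doc/word pair, shape (docs, words, topics).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        p_z_dw = joint / (joint.sum(axis=2, keepdims=True) + 1e-12)
        # M-step: re-estimate the two conditional distributions.
        weighted = counts[:, :, None] * p_z_dw
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Two obvious "topics": words 0-2 co-occur, words 3-5 co-occur.
counts = np.array([[5, 4, 6, 0, 0, 0],
                   [6, 5, 5, 1, 0, 0],
                   [0, 0, 1, 5, 6, 4],
                   [0, 0, 0, 6, 5, 5]], dtype=float)
p_w_z, p_z_d = plsa(counts, n_topics=2)
print(np.round(p_z_d, 2))   # each document concentrates on one topic
```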

Book ChapterDOI
22 Jun 2005
TL;DR: In this paper, the authors describe the Monte Carlo method for the simulation of grain growth and recrystallization, noting that it is a small subset of the broader use of Monte Carlo methods, for which an excellent overview can be found in the book by Landau and Binder (2000).
Abstract: This chapter is aimed at describing the Monte Carlo method for the simulation of grain growth and recrystallization. The method has also been extended to phase transformations, and hybrid versions of the model (Monte Carlo coupled with a cellular automaton) can also accommodate diffusion. If reading this chapter inspires you to program your own version of the algorithm and try to solve some problems, then we will have succeeded! The method is simple to implement and it is fairly straightforward to apply variable material properties such as anisotropic grain boundary energy and mobility. There are, however, some important limitations of the method that must be kept in mind. These limitations include an inherent lattice anisotropy that manifests itself in various ways. For many purposes, however, if you pay attention to what has been found in previous work, the model is robust and highly efficient from a computational perspective. In many circumstances, it is best to use the model to gain insight into a physical system and then obtain a new theoretical understanding, in preference to interpreting the results as being directly representative of a particular material. Please also keep in mind that the “Monte Carlo Method” described herein is a small subset of the broader use of Monte Carlo methods, for which an excellent overview can be found in the book by Landau and Binder (2000).
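
A minimal 2-D Potts-model Monte Carlo for grain growth might look like the sketch below: each lattice site holds a grain orientation, a proposed switch to a neighbor's orientation is accepted if it does not raise the boundary energy (with Metropolis acceptance at finite temperature), and the structure coarsens. Lattice size, number of orientations, and step count are illustrative.

```python
import numpy as np

def potts_grain_growth(size=64, q=16, steps=200_000, kT=0.0, seed=0):
    """Minimal 2-D Potts Monte Carlo for grain growth on a periodic square
    lattice.  Each site holds an orientation 0..q-1; unlike nearest-neighbor
    pairs cost unit energy.  A site switches to a random neighbor's
    orientation if the energy does not increase (Metropolis rule at kT>0)."""
    rng = np.random.default_rng(seed)
    spins = rng.integers(0, q, size=(size, size))

    def neighbors(i, j):
        return [spins[(i - 1) % size, j], spins[(i + 1) % size, j],
                spins[i, (j - 1) % size], spins[i, (j + 1) % size]]

    for _ in range(steps):
        i, j = rng.integers(0, size, 2)
        nbrs = neighbors(i, j)
        new = nbrs[rng.integers(0, 4)]        # propose a neighbor's orientation
        dE = (sum(1 for s in nbrs if s != new)
              - sum(1 for s in nbrs if s != spins[i, j]))
        if dE <= 0 or (kT > 0 and rng.random() < np.exp(-dE / kT)):
            spins[i, j] = new
    return spins

def boundary_fraction(spins):
    """Fraction of nearest-neighbor pairs lying on a grain boundary."""
    horiz = spins != np.roll(spins, 1, axis=1)
    vert = spins != np.roll(spins, 1, axis=0)
    return (horiz.mean() + vert.mean()) / 2

before = np.random.default_rng(0).integers(0, 16, size=(64, 64))
print("boundary fraction, random start:", round(boundary_fraction(before), 3))
print("boundary fraction, after MC    :",
      round(boundary_fraction(potts_grain_growth()), 3))
```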

Proceedings Article
05 Dec 2005
TL;DR: The correlated topic model (CTM) is developed, where the topic proportions exhibit correlation via the logistic normal distribution, and a mean-field variational inference algorithm is derived for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial.
Abstract: Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than x-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [1]. We derive a mean-field variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. The CTM gives a better fit than LDA on a collection of OCRed articles from the journal Science. Furthermore, the CTM provides a natural way of visualizing and exploring this and other unstructured data sets.
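
The key modeling change can be illustrated directly: drawing topic proportions through a logistic normal with a full covariance matrix lets two topics co-occur, whereas Dirichlet-distributed proportions are always (weakly) negatively correlated. The covariance values below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                      # e.g. topics: genetics, disease, x-ray astronomy

# Logistic normal: draw eta ~ N(mu, Sigma), then map onto the simplex.
mu = np.zeros(K)
Sigma = np.array([[1.0, 0.8, -0.5],    # hypothetical covariance: topics 0 and 1
                  [0.8, 1.0, -0.5],    # tend to co-occur, and both are
                  [-0.5, -0.5, 1.0]])  # negatively related to topic 2
eta = rng.multivariate_normal(mu, Sigma, size=5000)
theta_ln = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)

# Dirichlet proportions for comparison: components of a Dirichlet are always
# (weakly) negatively correlated, so positive topic correlation is inexpressible.
theta_dir = rng.dirichlet(np.ones(K), size=5000)

corr = lambda t: np.corrcoef(t[:, 0], t[:, 1])[0, 1]
print("corr(topic 0, topic 1), logistic normal:", round(corr(theta_ln), 2))
print("corr(topic 0, topic 1), Dirichlet      :", round(corr(theta_dir), 2))
```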

Journal ArticleDOI
01 Jan 2005
TL;DR: This research suggests that consumers often lack enough information to make privacy-sensitive decisions and, even with sufficient information, are likely to trade off long-term privacy for short-term benefits.
Abstract: Traditional theory suggests consumers should be able to manage their privacy. Yet, empirical and theoretical research suggests that consumers often lack enough information to make privacy-sensitive decisions and, even with sufficient information, are likely to trade off long-term privacy for short-term benefits.

Proceedings ArticleDOI
13 Jun 2005
TL;DR: This work considers a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries, and modifies the privacy analysis to real-valued functions f and arbitrary row types, greatly improving the bounds on noise required for privacy.
Abstract: We consider a statistical database in which a trusted administrator introduces noise to the query responses with the goal of maintaining privacy of individual database entries. In such a database, a query consists of a pair (S, f) where S is a set of rows in the database and f is a function mapping database rows to {0, 1}. The true answer is Σ_{i∈S} f(d_i), and a noisy version is released as the response to the query. Results of Dinur, Dwork, and Nissim show that a strong form of privacy can be maintained using a surprisingly small amount of noise -- much less than the sampling error -- provided the total number of queries is sublinear in the number of database rows. We call this query and (slightly) noisy reply the SuLQ (Sub-Linear Queries) primitive. The assumption of sublinearity becomes reasonable as databases grow increasingly large. We extend this work in two ways. First, we modify the privacy analysis to real-valued functions f and arbitrary row types, as a consequence greatly improving the bounds on noise required for privacy. Second, we examine the computational power of the SuLQ primitive. We show that it is very powerful indeed, in that slightly noisy versions of the following computations can be carried out with very few invocations of the primitive: principal component analysis, k-means clustering, the Perceptron Algorithm, the ID3 algorithm, and (apparently!) all algorithms that operate in the statistical query learning model [11].
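
A sketch of the noisy-sum primitive itself: the true answer Σ_{i∈S} f(d_i) is perturbed before release, yet aggregate statistics remain usable. The Gaussian noise and its scale here are illustrative only; the distribution and magnitude required for the formal guarantee are those derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sulq_query(database, S, f, noise_sd):
    """Noisy-sum primitive: return sum_{i in S} f(d_i) plus noise.

    Gaussian noise with an illustrative scale is used here, standing in
    for the calibrated noise the privacy analysis actually requires."""
    true_answer = sum(f(database[i]) for i in S)
    return true_answer + rng.normal(0.0, noise_sd)

# Toy database of real-valued rows: estimate the mean attribute value
# through the primitive instead of reading rows directly.
db = rng.normal(50.0, 10.0, size=10_000)
S = range(len(db))
noisy_sum = sulq_query(db, S, lambda row: row, noise_sd=100.0)
noisy_count = sulq_query(db, S, lambda row: 1.0, noise_sd=100.0)
print("noisy mean estimate:", round(noisy_sum / noisy_count, 2))
```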

Journal ArticleDOI
TL;DR: A taxonomy of languages and environments designed to make programming more accessible to novice programmers of all ages, organized by their primary goal, either to teach programming or to use programming to empower their users.
Abstract: Since the early 1960s, researchers have built a number of programming languages and environments with the intention of making programming accessible to a larger number of people. This article presents a taxonomy of languages and environments designed to make programming more accessible to novice programmers of all ages. The systems are organized by their primary goal, either to teach programming or to use programming to empower their users, and then by each system's authors' approach to making learning to program easier for novice programmers. The article explains all categories in the taxonomy, provides a brief description of the systems in each category, and suggests some avenues for future work in novice programming environments and languages.

Proceedings ArticleDOI
08 May 2005
TL;DR: Polygraph, presented in this paper, is a signature generation system that successfully produces signatures matching polymorphic worms by using multiple disjoint content substrings, which typically correspond to protocol framing, return addresses, and, in some cases, poorly obfuscated code.
Abstract: It is widely believed that content-signature-based intrusion detection systems (IDS) are easily evaded by polymorphic worms, which vary their payload on every infection attempt. In this paper, we present Polygraph, a signature generation system that successfully produces signatures that match polymorphic worms. Polygraph generates signatures that consist of multiple disjoint content substrings. In doing so, Polygraph leverages our insight that for a real-world exploit to function properly, multiple invariant substrings must often be present in all variants of a payload; these substrings typically correspond to protocol framing, return addresses, and in some cases, poorly obfuscated code. We contribute a definition of the polymorphic signature generation problem; propose classes of signature suited for matching polymorphic worm payloads; and present algorithms for automatic generation of signatures in these classes. Our evaluation of these algorithms on a range of polymorphic worms demonstrates that Polygraph produces signatures for polymorphic worms that exhibit low false negatives and false positives.
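
The conjunction-signature idea can be sketched with a crude token extractor: keep the substrings that appear in every observed payload variant and match new traffic only if all of them are present. The enumeration below is a stand-in for Polygraph's token-extraction algorithms, and the worm variants are invented examples.

```python
def invariant_tokens(payloads, min_len=4):
    """Find substrings of the first payload that appear in every payload.

    A crude stand-in for Polygraph's token extraction: enumerate substrings
    of one sample, keep those present in all samples, and drop tokens that
    are contained in longer kept tokens."""
    base = payloads[0]
    found = set()
    for i in range(len(base)):
        for j in range(i + min_len, len(base) + 1):
            t = base[i:j]
            if all(t in p for p in payloads[1:]):
                found.add(t)
    # Keep only maximal tokens.
    return {t for t in found
            if not any(t != u and t in u for u in found)}

def conjunction_match(payload, tokens):
    """A conjunction signature fires only if every token is present."""
    return all(t in payload for t in tokens)

# Hypothetical polymorphic variants: random-looking filler, but the protocol
# framing and the return-address bytes are invariant.
variants = [b"GET /a HTTP/1.1\r\nxQ93jz\xde\xad\xbe\xef",
            b"GET /b HTTP/1.1\r\nK2mPle\xde\xad\xbe\xef",
            b"GET /c HTTP/1.1\r\nzz01aa\xde\xad\xbe\xef"]
sig = invariant_tokens(variants)
print(sorted(sig))
print(conjunction_match(b"GET /x HTTP/1.1\r\nrandom\xde\xad\xbe\xef", sig))   # True
print(conjunction_match(b"GET /legit-page HTTP/1.0\r\nHost: example.com", sig))  # False
```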

Journal ArticleDOI
TL;DR: The use of block copolymers instead of homopolymers as the matrix is shown to afford opportunities for controlling the spatial and orientational distribution of the nanoelements, which allows much more sophisticated tailoring of the overall properties of the composite material.
Abstract: Heterogeneous materials in which the characteristic length scale of the filler material is in the nanometer range (i.e., nanocomposites) are currently one of the fastest growing areas of materials research. Polymer nanocomposites have expanded beyond the original scope of polymer-nanocrystal dispersions for refractive-index tuning or clay-filled homopolymers primarily pursued for mechanical reinforcement, to include a wide range of applications. This article highlights recent research efforts in the field of structure formation in block copolymer-based nanocomposite materials, and points out opportunities for novel materials based on inclusion of different types of nanoparticles. The use of block copolymers instead of homopolymers as the matrix is shown to afford opportunities for controlling the spatial and orientational distribution of the nanoelements. This, in turn, allows much more sophisticated tailoring of the overall properties of the composite material.

Journal ArticleDOI
27 Jun 2005
TL;DR: SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms.
Abstract: Fast changing, increasingly complex, and diverse computing platforms pose central problems in scientific computing: How to achieve, with reasonable effort, portable optimal performance? We present SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL "intelligently" generates and explores algorithmic and implementation choices to find the best match to the computer's microarchitecture. The "intelligence" is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human tuned transform library code.

Journal ArticleDOI
TL;DR: In this paper, the authors present a timing attack against OpenSSL, demonstrate that timing attacks against network servers are practical and that security systems should therefore defend against them, and show that timing attacks apply to general software systems.

Journal ArticleDOI
TL;DR: One hundred and three estimates of the marginal damage costs of carbon dioxide emissions were gathered from 28 published studies and combined to form a probability density function as discussed by the authors, and the uncertainty is strongly right-skewed.

Journal ArticleDOI
TL;DR: Support is found for the role of joint problem solving with suppliers in facilitating the acquisition of competitive capabilities and for the embedded ties that firms form in networks and alliances.
Abstract: We build on previous research that explores the external acquisition of competitive capabilities through the embedded ties that firms form in networks and alliances. While information sharing and trust have been theorized to be key features of the interorganizational ties that facilitate the acquisition of competitive capabilities, we argue that these mechanisms provide an incomplete explanation because they do not fully address the partially tacit nature of the knowledge that underlies competitive capabilities. Joint problem-solving arrangements play a prominent role in capability acquisition by promoting the transfer of complex and difficult-to-codify knowledge. Drawing on a set of case studies and a survey of 234 job shop manufacturers we find support for the role of joint problem solving with suppliers in facilitating the acquisition of competitive capabilities. Copyright © 2005 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: In this article, the authors show that the optimal sampling frequency is finite and derive its closed-form expression, and demonstrate that modelling the noise and using all the data is a better solution, even if one misspecifies the noise distribution.
Abstract: In theory, the sum of squares of log returns sampled at high frequency estimates their variance. When market microstructure noise is present but unaccounted for, however, we show that the optimal sampling frequency is finite and derive its closed-form expression. But even with optimal sampling, using say five minute returns when transactions are recorded every second, a vast amount of data is discarded, in contradiction to basic statistical principles. We demonstrate that modelling the noise and using all the data is a better solution, even if one misspecifies the noise distribution. So the answer is: sample as often as possible.

Journal ArticleDOI
06 Oct 2005
TL;DR: This work advocates a complete refactoring of the functionality and proposes three key principles--network-level objectives, network-wide views, and direct control--that it believes should underlie a new architecture, called 4D, after the architecture's four planes: decision, dissemination, discovery, and data.
Abstract: Today's data networks are surprisingly fragile and difficult to manage. We argue that the root of these problems lies in the complexity of the control and management planes--the software and protocols coordinating network elements--and particularly the way the decision logic and the distributed-systems issues are inexorably intertwined. We advocate a complete refactoring of the functionality and propose three key principles--network-level objectives, network-wide views, and direct control--that we believe should underlie a new architecture. Following these principles, we identify an extreme design point that we call "4D," after the architecture's four planes: decision, dissemination, discovery, and data. The 4D architecture completely separates an AS's decision logic from protocols that govern the interaction among network elements. The AS-level objectives are specified in the decision plane, and enforced through direct configuration of the state that drives how the data plane forwards packets. In the 4D architecture, the routers and switches simply forward packets at the behest of the decision plane, and collect measurement data to aid the decision plane in controlling the network. Although 4D would involve substantial changes to today's control and management planes, the format of data packets does not need to change; this eases the deployment path for the 4D architecture, while still enabling substantial innovation in network control and management. We hope that exploring an extreme design point will help focus the attention of the research and industrial communities on this crucially important and intellectually challenging area.

Proceedings ArticleDOI
17 Oct 2005
TL;DR: This work shows that it can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes, and provides a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label.
Abstract: Many computer vision algorithms limit their performance by ignoring the underlying 3D geometric structure in the image. We show that we can estimate the coarse geometric properties of a scene by learning appearance-based models of geometric classes, even in cluttered natural scenes. Geometric classes describe the 3D orientation of an image region with respect to the camera. We provide a multiple-hypothesis framework for robustly estimating scene structure from a single image and obtaining confidences for each geometric label. These confidences can then be used to improve the performance of many other applications. We provide a thorough quantitative evaluation of our algorithm on a set of outdoor images and demonstrate its usefulness in two applications: object detection and automatic single-view reconstruction.

Proceedings ArticleDOI
08 May 2005
TL;DR: Experimental evaluation demonstrates that the malware-detection algorithm can detect variants of malware with a relatively low run-time overhead and the semantics-aware malware detection algorithm is resilient to common obfuscations used by hackers.
Abstract: A malware detector is a system that attempts to determine whether a program has malicious intent. In order to evade detection, malware writers (hackers) frequently use obfuscation to morph malware. Malware detectors that use a pattern-matching approach (such as commercial virus scanners) are susceptible to obfuscations used by hackers. The fundamental deficiency in the pattern-matching approach to malware detection is that it is purely syntactic and ignores the semantics of instructions. In this paper, we present a malware-detection algorithm that addresses this deficiency by incorporating instruction semantics to detect malicious program traits. Experimental evaluation demonstrates that our malware-detection algorithm can detect variants of malware with a relatively low run-time overhead. Moreover, our semantics-aware malware detection algorithm is resilient to common obfuscations used by hackers.