
Showing papers on "Latency (engineering)" published in 2015


Proceedings ArticleDOI
17 Aug 2015
TL;DR: TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals than earlier delay-based schemes such as Vegas.
Abstract: Datacenter transports aim to deliver low latency messaging together with high throughput. We show that simple packet delay, measured as round-trip times at hosts, is an effective congestion signal without the need for switch feedback. First, we show that advances in NIC hardware have made RTT measurement possible with microsecond accuracy, and that these RTTs are sufficient to estimate switch queueing. Then we describe how TIMELY can adjust transmission rates using RTT gradients to keep packet latency low while delivering high bandwidth. We implement our design in host software running over NICs with OS-bypass capabilities. We show using experiments with up to hundreds of machines on a Clos network topology that it provides excellent performance: turning on TIMELY for OS-bypass messaging over a fabric with PFC lowers 99th-percentile tail latency by 9X while maintaining near line-rate throughput. Our system also outperforms DCTCP running in an optimized kernel, reducing tail latency by 13X. To the best of our knowledge, TIMELY is the first delay-based congestion control protocol for use in the datacenter, and it achieves its results despite having an order of magnitude fewer RTT signals (due to NIC offload) than earlier delay-based schemes such as Vegas.
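
To make the RTT-gradient idea concrete, the following is a minimal Python sketch of gradient-based rate adaptation in the spirit of TIMELY. It is not the paper's implementation: the class, parameter names, thresholds, and constants (EWMA weight alpha, decrease factor beta, additive step delta, and the low/high RTT thresholds) are illustrative assumptions.

# Minimal sketch of RTT-gradient rate control in the spirit of TIMELY.
# Parameter names and values are illustrative assumptions, not the paper's.
class GradientRateControl:
    def __init__(self, rate_mbps=10_000.0, t_low_us=50, t_high_us=500,
                 alpha=0.875, beta=0.3, delta_mbps=10, min_rtt_us=20):
        self.rate = rate_mbps        # current sending rate
        self.t_low, self.t_high = t_low_us, t_high_us
        self.alpha = alpha           # EWMA weight for the RTT difference
        self.beta = beta             # multiplicative decrease factor
        self.delta = delta_mbps      # additive increase step
        self.min_rtt = min_rtt_us    # propagation RTT, used to normalize the gradient
        self.prev_rtt = None
        self.rtt_diff = 0.0          # smoothed per-update RTT change

    def on_rtt_sample(self, rtt_us):
        if self.prev_rtt is None:
            self.prev_rtt = rtt_us
            return self.rate
        new_diff = rtt_us - self.prev_rtt
        self.prev_rtt = rtt_us
        self.rtt_diff = (1 - self.alpha) * self.rtt_diff + self.alpha * new_diff
        gradient = self.rtt_diff / self.min_rtt
        if rtt_us < self.t_low:            # queues empty: grow
            self.rate += self.delta
        elif rtt_us > self.t_high:         # queues long: back off hard
            self.rate *= (1 - self.beta)
        elif gradient <= 0:                # queues draining: additive increase
            self.rate += self.delta
        else:                              # queues building: decrease with the gradient
            self.rate *= (1 - self.beta * min(gradient, 1.0))
        return self.rate

ctrl = GradientRateControl()
for rtt in (40, 45, 60, 180, 650, 90):
    print(f"rtt={rtt:>3} us -> rate={ctrl.on_rtt_sample(rtt):.0f} Mbps")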

442 citations


Proceedings ArticleDOI
17 Aug 2015
TL;DR: The Pingmesh system for large-scale data center network latency measurement and analysis is developed to answer the following question affirmatively: can we get network latency between any two servers at any time in large-scale data center networks?
Abstract: Can we get network latency between any two servers at any time in large-scale data center networks? The collected latency data can then be used to address a series of challenges: telling if an application perceived latency issue is caused by the network or not, defining and tracking network service level agreement (SLA), and automatic network troubleshooting. We have developed the Pingmesh system for large-scale data center network latency measurement and analysis to answer the above question affirmatively. Pingmesh has been running in Microsoft data centers for more than four years, and it collects tens of terabytes of latency data per day. Pingmesh is widely used by not only network software developers and engineers, but also application and service developers and operators.

336 citations


Journal ArticleDOI
TL;DR: It is shown that protein kinase C agonists in combination with bromodomain inhibitor JQ1 or histone deacetylase inhibitors robustly induce HIV-1 transcription and virus production when directly compared with maximum reactivation by T cell activation.
Abstract: Reversal of HIV-1 latency by small molecules is a potential cure strategy. This approach will likely require effective drug combinations to achieve high levels of latency reversal. Using resting CD4+ T cells (rCD4s) from infected individuals, we developed an experimental and theoretical framework to identify effective latency-reversing agent (LRA) combinations. Utilizing ex vivo assays for intracellular HIV-1 mRNA and virion production, we compared 2-drug combinations of leading candidate LRAs and identified multiple combinations that effectively reverse latency. We showed that protein kinase C agonists in combination with bromodomain inhibitor JQ1 or histone deacetylase inhibitors robustly induce HIV-1 transcription and virus production when directly compared with maximum reactivation by T cell activation. Using the Bliss independence model to quantitate combined drug effects, we demonstrated that these combinations synergize to induce HIV-1 transcription. This robust latency reversal occurred without release of proinflammatory cytokines by rCD4s. To extend the clinical utility of our findings, we applied a mathematical model that estimates in vivo changes in plasma HIV-1 RNA from ex vivo measurements of virus production. Our study reconciles diverse findings from previous studies, establishes a quantitative experimental approach to evaluate combinatorial LRA efficacy, and presents a model to predict in vivo responses to LRAs.
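
The Bliss independence calculation used above has a simple arithmetic core: if two drugs acting independently produce fractional effects f_A and f_B, the expected combined effect is f_A + f_B - f_A*f_B, and an observed effect above that expectation indicates synergy. A minimal sketch with made-up effect values (not the paper's data):

# Bliss independence: predicted combined effect of two independently acting drugs.
# Effect values here are fractions of a maximal response (e.g., relative to
# full T cell activation) and are made up purely for illustration.

def bliss_predicted(f_a, f_b):
    """Expected combined effect if drugs A and B act independently."""
    return f_a + f_b - f_a * f_b

f_pkc, f_jq1 = 0.30, 0.20          # hypothetical single-drug effects
observed_combo = 0.70              # hypothetical measured combination effect

expected = bliss_predicted(f_pkc, f_jq1)      # 0.30 + 0.20 - 0.06 = 0.44
excess = observed_combo - expected            # positive excess => synergy

print(f"Bliss expectation: {expected:.2f}, observed: {observed_combo:.2f}, "
      f"excess over Bliss: {excess:+.2f}")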

335 citations


Journal ArticleDOI
TL;DR: A framework for the joint optimization of the radio and computational resource usage exploiting the tradeoff between energy consumption and latency is provided and the minimization of the total consumed energy without latency constraints is analyzed.
Abstract: Providing femto access points (FAPs) with computational capabilities will allow (either total or partial) offloading of highly demanding applications from smartphones to the so-called femto-cloud. Such offloading promises to be beneficial in terms of battery savings at the mobile terminal (MT) and/or in latency reduction in the execution of applications. However, for this promise to become a reality, the energy and/or the time required for the communication process must be compensated by the energy and/or the time savings that result from the remote computation at the FAPs. For this problem, we provide in this paper a framework for the joint optimization of the radio and computational resource usage exploiting the tradeoff between energy consumption and latency. Multiple antennas are assumed to be available at the MT and the serving FAP. As a result of the optimization, the optimal communication strategy (e.g., transmission power, rate, and precoder) is obtained, as well as the optimal distribution of the computational load between the handset and the serving FAP. This paper also establishes the conditions under which total or no offloading is optimal, determines which is the minimum affordable latency in the execution of the application, and analyzes, as a particular case, the minimization of the total consumed energy without latency constraints.
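
At its coarsest, the offloading trade-off described here comes down to comparing the time and energy of local execution against transmission plus remote execution. The sketch below is a deliberately simplified, hypothetical model (fixed link rate and transmit power, single antenna, result download ignored); the paper instead jointly optimizes the precoder, rate, power, and the split of computational load.

# Coarse offloading decision: compare local execution vs. full offload.
# All numbers and the single-rate link model are simplifying assumptions;
# the paper jointly optimizes the radio and computational resources instead.

def local_cost(cycles, cpu_hz, joules_per_cycle):
    time_s = cycles / cpu_hz
    energy_j = cycles * joules_per_cycle
    return time_s, energy_j

def offload_cost(input_bits, rate_bps, tx_power_w, remote_hz, cycles):
    tx_time = input_bits / rate_bps
    time_s = tx_time + cycles / remote_hz      # ignore the (small) result download
    energy_j = tx_power_w * tx_time            # MT only spends energy transmitting
    return time_s, energy_j

t_loc, e_loc = local_cost(cycles=2e9, cpu_hz=1e9, joules_per_cycle=1e-9)
t_off, e_off = offload_cost(input_bits=8e6, rate_bps=20e6,
                            tx_power_w=0.5, remote_hz=8e9, cycles=2e9)

print(f"local  : {t_loc:.2f} s, {e_loc:.2f} J")
print(f"offload: {t_off:.2f} s, {e_off:.2f} J")
# Offloading wins on both axes here only because the assumed link is fast
# relative to the input size; a slower link flips the energy comparison.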

330 citations


Journal ArticleDOI
TL;DR: P3 onset latency is shorter when stopping is successful, is highly correlated with SSRT, and coincides with the purported timing of the inhibition process (towards the end of SSRT).
Abstract: The frontocentral P3 event-related potential has been proposed as a neural marker of response inhibition. However, this association is disputed: some argue that P3 latency is too late relative to the timing of action stopping (stop-signal reaction time; SSRT) to index response inhibition. We tested whether P3 onset latency is a marker of response inhibition, and whether it coincides with the timing predicted by neurocomputational models. We measured EEG in 62 participants during the stop-signal task, and used independent component analysis and permutation statistics to measure the P3 onset in each participant. We show that P3 onset latency is shorter when stopping is successful, that it is highly correlated with SSRT, and that it coincides with the purported timing of the inhibition process (towards the end of SSRT). These results demonstrate the utility of P3 onset latency as a noninvasive, temporally precise neural marker of the response inhibition process.

205 citations


Journal ArticleDOI
26 Feb 2015-Cell
TL;DR: By synthetically decoupling viral dependence on the cellular environment from viral transcription, the authors show that Tat feedback is sufficient to regulate latency independent of cellular activation, demonstrating that a largely autonomous, viral-encoded program underlies HIV latency.

200 citations


Proceedings Article
04 May 2015
TL;DR: It is shown that QJUMP achieves bounded latency and reduces in-network interference by up to 300×, outperforming Ethernet Flow Control (802.3x), ECN (WRED), and DCTCP, and that it improves average flow completion times, performing close to or better than DCTCP and pFabric.
Abstract: QJUMP is a simple and immediately deployable approach to controlling network interference in datacenter networks. Network interference occurs when congestion from throughput-intensive applications causes queueing that delays traffic from latency-sensitive applications. To mitigate network interference, QJUMP applies Internet QoS-inspired techniques to datacenter applications. Each application is assigned to a latency sensitivity level (or class). Packets from higher levels are rate-limited in the end host, but once allowed into the network can "jump-the-queue" over packets from lower levels. In settings with known node counts and link speeds, QJUMP can support service levels ranging from strictly bounded latency (but with low rate) through to line-rate throughput (but with high latency variance). We have implemented QJUMP as a Linux Traffic Control module. We show that QJUMP achieves bounded latency and reduces in-network interference by up to 300×, outperforming Ethernet Flow Control (802.3x), ECN (WRED) and DCTCP. We also show that QJUMP improves average flow completion times, performing close to or better than DCTCP and pFabric.
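
The "strictly bounded latency at low rate" end of QJUMP's spectrum can be illustrated with a back-of-the-envelope bound. Under the simplifying assumptions below (each of n hosts injects at most one maximum-size packet per epoch, worst-case fan-in to a single link, per-hop overheads folded into one constant), the bound is just n serialization times plus a constant; this sketch illustrates the idea and is not the paper's exact formula.

# Back-of-the-envelope latency bound in the spirit of QJUMP's highest class.
# Assumption (not the paper's exact formula): if each of n hosts may inject at
# most one packet of p bytes per "epoch" and the worst case is all of them
# fanning in to one r-bit/s link, a packet waits behind at most n-1 others,
# plus a fixed per-hop overhead term.

def epoch_bound_us(n_hosts, packet_bytes, link_bps, fixed_overhead_us=10.0):
    serialization_us = 8 * packet_bytes / link_bps * 1e6
    return n_hosts * serialization_us + fixed_overhead_us

# Rate limiting the top class to one packet per epoch trades throughput for a
# hard latency bound; lower classes get higher rates but only statistical latency.
bound = epoch_bound_us(n_hosts=1000, packet_bytes=1500, link_bps=10e9)
print(f"worst-case epoch bound ~ {bound:.0f} us")   # ~1210 us for these assumptions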

176 citations


Proceedings Article
04 May 2015
TL;DR: The design and implementation of an adaptive replica selection mechanism, C3, that is robust to performance variability in the environment is presented and results show that C3 significantly improves the latencies along the mean, median, and tail and provides higher system throughput.
Abstract: Achieving predictable performance is critical for many distributed applications, yet difficult to achieve due to many factors that skew the tail of the latency distribution even in well-provisioned systems. In this paper, we present the fundamental challenges involved in designing a replica selection scheme that is robust in the face of performance fluctuations across servers. We illustrate these challenges through performance evaluations of the Cassandra distributed database on Amazon EC2. We then present the design and implementation of an adaptive replica selection mechanism, C3, that is robust to performance variability in the environment. We demonstrate C3's effectiveness in reducing the latency tail and improving throughput through extensive evaluations on Amazon EC2 and through simulations. Our results show that C3 significantly improves the latencies along the mean, median, and tail (up to 3 times improvement at the 99.9th percentile) and provides higher system throughput.
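
The heart of an adaptive replica selection scheme like C3 is a per-replica score that blends a smoothed response-time estimate with a concurrency-compensated queue estimate and penalizes queueing super-linearly (C3 uses a cubic term). The ranking function below is an illustrative sketch with invented field names and constants, not the system's implementation.

# Illustrative replica ranking in the spirit of C3: combine an EWMA of observed
# response time with a concurrency-compensated queue estimate, and penalize
# queueing cubically so heavily loaded replicas fall sharply in the ranking.
# Field names and the exact scoring expression are assumptions for illustration.

def replica_score(ewma_response_ms, ewma_service_ms, server_queue,
                  outstanding_from_me, n_clients):
    # Compensate for the fact that many clients besides us keep requests
    # outstanding at the same replica.
    q_hat = 1 + outstanding_from_me * n_clients + server_queue
    return (ewma_response_ms - ewma_service_ms) + (q_hat ** 3) * ewma_service_ms

replicas = {
    "r1": replica_score(4.0, 1.0, server_queue=2, outstanding_from_me=1, n_clients=10),
    "r2": replica_score(6.0, 1.5, server_queue=0, outstanding_from_me=0, n_clients=10),
}
best = min(replicas, key=replicas.get)   # lowest score wins
print(best, replicas)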

169 citations


Journal ArticleDOI
19 Nov 2015
TL;DR: This work analyzes how task replication reduces latency, and proposes a heuristic algorithm to search for the best replication strategies when it is difficult to model the empirical behavior of task execution time and use the proposed analysis techniques.
Abstract: In cloud computing jobs consisting of many tasks run in parallel, the tasks on the slowest machines (straggling tasks) become the bottleneck in the completion of the job. One way to combat the variability in machine response time is to add replicas of straggling tasks and wait for the earliest copy to finish. Using the theory of extreme order statistics, we analyze how task replication reduces latency, and its impact on the cost of computing resources. We also propose a heuristic algorithm to search for the best replication strategies when it is difficult to model the empirical behavior of task execution time and use the proposed analysis techniques. Evaluation of the heuristic policies on Google Trace data shows a significant latency reduction compared to the replication strategy used in MapReduce.
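
The order-statistics effect is easiest to see under an exponential assumption (chosen here purely for convenience; the paper handles general and empirically measured execution-time distributions): the earliest of r replicas finishes in expected time 1/(r·μ), and if the losing copies are canceled at that moment, the expected machine time stays at 1/μ. A small Monte Carlo check:

# Monte Carlo check of the replication trade-off for exponential task times.
# With r replicas of a rate-mu exponential task and earliest-copy-wins,
# E[latency] = 1/(r*mu).  If the losing copies are canceled the moment the
# first finishes, the machine time spent is r*min, whose expectation stays
# at 1/mu for exponential service, so replication here cuts latency for free.
import random

def simulate(r, mu=1.0, trials=200_000, seed=0):
    rng = random.Random(seed)
    latency = cost = 0.0
    for _ in range(trials):
        first = min(rng.expovariate(mu) for _ in range(r))
        latency += first
        cost += r * first      # machine time if losers are canceled at first finish
    return latency / trials, cost / trials

for r in (1, 2, 4):
    lat, cost = simulate(r)
    print(f"r={r}: mean latency ~ {lat:.3f} (theory {1/r:.3f}), "
          f"mean machine time ~ {cost:.3f} (theory 1.000)")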

144 citations


Journal ArticleDOI
TL;DR: The model can be applied to describe the within-host dynamics of HBV, HIV, or HTLV-1 infection and it is found that the global stability of the chronic infection equilibrium might change in some special cases when the assumptions do not hold.
Abstract: A within-host viral infection model with both virus-to-cell and cell-to-cell transmissions and three distributed delays is investigated, in which the first distributed delay describes the intracellular latency for the virus-to-cell infection, the second delay represents the intracellular latency for the cell-to-cell infection, and the third delay describes the time period that viruses penetrated into cells and infected cells release new virions. The global stability analysis of the model is carried out in terms of the basic reproduction number R0. If R0≤1, the infection-free (semi-trivial) equilibrium is the unique equilibrium and is globally stable; if R0>1, the chronic infection (positive) equilibrium exists and is globally stable under certain assumptions. Examples and numerical simulations for several special cases are presented, including various within-host dynamics models with discrete or distributed delays that have been well-studied in the literature. It is found that the global stability of the chronic infection equilibrium might change in some special cases when the assumptions do not hold. The results show that the model can be applied to describe the within-host dynamics of HBV, HIV, or HTLV-1 infection.
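
For orientation, a standard undelayed version of this model class already shows where the two transmission routes enter R0 (the paper's model replaces these instantaneous terms with three distributed-delay kernels; the system below is a simplified sketch, not the paper's exact equations):

\dot{T} = \lambda - d\,T - \beta_1 T V - \beta_2 T I, \qquad
\dot{I} = \beta_1 T V + \beta_2 T I - a\,I, \qquad
\dot{V} = k\,I - u\,V,

R_0 = \frac{\beta_1 \lambda k}{d\,a\,u} + \frac{\beta_2 \lambda}{d\,a},

where T, I, and V are uninfected target cells, infected cells, and free virus; the first term of R_0 comes from virus-to-cell transmission and the second from cell-to-cell transmission. The infection-free equilibrium is stable when R_0 ≤ 1 and a chronic (positive) equilibrium appears when R_0 > 1, mirroring the threshold behavior stated in the abstract.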

137 citations


Proceedings ArticleDOI
01 Jun 2015
TL;DR: A model is introduced for estimating the latency of a data flow when the degrees of parallelism of its tasks are changed, and it is described how the model can be used to enforce latency guarantees by determining appropriate scaling actions at runtime.
Abstract: Many Big Data applications in science and industry have arisen, that require large amounts of streamed or event data to be analyzed with low latency. This paper presents a reactive strategy to enforce latency guarantees in data flows running on scalable Stream Processing Engines (SPEs), while minimizing resource consumption. We introduce a model for estimating the latency of a data flow, when the degrees of parallelism of the tasks within are changed. We describe how to continuously measure the necessary performance metrics for the model, and how it can be used to enforce latency guarantees, by determining appropriate scaling actions at runtime. Therefore, it leverages the elasticity inherent to common cloud technology and cluster resource management systems. We have implemented our strategy as part of the Nephele SPE. To showcase the effectiveness of our approach, we provide an experimental evaluation on a large commodity cluster, using both a synthetic workload as well as an application performing real-time sentiment analysis on real-world social media data.
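
A toy version of such a latency model: approximate each task instance as an M/M/1 queue, so a task with degree of parallelism p and total arrival rate λ contributes roughly 1/(μ − λ/p) to the end-to-end latency, and pick the smallest parallelism per task that keeps the total under the target. This is a hedged illustration only; the paper builds its model from measured queue-wait and service-time statistics rather than from this formula, and rebalances across the whole flow.

# Hedged sketch: estimate end-to-end data-flow latency as a function of the
# degree of parallelism per task, then pick the cheapest parallelism vector
# that meets a latency target.  Each task instance is approximated as an
# M/M/1 queue purely for illustration.

def stage_latency_ms(arrival_per_s, parallelism, service_ms):
    mu = 1000.0 / service_ms                 # per-instance service rate (1/s)
    lam = arrival_per_s / parallelism        # per-instance arrival rate
    if lam >= mu:
        return float("inf")                  # unstable: latency unbounded
    return 1000.0 / (mu - lam)               # M/M/1 sojourn time in ms

def cheapest_scaling(stages, arrival_per_s, target_ms, max_parallel=64):
    plan = []
    for service_ms in stages:
        # Greedily give each stage the smallest parallelism that keeps its
        # share of the budget; a real controller rebalances across stages.
        budget = target_ms / len(stages)
        p = 1
        while p <= max_parallel and stage_latency_ms(arrival_per_s, p, service_ms) > budget:
            p += 1
        plan.append(p)
    total = sum(stage_latency_ms(arrival_per_s, p, s) for p, s in zip(plan, stages))
    return plan, total

plan, total = cheapest_scaling(stages=[2.0, 5.0, 1.0], arrival_per_s=800, target_ms=60)
print(plan, f"{total:.1f} ms")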

Proceedings ArticleDOI
14 Mar 2015
TL;DR: Few-to-Many (FM) incremental parallelization is introduced, which dynamically increases parallelism to reduce tail latency; it improves the 99th-percentile response time by up to 32% in Lucene and 26% in Bing compared to prior state-of-the-art parallelization, and improves tail latency by a factor of two compared to running requests sequentially.
Abstract: Interactive services, such as Web search, recommendations, games, and finance, must respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ percentile) latency. Although every server is multicore, parallelizing individual requests to reduce tail latency is challenging because (1) service demand is unknown when requests arrive; (2) blindly parallelizing all requests quickly oversubscribes hardware resources; and (3) parallelizing the numerous short requests will not improve tail latency. This paper introduces Few-to-Many (FM) incremental parallelization, which dynamically increases parallelism to reduce tail latency. FM uses request service demand profiles and hardware parallelism in an offline phase to compute a policy, represented as an interval table, which specifies when and how much software parallelism to add. At runtime, FM adds parallelism as specified by the interval table indexed by dynamic system load and request execution time progress. The longer a request executes, the more parallelism FM adds. We evaluate FM in Lucene, an open-source enterprise search engine, and in Bing, a commercial Web search engine. FM improves the 99th percentile response time up to 32% in Lucene and up to 26% in Bing, compared to prior state-of-the-art parallelization. Compared to running requests sequentially in Bing, FM improves tail latency by a factor of two. These results illustrate that incremental parallelism is a powerful tool for reducing tail latency.
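
The interval-table mechanism can be sketched as a lookup keyed by the current system load and by how long a request has already been executing; all thresholds and parallelism degrees below are invented for illustration, whereas FM computes them offline from measured service-demand profiles.

# Sketch of an interval-table policy in the spirit of FM parallelization:
# given system load and how long a request has been running, look up how much
# software parallelism it should have by now.
import bisect

# interval_table[load_level] = list of (elapsed_ms_threshold, target_parallelism)
INTERVAL_TABLE = {
    "low":  [(0, 1), (5, 2), (20, 4), (80, 8)],
    "high": [(0, 1), (20, 2), (120, 4)],       # add parallelism later under load
}

def target_parallelism(load_level, elapsed_ms):
    table = INTERVAL_TABLE[load_level]
    thresholds = [t for t, _ in table]
    idx = bisect.bisect_right(thresholds, elapsed_ms) - 1
    return table[idx][1]

# A short request never gets extra threads; a long-running one ramps up.
for elapsed in (2, 30, 150):
    print(elapsed, target_parallelism("low", elapsed), target_parallelism("high", elapsed))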

Journal ArticleDOI
26 Feb 2015-Cell
TL;DR: It is proposed that latency is an evolutionary "bet-hedging" strategy whose frequency has been optimized to maximize lentiviral transmission by reducing viral extinction during mucosal infections.

Proceedings ArticleDOI
24 Aug 2015
TL;DR: Hermes, a novel fully polynomial-time approximation scheme (FPTAS), is proposed to minimize latency while meeting prescribed resource utilization constraints.
Abstract: With mobile devices increasingly able to connect to cloud servers from anywhere, resource-constrained devices can potentially perform offloading of computational tasks to either improve resource usage or improve performance. It is of interest to find optimal assignments of tasks to local and remote devices that can take into account the application-specific profile, availability of computational resources, and link connectivity, and find a balance between energy consumption costs of mobile devices and latency for delay-sensitive applications. Given an application described by a task dependency graph, we formulate an optimization problem to minimize the latency while meeting prescribed resource utilization constraints. Different from most existing works, which either rely on an integer linear programming formulation, which is NP-hard and not applicable to general task dependency graphs for latency metrics, or on intuitively derived heuristics that offer no theoretical performance guarantees, we propose Hermes, a novel fully polynomial time approximation scheme (FPTAS), to solve this problem. Hermes provides a solution with latency no more than (1 + ε) times the minimum while incurring complexity that is polynomial in the problem size and 1/ε. We evaluate the performance using real data sets collected from several benchmarks, and show that Hermes improves the latency by 16% (36% for larger-scale applications) compared to a previously published heuristic and increases CPU computing time by only 0.4% of overall latency.

Proceedings Article
04 May 2015
TL;DR: CosTLO is designed to satisfy any application's goals for latency variance by estimating the latency variance offered by any particular configuration, efficiently searching through the configuration space to select a cost-effective configuration among the ones that can offer the desired latency variance.
Abstract: We present CosTLO, a system that reduces the high latency variance associated with cloud storage services by augmenting GET/PUT requests issued by end-hosts with redundant requests, so that the earliest response can be considered. To reduce the cost overhead imposed by redundancy, unlike prior efforts that have used this approach, CosTLO combines the use of multiple forms of redundancy. Since this results in a large number of configurations in which CosTLO can issue redundant requests, we conduct a comprehensive measurement study on S3 and Azure to identify the configurations that are viable in practice. Informed by this study, we design CosTLO to satisfy any application's goals for latency variance by 1) estimating the latency variance offered by any particular configuration, 2) efficiently searching through the configuration space to select a cost-effective configuration among the ones that can offer the desired latency variance, and 3) preserving data consistency despite CosTLO's use of redundant requests. We show that, for the median PlanetLab node, CosTLO can halve the latency variance associated with fetching content from Amazon S3, with only a 25% increase in cost.
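
The underlying "issue redundant requests, keep the earliest reply" pattern is easy to sketch; the snippet below is a generic illustration with a placeholder fetch() standing in for an S3/Azure GET, not CosTLO itself (which combines several forms of redundancy and also preserves consistency for PUTs).

# Generic "issue redundant requests, keep the earliest response" pattern.
# fetch() is a placeholder; in a real deployment it would be a cloud-storage
# GET, possibly sent to different front-ends or copies of the object.
import concurrent.futures, random, time

def fetch(replica_id, key):
    time.sleep(random.uniform(0.01, 0.2))      # stand-in for variable service time
    return replica_id, f"value-of-{key}"

def redundant_get(key, n_copies=2):
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_copies) as pool:
        futures = [pool.submit(fetch, i, key) for i in range(n_copies)]
        done, not_done = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        for f in not_done:
            f.cancel()                         # best effort; the cost was already incurred
        return next(iter(done)).result()

print(redundant_get("object-42"))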

Book ChapterDOI
19 Mar 2015
TL;DR: Data center network operators have to continually monitor path latency to quickly detect and re-route traffic away from high-delay path segments; existing techniques either actively send probes from end-hosts, which can only measure end-to-end latencies, or passively capture and aggregate traffic on network devices, which requires hardware modifications.
Abstract: Data center network operators have to continually monitor path latency to quickly detect and re-route traffic away from high-delay path segments. Existing latency monitoring techniques in data centers rely on either (1) actively sending probes from end-hosts, which is restricted in some cases and can only measure end-to-end latencies, or (2) passively capturing and aggregating traffic on network devices, which requires hardware modifications.

Proceedings ArticleDOI
18 Apr 2015
TL;DR: The first experiment extends previous efforts to measure latency perception by reporting on a unified study in which direct and indirect form-factors are compared for both tapping and dragging tasks, showing significant effects from both form-factor and task.
Abstract: This paper reports on two experiments designed to further our understanding of users' perception of latency in touch-based systems. The first experiment extends previous efforts to measure latency perception by reporting on a unified study in which direct and indirect form-factors are compared for both tapping and dragging tasks. Our results show significant effects from both form-factor and task, and inform system designers as to what input latencies they should aim to achieve in a variety of system types. A follow-up experiment investigates people's ability to perceive small improvements to latency in direct and indirect form-factors for tapping and dragging tasks. Our results provide guidance to system designers on the relative value of making improvements in latency that reduce but do not fully eliminate lag from their systems.

Proceedings Article
Changhyun Lee1, Chunjong Park1, Keon Jang2, Sue Moon1, Dongsu Han1 
08 Jul 2015
TL;DR: It is demonstrated that latency-based implicit feedback is accurate enough to signal a single packet's queuing delay in 10 Gbps networks, and the latency feedback can be used to perform practical and fine-grained congestion control in high-speed datacenter networks.
Abstract: The nature of congestion feedback largely governs the behavior of congestion control. In datacenter networks, where RTTs are in hundreds of microseconds, accurate feedback is crucial to achieve both high utilization and low queueing delay. Proposals for datacenter congestion control predominantly leverage ECN or even explicit in-network feedback (e.g., RCP-type feedback) to minimize the queuing delay. In this work we explore latency-based feedback as an alternative and show its advantages over ECN. Against the common belief that such implicit feedback is noisy and inaccurate, we demonstrate that latency-based implicit feedback is accurate enough to signal a single packet's queuing delay in 10 Gbps networks. DX enables accurate queuing delay measurements whose error falls within 1.98 and 0.53 microseconds using software-based and hardware-based latency measurements, respectively. This enables us to design a new congestion control algorithm that performs fine-grained control to adjust the congestion window just enough to achieve very low queuing delay while attaining full utilization. Our extensive evaluation shows that 1) the latency measurement accurately reflects the one-way queuing delay at the single-packet level; 2) the latency feedback can be used to perform practical and fine-grained congestion control in high-speed datacenter networks; and 3) DX outperforms DCTCP with 5.33× smaller median queueing delay at 1 Gbps and 1.57× at 10 Gbps.
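
As a generic illustration of how measured queuing delay can drive the congestion window (a hedged sketch of the general idea, not DX's actual update rule or its measurement machinery): grow additively while the measured queue is empty, and back off in proportion to the fraction of the RTT spent queueing.

# Generic delay-based congestion window update targeting near-zero queueing.
# This is an illustrative rule; parameter names and values are assumptions.

def update_cwnd(cwnd, queuing_delay_us, base_rtt_us, max_backoff=0.5):
    if queuing_delay_us <= 0:
        return cwnd + 1                           # queue empty: additive increase
    # Back off in proportion to how much of the RTT is spent queueing,
    # capped so a single noisy sample cannot collapse the window.
    backoff = min(queuing_delay_us / base_rtt_us, max_backoff)
    return max(1.0, cwnd * (1 - backoff) + 1)

cwnd = 10.0
for q in (0, 0, 40, 120, 0):                      # queuing-delay samples in microseconds
    cwnd = update_cwnd(cwnd, q, base_rtt_us=200)
    print(f"q={q:>3} us -> cwnd={cwnd:.2f}")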

Journal ArticleDOI
TL;DR: An extensive and careful study of latent reservoir decay by Crooks et al, reported in this issue of the Journal, confirms that the stability of the latent reservoir is not determined by treatment regimens.
Abstract: The modern era of antiretroviral therapy (ART) for human immunodeficiency virus type 1 (HIV-1) infection began in the mid-1990s with the introduction of 2 new classes of antiretroviral drugs, the protease inhibitors (PIs) and the nonnucleoside reverse-transcriptase inhibitors. Combinations consisting of 1 of these drugs along with 2 nucleoside analogue reverse-transcriptase inhibitors rapidly reduced plasma HIV-1 RNA levels to below the limit of detection of clinical assays [1, 2], leading to predictions that continued treatment for 2–3 years could cure the infection [3]. Although it did not prove curative, combination ART became the mainstay of HIV treatment, allowing durable control of viral replication and reversal or prevention of immunodeficiency [4]. A major reason why ART did not prove curative is the persistence of a latent form of the virus in a small population of resting memory CD4 T cells [5, 6]. In these cells, the viral genome is stably integrated into host cell DNA, but viral genes are not expressed at significant levels, in part because of the absence of key host transcription factors that are recruited to the HIV promoter only after T-cell activation. The latent reservoir for HIV-1 was originally demonstrated using an assay in which resting cells from patients are activated to reverse latency [6]. Viruses released from individual latently infected cells are expanded in culture. This viral outgrowth assay (VOA) was used to demonstrate the remarkable stability of the latent reservoir [7–9]. The half-life of this pool of cells was shown to be 44 months. At this rate of decay, >70 years would be required for a pool of just 10^6 cells to decay completely [8, 9]. Initial studies of the decay of the latent reservoir were completed in 2003 [9]. Since that time, remarkable advances in ART have taken place, including the introduction of new classes of antiretroviral drugs, such as integrase inhibitors, and the development of simplified regimens in which multiple antiretroviral drugs are combined into a single pill that can be taken once daily [4]. In this context, an extensive and careful study of latent reservoir decay by Crooks et al [10], reported in this issue of the Journal, is of particular interest. The authors have reexamined the stability of the latent reservoir using longitudinal VOAs in a series of 37 patients, some of whom have been receiving treatment for most of the modern ART era. Despite the long duration of treatment in some patients and the changes in ART, the authors found that the decay rate of the latent reservoir is almost exactly the same as that reported in 2003. The half-life measured by Crooks et al is 43 months [10]. The fact that the decay rate measured in the present study is no different from that measured more than a decade ago confirms that the stability of the latent reservoir is not determined by treatment regimens. As long as the regimen produces a complete or near-complete arrest of new infection events, the decay of the reservoir is determined by the biology of the resting memory T cells that harbor persistent HIV-1. Pharmacodynamic studies indicate that the nonnucleoside reverse-transcriptase inhibitors and PIs possess a remarkable potential to inhibit viral replication, a property that reflects an unexpected degree of cooperativity in their dose-response curves [11, 12]. At clinical concentrations, the best PIs can actually produce a 10 billion–fold inhibition of a single round of HIV-1 replication.
Thus, even the early combination therapy regimens may have produced complete or near-complete inhibition of new infection events in drug-adherent patients. Subsequent improvements in ART have largely affected tolerability and convenience. Viewed in this light, the finding that the reservoir decay is constant is not surprising. The cures now being routinely achieved with direct-acting antiviral drugs ...
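
The stability numbers quoted above map directly onto the often-cited eradication timescale: at a 43–44 month half-life, a reservoir of roughly 10^6 latently infected cells needs about 20 halvings, i.e., on the order of 70 years, to decay away. A quick check of that arithmetic:

# Decay time implied by the measured half-life of the latent reservoir.
import math

half_life_months = 44            # the editorial above cites half-lives of 44 and 43 months
initial_cells = 1e6              # order of magnitude of the latent reservoir

halvings_needed = math.log2(initial_cells)          # ~19.9 halvings to reach ~1 cell
years = halvings_needed * half_life_months / 12
print(f"{halvings_needed:.1f} half-lives ~ {years:.0f} years")   # ~73 years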

Proceedings ArticleDOI
18 Apr 2015
TL;DR: This work tested local latency in a variety of real-world gaming scenarios and carried out a controlled study focusing on targeting and tracking activities in an FPS game with varying degrees of local latency, showing that local latency is a real and substantial problem -- but games can mitigate the problem with appropriate compensation methods.
Abstract: Real-time games such as first-person shooters (FPS) are sensitive to even small amounts of lag. The effects of network latency have been studied, but less is known about local latency, the lag caused by input devices and displays. While local latency is important to gamers, we do not know how it affects aiming performance and whether we can reduce its negative effects. To explore these issues, we tested local latency in a variety of real-world gaming scenarios and carried out a controlled study focusing on targeting and tracking activities in an FPS game with varying degrees of local latency. In addition, we tested the ability of a lag compensation technique (based on aim assistance) to mitigate the negative effects. Our study found local latencies in the real world ranging from 23 to 243 ms, which cause significant and substantial degradation in performance (even for latencies as low as 41 ms). The study also showed that our compensation technique worked extremely well, reducing the problems caused by lag in the case of targeting, and removing the problem altogether in the case of tracking. Our work shows that local latency is a real and substantial problem -- but games can mitigate the problem with appropriate compensation methods.

Posted Content
TL;DR: A general redundancy strategy is designed that achieves a good latency-cost trade-off for an arbitrary service time distribution and generalizes and extends some results in the analysis of fork-join queues.
Abstract: In cloud computing systems, assigning a task to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers, and reduce latency. But adding redundancy may result in higher cost of computing resources, as well as an increase in queueing delay due to higher traffic load. This work helps understand when and how redundancy gives a cost-efficient reduction in latency. For a general task service time distribution, we compare different redundancy strategies in terms of the number of redundant tasks, and time when they are issued and canceled. We get the insight that the log-concavity of the task service time creates a dichotomy of when adding redundancy helps. If the service time distribution is log-convex (i.e. log of the tail probability is convex) then adding maximum redundancy reduces both latency and cost. And if it is log-concave (i.e. log of the tail probability is concave), then less redundancy, and early cancellation of redundant tasks is more effective. Using these insights, we design a general redundancy strategy that achieves a good latency-cost trade-off for an arbitrary service time distribution. This work also generalizes and extends some results in the analysis of fork-join queues.
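
The dichotomy is easy to see numerically by comparing two service-time distributions under earliest-copy-wins with cancellation at the first finish: a pure exponential (the log-linear boundary case) versus a shifted exponential (log-concave because of its deterministic component). The parameters below are made up, and queueing delay is ignored here, which the paper does account for.

# Service-time-only comparison (queueing effects ignored) of replicating a task
# r times and canceling the losers when the first copy finishes.
#  - Exponential service (log-linear boundary case): latency drops, cost is flat.
#  - Shifted exponential (log-concave): latency drops a little, cost grows with r.
import random

def trial(r, sampler, trials=200_000, seed=1):
    rng = random.Random(seed)
    lat = cost = 0.0
    for _ in range(trials):
        first = min(sampler(rng) for _ in range(r))
        lat += first
        cost += r * first
    return lat / trials, cost / trials

exp_service     = lambda rng: rng.expovariate(1.0)          # mean 1
shifted_service = lambda rng: 1.0 + rng.expovariate(2.0)    # constant 1 + mean 0.5

for name, sampler in (("exponential", exp_service), ("shifted exp", shifted_service)):
    for r in (1, 2, 4):
        lat, cost = trial(r, sampler)
        print(f"{name:11s} r={r}: latency~{lat:.2f}  machine-time~{cost:.2f}")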

Journal ArticleDOI
TL;DR: The role of miRNAs in virus latency and persistence, specifically focusing on herpesviruses, is discussed, along with potential areas of future research and how novel technologies may aid in determining how miRNAs shape virus latency in the context of herpesvirus infections.
Abstract: The identification of virally encoded microRNAs (miRNAs) has had a major impact on the field of herpes virology. Given their ability to target cellular and viral transcripts, and the lack of immune response to small RNAs, miRNAs represent an ideal mechanism of gene regulation during viral latency and persistence. In this review, we discuss the role of miRNAs in virus latency and persistence, specifically focusing on herpesviruses. We cover the current knowledge on miRNAs in establishing and maintaining virus latency and promoting survival of infected cells through targeting of both viral and cellular transcripts, highlighting key publications in the field. We also discuss potential areas of future research and how novel technologies may aid in determining how miRNAs shape virus latency in the context of herpesvirus infections.

Book ChapterDOI
Chuangen Gao1, Hua Wang1, Fangjin Zhu1, Linbo Zhai1, Shanwen Yi1 
18 Nov 2015
TL;DR: A particle swarm optimization algorithm is proposed to solve the global latency controller placement problem with capacitated controllers, taking into consideration both the latency between controllers and the capacities of controllers.
Abstract: Software-defined networking (SDN) decouples the control plane from packet processing devices and introduces the controller placement problem. Previous methods focus only on propagation latency between controllers and switches but ignore either the latency from controllers to controllers or the capacities of controllers, both of which are critical factors in real networks. In this paper, we define a global latency controller placement problem with capacitated controllers, taking into consideration both the latency between controllers and the capacities of controllers. This paper proposes a particle swarm optimization algorithm to solve the problem for the first time. Simulation results show that the algorithm has better performance in propagation latency, computation time, and convergence.
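
One way to apply PSO to a discrete placement problem like this (a hedged sketch, not the paper's encoding) is to let each particle carry a real-valued score per node, decode the k highest-scoring nodes as controller sites, and use standard velocity/position updates. The fitness below combines worst switch-to-controller latency with worst inter-controller latency; the paper additionally enforces controller capacity constraints, which this sketch omits.

# Hedged sketch: continuous PSO adapted to discrete controller placement.
import random

def fitness(sites, dist):
    n = len(dist)
    to_ctrl = max(min(dist[v][c] for c in sites) for v in range(n))
    between = max(dist[a][b] for a in sites for b in sites)
    return to_ctrl + between

def pso_placement(dist, k, particles=30, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    decode = lambda x: sorted(range(n), key=lambda i: -x[i])[:k]
    X = [[rng.random() for _ in range(n)] for _ in range(particles)]
    V = [[0.0] * n for _ in range(particles)]
    pbest = [list(x) for x in X]
    pbest_f = [fitness(decode(x), dist) for x in X]
    g = pbest[min(range(particles), key=lambda i: pbest_f[i])][:]
    for _ in range(iters):
        for i in range(particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                V[i][d] = (w * V[i][d] + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (g[d] - X[i][d]))
                X[i][d] += V[i][d]
            f = fitness(decode(X[i]), dist)
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = X[i][:], f
        g = pbest[min(range(particles), key=lambda i: pbest_f[i])][:]
    return decode(g), fitness(decode(g), dist)

# Tiny random symmetric latency matrix just to exercise the sketch.
rng = random.Random(42)
N = 8
D = [[0] * N for _ in range(N)]
for a in range(N):
    for b in range(a + 1, N):
        D[a][b] = D[b][a] = rng.randint(1, 20)
print(pso_placement(D, k=2))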

Journal ArticleDOI
16 Sep 2015
TL;DR: This work analyzes the trade-off between latency and the cost of computing resources in queues with redundancy, without assuming exponential service time, and studies a generalized fork-join queueing model where finishing any k out of n tasks is sufficient to complete a job.
Abstract: A major advantage of cloud computing and storage is the large-scale sharing of resources, which provides scalability and flexibility. But resource-sharing causes variability in the latency experienced by the user, due to several factors such as virtualization, server outages, network congestion, etc. This problem is further aggravated when a job consists of several parallel tasks, because the task running on the slowest machine becomes the latency bottleneck. A promising method to reduce latency is to assign a task to multiple machines and wait for the earliest to finish. Similarly, in cloud storage systems requests to download the content can be assigned to multiple replicas, such that it is sufficient to download any one replica. Although studied actively in systems in the past few years, there is little work on rigorous analysis of how redundancy affects latency. The effect of redundancy in queueing systems was first analyzed only recently in [2, 3, 6], assuming exponential service time. General service time distribution, in particular the effect of its tail, is considered in [7, 8]. This work analyzes the trade-off between latency and the cost of computing resources in queues with redundancy, without assuming exponential service time. We study a generalized fork-join queueing model where finishing any k out of n tasks is sufficient to complete a job. The redundant tasks can be canceled when any k tasks finish, or earlier, when any k tasks start service. For the k = 1 case, we get an elegant latency and cost analysis by identifying equivalences between systems without and with early redundancy cancellation to M/G/1 and M/G/n queues, respectively. For general k, we derive bounds on the latency and cost. Please see [4] for an extended version of this work.
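
The k = 1 equivalences are useful precisely because M/G/1 latency has a closed form. Writing λ for the job arrival rate and S for the effective service time implied by the redundancy and cancellation policy (for example, the minimum of the n replicas' service times when losers are canceled at the first finish), the Pollaczek–Khinchine formula gives the expected time in system:

E[T] \;=\; E[S] \;+\; \frac{\lambda\, E[S^2]}{2\,\bigl(1 - \lambda\, E[S]\bigr)},

valid while the queue is stable (λ E[S] < 1). Redundancy changes both E[S] and E[S^2], which is how it moves both the latency and the computing cost.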

Proceedings ArticleDOI
05 Dec 2015
TL;DR: This paper proposes a novel asymmetric DRAM capable of low-cost data migration between subarrays, together with a simple management mechanism and an exploration of management policies, achieving 7.25% and 11.77% performance improvement in single- and multi-programmed workloads, respectively, over a system with traditional homogeneous DRAM.
Abstract: The evolution of DRAM technology has been driven by capacity and bandwidth during the last decade. In contrast, DRAM access latency stays relatively constant and is trending to increase. Much effort has been devoted to tolerating memory access latency, but these techniques have reached the point of diminishing returns. Having shorter bitline and wordline lengths in a DRAM device will reduce the access latency; however, doing so impacts the array efficiency. In the mainstream market, manufacturers are not willing to trade capacity for latency. Prior works have proposed hybrid-bitline DRAM designs to overcome this problem. However, those methods are either intrusive to the circuit and layout of the DRAM design, or there is no direct way to migrate data between the fast and slow levels. In this paper, we propose a novel asymmetric DRAM with the capability to perform low-cost data migration between subarrays. Based on this design, we devise a simple management mechanism and explore many management-related policies. We show that with this new design and our simple management technique we can achieve 7.25% and 11.77% performance improvement in single- and multi-programmed workloads, respectively, over a system with traditional homogeneous DRAM. This gain is above 80% of the potential performance gain of a system based on a hypothetical DRAM made entirely out of short bitlines.

Proceedings ArticleDOI
02 Feb 2015
TL;DR: The proposed prediction framework has a unique set of characteristics to predict long-running queries with high recall and improved precision; it is effective in reducing the extreme tail latency compared to a state-of-the-art predictor and improves server throughput by more than 70% because of its improved precision.
Abstract: A commercial web search engine shards its index among many servers, and therefore the response time of a search query is dominated by the slowest server that processes the query. Prior approaches target improving responsiveness by reducing the tail latency of an individual search server. They predict query execution time, and if a query is predicted to be long-running, it runs in parallel, otherwise it runs sequentially. These approaches are, however, not accurate enough for reducing a high tail latency when responses are aggregated from many servers, because this requires each server to reduce a substantially higher tail latency (e.g., the 99.99th-percentile), which we call extreme tail latency. We propose a prediction framework to reduce the extreme tail latency of search servers. The framework has a unique set of characteristics to predict long-running queries with high recall and improved precision. Specifically, prediction is delayed by a short duration to allow many short-running queries to complete without parallelization, and to allow the predictor to collect a set of dynamic features using runtime information. These features estimate query execution time with high accuracy. We also use them to estimate the prediction errors to override an uncertain prediction by selectively accelerating the query for a higher recall. We evaluate the proposed prediction framework to improve search engine performance in two scenarios using a simulation study: (1) query parallelization on a multicore processor, and (2) query scheduling on a heterogeneous processor. The results show that, for both scenarios, the proposed framework is effective in reducing the extreme tail latency compared to a state-of-the-art predictor because of its higher recall, and it improves server throughput by more than 70% because of its improved precision.
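
The control flow described above (delay briefly, then predict, then selectively parallelize long or uncertain predictions) can be sketched as follows. The stub runner, the toy feature, and the linear predictor are all invented for illustration; the paper learns its predictor from dynamic features collected while the query runs sequentially.

# Sketch of the "delay, then predict, then maybe parallelize" control flow.

def run(cost_ms, parallel=False, speedup=4.0):
    """Stand-in for executing (the rest of) a query; returns elapsed ms."""
    return cost_ms / speedup if parallel else cost_ms

def predictor(postings_scanned):
    """Toy predictor: estimated total time and an error (uncertainty) estimate."""
    predicted = 10.0 * postings_scanned       # invented linear model
    error = 0.3 * predicted                   # invented uncertainty proxy
    return predicted, error

def handle_query(cost_ms, delay_ms=5, long_ms=50, uncertainty_ms=40):
    if cost_ms <= delay_ms:
        return run(cost_ms)                   # short query: finishes before prediction
    postings_scanned = 0.1 * cost_ms          # toy runtime feature gathered during the delay
    predicted, error = predictor(postings_scanned)
    # Parallelize if predicted long, or if the prediction is too uncertain to
    # trust (selectively accelerating uncertain queries raises recall).
    if predicted >= long_ms or error >= uncertainty_ms:
        return delay_ms + run(cost_ms - delay_ms, parallel=True)
    return run(cost_ms)

for cost in (3, 20, 200):
    print(f"{cost:>3} ms of sequential work finishes in {handle_query(cost):.1f} ms")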

Proceedings ArticleDOI
01 Feb 2015
TL;DR: A combination of Early Read and Turbo Read can reduce the PCM read latency by 30%, improve the system performance by 21%, and reduce the Energy Delay Product (EDP) by 28%, while requiring minimal changes to the memory system.
Abstract: Phase Change Memory (PCM) is an emerging memory technology that can enable scalable high-density main memory systems. Unfortunately, PCM has higher read latency than DRAM, resulting in lower system performance. This paper investigates architectural techniques to improve the read latency of PCM. We observe that there is a wide distribution in cell resistance in both the SET state and the RESET state, and that the read latency of PCM is designed conservatively to handle the worst case cell. If PCM sensing can be tuned to exploit the variability in cell resistance, then we can get reduced read latency. We propose two schemes to enable better-than-worst-case read latency for PCM systems. Our first proposal, Early Read, reads the data earlier than the specified time period. Our key observation that Early Read causes only unidirectional errors (SET being read as RESET) allows us to efficiently detect data errors using Berger codes. In the uncommon case that Early Read causes data error(s), we simply retry the read operation with original latency. Our evaluations show that Early Read can reduce the read latency by 25% while incurring a storage overhead of only 10 bits per 64 byte line. Our second proposal, Turbo Read, reduces the sensing time for read operations by pumping higher current, at the expense of accidentally switching the PCM cell with small probability during the read operation. We analyze Error Correction Codes (ECC) and Probabilistic Row Scrubbing (PRS) for maintaining data integrity under Turbo Read. We show that a combination of Early Read and Turbo Read can reduce the PCM read latency by 30%, improve the system performance by 21%, and reduce the Energy Delay Product (EDP) by 28%, while requiring minimal changes to the memory system.
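
The reason Berger codes suffice here is the unidirectional-error property noted in the abstract: the check symbol is simply the count of zero bits in the data word, so any error pattern that flips bits in only one direction changes that count and is detected. The self-check below models "SET read as RESET" as a 1 -> 0 flip, which is an assumed mapping chosen for illustration.

# Berger code check: the check symbol is the count of 0-bits in the data word.
# Any purely unidirectional error pattern increases (or decreases) that count
# and is therefore detected; detection triggers a retry at the full read latency.
import random

def berger_check(data_bits):
    return data_bits.count(0)

def read_with_early_detection(stored_bits, stored_check, flip_positions):
    read_bits = list(stored_bits)
    for p in flip_positions:            # model "SET read as RESET" as a 1 -> 0 flip
        read_bits[p] = 0
    ok = berger_check(read_bits) == stored_check
    return read_bits, ok                # ok == False would trigger a retry at full latency

rng = random.Random(7)
data = [rng.randint(0, 1) for _ in range(64)]
check = berger_check(data)

ones = [i for i, b in enumerate(data) if b == 1]
_, ok_clean = read_with_early_detection(data, check, [])
_, ok_err = read_with_early_detection(data, check, rng.sample(ones, 3))
print("clean read accepted:", ok_clean)                 # True
print("3 unidirectional errors detected:", not ok_err)  # True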

Journal ArticleDOI
TL;DR: The results show that while the AQM algorithms can significantly improve steady state performance, they exacerbate TCP flow unfairness and severely struggle to quickly control queueing latency at flow startup, which can lead to large latency spikes that hurt the perceived performance.

Proceedings ArticleDOI
03 Dec 2015
TL;DR: This work experimentally demonstrates the transmission of 48 20-MHz LTE signals with a CPRI-equivalent data rate of 59 Gb/s, achieving a low round-trip digital-signal-processing latency of <2 μs and a low mean error-vector magnitude of ~2.5% after fiber transmission.
Abstract: We experimentally demonstrate the transmission of 48 20-MHz LTE signals with a CPRI-equivalent data rate of 59 Gb/s, achieving a low round-trip digital-signal-processing latency of <2 μs and a low mean error-vector magnitude of ∼2.5 % after fiber transmission.

Proceedings ArticleDOI
15 Oct 2015
TL;DR: In this article, the authors analyze how different redundancy strategies (e.g., the number of replicas, and the time when they are issued and canceled) affect the latency and computing cost.
Abstract: In cloud computing systems, assigning a job to multiple servers and waiting for the earliest copy to finish is an effective method to combat the variability in response time of individual servers. Although adding redundant replicas always reduces service time, the total computing time spent per job may be higher, thus increasing waiting time in queue. The total time spent per job is also proportional to the cost of computing resources. We analyze how different redundancy strategies, e.g., the number of replicas and the time when they are issued and canceled, affect the latency and computing cost. We get the insight that the log-concavity of the service time distribution is a key factor in determining whether adding redundancy reduces latency and cost. If the service distribution is log-convex, then adding maximum redundancy reduces both latency and cost. And if it is log-concave, then having fewer replicas and canceling the redundant requests early is more effective.