Institution

National University of Defense Technology

Education, Changsha, China
About: National University of Defense Technology is an education organization based in Changsha, China. It is known for research contributions in the topics of computer science and radar. The organization has 39430 authors who have published 40181 publications receiving 358979 citations. The organization is also known as Guófáng Kēxuéjìshù Dàxué and NUDT.


Papers
Journal ArticleDOI
TL;DR: The idea is to design an Incast congestion Control for TCP (ICTCP) scheme on the receiver side that adjusts the TCP receive window proactively before packet loss occurs, and achieves almost zero timeouts and high goodput for TCP incast.
Abstract: Transmission Control Protocol (TCP) incast congestion happens in high-bandwidth and low-latency networks when multiple synchronized servers send data to the same receiver in parallel. For many important data-center applications such as MapReduce and Search, this many-to-one traffic pattern is common. Hence TCP incast congestion may severely degrade their performance, e.g., by increasing response time. In this paper, we study TCP incast in detail by focusing on the relationships between TCP throughput, round-trip time (RTT), and receive window. Unlike previous approaches, which mitigate the impact of TCP incast congestion by using a fine-grained timeout value, our idea is to design an Incast congestion Control for TCP (ICTCP) scheme on the receiver side. In particular, our method adjusts the TCP receive window proactively before packet loss occurs. The implementation and experiments in our testbed demonstrate that we achieve almost zero timeouts and high goodput for TCP incast.

215 citations

Journal ArticleDOI
TL;DR: This study confirms that traditional approaches to bug triaging and code review are feasible for pull-request reviewer recommendations on GitHub, and their performance can be improved significantly by combining them with information extracted from prior social interactions between developers on GitHub.
Abstract: Context: The pull-based model, widely used in distributed software development, offers an extremely low barrier to entry for potential contributors (anyone can submit contributions to any project, through pull-requests). Meanwhile, the project's core team must act as guardians of code quality, ensuring that pull-requests are carefully inspected before being merged into the main development line. However, with pull-requests becoming increasingly popular, the need for qualified reviewers also increases. GitHub facilitates this by enabling the crowd-sourcing of pull-request reviews to a larger community of coders than just the project's core team, as a part of its social coding philosophy. However, having access to more potential reviewers does not necessarily mean that it is easier to find the right ones (the "needle in a haystack" problem). If left unsupervised, this process may result in communication overhead and delayed pull-request processing. Objective: This study aims to investigate whether and how previous approaches used in bug triaging and code review can be adapted to recommending reviewers for pull-requests, and how to improve the recommendation performance. Method: First, we extend three typical approaches used in bug triaging and code review for the new challenge of assigning reviewers to pull-requests. Second, we analyze social relations between contributors and reviewers, and propose a novel approach by mining each project's comment networks (CNs). Finally, we combine the CNs with traditional approaches, and evaluate the effectiveness of all these methods on 84 GitHub projects through both quantitative and qualitative analysis. Results: We find that CN-based recommendation can achieve, by itself, performance similar to that of the traditional approaches. However, the mixed approaches can achieve significant improvements compared to using either of them independently. Conclusion: Our study confirms that traditional approaches to bug triaging and code review are feasible for pull-request reviewer recommendations on GitHub. Furthermore, their performance can be improved significantly by combining them with information extracted from prior social interactions between developers on GitHub. These results call for novel tools to support process automation in social coding platforms that combine social factors (e.g., common interests among developers) and technical factors (e.g., developers' expertise).

214 citations

Proceedings ArticleDOI
07 Dec 2011
TL;DR: This paper proposes an efficient parallel density-based clustering algorithm, implements it as a 4-stage MapReduce paradigm, and adopts a quick partitioning strategy for large-scale non-indexed data.
Abstract: Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging because the size of real-world datasets has been growing rapidly to extra-large scale. Meanwhile, MapReduce is a desirable parallel programming platform that is widely applied in many kinds of data processing fields. In this paper, we propose an efficient parallel density-based clustering algorithm and implement it as a 4-stage MapReduce paradigm. Furthermore, we adopt a quick partitioning strategy for large-scale non-indexed data. We study the metric of merging among bordering partitions and optimize it. Finally, we evaluate our work on real large-scale datasets using the Hadoop platform. Results reveal that our work achieves very efficient speedup and scale-up.

213 citations

Proceedings ArticleDOI
18 Sep 2012
TL;DR: This paper provides a control-theoretic solution to the dynamic capacity provisioning problem that minimizes the total energy cost while meeting the performance objective in terms of task scheduling delay, and uses Model Predictive Control (MPC) to find the optimal control policy.
Abstract: Data centers have recently gained significant popularity as a cost-effective platform for hosting large-scale service applications. While large data centers enjoy economies of scale by amortizing initial capital investment over a large number of machines, they also incur tremendous energy cost in terms of power distribution and cooling. An effective approach for saving energy in data centers is to dynamically adjust the data center capacity by turning off unused machines. However, this dynamic capacity provisioning problem is known to be challenging as it requires a careful understanding of the resource demand characteristics as well as consideration of various cost factors, including task scheduling delay, machine reconfiguration cost and electricity price fluctuation. In this paper, we provide a control-theoretic solution to the dynamic capacity provisioning problem that minimizes the total energy cost while meeting the performance objective in terms of task scheduling delay. Specifically, we model this problem as a constrained discrete-time optimal control problem, and use Model Predictive Control (MPC) to find the optimal control policy. Through extensive analysis and simulation using real workload traces from Google's compute clusters, we show that our proposed framework can achieve significant reduction in energy cost, while maintaining an acceptable average scheduling delay for individual tasks.

213 citations

Journal ArticleDOI
TL;DR: This protocol explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, copy-number bias correction, gene hit identification and downstream functional enrichment analysis for CRISPR screens.
Abstract: Genome-wide screening using CRISPR coupled with nuclease Cas9 (CRISPR-Cas9) is a powerful technology for the systematic evaluation of gene function. Statistically principled analysis is needed for the accurate identification of gene hits and associated pathways. Here, we describe how to perform computational analysis of CRISPR screens using the MAGeCKFlute pipeline. MAGeCKFlute combines the MAGeCK and MAGeCK-VISPR algorithms and incorporates additional downstream analysis functionalities. MAGeCKFlute is distinguished from other currently available tools by its comprehensive pipeline, which contains a series of functions for analyzing CRISPR screen data. This protocol explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, copy-number bias correction, gene hit identification and downstream functional enrichment analysis for CRISPR screens. We also describe gene identification and data analysis in CRISPR screens involving drug treatment. Completing the entire MAGeCKFlute pipeline requires ~3 h on a desktop computer running Linux or Mac OS with R support.

211 citations


Authors

Showing all 39659 results

Name             H-index   Papers   Citations
Rui Zhang        151       2625     107917
Jian Li          133       2863     87131
Chi Lin          125       1313     102710
Wei Xu           103       1492     49624
Lei Liu          98        2041     51163
Xiang Li         97        1472     42301
Chang Liu        97        1099     39573
Jian Huang       97        1189     40362
Tao Wang         97        2720     55280
Wei Liu          96        1538     42459
Jian Chen        96        1718     52917
Wei Wang         95        3544     59660
Peng Li          95        1548     45198
Jianhong Wu      93        726      36427
Jianhua Zhang    92        415      28085
Network Information
Related Institutions (5)
Harbin Institute of Technology
109.2K papers, 1.6M citations

94% related

Tsinghua University
200.5K papers, 4.5M citations

91% related

University of Science and Technology of China
101K papers, 2.4M citations

90% related

City University of Hong Kong
60.1K papers, 1.7M citations

89% related

Dalian University of Technology
71.9K papers, 1.1M citations

89% related

Performance
Metrics
No. of papers from the Institution in previous years
Year   Papers
2024   1
2023   97
2022   469
2021   2,986
2020   3,468
2019   3,695