Author

Xiaodong Zhang

Bio: Xiaodong Zhang is an academic researcher from the University of Texas MD Anderson Cancer Center. The author has contributed to research on topics including cache systems and proton therapy. The author has an h-index of 66 and has co-authored 390 publications receiving 16,459 citations. Previous affiliations of Xiaodong Zhang include the University of Texas at San Antonio and the College of William & Mary.


Papers
Proceedings ArticleDOI
01 Jun 2002
TL;DR: LIRS effectively addresses the limits of LRU by using recency to evaluate Inter-Reference Recency (IRR) when making replacement decisions; it significantly outperforms LRU and outperforms other existing replacement algorithms in most cases.
Abstract: Although the LRU replacement policy has been commonly used in buffer cache management, it is well known for its inability to cope with access patterns with weak locality. Previous work, such as LRU-K and 2Q, attempts to enhance LRU by making use of additional history information about previous block references beyond the recency information used in LRU. These algorithms greatly increase complexity and/or cannot consistently provide performance improvements. Many recently proposed policies, such as UBM and SEQ, improve replacement performance by exploiting access regularities in references, but they address LRU's problems only in certain specific and well-defined cases, such as sequential and looping access patterns. Motivated by the limits of previous studies, we propose an efficient buffer cache replacement policy called Low Inter-reference Recency Set (LIRS). LIRS effectively addresses the limits of LRU by using recency to evaluate Inter-Reference Recency (IRR) when making a replacement decision. This is in contrast to what LRU does: directly using recency to predict the next reference timing. At the same time, LIRS largely retains the simple assumptions LRU uses to predict the future access behavior of blocks. Our objectives are to effectively address the limits of LRU for general-purpose use, to retain the low-overhead merit of LRU, and to outperform replacement policies that rely on detecting access regularities. Conducting simulations with a variety of traces and a wide range of cache sizes, we show that LIRS significantly outperforms LRU and outperforms other existing replacement algorithms in most cases. Furthermore, we show that the additional cost of implementing LIRS is trivial in comparison with LRU.
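
To make the IRR idea concrete, here is a minimal Python sketch of an IRR-driven replacement policy. It is a simplification for illustration only, not the paper's algorithm: the actual LIRS maintains LIR/HIR block sets with a stack and a queue to achieve O(1) work per access, whereas this version recomputes IRR naively, and all names are hypothetical.

```python
# Simplified sketch: evict the resident block whose Inter-Reference Recency
# (IRR, the number of distinct blocks seen between its last two references)
# is largest. Real LIRS gets the same effect with LIR/HIR sets in O(1) time.

class SimplifiedIRRCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = set()       # blocks currently cached
        self.history = []           # global reference stream, most recent last
        self.irr = {}               # block -> IRR at its most recent reference

    def _recency(self, block):
        # Distinct blocks referenced since `block` was last referenced.
        seen = set()
        for b in reversed(self.history):
            if b == block:
                return len(seen)
            seen.add(b)
        return float("inf")         # never referenced before

    def access(self, block):
        hit = block in self.resident
        # The recency observed at a new reference is exactly the IRR of the
        # block's latest reference pair.
        self.irr[block] = self._recency(block)
        self.history.append(block)
        if not hit:
            if len(self.resident) >= self.capacity:
                victim = max(self.resident,
                             key=lambda b: self.irr.get(b, float("inf")))
                self.resident.discard(victim)
            self.resident.add(block)
        return hit
```

Under a looping access pattern mixed with one-off scans, this IRR-based choice keeps the looping blocks resident where LRU would evict them in favor of the scan.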

575 citations

Journal ArticleDOI
01 Aug 2013
TL;DR: Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop, is presented; it is integrated into Hive to support declarative spatial queries within an integrated architecture.
Abstract: Support for high-performance queries on large volumes of spatial data is becoming increasingly important in many application domains, including geospatial problems in numerous fields, location-based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive-scale spatial data is due to the proliferation of cost-effective and ubiquitous positioning technologies, the development of high-resolution imaging technologies, and contributions from a large number of community users. There are two major challenges in managing and querying massive spatial data: the explosion of spatial data and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS, a scalable and high-performance spatial data warehousing system for running large-scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, the customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results by handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on-demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries within an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS in query response time and its high scalability on commodity clusters. Our comparative experiments have shown that the performance of Hadoop-GIS is on par with parallel SDBMSs and that it outperforms SDBMSs for compute-intensive queries. Hadoop-GIS is available as a set of libraries for processing spatial queries and as an integrated software package in Hive.
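
To illustrate the partition-then-amend idea, here is a small Python sketch assuming a uniform grid over rectangular objects. It is not the RESQUE engine or the Hive integration; the function names and the de-duplication step are illustrative of how objects that cross tile boundaries can be replicated into every overlapping tile and duplicate join pairs removed afterwards.

```python
# Hypothetical sketch of grid-based spatial partitioning with boundary-object
# handling. Rectangles are (xmin, ymin, xmax, ymax) tuples.

def tiles_for(rect, tile_size):
    """Yield every (tx, ty) grid tile overlapped by rect."""
    xmin, ymin, xmax, ymax = rect
    for tx in range(int(xmin // tile_size), int(xmax // tile_size) + 1):
        for ty in range(int(ymin // tile_size), int(ymax // tile_size) + 1):
            yield (tx, ty)

def partition(objects, tile_size):
    """Map phase: replicate each object into every tile it overlaps."""
    parts = {}
    for oid, rect in objects.items():
        for tile in tiles_for(rect, tile_size):
            parts.setdefault(tile, []).append((oid, rect))
    return parts

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def spatial_join(parts_a, parts_b):
    """Per-tile join; the result set de-duplicates pairs of boundary objects
    that were found in more than one tile (the 'amending' step)."""
    result = set()
    for tile, objs_a in parts_a.items():
        for oid_a, ra in objs_a:
            for oid_b, rb in parts_b.get(tile, []):
                if intersects(ra, rb):
                    result.add((oid_a, oid_b))
    return result
```

In the real system each tile's join runs as an independent MapReduce task; the sketch's per-tile loop stands in for that parallel execution.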

571 citations

Proceedings ArticleDOI
15 Jun 2009
TL;DR: This study reveals several unanticipated aspects in the performance dynamics of SSD technology that must be addressed by system designers and data-intensive application users in order to effectively place it in the storage hierarchy.
Abstract: The flash-memory-based solid state drive (SSD) has been called a "pivotal technology" that could revolutionize data storage systems. Since the SSD shares a common interface with the traditional hard disk drive (HDD), both physically and logically, an effective integration of SSDs into the storage hierarchy is very important. However, details of SSD hardware implementations tend to be hidden behind such narrow interfaces. In fact, since sophisticated algorithms are usually, of necessity, adopted in SSD controller firmware, more complex performance dynamics are to be expected in SSDs than in HDD systems. Most existing literature and product specifications on SSDs provide only high-level descriptions and standard performance data, such as bandwidth and latency. In order to gain insight into the unique performance characteristics of SSDs, we have conducted intensive experiments and measurements on different types of state-of-the-art SSDs, from low-end to high-end products. We have observed several unexpected performance issues and uncertain behaviors of SSDs that have not been reported in the literature. For example, we found that fragmentation can seriously impact performance -- by a factor of more than 14 on a recently announced SSD. Moreover, contrary to the common belief that SSD performance is uncorrelated with access patterns, we found a strong correlation between performance and the randomness of data accesses, for both reads and writes. In the worst case, average latency could increase by a factor of 89 and bandwidth could drop to only 0.025 MB/sec. Our study reveals several unanticipated aspects of the performance dynamics of SSD technology that must be addressed by system designers and data-intensive application users in order to effectively place it in the storage hierarchy.
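
As a rough illustration of this kind of measurement, here is a hedged Python sketch comparing sequential and random 4 KiB read latency on a pre-created test file. The path, block size, and counts are illustrative assumptions, and a faithful benchmark would also bypass the OS page cache (e.g., O_DIRECT with aligned buffers), which this sketch omits for simplicity.

```python
import os
import random
import time

# Hypothetical microbenchmark sketch: per-read latency of sequential vs.
# random 4 KiB reads. PATH must point to a large file pre-created on the SSD
# under test. NOTE: without page-cache bypass the numbers reflect the cache,
# not the device.

PATH = "/path/to/testfile"   # assumption: large test file on the device
BLOCK = 4096
COUNT = 10_000

def avg_read_latency(offsets):
    fd = os.open(PATH, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for off in offsets:
            os.pread(fd, BLOCK, off)   # positioned read, no separate seek
        return (time.perf_counter() - start) / len(offsets)
    finally:
        os.close(fd)

total_blocks = os.path.getsize(PATH) // BLOCK
sequential = [i * BLOCK for i in range(min(COUNT, total_blocks))]
random_offs = [random.randrange(total_blocks) * BLOCK for _ in range(COUNT)]

print("sequential: %.1f us/read" % (avg_read_latency(sequential) * 1e6))
print("random:     %.1f us/read" % (avg_read_latency(random_offs) * 1e6))
```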

529 citations

Proceedings ArticleDOI
19 Oct 2005
TL;DR: An analysis of representative BitTorrent traffic provides several new findings regarding the limitations of BitTorrent systems: due to the exponentially decreasing peer arrival rate observed in practice, service availability in such systems degrades quickly, after which it becomes difficult for a file to be located and downloaded.
Abstract: Existing studies of BitTorrent systems are single-torrent based, while according to our trace analysis more than 85% of all peers participate in multiple torrents. In addition, these studies are not sufficiently insightful or accurate even for single-torrent models, due to some unrealistic assumptions. Our analysis of representative BitTorrent traffic provides several new findings regarding the limitations of BitTorrent systems: (1) Due to the exponentially decreasing peer arrival rate in reality, service availability in such systems degrades quickly, after which it is difficult for a file to be located and downloaded. (2) Client performance in BitTorrent-like systems is unstable and fluctuates widely with the peer population. (3) Existing systems can provide unfair services to peers, where peers with high downloading speeds tend to download more and upload less. In this paper, we study these limitations of torrent evolution in realistic environments. Motivated by the analysis and modeling results, we further build a graph-based multi-torrent model to study inter-torrent collaboration. Our model quantitatively provides strong motivation for inter-torrent collaboration instead of directly stimulating seeds to stay longer. We also discuss a system design to show the feasibility of multi-torrent collaboration.
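
To see why an exponentially decaying arrival rate degrades availability so quickly, here is a small Python simulation sketch. All parameters (initial rate, decay constant, peer lifetime, seed departure time) are illustrative assumptions, not values from the paper; arrivals are drawn from a non-homogeneous Poisson process via thinning.

```python
import math
import random

# Hypothetical parameters: peers arrive at rate LAMBDA0 * exp(-t / TAU)
# (peers/day), stay online SERVICE_TIME days, and the original seed departs
# at day SEED_LEAVES. The torrent is effectively dead at the first moment
# after the seed leaves when no peer is online.

LAMBDA0, TAU = 20.0, 10.0
SERVICE_TIME = 1.0
SEED_LEAVES = 30.0

def arrivals(horizon, rng):
    """Non-homogeneous Poisson arrival times via thinning (Lewis-Shedler)."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(LAMBDA0)            # candidate at the peak rate
        if t > horizon:
            return times
        if rng.random() < math.exp(-t / TAU):    # keep with prob lambda(t)/LAMBDA0
            times.append(t)

def death_time(arrival_times):
    covered_until = SEED_LEAVES                  # seed keeps the file available
    for a in sorted(arrival_times):
        if a > covered_until:
            return covered_until                 # gap: nobody online, torrent dies
        covered_until = max(covered_until, a + SERVICE_TIME)
    return covered_until

rng = random.Random(42)
print("torrent dies around day %.1f" % death_time(arrivals(365.0, rng)))
```

With these toy numbers the arrival rate falls below one peer per day by about day 30, so coverage gaps appear soon after the seed departs, matching the qualitative finding above.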

432 citations

Proceedings ArticleDOI
24 Oct 2008
TL;DR: This paper comprehensively evaluates several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS), and provides new insights into dynamic behaviors and interaction effects.
Abstract: Cache partitioning and sharing are critical to the effective utilization of multicore processors. However, almost all existing studies have been evaluated by simulation, which often has several limitations, such as excessive simulation time, the absence of OS activities, and proneness to inaccuracy. To address these issues, we have taken an efficient software approach to supporting both static and dynamic cache partitioning in the OS through memory address mapping. We have comprehensively evaluated several representative cache partitioning schemes with different optimization objectives, including performance, fairness, and quality of service (QoS). Our software approach makes it possible to run the SPEC CPU2006 benchmark suite to completion. Besides confirming important conclusions from previous work, we are able to gain several insights from whole-program executions, which are infeasible to obtain from simulation. For example, giving up some cache space in one program to help another may improve the performance of both programs for certain workloads, due to reduced contention for memory bandwidth. Our evaluation of previously proposed fairness metrics also differs significantly from simulation-based studies. The contributions of this study are threefold. (1) To the best of our knowledge, this is a highly comprehensive execution- and measurement-based study of multicore cache partitioning. This paper not only confirms important conclusions from simulation-based studies, but also provides new insights into dynamic behaviors and interaction effects. (2) Our approach provides a unique and efficient option for evaluating multicore cache partitioning. The implemented software layer can be used as a tool in multicore performance evaluation and hardware design. (3) The proposed schemes can be further refined in OS kernels to improve performance.
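
The "memory address mapping" technique here is commonly realized as page coloring: on a physically indexed cache, part of the set index comes from the physical page number, so restricting a process to pages of certain "colors" confines it to a slice of the shared cache. The following Python sketch shows the arithmetic; the cache geometry (4 MiB, 16-way, 64-byte lines, 4 KiB pages) is an illustrative assumption, not the paper's platform.

```python
# Page-coloring sketch: compute colors for an assumed cache geometry and
# allocate physical pages by color to statically partition the cache.

PAGE_SIZE  = 4096
LINE_SIZE  = 64
CACHE_SIZE = 4 * 1024 * 1024
WAYS       = 16

sets_total    = CACHE_SIZE // (LINE_SIZE * WAYS)   # 4096 sets
sets_per_page = PAGE_SIZE // LINE_SIZE             # 64 consecutive sets per page
num_colors    = sets_total // sets_per_page        # 64 colors

def color_of(phys_page_number):
    """The low-order page-number bits select the page's cache color."""
    return phys_page_number % num_colors

def allocate(free_pages, allowed_colors):
    """Static partitioning: give a process only pages of its assigned colors."""
    return [p for p in free_pages if color_of(p) in allowed_colors]

free_pages = list(range(100000, 100512))
pages_a = allocate(free_pages, set(range(16)))              # 1/4 of the cache
pages_b = allocate(free_pages, set(range(16, num_colors)))  # remaining 3/4
print(len(pages_a), len(pages_b))                           # 128 384
```

Under such a scheme, dynamic repartitioning would amount to recoloring, i.e., migrating, pages at run time, which is the more expensive case a software layer like this must also handle.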

382 citations


Cited by
01 Jan 2014
TL;DR: Lymphedema is a common complication after treatment for breast cancer; factors associated with an increased risk of lymphedema include the extent of axillary surgery, axillary radiation, infection, and patient obesity.

1,988 citations

Journal ArticleDOI
01 Apr 2011 - Science
TL;DR: An inventory of the world's technological capacity from 1986 to 2007 reveals the evolution from analog to digital technologies; the majority of the world's technological memory has been in digital format since the early 2000s.
Abstract: We estimated the world's technological capacity to store, communicate, and compute information, tracking 60 analog and digital technologies during the period from 1986 to 2007. In 2007, humankind was able to store 2.9 × 10^20 optimally compressed bytes, communicate almost 2 × 10^21 bytes, and carry out 6.4 × 10^18 instructions per second on general-purpose computers. General-purpose computing capacity grew at an annual rate of 58%. The world's capacity for bidirectional telecommunication grew at 28% per year, closely followed by the increase in globally stored information (23%). Humankind's capacity for unidirectional information diffusion through broadcasting channels has experienced comparatively modest annual growth (6%). Telecommunication has been dominated by digital technologies since 1990 (99.9% in digital format in 2007), and the majority of our technological memory has been in digital format since the early 2000s (94% digital in 2007).
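
As a quick sanity check of what these compound annual rates imply over the 21-year window, a few lines of Python (the rates come from the abstract; the multipliers are derived from them):

```python
# Compound growth over 1986-2007 (21 years) at the annual rates quoted above.
rates = {"computation": 0.58, "telecom": 0.28, "storage": 0.23, "broadcast": 0.06}
for name, r in rates.items():
    print("%-11s grew roughly %.0fx over 21 years" % (name, (1 + r) ** 21))
# computation ~15,000x, telecom ~180x, storage ~77x, broadcast ~3x
```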

1,450 citations

Journal ArticleDOI
TL;DR: This text is a general introduction to radiation biology and a complete, self-contained course following the RSNA Syllabus in Radiation Biology, intended especially for residents in diagnostic radiology and nuclear medicine.
Abstract: The text consists of two sections, one for those studying or practicing diagnostic radiology, nuclear medicine and radiation oncology; the other for those engaged in the study or clinical practice of radiation oncology--a new chapter, on radiologic terrorism, is specifically for those in the radiation sciences who would manage exposed individuals in the event of a terrorist event. The 17 chapters in Section I represent a general introduction to radiation biology and a complete, self-contained course especially for residents in diagnostic radiology and nuclear medicine that follows the Syllabus in Radiation Biology of the RSNA. The 11 chapters in Section II address more in-depth topics in radiation oncology, such as cancer biology, retreatment after radiotherapy, chemotherapeutic agents and hyperthermia.

1,359 citations

Book
01 Jan 1996

1,170 citations

Book
05 Mar 2012
TL;DR: Computer Networking: A Top-Down Approach Featuring the Internet explains the engineering problems inherent in communicating digital information from point to point; it presents the mathematics that determine the best path, shows code that implements those algorithms, and illustrates the logic with clear conceptual diagrams.
Abstract: Certain data-communication protocols hog the spotlight, but all of them have a lot in common. Computer Networking: A Top-Down Approach Featuring the Internet explains the engineering problems that are inherent in communicating digital information from point to point. The top-down approach mentioned in the subtitle means that the book starts at the top of the protocol stack -- at the application layer -- and works its way down through the other layers, until it reaches bare wire. The authors, for the most part, shun the well-known seven-layer Open Systems Interconnection (OSI) protocol stack in favor of their own five-layer (application, transport, network, link, and physical) model. It's an effective approach that helps clear away some of the hand-waving traditionally associated with the more obtuse layers in the OSI model. The approach is definitely theoretical -- don't look here for instructions on configuring Windows 2000 or a Cisco router -- but it's relevant to reality, and should help anyone who needs to understand networking as a programmer, system architect, or even administration guru.

The treatment of the network layer, at which routing takes place, is typical of the overall style. In discussing routing, authors James Kurose and Keith Ross explain (by way of lots of clear, definition-packed text) what routing protocols need to do: find the best route to a destination. Then they present the mathematics that determine the best path, show some code that implements those algorithms, and illustrate the logic by using excellent conceptual diagrams. Real-life implementations of the algorithms -- including Internet Protocol (both IPv4 and IPv6) and several popular IP routing protocols -- help you make the transition from pure theory to networking technologies. --David Wall

Topics covered: The theory behind data networks, with thorough discussion of the problems posed at each level (the application layer gets plenty of attention). For each layer, there is academic coverage of networking problems and solutions, followed by discussion of real technologies. Special sections deal with network security and the transmission of digital multimedia.
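
As an illustration of the "mathematics that determine the best path" the review mentions, here is a minimal Python sketch of Dijkstra's shortest-path algorithm, the core of link-state routing protocols such as OSPF. The toy topology is an invented example, not taken from the book.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source in a weighted graph given as
    {node: {neighbor: cost}}, using a min-heap of (distance, node) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry, already improved
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

net = {"u": {"v": 2, "x": 1}, "v": {"u": 2, "x": 3, "w": 3},
       "x": {"u": 1, "v": 3, "w": 1}, "w": {"v": 3, "x": 1}}
print(dijkstra(net, "u"))   # {'u': 0, 'v': 2, 'x': 1, 'w': 2}
```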

1,079 citations