Showing papers in "ACM Queue in 2005"


Journal ArticleDOI
Herb Sutter, James R. Larus
TL;DR: The introductory article in this issue describes the hardware imperatives behind this shift in computer architecture from uniprocessors to multicore processors, also known as CMPs.
Abstract: Concurrency has long been touted as the "next big thing" and "the way of the future," but for the past 30 years, mainstream software development has been able to ignore it. Our parallel future has finally arrived: new machines will be parallel machines, and this will require major changes in the way we develop software. The introductory article in this issue describes the hardware imperatives behind this shift in computer architecture from uniprocessors to multicore processors, also known as CMPs.
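The software change the authors call for can be made concrete with a small sketch (mine, not the article's): CPU-bound work must be decomposed into chunks that can run on separate cores. The prime-counting workload and range sizes below are arbitrary illustrations.

```python
import multiprocessing as mp

def count_primes(bounds):
    """Count primes in [lo, hi) by trial division (deliberately CPU-bound)."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Split one large range into per-core chunks; on a multicore CPU each
    # chunk can execute on its own core, which is where the speedup comes from.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with mp.Pool() as pool:
        print(sum(pool.map(count_primes, chunks)))
```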

565 citations


Journal ArticleDOI
Luiz Andre Barroso
TL;DR: The high computational demands inherent in most of Google’s services have led the research group to develop a deep understanding of the overall cost of computing and to look continually for hardware/software designs that optimize performance per unit of cost.
Abstract: In the late 1990s, our research group at DEC was one of a growing number of teams advocating the CMP (chip multiprocessor) as an alternative to highly complex single-threaded CPUs. We were designing the Piranha system,1 which was a radical point in the CMP design space in that we used very simple cores (similar to the early RISC designs of the late ’80s) to provide a higher level of thread-level parallelism. Our main goal was to achieve the best commercial workload performance for a given silicon budget. Today, in developing Google’s computing infrastructure, our focus is broader than performance alone. The merits of a particular architecture are measured by answering the following question: Are you able to afford the computational capacity you need? The high-computational demands that are inherent in most of Google’s services have led us to develop a deep understanding of the overall cost of computing, and continually to look for hardware/software designs that optimize performance per unit of cost.
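The performance-per-cost metric the article describes can be illustrated with a toy comparison (all numbers below are invented; the article reports none):

```python
# Hypothetical server configurations, purely for illustration.
servers = {
    "complex_single_core": {"throughput_qps": 900, "cost_usd": 6000},
    "simple_cmp":          {"throughput_qps": 700, "cost_usd": 3000},
}

for name, s in servers.items():
    # The question the article poses is not "which is faster?" but
    # "which delivers more performance per unit of cost?"
    print(f"{name}: {s['throughput_qps'] / s['cost_usd']:.2f} qps per dollar")
```

On this metric the slower but much cheaper machine wins, which is the argument for simple-core CMPs in cost-driven infrastructure.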

264 citations


Journal ArticleDOI
TL;DR: In 2001 the U.S. Department of Labor was tasked with building a Web site to help people find continuing education opportunities at community colleges, universities, and organizations across the country. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
Abstract: In 2001 the U.S. Department of Labor was tasked with building a Web site that would help people find continuing education opportunities at community colleges, universities, and organizations across the country. The department wanted its Web site to support fielded Boolean searches over locations, dates, times, prerequisites, instructors, topic areas, and course descriptions. Ultimately it was also interested in mining its new database for patterns and educational trends. This was a major data-integration project, aiming to automatically gather detailed, structured information from tens of thousands of individual institutions every three months.
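A fielded Boolean search of the kind the site needed can be sketched in a few lines; the record fields below are assumptions drawn from the list in the abstract, not the project's actual schema.

```python
# Toy course records; field names are illustrative assumptions.
courses = [
    {"topic": "databases", "location": "Seattle", "instructor": "Lee"},
    {"topic": "networking", "location": "Austin", "instructor": "Cho"},
]

def fielded_search(records, **criteria):
    """AND together equality tests on named fields (a fielded Boolean query)."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

print(fielded_search(courses, topic="databases", location="Seattle"))
```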

204 citations


Journal ArticleDOI
TL;DR: The performance of microprocessors that power modern computers has continued to increase exponentially for two main reasons: the transistors at the heart of all processors and memory chips have become faster over time on a course described by Moore's law, and processor designers have harnessed the growing transistor counts to extract more parallelism from software.
Abstract: The performance of microprocessors that power modern computers has continued to increase exponentially over the years for two main reasons. First, the transistors that are the heart of the circuits in all processors and memory chips have simply become faster over time on a course described by Moore’s law, and this directly affects the performance of processors built with those transistors. Second, actual processor performance has increased faster than Moore’s law would predict, because processor designers have been able to harness the increasing numbers of transistors available on modern chips to extract more parallelism from software.
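The compounding the abstract describes is easy to quantify. As a back-of-the-envelope sketch (the doubling period is assumed to be roughly two years; the exact figure varies by era):

```python
def moores_law_growth(years, doubling_period_years=2.0):
    """Multiplicative growth in transistor count after the given number of years."""
    return 2 ** (years / doubling_period_years)

print(round(moores_law_growth(10)))  # roughly 32x more transistors per decade
```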

173 citations


Journal ArticleDOI
TL;DR: One of the greatest challenges facing people who use large information spaces is to remember and retrieve items that they have previously found and thought to be interesting.
Abstract: One of the greatest challenges facing people who use large information spaces is to remember and retrieve items that they have previously found and thought to be interesting. One approach to this problem is to allow individuals to save particular search strings to re-create the search in the future. Another approach has been to allow people to create personal collections of material. Collections of citations can be created manually by readers or through execution of (and alerting to) a saved search.
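The two approaches the abstract names, saved searches and personal collections, can be sketched side by side (a minimal illustration of the idea, not any particular system's design):

```python
# Approach 1: save the search string and re-execute it later (with alerting,
# new matches would be pushed to the reader). Approach 2: a curated collection.
documents = ["multicore scheduling survey", "column store benchmarks"]
saved_searches = {"my_alert": "multicore"}
personal_collection = {"column store benchmarks"}   # filed by hand

def rerun(name):
    """Re-create a saved search against the current document set."""
    return [d for d in documents if saved_searches[name] in d]

print(rerun("my_alert"), personal_collection)
```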

153 citations


Journal ArticleDOI
TL;DR: For multiple data systems to cooperate with each other, they must understand each other’s schemas; without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel.
Abstract: When independent parties develop database schemas for the same domain, they will almost always be quite different from each other. These differences are referred to as semantic heterogeneity, which also appears in the presence of multiple XML documents, Web services, and ontologies—or more broadly, whenever there is more than one way to structure a body of data. The presence of semi-structured data exacerbates semantic heterogeneity, because semi-structured schemas are much more flexible to start with. For multiple data systems to cooperate with each other, they must understand each other’s schemas. Without such understanding, the multitude of data sources amounts to a digital version of the Tower of Babel.
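A minimal sketch of what cooperation across heterogeneous schemas requires (all field names invented for illustration): once a correspondence between the two schemas is discovered or supplied, records can be translated mechanically. Finding that correspondence is the hard part.

```python
# Two independently designed schemas for the same domain.
source_record = {"emp_name": "Ada", "annual_pay": 120000}

# The part schema matching must discover or be given: the field mapping.
field_map = {"emp_name": "full_name", "annual_pay": "salary_usd"}

def translate(record, mapping):
    """Rewrite a record from the source schema into the target schema."""
    return {mapping[field]: value for field, value in record.items()}

print(translate(source_record, field_map))  # {'full_name': 'Ada', 'salary_usd': 120000}
```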

144 citations


Journal ArticleDOI
Dorian Birsan
TL;DR: In a world of increasingly complex computing requirements, software developers are continually searching for that ultimate, universal architecture that allows them to develop high-quality applications productively.
Abstract: In a world of increasingly complex computing requirements, we as software developers are continually searching for that ultimate, universal architecture that allows us to productively develop high-quality applications. This quest has led to the adoption of many new abstractions and tools. Some of the most promising recent developments are the new pure plug-in architectures. What began as a callback mechanism to extend an application has become the very foundation of applications themselves. Plug-ins are no longer just add-ons to applications; today’s applications are made entirely of plug-ins. This field has matured quite a bit in the past few years, with significant contributions from a number of successful projects.
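A pure plug-in architecture of the kind described can be reduced to a toy kernel: a registry of extension points that the "application" does nothing but consult. This sketch is mine, not any particular platform's API.

```python
# The entire "application": an extension-point registry.
registry = {}

def plugin(extension_point):
    """Decorator: register a function as a contribution to an extension point."""
    def register(fn):
        registry.setdefault(extension_point, []).append(fn)
        return fn
    return register

@plugin("startup")
def greet():
    print("hello from a plug-in")

# The host merely runs whatever plug-ins contributed to each extension point.
for contribution in registry.get("startup", []):
    contribution()
```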

79 citations


Journal ArticleDOI
TL;DR: There is a similar degree of confusion in the IT industry today, as terms such as service-oriented architecture, grid, utility computing, on-demand, adaptive enterprise, data center automation, and virtualization are bandied about.
Abstract: In a well-known fable, a group of blind men are asked to describe an elephant. Each encounters a different part of the animal and, not surprisingly, provides a different description. We see a similar degree of confusion in the IT industry today, as terms such as service-oriented architecture, grid, utility computing, on-demand, adaptive enterprise, data center automation, and virtualization are bandied about. As when listening to the blind men, it can be difficult to know what reality lies behind the words, whether and how the different pieces fit together, and what we should be doing about the animal(s) that are being described. (Of course, in the case of the blind men, we did not also have marketing departments in the mix!)

66 citations


Journal ArticleDOI
Jim Gray, Mark Compton
TL;DR: Column stores, which store data column-wise rather than record-wise, have enjoyed a rebirth, mostly to accommodate sparse tables, as well as to optimize bandwidth.
Abstract: We live in a time of extreme change, much of it precipitated by an avalanche of information that otherwise threatens to swallow us whole. Under the mounting onslaught, our traditional relational database constructs—always cumbersome at best—are now clearly at risk of collapsing altogether. In fact, rarely do you find a DBMS anymore that doesn’t make provisions for online analytic processing. Decision trees, Bayes nets, clustering, and time-series analysis have also become part of the standard package, with allowances for additional algorithms yet to come. Also, text, temporal, and spatial data access methods have been added—along with associated probabilistic logic, since a growing number of applications call for approximated results. Column stores, which store data column-wise rather than record-wise, have enjoyed a rebirth, mostly to accommodate sparse tables, as well as to optimize bandwidth.
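The column-store idea mentioned at the end can be shown in miniature (my illustration): storing by column means absent values simply are not stored, and a scan over one attribute touches only that attribute's data.

```python
# The same sparse table, row-wise and column-wise.
rows = [
    {"id": 1, "price": 9.5},                     # no "color": sparse row
    {"id": 2, "price": 3.0, "color": "red"},
]

columns = {}
for i, row in enumerate(rows):
    for attr, value in row.items():
        columns.setdefault(attr, {})[i] = value  # column -> {row index: value}

# Aggregating one attribute reads only that column (the bandwidth win).
print(sum(columns["price"].values()))
```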

65 citations


Journal ArticleDOI
TL;DR: Multicore is the new hot topic in the latest round of CPUs from Intel, AMD, Sun, etc.
Abstract: Multicore is the new hot topic in the latest round of CPUs from Intel, AMD, Sun, etc. With clock speed increases becoming more and more difficult to achieve, vendors have turned to multicore CPUs as the best way to gain additional performance. Customers are excited about the promise of more performance through parallel processors for the same real estate investment.

55 citations


Journal ArticleDOI
Dean Jacobs
TL;DR: A wide range of online applications is available, including e-mail, human resources, business analytics, CRM (customer relationship management), and ERP (enterprise resource planning).
Abstract: While the practice of outsourcing business functions such as payroll has been around for decades, its realization as online software services has only recently become popular. In the online service model, a provider develops an application and operates the servers that host it. Customers access the application over the Internet using industry-standard browsers or Web services clients. A wide range of online applications, including e-mail, human resources, business analytics, CRM (customer relationship management), and ERP (enterprise resource planning), are available.

Journal ArticleDOI
TL;DR: The authors show that spam is everywhere, clogging the inboxes of e-mail users worldwide and eroding the productivity gains afforded by the advent of information technology.
Abstract: Spam is everywhere, clogging the inboxes of e-mail users worldwide. Not only is it an annoyance, it erodes the productivity gains afforded by the advent of information technology. Workers plowing through hours of legitimate e-mail every day also must contend with removing a significant amount of illegitimate e-mail. Automated spam filters have dramatically reduced the amount of spam seen by the end users who employ them, but the amount of training required rivals the amount of time needed simply to delete the spam without the assistance of a filter.
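The filters the abstract refers to are commonly Bayesian; as a toy sketch of that idea (tiny invented training set, not a production tokenizer):

```python
from collections import Counter
import math

spam_words = Counter("win cash now win prize".split())
ham_words = Counter("meeting notes attached see agenda".split())

def spam_score(message):
    """Log-odds that a message is spam, with add-one smoothing per word."""
    score = 0.0
    for word in message.lower().split():
        p_spam = (spam_words[word] + 1) / (sum(spam_words.values()) + 2)
        p_ham = (ham_words[word] + 1) / (sum(ham_words.values()) + 2)
        score += math.log(p_spam / p_ham)
    return score

print(spam_score("win a prize now") > 0)    # True: leans spam
print(spam_score("see meeting notes") > 0)  # False: leans ham
```

The training burden the abstract complains about corresponds to building those word counts from hand-labeled mail.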

Journal ArticleDOI
TL;DR: In this essay I take what might seem a paradoxical position: I endorse the techniques that some programmers claim make code self-documenting and encourage the development of programs that do “automatic documentation,” but I also contend that these methods cannot provide the documentation necessary for reliable and maintainable code.
Abstract: In this essay I take what might seem a paradoxical position. I endorse the techniques that some programmers claim make code self-documenting and encourage the development of programs that do “automatic documentation.” Yet I also contend that these methods cannot provide the documentation necessary for reliable and maintainable code. They are only a rough aid, and even then help with only one or two aspects of documentation—not including the most important ones.
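The essay's distinction can be shown with a sketch (example mine, not the author's): careful naming documents what the code does, but not the reasoning and contracts around it.

```python
# Self-documenting style: names make the "what" readable without comments.
def monthly_payment(principal, annual_rate, months):
    monthly_rate = annual_rate / 12
    return principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)

# What names alone cannot convey, and what the essay says still needs prose:
# why this amortization formula applies, that annual_rate is a fraction (not
# a percentage), that months must be positive, and what callers may rely on.
print(round(monthly_payment(100_000, 0.06, 360), 2))  # about 599.55
```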

Journal ArticleDOI
TL;DR: Research confirms that the many and varied strains of UML Fever1 continue to spread worldwide, indiscriminately infecting software analysts, engineers, and managers alike; one of the fever's most serious side effects is a significant increase in both the cost and duration of developing software products.
Abstract: The Institute of Infectious Diseases has recently published research confirming that the many and varied strains of UML Fever1 continue to spread worldwide, indiscriminately infecting software analysts, engineers, and managers alike. One of the fever’s most serious side effects has been observed to be a significant increase in both the cost and duration of developing software products. This increase is largely attributable to a decrease in productivity resulting from fever-stricken individuals investing time and effort in activities that are of little or no value to producing deliverable products. For example, afflictees of Open Loop Fever continue to create UML (Unified Modeling Language) diagrams for unknown stakeholders. Victims of Comfort Zone Fever remain glued in the modeling space, postponing the development of software. And those suffering from Gnat’s Eyebrow Fever continue creating models that glorify each and every Boolean value of prospective software implementations.

Journal ArticleDOI
TL;DR: In a data-modeling class the author learned how to build a schema for information, and that accurate schemas require an a priori agreement on both the general structure of the information and the vocabularies used by all communities producing, processing, or consuming it.
Abstract: In that class I learned how to build a schema for my information, and I learned that to obtain an accurate schema there must be a priori knowledge of the structure and properties of the information to be modeled. I also learned the ER (entity-relationship) model as a basic tool for all further data modeling, as well as the need for an a priori agreement on both the general structure of the information and the vocabularies used by all communities producing, processing, or consuming this information.

Journal ArticleDOI
TL;DR: Structuring concurrent software in a way that meets the increasing scalability requirements while remaining simple, structured, and safe enough to allow mortal programmers to construct ever-more complex systems is a major engineering challenge.
Abstract: Much of today’s software deals with multiple concurrent tasks. Web browsers support multiple concurrent HTTP connections, graphical user interfaces deal with multiple windows and input devices, and Web and DNS servers handle concurrent connections or transactions from large numbers of clients. The number of concurrent tasks that need to be handled increases as software grows more complex. Structuring concurrent software in a way that meets the increasing scalability requirements while remaining simple, structured, and safe enough to allow mortal programmers to construct ever-more complex systems is a major engineering challenge.
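One common answer to this structuring problem is event-driven multiplexing of many tasks onto few threads. As a modern Python rendition (asyncio postdates this 2005 article; the simulated I/O is illustrative):

```python
import asyncio

async def handle_client(client_id):
    """One concurrent task; the sleep stands in for network I/O."""
    await asyncio.sleep(0.1)
    return f"client {client_id} done"

async def main():
    # Hundreds of concurrent tasks share one thread: no locks, no data races,
    # which is one way to stay "simple, structured, and safe" while scaling.
    results = await asyncio.gather(*(handle_client(i) for i in range(500)))
    print(len(results), "tasks completed")

asyncio.run(main())
```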

Journal ArticleDOI
Paul Strong
TL;DR: All of these terms capture some aspect of the big picture—they all describe parts of solutions that seek to address essentially the same problems in similar ways—but they’re never quite synonymous.
Abstract: I have to admit a great measure of sympathy for the IT populace at large, when it is confronted by the barrage of hype around grid technology, particularly within the enterprise. Individual vendors have attempted to plant their flags in the notionally virgin technological territory and proclaim it as their own, using terms such as grid, autonomic, self-healing, self-managing, adaptive, utility, and so forth. Analysts, well, analyze and try to make sense of it all, and in the process each independently creates his or her own map of this terra incognita, naming it policy-based computing, organic computing, and so on. Unfortunately, this serves only to further muddy the waters for most people. All of these terms capture some aspect of the big picture—they all describe parts of solutions that seek to address essentially the same problems in similar ways—but they’re never quite synonymous.

Journal ArticleDOI
Stuart Feldman
TL;DR: Quality assurance isn’t just testing, or analysis, or wishful thinking; it is a way of life.
Abstract: Quality assurance isn’t just testing, or analysis, or wishful thinking. Although it can be boring, difficult, and tedious, QA is nonetheless essential. Ensuring that a system will work when delivered requires much planning and discipline. Convincing others that the system will function properly requires even more careful and thoughtful effort. QA is performed through all stages of the project, not just slapped on at the end. It is a way of life.

Journal ArticleDOI
TL;DR: This article offers an overview of what is happening on the Internet right now and what the authors expect to happen in the coming months.
Abstract: Counterpane Internet Security Inc. monitors more than 450 networks in 35 countries, in every time zone. In 2004 we saw 523 billion network events, and our analysts investigated 648,000 security “tickets.” What follows is an overview of what’s happening on the Internet right now, and what we expect to happen in the coming months.

Journal ArticleDOI
TL;DR: The National Center for Biotechnology Information is responsible for massive amounts of data, including the largest public bibliographic database in biomedicine, the U.S. national DNA sequence database, a free online full-text research article database, a reference set of genes, genomes, and chromosomes, online text search and retrieval systems, and specialized molecular biology data search engines.
Abstract: The National Center for Biotechnology Information is responsible for massive amounts of data. A partial list includes the largest public bibliographic database in biomedicine, the U.S. national DNA sequence database, an online free full text research article database, assembly, annotation, and distribution of a reference set of genes, genomes, and chromosomes, online text search and retrieval systems, and specialized molecular biology data search engines. At this writing, NCBI receives about 50 million Web hits per day, at peak rates of about 1,900 hits per second, and about 400,000 BLAST searches per day from about 2.5 million users. The Web site transfers about 0.6 terabytes per day, and people interested in local copies of bulk data FTP about 1.2 terabytes per day.

Journal ArticleDOI
Keith Stobie
TL;DR: The increasing size and complexity of software, coupled with concurrency and distributed systems, has made the ineffectiveness of relying only on handcrafted tests apparent; good design, static checking, and good unit testing are needed as well.
Abstract: The increasing size and complexity of software, coupled with concurrency and distributed systems, has made apparent the ineffectiveness of using only handcrafted tests. The misuse of code coverage and avoidance of random testing has exacerbated the problem. We must start again, beginning with good design (including dependency analysis), good static checking (including model property checking), and good unit testing (including good input selection). Code coverage can help select and prioritize tests to make you more efficient, as can the all-pairs technique for controlling the number of configurations. Finally, testers can use models to generate test coverage and good stochastic tests, and to act as test oracles.
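The all-pairs technique mentioned for controlling configuration count can be sketched greedily (parameters and values below are invented): pick configurations until every pair of parameter values has appeared together at least once.

```python
from itertools import combinations, product

params = {"os": ["linux", "windows"], "db": ["mysql", "pg"], "arch": ["x86", "arm"]}
names = list(params)
all_configs = [dict(zip(names, vals)) for vals in product(*params.values())]

def pairs(config):
    """All parameter-value pairs a single configuration exercises."""
    return {frozenset([(a, config[a]), (b, config[b])])
            for a, b in combinations(names, 2)}

uncovered = set().union(*(pairs(c) for c in all_configs))
suite = []
while uncovered:
    # Greedily take the configuration covering the most still-uncovered pairs.
    best = max(all_configs, key=lambda c: len(pairs(c) & uncovered))
    suite.append(best)
    uncovered -= pairs(best)

print(len(suite), "of", len(all_configs), "configurations cover every pair")
```

A greedy pick is not guaranteed minimal, but it shows why pairwise suites stay small even as the full cross-product of configurations explodes.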

Journal ArticleDOI
TL;DR: The author starts from an odd yet surprisingly uncontroversial assertion, that programmers are human, and uses it as a premise to explore how to improve the programmer's lot.
Abstract: I would like to start out this article with an odd, yet surprisingly uncontroversial assertion, which is this: programmers are human. I wish to use this as a premise to explore how to improve the programmer’s lot. So, please, no matter your opinion on the subject, grant me this assumption for the sake of argument.

Journal ArticleDOI
TL;DR: The advent of SMP (symmetric multiprocessing) added a new degree of scalability to computer systems by leveraging multiple processors to obtain large gains in total system performance.
Abstract: The advent of SMP (symmetric multiprocessing) added a new degree of scalability to computer systems. Rather than deriving additional performance from an incrementally faster microprocessor, an SMP system leverages multiple processors to obtain large gains in total system performance. Parallelism in software allows multiple jobs to execute concurrently on the system, increasing system throughput accordingly. Given sufficient software parallelism, these systems have proved to scale to several hundred processors.
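How far such scaling goes depends on how much of the work is parallel, which Amdahl's law captures (the formula is standard; the 95-percent figure below is an arbitrary illustration, not from the article):

```python
def speedup(parallel_fraction, processors):
    """Amdahl's law: serial work bounds speedup no matter how many CPUs you add."""
    serial = 1 - parallel_fraction
    return 1 / (serial + parallel_fraction / processors)

for p in (2, 8, 64, 512):
    print(f"{p:>3} processors: {speedup(0.95, p):.1f}x")
```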

Journal ArticleDOI
TL;DR: This article looks at a few places where the world could easily be a better place, but isn't, and builds some insight as to why.
Abstract: There are plenty of security problems that have solutions. Yet, our security problems don’t seem to be going away. What’s wrong here? Are consumers being offered snake oil and rejecting it? Are they not adopting solutions they should be adopting? Or, is there something else at work, entirely? We’ll look at a few places where the world could easily be a better place, but isn’t, and build some insight as to why.

Journal ArticleDOI
Joseph G. Dadzie
TL;DR: Software patching is an increasingly important aspect of today's computing environment, as the volume, complexity, and number of configurations under which a piece of software runs have grown considerably.
Abstract: Software patching is an increasingly important aspect of today’s computing environment as the volume, complexity, and number of configurations under which a piece of software runs have grown considerably. Software architects and developers do everything they can to build secure, bug-free software products. To ensure quality, development teams leverage all the tools and techniques at their disposal. For example, software architects incorporate security threat models into their designs, and QA engineers develop automated test suites that include sophisticated code-defect analysis tools.

Journal ArticleDOI
Kevin Fall, Steve McCanne
TL;DR: Why is it that an application that works fine in your office can become virtually useless over the WAN?
Abstract: Why does an application that works just fine over a LAN come to a grinding halt across the wide-area network? You may have experienced this firsthand when trying to open a document from a remote file share or remotely logging in over a VPN to an application running in headquarters. Why is it that an application that works fine in your office can become virtually useless over the WAN? If you think it’s simply because there’s not enough bandwidth in the WAN, then you don’t know jack about network performance.
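The article's jab has simple arithmetic behind it (toy numbers, mine): a chatty application pays one round trip per exchange, so latency, not bandwidth, dominates over the WAN.

```python
def total_seconds(round_trips, rtt_s, bytes_moved, bandwidth_bps):
    """Time for a chatty transfer: per-exchange latency plus raw transfer time."""
    return round_trips * rtt_s + bytes_moved * 8 / bandwidth_bps

# Fetching 1 MB via 200 sequential request/response exchanges, same bandwidth:
lan = total_seconds(200, 0.0005, 1_000_000, 100_000_000)  # 0.5 ms RTT
wan = total_seconds(200, 0.080,  1_000_000, 100_000_000)  # 80 ms RTT
print(f"LAN: {lan:.2f}s   WAN: {wan:.2f}s")  # ~0.18s vs ~16s at equal bandwidth
```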

Journal ArticleDOI
TL;DR: In the extreme case, an XML vocabulary can effectively say that there are no rules at all beyond those required of all well-formed XML, and XML storage systems are typically built to handle sparse data gracefully.
Abstract: Vocabulary designers can require XML data to be perfectly regular, or they can allow a little variation, or a lot. In the extreme case, an XML vocabulary can effectively say that there are no rules at all beyond those required of all well-formed XML. Because XML syntax records only what is present, not everything that might be present, sparse data does not make the XML representation awkward; XML storage systems are typically built to handle sparse data gracefully.
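Sparseness in XML is easy to see in a small fragment (my illustration): a record without a field simply omits the element, and consumers probe for what is present.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<people>
  <person><name>Ada</name><phone>555-0100</phone></person>
  <person><name>Bob</name></person>
</people>
""")

for person in doc.findall("person"):
    # findtext returns None when the element is absent: no placeholder needed.
    print(person.findtext("name"), person.findtext("phone"))
```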


Journal ArticleDOI
Fred Kitson
TL;DR: This article reveals some of the current state-of-the-art "magic" and the research challenges of context-aware services, a new generation of rich, interactive media services for mobile devices.
Abstract: Many future mobile applications are predicated on the existence of rich, interactive media services. The promise and challenge of such services is to provide applications under the most hostile conditions - and at low cost to a user community that has high expectations. Context-aware services require information about who, where, when, and what a user is doing and must be delivered in a timely manner with minimum latency. This article reveals some of the current state-of-the-art "magic" and the research challenges.

Journal ArticleDOI
Bill Hoffman
TL;DR: Internet services are becoming more and more a part of our daily lives; we depend on them and are beginning to assume their ubiquity as we do that of the phone system and electricity grid.
Abstract: Internet services are becoming more and more a part of our daily lives. We derive value from them, depend on them, and are now beginning to assume their ubiquity as we do the phone system and electricity grid. The implementation of Internet services, though, is an unsolved problem, and Internet services remain far from fulfilling their potential in our world.