
Showing papers by "Rob Kooper published in 2011"


Journal ArticleDOI
TL;DR: A prototype Web-based virtual sensor system developed at NCSA creates real-time customized data streams from raw sensor data and can be utilized to provide customized real-time access to significant data resources such as the NEXRAD system.
Abstract: With the advent of new instrumentation and sensors, more diverse types and increasing amounts of data are becoming available to environmental researchers and practitioners. However, accessing and integrating these data into forms usable for environmental analysis and modeling can be highly time-consuming and challenging, particularly in real time. For example, radar-rainfall data are a valuable resource for hydrologic modeling because of their high resolution and pervasive coverage. However, radar-rainfall data from the Next Generation Radar (NEXRAD) system continue to be underutilized outside of the operational environment because of limitations in access and availability of research-quality data products, especially in real time. This paper addresses these issues through the development of a prototype Web-based virtual sensor system at NCSA that creates real-time customized data streams from raw sensor data. These data streams are supported by metadata, including provenance information. The system uses workflow composition and publishing tools to facilitate creation and publication (as Web services) of user-created virtual sensors. To demonstrate the system, two case studies are presented. In the first case study, a network of point-based virtual precipitation sensors is deployed to analyze the relationship between radar-rainfall measurements, and in the second case study, a network of polygon-based virtual precipitation sensors is deployed to be used as input to urban flooding models. These case studies illustrate how, with the addition of some application-specific information, this general-purpose system can be utilized to provide customized real-time access to significant data resources such as the NEXRAD system. Additionally, the creation of new types of virtual sensors is discussed, using the example of virtual temperature sensors.
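
Because each published virtual sensor is exposed as a Web service, a client can consume it by polling a URL. The sketch below, in Java, shows such a poll for a point-based virtual precipitation sensor; the endpoint URL, query parameters, and response handling are hypothetical illustrations, not the actual NCSA service interface.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // Hypothetical client for a point-based virtual precipitation sensor.
    // The endpoint and query parameters are illustrative only.
    public class VirtualSensorClient {
        public static void main(String[] args) throws Exception {
            String endpoint = "https://example.org/virtual-sensors/precipitation"
                    + "?lat=41.88&lon=-87.63&interval=15min";   // made-up URL
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint)).GET().build();
            // Each poll returns the latest derived rainfall value plus its provenance metadata.
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.body());
        }
    }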

23 citations


Journal ArticleDOI
TL;DR: Digital humanities collaboration is shown to require the creation and deployment of tools for sharing whose function is to improve collaboration involving large–scale data repository analysis among multiple sites, academic disciplines, and participants through data sharing, software sharing, and knowledge sharing practices.
Abstract: This paper explores infrastructure supporting humanities–computer science research in large–scale image data by asking: Why is collaboration a requirement for work within digital humanities projects? What is required for fruitful interdisciplinary collaboration? What are the technical and intellectual approaches to constructing such an infrastructure? What are the challenges associated with digital humanities collaborative work? We reveal that digital humanities collaboration requires the creation and deployment of tools for sharing whose function is to improve collaboration involving large–scale data repository analysis among multiple sites, academic disciplines, and participants through data sharing, software sharing, and knowledge sharing practices.

17 citations


Journal ArticleDOI
TL;DR: Tupelo has enabled recent work creating e-Science cyberenvironments that serve distributed, active scientific communities, allowing researchers to develop, coordinate, and share datasets, documents, and computational models while preserving the process documentation and other contextual information needed for distribution and archiving.
Abstract: The Tupelo semantic content management middleware implements Knowledge Spaces that enable scientists to integrate information into a comprehensive research record as they work with existing tools and domain-specific applications. Knowledge Spaces combine approaches that have demonstrated success in automating parts of this integration activity, including content management systems for domain-neutral management of data, workflow technologies for management of computation and analysis, and semantic web technologies for extensible, portable, citable management of descriptive information and other metadata. Tupelo's ‘Context’ facility and its associated semantic operations both allow existing data representations and tools to be plugged in, and also provide a semantic ‘glue’ of important associative relationships that span the research record, such as provenance, social networks, and annotation. Tupelo has enabled the recent work creating e-Science cyberenvironments to serve distributed, active scientific communities, allowing researchers to develop, coordinate and share datasets, documents, and computational models, while preserving process documentation and other contextual information needed to produce an integrated research record suitable for distribution and archiving. Copyright © 2011 John Wiley & Sons, Ltd.
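
Tupelo's own API is not reproduced here, but the underlying idea of a 'Context' — a scoped bundle of descriptive triples (provenance, annotation, social links) that travels with the research record — can be sketched with generic RDF named graphs. The example below uses Apache Jena purely as a stand-in, and all URIs are made up.

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.DatasetFactory;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Property;
    import org.apache.jena.rdf.model.Resource;

    // Illustrative only: models the idea of a context-scoped research record
    // with generic RDF named graphs (not the Tupelo API itself).
    public class KnowledgeSpaceSketch {
        public static void main(String[] args) {
            Model provenance = ModelFactory.createDefaultModel();
            Property wasDerivedFrom = provenance.createProperty("http://example.org/prov#", "wasDerivedFrom");
            Resource result = provenance.createResource("http://example.org/data/flood-model-run-42");
            Resource input  = provenance.createResource("http://example.org/data/rainfall-2011-05-01");
            result.addProperty(wasDerivedFrom, input);   // provenance link spanning the record

            // A "context" here is simply a named graph the triples live in, so the
            // descriptive metadata can be exported or archived alongside the data.
            Dataset knowledgeSpace = DatasetFactory.create();
            knowledgeSpace.addNamedModel("http://example.org/context/project-cyberenvironment", provenance);
            knowledgeSpace.getNamedModel("http://example.org/context/project-cyberenvironment")
                          .write(System.out, "TURTLE");
        }
    }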

13 citations


Proceedings ArticleDOI
TL;DR: This paper addresses the problem of stitching Giga Pixel images from airborne images acquired over multiple flight paths of Costa Rica in 2005 by utilizing the coarse georeferencing information for initial image grouping followed by an intensity-based stitching of groups of images.
Abstract: This paper addresses the problem of stitching Giga Pixel images from airborne images acquired over multiple flight paths of Costa Rica in 2005. The set of input images contains about 10,158 images, each of size around 4072x4072 pixels, with very coarse georeferencing information (latitude and longitude of each image). Given the spatial coverage and resolution of the input images, the final stitched color image is 294,847 by 269,195 pixels (79.3 Giga Pixels) and corresponds to 238.2 GigaBytes. An assembly of such large images requires either hardware with large shared memory or algorithms using disk access in tandem with available RAM providing data for local image operations. In addition to I/O operations, the computations needed to stitch together image tiles involve at least one image transformation and multiple comparisons to place the pixels into a pyramid representation for fast dissemination. The motivation of our work is to explore the utilization of multiple hardware architectures (e.g., multicore servers, computer clusters) and parallel computing to minimize the time needed to stitch Giga Pixel images. Our approach is to utilize the coarse georeferencing information for initial image grouping followed by an intensity-based stitching of groups of images. This group-based stitching is highly parallelizable. The stitching process results in image patches that can be cropped to fit a tile of an image pyramid frequently used as a data structure for fast image access and retrieval. We report our experimental results obtained when stitching a four Giga Pixel image from the input images at one fourth of their original spatial resolution using a single core on our eight core server, and our preliminary results for the entire 79.3 Giga Pixel image obtained using a 120 core computer cluster.
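
The grouping step described above — binning images by their coarse latitude/longitude before intensity-based stitching within each bin — is what makes the workload highly parallelizable. A minimal sketch of such a binning pass is given below; the grid cell size and record fields are illustrative assumptions, not the parameters used in the paper.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of the grouping step: bin images by their coarse georeference so
    // each bin can be stitched independently (and in parallel). The cell size
    // and record fields are illustrative, not the paper's values.
    public class GeoGrouping {
        record AerialImage(String path, double lat, double lon) {}

        static Map<String, List<AerialImage>> groupByCell(List<AerialImage> images, double cellDeg) {
            Map<String, List<AerialImage>> groups = new HashMap<>();
            for (AerialImage img : images) {
                long row = (long) Math.floor(img.lat() / cellDeg);
                long col = (long) Math.floor(img.lon() / cellDeg);
                groups.computeIfAbsent(row + ":" + col, k -> new ArrayList<>()).add(img);
            }
            return groups;   // each group is then refined by intensity-based stitching
        }

        public static void main(String[] args) {
            List<AerialImage> imgs = List.of(
                new AerialImage("tile_0001.tif", 9.934, -84.087),
                new AerialImage("tile_0002.tif", 9.935, -84.086));
            System.out.println(groupByCell(imgs, 0.01));
        }
    }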

7 citations


Proceedings ArticleDOI
05 Dec 2011
TL;DR: A Software Server, a background process that, in conjunction with a central repository of lightweight wrapper scripts, allows functionality within heterogeneous software to be called in a simple and consistent manner, is described, along with a RESTful interface of URL endpoints that lets any programming/scripting language capable of accessing URLs use software functionality as a black box.
Abstract: In this paper we describe a Software Server, a background process that in conjunction with a central repository of lightweight wrapper scripts allows functionality within heterogeneous software to be called in a simple and consistent manner. The key role of the Software Server is to provide a common interface to software functionality in a manner that can be programmed against, in essence re-introducing an API to compiled code. Using the Java Restlet framework, we provide a RESTful interface consisting of URL endpoints allowing any programming/scripting language capable of accessing URLs to utilize software functionality as a black box. In addition to being widely accessible, the RESTful interface allows for a secondary role for Software Servers by giving them the ability to turn any traditional desktop software into a cloud-based web service. In this paper we describe these Software Servers, the scripts we use to wrap primarily GUI-based software, and show how these servers allow software to be called and interconnected into workflows across distributed machines. Finally, quantitative experiments showing the feasibility of the described Software Servers on a number of applications are presented.
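
From a client's point of view, a Software Server reduces a wrapped desktop application to an HTTP call. The Java sketch below posts a file to a hypothetical endpoint and reads back the result; the host, port, and URL layout are illustrative assumptions rather than the endpoint scheme defined in the paper.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    // Hypothetical call to a Software Server endpoint that wraps a desktop
    // application's "convert" operation; the URL scheme below is illustrative.
    public class SoftwareServerCall {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://localhost:8182/software/ImageMagick/convert/png"))
                .POST(HttpRequest.BodyPublishers.ofFile(Path.of("input.tif")))
                .build();
            // The response points at (or contains) the produced output file.
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }

Because the interface is plain HTTP, the same call can be issued from any language or tool that can access URLs, which is what lets the wrapped applications be chained into workflows across distributed machines.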

7 citations


Journal ArticleDOI
01 Jan 2011
TL;DR: A distributed setup for the viewing of digital data in a large number of formats is used to build a “universal” viewer by converting given files from a large set of file formats to a relatively small subset of formats that are renderable by a given viewer.
Abstract: In this paper we present a distributed setup for the viewing of digital data within a large number of formats. Through the use of software servers, 3rd party software is automated and made available as functions that can be called within other programs. Using multiple machines containing software and running software servers, a conversion service is built. Being built on top of 3rd party software, the conversion service is easily extensible and thus can be made to support a large number of conversions. We use this service to build a “universal” viewer by converting given files among a large set of file formats to a relatively small subset of formats that are renderable by a given viewer (e.g., a web browser). We describe this service and the underlying software servers, as well as the future directions we plan to take.
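
One way to picture the “universal” viewer is as a path search over a graph whose nodes are file formats and whose edges are conversions offered by the wrapped 3rd party software: a file is viewable if some chain of conversions ends in a format the viewer can render. The sketch below illustrates that idea with a toy format graph; the formats and conversions listed are made up, not the service's actual capability set.

    import java.util.*;

    // Illustration of the idea: breadth-first search for a conversion chain
    // from an input format to any format the target viewer can render.
    public class ConversionPath {
        static List<String> findPath(Map<String, List<String>> graph, String from, Set<String> renderable) {
            Map<String, String> parent = new HashMap<>();
            Deque<String> queue = new ArrayDeque<>(List.of(from));
            parent.put(from, null);
            while (!queue.isEmpty()) {
                String fmt = queue.poll();
                if (renderable.contains(fmt)) {             // reached a viewable format
                    List<String> path = new ArrayList<>();
                    for (String f = fmt; f != null; f = parent.get(f)) path.add(0, f);
                    return path;
                }
                for (String next : graph.getOrDefault(fmt, List.of()))
                    if (!parent.containsKey(next)) { parent.put(next, fmt); queue.add(next); }
            }
            return List.of();                               // no conversion chain found
        }

        public static void main(String[] args) {
            Map<String, List<String>> graph = Map.of(        // made-up conversion graph
                "dwg", List.of("dxf"), "dxf", List.of("obj"), "obj", List.of("pdf", "png"));
            System.out.println(findPath(graph, "dwg", Set.of("png", "pdf", "html")));
        }
    }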

5 citations


Proceedings ArticleDOI
TL;DR: The goal of the work is to understand the computational scalability of web-based dissemination using image pyramids for these large image scans, as well as the preservation aspects of the data.
Abstract: We have investigated the computational scalability of image pyramid building needed for dissemination of very large image data. The sources of large images include high resolution microscopes and telescopes, remote sensing and airborne imaging, and high resolution scanners. The term 'large' is understood from a user perspective, meaning either larger than the display size or larger than the memory/disk available to hold the image data. The application drivers for our work are digitization projects such as the Lincoln Papers project (each image scan is about 100-150MB, or about 5000x8000 pixels, with the total number of scans around 200,000) and the UIUC library scanning project for historical maps from the 17th and 18th centuries (a smaller number of scans but larger images). The goal of our work is to understand the computational scalability of web-based dissemination using image pyramids for these large image scans, as well as the preservation aspects of the data. We report our computational benchmarks for (a) building image pyramids to be disseminated using the Microsoft Seadragon library, (b) a computation execution approach using hyper-threading to generate image pyramids and to utilize the underlying hardware, and (c) an image pyramid preservation approach using various hard drive configurations of Redundant Array of Independent Disks (RAID) drives for input/output operations. The benchmarks are obtained with a map (334.61 MB, JPEG format, 17591x15014 pixels). The discussion combines the speed and preservation objectives.
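
As a rough guide to the workload, the pyramid for the benchmark map can be sized by repeatedly halving the image and counting tiles per level. The sketch below does this arithmetic for the 17591x15014 pixel map; the 256-pixel tile size is an assumption, since the tile size used in the benchmarks is not restated here.

    // Back-of-the-envelope pyramid sizing for the benchmark map.
    // The 256-pixel tile size is an assumption.
    public class PyramidSize {
        public static void main(String[] args) {
            int w = 17591, h = 15014, tile = 256;
            long totalTiles = 0;
            int level = 0;
            while (true) {
                long cols = (w + tile - 1) / tile, rows = (h + tile - 1) / tile;
                totalTiles += cols * rows;
                System.out.printf("level %d: %d x %d px, %d tiles%n", level, w, h, cols * rows);
                if (w == 1 && h == 1) break;
                w = Math.max(1, (w + 1) / 2);   // halve dimensions at each coarser level
                h = Math.max(1, (h + 1) / 2);
                level++;
            }
            System.out.println("total tiles: " + totalTiles);
        }
    }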

4 citations


05 Dec 2011
TL;DR: This work presents a framework for the execution and dissemination of customizable content-based file comparison methods, and provides an implementation of this abstraction as a Java API and a RESTful service API.
Abstract: We present a framework for the execution and dissemination of customizable content-based file comparison methods. Given digital objects such as files, database entries, or in-memory data structures, we are interested in establishing their proximity (i.e. similarity or dissimilarity) based on the information encoded within the files (text, images, 3D, video, audio, etc.). We provide an implementation of this abstraction as a Java API and a RESTful service API. This implementation includes a set of tools to support access and execution of content-based comparisons both on local and distributed computational resources, and a library of methods focused on images, 3D models, text, and documents composed of the three. We provide three use cases to demonstrate the use of the framework: (1) content-based retrieval of handwritten text, (2) quantifying information loss, and (3) evaluating image segmentation accuracy in cell biology.
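
The framework's core abstraction — a pluggable measure that maps two digital objects to a proximity score — can be illustrated with a small interface and a toy text measure. The names and the Jaccard example below are illustrative only, not the framework's actual Java API.

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch of the abstraction: a pluggable measure that maps two digital
    // objects to a proximity score. Names are illustrative, not the real API.
    public class ComparisonSketch {
        interface Measure<T> {
            double compare(T a, T b);   // 0 = identical, 1 = no overlap, by convention here
        }

        // Toy text measure: Jaccard distance over word sets.
        static class WordJaccard implements Measure<String> {
            public double compare(String a, String b) {
                Set<String> wa = new HashSet<>(Arrays.asList(a.toLowerCase().split("\\s+")));
                Set<String> wb = new HashSet<>(Arrays.asList(b.toLowerCase().split("\\s+")));
                Set<String> union = new HashSet<>(wa);
                union.addAll(wb);
                wa.retainAll(wb);                       // wa now holds the intersection
                return 1.0 - (double) wa.size() / union.size();
            }
        }

        public static void main(String[] args) {
            System.out.println(new WordJaccard().compare("census record search", "census image search"));
        }
    }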

2 citations


Journal ArticleDOI
TL;DR: The goal is to minimize the manual labor needed to transcribe handwritten entries in the census images and to deliver a system capable of providing computationally scalable search services.
Abstract: Individuals and humanities researchers alike recognize the benefits of search services for censuses, which contain important information on ancestral populations.1 In April 2012, the raw US census data from 1940 will be made available to the public for the first time in digital format. The census is being digitized by the National Archives and Records Administration and the US Census Bureau. Consisting of digitally scanned microfilm rolls, nearly 3.25 million photographs of the original census forms will be released (see Figure 1). Transcribing, organizing, and searching this very large (>18TB) corpus of images remains a resource-intensive task for other federal agencies. With databases of this type, a Soundex index, which encodes words based on how they sound to enable homophone matching, is often compiled. However, producing such an index is a tedious and time-consuming process, and one will not be released with the 1940 data. On the day of the data release, various commercial entities will also begin transcribing the handwritten content of the images, a task that will take thousands of trained laborers anywhere between 6 and 12 months. As a result, access to the searchable, transcribed data will be offered to the public at a cost by these various companies. Here, we describe our approach to image-based information retrieval to avoid the costly transcription process.2, 3 Our goal is to minimize the manual labor needed to transcribe handwritten entries in the census images and to deliver a system capable of providing computationally scalable search services. Understanding the achievable accuracy and levels of automation depends on solving several problems related to scalability and data management. We endeavor to provide a completely automated search capability that can build more accurate transcriptions over time using passive and active crowdsourcing (see Figure 2). Commercial entities typically outsource manual transcription of census forms and host text-based search services.
Figure 1. A digitized census form from the 1930 US census.
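
Because the abstract leans on Soundex as the traditional way census indexes enable homophone matching, a compact version of the classic American Soundex code is sketched below (it omits the special H/W merging rule); this is background illustration, not part of the authors' retrieval system.

    // Classic American Soundex: a letter plus three digits, so surnames that
    // sound alike collide. Simplified: the H/W merging rule is omitted.
    public class Soundex {
        private static final String CODES = "01230120022455012623010202"; // codes for a..z

        static String encode(String name) {
            String s = name.toUpperCase().replaceAll("[^A-Z]", "");
            if (s.isEmpty()) return "";
            StringBuilder out = new StringBuilder().append(s.charAt(0));
            char prev = CODES.charAt(s.charAt(0) - 'A');
            for (int i = 1; i < s.length() && out.length() < 4; i++) {
                char code = CODES.charAt(s.charAt(i) - 'A');
                if (code != '0' && code != prev) out.append(code);  // skip vowels and repeats
                prev = code;
            }
            while (out.length() < 4) out.append('0');               // pad to letter + 3 digits
            return out.toString();
        }

        public static void main(String[] args) {
            System.out.println(encode("Robert") + " " + encode("Rupert")); // both encode to R163
        }
    }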

2 citations


Proceedings ArticleDOI
27 Dec 2011
TL;DR: In this article, decision support is provided for selecting software and hardware architectures for content-based document comparison, by evaluating Java, C, CUDA C, and OpenCL implementations on CPU and GPU hardware; a power efficiency analysis of the resulting architectures is also provided.
Abstract: This paper aims to provide decision support for selecting software and hardware architecture for content-based document comparison. We evaluate Java, C, CUDA C and OpenCL implementations of an image characterization algorithm used for content-based document comparison on a CPU and NVIDIA and AMD graphics processing units (GPUs). Based on our experimental results, we conclude that the original Java implementation of the image characterization algorithm running on a CPU-based architecture can be accelerated by a factor of 6 if the Java code is re-implemented in C, or by a factor of almost 16 if the Java code is re-implemented in CUDA C and run on NVIDIA GTX 480 GPU hardware. We also provide a power efficiency analysis.
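
The reported acceleration factors are ratios of the Java baseline runtime to each re-implementation's runtime. The snippet below simply restates that arithmetic with placeholder timings; the numbers are not measurements from the paper.

    // How the acceleration factors are read: speedup = Java time / variant time.
    // The baseline timing below is a placeholder, not a measurement from the paper.
    public class Speedup {
        public static void main(String[] args) {
            double javaMs = 1600.0;                 // hypothetical CPU Java runtime
            double cMs = javaMs / 6.0;              // ~6x faster per the reported C result
            double cudaMs = javaMs / 16.0;          // ~16x faster per the reported CUDA result
            System.out.printf("C: %.0f ms, CUDA: %.0f ms%n", cMs, cudaMs);
        }
    }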

2 citations


Journal ArticleDOI
TL;DR: This work deployed existing image-pyramid stitching methods onto multicore and parallel architectures to benchmark how performance improves with the addition of computing nodes and explored the benefits of multiple hardware architectures and parallel computing to reduce the time needed to stitch very large images.
Abstract: Gigapixel and terapixel images are commonly viewed using a mosaic of smaller megapixel images. Stitching is used to create Google Maps from satellite imagery, panoramic views from photographic images, and high resolution images from microscopy tiles. Creating and disseminating these mosaics requires significant computational capacity. We deployed existing image-pyramid stitching methods onto multicore and parallel architectures to benchmark how performance improves with the addition of computing nodes. Our motivation is to explore the benefits of multiple hardware architectures (such as multicore servers and computer clusters) and parallel computing to reduce the time needed to stitch very large images. Our sample case is the processing and dissemination of airborne images acquired over multiple flight paths of Costa Rica.
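
Since each georeferenced group of images can be stitched independently, the natural mapping onto a multicore node is one stitching task per group in a thread pool. The sketch below shows that mapping; stitchGroup() is a placeholder for the intensity-based stitching step, and the group keys and file names are made up.

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Sketch of how group-based stitching maps onto a multicore node: each
    // georeferenced group is stitched as an independent task in a thread pool.
    public class ParallelStitch {
        static void stitchGroup(String cell, List<String> imagePaths) {
            // placeholder: register, blend, and write out the patch for this cell
            System.out.println(Thread.currentThread().getName() + " stitched " + cell);
        }

        public static void main(String[] args) throws InterruptedException {
            Map<String, List<String>> groups = Map.of(   // made-up group keys and files
                "993:-8409", List.of("tile_0001.tif", "tile_0002.tif"),
                "994:-8409", List.of("tile_0003.tif"));
            ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            groups.forEach((cell, paths) -> pool.submit(() -> stitchGroup(cell, paths)));
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }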