
Showing papers by "Mitre Corporation" published in 2008


Journal ArticleDOI
TL;DR: Major advances for the BioCreative II gene normalization task include broader participation (20 teams versus 8) and pooled system performance comparable to human experts, at over 90% agreement; these results show promise for tools that link the literature with biological databases.
Abstract: Background: The goal of the gene normalization task is to link genes or gene products mentioned in the literature to biological databases. This is a key step in an accurate search of the biological literature. It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different organisms). For BioCreative II, the task was to list the Entrez Gene identifiers for human genes or gene products mentioned in PubMed/MEDLINE abstracts. We selected abstracts associated with articles previously curated for human genes. We provided 281 expert-annotated
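As a rough illustration of what gene normalization involves, the sketch below maps text mentions to Entrez Gene identifiers via a small synonym lexicon. The lexicon entries are illustrative examples only; real systems add cross-organism disambiguation and fuzzy matching, which this sketch omits.

```python
# Minimal dictionary-based gene normalizer in the spirit of the BioCreative II
# task: map gene mentions in text to Entrez Gene identifiers.
# The lexicon below is a tiny illustrative sample, not curated data.

LEXICON = {
    "tp53": "7157",               # symbol -> Entrez Gene ID (example entries)
    "tumor protein p53": "7157",  # one gene, multiple surface forms
    "brca1": "672",
}

def normalize_mentions(text: str) -> set[str]:
    """Return the set of Entrez Gene IDs whose lexicon entry appears in text."""
    lowered = text.lower()
    return {gid for name, gid in LEXICON.items() if name in lowered}

ids = normalize_mentions("Mutations in TP53 and BRCA1 are common in tumors.")
```

Note how two synonyms resolve to the same identifier; the ambiguity the abstract describes (one name, several genes) would require the reverse mapping, from one name to multiple candidate IDs, plus a disambiguation step.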

354 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: Experiments on both synthetic data and real-world large scale networks demonstrate the efficacy of the algorithm and suggest its generality in solving problems with complex relationships.
Abstract: A multi-mode network typically consists of multiple heterogeneous social actors among which various types of interactions could occur. Identifying communities in a multi-mode network can help understand the structural properties of the network, address the data shortage and unbalanced problems, and assist tasks like targeted marketing and finding influential actors within or between groups. In general, a network and the membership of groups often evolve gradually. In a dynamic multi-mode network, both actor membership and interactions can evolve, which poses a challenging problem of identifying community evolution. In this work, we try to address this issue by employing the temporal information to analyze a multi-mode network. A spectral framework and its scalability issue are carefully studied. Experiments on both synthetic data and real-world large scale networks demonstrate the efficacy of our algorithm and suggest its generality in solving problems with complex relationships.
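The spectral intuition behind such frameworks can be illustrated on a single-mode, static toy graph; the paper's actual method handles multiple modes and temporal evolution, which this sketch omits.

```python
import numpy as np

# Toy illustration of the spectral idea behind community detection: split a
# small graph into two communities using the sign of the Fiedler vector
# (the eigenvector of the second-smallest eigenvalue of the graph Laplacian).

A = np.array([  # adjacency: nodes 0-2 form one triangle, nodes 3-5 another,
    [0, 1, 1, 0, 0, 0],  # joined by the single bridge edge 2-3
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

L = np.diag(A.sum(axis=1)) - A        # unnormalized graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)  # eigh: ascending eigenvalues, symmetric
fiedler = eigvecs[:, 1]               # second-smallest eigenvalue's eigenvector
labels = (fiedler > 0).astype(int)    # sign split -> two communities
```

The two triangles land in opposite sign groups; multi-mode and dynamic variants, as in the paper, generalize this by coupling several such matrices across actor types and time steps.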

293 citations


Journal Article
TL;DR: In this paper, the authors present three studies, conducted with two different learning environments, which present evidence on which student behaviors, motivations, and emotions are associated with the choice to game the system.
Abstract: In recent years there has been increasing interest in the phenomenon of "gaming the system," where a learner attempts to succeed in an educational environment by exploiting properties of the system's help and feedback rather than by attempting to learn the material. Developing environments that respond constructively and effectively to gaming depends upon understanding why students choose to game. In this article, we present three studies, conducted with two different learning environments, which present evidence on which student behaviors, motivations, and emotions are associated with the choice to game the system. We also present a fourth study to determine how teachers' perspectives on gaming behavior are similar to, and different from, researchers' perspectives and the data from our studies. We discuss what motivational and attitudinal patterns are associated with gaming behavior across studies, and what the implications are for the design of interactive learning environments.

256 citations


Journal ArticleDOI
TL;DR: This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications.
Abstract: Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet http://zope.bioinfo.cnio.es/bionlp_tools/.

234 citations


Patent
29 Jan 2008
TL;DR: In this article, the authors present methods, systems, and computer program products for insider threat detection by monitoring the network to detect network activity associated with a set of network protocols and processing the detected activity to generate information-use events.
Abstract: Methods, systems, and computer program products for insider threat detection are provided. Embodiments detect insiders who act on documents and/or files to which they have access but whose activity is inappropriate or uncharacteristic of them based on their identity, past activity, and/or organizational context. Embodiments work by monitoring the network to detect network activity associated with a set of network protocols; processing the detected activity to generate information-use events; generating contextual information associated with users of the network; and processing the information-use events based on the generated contextual information to generate alerts and threat scores for users of the network. Embodiments provide several information-misuse detectors that are used to examine generated information-use events in view of collected contextual information to detect volumetric anomalies, suspicious and/or evasive behavior. Embodiments provide a user threat ranking system and a user interface to examine user threat scores and analyze user activity.
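One detector family the abstract mentions, volumetric anomaly detection, can be sketched as a per-user baseline comparison. The event counts, threshold, and function names below are invented for illustration and are not from the patent.

```python
from statistics import mean, stdev

# Hedged sketch of a volumetric anomaly detector: flag a user whose daily
# information-use event count deviates strongly from their own history.

def threat_score(history: list[int], today: int) -> float:
    """Z-score of today's event count against the user's past daily counts."""
    mu, sigma = mean(history), stdev(history)
    return 0.0 if sigma == 0 else (today - mu) / sigma

def alert(history: list[int], today: int, threshold: float = 3.0) -> bool:
    return threat_score(history, today) > threshold

# A user who normally touches ~10 documents a day suddenly accesses 90.
score = threat_score([9, 11, 10, 12, 8], 90)
```

The patent's detectors also fold in organizational context (role, peer groups) before ranking users; this sketch covers only the per-user volumetric piece.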

232 citations


Proceedings ArticleDOI
07 Apr 2008
TL;DR: The paper describes the situation in SoS in the context of existing SoS frameworks, discusses the current DoD approach to SoS, and examines the challenges the SoS environment poses for the systems engineer at both the SoS and system levels.
Abstract: The US Department of Defense has begun to recognize the need to manage and engineer ensembles of systems to address user capability needs. As DoD systems of systems are being recognized with explicit management, systems engineering and funding support, systems engineers face challenges in applying systems engineering processes to support SoS, particularly in the typical situation when the systems retain their independence. This paper describes the situation in SoS in the context of existing SoS frameworks and discusses the current DoD approach to SoS and the challenges the SoS environment poses for the systems engineer at both the SoS and system levels. Finally, the paper suggests some areas for further investigation to address key issues as systems engineering takes up the challenge of these changes in the interdependent networked environment of the future battle space.

230 citations


Journal ArticleDOI
TL;DR: A common characteristic observed in all three tasks was that the combination of system outputs could yield better results than any single system, including the development of the first text-mining meta-server.
Abstract: Background: Genome sciences have experienced an increasing demand for efficient text-processing tools that can extract biologically relevant information from the growing amount of published literature. In response, a range of text-mining and information-extraction tools have recently been developed specifically for the biological domain. Such tools are only useful if they are designed to meet real-life tasks and if their performance can be estimated and compared. The BioCreative challenge (Critical Assessment of Information Extraction in Biology) consists of a collaborative initiative to provide a common evaluation framework for monitoring and assessing the state-of-the-art of text-mining systems applied to biologically relevant problems.

206 citations


Journal ArticleDOI
01 Dec 2008
TL;DR: Typical user equipment configurations and civil aviation applications of GNSS including navigation, automatic dependent surveillance, terrain awareness warning systems, and timing are detailed.
Abstract: The Global Navigation Satellite System (GNSS) is the worldwide set of satellite navigation constellations, civil aviation augmentations, and user equipment. This paper reviews the current status and future plans of the elements of GNSS as it pertains to civil aviation. The paper addresses the following satellite navigation systems: the U.S. Global Positioning System (GPS), Russian GLONASS, European Galileo, Chinese Compass, Japanese Quasi Zenith Satellite System, and Indian Regional Navigation Satellite System. The paper also describes aviation augmentations including aircraft-based, satellite-based, ground-based, and ground-based regional augmentation systems defined by the International Civil Aviation Organization. Lastly, this paper details typical user equipment configurations and civil aviation applications of GNSS including navigation, automatic dependent surveillance, terrain awareness warning systems, and timing.

164 citations


Book
Paul R. Garvey1
20 Oct 2008
TL;DR: In this article, a geometric approach for ranking risks is proposed for risk management in engineering enterprise systems, based on Bayes' rule and a decision analysis approach.
Abstract: Engineering Risk Management: Introduction; Engineering Risk Management Objectives; Overview of Process and Practice; New Perspectives on Engineering Systems. Elements of Probability Theory: Introduction; Interpretations and Axioms; Conditional Probability and Bayes' Rule; Applications to Engineering Risk Management. Elements of Decision Analysis: Introduction; The Value Function; Risk and Utility Functions; Applications to Engineering Risk Management. Analytical Topics in Engineering Risk Management: Introduction; Risk Identification and Approaches; Risk Analysis and Risk Prioritization; Risk Management and Progress Monitoring; Measuring Technical Performance Risk; Risk Management for Engineering Enterprise Systems. Appendix A: A Geometric Approach for Ranking Risks. Appendix B: Success Factors in Engineering Risk Management. Index. References appear at the end of each chapter.
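As a small worked example of the Bayes' rule material the book covers, the numbers below (all hypothetical) show how a prior flaw probability is updated by an imperfect design review.

```python
# Bayes' rule applied to a risk question: given a review that flags a design,
# what is the probability a real flaw exists? All rates are illustrative.

p_flaw = 0.05            # prior P(flaw)
p_pos_given_flaw = 0.90  # P(review flags | flaw present)
p_pos_given_ok = 0.10    # P(review flags | no flaw) -- false-alarm rate

# Total probability of a flag, then the posterior via Bayes' rule.
p_pos = p_pos_given_flaw * p_flaw + p_pos_given_ok * (1 - p_flaw)
p_flaw_given_pos = p_pos_given_flaw * p_flaw / p_pos
```

Despite the 90% hit rate, the posterior is only about 0.32 because flaws are rare relative to false alarms, the kind of base-rate effect the book's risk-analysis chapters address.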

113 citations


Journal ArticleDOI
01 Sep 2008
TL;DR: The structure of the Windows registry as it is stored in physical memory is described and a compelling attack that modifies the cached version of the registry without altering the on-disk version is described.
Abstract: This paper describes the structure of the Windows registry as it is stored in physical memory. We present tools and techniques that can be used to extract this data directly from memory dumps. We also provide guidelines to aid investigators and experimentally demonstrate the value of our techniques. Finally, we describe a compelling attack that modifies the cached version of the registry without altering the on-disk version. While this attack would be undetectable with conventional on-disk registry analysis techniques, we demonstrate that such malicious modifications are easily detectable by examining memory.
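One building block of such memory analysis can be sketched as a signature scan: registry hive base blocks begin with the ASCII signature "regf", so candidate hives can be located by searching a raw dump for it. Real tools validate much more structure (checksums, hive bin records, kernel pool metadata); this sketch only finds candidates in a synthetic buffer.

```python
# Locate candidate registry hive base blocks in a raw memory dump by scanning
# for the b"regf" signature. This is only the first step of real analysis.

def find_hive_candidates(dump: bytes) -> list[int]:
    """Return byte offsets in the dump where a hive base block may start."""
    offsets, pos = [], dump.find(b"regf")
    while pos != -1:
        offsets.append(pos)
        pos = dump.find(b"regf", pos + 1)
    return offsets

# Synthetic "dump" with two planted signatures for demonstration.
sample = b"\x00" * 16 + b"regf" + b"\x00" * 100 + b"regf" + b"\x00" * 8
hits = find_hive_candidates(sample)
```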

99 citations


Proceedings Article
01 May 2008
TL;DR: In adapting the extent tagger to new domains, merging the training data from the above corpus with annotated data in the new domain provides the best performance.
Abstract: SpatialML is an annotation scheme for marking up references to places in natural language. It covers both named and nominal references to places, grounding them where possible with geo-coordinates, including both relative and absolute locations, and characterizes relationships among places in terms of a region calculus. A freely available annotation editor has been developed for SpatialML, along with a corpus of annotated documents released by the Linguistic Data Consortium. Inter-annotator agreement on SpatialML is 77.0 F-measure for extents on that corpus. An automatic tagger for SpatialML extents scores 78.5 F-measure. A disambiguator scores 93.0 F-measure and 93.4 Predictive Accuracy. In adapting the extent tagger to new domains, merging the training data from the above corpus with annotated data in the new domain provides the best performance.
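The F-measure figures quoted above combine precision and recall over extents. A minimal sketch of that computation, comparing predicted against gold place-name extents (the extents below are made up for illustration):

```python
# F-measure for extent tagging: extents are (text, start, end) tuples, and a
# prediction counts as correct only if it exactly matches a gold extent.

def f_measure(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)           # exact-match true positives
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)

gold = {("Boston", 10, 16), ("Charles River", 30, 43)}
pred = {("Boston", 10, 16), ("River", 38, 43)}   # partial span does not count
score = f_measure(pred, gold)
```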

Journal ArticleDOI
TL;DR: Services in Digital|Vita combine an existing workflow, maintaining and formatting biographical information, with collaboration-searching functions in a novel way, and developers and researchers may consider one or more of the services described in this paper for implementation in their own expertise locating systems.
Abstract: Background: As biomedical research projects become increasingly interdisciplinary and complex, collaboration with appropriate individuals, teams, and institutions becomes ever more crucial to project success. While social networks are extremely important in determining how scientific collaborations are formed, social networking technologies have not yet been studied as a tool to help form scientific collaborations. Many currently emerging expertise locating systems include social networking technologies, but it is unclear whether they make the process of finding collaborators more efficient and effective. Objective: This study was conducted to answer the following questions: (1) Which requirements should systems for finding collaborators in biomedical science fulfill? and (2) Which information technology services can address these requirements? Methods: The background research phase encompassed a thorough review of the literature, affinity diagramming, contextual inquiry, and semistructured interviews. This phase yielded five themes suggestive of requirements for systems to support the formation of collaborations. In the next phase, the generative phase, we brainstormed and selected design ideas for formal concept validation with end users. Then, three related, well-validated ideas were selected for implementation and evaluation in a prototype. 
Results: Five main themes of systems requirements emerged: (1) beyond expertise, successful collaborations require compatibility with respect to personality, work style, productivity, and many other factors (compatibility); (2) finding appropriate collaborators requires the ability to effectively search in domains other than your own using information that is comprehensive and descriptive (communication); (3) social networks are important for finding potential collaborators, assessing their suitability and compatibility, and establishing contact with them (intermediation); (4) information profiles must be complete, correct, up-to-date, and comprehensive and allow fine-grained control over access to information by different audiences (information quality and access); (5) keeping online profiles up-to-date should require little or no effort and be integrated into the scientist’s existing workflow (motivation). Based on the requirements, 16 design ideas underwent formal validation with end users. Of those, three were chosen to be implemented and evaluated in a system prototype, “Digital|Vita”: maintaining, formatting, and semi-automated updating of biographical information; searching for experts; and building and maintaining the social network and managing document flow. Conclusions: In addition to quantitative and factual information about potential collaborators, social connectedness, personal and professional compatibility, and power differentials also influence whether collaborations are formed. Current systems only partially model these requirements. Services in Digital|Vita combine an existing workflow, maintaining and formatting biographical information, with collaboration-searching functions in a novel way. Several barriers to the adoption of systems such as Digital|Vita exist, such as potential adoption asymmetries between junior and senior researchers and the tension between public and private information. 
Developers and researchers may consider one or more of the services described in this paper for implementation in their own expertise locating systems. [J Med Internet Res 2008;10(3):e24]

Journal ArticleDOI
TL;DR: In this paper, a transmission-based surface plasmon resonance (SPR) sensor for label-free detection of protein-carbohydrate and protein-protein binding proximate to a perforated gold surface is demonstrated.
Abstract: A transmission-based surface plasmon resonance (SPR) sensor for label-free detection of protein-carbohydrate and protein-protein binding proximate to a perforated gold surface is demonstrated. An SPR instrument makes real-time measurements of the resonant wavelength and/or the resonant angle of incidence of transmitted light; both are influenced by the presence of proteins at the gold surface-liquid interface. Ethylene glycol solutions with known refractive indexes were used to calibrate the instrument. A paired polarization-sensitive detector achieved an overall detection resolution of ~6.6 × 10⁻⁵ refractive index units (RIU). Proof-of-principle experiments were performed with concanavalin A (Con A) binding to gold-adsorbed ovomucoid and anti-bovine serum albumin (BSA) binding to gold-adsorbed BSA.
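The calibration step can be sketched as a linear fit from known-index solutions to resonant wavelength, then inverted so later wavelength shifts can be reported in refractive index units. All numbers below are illustrative, not from the paper.

```python
import numpy as np

# Calibrate: known-index ethylene glycol solutions give (refractive index,
# resonant wavelength) pairs; fit a line, then invert it to convert a
# measured resonant wavelength back into a refractive-index estimate.

n_known = np.array([1.333, 1.343, 1.353, 1.363])     # refractive indices
wavelength = np.array([600.0, 605.0, 610.0, 615.0])  # resonant wavelength, nm

slope, intercept = np.polyfit(n_known, wavelength, 1)  # nm per RIU, offset

def wavelength_to_index(lam_nm: float) -> float:
    """Invert the linear calibration to recover a refractive index."""
    return (lam_nm - intercept) / slope

estimate = wavelength_to_index(607.5)
```

With these made-up points the sensitivity is 500 nm/RIU, so the quoted ~6.6 × 10⁻⁵ RIU resolution would correspond to resolving wavelength shifts of a few hundredths of a nanometer.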

Journal ArticleDOI
TL;DR: A medical information extraction system that combines a rule-based extraction engine with machine learning algorithms to identify and categorize references to patient smoking in clinical reports and shows overall accuracy in the 90s on all data sets used.

Journal ArticleDOI
TL;DR: This article explored how personality differences affect risk-taking preferences in slot-like games that vary along two dimensions of a risk space, namely the wager amount or utility and the winning chances or probability.

Journal ArticleDOI
TL;DR: The integration of NeuroMorpho.Org with version-1 of the NIF (NIFv1) provides the opportunity to access morphological data in the context of other relevant resources and diverse subdomains of neuroscience, opening exciting new possibilities in data mining and knowledge discovery.
Abstract: Neuronal morphology affects network connectivity, plasticity, and information processing. Uncovering the design principles and functional consequences of dendritic and axonal shape necessitates quantitative analysis and computational modeling of detailed experimental data. Digital reconstructions provide the required neuromorphological descriptions in a parsimonious, comprehensive, and reliable numerical format. NeuroMorpho.Org is the largest web-accessible repository service for digitally reconstructed neurons and one of the integrated resources in the Neuroscience Information Framework (NIF). Here we describe the NeuroMorpho.Org approach as an exemplary experience in designing, creating, populating, and curating a neuroscience digital resource. The simple three-tier architecture of NeuroMorpho.Org (web client, web server, and relational database) encompasses all necessary elements to support a large-scale, integrate-able repository. The data content, while heterogeneous in scientific scope and experimental origin, is unified in format and presentation by an in house standardization protocol. The server application (MRALD) is secure, customizable, and developer-friendly. Centralized processing and expert annotation yields a comprehensive set of metadata that enriches and complements the raw data. The thoroughly tested interface design allows for optimal and effective data search and retrieval. Availability of data in both original and standardized formats ensures compatibility with existing resources and fosters further tool development. Other key functions enable extensive exploration and discovery, including 3D and interactive visualization of branching, frequently measured morphometrics, and reciprocal links to the original PubMed publications. 
The integration of NeuroMorpho.Org with version-1 of the NIF (NIFv1) provides the opportunity to access morphological data in the context of other relevant resources and diverse subdomains of neuroscience, opening exciting new possibilities in data mining and knowledge discovery. The outcome of such coordination is the rapid and powerful advancement of neuroscience research at both the conceptual and technological level.

Proceedings Article
26 Oct 2008
TL;DR: The activities undertaken by the Uncertainty Reasoning for the World Wide Web Incubator Group, the recommendations produced by the group, and next steps required to carry forward the work begun are described.
Abstract: The Uncertainty Reasoning for the World Wide Web Incubator Group (URW3-XG) was chartered as a means to explore and better define the challenges of reasoning with and representing uncertain information in the context of the World Wide Web. The objectives of the URW3-XG were: (1) To identify and describe situations on the scale of the World Wide Web for which uncertainty reasoning would significantly increase the potential for extracting useful information; and (2) To identify methodologies that can be applied to these situations and the fundamentals of a standardized representation that could serve as the basis for information exchange necessary for these methodologies to be effectively used. This paper describes the activities undertaken by the URW3-XG, the recommendations produced by the group, and next steps required to carry forward the work begun by the group.

Journal ArticleDOI
01 May 2008
TL;DR: Investigation of the confirmation bias in a complex analysis task that is more characteristic of law enforcement investigations, financial analysis, and intelligence analysis showed a confirmation bias for both experience groups, but ACH significantly reduced bias only for participants without intelligence analysis experience.
Abstract: Most research investigating the confirmation bias has used abstract experimental tasks where participants drew inferences from just a few items of evidence. The experiment reported in this paper investigated the confirmation bias in a complex analysis task that is more characteristic of law enforcement investigations, financial analysis, and intelligence analysis. Participants were professionals, half of whom had intelligence analysis experience. The effectiveness of a procedure designed to mitigate the confirmation bias, called analysis of competing hypotheses (ACH), was tested. Results showed a confirmation bias for both experience groups, but ACH significantly reduced bias only for participants without intelligence analysis experience. Confirmation bias manifested as a weighting bias, not as an interpretation bias. Participants tended to agree on the interpretation of evidence (i.e., whose hypothesis was supported by the evidence) but tended to disagree on the importance of the evidence, giving more weight to the evidence that supported their preferred hypothesis and less weight to evidence that disconfirmed it.
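The bookkeeping behind ACH can be sketched as a weighted evidence-by-hypothesis matrix in which only inconsistent ratings count against a hypothesis, so the hypothesis with the least weighted inconsistency is favored. The ratings and weights below are invented for illustration; the weighting bias the paper found corresponds to analysts assigning these weights unevenly.

```python
# ACH-style scoring: matrix[e][h] rates evidence e against hypothesis h as
# consistent (+1), neutral (0), or inconsistent (-1); weights capture how
# important each evidence item is judged to be.

def inconsistency_scores(matrix, weights):
    """Return summed weighted inconsistency per hypothesis (lower is better)."""
    n_hyp = len(matrix[0])
    scores = [0.0] * n_hyp
    for ratings, w in zip(matrix, weights):
        for h, r in enumerate(ratings):
            if r < 0:               # only inconsistent evidence accumulates
                scores[h] += w * (-r)
    return scores

matrix = [        # rows: evidence items; columns: hypotheses H1, H2
    [+1, -1],
    [-1, -1],
    [+1, 0],
]
weights = [1.0, 2.0, 1.0]
scores = inconsistency_scores(matrix, weights)
```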

Journal ArticleDOI
TL;DR: The utility of ion mobility spectrometry (IMS) has been steadily growing, and it cuts across diverse areas in physical and biological sciences as discussed by the authors, and the development of ion sources, particularly in the context of IMS, is described.
Abstract: The utility of ion mobility spectrometry (IMS) has been steadily growing, and it cuts across diverse areas in physical and biological sciences. The development of ion sources, particularly in the context of IMS, is described. IMS ion sources operate efficiently in ambient environment and yield ions for a wide range of complex molecules including biological materials. While significant progress has been made in this area through the development of a variety of ion sources, further research to address several key issues, namely, ionization processes, reaction chemistry, and overall system miniaturization for field deployment of IMS, is the primary focus of current activities. Aside from reviewing the present state of the art of ion sources for IMS, this paper has discussed the wide range of applications and current trends of research in the field.

Book ChapterDOI
20 Oct 2008
TL;DR: This work argues that attestation must be able to deliver temporally fresh evidence, that comprehensive information about the target should be accessible, and that the underlying attestation mechanism must be trustworthy.
Abstract: Attestation is the activity of making a claim about properties of a target by supplying evidence to an appraiser. We identify five central principles to guide development of attestation systems. We argue that (i) attestation must be able to deliver temporally fresh evidence; (ii) comprehensive information about the target should be accessible; (iii) the target, or its owner, should be able to constrain disclosure of information about the target; (iv) attestation claims should have explicit semantics to allow decisions to be derived from several claims; and (v) the underlying attestation mechanism must be trustworthy. We propose an architecture for attestation guided by these principles, as well as an implementation that adheres to this architecture. Virtualized platforms, which are increasingly well supported on stock hardware, provide a natural basis for our attestation architecture.

Journal ArticleDOI
TL;DR: The early and late components of the event-related potential (ERP) Old-New effect are well characterized with respect to long-term memory, and have been associated with processes of familiarity and recollection, respectively, but the way that these two components respond to variation in recency and stimulus type is explored.
Abstract: The early and late components of the event-related potential (ERP) Old–New effect are well characterized with respect to long-term memory, and have been associated with processes of familiarity and recollection, respectively. Now, using a short-term memory paradigm with verbal and nonverbal stimuli, we explored the way that these two components respond to variation in recency and stimulus type. We found that the amplitude of the early component (or frontal N400, FN400) showed Old–New effects only for verbal stimuli and increased with recency. In contrast, the later component (or late positive component, LPC) showed Old–New effects across a range of stimulus types and did not scale with recency. These results are consistent with the way that these same ERP components have been characterized in long-term memory, supporting the idea that some of the same processes underlie long- and short-term item recognition.

Journal ArticleDOI
TL;DR: This paper is a status update on the Common Weakness Enumeration (CWE) initiative, one of the efforts focused on improving the utility and effectiveness of code-based security assessment technology.
Abstract: This paper is a status update on the Common Weakness Enumeration (CWE) initiative [1], one of the efforts focused on improving the utility and effectiveness of code-based security assessment technology. As hoped, the CWE initiative has helped to dramatically accelerate the use of tool-based assurance arguments in reviewing software systems for security issues and invigorated the investigation of code implementation, design, and architecture issues with automation.

Proceedings ArticleDOI
01 Nov 2008
TL;DR: These "making security measurable" initiatives provide the foundation for answering today's increased demands for accountability, efficiency and interoperability without artificially constraining an organization's solution options.
Abstract: The security and integrity of information systems is a critical issue within most types of organizations. Finding better ways to address the topic is the objective of many in industry, academia, and government. One of the more effective approaches gaining popularity in addressing these issues is the use of standard knowledge representations, enumerations, exchange formats and languages, as well as sharing of standard approaches to key compliance and conformance mandates. By standardizing and segregating the interactions amongst their operational, development and sustainment tools and processes, organizations gain great freedom in selecting technologies, solutions and vendors. These "making security measurable" initiatives provide the foundation for answering today's increased demands for accountability, efficiency and interoperability without artificially constraining an organization's solution options.

Journal ArticleDOI
TL;DR: Simulation results show that the echo-MIMO method for the case of channel-matched maximal ratio transmission/combining (MRT/MRC) approaches the performance of the ideal beamformers based on perfect channel estimates, especially for the cases of no fewer transmit antennas than receive antennas.
Abstract: A new two-way multiple-input multiple-output (MIMO) channel sounding method for enabling a transmitter and a receiver both to have complete knowledge of the channel between them is presented. The key idea is that when the transmitter sends out its training signal, the receiver repeats, or "echoes," its received signal back to that transmitter. From this round-trip training signal, together with the usual one-way incoming training signal from the receiver, the transmitter is able to recover its own outgoing channel fading coefficients for all transmit/receive pairs of antenna elements. The two-stage, weighted least squares problem to estimate the incoming and outgoing channel response matrices is solved. The method applies to FDD and TDD channels, and is suitable for wireless networks with relays. A second contribution of this paper is to propose another new training step, called naturally channel-matched beamforming (NCMB). This step actively uses the physical channel to match the transmit and receive beamformers, instead of relying on uncoordinated beamformer estimates at the transmit and receive ends. When we create parallel virtual channels, by spatial precoding with singular modes of the channel matrix, this training step directly matches each pair of right and left singular vectors used to beamform, adjusting the receive vector to optimize the virtual link gain for the pair. Evidence for a ridge in the gain surface over beamformer pairs is given, which would explain the near-ideal gains obtained from the channel-matching step. We also show how to optimize the link gains while constraining these pairs to ensure low interchannel interference (ICI). Simulation results show that the echo-MIMO method for the case of channel-matched maximal ratio transmission/combining (MRT/MRC) approaches the performance of the ideal beamformers based on perfect channel estimates, especially for the case of no fewer transmit antennas than receive antennas.
In this case, echo-MIMO outperforms a quantized MRT/MRC codebook feedback method.
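Stripped of the echo step and the two-stage weighting, the estimation core is ordinary least squares on training data. The sketch below checks the un-weighted, one-way form: with received training Y = HX (noiseless for the sanity check), H_est = Y·pinv(X) recovers the channel exactly. Dimensions and the noiseless setting are illustrative simplifications.

```python
import numpy as np

# Least-squares MIMO channel estimation from a training matrix X:
# Y = H @ X + noise  ->  H_est = Y @ pinv(X).

rng = np.random.default_rng(0)
nt, nr, n_train = 2, 3, 16   # tx antennas, rx antennas, training symbols

H = rng.standard_normal((nr, nt)) + 1j * rng.standard_normal((nr, nt))
X = rng.standard_normal((nt, n_train)) + 1j * rng.standard_normal((nt, n_train))
Y = H @ X                    # noiseless reception for the sanity check

H_est = Y @ np.linalg.pinv(X)          # least-squares estimate
error = np.max(np.abs(H_est - H))      # should be at machine precision
```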

Journal ArticleDOI
TL;DR: Thirty technology and research areas were compared for quantity production, and represented areas of emphasis by the PRC in physical, environmental, engineering, and life sciences.

Journal ArticleDOI
TL;DR: Analyzing the structure and dynamics of the origin and destination of core air travel market demand using 1995–2006 US quarterly time-series data finds that super-thin markets have lost services while other markets gained.

Journal ArticleDOI
TL;DR: In this article, a 2-element, superdirective multiple arm folded monopole array was designed to achieve a near 50 Ω input radiation resistance at each element, resulting in a matched input voltage standing wave ratio (VSWR) and a substantial increase in realized or overall efficiency.
Abstract: With proper adjustment of the amplitude and phase of the element excitations, the directivity of an N-element end-fire array of isotropic radiators can approach N² as the spacing between the elements approaches zero. To achieve this end-fire superdirectivity for a 2-element array, the excitation phase difference between the elements must closely approach 180°. When closely spaced elements are driven nearly 180° out-of-phase, the element radiation resistances approach zero, resulting in a high input Voltage Standing Wave Ratio (VSWR) and reduced radiation efficiency. At the same time, there is also a significant decrease in the operating bandwidth. In this letter we present the design of a 2-element, superdirective multiple arm folded monopole array that achieves a near 50 Ω input radiation resistance at each element, resulting in a matched input VSWR, higher radiation efficiency and therefore, a substantial increase in realized or overall efficiency.
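The N² limit is easy to check numerically from the standard array-factor formula. The sketch below (textbook 2-element isotropic model, not the letter's folded-monopole design) sweeps the excitation phase difference β for a closely spaced pair and shows the maximum directivity approaching N² = 4, with the optimal β close to 180°:

```python
import numpy as np

def directivity(d_over_lambda, beta, n_theta=4001):
    """End-fire directivity of a 2-element array of isotropic radiators."""
    kd = 2 * np.pi * d_over_lambda
    theta = np.linspace(0.0, np.pi, n_theta)
    # Power pattern of the array factor |1 + exp(j(kd cos(theta) + beta))|^2
    p = np.abs(1 + np.exp(1j * (kd * np.cos(theta) + beta))) ** 2
    # D = 2 p_max / integral(p sin(theta) dtheta); the pattern is phi-independent
    integral = np.sum(p * np.sin(theta)) * (theta[1] - theta[0])
    return 2.0 * p.max() / integral

# Sweep the phase difference for a closely spaced pair (d = 0.05 wavelengths)
betas = np.linspace(0.9 * np.pi, np.pi, 500)
d_vals = [directivity(0.05, b) for b in betas]
best_beta = betas[int(np.argmax(d_vals))]
print(f"max D = {max(d_vals):.2f} at beta = {np.degrees(best_beta):.1f} deg")
```

Shrinking d further pushes the maximum still closer to 4, while the optimal β creeps toward exactly 180°, which is the regime where the radiation resistances collapse as the abstract describes.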

Journal ArticleDOI
TL;DR: The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms ("Habitat-Lite") that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO).
Abstract: There is an urgent need to capture metadata on the rapidly growing number of genomic, metagenomic and related sequences, such as 16S ribosomal genes. This need is a major focus within the Genomic Standards Consortium (GSC), and Habitat is a key metadata descriptor in the proposed “Minimum Information about a Genome Sequence” (MIGS) specification. The goal of the work described here is to provide a light-weight, easy-to-use (small) set of terms (“Habitat-Lite”) that captures high-level information about habitat while preserving a mapping to the recently launched Environment Ontology (EnvO). Our motivation for building Habitat-Lite is to meet the needs of multiple users, such as annotators curating these data, database providers hosting the data, and biologists and bioinformaticians alike who need to search and employ such data in comparative analyses. Here, we report a case study based on semiautomated identification of terms from GenBank and GOLD. We estimate that the terms in the initial version...
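The semiautomated identification of habitat terms from records such as GenBank entries can be illustrated with simple keyword mapping. In the sketch below, both the high-level terms and their keyword triggers are invented for illustration and are not the actual Habitat-Lite vocabulary:

```python
# Hypothetical high-level habitat terms with keyword triggers (illustrative only;
# not the real Habitat-Lite term set or its EnvO mapping)
HABITAT_KEYWORDS = {
    "marine":          ["seawater", "ocean", "marine", "coastal"],
    "freshwater":      ["lake", "river", "pond", "freshwater"],
    "soil":            ["soil", "sediment", "rhizosphere"],
    "host-associated": ["gut", "feces", "tissue", "oral", "skin"],
}

def map_habitat(free_text):
    """Return every high-level habitat term whose keywords appear in the annotation."""
    text = free_text.lower()
    hits = [term for term, words in HABITAT_KEYWORDS.items()
            if any(w in text for w in words)]
    return hits or ["unclassified"]

print(map_habitat("surface seawater, coastal Pacific"))  # -> ['marine']
print(map_habitat("human gut microbiome"))               # -> ['host-associated']
print(map_habitat("volcanic rock"))                      # -> ['unclassified']
```

A controlled vocabulary small enough to enumerate this way is exactly what makes such a term set usable by annotators and database providers alike, while the mapping back to a full ontology preserves precision for comparative analyses.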

Proceedings ArticleDOI
07 Apr 2008
TL;DR: Several research questions are addressed in this thought piece on the need for research in the engineering of complex systems.
Abstract: Several research questions are addressed in this thought piece on the need for research in the engineering of complex systems. What are the classes of problems for which complexity science and the engineering of complex systems represents the best solution? What are the classes of problems for which this is not the case? What elements of complexity science (and the associated mathematics) can be applied to the engineering of complex systems? What elements of the science are missing and need to be developed? How do we use the science to develop engineering tools and deliver effective and efficient solutions for our clients?

Book ChapterDOI
Peter Mork, Len Seligman, Arnon Rosenthal, Joel G. Korb, Chris Wolf
TL;DR: This paper provides a task model for schema integration, breaking down the relationships between the source schemata and the target schema, and uses this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository.
Abstract: A key aspect of any data integration endeavor is determining the relationships between the source schemata and the target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper, we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository. In particular, the workbench facilitates the interoperation of research prototypes for schema matching (which automatically identify likely semantic correspondences) with commercial schema mapping tools (which help produce instance-level transformations). Currently, each of these tools provides its own ad hoc representation of schemata and mappings; combining these tools requires aligning these representations. The workbench provides a common representation so that these tools can more rapidly be combined.
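As a toy illustration of the schema-matching step (automatically proposing likely semantic correspondences between attribute names), the sketch below scores source/target names by string similarity. The schemas here are invented, and real matchers in such a workbench would also exploit data types, instance values, and structure:

```python
from difflib import SequenceMatcher

# Invented source and target schemas for illustration
source = ["cust_name", "cust_addr", "order_date", "total_amt"]
target = ["CustomerName", "CustomerAddress", "OrderDate", "Amount"]

def normalize(name):
    # Lower-case and strip separators so cust_name and CustomerName compare fairly
    return name.lower().replace("_", "")

def best_matches(src, tgt, threshold=0.6):
    """Propose (source, target, score) correspondences above a similarity threshold."""
    pairs = []
    for s in src:
        scored = [(t, SequenceMatcher(None, normalize(s), normalize(t)).ratio())
                  for t in tgt]
        t, score = max(scored, key=lambda x: x[1])
        if score >= threshold:
            pairs.append((s, t, round(score, 2)))
    return pairs

for s, t, score in best_matches(source, target):
    print(f"{s:12s} -> {t:16s} ({score})")
```

Low-scoring attributes (here, total_amt) are left unmatched, which is where a shared knowledge repository helps: a human-confirmed correspondence recorded once can be reused by the downstream mapping tools that produce instance-level transformations.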