scispace - formally typeset
Search or ask a question
Author

Krys J. Kochut

Bio: Krys J. Kochut is an academic researcher from University of Georgia. The author has contributed to research in topics: Workflow & Ontology (information science). The author has an hindex of 29, co-authored 83 publications receiving 3630 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: In this article, the authors present a predictive QoS model that makes it possible to compute the quality of service (QoS) for workflows automatically based on atomic task QoS attributes.

807 citations

Posted Content
TL;DR: Several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering are described, which briefly explain text mining in biomedical and health care domains.
Abstract: The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attentions in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.

422 citations

Journal ArticleDOI
Abstract: In recent years, there has been a explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. Text summarization is the task of shortening a text document into a condensed version keeping all the important information and content of the original document. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.

303 citations

Book ChapterDOI
03 Jun 2007
TL;DR: SPARQLeR is a novel extension of the SPARQL query language which adds the support for semantic path queries and allows easy and natural formulation of queries involving a wide variety of regular path patterns in RDF graphs.
Abstract: Complex relationships, frequently referred to as semantic associa-tions, are the essence of the Semantic Web. Query and retrieval of semantic associations has been an important task in many analytical and scientific activities, such as detecting money laundering and querying for metabolic pathways in biochemistry. We believe that support for semantic path queries should be an integral component of RDF query languages. In this paper, we present SPARQLeR, a novel extension of the SPARQL query language which adds the support for semantic path queries. The proposed extension fits seamlessly within the overall syntax and semantics of SPARQL and allows easy and natural formulation of queries involving a wide variety of regular path patterns in RDF graphs. SPARQLeR's path patterns can capture many low-level details of the queried associations. We also present an implementation of SPARQLeR and its initial performance results. Our implementation is built over BRAHMS, our own RDF storage system.

186 citations

Journal ArticleDOI
TL;DR: In this paper, a defeasible workflow framework is proposed to support exception handling for workflow management using ECA rules to capture more contexts in workflow modeling, and a case-based reasoning mechanism with integrated human involvement is used to improve the exception handling capabilities.
Abstract: In this paper, defeasible workflow is proposed as a framework to support exception handling for workflow management. By using the “justified” ECA rules to capture more contexts in workflow modeling, defeasible workflow uses context dependent reasoning to enhance the exception handling capability of workflow management systems. In particular, this limits possible alternative exception handler candidates in dealing with exceptional situations. Furthermore, a case-based reasoning (CBR) mechanism with integrated human involvement is used to improve the exception handling capabilities. This involves collecting cases to capture experiences in handling exceptions, retrieving similar prior exception handling cases, and reusing the exception handling experiences captured in those cases in new situations.

163 citations


Cited by
More filters
01 Jan 2002

9,314 citations

Journal Article
TL;DR: In this paper, the coding exons of the family of 518 protein kinases were sequenced in 210 cancers of diverse histological types to explore the nature of the information that will be derived from cancer genome sequencing.
Abstract: AACR Centennial Conference: Translational Cancer Medicine-- Nov 4-8, 2007; Singapore PL02-05 All cancers are due to abnormalities in DNA. The availability of the human genome sequence has led to the proposal that resequencing of cancer genomes will reveal the full complement of somatic mutations and hence all the cancer genes. To explore the nature of the information that will be derived from cancer genome sequencing we have sequenced the coding exons of the family of 518 protein kinases, ~1.3Mb DNA per cancer sample, in 210 cancers of diverse histological types. Despite the screen being directed toward the coding regions of a gene family that has previously been strongly implicated in oncogenesis, the results indicate that the majority of somatic mutations detected are “passengers”. There is considerable variation in the number and pattern of these mutations between individual cancers, indicating substantial diversity of processes of molecular evolution between cancers. The imprints of exogenous mutagenic exposures, mutagenic treatment regimes and DNA repair defects can all be seen in the distinctive mutational signatures of individual cancers. This systematic mutation screen and others have previously yielded a number of cancer genes that are frequently mutated in one or more cancer types and which are now anticancer drug targets (for example BRAF , PIK3CA , and EGFR ). However, detailed analyses of the data from our screen additionally suggest that there exist a large number of additional “driver” mutations which are distributed across a substantial number of genes. It therefore appears that cells may be able to utilise mutations in a large repertoire of potential cancer genes to acquire the neoplastic phenotype. However, many of these genes are employed only infrequently. These findings may have implications for future anticancer drug development.

2,737 citations

Journal Article
TL;DR: FastTree as mentioned in this paper uses sequence profiles of internal nodes in the tree to implement neighbor-joining and uses heuristics to quickly identify candidate joins, then uses nearest-neighbor interchanges to reduce the length of the tree.
Abstract: Gene families are growing rapidly, but standard methods for inferring phylogenies do not scale to alignments with over 10,000 sequences. We present FastTree, a method for constructing large phylogenies and for estimating their reliability. Instead of storing a distance matrix, FastTree stores sequence profiles of internal nodes in the tree. FastTree uses these profiles to implement neighbor-joining and uses heuristics to quickly identify candidate joins. FastTree then uses nearest-neighbor interchanges to reduce the length of the tree. For an alignment with N sequences, L sites, and a different characters, a distance matrix requires O(N^2) space and O(N^2 L) time, but FastTree requires just O( NLa + N sqrt(N) ) memory and O( N sqrt(N) log(N) L a ) time. To estimate the tree's reliability, FastTree uses local bootstrapping, which gives another 100-fold speedup over a distance matrix. For example, FastTree computed a tree and support values for 158,022 distinct 16S ribosomal RNAs in 17 hours and 2.4 gigabytes of memory. Just computing pairwise Jukes-Cantor distances and storing them, without inferring a tree or bootstrapping, would require 17 hours and 50 gigabytes of memory. In simulations, FastTree was slightly more accurate than neighbor joining, BIONJ, or FastME; on genuine alignments, FastTree's topologies had higher likelihoods. FastTree is available at http://microbesonline.org/fasttree.

2,436 citations

Journal ArticleDOI
TL;DR: The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Abstract: Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.

1,669 citations

Journal ArticleDOI
22 Dec 2005-Nature
TL;DR: Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.
Abstract: The genome of Aspergillus oryzae, a fungus important for the production of traditional fermented foods and beverages in Japan, has been sequenced. The ability to secrete large amounts of proteins and the development of a transformation system have facilitated the use of A. oryzae in modern biotechnology. Although both A. oryzae and Aspergillus flavus belong to the section Flavi of the subgenus Circumdati of Aspergillus, A. oryzae, unlike A. flavus, does not produce aflatoxin, and its long history of use in the food industry has proved its safety. Here we show that the 37-megabase (Mb) genome of A. oryzae contains 12,074 genes and is expanded by 7-9 Mb in comparison with the genomes of Aspergillus nidulans and Aspergillus fumigatus. Comparison of the three aspergilli species revealed the presence of syntenic blocks and A. oryzae-specific blocks (lacking synteny with A. nidulans and A. fumigatus) in a mosaic manner throughout the genome of A. oryzae. The blocks of A. oryzae-specific sequence are enriched for genes involved in metabolism, particularly those for the synthesis of secondary metabolites. Specific expansion of genes for secretory hydrolytic enzymes, amino acid metabolism and amino acid/sugar uptake transporters supports the idea that A. oryzae is an ideal microorganism for fermentation.

1,149 citations