Showing papers by "Joshua D. Welch" published in 2009


Journal Article
TL;DR: Word-based regulatory genomic signatures are shown to be effective at finding occurrences of known regulatory sites in promoter sequences of human DNA repair genes, and they elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.
Abstract: DNA repair genes provide an important contribution towards the surveillance and repair of DNA damage. These genes produce a large network of interacting proteins whose mRNA expression is likely to be regulated by similar regulatory factors. Full characterization of promoters of DNA repair genes and the similarities among them will more fully elucidate the regulatory networks that activate or inhibit their expression. To address this goal, the authors introduce a technique to find regulatory genomic signatures, which represents a specific application of the genomic signature methodology to classify DNA sequences as putative functional elements within a single organism. The effectiveness of the regulatory genomic signatures is demonstrated via analysis of promoter sequences for genes in DNA repair pathways of humans. The promoters are divided into two classes, the bidirectional promoters and the unidirectional promoters, and distinct genomic signatures are calculated for each class. The genomic signatures include statistically overrepresented words, word clusters, and co-occurring words. The robustness of this method is confirmed by the ability to identify sequences that exist as motifs in TRANSFAC and JASPAR databases, and in overlap with verified binding sites in this set of promoter regions. The word-based signatures are shown to be effective by finding occurrences of known regulatory sites. Moreover, the signatures of the bidirectional and unidirectional promoters of human DNA repair pathways are clearly distinct, exhibiting virtually no overlap. In addition to providing an effective characterization method for related DNA sequences, the signatures elucidate putative regulatory aspects of DNA repair pathways, which are notably under-characterized.
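
The idea of a per-class signature built from statistically overrepresented words can be illustrated with a minimal sketch. The abstract does not state the scoring model, so the z-score against an independent-base background, the `z_min` threshold, the function names, and the toy sequences below are all assumptions made for illustration and are not the authors' method. Overlapping words are not independent, so the binomial variance used here is only an approximation.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def word_zscores(seqs, k):
    """Z-score of each observed k-letter word against an i.i.d. base model.

    Assumption: expected counts come from single-base frequencies in `seqs`.
    """
    base_counts = Counter()
    word_counts = Counter()
    n_positions = 0
    for seq in seqs:
        seq = seq.upper()
        base_counts.update(c for c in seq if c in ALPHABET)
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set(ALPHABET):  # skip windows with N or other codes
                word_counts[word] += 1
                n_positions += 1
    total = sum(base_counts.values())
    if total == 0:
        return {}
    freq = {c: base_counts[c] / total for c in ALPHABET}
    scores = {}
    for word, observed in word_counts.items():
        p = math.prod(freq[c] for c in word)
        expected = n_positions * p
        variance = n_positions * p * (1 - p)
        scores[word] = (observed - expected) / math.sqrt(variance) if variance > 0 else 0.0
    return scores

def signature(seqs, k, z_min=2.0):
    """Set of words whose z-score reaches the (hypothetical) threshold z_min."""
    return {w for w, z in word_zscores(seqs, k).items() if z >= z_min}

def jaccard(sig_a, sig_b):
    """Overlap between two signatures; 0.0 means no shared words."""
    union = sig_a | sig_b
    return len(sig_a & sig_b) / len(union) if union else 0.0

# Toy sequences standing in for two promoter classes (made-up data).
class_a = ["ACACACACACAC"]
class_b = ["GTGTGTGTGTGT"]
overlap = jaccard(signature(class_a, 2), signature(class_b, 2))
```

A Jaccard value near zero for two classes would correspond to the "virtually no overlap" situation described in the abstract, although a real analysis would also need multiple-testing control and a more realistic background model.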

50 citations


Journal Article
TL;DR: These studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome, where each segment contains a unique word-based signature.
Abstract: Genome sequences can be conceptualized as arrangements of motifs or words. The frequencies and positional distributions of these words within particular non-coding genomic segments provide important insights into how the words function in processes such as mRNA stability and regulation of gene expression. Using an enumerative word discovery approach, we investigated the frequencies and positional distributions of all 65,536 different 8-letter words in the genome of Arabidopsis thaliana. Focusing on promoter regions, introns, and 3' and 5' untranslated regions (3'UTRs and 5'UTRs), we compared word frequencies in these segments to genome-wide frequencies. The statistically interesting words in each segment were clustered with similar words to generate motif logos. We investigated whether words were clustered at particular locations or were distributed randomly within each genomic segment, and we classified the words using gene expression information from public repositories. Finally, we investigated whether particular sets of words appeared together more frequently than others. Our studies provide a detailed view of the word composition of several segments of the non-coding portion of the Arabidopsis genome. Each segment contains a unique word-based signature. The respective signatures consist of the sets of enriched words, 'unwords', and word pairs within a segment, as well as the preferential locations and functional classifications for the signature words. Additionally, the positional distributions of enriched words within the segments highlight possible functional elements, and the co-associations of words in promoter regions likely represent the formation of higher order regulatory modules. This work is an important step toward fully cataloguing the functional elements of the Arabidopsis genome.
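
The enumerative step described above (counting every 8-letter word and comparing segment frequencies with genome-wide frequencies) can be sketched as follows. With a four-letter alphabet there are 4^8 = 65,536 possible 8-letter words, which matches the figure in the abstract. The function names, the pseudocount smoothing, and the simple frequency ratio are assumptions chosen for illustration; the abstract does not specify the statistic the authors used.

```python
from collections import Counter

ALPHABET = "ACGT"

def count_words(seq, k=8):
    """Count overlapping k-letter words made only of A, C, G, T."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if set(word) <= set(ALPHABET):  # ignore windows containing N etc.
            counts[word] += 1
    return counts

def enrichment(segment_counts, genome_counts, k=8, pseudocount=1.0):
    """Ratio of each word's relative frequency in a segment to its
    genome-wide relative frequency.

    Assumption: a pseudocount is added to every one of the 4**k possible
    words so that words missing from the genome counts do not divide by zero.
    """
    n_words = len(ALPHABET) ** k
    seg_total = sum(segment_counts.values()) + pseudocount * n_words
    gen_total = sum(genome_counts.values()) + pseudocount * n_words
    return {
        word: ((segment_counts[word] + pseudocount) / seg_total)
        / ((genome_counts[word] + pseudocount) / gen_total)
        for word in segment_counts
    }

# Toy strings (made-up data) with k=2 to keep the example small.
segment = count_words("AAAAAA", k=2)
genome = count_words("AAAAAACCCCCC", k=2)
ratios = enrichment(segment, genome, k=2)
```

Ratios well above 1 would flag candidate enriched words for a segment, while words with ratios well below 1, or absent altogether, are the kind of candidates the abstract calls 'unwords'.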

33 citations


Proceedings Article
15 Jun 2009
TL;DR: Novel bioinformatics strategies for exploring the word landscapes of putative regulatory regions of genomes are presented and incorporated into the WordSeeker software tool.
Abstract: Encyclopedias of regulatory genomic elements provide a foundation for research in areas such as disease diagnosis, disease treatment, and crop enhancement. The construction of complete encyclopedias of organism-specific genomic elements involved in gene regulation remains a significant challenge. To address this problem, the authors present novel bioinformatics strategies for exploring the word landscapes of putative regulatory regions of genomes. The methods are incorporated into the WordSeeker software tool, which is available at http://word-seeker.org. The effectiveness of these strategies is demonstrated through several case studies.