Author

Ravi Madduri

Bio: Ravi Madduri is an academic researcher from Argonne National Laboratory. The author has contributed to research in topics: Workflow & Cloud computing. The author has an h-index of 22, co-authored 69 publications receiving 1505 citations. Previous affiliations of Ravi Madduri include University of Chicago & University of Illinois at Chicago.


Papers
Journal ArticleDOI
TL;DR: While caGrid 1.0 is designed to address use cases in cancer research, the requirements associated with discovery, analysis and integration of large scale data, and coordinated studies are common in other biomedical fields.

119 citations

Journal ArticleDOI
05 Aug 2016-PLOS ONE
TL;DR: Model-free Big Data machine learning-based classification methods can outperform model-based techniques in terms of predictive precision and reliability, and it is observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data.
Abstract: Background: A unique archive of Big Data on Parkinson’s Disease is collected, managed and disseminated by the Parkinson’s Progression Markers Initiative (PPMI). The integration of such complex and heterogeneous Big Data from multiple sources offers unparalleled opportunities to study the early stages of prevalent neurodegenerative processes, track their progression and quickly identify the efficacies of alternative treatments. Many previous human and animal studies have examined the relationship of Parkinson’s disease (PD) risk to trauma, genetics, environment, co-morbidities, or life style. The defining characteristics of Big Data–large size, incongruency, incompleteness, complexity, multiplicity of scales, and heterogeneity of information-generating sources–all pose challenges to the classical techniques for data management, processing, visualization and interpretation. We propose, implement, test and validate complementary model-based and model-free approaches for PD classification and prediction. To explore PD risk using Big Data methodology, we jointly processed complex PPMI imaging, genetics, clinical and demographic data. Methods and Findings: Collective representation of the multi-source data facilitates the aggregation and harmonization of complex data elements. This enables joint modeling of the complete data, leading to the development of Big Data analytics, predictive synthesis, and statistical validation. Using heterogeneous PPMI data, we developed a comprehensive protocol for end-to-end data characterization, manipulation, processing, cleaning, analysis and validation. Specifically, we (i) introduce methods for rebalancing imbalanced cohorts, (ii) utilize a wide spectrum of classification methods to generate consistent and powerful phenotypic predictions, and (iii) generate reproducible machine-learning based classification that enables the reporting of model parameters and diagnostic forecasting based on new data.
We evaluated several complementary model-based predictive approaches, which failed to generate accurate and reliable diagnostic predictions. However, the results of several machine-learning based classification methods indicated significant power to predict Parkinson’s disease in the PPMI subjects (consistent accuracy, sensitivity, and specificity exceeding 96%, confirmed using statistical n-fold cross-validation). Clinical (e.g., Unified Parkinson's Disease Rating Scale (UPDRS) scores), demographic (e.g., age), genetics (e.g., rs34637584, chr12), and derived neuroimaging biomarker (e.g., cerebellum shape index) data all contributed to the predictive analytics and diagnostic forecasting. Conclusions: Model-free Big Data machine learning-based classification methods (e.g., adaptive boosting, support vector machines) can outperform model-based techniques in terms of predictive precision and reliability (e.g., forecasting patient diagnosis). We observed that statistical rebalancing of cohort sizes yields better discrimination of group differences, specifically for predictive analytics based on heterogeneous and incomplete PPMI data. UPDRS scores play a critical role in predicting diagnosis, which is expected based on the clinical definition of Parkinson’s disease. Even without longitudinal UPDRS data, however, the accuracy of model-free machine learning based classification is over 80%. The methods, software and protocols developed here are openly shared and can be employed to study other neurodegenerative disorders (e.g., Alzheimer’s, Huntington’s, amyotrophic lateral sclerosis), as well as for other predictive Big Data analytics applications.

109 citations
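
The abstract above mentions rebalancing imbalanced cohorts and n-fold cross-validation without saying which rebalancing technique was used. As a rough illustration only, the sketch below shows one common option (random oversampling of the smaller class) together with a plain n-fold index split. The function names `oversample` and `n_fold_indices`, the toy rows and the labels are all hypothetical and are not taken from the paper or its software.

```python
import random
from collections import defaultdict


def oversample(rows, labels, seed=0):
    """Random oversampling (an assumed technique, not necessarily the paper's):
    duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra rows with replacement to reach the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def n_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


if __name__ == "__main__":
    # Made-up toy data: one feature per subject, three controls and two cases.
    rows = [[0.1], [0.2], [0.3], [0.9], [1.0]]
    labels = ["control", "control", "control", "PD", "PD"]
    for train, test in n_fold_indices(len(rows), n_folds=5):
        # Rebalance the training portion only, so duplicated rows
        # never appear in the held-out fold.
        X, y = oversample([rows[i] for i in train], [labels[i] for i in train])
        print(len(X), "training rows after rebalancing; test fold:", test)
```

Rebalancing inside each training fold, rather than before splitting, is a design choice made here to keep copies of the same subject out of the test fold; the abstract does not state how the authors ordered these steps.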

Proceedings ArticleDOI
04 Jul 2011
TL;DR: This paper proposes a novel approach of proactively recommending services in a workflow composition process, based on service usage history, and builds a People-Service-Workflow network that models existing scientific artifacts, services and workflows, and their past usage relationships into a social network.
Abstract: Services computing technology enables scientists to expose data and computational resources wrapped as publicly accessible Web services. However, our study indicates that scientific services are currently poorly reused in an ad hoc style. This project aims to help domain scientists find services of interest and reuse successful processes to attain their research purposes in the form of workflows. In contrast to existing interface-based services discovery approaches, this paper proposes a novel approach of proactively recommending services in a workflow composition process, based on service usage history. The underpinning is a People-Service-Workflow (PSW) network that models existing scientific artifacts, services and workflows, and their past usage relationships into a social network. Various social network analysis techniques are applied to discover hidden knowledge accrued. A prototype search engine has been developed as a proof of concept, and is seamlessly integrated as a plug-in into the Taverna workbench, a widely used scientific workflow management tool.

93 citations

Journal ArticleDOI
TL;DR: In support of this work, the authors developed new technological capabilities that make it easy for researchers to manage, aggregate, manipulate, integrate, and model large amounts of distributed data.

77 citations

Journal ArticleDOI
TL;DR: A Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner, is proposed.

71 citations


Cited by
Journal ArticleDOI
TL;DR: The Pathway Interaction Database (PID), a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes, serves as a research tool for the cancer research community and others interested in cellular pathways.
Abstract: The Pathway Interaction Database (PID, http://pid.nci.nih.gov) is a freely available collection of curated and peer-reviewed pathways composed of human molecular signaling and regulatory events and key cellular processes. Created in a collaboration between the US National Cancer Institute and Nature Publishing Group, the database serves as a research tool for the cancer research community and others interested in cellular pathways, such as neuroscientists, developmental biologists and immunologists. PID offers a range of search features to facilitate pathway exploration. Users can browse the predefined set of pathways or create interaction network maps centered on a single molecule or cellular process of interest. In addition, the batch query tool allows users to upload long list(s) of molecules, such as those derived from microarray experiments, and either overlay these molecules onto predefined pathways or visualize the complete molecular connectivity map. Users can also download molecule lists, citation lists and complete database content in extensible markup language (XML) and Biological Pathways Exchange (BioPAX) Level 2 format. The database is updated with new pathway content every month and supplemented by specially commissioned articles on the practical uses of other relevant online tools.

1,441 citations

Proceedings Article
01 Jan 2003

1,212 citations

Journal ArticleDOI
TL;DR: “Applied Regression Analysis Bibliography Update 2000–2001,” Communications in Statistics: Theory and Methods, 2051–2075.
Abstract: Christensen, R. (2002), Plane Answers to Complex Questions: The Theory of Linear Models (3rd ed.), New York: Springer-Verlag. Crocker, D. C. (1980), Review of Linear Regression Analysis, by G. A. F. Seber, Technometrics, 22, 130. Datta, B. N. (1995), Numerical Linear Algebra and Applications, Pacific Grove, CA: Brooks/Cole. Draper, N. R. (2002), “Applied Regression Analysis Bibliography Update 2000–2001,” Communications in Statistics: Theory and Methods, 2051–2075. Golub, G. H., and Van Loan, C. F. (1996), Matrix Computations (3rd ed.), Baltimore, MD: Johns Hopkins University Press. Graybill, F. A. (2000), Theory and Application of the Linear Model, Pacific Grove, CA: Brooks/Cole. Hocking, R. R. (2003), Methods and Applications of Linear Models: Regression and the Analysis of Variance (2nd ed.), New York: Wiley. Porat, B. (1993), Digital Processing of Random Signals, Englewood Cliffs, NJ: Prentice-Hall. Ravishanker, N., and Dey, D. K. (2002), A First Course in Linear Model Theory, Boca Raton, FL: Chapman and Hall/CRC. White, H. (1984), Asymptotic Theory for Econometricians, Orlando, FL: Academic Press.

862 citations

Journal ArticleDOI
TL;DR: An update to the Taverna tool suite is provided, highlighting new features and developments in the workbench and the Taverna Server.
Abstract: The Taverna workflow tool suite (http://www.taverna.org.uk) is designed to combine distributed Web Services and/or local tools into complex analysis pipelines. These pipelines can be executed on local desktop machines or through larger infrastructure (such as supercomputers, Grids or cloud environments), using the Taverna Server. In bioinformatics, Taverna workflows are typically used in the areas of high-throughput omics analyses (for example, proteomics or transcriptomics), or for evidence gathering methods involving text mining or data mining. Through Taverna, scientists have access to several thousand different tools and resources that are freely available from a large range of life science institutions. Once constructed, the workflows are reusable, executable bioinformatics protocols that can be shared, reused and repurposed. A repository of public workflows is available at http://www.myexperiment.org. This article provides an update to the Taverna tool suite, highlighting new features and developments in the workbench and the Taverna Server.

724 citations