'Big data', Hadoop and cloud computing in genomics
Citations
2,141 citations
819 citations
Cites background from "'Big data', Hadoop and cloud comput..."
...The increased popularity of cloud services has drawn users, ranging from patients, medical institutions and research institutions to big corporations, to store their acquired data in cloud repositories [10], [11]....
[...]
705 citations
431 citations
419 citations
Cites background or methods from "'Big data', Hadoop and cloud comput..."
...The functionality of MapReduce has been discussed in detail by [56, 57]....
[...]
...To scale the processing of Big Data, map and reduce functions can be performed on small subsets of large datasets [56, 57]....
[...]
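The snippet above describes map and reduce functions running on small subsets of a large dataset. A minimal sketch of that idea in plain Python (not Hadoop itself), assuming a hypothetical k-mer counting task; the names `map_chunk`, `reduce_counts`, `chunked` and `count_kmers` are illustrative and do not come from the cited papers:

```python
from collections import Counter
from functools import reduce


def map_chunk(reads, k=3):
    """Map step: count k-mers within one small subset of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables into one."""
    return left + right


def chunked(items, size):
    """Split a list into consecutive subsets of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_kmers(reads, k=3, chunk_size=2):
    """Run the map step on each subset, then fold the partial results."""
    partials = (map_chunk(chunk, k) for chunk in chunked(reads, chunk_size))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    example_reads = ["ACGT", "CGTA", "ACG"]  # toy data, not from the source
    print(count_kmers(example_reads))
```

Because each subset is counted independently and the merge is associative, the subsets could in principle be handled by separate workers; in this sketch they simply run one after another in a single process.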
References
20,557 citations
"'Big data', Hadoop and cloud comput..." refers methods in this paper
...Contrail relies on the graph-theoretic framework of de Bruijn graphs [79]
  CloudBrush: A distributed genome assembler based on string graphs [80]
RNA sequence analysis:
  Myrna: A cloud computing pipeline for calculating differential gene expression in large RNA sequence datasets [48]
  FX: RNA sequence analysis tool for the estimation of gene expression levels and genomic variant calling [34]
  Eoulsan: An integrated and flexible solution for RNA sequence data analysis of differential expression [81]
Sequence file management:
  HadoopBAM: A novel library for scalable manipulation of aligned next-generation sequencing data [82]
  SeqWare: A tool set used for next-generation genome sequencing technologies, which includes a LIMS, Pipeline and Query Engine [35]
  GATK: A gene analysis tool-kit for next-generation resequencing data [43]
Phylogenetic analysis:
  MrsRF: A scalable, efficient multi-core algorithm that uses MapReduce to quickly calculate the all-to-all Robinson-Foulds (RF) distance between large numbers of trees [83]
  Nephele: A set of tools that use the complete composition vector algorithm to group sequence clustering into genotypes based on a distance measure [84]
GPU bioinformatics software:
  GPU-BLAST: An accelerated version of NCBI-BLAST which uses a general-purpose graphics processing unit (GPU), designed to rapidly manipulate and alter memory to accelerate overall algorithm processing [85]
  SOAP3: Short sequence read alignment algorithm that uses the multi-processors in a graphics processing unit to achieve ultra-fast alignments [86]
Search engine implementation:
  Hydra: A protein sequence database search engine specifically designed to run efficiently on the Hadoop MapReduce framework [87]
  CloudBlast: Scalable BLAST in the cloud [88]
Miscellaneous:
  BioDoop: A set of tools with modules for handling FASTA streams, wrappers for BLAST, converting sequences to different formats and so on [89]
  BlueSNP: An algorithm for computationally intensive analyses, feasible for large genotype–phenotype datasets [90]
  Quake: DNA sequence error detection and correction in sequence reads [91]
  YunBe: A gene set analysis algorithm for biomarker identification in the cloud [92]
  PeakRanger: A multi-purpose peak caller software package for detecting regions from chromatin immunoprecipitation (ChIP) sequence experiments [93]
particular has also blossomed in pioneering cloud and big data technologies in the biological research and medical space....
[...]
...One of the first MapReduce projects applied in the biotechnology space resulted in the Genome Analysis Tool Kit (GATK) [43]....
[...]
...GATK A gene analysis tool-kit for next-generation resequencing data [43]...
[...]
9,647 citations
6,077 citations
4,700 citations
"'Big data', Hadoop and cloud comput..." refers methods in this paper
...In the healthcare sector, according to the McKinsey Global Institute, if big data is used effectively, the US healthcare sector could make $300 billion in savings per annum, reducing expenditure by 8% [22]....
[...]
2,071 citations
"'Big data', Hadoop and cloud comput..." refers methods in this paper
...It comes with a user-friendly graphical user interface (GUI), along with over 100 pre-installed bioinformatics tools including Galaxy [31], BioPerl, BLAST, Bioconductor, Glimmer, GeneSpring, ClustalW and EMBOSS utilities, amongst others....
[...]