Topic

Column (database)

About: Column (database) is a research topic. Over its lifetime, 12,416 publications have been published within this topic, receiving 121,299 citations. The topic is also known as: attribute.


Papers
Proceedings Article
30 Aug 2005
TL;DR: C-Store, as presented in this paper, is a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized, and it uses bitmap indexes to complement B-tree structures.
Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory, during query processing; storing an overlapping collection of column-oriented projections rather than the current fare of tables and indexes; a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.
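
The sketch below is a minimal illustration of the row-versus-column layout difference the abstract describes; it is not C-Store's actual implementation, and the table and column names are invented for the example.

```python
# Illustrative sketch (not C-Store's implementation): the same table stored
# row-wise and column-wise, showing why a single-column scan touches less
# data in the columnar layout.

# Row-oriented: each record keeps all of its attributes together.
rows = [
    {"id": 1, "price": 9.99, "qty": 3},
    {"id": 2, "price": 4.50, "qty": 7},
    {"id": 3, "price": 12.00, "qty": 1},
]

# Column-oriented: one array per attribute; values of a column sit adjacently.
columns = {
    "id":    [1, 2, 3],
    "price": [9.99, 4.50, 12.00],
    "qty":   [3, 7, 1],
}

# A query such as SUM(price) only needs the "price" array in the columnar
# layout, whereas the row layout forces a walk over every whole record.
total_row_store = sum(r["price"] for r in rows)
total_col_store = sum(columns["price"])
assert total_row_store == total_col_store
```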

970 citations

Patent
22 Jul 2003
TL;DR: In this paper, a data validation, mirroring and error/erasure correction method is presented for the dispersal and protection of one- and two-dimensional data at the micro level for computer, communication and storage systems.
Abstract: The invention discloses a data validation, mirroring and error/erasure correction method for the dispersal and protection of one- and two-dimensional data at the micro level for computer, communication and storage systems. Each of the 256 possible 8-bit data bytes is mirrored with a unique 8-bit ECC byte. The ECC enables 8-bit burst and 4-bit random error detection plus 2-bit random error correction for each encoded data byte. With the data byte and ECC byte configured into a 4-bit x 4-bit codeword array and dispersed in either row, column or both dimensions, the method can perform dual 4-bit row and column erasure recovery. It is shown that for each codeword there are 12 possible combinations of row and column elements, called couplets, capable of mirroring the data byte. These byte-level micro-mirrors outperform conventional mirroring in that each byte and its ECC mirror can self-detect and self-correct random errors and can recover all dual erasure combinations over four elements. Encoding at the byte quanta level maximizes application flexibility. Also disclosed are fast encode, decode and reconstruction methods via Boolean logic, processor instructions and software table look-up, with the intent to run at line and application speeds. The new error control method can augment ARQ algorithms and bring resiliency to system fabrics, including routers and links previously limited to the recovery of transient errors. Image storage and storage over arrays of static devices can benefit from the two-dimensional capabilities. Applications with critical data integrity requirements can utilize the method for end-to-end protection and validation. An extra ECC byte per codeword extends both the resiliency and dimensionality.
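
As a rough intuition for row/column erasure recovery over a small array, the sketch below uses plain XOR parity on a 4x4 grid of nibbles. This is only a simplified stand-in and is not the patented ECC construction; the grid values and function names are invented for illustration.

```python
# Simplified illustration of two-dimensional parity over a 4x4 array.
# NOT the patent's ECC scheme; it only sketches recovering an erased
# element from row/column redundancy.

def make_parities(grid):
    """Compute XOR parity for every row and every column of a 4x4 grid."""
    row_par = [grid[r][0] ^ grid[r][1] ^ grid[r][2] ^ grid[r][3] for r in range(4)]
    col_par = [grid[0][c] ^ grid[1][c] ^ grid[2][c] ^ grid[3][c] for c in range(4)]
    return row_par, col_par

def recover(grid, erased, row_par):
    """Rebuild a single erased cell (r, c) from its row parity."""
    r, c = erased
    others = [grid[r][j] for j in range(4) if j != c]
    return row_par[r] ^ others[0] ^ others[1] ^ others[2]

grid = [[0x1, 0x2, 0x3, 0x4],
        [0x5, 0x6, 0x7, 0x8],
        [0x9, 0xA, 0xB, 0xC],
        [0xD, 0xE, 0xF, 0x0]]
row_par, col_par = make_parities(grid)

lost = grid[2][1]          # pretend element (2, 1) was erased
grid[2][1] = None
assert recover(grid, (2, 1), row_par) == lost
```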

958 citations

Journal ArticleDOI
TL;DR: FunRich is an open access, standalone functional enrichment and network analysis tool whose fully customizable databases allow it to serve as a framework for enrichment analysis irrespective of the data type or organism used.
Abstract: As high-throughput techniques including proteomics become more accessible to individual laboratories, there is an urgent need for a user-friendly bioinformatics analysis system. Here, we describe FunRich, an open access, standalone functional enrichment and network analysis tool. FunRich is designed to be used by biologists with minimal or no support from computational and database experts. Using FunRich, users can perform functional enrichment analysis on background databases that are integrated from heterogeneous genomic and proteomic resources (>1.5 million annotations). Besides the default human-specific FunRich database, users can download data from the UniProt database, which currently supports 20 different taxonomies against which enrichment analysis can be performed. Moreover, users can build their own custom databases and perform enrichment analysis irrespective of organism. In addition to proteomics datasets, the custom database allows the tool to be used for genomics, lipidomics and metabolomics datasets. Thus, FunRich allows complete database customization and thereby permits the tool to be used as a skeleton for enrichment analysis irrespective of the data type or organism. FunRich (http://www.funrich.org) is user-friendly and provides graphical representation (Venn, pie charts, bar graphs, column, heatmap and doughnuts) of the data with customizable font, scale and color (publication quality).
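
The abstract does not spell out the statistics FunRich applies, so the sketch below shows only the kind of hypergeometric over-representation test commonly used for functional enrichment against a background database; the function name and numbers are illustrative assumptions.

```python
# Generic hypergeometric over-representation test, the statistic typically
# used by enrichment tools. Illustrative only; not a description of FunRich's
# internal implementation.
from scipy.stats import hypergeom

def enrichment_pvalue(background_size, annotated_in_background,
                      query_size, annotated_in_query):
    """P(X >= annotated_in_query) when drawing query_size genes at random
    from a background in which annotated_in_background genes carry the term."""
    return hypergeom.sf(annotated_in_query - 1, background_size,
                        annotated_in_background, query_size)

# Example: 20,000-gene background, 300 genes carry the annotation,
# and 25 of the 150 genes in the query list are annotated.
print(enrichment_pvalue(20000, 300, 150, 25))
```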

951 citations

Proceedings Article
31 Jul 1994
TL;DR: An improved algorithm is given for the problem of mining association rules from large collections of data, based on careful combinatorial analysis of the information obtained in previous passes, which makes it possible to eliminate unnecessary candidate rules.
Abstract: Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski, and Swami introduced the problem of mining association rules from large collections of data, and gave a method based on successive passes over the database. We give an improved algorithm for the problem. The method is based on careful combinatorial analysis of the information obtained in previous passes; this makes it possible to eliminate unnecessary candidate rules. Experiments on a university course enrollment database indicate that the method outperforms the previous one by a factor of 5. We also show that sampling is in general a very efficient way of finding such rules.
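
To make the rule form in the abstract concrete, here is a minimal sketch of counting the support and confidence of a rule "W => B" over a 0/1 relation. It illustrates the rule definition only, not the paper's improved candidate-pruning algorithm; the example relation and column names are invented.

```python
# Support and confidence of an association rule "W => B" over a 0/1 relation.
# Illustrative only; not the paper's pass-reducing algorithm.

def rule_stats(rows, W, B):
    """rows: list of dicts mapping column name -> 0/1."""
    have_W = [r for r in rows if all(r[c] == 1 for c in W)]
    have_W_and_B = [r for r in have_W if r[B] == 1]
    support = len(have_W_and_B) / len(rows)
    confidence = len(have_W_and_B) / len(have_W) if have_W else 0.0
    return support, confidence

enrollments = [
    {"databases": 1, "algorithms": 1, "compilers": 0},
    {"databases": 1, "algorithms": 1, "compilers": 1},
    {"databases": 1, "algorithms": 0, "compilers": 1},
    {"databases": 0, "algorithms": 1, "compilers": 0},
]
# "90% of the rows with 1 in W also have 1 in B" corresponds to confidence >= 0.9.
print(rule_stats(enrollments, W={"databases"}, B="algorithms"))
```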

758 citations

Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper shows how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems, evaluates a set of such schemes, and shows that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
Abstract: Column-oriented database system architectures invite a re-evaluation of how and when data in databases is compressed. Storing data in a column-oriented fashion greatly increases the similarity of adjacent records on disk and thus opportunities for compression. The ability to compress many adjacent tuples at once lowers the per-tuple cost of compression, both in terms of CPU and space overheads. In this paper, we discuss how we extended C-Store (a column-oriented DBMS) with a compression sub-system. We show how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems. We then evaluate a set of compression schemes and show that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
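
As a small illustration of why adjacent values in a column compress well, the sketch below applies run-length encoding to a sorted column. It is a generic example of the idea, not C-Store's compression sub-system, and the column data is invented.

```python
# Run-length encoding of a sorted column: adjacent equal values collapse
# into (value, run_length) pairs. Generic illustration, not C-Store code.

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

state = ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"]
encoded = rle_encode(state)
assert rle_decode(encoded) == state
print(encoded)   # [('CA', 3), ('NY', 2), ('TX', 4)]
```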

663 citations


Network Information
Related Topics (5)
Database design: 15K papers, 376K citations (77% related)
Query optimization: 17.6K papers, 474.4K citations (76% related)
Query language: 17.2K papers, 496.2K citations (76% related)
Web search query: 17.3K papers, 451K citations (75% related)
Relational database: 21.7K papers, 479K citations (75% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2022    2
2021    293
2020    428
2019    590
2018    568
2017    559