Topic

Column (database)

About: Column (database) is a research topic. Over its lifetime, 12,416 publications have been published within this topic, receiving 121,299 citations. The topic is also known as: attribute.


Papers
Proceedings Article
30 Aug 2005
TL;DR: C-Store, as presented in this paper, is a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized, and it uses bitmap indexes to complement B-tree structures.
Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory, during query processing; storing an overlapping collection of column-oriented projections rather than the current fare of tables and indexes; a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.
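
The sketch below is a minimal illustration of the row-versus-column layout difference the abstract describes; it is not C-Store's actual implementation, and the table and column names are invented for the example.

```python
# Illustrative sketch (not C-Store's implementation): the same table stored
# row-wise and column-wise, showing why a single-column scan touches less
# data in the columnar layout.

# Row-oriented: each record keeps all of its attributes together.
rows = [
    {"id": 1, "price": 9.99, "qty": 3},
    {"id": 2, "price": 4.50, "qty": 7},
    {"id": 3, "price": 12.00, "qty": 1},
]

# Column-oriented: one array per attribute; values of a column sit adjacently.
columns = {
    "id":    [1, 2, 3],
    "price": [9.99, 4.50, 12.00],
    "qty":   [3, 7, 1],
}

# A query such as SUM(price) only needs the "price" array in the columnar
# layout, whereas the row layout forces a walk over every whole record.
total_row_store = sum(r["price"] for r in rows)
total_col_store = sum(columns["price"])
assert total_row_store == total_col_store
```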

970 citations

Patent
22 Jul 2003
TL;DR: In this paper, a data validation, mirroring and error/erasure correction method is presented for the dispersal and protection of one- and two-dimensional data at the micro level for computer, communication and storage systems.
Abstract: The invention discloses a data validation, mirroring and error/erasure correction method for the dispersal and protection of one- and two-dimensional data at the micro level for computer, communication and storage systems. Each of the 256 possible 8-bit data bytes is mirrored with a unique 8-bit ECC byte. The ECC enables 8-bit burst and 4-bit random error detection plus 2-bit random error correction for each encoded data byte. With the data byte and ECC byte configured into a 4-bit x 4-bit codeword array and dispersed in either row, column or both dimensions, the method can perform dual 4-bit row and column erasure recovery. It is shown that for each codeword there are 12 possible combinations of row and column elements, called couplets, capable of mirroring the data byte. These byte-level micro-mirrors outperform conventional mirroring in that each byte and its ECC mirror can self-detect and self-correct random errors and can recover all dual erasure combinations over four elements. Encoding at the byte quanta level maximizes application flexibility. Also disclosed are fast encode, decode and reconstruction methods via Boolean logic, processor instructions and software table look-up, with the intent to run at line and application speeds. The new error control method can augment ARQ algorithms and bring resiliency to system fabrics, including routers and links previously limited to the recovery of transient errors. Image storage and storage over arrays of static devices can benefit from the two-dimensional capabilities. Applications with critical data integrity requirements can utilize the method for end-to-end protection and validation. An extra ECC byte per codeword extends both the resiliency and dimensionality.
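
As a rough intuition for row/column erasure recovery over a small array, the sketch below uses plain XOR parity on a 4x4 grid of nibbles. This is only a simplified stand-in and is not the patented ECC construction; the grid values and function names are invented for illustration.

```python
# Simplified illustration of two-dimensional parity over a 4x4 array.
# NOT the patent's ECC scheme; it only sketches recovering an erased
# element from row/column redundancy.

def make_parities(grid):
    """Compute XOR parity for every row and every column of a 4x4 grid."""
    row_par = [grid[r][0] ^ grid[r][1] ^ grid[r][2] ^ grid[r][3] for r in range(4)]
    col_par = [grid[0][c] ^ grid[1][c] ^ grid[2][c] ^ grid[3][c] for c in range(4)]
    return row_par, col_par

def recover(grid, erased, row_par):
    """Rebuild a single erased cell (r, c) from its row parity."""
    r, c = erased
    others = [grid[r][j] for j in range(4) if j != c]
    return row_par[r] ^ others[0] ^ others[1] ^ others[2]

grid = [[0x1, 0x2, 0x3, 0x4],
        [0x5, 0x6, 0x7, 0x8],
        [0x9, 0xA, 0xB, 0xC],
        [0xD, 0xE, 0xF, 0x0]]
row_par, col_par = make_parities(grid)

lost = grid[2][1]          # pretend element (2, 1) was erased
grid[2][1] = None
assert recover(grid, (2, 1), row_par) == lost
```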

958 citations

Journal ArticleDOI
TL;DR: FunRich is an open access, standalone functional enrichment and network analysis tool whose fully customizable databases allow it to serve as a framework for enrichment analysis irrespective of the data type or organism used.
Abstract: As high-throughput techniques including proteomics become more accessible to individual laboratories, there is an urgent need for a user-friendly bioinformatics analysis system. Here, we describe FunRich, an open access, standalone functional enrichment and network analysis tool. FunRich is designed to be used by biologists with minimal or no support from computational and database experts. Using FunRich, users can perform functional enrichment analysis on background databases that are integrated from heterogeneous genomic and proteomic resources (>1.5 million annotations). Besides the default human-specific FunRich database, users can download data from the UniProt database, which currently supports 20 different taxonomies against which enrichment analysis can be performed. Moreover, users can build their own custom databases and perform enrichment analysis irrespective of organism. In addition to proteomics datasets, the custom database allows the tool to be used for genomics, lipidomics and metabolomics datasets. Thus, FunRich allows complete database customization and thereby permits the tool to be used as a skeleton for enrichment analysis irrespective of the data type or organism. FunRich (http://www.funrich.org) is user-friendly and provides graphical representation (Venn, pie charts, bar graphs, column, heatmap and doughnuts) of the data with customizable font, scale and color (publication quality).
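
The abstract does not spell out the statistics FunRich applies, so the sketch below shows only the kind of hypergeometric over-representation test commonly used for functional enrichment against a background database; the function name and numbers are illustrative assumptions.

```python
# Generic hypergeometric over-representation test, the statistic typically
# used by enrichment tools. Illustrative only; not a description of FunRich's
# internal implementation.
from scipy.stats import hypergeom

def enrichment_pvalue(background_size, annotated_in_background,
                      query_size, annotated_in_query):
    """P(X >= annotated_in_query) when drawing query_size genes at random
    from a background in which annotated_in_background genes carry the term."""
    return hypergeom.sf(annotated_in_query - 1, background_size,
                        annotated_in_background, query_size)

# Example: 20,000-gene background, 300 genes carry the annotation,
# and 25 of the 150 genes in the query list are annotated.
print(enrichment_pvalue(20000, 300, 150, 25))
```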

951 citations

Proceedings Article
31 Jul 1994
TL;DR: An improved algorithm is given for the problem of mining association rules from large collections of data, based on careful combinatorial analysis of the information obtained in previous passes, which makes it possible to eliminate unnecessary candidate rules.
Abstract: Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Agrawal, Imielinski, and Swami introduced the problem of mining association rules from large collections of data, and gave a method based on successive passes over the database. We give an improved algorithm for the problem. The method is based on careful combinatorial analysis of the information obtained in previous passes; this makes it possible to eliminate unnecessary candidate rules. Experiments on a university course enrollment database indicate that the method outperforms the previous one by a factor of 5. We also show that sampling is in general a very efficient way of finding such rules.
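
To make the rule form in the abstract concrete, here is a minimal sketch of counting the support and confidence of a rule "W => B" over a 0/1 relation. It illustrates the rule definition only, not the paper's improved candidate-pruning algorithm; the example relation and column names are invented.

```python
# Support and confidence of an association rule "W => B" over a 0/1 relation.
# Illustrative only; not the paper's pass-reducing algorithm.

def rule_stats(rows, W, B):
    """rows: list of dicts mapping column name -> 0/1."""
    have_W = [r for r in rows if all(r[c] == 1 for c in W)]
    have_W_and_B = [r for r in have_W if r[B] == 1]
    support = len(have_W_and_B) / len(rows)
    confidence = len(have_W_and_B) / len(have_W) if have_W else 0.0
    return support, confidence

enrollments = [
    {"databases": 1, "algorithms": 1, "compilers": 0},
    {"databases": 1, "algorithms": 1, "compilers": 1},
    {"databases": 1, "algorithms": 0, "compilers": 1},
    {"databases": 0, "algorithms": 1, "compilers": 0},
]
# "90% of the rows with 1 in W also have 1 in B" corresponds to confidence >= 0.9.
print(rule_stats(enrollments, W={"databases"}, B="algorithms"))
```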

758 citations

Proceedings ArticleDOI
27 Jun 2006
TL;DR: This paper shows how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems, evaluates a set of such schemes, and shows that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
Abstract: Column-oriented database system architectures invite a re-evaluation of how and when data in databases is compressed. Storing data in a column-oriented fashion greatly increases the similarity of adjacent records on disk and thus opportunities for compression. The ability to compress many adjacent tuples at once lowers the per-tuple cost of compression, both in terms of CPU and space overheads. In this paper, we discuss how we extended C-Store (a column-oriented DBMS) with a compression sub-system. We show how compression schemes not traditionally used in row-oriented DBMSs can be applied to column-oriented systems. We then evaluate a set of compression schemes and show that the best scheme depends not only on the properties of the data but also on the nature of the query workload.
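
As a small illustration of why adjacent values in a column compress well, the sketch below applies run-length encoding to a sorted column. It is a generic example of the idea, not C-Store's compression sub-system, and the column data is invented.

```python
# Run-length encoding of a sorted column: adjacent equal values collapse
# into (value, run_length) pairs. Generic illustration, not C-Store code.

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    runs = []
    for value in column:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

state = ["CA", "CA", "CA", "NY", "NY", "TX", "TX", "TX", "TX"]
encoded = rle_encode(state)
assert rle_decode(encoded) == state
print(encoded)   # [('CA', 3), ('NY', 2), ('TX', 4)]
```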

663 citations


Network Information
Related Topics (5)
Database design: 15K papers, 376K citations (77% related)
Query optimization: 17.6K papers, 474.4K citations (76% related)
Query language: 17.2K papers, 496.2K citations (76% related)
Web search query: 17.3K papers, 451K citations (75% related)
Relational database: 21.7K papers, 479K citations (75% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2022    2
2021    293
2020    428
2019    590
2018    568
2017    559