Journal ArticleDOI

Integrating XML and databases

01 Jul 2001-IEEE Internet Computing (IEEE Educational Activities Department)-Vol. 5, Iss: 4, pp 84-88
TL;DR: XML is becoming a standard for data communication over the Internet. Like HTML, it is a markup language, but it supports a richer set of features, such as user-defined tags that allow both data and descriptive information about data to be represented within a single document.
Abstract: XML is becoming a standard for data communication over the Internet. Like HTML, it is a markup language, but it supports a richer set of features, such as user-defined tags that allow both data and descriptive information about data to be represented within a single document. At the same time, presentation aspects remain decoupled from data representation. XML's flexibility lets it serve as a metalanguage for defining other markup languages specialized for specific contexts. A document type definition (DTD) describes the tags documents can use, customized to the specific semantic requirements of the application context, and the rules connecting tags with their contents. These capabilities make XML a common data format for data interchange between computer systems and between applications. XML's proliferation raises the question of how data transferred by XML documents can be read, stored, and queried. In other words, how can database management systems (DBMSs) handle XML documents?
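The closing question of the abstract (how XML data can be read, stored, and queried by a DBMS) can be illustrated with a minimal sketch. It is not taken from the article: the sample document, the `employee` table, and the `load_employees` helper are all hypothetical, and the approach shown (shredding elements into relational rows using Python's standard `xml.etree.ElementTree` and `sqlite3` modules) is only one of several possible strategies.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative only.
XML_DOC = """
<employees>
  <employee id="1"><name>Ada</name><dept>sales</dept></employee>
  <employee id="2"><name>Lin</name><dept>research</dept></employee>
</employees>
"""

def load_employees(xml_text, conn):
    """Parse the XML and insert one relational row per <employee> element."""
    conn.execute(
        "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(e.get("id")), e.findtext("name"), e.findtext("dept"))
        for e in root.findall("employee")
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)
    return len(rows)

conn = sqlite3.connect(":memory:")
count = load_employees(XML_DOC, conn)

# Once shredded into rows, the data can be queried with ordinary SQL.
sales = conn.execute(
    "SELECT name FROM employee WHERE dept = ?", ("sales",)
).fetchall()
print(count, sales)
```

Mapping elements to columns keeps the data queryable with plain SQL, at the cost of fixing the document structure in advance; storing the document whole is the usual alternative when the structure varies.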
Citations
Journal ArticleDOI
01 Nov 2006
TL;DR: The basic concept of document warehousing is discussed and its formal definitions are presented; a general system framework is proposed and some useful applications are elaborated to illustrate the importance of document warehousing.
Abstract: During the past decade, data warehousing has been widely adopted in the business community. It provides multi-dimensional analyses of accumulated historical business data to support contemporary administrative decision-making. Nevertheless, it is believed that only about 20% of the information can be extracted from data warehouses, which concern numeric data only; the other 80% is hidden in non-numeric data or even in documents. Therefore, many researchers now advocate that it is time to conduct research on document warehousing to capture complete business intelligence. Document warehouses, unlike traditional document management systems, include extensive semantic information about documents, cross-document feature relations, and document grouping or clustering to provide more accurate and more efficient access to text-oriented business intelligence. In this paper, we discuss the basic concept of document warehousing and present its formal definitions. Then, we propose a general system framework and elaborate some useful applications to illustrate the importance of document warehousing. The work is essential for establishing an infrastructure to help combine text processing with numeric OLAP processing technologies. The combination of data warehousing and document warehousing will be one of the most important kernels of knowledge management and customer relationship management applications.

124 citations

Journal ArticleDOI
01 Dec 2007
TL;DR: This paper focuses on fuzzy XML data modeling, which mainly involves the representation model of fuzzy XML, its conceptual design, and its storage in databases.
Abstract: Information imprecision and uncertainty exist in many real-world applications, and for this reason fuzzy data modeling has been extensively investigated in various data models. Currently, huge amounts of electronic data are available on the Internet, and XML has become the de facto standard of information representation and exchange over the Web. This paper focuses on fuzzy XML data modeling, which mainly involves the representation model of fuzzy XML, its conceptual design, and its storage in databases. Based on "possibility distribution theory", we developed this fuzzy XML data model. We also developed a fuzzy UML data model to design the fuzzy XML model conceptually. We investigated the formal conversions from the fuzzy UML model to the fuzzy XML model and the formal mapping from the fuzzy XML model to fuzzy relational databases.

113 citations


Cites methods from "Integrating XML and databases"

  • ...Based on "possibility distribution theory", we developed this fuzzy XML data model....

Patent
10 May 2002
TL;DR: The text format of input data is checked and converted into a system-manipulated format, with tags, heading information, and the like used to determine whether the input is in HTML or e-mail format; the converted data is then divided into blocks in a simple manner such that elements in the blocks can be checked based on repetition of predetermined character patterns.
Abstract: The text format of input data is checked, and is converted into a system-manipulated format. It is further determined if the input data is in an HTML or e-mail format using tags, heading information, and the like. The converted data is divided into blocks in a simple manner such that elements in the blocks can be checked based on repetition of predetermined character patterns. Each block section is tagged with a tag indicating a block. The data divided into blocks is parsed based on tags, character patterns, etc., and is structured. A table in text is also parsed, and is segmented into cells. Finally, tree-structured data having a hierarchical structure is generated based on the sentence-structured data. A sentence-extraction template paired with the tree-structured data is used to extract sentences.

86 citations

Patent
08 Dec 2003
TL;DR: A method for searching unstructured data stored in a database is presented, in which a plurality of electronic records is stored in a common repository in the database that provides an audit trail that cannot be altered or disabled by users.
Abstract: A method of and system for searching unstructured data stored in a database. In one embodiment the method comprises storing a plurality of electronic records in a common repository of electronic records in the database that provides an audit trail that cannot be altered or disabled by users of the system where each electronic record comprises unstructured data stored in a character large-object (CLOB) format in a column of a table of the database; creating a security protocol that protects the electronic records against unauthorized access; and creating a query designed to identify electronic records in the database that meet criteria designated in the query. The method further comprises modifying the query in accordance with the security protocol to create a modified query prior to executing the query and running the modified query against the unstructured data. In one particular implementation, the unstructured data comprises a well-formed XML document stored within a column of a table stored in the database.

65 citations
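The query-modification step described in the patent abstract above (a user's query is rewritten according to a security protocol before it runs against XML stored in a large-object column) might look roughly like the following sketch. Everything specific here is an assumption made for illustration: the `erecord` table, the `owner` column, the `secure` helper, and the rule that users may only see their own records. SQLite's `TEXT` column stands in for a CLOB, and a `LIKE` match stands in for whatever XML-aware search the patented system uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `doc` holds a well-formed XML document as text (a stand-in for a CLOB column).
conn.execute(
    "CREATE TABLE erecord (id INTEGER PRIMARY KEY, owner TEXT, doc TEXT)"
)
conn.executemany(
    "INSERT INTO erecord (owner, doc) VALUES (?, ?)",
    [
        ("alice", "<note><status>approved</status></note>"),
        ("bob", "<note><status>approved</status></note>"),
        ("alice", "<note><status>draft</status></note>"),
    ],
)

def secure(query, params, user):
    """Wrap the caller's query so that only rows owned by `user` are returned.

    The ownership rule is a hypothetical security protocol.
    """
    return f"SELECT * FROM ({query}) WHERE owner = ?", (*params, user)

# The query as written by the user, before modification.
user_query = "SELECT id, owner, doc FROM erecord WHERE doc LIKE ?"
sql, args = secure(user_query, ("%<status>approved</status>%",), "alice")

# Only the modified query is ever executed.
rows = conn.execute(sql, args).fetchall()
print(rows)
```

Wrapping the original query as a subquery leaves the user's own criteria untouched while ensuring the security predicate cannot be bypassed by the way the criteria are written.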

Patent
05 Sep 2002
TL;DR: A method for automatically generating code for converting data from stored procedures to an XML format is performed by a wizard with which a client interfaces. The wizard receives a selection of a stored procedure, determines an output data format for the selected procedure, obtains a definition of an XML document that specifies which portions of the output data format to include in the XML document, and generates code for a wrapper.
Abstract: A method for automatically generating code for converting data from stored procedures to an XML format is performed by a wizard with which a client interfaces. The wizard receives a selection of a stored procedure, determines an output data format for the selected stored procedure, obtains a definition of an XML document that specifies which portions of the output data format to include in the XML document, and generates code for a wrapper. The wrapper calls the stored procedure and generates the defined XML document, which the wrapper then returns. In this manner, neither the code for converting data to the XML format nor the converted data needs to be produced manually.

32 citations
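The wrapper idea in the patent abstract above can be sketched as follows. This is a simplification under stated assumptions, not the patented method: instead of generating source code, the hypothetical `make_xml_wrapper` builds a closure; `get_orders` is a stand-in for a stored procedure; and the `include` list plays the role of the XML document definition that selects which output columns appear.

```python
import xml.etree.ElementTree as ET

def get_orders():
    """Stand-in for a stored procedure: returns column names and rows."""
    columns = ["order_id", "customer", "total", "internal_note"]
    rows = [(1, "Acme", 120.5, "rush"), (2, "Globex", 80.0, "")]
    return columns, rows

def make_xml_wrapper(procedure, root_tag, row_tag, include):
    """Return a function that calls `procedure` and emits an XML string
    containing only the columns listed in `include`."""
    def wrapper():
        columns, rows = procedure()
        root = ET.Element(root_tag)
        for row in rows:
            item = ET.SubElement(root, row_tag)
            for col, value in zip(columns, row):
                if col in include:
                    ET.SubElement(item, col).text = str(value)
        return ET.tostring(root, encoding="unicode")
    return wrapper

# Build the wrapper once, then call it whenever the XML document is needed.
orders_as_xml = make_xml_wrapper(
    get_orders, "orders", "order", ["order_id", "total"]
)
xml_out = orders_as_xml()
print(xml_out)
```

The caller never writes conversion code by hand; it supplies only the procedure and the list of columns to expose.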

References
01 Jan 1999
TL;DR: This paper describes the experiences migrating the Lore database management system for semistructured data to work with XML, and presents a modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language.
Abstract: Research on semistructured data over the last several years has focused on data models, query languages, and systems where the database is modeled as some form of labeled, directed graph. The recent emergence of eXtensible Markup Language (XML) as a new standard for data representation and exchange on the World-Wide Web has drawn significant attention. Researchers have casually observed a striking similarity between semistructured data models and XML. While similarities do abound, some key differences dictate changes to any existing data model, query language, or DBMS for semistructured data in order to fully support XML. This paper describes our experiences migrating the Lore database management system for semistructured data to work with XML. We present our modified data model, whose definition was a subtly challenging task given that XML itself is just a textual language. Based on this model, we describe changes to Lorel, Lore's query language. We also briefly discuss changes to Lore's dynamic structural summaries (DataGuides) and the relationship of DataGuides to XML's Document Type Definitions (DTDs).

309 citations
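The abstract above notes that semistructured databases are modeled as labeled, directed graphs and that XML resembles such models. A rough sketch of that correspondence follows. It is not Lore's actual data model, which the abstract says required subtle changes to support XML; the `xml_to_graph` function and its node-numbering scheme are hypothetical, and attributes, ordering, and cross-references are ignored.

```python
import xml.etree.ElementTree as ET
from itertools import count

def xml_to_graph(xml_text):
    """Return (edges, values).

    edges  -- list of (parent_id, label, child_id) triples, one per element
    values -- dict mapping leaf node ids to their text content
    """
    ids = count()
    edges, values = [], {}

    def visit(elem):
        node = next(ids)
        for child in elem:
            # The child's tag becomes the label on the edge leading to it.
            edges.append((node, child.tag, visit(child)))
        text = (elem.text or "").strip()
        if text and len(elem) == 0:
            values[node] = text
        return node

    visit(ET.fromstring(xml_text))
    return edges, values

edges, values = xml_to_graph(
    "<book><title>XML</title><author><name>Ann</name></author></book>"
)
print(sorted(edges))
print(values)
```

Putting tags on edges rather than nodes mirrors the edge-labeled style common in semistructured data models; a node-labeled variant would be equally easy to write.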

Journal ArticleDOI
TL;DR: Author-X is a Java-based system that addresses the security issues of access control and policy design for XML document administration and allows a user to verify a document's integrity without contacting the document server.
Abstract: Author-X is a Java-based system that addresses the security issues of access control and policy design for XML document administration. Author-X supports the specification of policies at varying granularity levels and the specification of user credentials as a way to enforce access control. Access control is available according to both push and pull document distribution policies, and document updates are distributed through a combination of hash functions and digital signature techniques. The Author-X approach to distributed updates allows a user to verify a document's integrity without contacting the document server.

248 citations

Proceedings ArticleDOI
01 May 2001

27 citations