Regular expression types for XML
Citations
Types and Programming Languages
Taxonomy of XML schema languages using formal language theory
XDuce: A statically typed XML processing language
CDuce: an XML-centric general-purpose language
Typechecking for XML transformers
References
Introduction to Automata Theory, Languages, and Computation
Extensible Markup Language (XML).
Extensible markup language
Tree Automata Techniques and Applications
DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
Frequently Asked Questions (15)
Q2. What future work do the authors describe in "Regular expression types for XML"?
In the future, the authors hope to incorporate other standard features from functional programming, such as higher-order functions and parametric polymorphism. For function types, their current approach (defining subtyping by inclusion of the semantics of types and reducing it to the decidability of tree automata inclusion) does not easily extend, simply because functions are not trees. For polymorphism, their current scheme also needs to be substantially extended, since usual tree automata have no concept corresponding to "type variables." A promising direction might be to incorporate ideas from tree set automata [Gilleron et al. 1999], though the authors have not gone far.
Q3. What is the way to decide subtype relations?
The authors can exploit reflexivity (T <: T) to decide subtype relations by looking at only a part of the input type expressions.
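The reflexivity shortcut can be illustrated with a minimal sketch. The toy type language below (`ANY`, `leaf`, `node`) and the call counter are assumptions made for illustration only; this is not the authors' algorithm, which works on regular expression types and tree automata. The point is only that when both sides of a subgoal are the same expression, the check returns without descending into it.

```python
# Toy type language (hypothetical, for illustration only):
#   ("any",)                     matches every tree
#   ("leaf", label)              matches a leaf with that label
#   ("node", label, left, right) matches a binary node with that label
ANY = ("any",)

def leaf(label):
    return ("leaf", label)

def node(label, left, right):
    return ("node", label, left, right)

def is_subtype(s, t, stats):
    """Decide s <: t for the toy language, counting how many subgoals are visited."""
    stats["calls"] += 1
    if s is t:
        return True   # reflexivity: T <: T, no need to inspect the structure
    if t[0] == "any":
        return True   # everything is a subtype of ANY
    if s[0] == "any":
        return False  # ANY is not below any more specific type
    if s[0] == "leaf" and t[0] == "leaf":
        return s[1] == t[1]
    if s[0] == "node" and t[0] == "node":
        return (s[1] == t[1]
                and is_subtype(s[2], t[2], stats)
                and is_subtype(s[3], t[3], stats))
    return False

# A large shared subexpression is skipped entirely by the reflexivity test.
big = node("body", node("div", leaf("p"), leaf("p")), leaf("hr"))
stats = {"calls": 0}
result = is_subtype(node("doc", big, leaf("a")), node("doc", big, ANY), stats)
```

Here the check visits three subgoals (the root and its two children) regardless of how large `big` is, because both sides share the same object for the left child.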
Q4. What is the reason for XML’s popularity?
One of the reasons for its popularity is the existence of a number of schema languages, including DTDs [Bray et al. 2000], XML-Schema [Fallside 2001], DSD [Klarlund et al. 2000], and RELAX [Murata 2001], that can be used to define “types” (or “schemas”) describing structural constraints on data and thereby improve the safety of data processing and exchange.
Q5. Which functional programming features do the authors hope to incorporate?
In the future, the authors hope to incorporate other standard features from functional programming, such as higher-order functions and parametric polymorphism.
Q6. What is the type system studied by Buneman, Davidson, Fernandez, and Suciu?
In the type system studied by Buneman, Davidson, Fernandez, and Suciu [Buneman et al. 1997], types are graph structures and their conformance and subtype relations are defined in terms of graph simulation (which is weaker than the inclusion relation).
Q7. What is the way to improve equality tests?
To further improve the speed of equality tests, the authors use hash consing, which associates each type expression with its integer hash value, so that equality can be quickly checked in most cases by comparing their hash values.
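A minimal sketch of this idea follows. The class `TypeExpr`, the function `types_equal`, and the tag names are hypothetical, and the structural fallback for equal hash values is an assumption about how the "most cases" qualification might be handled; the authors' actual data structures are not shown in this text.

```python
class TypeExpr:
    """Hypothetical type-expression node that caches an integer hash at construction."""
    __slots__ = ("tag", "children", "hash_value")

    def __init__(self, tag, children=()):
        self.tag = tag
        self.children = tuple(children)
        # Combine the tag with the children's cached hashes, so building the
        # hash of a node never requires a deep traversal.
        self.hash_value = hash((tag, tuple(c.hash_value for c in self.children)))


def types_equal(a, b):
    """Equality test that consults the cached hash values first."""
    if a is b:
        return True
    if a.hash_value != b.hash_value:
        return False  # different hashes: the expressions cannot be equal
    # Same hash: confirm structurally, since distinct expressions may collide.
    return (a.tag == b.tag
            and len(a.children) == len(b.children)
            and all(types_equal(x, y) for x, y in zip(a.children, b.children)))


t1 = TypeExpr("seq", [TypeExpr("name"), TypeExpr("addr")])
t2 = TypeExpr("seq", [TypeExpr("name"), TypeExpr("addr")])
t3 = TypeExpr("seq", [TypeExpr("name"), TypeExpr("tel")])
```

With these definitions, `types_equal(t1, t2)` holds, while `types_equal(t1, t3)` is false; unequal expressions are normally rejected by the single integer comparison.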
Q8. What is the main argument for the proposed regular expression types?
The authors have proposed regular expression types for XML processing, arguing that set-inclusion-based subtyping and subtagging yield useful expressive power in this domain.
Q9. What is the cost of embedding XML types in a host language's type system?
The cost is that XML values and their corresponding schemas must somehow be “injected” into the value and type spaces of the host language; this usually involves adding more layers of tagging than were present in the original XML documents, which inhibits subtyping.
Q10. How do XML schema languages relate to regular expression types?
Although schema languages for XML do not treat static verification of programs, the type structures in these languages and regular expression types are worth discussing.
Q11. Why does the current approach not extend easily to function types?
For function types, their current approach—define subtyping by inclusion of the semantics of types and reduce it to the decidability of tree automata inclusion—does not easily extend simply because functions are not trees.
Q12. What representation do the authors use for sets?
Since the authors use only union and equality as operations on such sets, a suitable representation is a sorted list, which allows both operations to be performed in linear time.
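The sorted-list representation can be sketched as follows. The function names are hypothetical, and representing set elements as integers is an assumption made for illustration; the text above does not say what the elements are.

```python
def union_sorted(xs, ys):
    """Merge two sorted, duplicate-free lists into their sorted union in O(len(xs) + len(ys))."""
    out = []
    i = j = 0
    while i < len(xs) and j < len(ys):
        if xs[i] < ys[j]:
            out.append(xs[i])
            i += 1
        elif ys[j] < xs[i]:
            out.append(ys[j])
            j += 1
        else:
            # Element present in both lists: keep one copy.
            out.append(xs[i])
            i += 1
            j += 1
    # At most one of these tails is non-empty.
    out.extend(xs[i:])
    out.extend(ys[j:])
    return out


def equal_sorted(xs, ys):
    """Two sorted, duplicate-free lists denote the same set iff they are elementwise equal."""
    return xs == ys


merged = union_sorted([1, 3, 5], [2, 3, 6])
```

Because each set has exactly one sorted representation, equality reduces to a single left-to-right comparison, and union is a single merge pass.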
Q13. What is the recent example of the embedding approach?
A recent example of the embedding approach is Wallace and Runciman’s proposal to use Haskell as a host language [Wallace and Runciman 1999] for XML processing.
Q14. How does the algorithm run on large types?
By incorporating several optimization techniques, their algorithm runs at acceptable speeds on several applications involving fairly large types, such as the complete DTD for HTML documents.
Q15. How does a type system derived from Haskell's attain the flexibility required for XML processing?
Although their type system stems from Haskell’s, they attain additional flexibility required in XML processing by incorporating, instead of subtyping, extensible records and variants based on row polymorphism.