A semantic network-based design methodology for XML documents

LING FENG
University of Twente, The Netherlands
ELIZABETH CHANG
University of Newcastle, Australia
and
THARAM DILLON
La Trobe University, Australia

The eXtensible Markup Language (XML) is fast emerging as the dominant standard for describing and interchanging data among various systems and databases on the Internet. It offers the Document Type Definition (DTD) as a formalism for defining the syntax and structure of XML documents. The XML Schema definition language, as a replacement for the DTD, provides richer facilities for defining and constraining the content of XML documents. However, it does not concentrate on the semantics that underlies these documents, representing a logical data model rather than a conceptual model. To enable efficient business application development in large-scale electronic commerce environments, it is necessary to describe and model real-world data semantics and their complex interrelationships. In this article, we describe a design methodology for XML documents. The aim is to enforce XML conceptual modeling power and bridge the gap between software development and XML document structures. The proposed methodology comprises two design levels: the semantic level and the schema level. The first level is based on a semantic network, which provides semantic modeling of XML through four major components: a set of atomic and complex nodes, representing real-world objects; a set of directed edges, representing semantic relationships between the objects; a set of labels denoting different types of semantic relationships, including aggregation, generalization, association, and of-property relationships; and finally a set of constraints defined over nodes and edges to constrain semantic relationships and object domains. The other level of the proposed methodology is concerned with detailed XML schema design, including element/attribute declarations and simple/complex type definitions.
The mapping between the two design levels is proposed to transform the XML semantic model into the XML Schema, based on which XML documents can be systematically created, managed, and validated.

Categories and Subject Descriptors: H.2.1 [Information Systems]: Database Management - logical design
General Terms: Design, Languages
Additional Key Words and Phrases: XML, design methodology, conceptual modeling, semantic network, XML Schema

This work was completed while L. Feng was at Infolab, Tilburg University in the Netherlands.

Authors' addresses: L. Feng, Dept. of Computer Science, University of Twente, The Netherlands; email: ling@cs.utwente.nl; E. Chang, Department of Computer Science & Software Engineering, University of Newcastle, Australia; email: chang@cs.newcastle.edu.au; T. Dillon, Department of Computer Science & Computer Engineering, La Trobe University, Australia; email: tharam@cs.latrobe.edu.au.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 2002 ACM 1046-8188/02/1000-0390 $5.00

ACM Transactions on Information Systems, Vol. 20, No. 4, October 2002, Pages 390–421.

1. INTRODUCTION

XML was introduced in 1996 to overcome the deficiencies of HTML. It is much more powerful than HTML, allowing structural and semantic markups, allowing presentation of specific instructions using style sheets, and allowing the incorporation of metainformation and somewhat more flexible link management than HTML.
It builds on the principles and conventions of the Standard Generalized Markup Language (SGML) [Bryan 1992], and provides a simple yet powerful mechanism for information storage, processing, and delivery. Nowadays, XML has become an increasingly important data format for storing and interchanging data among various systems and databases on the Internet. As a new markup language that supports user-defined tags and encourages the separation of document content from its presentation, XML is able to automate Web information processing, in particular for data exchange and interoperability, which are major issues in business-to-business electronic commerce [Consortium 2000a,d]. In data exchange applications, whenever composite data must be exchanged between two programs, XML serves as a suitable format for making the data self-describing. XML is thus, in part, a data representation language that lets us describe data and create vocabularies to exchange information. In addition, XML separates data from presentation, making them reusable. The Document Type Definition (DTD) offered by XML can be used as a formalism for defining the syntax and structure of XML documents. The XML Schema definition language, as a replacement for the DTD, provides richer facilities for defining and constraining the content of XML documents [Sahuguet 2000].

However, in Web applications, writing XML is a lot more work than writing HTML because XML requires more knowledge about the relationships between elements rather than the content itself. Moreover, current XML lacks the modeling power to describe real-world data and their complex interrelationships, and thus to provide the objects' necessary semantics, which is needed to enable efficient business application development in large-scale electronic commerce environments.

These factors highlight the need for a design methodology to provide a foundation for the design and development of XML documents.

As XML bears a close similarity to semistructured data models [Bradley 1998; Buneman et al. 2001; Beeri and Tzaban 1999], one current trend in the literature is to apply data models developed for semistructured and unstructured data to XML [Goldman et al. 1999]. The Object Exchange Model (OEM) developed at Stanford University is a simple, self-describing, nested object model that represents semistructured data by a labeled directed graph [Goldman and Widom 1997; Buneman et al. 1996]. The OEM model was further migrated to work with XML [Goldman et al. 1999], where OEM's object corresponds to element, and OEM's subobject relationship mirrors element nesting in XML. When we compare the OEM-based XML data model with the one described in this article, although both adopt graphic notations, there is a fundamental difference. The former models XML documents at the instance level. It does not give an instance-independent description of the data. In contrast, we model XML data at the concept level through a set of semantic relationships including aggregation, generalization, association, and of-property, and various constraints defined on objects and their relationships.

W3C provides an XML Data Model to visualize the structures of XML documents [Consortium 2000c]. This model provides no more than a baseline on which more complex models can be built. It presents an XML document as a linearization of a tree structure. This model focuses more on the syntactic structure of XML documents; it does not address semantic modeling issues for XML documents.

The Resource Description Framework (RDF) is part of the result of the W3C Metadata Activity. The aim of RDF is to provide a robust and flexible way to standardize the definition and use of metadata, descriptions of Web-based resources [Consortium 2000b]. RDF emphasizes facilities to enable automated processing of Web resources.
It is a foundation for processing metadata so as to provide interoperability between applications that exchange machine-understandable information on the Web. The broad goal of RDF is to define a mechanism for describing resources that makes no assumptions about a particular application domain, nor predefines the semantics of any application domain. In general, RDF is a model of metadata. It uses XML to specify metadata semantics. In this sense, RDF and XML are complementary. Since RDF aims to provide a common, generic, and domain-neutral information description framework, its modeling primitives are not as rich as a semantic network, which captures more semantics such as different kinds of interrelationships among real-world entities, together with different constraints enforced over these interrelationships. Also, as XML currently gains increasing importance in data exchange and dissemination on the Internet, for a specific application, it would be important and useful to build up its semantic model, and then convert it into XML Schema.

Recently, Conrad et al. [2000] proposed conceptually modeling XML DTDs and thus classes of documents on the basis of the Unified Modeling Language (UML). The idea was to use essential parts of static UML to model XML data schemata. The mapping between the static part of UML specification (i.e., class diagrams) and XML DTDs was developed. To take advantage of all facets that DTD concepts offer, the authors extended the UML language in a UML-compliant way. An object-oriented method was further presented in Xiao et al. [2001b,a] to conceptually model XML Schema. Our work is distinguished from the above ones in the following aspects. First, we focus on the design methodology for XML documents, which is comprised of two design levels, that is, the semantic level and the schema level. Second, our transformations from the semantic level to the schema level target the most general semantic relationships between objects, and are thus not limited to UML.
In particular, we take into consideration different perspectives regarding these semantic relationships (e.g., strong/weak adhesion, ordered composition, homogeneity/heterogeneity, and exclusion in aggregation relationships; inheritance and overriding in generalization relationships; and strong/weak adhesion and exclusion in association relationships), as well as various constraints on objects, and examine how they can be realized in the XML Schema.

Our current article highlights an XML design methodology based on a semantic network. The aim is to enforce XML conceptual modeling power, making it easier to create, manage, retrieve, and validate the semantics of the XML schema. The proposed methodology is based on two design levels: a semantic level and a schema level, as shown in Figure 1. We first present a way to semantically model XML using a semantic network. It has four major components: a set of atomic and complex nodes, representing real-world objects; a set of directed edges, representing semantic relationships between the objects; a set of labels denoting different types of semantic relationships, including aggregation, generalization, association, and of-property relationships; and finally, a set of constraints defined over nodes and edges to constrain semantic relationships and object domains. We then examine the mapping from the XML semantic level to the corresponding XML schema level, which is mainly concerned with detailed XML element/attribute declarations and simple/complex type definitions. Based on the generated XML Schema, XML instance documents can then be systematically constructed and controlled.

Fig. 1. A two-level design approach.

With the proposed methodology, it is possible to combine software development with the XML data schemata.
This will enable one to improve reuse at both document and application design levels, and support the generation of common application components.

The remainder of the article is organized as follows. Section 2 presents a semantic network model for XML. Section 3 describes the mapping process from the XML semantic level to the XML schema level. Section 4 concludes the article with a brief discussion of future work.

2. XML SEMANTIC LEVEL

In this section, we first provide a brief review of XML document structure, and then describe a semantic data model for XML based on a semantic network.

2.1 XML Document Structure

XML is concerned with describing the structure of documents that are stored in electronic format, in a form that is accessible to both people and computer software. An XML format data file contains a mixture of document text and XML markup, which organizes and identifies the components of a document. XML markup builds on the concept of macro-based typesetting languages. A start-tag and an end-tag, together with the data enclosed by them, comprise an element. XML elements may contain further embedded elements, leading to a hierarchical document structure. An example of this hierarchy is shown in Figure 2.

Fig. 2. A hierarchical document structure example.

A Book element may contain a Title element and a Content element. The Content element is comprised of a number of Chapter elements, and each Chapter element contains a number of Paragraph elements. In general, each specified element must be a container element or empty. Container elements may contain text, child elements, or a mixture of both. The use of child elements can be controlled in a Required, Optional, or Repetition way. For example, every book must have a title, so the Book element must have a Title child element. This is an example of Required. A child element may be omitted.
This is called Optional. A child element may be repeatable, like the Chapter element and Paragraph element, which is an example of Repetition. Each element can be associated with one or more attributes. An attribute, consisting of an attribute name and an attribute value, provides refined information about an element.

An XML document is usually associated with a type specification called a Document Type Definition, containing user-defined element types and attribute specifications that allow one to describe the meaning of the content. In the early stages of XML development, there were several proposals to introduce XML Schema. The XML Schema offers a replacement of XML DTD, with the purpose of constraining and documenting the meaning, usage, and relationships of their constituent parts such as permissible data types, elements, and their contents, attributes, and attribute values [Consortium 2001]. The schema definition language, which is itself represented in XML, considerably extends the capabilities of XML DTD for defining and constraining the content of XML documents. An XML Schema is usually comprised of a set of schema components, such as type definitions and element declarations. They can be used to assess the validity of well-formed element information items. There are 12 kinds of schema components in total, falling into three groups. The most used components include simple type definitions, complex type definitions, attribute declarations, and element declarations.
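
To make the Required, Optional, and Repetition controls concrete, the Book hierarchy described above could be declared in XML Schema roughly as follows. This is only a minimal sketch under stated assumptions, not the schema used by the authors: the Subtitle child and the isbn attribute are hypothetical additions introduced purely to illustrate an optional child and an attribute, and "a number of" Chapter and Paragraph elements is read here as "one or more".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Element declaration with an anonymous complex type definition -->
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <!-- Required: minOccurs and maxOccurs both default to 1 -->
        <xs:element name="Title" type="xs:string"/>
        <!-- Optional: hypothetical child, may be omitted -->
        <xs:element name="Subtitle" type="xs:string" minOccurs="0"/>
        <xs:element name="Content" type="ContentType"/>
      </xs:sequence>
      <!-- Attribute declaration: hypothetical name/value pair on Book -->
      <xs:attribute name="isbn" type="xs:string" use="optional"/>
    </xs:complexType>
  </xs:element>

  <!-- Repetition: Content holds one or more Chapter elements (assumed) -->
  <xs:complexType name="ContentType">
    <xs:sequence>
      <xs:element name="Chapter" type="ChapterType" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <!-- Repetition: each Chapter holds one or more Paragraph elements (assumed) -->
  <xs:complexType name="ChapterType">
    <xs:sequence>
      <xs:element name="Paragraph" type="xs:string" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

</xs:schema>
```

An instance document that should validate against this sketch might look like the following; the text content and the isbn value are placeholders.

```xml
<Book isbn="placeholder">
  <Title>An Example Book</Title>
  <Content>
    <Chapter>
      <Paragraph>First paragraph.</Paragraph>
      <Paragraph>Second paragraph.</Paragraph>
    </Chapter>
    <Chapter>
      <Paragraph>Another paragraph.</Paragraph>
    </Chapter>
  </Content>
</Book>
```

The sketch uses three of the four component kinds named above: element declarations, attribute declarations, and complex type definitions. The built-in type xs:string stands in for a simple type.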