Research paper on implementation issues in xml

Intermediary schemas for complex XML applications: an example from research information management

This article examines one possible approach to making complex XML applications easier to employ: the use of 'intermediary' XML schemas and associated XSLT stylesheets.

The wider applicability of this methodology, particularly in relation to the METS standard, is also discussed.

Introduction

The complexity of important XML schemas may often present a major hurdle to their adoption, particularly where they are required in environments that do not already have significant experience in XML authoring or editing. These problems may be exacerbated where the schemas are highly flexible: the lack of constraint on the way in which they can be used often requires extensive work on initial information architecture before they can be implemented in practice. Such problems of complexity and over-flexibility become more acute in applications that employ multiple, linked XML files to capture the complexities of their required architectures.

In such cases, not only must the elements and other components be determined, but also the form of the linkages between files, including any ontologies used to give links semantic meaning. The greater the complexity of the overall information environment to be encoded, the less likely it becomes that an 'off-the-shelf' schema with a minimal learning curve can be offered, and the more limited the potential take-up of any such schema will inevitably be.

This article examines one potential approach to obviating these problems: 'intermediary' XML schemas and XSLT stylesheets. The context in which it is examined is that of CERIF, the Common European Research Information Format (European Organisation for International Research Information 2010b), a complex application designed to facilitate the interoperability of research management information.

The CERIF model, originally instantiated as a set of relational SQL tables but available in XML since 2006, is based on a small number of core components and an extensive set of linkages between them, which can mirror their often complicated inter-relationships. This approach has the benefit of being able to fulfil the metadata requirements of almost any operational CRIS (Current Research Information System), but also the downside of potentially great complexity.

The methodology advocated here to alleviate some of this complexity is an intermediary XML schema and associated XSLT stylesheets which are used to select a relevant subset of the CERIF components and constrain the manner in which they are employed.
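The following Python sketch stands in for the XSLT step to make the idea concrete: a deliberately flat intermediary record, with a fixed list of contributor roles, is expanded into several separate but linked CERIF-style elements. The intermediary format, the role-to-class mapping, the scheme identifier and the element names are all assumptions, modelled loosely on CERIF naming and not checked against the published CERIF XML schemas; a real implementation would be an XSLT stylesheet.

```python
import xml.etree.ElementTree as ET

# Hypothetical intermediary record: a flat, tightly constrained structure in
# which each research output lists its contributors with a role from a fixed set.
INTERMEDIARY = """
<outputs>
  <output id="pub1">
    <title lang="en">Intermediary schemas in practice</title>
    <contributor staffId="pers1" role="author"/>
    <contributor staffId="pers2" role="editor"/>
  </output>
</outputs>
"""

# Assumed mapping from the intermediary role list to terms in a semantic
# scheme; the scheme ID and class IDs are placeholders, not published terms.
ROLE_TO_CLASS = {"author": "Author", "editor": "Editor"}
SCHEME_ID = "example-person-output-roles"


def expand(intermediary_xml: str) -> ET.Element:
    """Expand flat <output> records into separate CERIF-style entity and link elements."""
    src = ET.fromstring(intermediary_xml)
    cerif = ET.Element("CERIF")
    for output in src.findall("output"):
        pub_id = output.get("id")

        # Result entity: little more than its identifier.
        publ = ET.SubElement(cerif, "cfResPubl")
        ET.SubElement(publ, "cfResPublId").text = pub_id

        # Translatable text goes in its own language-tagged element.
        title = output.find("title")
        if title is not None:
            t = ET.SubElement(cerif, "cfResPublTitle", cfLangCode=title.get("lang", "en"))
            ET.SubElement(t, "cfResPublId").text = pub_id
            ET.SubElement(t, "cfTitle").text = title.text

        # One link element per contributor; its meaning is supplied by the
        # semantic scheme rather than typed in by whoever creates the record.
        for contributor in output.findall("contributor"):
            link = ET.SubElement(cerif, "cfPers_ResPubl")
            ET.SubElement(link, "cfPersId").text = contributor.get("staffId")
            ET.SubElement(link, "cfResPublId").text = pub_id
            ET.SubElement(link, "cfClassId").text = ROLE_TO_CLASS[contributor.get("role")]
            ET.SubElement(link, "cfClassSchemeId").text = SCHEME_ID
    return cerif


result = expand(INTERMEDIARY)
print(ET.tostring(result, encoding="unicode"))
```

The design choice this illustrates is that those creating the data only ever edit the short, constrained record; the proliferation of entity and link elements, and the attachment of semantic classifications, happens mechanically in the transform.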

This technique is employed in the context of the Readiness for REF (R4R) project (Centre for e-Research 2011), from the United Kingdom's higher education community, which sought to examine the feasibility of employing CERIF in the periodic research assessment exercises used to determine the allocation of research funding to universities and other research institutions in that country.

Its aim was to make the CERIF standard, which the higher education funding body had specified as a format for submissions to the next exercise in 2014, a feasible option, for the first time, for the majority of institutions, which had not previously found it viable.

General Requirements

CERIF and its implementation challenges

The need to share information required for research management has long been recognised, particularly where the research is publicly funded and there is a consequent onus to ensure transparency in the allocation of funding and to ensure that it is well spent.

In addition, the international nature of much research collaboration also requires this information to be readily shared across national and often linguistic boundaries.

Early work on rationalising the metadata necessary for sharing information of this type began in Europe in the 1980s and eventually produced the CERIF standard, which has undergone several major revisions in 2000, 2004 and 2006 since its first appearance in 1991 (European Organisation for International Research Information 2010c).

The model defines three 'base' entities (project, person and organisation unit), all of which include only very basic metadata, crucially including unique IDs. In addition, the CERIF core provides basic metadata for research outputs ('results'), including publications, patents and products. Any textual data capable of translation into multiple languages, such as publication titles, abstracts, descriptions of funding programs or descriptions of research environments, must be encoded using dedicated multilingual entities. These components are joined by linking entities: a research publication, for instance, may be joined to its funding stream, to a conference at which it is presented or to a prize awarded for it.
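As a rough aid to reading this structure, the Python sketch below models entities as little more than a type and an ID, and linking entities as pairs of entities plus a classification drawn from a semantic scheme. The class names, field names, entity kinds and classification terms are hypothetical and are not CERIF's own identifiers.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A base or result entity: essentially a type plus a unique ID."""
    kind: str  # e.g. "Project", "Person", "OrgUnit", "ResultPublication"
    entity_id: str


@dataclass(frozen=True)
class Link:
    """A linking entity; its meaning comes entirely from the classification."""
    source: Entity
    target: Entity
    class_id: str  # term taken from a semantic scheme
    class_scheme_id: str  # identifies the scheme the term belongs to
    start: Optional[date] = None
    end: Optional[date] = None


def links_of(entity: Entity, links: List[Link]) -> List[Tuple[str, Entity]]:
    """Return (classification, other entity) pairs for every link touching `entity`."""
    related = []
    for link in links:
        if link.source == entity:
            related.append((link.class_id, link.target))
        elif link.target == entity:
            related.append((link.class_id, link.source))
    return related


# Hypothetical example: one publication joined to a funding stream,
# a conference and a prize, each through a classified link.
SCHEME = "example-result-links"
pub = Entity("ResultPublication", "pub1")
grant = Entity("Funding", "fund1")
conference = Entity("Event", "event1")
prize = Entity("Prize", "prize1")

links = [
    Link(pub, grant, "Funded By", SCHEME),
    Link(pub, conference, "Presented At", SCHEME, start=date(2011, 5, 9)),
    Link(prize, pub, "Awarded For", SCHEME),
]

for class_id, other in links_of(pub, links):
    print(f"{class_id}: {other.kind} {other.entity_id}")
```

Note that the link records carry no meaning of their own beyond the classification term and the scheme it is drawn from.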

These linking entities rely upon either user-defined or, preferably, pre-published semantic schemes to assign meaning to the linkages within a given application. The linking entity which joins research staff to their publications or other research outputs, for instance, depends on such a scheme to express the nature of each link. Using CERIF in a real-world application therefore requires the identification or definition of an extensive set of semantic schemes, and their consistent application.
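A minimal sketch of what consistent application of a semantic scheme might mean in practice: the Python code below checks each person-to-publication link against a registry of permitted terms. The scheme identifier, the terms and the CERIF-style element names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of semantic schemes: scheme ID -> permitted class IDs.
SCHEMES = {
    "example-person-output-roles": {"Author", "Editor"},
}

# Two person-publication links; both are well-formed XML, but the second
# uses a term that the declared scheme does not contain.
LINKS_XML = """
<CERIF>
  <cfPers_ResPubl>
    <cfPersId>pers1</cfPersId>
    <cfResPublId>pub1</cfResPublId>
    <cfClassId>Author</cfClassId>
    <cfClassSchemeId>example-person-output-roles</cfClassSchemeId>
  </cfPers_ResPubl>
  <cfPers_ResPubl>
    <cfPersId>pers2</cfPersId>
    <cfResPublId>pub1</cfResPublId>
    <cfClassId>Writer</cfClassId>
    <cfClassSchemeId>example-person-output-roles</cfClassSchemeId>
  </cfPers_ResPubl>
</CERIF>
"""


def check_classifications(xml_text: str, schemes: dict) -> list:
    """Return a message for every link whose scheme or term is not recognised."""
    problems = []
    for link in ET.fromstring(xml_text).iter("cfPers_ResPubl"):
        person = link.findtext("cfPersId")
        scheme = link.findtext("cfClassSchemeId")
        term = link.findtext("cfClassId")
        if scheme not in schemes:
            problems.append(f"{person}: unknown scheme {scheme!r}")
        elif term not in schemes[scheme]:
            problems.append(f"{person}: {term!r} is not a term in {scheme!r}")
    return problems


for message in check_classifications(LINKS_XML, SCHEMES):
    print(message)
```

Parsing alone accepts both links; only the check against the scheme flags the second, which is why agreed vocabularies matter as much as the schemas themselves.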

This is particularly problematic because CERIF records lose much of their interoperability without the application of a coherent semantic scheme. euroCRIS has itself published a core set of semantic terms (European Organisation for International Research Information 2010a) which would, if widely adopted, go some way towards resolving this problem, but at present it covers only a proportion of the relationships likely to be required in a real-world application.

The complexities involved in implementing CERIF should be apparent from even this short introduction to it. They are compounded in its XML form, in which the model is expressed through 192 separate XML schemas. This approach was adopted with good reason, particularly to retain the degree of flexibility present in the original data model, which would be very difficult to replicate in a single XML schema; the disadvantage is the verbosity and complexity involved in employing so many schemas.

Constraining the XML metadata universe

Much of the preceding discussion leads to the conclusion that, for CERIF to achieve its potential as a medium for the interoperability of research information, some degree of constraint is required in its application.

In addition, the terms under which it is constrained (for instance, the choice of schemas and the semantics to be employed) need to be adopted in an environment that extends beyond a single institution, preferably across the whole research management community. XML as a language offers fewer opportunities for constraining and validating content than are available in, for instance, a conventional relational database. As is well known, XML validation procedures can test the conformance of a document syntactically but not semantically (Jacinto et al.).

Some constraint is possible through such simple validation procedures as enumerated lists or constrained attribute values, but the possibilities they offer are very limited; and because they must be incorporated into a schema when it is written, they are of no use when seeking to constrain standards such as CERIF which have already been published. Rule-based approaches such as Schematron go further. They work by allowing conditions for element contents to be tested against specified contexts, and they also offer the possibility of conditional validation, allowing, for instance, the value of a given element or attribute to dictate the structure or content of other components of a file. The validation of CERIF's complex linkages is possible in this way (for instance, by using the XSLT document() function within a Schematron file), but it rapidly becomes complicated and hard to maintain accurately once the number of linkages exceeds a relatively small number.
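To give a sense of why such cross-file checks grow unwieldy, the Python sketch below plays roughly the role of a Schematron rule that uses document() to look up IDs in other files. The file names, element names and rule table are hypothetical, and the files are held in memory rather than read from disk.

```python
import xml.etree.ElementTree as ET

# Hypothetical CERIF-style files, held in memory for the example.
FILES = {
    "persons.xml": "<CERIF><cfPers><cfPersId>pers1</cfPersId></cfPers></CERIF>",
    "publications.xml": "<CERIF><cfResPubl><cfResPublId>pub1</cfResPublId></cfResPubl></CERIF>",
    "links.xml": (
        "<CERIF>"
        "<cfPers_ResPubl><cfPersId>pers1</cfPersId><cfResPublId>pub1</cfResPublId></cfPers_ResPubl>"
        "<cfPers_ResPubl><cfPersId>pers9</cfPersId><cfResPublId>pub1</cfResPublId></cfPers_ResPubl>"
        "</CERIF>"
    ),
}

# One rule per ID field per link type:
# (link element, ID field, target file, path to the target file's IDs).
# Every new kind of link adds more rows, which is where maintenance effort grows.
RULES = [
    ("cfPers_ResPubl", "cfPersId", "persons.xml", "cfPers/cfPersId"),
    ("cfPers_ResPubl", "cfResPublId", "publications.xml", "cfResPubl/cfResPublId"),
]


def ids_in(files: dict, file_name: str, path: str) -> set:
    """Collect the IDs declared in another file (the job document() does in Schematron)."""
    root = ET.fromstring(files[file_name])
    return {el.text for el in root.findall(path)}


def unresolved_links(files: dict, link_file: str, rules: list) -> list:
    """Report link IDs that do not resolve to an entity in the referenced file."""
    links_root = ET.fromstring(files[link_file])
    errors = []
    for link_tag, field, target_file, target_path in rules:
        known = ids_in(files, target_file, target_path)
        for link in links_root.findall(link_tag):
            value = link.findtext(field)
            if value not in known:
                errors.append(f"{link_file}: {link_tag}/{field}={value} not found in {target_file}")
    return errors


for problem in unresolved_links(FILES, "links.xml", RULES):
    print(problem)
```

Even this toy case needs one rule for each ID field of each link type, so the rule table, like its Schematron equivalent, expands with every additional linkage.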

It has been announced that the next UK research assessment exercise, the REF, due in 2014, will accept submissions in CERIF as its preferred format, although no detailed specification of the envisaged CERIF implementation has yet been published.

To test the feasibility of using CERIF as the medium for submissions, the R4R project undertook a detailed mapping to CERIF of the data requirements of the previous exercise, the Research Assessment Exercise (RAE) undertaken in 2008, as expressed in the RAE's own schema (Higher Education Funding Council for England 2008). In the absence of any specification of the data requirements for the REF itself, this schema formed the basis of the mapping exercise. The results revealed the complexity of the task ahead, as most concepts in the RAE schema proved to be mappable only by employing three or four inter-linked CERIF files. While the number of files required was relatively small, the complexity of the linkages and of the semantic vocabularies needed to form them appeared daunting, and would probably have rendered the standard an inappropriate solution for institutions without advanced technical knowledge of CERIF and of XML itself.

Such architectural processing is not available in XML, but much of its functionality can be duplicated by creating highly constrained 'intermediary' XML schemas which are then processed by XSLT to create the complex application required. In the CERIF model, these sources are encoded explicitly, and it is impossible to disaggregate them from the summary form in which they are given in the RAE.