3.2 The producer quality model

A full metadata document which conforms to the producer schema is shown in Appendix 1, and may also be viewed at

http://schemas.geoviqua.org/GVQ/3.1....00_GVQ_raw.xml

A version of the same document, styled using XSLT, may be viewed at

http://schemas.geoviqua.org/GVQ/3.1....VQ_styled.html

Screenshots from the styled version are used in section 3.3 to illustrate some of the benefits of the producer quality model.

    3.2.1 Publications

    Publications (e.g. journal articles, technical reports) may be added to a number of quality elements within the metadata document. In each case, an existing DQ_ or MD_ element is extended to allow a ‘referenceDoc’ element to be added (see Figure 9). The resulting new objects are GVQ_Lineage, GVQ_DataIdentification and GVQ_Usage.

    DQ_Evaluation already has a ‘referenceDoc’, and since GVQ_Publication is substitutable for CI_Citation, it may also be employed here.

    locations.png

    Figure 9. Locations where a publication can or must be supplied in the GeoViQua producer model

                      

    Each publication extends a CI_Citation record with some GeoViQua‐specific elements, including codes describing the purpose and the medium of publication (see Figure 10). At the stage when metadata is produced, these publications are likely to be related to the Cal/Val process or quality assurance procedures.

    GVQ_Publication.png

    Figure 10. The GVQ_Publication class, which extends CI_Citation

    The GVQ_Publication class identifies the specific dataset to which the reference relates by using an MD_Identifier as a "target"; as a result, a publication element can be used independently. The use of publication elements as citations in the feedback model is more fully discussed in section 3.4.

    An example of a publication element can be viewed online at

    http://schemas.geoviqua.org/GVQ/3.1....on_example.xml

    A compressed summary is shown in Figure 11.

    A unique identifier such as a DOI, a Uniform Resource Locator (URL), an ISBN or an ISSN, which identifies a particular publication, can be used to link a scientific publication to a specific dataset within a catalogue. The DOI has been used for scientific publications since 1994, and ISBN and ISSN identifiers have been used for books since the seventies. More recently, the Internet has given rise to new categories of publication (e.g. web pages, wikis) which also require a unique identifier. To uniquely identify books and serial publications (e.g. magazines), the approach of the ISO 19115 standard, which uses ISBN and ISSN, has been adopted. GeoViQua has added the DOI to make online publications easier to find on the Internet. Where the publication is available online, its URL can be recorded as part of the CI_OnlineResource element.

    To make this information as complete, clear and accessible as possible, two additions have been made to the information recorded for a publication: one describes the category of the publication and the other describes its purpose.

    publication_element_metadata.png

    Figure 11. A publication element in a metadata document

    In the example above, a client can retrieve the original publication either by using the specified DOI (defined as a character string) or by following the URL of the publication, which is documented in the "onlineResource" element.

    If the metadata producer wishes to recommend that a specific DOI resolver should be used to retrieve the publication, they could instead specify the combination of resolver URL and DOI as the content of the ‘onlineResource’, as follows:

    http://dx.doi.org/10.1109/TGRS.2006.864370

    which resolves to:

    http://ieeexplore.ieee.org/xpl/artic...number=1645273

                      

    We recommend that if this strategy is used, the DOI string should still be supplied in a ‘doi’ element, to allow the use of alternative resolvers if necessary.
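The benefit of keeping the bare DOI string can be illustrated with a minimal Python sketch. It assumes a client that holds only the 'doi' value and wants to build a retrievable URL for a resolver of its choice. The function name `resolver_url` and the alternative resolver `https://doi.org/` are illustrative assumptions and are not part of the GeoViQua schema.

```python
from urllib.parse import quote

# Resolver used in the example above; other resolvers are passed in explicitly.
DEFAULT_RESOLVER = "http://dx.doi.org/"


def resolver_url(doi: str, resolver: str = DEFAULT_RESOLVER) -> str:
    """Combine a resolver base URL and a bare DOI string into one URL."""
    doi = doi.strip()
    if not doi.startswith("10."):
        # All DOIs start with the directory indicator "10."
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' unescaped: it separates the DOI prefix from the suffix.
    return resolver.rstrip("/") + "/" + quote(doi, safe="/")


doi = "10.1109/TGRS.2006.864370"
print(resolver_url(doi))                      # resolver from the text
print(resolver_url(doi, "https://doi.org/"))  # alternative resolver (assumption)
```

Because the DOI is stored separately from the 'onlineResource' URL, switching resolver is a change of one argument and does not require parsing the recorded URL.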

    This work took ISO 690 into consideration; in particular, GVQ_PublicationCategoryCode contains all the categories specified in that standard.

    3.2.2 Traceability

    A fourth element has been added to the ‘metaquality’ concrete types described in section 3.1.2, to allow the lineage of a data quality assessment to be recorded, along with its representativity and coverage. This new element is called GVQ_Lineage, and it allows the report, the process steps and the source to be documented (see Figure 12). If the substitutable GVQ_Lineage is used instead of LI_Lineage, then one or more reference documents may also be cited in support of the traceability statement.

    Traceability.png

    Figure 12. The ‘Traceability’ element

    Metaquality elements are created as reports belonging to the DQ_DataQuality element of the metadata document. They are therefore found at the same level as the reports which they reference using the ‘relatedElement’ attribute. A compressed example is shown below in Figure 13. This example uses XML Linking Language (xlink) to reference the id of the relevant data quality report.
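How a client might follow such an internal reference can be sketched in Python with the standard library. The XML fragment is hypothetical and heavily simplified: the element names `dataQuality`, `report` and `value`, and both id values, are placeholders rather than the real GVQ or ISO 19157 structure. Only the pattern is illustrated, namely an `xlink:href` of the form `#id` pointing at a sibling report.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes namespaced attributes in {namespace}name form.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, simplified fragment (placeholder element names and ids).
DOC = """
<dataQuality xmlns:xlink="http://www.w3.org/1999/xlink">
  <report id="positionalAccuracy1"><value>3.5</value></report>
  <report id="traceability1">
    <relatedElement xlink:href="#positionalAccuracy1"/>
  </report>
</dataQuality>
"""


def resolve_related(root):
    """Map the id of each referencing report to the element it points at."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    links = {}
    for report in root.findall("report"):
        for rel in report.findall("relatedElement"):
            href = rel.get(XLINK_HREF, "")
            if href.startswith("#"):  # same-document reference
                links[report.get("id")] = by_id.get(href[1:])
    return links


root = ET.fromstring(DOC.strip())
for source_id, target in resolve_related(root).items():
    print(source_id, "->", target.get("id"))
```

A reference that cannot be resolved maps to `None` in this sketch, so a real client would need to decide how to report dangling links.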

    3.2.3 Reference datasets used for evaluation

    Datasets are evaluated and verified by comparing their values against those of an independent (calibrated or validated) dataset. In ISO 19157, the dataset used to evaluate a particular dataset can only be identified as free text, since no systematic mechanism is in place. The model therefore extends DQ_EvaluationMethod to document the identity of the reference dataset and how it is used.

    Figure 14 shows how the extension was practically achieved: an initial attempt to extend DQ_DataEvaluation directly caused problems with substitutability in older schemas, and so each concrete type was extended individually. An example of one of these extended elements is shown in Figure 15. The publication element is not shown in full, but may be inspected in full online at

    http://schemas.geoviqua.org/GVQ/3.1....00_GVQ_raw.xml

    The citation element is compressed to save space, but, as can be seen from the namespace prefix, it uses the updated MD_Identifier in which a codespace may be specified.

    GeoViQuaelements.png

    Figure 14. The new GeoViQua elements which permit reference datasets to be recorded.

    evaluationelement.png

    Figure 15. An evaluation element with a reference dataset

    3.2.4 Producer soft knowledge: Discovered issues

    Discovered issues can now be documented in the extended version of the DQ_DataQuality element of a metadata document. This is done by adding a new class, GVQ_DiscoveredIssue, which can reference a corrected version of the dataset as well as other information about possible issues with the dataset (Figure 16).

    GVQ_DiscoveredIssue.png

    Figure 16. GVQ_DiscoveredIssue type

    GVQ_DiscoveredIssue has been created as a standalone class. It makes it easier to document a problem that has been detected, to describe how a problem with a dataset can be solved, to make information about a corrected dataset available, or to suggest alternative datasets, amongst other information relevant to the dataset.

    Note that the GVQ_DiscoveredIssue contains a required ‘target’ element which uniquely identifies the resource to which it refers. Fixed resources and alternative datasets are encoded more fully using MD_DataIdentification elements, which contain contact information as well as unique resource identifiers. The recommended use of MD_Identifier elements is discussed in section 3.5.1.

    When discovered issues are embedded within a metadata document in this way, the ‘target’ identifier becomes somewhat redundant since, according to good practice, that identifier should already be recorded elsewhere in the document (for example, in CI_Citation > identifier). However, this approach aims to make the GVQ_DiscoveredIssue element re‐usable within the user quality model, in the form of isolated records in databases and catalogues which record user feedback and expert judgements. The possible redundancy within a producer quality document can be addressed by using xlink for internal document referencing, as shown in the example data quality element containing a discovered issue (see Figure 17). As with all the other examples in this section, this may be more fully viewed in Appendix 1, as part of an example metadata document, or online at

    http://schemas.geoviqua.org/GVQ/3.1....00_GVQ_raw.xml

    discoveredissueelement.png

    Figure 17. A ‘discoveredIssue’ element within a data quality item

    3.2.5 Populating DQ_Result elements using UncertML

    The abstract DQ_Element is inherited from ISO 19115, ISO 19139 and ISO 19157. It is extended into DQ_Result elements in order to document additional information on conformance, coverage and quantitative results such as accuracy, as well as free-text information in the descriptive element (as shown in Figure 18).

    19157schemaelements.png

    Figure 18. The existing 19157 schema elements relating to quality measurement reporting

    The schemas proposed are not restrictive, and thus no visible changes can be found in the UML diagrams. However, additional information on the nature of the documented values can be registered through the use of encodings for probabilistic statistical information.

    There is currently flexibility regarding the Record type that is documented, which results in a heterogeneous list of data structures and types. Interoperability requirements make it necessary to follow certain guidelines. It is therefore recommended that the data types used be constrained, to allow automated generation and use of quantitative data. Several resources define specific quantitative measure types: the publicly available UncertML dictionary of statistical concepts and measures, the OGC online registry, NASA's Semantic Web for Earth and Environmental Terminology (SWEET) ontology, and the Marine Metadata Interoperability (MMI) registry.

    3.2.5.1 Quantitative Accuracy

    The results of a specific quality assessment are embedded in the DQ_Result element in ISO 19157, as shown in Figure 19.

Encoding_DQ_Results.png

Figure 19. Concrete types available for encoding a DQ_Result

The following example shows how UncertML can be used to document the uncertainty of a value. It starts from the schema fragment below, which describes the vertical accuracy of a DEM, documented as a 'result' (a DQ_QuantitativeResult) of a DQ_AbsoluteExternalPositionalAccuracy element.

shema1.png

The fragment contains plenty of information, but the value 3.5 does not indicate what kind of quantity it represents (Gaussian error, error bias, ...), although here it probably represents two standard deviations around a Gaussian mean. To describe this value and its uncertainty more fully, UncertML offers two options:

First option: supply the URI of the UncertML dictionary as the value type (valueType) and leave the value as a simple number (value). The result is then restated as a variance (assuming that the original value represented two standard deviations).

schema2.png
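The arithmetic behind the first option can be written out as a short Python sketch. It assumes, as the text suggests, that the original value of 3.5 represents two standard deviations; the function name is illustrative only.

```python
def variance_from_two_sigma(two_sigma: float) -> float:
    """Convert a value quoted as two standard deviations into a variance."""
    sigma = two_sigma / 2.0  # one standard deviation
    return sigma ** 2        # variance is the square of the standard deviation


# 3.5 is the value from the original DEM example; the two-sigma reading
# is an assumption, not something the original metadata states.
print(variance_from_two_sigma(3.5))  # 3.0625
```

If the original value had a different meaning (for example, a single standard deviation or an RMSE), the conversion would differ, which is why stating the value type explicitly matters.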

Second option: this option is more advanced and requires the client to parse UncertML. Here the URI of the UncertML dictionary is given as the valueType, and the distribution of the values from which the vertical accuracy was calculated is given as a NormalDistribution element. Unlike the original example, this makes it explicit that the samples fit a Normal distribution with a mean of 0, and it references the dictionary definition of the distribution.

schema3.png
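Once a client has parsed the distribution parameters, it can answer questions that a bare number does not support. The Python sketch below assumes a Normal distribution with mean 0 and a standard deviation of 1.75 (that is, reading the original 3.5 as two standard deviations). The helper `prob_within` is a hypothetical name, and parsing of the UncertML itself is not shown.

```python
from statistics import NormalDist

# Assumed parameters: mean 0 as stated in the text, sigma = 3.5 / 2.
error = NormalDist(mu=0.0, sigma=1.75)


def prob_within(dist: NormalDist, bound: float) -> float:
    """Probability that a sample lies within +/- bound of zero."""
    return dist.cdf(bound) - dist.cdf(-bound)


# Probability that the vertical error lies within +/- 3.5 (two sigma).
print(round(prob_within(error, 3.5), 4))  # 0.9545
```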

3.2.5.2 Thematic accuracy

Thematic accuracy has mostly been recorded as a confusion matrix with an associated Kappa value. This information has usually been documented as text, or as hyperlinks to the confusion matrix. Either way, the information cannot be retrieved directly. The UncertML standard defines a ConfusionMatrix encoding element from which the values of interest can be extracted automatically. Its use benefits modelling and simulation, and helps to detect misclassifications.

schema4.png
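To show what a client can derive once the matrix is machine-readable, the Python sketch below computes overall accuracy and Cohen's Kappa from a confusion matrix held as a list of rows. The 2 x 2 matrix is made-up illustrative data, not taken from any GeoViQua example, and parsing of the ConfusionMatrix element is not shown.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def cohens_kappa(matrix):
    """Cohen's Kappa: agreement corrected for chance agreement."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Chance agreement expected from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Illustrative data only: rows and columns are two hypothetical classes.
matrix = [[45, 5],
          [10, 40]]
print(round(overall_accuracy(matrix), 3))  # 0.85
print(round(cohens_kappa(matrix), 3))      # 0.7
```

The off-diagonal cells (5 and 10 here) are the misclassifications, so the same structure also supports the error detection mentioned above.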

