3.3 Information Layer
© 2016 – 2024 | All Rights Reserved | International Data Spaces Association
The Information Layer specifies the Information Model, the domain-agnostic, common language, i.e., Vocabulary of the International Data Spaces. The Information Model is an essential agreement shared by the participants and components of the IDS, facilitating compatibility and interoperability. The primary purpose of this formal model is to enable (semi-)automated exchange of digital resources within a trusted ecosystem of distributed parties, while preserving data sovereignty of Data Owners. The Information Model therefore supports the description, publication and identification of data products and reusable data processing software (both referred to hereinafter as Digital Resources, or simply Resources). Once the relevant Resources are identified, they can be exchanged and consumed via easily discoverable services. Apart from those core commodities, the Information Model describes essential constituents of the International Data Spaces, its participants, its infrastructure components, and its processes.
The Information Model is evolved and maintained by the IDSA Sub-Working Group 4.
The Information Model is a generic model, with no commitment to any particular domain. Domain modeling is delegated to shared Vocabularies and data schemata, as provided, e.g., by domain-specific communities of the International Data Spaces. The Information Model does not provide a meta-model for defining custom datatypes comparable to standards such as OData or OPC UA. Concerns beyond the modeling of Digital Resources and their interchange are considered out of scope. The Information Model therefore does not deal with the side effects of data exchange (e.g., in scenarios in which data is used for time-critical machine operations).
The Information Model has been specified at three levels of formalization. Each level corresponds to a digital representation, ranging from this high-level, conceptual document down to the level of operational code, as depicted in Figure 3.3.1. Every representation depicts the complete Information Model in its particular way. Among the different representations, the Declarative Representation (IDS Vocabulary) is the only normative specification of the Information Model. As such, it is accompanied by a set of auxiliary resources (e.g., guidance documents, reference examples, validation tools, and editing tools) intended to support a competent, appropriate, and consistent usage of the IDS Vocabulary.
The Conceptual Representation of the Information Model presents a high-level overview of the main, largely invariant concepts, with no commitment to a particular technology or domain. It targets a general audience, management boards, and media, as it provides basic information and promotes a shared understanding of the concepts by means of a textual document and a plausible visual notation. Where available, references to related elements of the Declarative Representation and a Programmatic Representation are provided, encouraging the reader to take a look at these alternative implementations.
The Declarative Representation (IDS Vocabulary) provides a normative view of the Information Model of the International Data Spaces. It has been developed along the analysis, findings, and requirements of the Conceptual Representation. Based on a stack of W3C Semantic Web technology and standard modeling Vocabularies (the Data Catalog Vocabulary DCAT, the Open Digital Rights Language ODRL, the Simple Knowledge Organization System SKOS, etc.), it provides a formal, machine-interpretable specification of the concepts envisaged by the Conceptual Representation. It resides at the persistent namespace URI https://w3id.org/idsa/core/, in accordance with best practices for publishing linked data. Furthermore, it details and formally defines entities of the International Data Spaces in order to be able to share, search for, and reason upon the structured metadata describing these entities. The IDS Vocabulary is defined using RDF and the OWL Web Ontology Language; additionally, descriptions of Digital Resources can be validated against SHACL shapes that express syntactic and semantic conditions. Queries against, e.g., Data Resources listed in the Data Catalogue of a Connector or Metadata Broker, or against Software Resources available from an App Store, can be formulated in query languages such as SPARQL. Thus, the Declarative Representation comprises a complete referential model allowing the derivation of a number of Programmatic Representations. The IDS Vocabulary is typically used and instantiated by knowledge engineers, ontology experts, or information architects. It defines a fairly minimal, domain-agnostic core model and relies on third-party standard and custom Vocabularies in order to express domain-specific facts. In line with common practice, existing domain Vocabularies and standards are reused where possible, fostering acceptance and interoperability.
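As a rough illustration of how structured metadata in a catalogue can be queried, the following sketch stores descriptions as subject-predicate-object triples and filters them by pattern. It is a plain-Python stand-in for an RDF store and a SPARQL query, not an implementation of either. Only the namespace URI is taken from the text above; the term names (`DataResource`, `SoftwareResource`, `title`), the `example.org` identifiers, and the helper functions are assumptions made for this example.

```python
from typing import Iterator, List, Optional, Tuple

IDS = "https://w3id.org/idsa/core/"  # namespace URI named in the text
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = Tuple[str, str, str]

# Hypothetical catalogue content; term names and identifiers are illustrative.
CATALOG: List[Triple] = [
    ("https://example.org/res/1", RDF_TYPE, IDS + "DataResource"),
    ("https://example.org/res/1", IDS + "title", "Traffic sensor readings"),
    ("https://example.org/res/2", RDF_TYPE, IDS + "SoftwareResource"),
    ("https://example.org/res/2", IDS + "title", "Anomaly detection app"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples that agree with every position that is not None."""
    for triple in triples:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


def titles_of(triples: List[Triple], resource_class: str) -> List[str]:
    """Return the titles of all subjects typed with the given class."""
    subjects = [s for s, _, _ in match(triples, p=RDF_TYPE, o=resource_class)]
    return [
        title
        for subject in subjects
        for _, _, title in match(triples, s=subject, p=IDS + "title")
    ]


data_titles = titles_of(CATALOG, IDS + "DataResource")
```

In a real deployment the same question would be posed as a SPARQL query against the metadata held by a Connector or Metadata Broker; the pattern-matching function above only mimics the idea of a basic graph pattern.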
The Programmatic Representation of the Information Model targets Software Providers by supporting seamless integration of the Information Model with a development infrastructure software developers are familiar with. It comprises a programming language data model (e.g., Java, Python, C++) shipped as a set of documented software libraries (e.g., JAR files). The Programmatic Representation provides a best-effort mapping of the IDS Vocabulary onto native structures of a target programming language. This approach supports type-safe development, well-established unit testing, and quality assurance processes. It allows developers to easily create instances of the Information Model that are compliant with the IDS Vocabulary, relieving them from the intricacies of ontology processing.
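The idea of mapping vocabulary terms onto native, typed structures can be sketched as follows. This is a hypothetical Python class, not the API of any published IDS library; the class name, its fields, the validation rules, and the JSON-LD keys (`ids:title`, `ids:keyword`) are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import List

IDS = "https://w3id.org/idsa/core/"  # namespace URI named in the text


@dataclass(frozen=True)
class DataResource:
    """Hypothetical typed counterpart of a vocabulary class."""

    identifier: str
    title: str
    keywords: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Simple checks at construction time stand in for vocabulary constraints.
        if not self.identifier.startswith(("http://", "https://")):
            raise ValueError("identifier must be an HTTP(S) URI")
        if not self.title.strip():
            raise ValueError("title must not be empty")

    def to_json_ld(self) -> dict:
        """Serialize to a JSON-LD-style dictionary (illustrative keys)."""
        return {
            "@context": {"ids": IDS},
            "@id": self.identifier,
            "@type": "ids:DataResource",
            "ids:title": self.title,
            "ids:keyword": list(self.keywords),
        }


resource = DataResource(
    identifier="https://example.org/res/1",
    title="Traffic sensor readings",
    keywords=["traffic"],
)
serialized = json.dumps(resource.to_json_ld(), indent=2)
```

Developers work with ordinary objects and constructor checks, while the serialization step keeps the link to the vocabulary terms.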
In specific IDS-based ecosystems, domain-specific adaptations – also known as Application Profiles – of the Information Model may be used to describe Resources, Participants, infrastructure and other constituents of an International Data Space. The definition of such domain-specific Vocabularies should follow best practices established, e.g., by the DCAT Application Profile for data portals in Europe (DCAT-AP), which tailors the specification of DCAT “to the specific application needs of data portals in Europe while providing semantic interoperability with other applications on the basis of reuse of established Controlled Vocabularies […] and mappings to existing metadata Vocabularies”.
Further, independent domain-specific Vocabularies, which are not necessarily derived from the IDS Information Model, may be used to describe the Content of a Resource and the Concepts addressed by a Resource, as detailed in the respective sections below.
In the following, the pivotal concept of a Digital Resource is introduced, segregated into modules in accordance with the separation of concerns principle (SoC principle). To do so, a set of six broad concerns (“concern hexagon”) is provided.
Since version 3.0 of the IDS-RAM, this section of the document has been reduced to the same high level of abstraction as the other sections. Full versioning information is available from the repository that hosts the source code of the normative Declarative Representation as well as documentation covering further details on the Conceptual Representation. The remaining text has been edited to better present the Information Layer in the context of the other layers, and to provide up-to-date pointers to external standards reused.
A (Digital) Resource in the context of the International Data Spaces is a uniquely identifiable, valuable, digital (i.e., non-physical) commodity that can be traded and exchanged between remote participants using the IDS infrastructure. Following the web resource paradigm, the abstract content of a Resource may be provided in a variety of representations. Examples of Resources include documents, time series of sensor values, messages, image file archives, or media streams. Resources are subject to forwarding, processing, and/or consumption, with a particular demand for modeling related, complementary aspects (i.e., content, provenance, provisioning, etc.). These are analyzed and specified here by applying the separation of concerns (SoC) principle.
Following the separation of concerns design principle, only one dimension of a subject matter is considered at a time, for the sake of clarity and consistency. Similar to the way a microscope works, each concern takes a particular, analytical point of view, while other concerns can temporarily be disregarded. This principle can be applied to information modeling, aiming at a thorough understanding of the domain and fostering modularity and reusability of the resulting (sub-)models. Designed accordingly, these models may evolve independently of each other and can be updated by different agents at different times. As a modification of a single element of the overall model does not require a change in other, logically unrelated parts, the development and maintenance of models can be substantially simplified.
To illustrate the main modeling concerns of Digital Resources in a way easy to memorize, the mnemonic hexagonal arrangement of carbon atoms can be used (“C-Hexagon”), as shown in Figure 3.3.2.
As a Resource's content is its most essential aspect, Content is located at the top of the hexagon. The Content concern deals with
the description of a Resource's abstract substance,
its serialization as a representation in a machine-interpretable format, making use of Vocabularies as appropriate, and
the materializations of these representations at certain points in time as one or more instances (e.g., values or artifacts).
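The three aspects of the Content concern (abstract substance, representations, and instances) can be pictured as nested data structures. The sketch below is an assumption-laden illustration in Python: the class names, fields, media types, and sample payloads are invented for this example and are not taken from the IDS Vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Instance:
    """A materialization of a representation at a point in time."""

    created: datetime
    payload: bytes


@dataclass
class Representation:
    """A serialization of the content in a machine-interpretable format."""

    media_type: str
    instances: List[Instance] = field(default_factory=list)


@dataclass
class Resource:
    """The abstract substance, described independently of any format."""

    description: str
    representations: List[Representation] = field(default_factory=list)

    def latest(self, media_type: str) -> Instance:
        """Return the most recent instance available in the given format."""
        candidates = [
            instance
            for representation in self.representations
            if representation.media_type == media_type
            for instance in representation.instances
        ]
        if not candidates:
            raise LookupError(f"no instance available as {media_type}")
        return max(candidates, key=lambda instance: instance.created)


# Hypothetical resource: one abstract content, two formats, several snapshots.
series = Resource(
    description="Hourly outdoor temperature at a hypothetical site",
    representations=[
        Representation("text/csv", [
            Instance(datetime(2024, 1, 1, tzinfo=timezone.utc), b"t,celsius\n0,3.5\n"),
            Instance(datetime(2024, 1, 2, tzinfo=timezone.utc), b"t,celsius\n0,4.1\n"),
        ]),
        Representation("application/json", [
            Instance(datetime(2024, 1, 2, tzinfo=timezone.utc), b'[{"t": 0, "celsius": 4.1}]'),
        ]),
    ],
)
newest_csv = series.latest("text/csv")
```

The same abstract content is offered in two formats, and each format may have several snapshots over time, which mirrors the distinction between representation and instance.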
Content is interpretable by references to a shared, formally defined Concept, which may cover the meaning, annotation and interpretation of entities by, e.g.,
natural language keywords,
terms defined in curated sources such as Controlled Vocabularies, concept schemes, taxonomies, thesauri, etc., or
types defined in type systems or ontologies, i.e., Vocabularies.
On the other hand, links to a particular Context (in terms of, e.g.,
time,
place, or
real-world entities)
make the Content potentially relevant for certain Data Consumers.
Thus, the upper part of the C-Hexagon deals with the “what” aspects, independently of Data Exchange, Data Sharing or Data Utilization. The lower part relates to the “how” aspects; i.e., how the content is exchanged (Communication) and under which conditions (Commodity).
The Communication concern deals with means to communicate a Resource's Content in one of the Representations available, e.g.,
by sending messages in some communication protocol
to a resource or service endpoint or to an IDS Connector
in order to perform an operation.
The Commodity concern helps to address the value and utility of a Resource in terms of, e.g.,
its provenance,
its quality, and
the (usage) policies attached to it, e.g., the obligation to pay a certain price for its consumption.
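A minimal sketch of the Commodity concern might attach provenance, quality, and a usage policy to a Resource and check a payment obligation before consumption. The classes, fields, and the `may_consume` function below are hypothetical and deliberately simplistic; they do not follow ODRL or any policy language used in the International Data Spaces.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class UsagePolicy:
    """Hypothetical policy: an obligation to pay a price per consumption."""

    price_per_use: Decimal
    currency: str


@dataclass(frozen=True)
class Commodity:
    """Value-related aspects of a Resource (all fields are illustrative)."""

    provenance: str
    quality: str
    policy: UsagePolicy


def may_consume(commodity: Commodity, paid: Decimal, currency: str) -> bool:
    """Check whether a payment satisfies the policy's price obligation."""
    policy = commodity.policy
    return currency == policy.currency and paid >= policy.price_per_use


offer = Commodity(
    provenance="Measured by a hypothetical sensor network",
    quality="validated",
    policy=UsagePolicy(price_per_use=Decimal("2.50"), currency="EUR"),
)
allowed = may_consume(offer, Decimal("2.50"), "EUR")
denied = may_consume(offer, Decimal("1.00"), "EUR")
```

Real usage policies cover far more than price (for example, permitted purposes or time limits); the point here is only that the conditions travel with the Resource description and can be evaluated mechanically.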
The Community of Trust concern refers to the distinctive feature of the International Data Spaces being an ecosystem of certified participants operating certified components, such as Connectors. Using such components, Participants exchange and share Digital Resources in a secure and trusted way in accordance with contracts composed of usage policies, thus ensuring data sovereignty.
The level of detail differs across the individual concerns. The selection of their constituting aspects may change in light of new requirements and insights; Figure 3.3.3 suggests one such expansion of the C-Hexagon to one more level of detail.
Modeling concerns may inform, but do not necessarily correspond to any physical organization of the model (e.g., modules or directories).
https://www.odata.org/
https://github.com/International-Data-Spaces-Association/InformationModel
Data Catalog Vocabulary (DCAT) - Version 2. W3C Recommendation 04 February 2020. https://www.w3.org/TR/vocab-dcat-2/
SKOS Simple Knowledge Organization System Reference. W3C Recommendation 18 August 2009. https://www.w3.org/TR/skos-reference/
OWL 2 Web Ontology Language Document Overview (Second Edition). W3C Recommendation 11 December 2012. https://www.w3.org/TR/owl2-overview/
SPARQL 1.1 Query Language. W3C Recommendation 21 March 2013. https://www.w3.org/TR/sparql11-query/
Known implementations are listed at https://github.com/International-Data-Spaces-Association/InformationModel/.
R. Fielding. "Architectural Styles and the Design of Network-based Software Architectures," 2000. PhD thesis. Table 5-1 "REST Data Elements". Available: https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#tab_5_1
DCAT Application Profile for data portals in Europe. https://joinup.ec.europa.eu/collection/semantic-interoperability-community-semic/solution/dcat-application-profile-data-portals-europe