Catalog(s)

Sharing data among participants requires the provision of metadata -- regardless of the design of the data space (centralized, federated, or decentralized) and whether the data is open or protected. Information about the data needs to be published with an agreed-upon vocabulary for querying and with controls that regulate access to the catalog items.

Two participants can share data directly, communicating offline or online, without the need for a catalog. With more participants, however, a catalog function greatly increases the discoverability of data assets and services. If a federated or decentralized design results in more than one catalog, the catalog function must allow federated searches of data assets across the catalogs at multiple sites.
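A federated search can be pictured as fanning one query out to every known catalog and merging the answers. The following is a minimal sketch under stated assumptions, not a reference implementation: `CatalogEntry`, `make_catalog`, and `federated_search` are hypothetical names, each catalog is modeled as an in-memory function rather than a remote service, and duplicates are detected by a hypothetical `offer_id`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class CatalogEntry:
    offer_id: str              # hypothetical identifier of a catalog item
    title: str
    keywords: frozenset[str]   # agreed-upon vocabulary terms used for querying


# Assumption: a catalog is modeled as a function from keyword to matching entries.
# In a real data space this would be a remote call to the catalog at another site.
CatalogQuery = Callable[[str], Iterable[CatalogEntry]]


def make_catalog(entries: list[CatalogEntry]) -> CatalogQuery:
    """Build an in-memory stand-in for one site's catalog."""
    def query(keyword: str) -> list[CatalogEntry]:
        return [e for e in entries if keyword in e.keywords]
    return query


def federated_search(
    catalogs: dict[str, CatalogQuery], keyword: str
) -> list[tuple[str, CatalogEntry]]:
    """Query every catalog and merge the results, keeping the first
    occurrence of each offer_id together with the site that returned it."""
    seen: set[str] = set()
    merged: list[tuple[str, CatalogEntry]] = []
    for site, query in catalogs.items():
        for entry in query(keyword):
            if entry.offer_id not in seen:
                seen.add(entry.offer_id)
                merged.append((site, entry))
    return merged
```

A caller passes a mapping of site names to catalogs, for example `federated_search({"site-a": catalog_a, "site-b": catalog_b}, "weather")`, and receives one merged list regardless of how many catalogs were queried.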

Catalogs don't provide the data asset itself, but they provide data contract offers (more on this in the section on data sharing below).

When choosing a target architecture for a data space, the design of the catalog function can fall somewhere along the spectrum between a central catalog, multiple federated catalogs, and many decentralized catalogs. Each has its own advantages and disadvantages. Compare the three main types of catalogs, depending on the implementation design of the DSGA, to evaluate their capabilities:

| Catalog architecture | Advantages | Disadvantages |
|---|---|---|
| Centralized catalog | No deployment by individual participants | A central gatekeeper can arbitrarily exclude participants and their data from the catalog |
| | Central control: a gatekeeper can regulate which entries are permissible and which are not | Single point of failure |
| | Easy discovery as only one catalog needs to be queried | Potential performance bottleneck |
| | | Security issues will affect all members at once |
| Federated catalog | Deployment by a limited number of participants, while most participants don't need to deploy any catalog components | Additional replication mechanisms are needed |
| | Federated control: voting mechanisms for content control can be implemented | A small group of operators of federated catalog nodes can control participation in the data space |
| Decentralized catalog | Every participant can autonomously decide which catalog items they share with whom | Every participant needs to run a catalog component |
| | No interference in the interaction between two participants through a third party | A list of available catalogs needs to be either centrally provided through the DSGA or discoverable through a peer-to-peer protocol |
| | The data space as a whole is more resilient towards cyberattacks even though individual members can experience outages | Participants need to crawl each other's catalogs to see which items are available |
| | Easier to scale | |
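For the decentralized case, the crawling step and its tolerance of individual outages can be sketched as follows. This is an illustration only: `crawl_catalogs` and `CatalogFetcher` are hypothetical names, the directory of catalogs is assumed to have been obtained already (from the DSGA or a peer-to-peer protocol), and each peer catalog is modeled as a plain function that raises `ConnectionError` when the peer is offline.

```python
from typing import Callable

# Assumption: fetching a peer's catalog is modeled as a zero-argument function
# returning the IDs of the items that peer is willing to show to the caller.
CatalogFetcher = Callable[[], list[str]]


def crawl_catalogs(
    directory: dict[str, CatalogFetcher],
) -> tuple[dict[str, list[str]], list[str]]:
    """Visit every catalog in the directory and build a local index.

    Returns (index, unreachable): the items found per participant, and the
    participants whose catalog could not be reached during this crawl.
    """
    index: dict[str, list[str]] = {}
    unreachable: list[str] = []
    for participant, fetch in directory.items():
        try:
            index[participant] = list(fetch())
        except ConnectionError:
            # One participant being down does not stop the crawl of the others.
            unreachable.append(participant)
    return index, unreachable
```

The crawl keeps going when a single peer fails, which mirrors the resilience noted in the table: an outage removes one participant's items from the local index but leaves the rest of the data space discoverable.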

Access policies

A best practice in access security is for an IT system to show users only what they need to know, which minimizes the potential attack surface. The same applies to data contract offers (DCOs) in a data space: participants should only see the DCOs for which they are authorized to request a contract negotiation. Seeing a DCO does not mean that the participant is already authorized to access the data, only that the participant is allowed to know that the data exists. The permission to access the data is part of the data contract negotiation. Any catalog must implement attribute-based access control (ABAC) through access policies.

The most common access filter requires a participant to prove membership before seeing which assets are in a data space. Filters can also make data assets visible only to specific participant groups. For example, a participant who holds a verifiable credential (VC) as a data space member and an additional VC attesting that the participant is an auditor could be given access to audit log files or streams that are shared as DCOs but should not be visible to participants without the auditor credential.

If a participant wants to make a DCO visible to entities that are not participating in the data space, but that merely use its technical mechanisms or have been directly informed about the existence of the DCO, the participant can attach an access policy that is simply a no-op (allow-all) policy.
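The three cases described above (membership filter, auditor-only items, and an allow-all policy) can be sketched as a credential-based visibility filter. This is a simplified illustration, not how any particular catalog implements ABAC: `Offer`, `required_credentials`, and `visible_offers` are hypothetical names, credential types are plain strings, and the cryptographic verification of the VCs is assumed to have happened before the filter runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    offer_id: str
    # Access policy: credential types a requester must present to SEE the offer.
    # An empty set models the no-op / allow-all policy.
    required_credentials: frozenset[str] = frozenset()


def visible_offers(offers: list[Offer], presented: frozenset[str]) -> list[Offer]:
    """Return only the offers whose access policy is satisfied by the
    (already verified) credential types the requester presented."""
    return [o for o in offers if o.required_credentials <= presented]


# Hypothetical catalog content and credential type names, for illustration only.
CATALOG = [
    Offer("public-stats"),                                        # allow-all
    Offer("sensor-feed", frozenset({"Member"})),                  # members only
    Offer("audit-log", frozenset({"Member", "Auditor"})),         # auditors only
]
```

With this catalog, a requester presenting no credentials sees only `public-stats`, a member additionally sees `sensor-feed`, and only a member who also holds the auditor credential sees `audit-log`. Visibility here grants nothing beyond the right to request a contract negotiation.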

Access policies can also filter on other attributes to control the visibility of and access to DCOs. For example, time-based policies can control when DCOs can be negotiated, and location-based policies can limit the audience to participants from a specific geographic region.
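Time-based and location-based policies can be modeled as small predicates that are combined into one access policy. The sketch below makes several assumptions: `RequestContext`, `within_window`, `from_regions`, and `all_of` are hypothetical names, the requester's region is assumed to come from a verified attribute, and the region codes and dates are made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class RequestContext:
    region: str    # hypothetical attribute, e.g. taken from a verified credential
    at: datetime   # time at which the catalog request is evaluated


# A policy is a predicate over the request context.
Policy = Callable[[RequestContext], bool]


def within_window(start: datetime, end: datetime) -> Policy:
    """Time-based policy: satisfied from start (inclusive) to end (exclusive)."""
    return lambda ctx: start <= ctx.at < end


def from_regions(*regions: str) -> Policy:
    """Location-based policy: satisfied for requesters from the listed regions."""
    allowed = frozenset(regions)
    return lambda ctx: ctx.region in allowed


def all_of(*policies: Policy) -> Policy:
    """Combine policies so that every one of them must be satisfied."""
    return lambda ctx: all(p(ctx) for p in policies)
```

A DCO that should be negotiable only in the first half of a year by participants from one region could then carry `all_of(within_window(start, end), from_regions("EU"))`. Keeping each condition as its own predicate lets the same building blocks be reused across offers.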
