Specification

This document outlines the key elements of the Transfer Process Protocol. The terms used are described here.

1 Introduction

A Transfer Process (TP) involves two parties, a Provider that offers one or more Datasets under a Usage Policy and a Consumer that requests Datasets. A TP progresses through a series of states, which are controlled by the Provider and Consumer using messages. A TP transitions to another state as a result of an exchanged message.

1.1 Prerequisites

To put the document into the right context, some non-normative descriptions of the core concepts follow in this subsection.

1.1.1 Control and Data Planes

A TP involves two logical constructs, a control plane and a data plane. Serving as a coordinating layer, services on the control plane receive messages and manage the local state of the TP (same as for the Catalog Protocol and the Contract Negotiation Protocol). On the data plane, the actual transfer of data takes place using a wire protocol. Both participants in a data sharing scenario run services logically regarded as control and/or data plane services.

The specification of data plane interfaces and interaction patterns is not in scope of this document.

1.1.2 Data Transfer Types

Dataset transfers are characterized as push or pull transfers, and their data is either finite or non-finite. This section describes the difference between these types.

Push Transfer

A push transfer is when the Provider's data plane initiates sending data to a Consumer endpoint. For example, after the Consumer has issued a Transfer Request Message, the Provider begins data transmission to an endpoint specified by the Consumer using an agreed-upon wire protocol.

Note that the illustration of the sequence is only exemplary. The activation of actors is not determined. Responses, parameters, possible recursions, and interactions between the components of one participant are not shown.

Pull Transfer

A pull transfer is when the Consumer's data plane initiates retrieval of data from a Provider endpoint. For example, after the Provider has issued a Transfer Start Message, the Consumer can request the data from the Provider-specified endpoint.

Finite and Non-Finite Data

Data may be finite or non-finite. This applies to both push and pull transfers. Finite data is data that is defined by a finite set, for example, machine learning data or images. After finite data transmission has finished, the TP is completed. Non-finite data is data that is defined by an infinite set or has no specified end, for example, streams or an API endpoint. With non-finite data, a TP will continue indefinitely until either the Consumer or Provider explicitly terminates the transmission.
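As a rough illustration, direction (push or pull) and finiteness can be treated as two independent dimensions of a transfer. The following Python sketch is non-normative; the names TransferDirection, TransferProfile, data_plane_initiator, and needs_explicit_termination are hypothetical and do not appear in the protocol.

```python
from dataclasses import dataclass
from enum import Enum


class TransferDirection(Enum):
    PUSH = "push"  # Provider's data plane sends data to a Consumer endpoint
    PULL = "pull"  # Consumer's data plane retrieves data from a Provider endpoint


@dataclass(frozen=True)
class TransferProfile:
    """Hypothetical helper combining the two dimensions of a transfer."""

    direction: TransferDirection
    finite: bool

    def data_plane_initiator(self) -> str:
        # Push: the Provider initiates; pull: the Consumer initiates.
        if self.direction is TransferDirection.PUSH:
            return "Provider"
        return "Consumer"

    def needs_explicit_termination(self) -> bool:
        # Non-finite data has no specified end, so the TP continues until
        # the Consumer or the Provider explicitly terminates it.
        return not self.finite
```

For instance, a stream pulled by the Consumer would be modeled as TransferProfile(TransferDirection.PULL, finite=False), for which needs_explicit_termination() returns True.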

1.2 States

The TP states are:

REQUESTED: A Dataset has been requested under an Agreement by the Consumer and the Provider has sent an ACK response.
STARTED: The Dataset is available for access by the Consumer or the Provider has begun pushing the data to the Consumer endpoint.
COMPLETED: The transfer has been completed by either the Consumer or the Provider.
SUSPENDED: The transfer has been suspended by the Consumer or the Provider.
TERMINATED: The Transfer Process has been terminated by the Consumer or the Provider.

1.3 State Machine

The TP state machine is represented in the following diagram:

Transitions marked with C indicate a message sent by the Consumer; transitions marked with P indicate a message sent by the Provider. Terminal states are final: once a terminal state is reached, the state machine may not transition to another state.
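The state machine can also be approximated as a lookup table. The Python sketch below is non-normative: the senders and resulting states follow the message descriptions in section 2, whereas the allowed source states and the choice of COMPLETED and TERMINATED as terminal states are assumptions made for illustration. The message keys and the function next_state are hypothetical names.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "REQUESTED"
    STARTED = "STARTED"
    SUSPENDED = "SUSPENDED"
    COMPLETED = "COMPLETED"
    TERMINATED = "TERMINATED"


# Assumption: these two states are the terminal ones.
TERMINAL = {State.COMPLETED, State.TERMINATED}

# message -> (allowed senders, allowed source states, resulting state)
# "C" is the Consumer, "P" is the Provider; None means "no TP exists yet".
# The source states are assumptions, not taken from the specification text.
TRANSITIONS = {
    "TransferRequestMessage": ({"C"}, {None}, State.REQUESTED),
    "TransferStartMessage": (
        {"P"}, {State.REQUESTED, State.SUSPENDED}, State.STARTED),
    "TransferSuspensionMessage": (
        {"C", "P"}, {State.STARTED}, State.SUSPENDED),
    "TransferCompletionMessage": (
        {"C", "P"}, {State.STARTED}, State.COMPLETED),
    "TransferTerminationMessage": (
        {"C", "P"},
        {State.REQUESTED, State.STARTED, State.SUSPENDED},
        State.TERMINATED),
}


def next_state(current, message, sender):
    """Return the resulting state or raise ValueError if the move is invalid."""
    if current in TERMINAL:
        raise ValueError(f"{current.value} is a terminal state")
    if message not in TRANSITIONS:
        raise ValueError(f"unknown message type: {message}")
    senders, sources, target = TRANSITIONS[message]
    if sender not in senders:
        raise ValueError(f"{message} may not be sent by {sender}")
    if current not in sources:
        raise ValueError(f"{message} is not allowed in state {current}")
    return target
```

Keeping the rules in one table makes it easy to compare an implementation against the diagram and to reject messages that arrive in a terminal state.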

2 Message Types

All messages must be serialized in JSON-LD compact form as specified in the JSON-LD 1.1 Processing Algorithms and API. Further Dataspace specifications may define additional optional serialization formats.

2.1 Transfer Request Message

Sent by: Consumer
Resulting state: REQUESTED
Response: ACK or ERROR
Schema: TTL Shape, JSON Schema
Example: Message

The Transfer Request Message is sent by a Consumer to initiate a TP.

The consumerPid property refers to the transfer id of the Consumer side.
The agreementId property refers to an existing contract Agreement between the Consumer and Provider.
The dct:format property is a format specified by a Distribution for the Dataset associated with the Agreement. This is generally obtained from the Provider's Catalog.
The dataAddress property must only be provided if the dct:format requires a push transfer.
The dataAddress contains a transport-specific endpoint address for pushing the data. It may include a temporary authorization via the endpointProperties property.
The callbackAddress property is a URI indicating where messages to the Consumer should be sent. If the address is not understood, the Provider MUST return an UNRECOVERABLE error.
The endpointProperties may contain the following optional values:
- authorization - An opaque authorization token that clients must present when accessing the transport-specific endpoint address.
- authType - The auth token type. For example, the value may be bearer. If present, this value may be used in conjunction with transport rules to define how the client must present an authorization token.
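To make the properties above more concrete, the following Python sketch assembles a Transfer Request Message as a plain dictionary. It is illustrative only: the property names are copied from the text, but the "@type" value, the layout of dataAddress and endpointProperties (a list of name/value entries), the format identifier, and all URLs and tokens are assumptions. The normative shape, including the JSON-LD context and any prefixes, is defined by the referenced schemas. The function build_transfer_request is a hypothetical helper.

```python
import json
import uuid


def build_transfer_request(agreement_id, fmt, callback_address,
                           data_address=None):
    """Assemble a Transfer Request Message as a dict (illustrative only)."""
    message = {
        # Assumed type label; the real value comes from the schema.
        "@type": "TransferRequestMessage",
        # The Consumer generates its own transfer id.
        "consumerPid": f"urn:uuid:{uuid.uuid4()}",
        "agreementId": agreement_id,
        "dct:format": fmt,
        "callbackAddress": callback_address,
    }
    if data_address is not None:
        # Only provided when the dct:format requires a push transfer.
        message["dataAddress"] = data_address
    return message


# Hypothetical push request; every value below is a placeholder.
push_request = build_transfer_request(
    agreement_id="urn:uuid:00000000-0000-0000-0000-000000000001",
    fmt="example:HTTP_PUSH",
    callback_address="https://consumer.example/callback",
    data_address={
        "endpoint": "https://consumer.example/inbox",
        "endpointProperties": [
            {"name": "authorization", "value": "TOKEN-PLACEHOLDER"},
            {"name": "authType", "value": "bearer"},
        ],
    },
)
print(json.dumps(push_request, indent=2))
```

For a pull transfer, the same helper would be called without data_address, so the dataAddress property is omitted from the message.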

Note that Providers should implement idempotent behavior for Transfer Request Messages based on the value of consumerPid. Providers may choose to implement idempotent behavior for a certain period of time. For example, until a TP has completed and been archived after an implementation-specific expiration period, repeated sending of Transfer Request Messages does not change the state of the TP. If a request for the given consumerPid has already been received and the same Consumer sent the original message again, the Provider should respond with an appropriate Transfer Start Message.
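One possible way to obtain the idempotent behavior described above is to key stored TPs by the Consumer's identity together with the consumerPid. The Python sketch below is a minimal, non-normative illustration under that assumption; ProviderTransferStore, handle_transfer_request, and the record layout are hypothetical, and expiration of archived TPs is left out.

```python
import uuid


class ProviderTransferStore:
    """Hypothetical in-memory store keyed by (consumer identity, consumerPid)."""

    def __init__(self):
        self._records = {}

    def handle_transfer_request(self, consumer_id, consumer_pid, agreement_id):
        """Return (record, created); repeated requests reuse the existing TP."""
        key = (consumer_id, consumer_pid)
        existing = self._records.get(key)
        if existing is not None:
            # Repeated request from the same Consumer: no second TP is
            # created and the state of the existing TP is left unchanged.
            return existing, False
        record = {
            "consumerPid": consumer_pid,
            "providerPid": f"urn:uuid:{uuid.uuid4()}",
            "agreementId": agreement_id,
            "state": "REQUESTED",
        }
        self._records[key] = record
        return record, True
```

When created is False, the caller would answer from the stored record rather than start a new TP. Including the Consumer's identity in the key prevents one Consumer from colliding with another that happens to use the same consumerPid.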

Once a TP has been created, all associated callback messages must include a consumerPid and providerPid.
Providers must include a consumerPid and a providerPid property in the object.
Valid states of a TP are REQUESTED, STARTED, TERMINATED, COMPLETED, and SUSPENDED.

2.2 Transfer Start Message

Sent by: Provider
Resulting state: STARTED
Response: ACK or ERROR
Schema: TTL Shape, JSON Schema
Example: Message

The Transfer Start Message is sent by the Provider to indicate the data transfer has been initiated.

The dataAddress is only provided if the current transfer is a pull transfer and contains a transport-specific endpoint address for obtaining the data. It may include a temporary authorization via the endpointProperties property.
The endpointProperties may contain the following optional values:
- authorization - An opaque authorization token that clients must present when accessing the transport-specific endpoint address.
- authType - The auth token type. For example, the value may be bearer. If present, this value may be used in conjunction with transport rules to define how the client must present an authorization token.
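As an illustration of how a Consumer might use these values in a pull transfer, the following Python sketch turns endpointProperties into an HTTP Authorization header. It assumes an HTTP-based wire protocol, a list of name/value entries as the layout of endpointProperties, and the common "Bearer" header scheme for authType bearer. None of these are mandated by this document, and authorization_header is a hypothetical helper.

```python
def authorization_header(endpoint_properties):
    """Build HTTP headers from endpointProperties (illustrative only).

    Assumes each entry looks like {"name": ..., "value": ...}.
    """
    props = {p["name"]: p["value"] for p in endpoint_properties}
    token = props.get("authorization")
    if token is None:
        # Both values are optional; without a token no header is added.
        return {}
    auth_type = props.get("authType")
    if auth_type is None:
        # No authType given: pass the opaque token through unchanged and
        # leave its presentation to the transport rules (assumption).
        return {"Authorization": token}
    if auth_type.lower() == "bearer":
        return {"Authorization": f"Bearer {token}"}
    raise ValueError(f"unsupported authType: {auth_type}")
```

The token is treated as opaque throughout: the sketch never inspects or decodes it, and only decides how to present it.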

2.3 Transfer Suspension Message

Sent by: Consumer, Provider
Resulting state: SUSPENDED
Response: ACK or ERROR
Schema: TTL Shape, JSON Schema
Example: Message

The Transfer Suspension Message is sent by the Provider or Consumer when either of them needs to temporarily suspend the TP.

2.4 Transfer Completion Message

Sent by: Consumer, Provider
Resulting state: COMPLETED
Response: ACK or ERROR
Schema: TTL Shape, JSON Schema
Example: Message

The Transfer Completion Message is sent by the Provider or Consumer when a data transfer has completed. Note that some Connector implementations may optimize completion notification by performing it as part of their wire protocol. In those cases, a Transfer Completion Message does not need to be sent.

2.5 Transfer Termination Message

Sent by: Consumer, Provider
Resulting state: TERMINATED
Response: ACK or ERROR
Schema: TTL Shape, JSON Schema
Example: Message

The Transfer Termination Message is sent by the Provider or Consumer at any point except a terminal state to indicate the TP should stop and be placed in a terminal state. If the termination was due to an error, the sender may include error information.

3 Response Types

The ACK and ERROR response types are mapped onto a protocol such as HTTPS. A description of an error might be provided in protocol-dependent forms, e.g., for an HTTPS binding, in the request or response body.