Data Pipeline

The conformance services pipeline has a series of stages. This page provides an overview; see the individual pages for each stage for details.

The Spidering stage takes an OpenActive Data Catalog and follows its links, resulting in a list of URLs where OpenActive feeds are to be found. The catalog URL is hard-coded for this project, but developers can change it if required for their application.
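A minimal sketch of the spidering step, assuming a catalog document whose `hasPart` entries link to sub-catalogs and whose `dataset` entries list feed URLs (the URLs and the `fetch` callable here are illustrative, not the project's actual code):

```python
def collect_dataset_urls(catalog, fetch):
    """Recursively walk a data catalog, collecting every `dataset` URL and
    following each `hasPart` link to a sub-catalog.
    `fetch` maps a catalog URL to its parsed JSON document."""
    urls = list(catalog.get("dataset", []))
    for part_url in catalog.get("hasPart", []):
        urls.extend(collect_dataset_urls(fetch(part_url), fetch))
    return urls

# Pre-fetched documents stand in for HTTP responses, keeping the sketch
# self-contained; a real spider would fetch each URL over the network.
docs = {
    "https://example.org/sub-catalog": {"dataset": ["https://example.org/feed-b"]},
}
root = {
    "dataset": ["https://example.org/feed-a"],
    "hasPart": ["https://example.org/sub-catalog"],
}
print(collect_dataset_urls(root, docs.get))
# → ['https://example.org/feed-a', 'https://example.org/feed-b']
```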

The Harvesting stage takes the list of URLs from the Spidering stage and downloads the contents of the feed from each one, storing the results in the database.
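OpenActive feeds use the RPDE paging model: a harvester follows each page's `next` URL, applying item updates and deletions to its store, and stops at the page that is empty and points back at itself. A sketch, again with pre-fetched pages standing in for HTTP and the database replaced by a dict:

```python
def harvest_feed(first_url, fetch):
    """Page through an RPDE feed, applying each item's change to a local
    store. The last page is reached when it contains no items and its
    `next` URL points back at itself. `fetch` maps a URL to a parsed page."""
    store = {}
    url = first_url
    while True:
        page = fetch(url)
        for item in page["items"]:
            if item["state"] == "deleted":
                store.pop(item["id"], None)   # deletions remove the item
            else:
                store[item["id"]] = item["data"]
        if page["next"] == url and not page["items"]:
            break                             # empty, self-referencing last page
        url = page["next"]
    return store

pages = {
    "https://example.org/feed-a": {
        "items": [{"id": "1", "state": "updated", "modified": 1,
                   "data": {"name": "Yoga"}}],
        "next": "https://example.org/feed-a?afterTimestamp=1&afterId=1",
    },
    "https://example.org/feed-a?afterTimestamp=1&afterId=1": {
        "items": [],
        "next": "https://example.org/feed-a?afterTimestamp=1&afterId=1",
    },
}
print(harvest_feed("https://example.org/feed-a", pages.get))
# → {'1': {'name': 'Yoga'}}
```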

The Validation stage compares the harvested data from each of the feeds against the OpenActive specification, and gives a yes/no answer for each item in the feed, recording any errors that were encountered.
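The shape of that yes/no answer can be sketched as below. This is not the real OpenActive validator; the required-property set is an illustrative subset, chosen only to show the pass/fail-plus-errors result the pipeline records:

```python
REQUIRED = {"@type", "name", "startDate"}  # illustrative subset, not the full spec

def validate_item(item):
    """Return (passed, errors) for one feed item: a yes/no answer plus
    the list of problems found, mirroring what the pipeline stores."""
    errors = [f"missing required property: {prop}"
              for prop in sorted(REQUIRED - item.keys())]
    return (not errors, errors)

ok, errs = validate_item({"@type": "Event", "name": "Yoga"})
print(ok, errs)
# → False ['missing required property: startDate']
```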

The Normalisation stage takes the harvested data and transforms it into conformant, normalised OpenActive data, according to a fixed set of patterns. This allows data from sources that publish conformant data in different ways to be used together.
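As one hypothetical example of such a pattern (not necessarily one the project implements): some publishers put shared properties directly on each item, while others embed a parent event under `superEvent`. Folding the parent's properties into the item makes both shapes come out the same:

```python
def normalise(item):
    """Fold `superEvent` properties into the item itself, with the item's
    own values taking precedence, so both publishing shapes normalise to
    one canonical form. Illustrative pattern only."""
    parent = item.get("superEvent", {})
    own = {k: v for k, v in item.items() if k != "superEvent"}
    return {**parent, **own}

flat = {"name": "Yoga", "startDate": "2024-01-01T10:00:00Z"}
nested = {"superEvent": {"name": "Yoga"},
          "startDate": "2024-01-01T10:00:00Z"}
print(normalise(flat) == normalise(nested))
# → True
```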

The Profiling stage compares the normalised data against the OpenActive Profiles and reports on the degree to which each item matches the profile.
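A "degree of match" report can be sketched as the fraction of profile properties an item supplies, plus the list of what is missing. The profile contents here are hypothetical, not taken from the actual OpenActive Profiles:

```python
PROFILE = {"name", "startDate", "location", "organizer"}  # hypothetical profile

def profile_match(item):
    """Report how closely an item matches the profile: a score in [0, 1]
    and the sorted list of profile properties it is missing."""
    score = len(PROFILE & item.keys()) / len(PROFILE)
    missing = sorted(PROFILE - item.keys())
    return score, missing

score, missing = profile_match({"name": "Yoga", "startDate": "2024-01-01"})
print(score, missing)
# → 0.5 ['location', 'organizer']
```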

The Republication stage makes an RPDE feed of the normalised data available.
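Serving an RPDE feed means ordering items deterministically and paging with a cursor. A sketch using RPDE's modified-timestamp-and-id ordering (the `/feed` path and page size are assumptions for illustration):

```python
def rpde_page(items, after=(0, ""), limit=2):
    """Build one RPDE page: items ordered by (modified, id), returning
    those strictly after the (afterTimestamp, afterId) cursor, plus the
    `next` URL a consumer would follow."""
    ordered = sorted(items, key=lambda it: (it["modified"], it["id"]))
    page = [it for it in ordered if (it["modified"], it["id"]) > after][:limit]
    cursor = (page[-1]["modified"], page[-1]["id"]) if page else after
    return {"items": page,
            "next": f"/feed?afterTimestamp={cursor[0]}&afterId={cursor[1]}"}

store = [{"id": "a", "modified": 1},
         {"id": "b", "modified": 2},
         {"id": "c", "modified": 3}]
first = rpde_page(store)
print([it["id"] for it in first["items"]])
# → ['a', 'b']
```

When a consumer reaches a page with no items whose `next` URL repeats its own cursor, it has caught up, which is the same end-of-feed condition the Harvesting stage checks for.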

Error Capture

Any errors encountered during processing of the data at any stage are stored in the database to aid with improvement efforts.
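A minimal sketch of such error capture, using an in-memory SQLite table; the schema and `record_error` helper are hypothetical, showing only that each error is tagged with the stage and feed it came from so improvement work can be targeted:

```python
import sqlite3

# Hypothetical schema: one row per error, tagged with its origin.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE pipeline_error (
    stage TEXT, feed_url TEXT, item_id TEXT, message TEXT,
    recorded_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

def record_error(stage, feed_url, item_id, message):
    """Store one error encountered at any pipeline stage."""
    db.execute(
        "INSERT INTO pipeline_error (stage, feed_url, item_id, message) "
        "VALUES (?, ?, ?, ?)",
        (stage, feed_url, item_id, message))

record_error("validation", "https://example.org/feed-a", "1",
             "missing required property: startDate")
```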
