Serialization and Publication Formats

The Human Services Data Specification provides a structured model and for capturing and sharing machine-readable information about health, human, and social services.

The current canonical version of this data model is provided by a set of JSON Schema files which describe individual objects in terms of their structures, properties, and definitions. These JSON Schema files also describe the relationships between the different objects that make up the HSDS data model. Compliance of data with HSDS should be assessed against these schemas.

Prior to version 3.0 HSDS was defined and exchanged using the Frictionless Data Package Specification. Although the canonical format of HSDS is JSON since version 3.0, there is still support for Data Packages through the Tabular Data Package serialization compiled from the canonical JSON schema files. This means that publishers can choose to publish and exchange HSDS data in a Tabular Data Package format.

Other serializations of HSDS (e.g. XML, etc.) are not considered conformant to HSDS.

JSON

Publishers should consider publishing HSDS in JSON format where possible.

JSON is widely considered a de facto standard for exchanging data on the web. The JSON Schema language provides features to define JSON data models which can be used for validating datasets. The canonical format for HSDS since version 3.0 is JSON, and the canonical schemas are defined in JSON Schema.

The HSDS JSON schemas along with example data are provided in the Schema Reference.

Dereferencing

HSDS JSON should be dereferenced when published. JSON data is structured as a tree with data elements nested inside each other.

The canonical HSDS Schemas are modular with multiple schemas each defining a single object. Where there is a relationship between objects, they make use of JSON Schema’s $ref keyword to refer to another schema. For example, organization.json has the property locations which is an array of Locations. Locations are defined in location.json, so this array refers directly to the location.json as the canonical source of the Location data model. Publishers should embed the location data, conformant to the model defined in location.json, as an item in the locations array.

Publishers may find the compiled schemas useful. These are generated directly from the canonical HSDS JSON schemas and provide a fully de-referenced JSON schema for various representations of the HSDS data model (e.g. a Service-oriented view, an Organization-oriented view). The compiled schemas are not considered canonical themselves, though, and are generated to provide utility to publishers and tools working with HSDS JSON data.

Serializing HSDS JSON for APIs

There are additional considerations for publishing HSDS JSON through an API.

Due to the nature of HSDS’ data model, the canonical HSDS JSON Schemas do not provide an official packaging format for publishing or exchanging multiple records in a single file. To provide this function in HSDS APIs, the API Reference provides an embedded Page schema which is used in several endpoints. Page is documented on the API Reference

Page is not considered part of the HSDS 3.0 data model, but it is part of the API Reference. Therefore publishers seeking compliance with the HSDS 3.0 API Reference should ensure that they are using this correctly.

Tabular Data Package

HSDS may also be serialized as a Tabular Data Package. In this serialization, data is published using a series of CSV files (one per object) and accompanied by a package descriptor in JSON format with the filename datapackage.json.

Instead of dereferencing and embedding objects such as in the canonical JSON serialization, each object can refer to others via its id property in the appropriate column. This makes id behave like a foreign key in this serialization.

We provide an existing package descriptor generated directly from the canonical HSDS JSON Schema files. It is available here and contains details of field names and file names for this serialization. Publishers should use this to support their Tabular Data Package serialization rather than develop their own datapackage.json file. Examples CSV files are available in the HSDS Github repo here.

Prior to HSDS 3.0, Tabular Data Packages were the primary publication format for HSDS data.