Open Energy Metadata (OEMetadata) is a metadata standard designed specifically for data used in energy (systems) research. In science, a metadata standard can provide unambiguity, transparency, objectivity, reliability, verifiability, openness, integrity and novelty. In short, it supports good scientific practice. OEMetadata follows the FAIR principles, which aim at the Findability, Accessibility, Interoperability, and Reuse of digital assets.
Several existing metadata standards were considered during its development, among them:
schema.org -> Schemas for structured data markup on web pages
PROV -> W3C specification providing a vocabulary to interchange provenance information
DCAT-AP -> Application profile for data portals in Europe based on the Data Catalog Vocabulary
They shaped OEMetadata to varying degrees: some were too general, others too specific. The following requirements led us to define our own standard:
Compatibility with CSV and database tables
Machine and human readability
Coverage of all aspects of metadata
Coverage of all data and tailoring to energy system analysis
Compliance with the FAIR principles
Extensibility
Well-defined compatibility with ontologies and linked open data
Compatibility with DCAT-AP was originally planned, but the standard was found to be partly incompatible with Data Packages
Compatibility with all kinds of data: time series, geodata, parameter collections, machine-generated data and collaboratively collected data
Our concept for including ontology references is depicted in a poster (PDF) created during the development stage. The resulting standard is based on Data Packages. The file format is JSON (and JSON-LD). In its simplest form, a Tabular Data Package is a CSV file containing the data, accompanied by a JSON file that describes the name and structure of the data. OEMetadata takes the standard set of keys and possible values and extends it with keys useful for energy research. It is inspired by Dublin Core, INSPIRE and DataCite. The development process is organized on GitHub and is open for everyone to follow and participate in. The repository also contains further useful files.
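To make the Tabular Data Package idea concrete, the following Python sketch writes a small CSV file and a JSON descriptor that names the resource and lists its columns. The file names ("example_table.csv", "datapackage.json"), the column names and the values are hypothetical, and the descriptor uses only a handful of keys from the table further down ("profile", "name", "path", "format", "encoding", "schema"); it is an illustration, not a complete or validated OEMetadata document.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical example data: one row per year with a capacity value in MW.
ROWS = [
    {"id": 1, "year": 2016, "capacity": 12.5},
    {"id": 2, "year": 2017, "capacity": 14.0},
]


def write_tabular_data_package(directory: Path) -> Path:
    """Write a CSV data file and a JSON descriptor next to it.

    Only a few descriptor keys are filled in; this is a sketch of the
    "CSV plus JSON description" idea, not a full OEMetadata file.
    """
    csv_path = directory / "example_table.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "year", "capacity"])
        writer.writeheader()
        writer.writerows(ROWS)

    descriptor = {
        "name": "example_package",
        "resources": [
            {
                "profile": "tabular-data-resource",
                "name": "example_table",
                "path": csv_path.name,
                "format": "csv",
                "encoding": "UTF-8",
                "schema": {
                    "fields": [
                        {"name": "id", "type": "integer"},
                        {"name": "year", "type": "integer"},
                        {"name": "capacity", "type": "number", "unit": "MW"},
                    ],
                    "primaryKey": ["id"],
                },
            }
        ],
    }
    json_path = directory / "datapackage.json"
    json_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return json_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        descriptor_path = write_tabular_data_package(Path(tmp))
        descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
        resource = descriptor["resources"][0]
        # Read the CSV header back and compare it with the described fields.
        with (Path(tmp) / resource["path"]).open(newline="", encoding="utf-8") as f:
            header = next(csv.reader(f))
        described = [field["name"] for field in resource["schema"]["fields"]]
        print(header == described)  # prints True
```

Comparing the CSV header with the described fields is one simple way to check that the data file and its description stay in sync.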
A table can be created on the OEP through the wizard. The menu includes a section that helps you fill out the OEMetadata accompanying your data.
To help you create a standalone metadata file, the OEP provides a metadata creator (you need to be logged in to use it).
There is a review process to maintain the metadata on the OEP. It replaces the now deprecated review process on GitHub. As the owner of a table on the OEP, you can request a review, which starts a guided review process. At the end of the process, a badge indicating the level of completeness is assigned to the metadata.
The standard is under active development and is currently available in version 1.6.0. The table with a full description of all keys is shown here for convenience, but it may be less up to date than the one in the repository.
A Uniform Resource Identifier (URI) that unambiguously identifies the resource. This can be a URL of the dataset. It can also be a Digital Object Identifier (DOI).
A description or abstract of the package. It should be usable as summary information for the entire package that is described by the metadata. Example: Example table used to illustrate the metadata structure and meaning.
5 language: An array of languages used within the described data structures (e.g. titles, descriptions). The language key can be repeated if more languages are used. Standard: IETF (BCP47). Example: en-GB, de-DE, fr-FR
6 subject: An array of objects with topics of the data in OEO terms.
An object that describes the general setting, environment, or project leading to the creation or maintenance of this dataset. In science, this can be the research project.
An object that describes the spatial context of the data it contains.
10.1 location: The location of the data, for data whose location can be described as a point. May be specified as coordinates, a URI or an address with street, house number and zip code. Example: 52.433509, 13.535855
10.2 extent: The covered area. May be the name of a region or the geometry of a bounding box. Example: Europe
10.3 resolution: Pixel size in case of a regular raster image, or a reference to the administrative level or other spatial division that is present as the smallest spatially distinguished unit. Example: 1 ha
11 temporal: An object with the time period covered in the data. Temporal information should contain either a "referenceDate" or the keys describing a time series; in rare cases both.
11.1 referenceDate: The base year, month or day, i.e. the point in time for which the data is meant to be accurate. Census data or a satellite image, for example, will have a reference date. Date format is ISO 8601. Example: 2016-01-01
11.2 timeseries: An array that describes the time series.
11.2.1 start: The beginning point in time of a time series. Example: 2019-02-06T10:12:04+00:00
11.2.2 end: The end point in time of a time series. Example: 2019-02-07T10:12:04+00:00
11.2.3 resolution: The time span between individual points of information in a time series. Example: 30 s
11.2.4 alignment: An indicator whether the timestamps in a time series are aligned left, right or middle. Example: left
11.2.5 aggregationType: Indicates whether the values are a sum, average or current.
An array of objects of the license(s) under which the described package is provided. It can depend on the licenses of the sources (copyleft or share-alike) or can be granted by the creator of the data.
An array of objects of the people or organizations who contributed to the data or metadata. Each object refers to one contributor. Every contributor must have a title and property. The path, email, role and organization properties are optional.
14.1 title: The name of the contributor. Example: Ludwig Hülk
14.2 email: An email address of the contributor or their GitHub handle. Example: @Ludee
14.3 date: The date of the contribution. If the contribution took more than a day, use the date of the final contribution. Date format is ISO 8601. Example: 2016-06-16
14.4 object: The target of the contribution, i.e. which part of the package was supplied or changed. Can be the data, the metadata or both.
An array of objects of the data. It describes the data resource as an individual file or (database) table.
15.1 profile: The profile of this descriptor according to the profiles specification. This information is retained in order to comply with the "Tabular Data Package" standard. Use "tabular-data-resource" for all tables. Example: tabular-data-resource
15.2 name: A name for the entire data package. The name must consist of only lowercase alphanumeric characters or underscores. It must not start with a number or underscore. In a database, this will be the name of the table within the schema containing it. The name can correspond to the file name (minus the file extension) of the data file describing the resource, if it complies with the naming convention above. On the OEP, the name also contains the schema; use "." to separate the schema from the table name. Example: openstreetmap.osm_deu_line
15.3 path: A URL that should be a permanent http(s) address or other path directly linking to the resource.
The file extension. 'csv', 'xls', 'json' etc. would be expected to be the standard file extension for this type of resource. When you upload your data to the OEDB, the format shown in the metadata string will be changed to 'PostgreSQL', since the data there are stored in a database. Example: PostgreSQL
15.5 encoding: Specifies the character encoding of the resource's data file. The values should be one of the "Preferred MIME Names" for a character encoding registered with IANA. If no value for this key is specified, the default is UTF-8.
An object that describes the structure of the present data. It contains all fields (columns of the table), the primary key and optional foreign keys.
15.6.1 fields: An array of objects, each describing a column and providing its name, description, type and unit.
15.6.1.1 name: The name of the field. The name must consist of only lowercase alphanumeric characters or underscores. It must not start with a number or underscore. Example: year
15.6.1.2 description: A text describing the field. Example: Reference year for which the data were collected.
15.6.1.3 type: The data type of the field. In case of a geometry column in a database, also indicate the shape and CRS. Example: geometry(Point, 4326)
15.6.1.4 unit: The unit, preferably an SI unit, that values in this field are mapped to. If a unit doesn't apply to a field, use 'null'. If the unit is given in a separate field, reference this field. Example: MW
15.6.1.5 isAbout: An array of objects that describe the field in OEO terms.
A primary key is a field or set of fields that uniquely identifies each row in the table. It is recorded as an array, since the primary key can be made up of several columns. Example: id
15.6.3 foreignKeys: A foreign key is a field that refers to a column in another table.
15.6.3.1 fields: The column in the table that is constrained by the foreign key. Example: version
15.6.3.2 reference: The reference to the foreign table.
15.6.3.2.1 resource: The foreign resource (table). Example: schema.table
15.6.3.2.2 fields: The foreign resource column. Example: version
15.7 dialect: Object. A CSV Dialect defines a simple format to describe the various dialects of CSV files in a language-agnostic manner. In case of a database, the values in the contained fields are 'null'.
15.7.1 delimiter: The delimiter specifies the character sequence which should separate fields (columns). Common characters are "," (comma), ";" (semicolon), "." (point) and "\t" (tab). Example: ,
15.7.2 decimalSeparator: A symbol used to separate the integer part from the fractional part of a number written in decimal form. Depending on language and region, this symbol can be "." or ",".
Data uploaded through the OEP goes through a review process. The review covers the data and metadata and is carried out by the OEP community. See the OEP Data Review for detailed information. The review itself is documented at the specified path, and a badge is awarded according to completeness.
18.1 path: A URL that should be a permanent http(s) address directly linking to the documented review.
An object that describes the metadata themselves, their format, version and license. These fields should already be provided when you are filling out your metadata.
An array of objects. This section is used as a self-description of the final metadata file. It is text intended for humans and includes a link to the metadata documentation, required value formats and similar remarks.
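The naming rules for resource and field names (keys 15.2 and 15.6.1.1) and the requirement that primary and foreign key columns actually exist in "fields" can be checked mechanically. The following Python sketch illustrates this under several assumptions: the function names are hypothetical, the schema part of an OEP name such as "openstreetmap.osm_deu_line" is assumed to follow the same naming rule as the table part, and key columns are accepted either as a single name or as an array. It checks only these few rules and is not a full validation of an OEMetadata document.

```python
import re
from typing import Iterable, List, Union

# Naming rule from keys 15.2 and 15.6.1.1: lowercase letters, digits and
# underscores only; the first character must be a lowercase letter
# (not a digit and not an underscore).
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_name(name: str) -> bool:
    return NAME_PATTERN.fullmatch(name) is not None


def is_valid_resource_name(name: str) -> bool:
    """Accept 'table' or the OEP form 'schema.table'.

    Assumption: the schema part follows the same rule as the table part.
    """
    parts = name.split(".")
    return len(parts) in (1, 2) and all(is_valid_name(part) for part in parts)


def _as_list(value: Union[str, Iterable[str], None]) -> List[str]:
    # Accept a single column name as well as an array of column names.
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def check_resource_schema(resource: dict) -> List[str]:
    """Return human-readable problems found in one resource (empty list = ok)."""
    problems = []
    name = resource.get("name", "")
    if not is_valid_resource_name(name):
        problems.append(f"invalid resource name: {name!r}")

    schema = resource.get("schema") or {}
    field_names = [field.get("name", "") for field in schema.get("fields", [])]
    for field_name in field_names:
        if not is_valid_name(field_name):
            problems.append(f"invalid field name: {field_name!r}")

    for column in _as_list(schema.get("primaryKey")):
        if column not in field_names:
            problems.append(f"primary key column not in fields: {column!r}")

    for fk in schema.get("foreignKeys") or []:
        for column in _as_list(fk.get("fields")):
            if column not in field_names:
                problems.append(f"foreign key column not in fields: {column!r}")
        reference = fk.get("reference") or {}
        if not reference.get("resource"):
            problems.append("foreign key without reference.resource")
        if not _as_list(reference.get("fields")):
            problems.append("foreign key without reference.fields")
    return problems


# Hypothetical resource built from the example values in the table.
example = {
    "name": "openstreetmap.osm_deu_line",
    "schema": {
        "fields": [
            {"name": "id", "type": "integer"},
            {"name": "year", "type": "integer"},
            {"name": "version", "type": "text"},
        ],
        "primaryKey": ["id"],
        "foreignKeys": [
            {"fields": ["version"],
             "reference": {"resource": "schema.table", "fields": ["version"]}}
        ],
    },
}
print(check_resource_schema(example))  # prints []
```

Returning a list of messages instead of raising on the first error lets a metadata author see all problems of a resource at once.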
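Key 11 ("temporal") asks for either a "referenceDate" or a time series description, with dates in ISO 8601. A minimal Python sketch of such a check follows. The function names are hypothetical; for "referenceDate" it accepts only the YYYY, YYYY-MM and YYYY-MM-DD forms (a small subset of ISO 8601, chosen as an assumption for this example), and for "start" and "end" it relies on datetime.fromisoformat, so not every ISO 8601 variant is covered.

```python
from datetime import datetime
from typing import List, Optional

# Accepted forms for "referenceDate": base year, month or day.
# This is a deliberately small subset of ISO 8601.
REFERENCE_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_reference_date(value: str) -> Optional[datetime]:
    for fmt in REFERENCE_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None


def check_temporal(temporal: dict) -> List[str]:
    """Return problems found in a 'temporal' object (empty list = ok)."""
    problems = []
    reference_date = temporal.get("referenceDate")
    timeseries = temporal.get("timeseries") or []
    if not reference_date and not timeseries:
        problems.append("neither referenceDate nor timeseries is given")
    if reference_date and parse_reference_date(reference_date) is None:
        problems.append(f"referenceDate is not YYYY, YYYY-MM or YYYY-MM-DD: {reference_date!r}")

    for index, series in enumerate(timeseries):
        try:
            # fromisoformat accepts offsets such as "+00:00"; a trailing "Z"
            # is only accepted from Python 3.11 on.
            start = datetime.fromisoformat(series["start"])
            end = datetime.fromisoformat(series["end"])
        except (KeyError, ValueError) as exc:
            problems.append(f"timeseries[{index}]: cannot read start/end ({exc})")
            continue
        try:
            if end < start:
                problems.append(f"timeseries[{index}]: end lies before start")
        except TypeError:
            # Raised when one timestamp has a UTC offset and the other does not.
            problems.append(f"timeseries[{index}]: mixed naive and offset-aware timestamps")
    return problems


# Hypothetical temporal object using the example values from the table.
example = {
    "referenceDate": "2016-01-01",
    "timeseries": [
        {
            "start": "2019-02-06T10:12:04+00:00",
            "end": "2019-02-07T10:12:04+00:00",
            "resolution": "30 s",
            "alignment": "left",
        }
    ],
}
print(check_temporal(example))  # prints []
```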
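For CSV resources, the dialect keys 15.7.1 ("delimiter") and 15.7.2 ("decimalSeparator") tell a reader how to split columns and how to interpret decimal numbers. The Python sketch below, with a hypothetical function name and example data, parses CSV text using these two values and converts only the columns the caller marks as numeric. Falling back to "," and "." when the dialect is null (as for database tables) is an assumption of this example, and thousands separators are not handled.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Union


def read_csv_with_dialect(
    text: str,
    dialect: Optional[dict],
    numeric_columns: Iterable[str] = (),
) -> List[Dict[str, Union[str, float]]]:
    """Parse CSV text using the 'delimiter' and 'decimalSeparator' of a dialect.

    Values in `numeric_columns` are converted to float; all other values
    stay strings. Null dialect values fall back to "," and ".".
    """
    dialect = dialect or {}
    delimiter = dialect.get("delimiter") or ","
    decimal_separator = dialect.get("decimalSeparator") or "."
    numeric = set(numeric_columns)

    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row: Dict[str, Union[str, float]] = {}
        for column, value in record.items():
            if column in numeric:
                # Normalise the decimal separator to "." before calling float().
                row[column] = float(value.replace(decimal_separator, "."))
            else:
                row[column] = value
        rows.append(row)
    return rows


# Hypothetical German-style CSV: ";" separates columns, "," separates decimals.
data = "year;capacity\n2016;12,5\n2017;14,0\n"
rows = read_csv_with_dialect(
    data, {"delimiter": ";", "decimalSeparator": ","}, numeric_columns=["capacity"]
)
print(rows)  # prints [{'year': '2016', 'capacity': 12.5}, {'year': '2017', 'capacity': 14.0}]
```

Converting only the declared numeric columns avoids turning identifiers such as years into floats; in practice, the list of numeric columns could be derived from the "type" entries in the schema fields.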