Guidebook - How to Publish Your Data on the OEP
license: GNU Affero General Public License Version 3 (AGPL-3.0)
copyright: Reiner Lemoine Institut
authors: christian-rli, jh-RLI, Ludee
If Jupyter Notebooks are new to you and you'd like an introduction, have a look at this introduction video, which is less than 10 minutes long. Official installation instructions are available on Jupyter's readthedocs page.
If your goal is to publish your energy system research data and you're wondering what that entails, this is the right resource for you. This tutorial will guide you through the steps needed to publish your data on the OEDB and supply it with proper metadata. The process has the following steps:
1 Creation of metadata
2 Initiation of metadata review
3 Uploading data and metadata
This document describes each of these steps, refers to resources relevant to the tasks, and ends by introducing the tools needed to upload your data. If your system is already set up, you can jump to an upload tutorial. If all of this is new to you, we recommend reading on. By uploading your data through the process described below, you will ensure that your data is:
- licensed openly
- supplied with sources
1 Creation of metadata
Metadata provide information about other data. By creating metadata for your dataset, you make sure that it is easier to find and to understand, that it is supplied with its sources, and that its structure is well described. The OEP uses oemetadata as a standard, which was developed specifically for energy research modelling. It is based on JSON and compatible with the datapackage standard. An example oemetadata file in the current version is available in the oemetadata repository.
Filling out a metadata string for the first time might take a while. Once you're familiar with what the different fields stand for, it will become much faster. You can start with an empty template string and work your way from top to bottom. A description of every key of the string, with example entries, is available in the oemetadata repository.
When creating your resources, take care that your data types correspond to PostgreSQL data types, because these definitions will be used when uploading your data. You will most likely use these types:
- For strings or any other kind of textual data, use "text"
- For numbers without fractional component, use "integer"
- For numbers with a fractional component, use "float"
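As a rough illustration of the type rule above, the following Python sketch builds the field list for one resource and checks each declared type against the three names listed. The key names (`name`, `type`, `resources`, `schema`, `fields`) are assumptions modelled on the oemetadata example file, and the table and column names are hypothetical. Restricting the check to three types is a simplification made here, since other PostgreSQL types may be valid as well. The key description in the oemetadata repository remains the authoritative reference.

```python
import json

# The three type names recommended in the list above (a simplification:
# other PostgreSQL types may be valid too).
COMMON_TYPES = {"text", "integer", "float"}


def build_fields(columns):
    """Turn a {column_name: type_name} mapping into a list of field entries.

    Raises ValueError if a type name is not one of COMMON_TYPES.
    """
    fields = []
    for name, type_name in columns.items():
        if type_name not in COMMON_TYPES:
            raise ValueError(f"column {name!r}: unexpected type {type_name!r}")
        fields.append({"name": name, "type": type_name})
    return fields


# Hypothetical columns of an example table.
columns = {"id": "integer", "turbine_name": "text", "hub_height": "float"}

# Assumed structure: one resource whose schema lists the fields.
metadata = {
    "name": "example_table",
    "resources": [
        {"name": "example_table", "schema": {"fields": build_fields(columns)}}
    ],
}

# Serialize the dictionary into a JSON metadata string.
metadata_string = json.dumps(metadata, indent=2)
print(metadata_string)
```

Building the string from a dictionary rather than typing JSON by hand lets Python catch syntax slips such as missing commas or quotes before the string goes into review.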
Proprietary data is not currently allowed on the OEP. We recommend using ODbL-1.0.
2 Initiation of metadata review
Once your metadata string is ready, send it in for a review! The review process takes place publicly on GitHub. Create a new issue in the data-preprocessing repository and follow the workflow described in the issue.
This includes uploading your metadata string to the folder data-review in a new branch and referencing the issue number. If you're unable or don't know how to do this, just attach your string to a comment in the issue you created and get in contact with a reviewer. The reviewer can help you with this.
Reviewers follow a reference manual to complete the review process. Once the review is done, the new branch is merged into the master branch. If your data is on the OEP, a "reviewed" badge will eventually appear next to it (this feature still needs to be implemented).
You can therefore simply continue with the next step.
3 Uploading data
The OEDB is a PostgreSQL database with a public, RESTful API. The OEP functions as an interface to it. Downloads from the OEDB can be carried out by anyone without registration. However, in order to upload, you will need a user account on the OpenEnergyPlatform. There you will be presented with a token, which allows you to upload data via the public API. If you haven't got one already, create a new account. To retrieve your token, do the following:
Click on the login button and sign in on the OEP.
3.1.2. Copy your token
Click on your profile name to see your information. To view your token, click on "Show token".
3.2. Using the API
You can access the API with any tool that can send HTTP requests. That means you can technically use the address bar of your browser to access data on the OEDB.
For example, the following links return a JSON string containing the columns and the rows, respectively, of the wind_turbine_library, a dataset published on the OEP:
Extending this browser-based approach to more complex tasks would be impractical, however. To upload any data, you would also need to configure your browser to send your token in the header of every request.
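To illustrate what sending the token with a request involves, here is a minimal Python sketch that uses only the standard library. It builds request objects and attaches the token as a header, without sending anything. The base URL, the path layout, the schema name, the `Token` header scheme, and the helper name `build_request` are all assumptions made for illustration. Take the actual values from the API documentation.

```python
import urllib.request

# Placeholder values; replace them with the real API address and your own token.
API_BASE = "https://oep.example/api"  # assumption, not the real address
TOKEN = "your-token-here"


def build_request(schema, table, resource, token=None):
    """Build (but do not send) a GET request for a table resource.

    `resource` is, for example, "columns" or "rows". The URL layout is assumed.
    """
    url = f"{API_BASE}/schema/{schema}/tables/{table}/{resource}"
    request = urllib.request.Request(url, method="GET")
    if token is not None:
        # Assumed header scheme: "Authorization: Token <token>".
        request.add_header("Authorization", f"Token {token}")
    return request


# Reading needs no token; uploading would need one on every request.
# "example_schema" is a hypothetical schema name.
read_request = build_request("example_schema", "wind_turbine_library", "rows")
auth_request = build_request(
    "example_schema", "wind_turbine_library", "rows", token=TOKEN
)

print(read_request.full_url)
print(auth_request.get_header("Authorization"))
```

A request built this way can be sent with `urllib.request.urlopen(request)`. Keeping the token in one place and attaching it in a helper avoids repeating the header in every call.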
Using the API documentation for working with the OEDB, users in the community can build their own tools to access the OEP. There is already a suite of small tools, mainly written in Python, that aim to facilitate access. A hands-on tutorial will guide you through the process of uploading example data and metadata from files on your computer all the way into a publicly visible place on the OEP. The structure of the same tutorial is available as a template for you to fill in with your own data. These tutorials use a range of tools. A brief description of these tools and how to set them up on your system makes up the rest of this guidebook.
SQLAlchemy is a Python toolkit for working with SQL databases such as PostgreSQL. Internally, SQLAlchemy uses so-called "dialects" to provide a consistent interface to different database drivers. The oedialect supplies your SQLAlchemy installation with a dialect that uses the REST API of the Open Energy Platform (OEP). In short, the oedialect allows you to use SQLAlchemy to download data from and upload data to the OEP, which is helpful if you use Python to handle your data. Installation instructions for SQLAlchemy for different operating systems are available on liquidweb. To install the oedialect, you can use pip:
pip install oedialect
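Once the oedialect is installed, connecting should look much like connecting to any other database through SQLAlchemy. The sketch below only assembles the connection URL with the standard library, so it runs without either package. The `postgresql+oedialect://` scheme is assumed from the oedialect documentation, the host name is a placeholder, and the helper `build_connection_url` is hypothetical. Verify all of these before relying on them.

```python
from urllib.parse import quote


def build_connection_url(user, token, host):
    """Assemble a SQLAlchemy-style connection URL for the oedialect.

    User name and token are percent-encoded so that special characters
    cannot break the URL.
    """
    user_part = quote(user, safe="")
    token_part = quote(token, safe="")
    # Assumed scheme: postgresql+oedialect://<user>:<token>@<host>
    return f"postgresql+oedialect://{user_part}:{token_part}@{host}"


# Hypothetical credentials and a placeholder host.
url = build_connection_url("my_user", "my/token", "oep.example")
print(url)

# With SQLAlchemy and the oedialect installed, the URL would be passed on
# like this (not executed here):
#   import sqlalchemy as sa
#   engine = sa.create_engine(url)
```

Avoid hard-coding a real token in scripts you share. Reading it from an environment variable or prompting for it keeps it out of version control.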
oem2orm stands for oemetadata to object-relational mapping. It is a convenience tool written in Python. It can create an engine that connects you to the OEP with your username and token, and it has a function that reads all the metadata strings in a folder and, based on the information they contain, creates tables on the OEP using the oedialect. You can install it using pip:
pip install oem2orm
The oep-client is a command line tool written in Python. It offers a range of functions, including downloading data and metadata, creating a table, uploading data, updating a table's metadata, and deleting tables that you created. As it is written in Python, you can also import its functions into your own Python project. You can install the oep-client using pip:
python3 -m pip install --upgrade oep-client
pandas is an open source library providing easy-to-use data structures and data analysis tools for Python. It is often used in research to handle data, and it also appears in several of the descriptions, where it is used to read data into Python before uploading it to the OEDB. Use pip to install pandas:
pip install pandas