How Upload Your Data and Metadata to the OEP¶
The development of tutorials for the Open Energy Family takes place publicly in a dedicated tutorial repository.
Please report bugs and suggestions as new issues.
license: GNU Affero General Public License Version 3 (AGPL-3.0)
copyright: Reiner Lemoine Institut
authors: christian-rli, jh-RLI, Ludee
If Jupyter Notebooks are new to you and you'd like to get an introduction, have a look at this less than 10 minute introduction video. Official installation instructions are available on jupyter's readthedocs page.
Introduction¶
This resource will go through the technical process of uploading data to the OEDB. It uses example data from a csv file and corresponding metadata to illustrate the process. In order to replicate it with your own data in a jupyter notebook, you can use this empty upload template with the same structure.
If you need more context on the used tools and how to install them, have a look at the Upload Process Guidebook.
Setup¶
You need to be signed up to the OEP to access your username and API Token. To run this Jupyter Notebook you need to create an execution environment with all the following dependencies installed: oem2orm
, pandas==1.2.4
, sqlalchemy==1.3.16
, requests
, oep_client
, oedialect
.
from oem2orm import oep_oedialect_oem2orm as oem2orm
import os
import pandas as pd
import getpass
Setting up the oem2orm logger¶
If you want to see detailed runtime information on oem2orm functions or if errors occur, you can activate the logger with this simple setup function.
oem2orm.setup_logger()
Connection to OEP¶
To connect to the OEP you need your OEP Token and user name. Note: You ca view your token on your OEP profile page after logging in. The following command will prompt you for your token and store it as an environment variable. When you paste it here, it will only show dots instead of the actual string.
os.environ["OEP_API_TOKEN"] = getpass.getpass('Token:')
Provide your OEP-username to oem2orm in order to create a connection to the database. Your token is taken from the environment variable you've created above. Note: Using white space in your name is fine.
db = oem2orm.setup_db_connection()
Creating sql tables from oemetadata¶
The oemetadata format is a standardised json file format and required for all data uploaded to the OEP. It includes the data model and the used data types. This allows us to derive the necessary tables in sqlalchemy from it.
In order to create the table(s) we need to tell python where to find our oemetadata file first. To do this we place them in the folder "metadata" which is in the current directory (Path of this jupyter notebbok). Provide the path to your own folder if you want to use your own metadata. oem2orm will process all files that are located in the folder.
metadata_folder = oem2orm.select_oem_dir(oem_folder_name="metadata")
The next command will set up the table. The collect_tables_function collects all metadata files in a folder and retrives the SQLAlchemy ORM objects and returns them. The Tables are ordered by foreign key. Having a valid metadata strings is necessary for the following steps.
tables_orm = oem2orm.collect_tables_from_oem(db, metadata_folder)
Now we can use create our table objects in the database.
#create table
oem2orm.create_tables(db, tables_orm)
The tables should now be public, but empty on the OEP at the location provided in the metadata file. For this example tutorial, the created table is located in model_draft.upload_tutorial_example_data. If you've just been playing around and don't want to write any data to the OEP, please make sure to delete your tables again.
Deleting your table¶
Running the following commands will delete the tables from the database which you have in your ordered ORM. Take care that you only delete tables you actually want to have removed.
# In order to actually delete, you will need to type: yes
oem2orm.delete_tables(db, tables_orm)
Writing data into a table¶
In this example we will upload data from a csv file. Pandas has a read_csv function which makes importing a csv-file rather comfortable. It reads csv into a DataFrame. By default, it assumes that the fields are comma-separated. Our example file has columns with semicolons as separators, so we have to specify this when reading the file.
The example file for this tutorial ('upload_tutorial_example_data.csv') is in the 'data' directory, next to this tutorial. Make sure to adapt the path to the file you're using if your file is located elsewhere.
# db = oem2orm.setup_db_connection()
filepath = "./data/upload_tutorial_example_data.csv"
example_df = pd.read_csv(filepath, encoding='utf8', sep=';')
# show the first 10 row´s
example_df[:10]
We need to define the location in the OEDB where the data should be written to. The connection information is still available from our steps above.
# change to the table schema
schema = "sandbox"
# change to the table name
table_name = "upload_tutorial_example_data"
connection = db.engine
The following command will write the content of your dataframe to the table on the OEP that was created earlier.
The table is uploaded in chunks to ensure a stable connection to the OEP when uploading large tables.
Take a look at your table on the OEP after succesfull upload or for debugging in case of an exception!
# Define chunksize. The chunksize defines the number of uploaded csv-rows per request.
chunksize = 6000
with pd.read_csv(filepath, encoding='utf8', sep=';', chunksize=chunksize) as reader:
for chunk in reader:
print(f'Uploading chunk from table: {table_name}.')
try:
chunk.to_sql(table_name, connection, schema=schema, if_exists='append', index=False)
print(f'Appended {len(chunk)} rows to table on OEP')
except Exception as e:
print(f'Writing to {schema}.{table_name} failed!')
print('Delete and recreate with the commands above, if you want to repeat your upload again.')
print('The program throws the following exception:')
print(e)
finally:
print(f'Take a look at your table on the OEP: https://openenergy-platform.org/dataedit/view/{schema}/{table_name}')
Writing metadata to the table¶
Now that we have data in our table it's high time, that we attach our metadata to it. Since we're using the api, some direct http-requests and a little helper function from the oep-client, we need to import these new dependencies.
import json
import requests
We use the metadata folder we set up before. (See the Creating tables section) If you wan´t to set another folder use the code below:
# oem_path = oem2orm.select_oem_dir(oem_folder_name="metadata")
md_file_name = "example_metadata"
First we're reading the metadata file into a json dictionary.
metadata = oem2orm.mdToDict(oem_folder_path=metadata_folder, file_name=md_file_name)
Then we need to validate the metadata.
oem2orm.omi_validateMd(metadata)
Now we can upload the metadata.
oem2orm.api_updateMdOnTable(metadata)
If you still have the page on the OEP with your data open, refresh it. It should now show you the metadata on its side.