Normalizing / Denormalizing¶
This tutorial explains how to transform a custom number of columns of a table into rows (Normalizing) and vice versa (Denormalizing).
An additional Video tutorial is here.
Please report bugs and improvements here: https://github.com/OpenEnergyPlatform/examples/issues
In [1]:
Copied!
__copyright__ = "Reiner Lemoine Institut"
__license__ = "GNU Affero General Public License Version 3 (AGPL-3.0)"
__url__ = "https://github.com/openego/data_processing/blob/master/LICENSE"
__author__ = "oakca, Ludee"
__copyright__ = "Reiner Lemoine Institut"
__license__ = "GNU Affero General Public License Version 3 (AGPL-3.0)"
__url__ = "https://github.com/openego/data_processing/blob/master/LICENSE"
__author__ = "oakca, Ludee"
In [2]:
Copied!
import pandas as pd
import pandas as pd
In [3]:
Copied!
# open df.xlsx
with pd.ExcelFile('df.xlsx') as xls:
# save Sheet1 in df.xlsx as df
df = xls.parse('Sheet1')
# show df
df
# open df.xlsx
with pd.ExcelFile('df.xlsx') as xls:
# save Sheet1 in df.xlsx as df
df = xls.parse('Sheet1')
# show df
df
Out[3]:
Site | Process | inst-cap | cap-lo | cap-up | inv-cost | fix-cost | var-cost | wacc | depreciation | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Mid | Coal plant | 0 | 0 | 0 | 600000 | 0 | 0.6 | 0.07 | 40 |
1 | Mid | Lignite plant | 0 | 0 | 60000 | 600000 | 0 | 0.6 | 0.07 | 40 |
2 | Mid | Gas plant | 0 | 0 | 80000 | 450000 | 0 | 1.6 | 0.07 | 30 |
3 | Mid | Biomass plant | 0 | 0 | 5000 | 875000 | 0 | 1.4 | 0.07 | 25 |
4 | Mid | Wind plant | 0 | 0 | 13000 | 1500000 | 0 | 0.0 | 0.07 | 25 |
5 | Mid | Solar plant | 0 | 0 | 160000 | 600000 | 0 | 0.0 | 0.07 | 25 |
6 | Mid | Hydro plant | 0 | 0 | 1400 | 1600000 | 0 | 0.0 | 0.07 | 50 |
7 | South | Coal plant | 0 | 0 | 100000 | 600000 | 0 | 0.6 | 0.07 | 40 |
8 | South | Lignite plant | 0 | 0 | 0 | 600000 | 0 | 0.6 | 0.07 | 40 |
9 | South | Gas plant | 0 | 0 | 100000 | 450000 | 0 | 1.6 | 0.07 | 30 |
10 | South | Biomass plant | 0 | 0 | 0 | 875000 | 0 | 1.4 | 0.07 | 25 |
11 | South | Wind plant | 0 | 0 | 200000 | 1500000 | 0 | 0.0 | 0.07 | 25 |
12 | South | Solar plant | 0 | 0 | 600000 | 600000 | 0 | 0.0 | 0.07 | 25 |
13 | South | Hydro plant | 0 | 0 | 0 | 1600000 | 0 | 0.0 | 0.07 | 50 |
14 | North | Coal plant | 0 | 0 | 100000 | 600000 | 0 | 0.6 | 0.07 | 40 |
15 | North | Lignite plant | 0 | 0 | 0 | 600000 | 0 | 0.6 | 0.07 | 40 |
16 | North | Gas plant | 0 | 0 | 100000 | 450000 | 0 | 1.6 | 0.07 | 30 |
17 | North | Biomass plant | 0 | 0 | 6000 | 875000 | 0 | 1.4 | 0.07 | 25 |
18 | North | Wind plant | 0 | 0 | 60000 | 1500000 | 0 | 0.0 | 0.07 | 25 |
19 | North | Solar plant | 0 | 0 | 3000 | 600000 | 0 | 0.0 | 0.07 | 25 |
20 | North | Hydro plant | 0 | 0 | 20000 | 1600000 | 0 | 0.0 | 0.07 | 50 |
Normalizing¶
In [4]:
Copied!
# Normalizing the df with melt (we keep the two columns 'Site' and 'Process'), we also add column unit, sort the values
# and make sure no old indexes are used as columns.
# As a result we have the two kept columns 'Site' and 'Process' and all other columns are transposed into rows
# labeled as variable and the respective values are found in value. The newly added column unit contains the respective units
norm = df.melt(['Site', 'Process']).assign(unit='').sort_values(['Site','Process']).reset_index(drop=True)
# assign values for unit
unit = {'inst-cap': 'MW', 'cap-lo': 'MW', 'cap-up': 'MW',
'inv-cost': '€/MW', 'fix-cost': '€/MW/a', 'var-cost': '€/MWh',
'wacc': None, 'depreciation': 'a'}
# include units in table
norm['unit'] = norm['variable'].map(unit)
# show normalized df
norm
# Normalizing the df with melt (we keep the two columns 'Site' and 'Process'), we also add column unit, sort the values
# and make sure no old indexes are used as columns.
# As a result we have the two kept columns 'Site' and 'Process' and all other columns are transposed into rows
# labeled as variable and the respective values are found in value. The newly added column unit contains the respective units
norm = df.melt(['Site', 'Process']).assign(unit='').sort_values(['Site','Process']).reset_index(drop=True)
# assign values for unit
unit = {'inst-cap': 'MW', 'cap-lo': 'MW', 'cap-up': 'MW',
'inv-cost': '€/MW', 'fix-cost': '€/MW/a', 'var-cost': '€/MWh',
'wacc': None, 'depreciation': 'a'}
# include units in table
norm['unit'] = norm['variable'].map(unit)
# show normalized df
norm
Out[4]:
Site | Process | variable | value | unit | |
---|---|---|---|---|---|
0 | Mid | Biomass plant | inst-cap | 0.00 | MW |
1 | Mid | Biomass plant | cap-lo | 0.00 | MW |
2 | Mid | Biomass plant | cap-up | 5000.00 | MW |
3 | Mid | Biomass plant | inv-cost | 875000.00 | €/MW |
4 | Mid | Biomass plant | fix-cost | 0.00 | €/MW/a |
5 | Mid | Biomass plant | var-cost | 1.40 | €/MWh |
6 | Mid | Biomass plant | wacc | 0.07 | None |
7 | Mid | Biomass plant | depreciation | 25.00 | a |
8 | Mid | Coal plant | inst-cap | 0.00 | MW |
9 | Mid | Coal plant | cap-lo | 0.00 | MW |
10 | Mid | Coal plant | cap-up | 0.00 | MW |
11 | Mid | Coal plant | inv-cost | 600000.00 | €/MW |
12 | Mid | Coal plant | fix-cost | 0.00 | €/MW/a |
13 | Mid | Coal plant | var-cost | 0.60 | €/MWh |
14 | Mid | Coal plant | wacc | 0.07 | None |
15 | Mid | Coal plant | depreciation | 40.00 | a |
16 | Mid | Gas plant | inst-cap | 0.00 | MW |
17 | Mid | Gas plant | cap-lo | 0.00 | MW |
18 | Mid | Gas plant | cap-up | 80000.00 | MW |
19 | Mid | Gas plant | inv-cost | 450000.00 | €/MW |
20 | Mid | Gas plant | fix-cost | 0.00 | €/MW/a |
21 | Mid | Gas plant | var-cost | 1.60 | €/MWh |
22 | Mid | Gas plant | wacc | 0.07 | None |
23 | Mid | Gas plant | depreciation | 30.00 | a |
24 | Mid | Hydro plant | inst-cap | 0.00 | MW |
25 | Mid | Hydro plant | cap-lo | 0.00 | MW |
26 | Mid | Hydro plant | cap-up | 1400.00 | MW |
27 | Mid | Hydro plant | inv-cost | 1600000.00 | €/MW |
28 | Mid | Hydro plant | fix-cost | 0.00 | €/MW/a |
29 | Mid | Hydro plant | var-cost | 0.00 | €/MWh |
... | ... | ... | ... | ... | ... |
138 | South | Hydro plant | cap-up | 0.00 | MW |
139 | South | Hydro plant | inv-cost | 1600000.00 | €/MW |
140 | South | Hydro plant | fix-cost | 0.00 | €/MW/a |
141 | South | Hydro plant | var-cost | 0.00 | €/MWh |
142 | South | Hydro plant | wacc | 0.07 | None |
143 | South | Hydro plant | depreciation | 50.00 | a |
144 | South | Lignite plant | inst-cap | 0.00 | MW |
145 | South | Lignite plant | cap-lo | 0.00 | MW |
146 | South | Lignite plant | cap-up | 0.00 | MW |
147 | South | Lignite plant | inv-cost | 600000.00 | €/MW |
148 | South | Lignite plant | fix-cost | 0.00 | €/MW/a |
149 | South | Lignite plant | var-cost | 0.60 | €/MWh |
150 | South | Lignite plant | wacc | 0.07 | None |
151 | South | Lignite plant | depreciation | 40.00 | a |
152 | South | Solar plant | inst-cap | 0.00 | MW |
153 | South | Solar plant | cap-lo | 0.00 | MW |
154 | South | Solar plant | cap-up | 600000.00 | MW |
155 | South | Solar plant | inv-cost | 600000.00 | €/MW |
156 | South | Solar plant | fix-cost | 0.00 | €/MW/a |
157 | South | Solar plant | var-cost | 0.00 | €/MWh |
158 | South | Solar plant | wacc | 0.07 | None |
159 | South | Solar plant | depreciation | 25.00 | a |
160 | South | Wind plant | inst-cap | 0.00 | MW |
161 | South | Wind plant | cap-lo | 0.00 | MW |
162 | South | Wind plant | cap-up | 200000.00 | MW |
163 | South | Wind plant | inv-cost | 1500000.00 | €/MW |
164 | South | Wind plant | fix-cost | 0.00 | €/MW/a |
165 | South | Wind plant | var-cost | 0.00 | €/MWh |
166 | South | Wind plant | wacc | 0.07 | None |
167 | South | Wind plant | depreciation | 25.00 | a |
168 rows × 5 columns
Denormalizing¶
In [5]:
Copied!
# denormalizing the norm with pivot_table, transforms the table back into the original state (apart from column order)
denorm = norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable').reset_index()
# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1)
# show denormalized df
denorm
# note: the only difference between denorm and df is that the column names are in alphabetic order
# denormalizing the norm with pivot_table, transforms the table back into the original state (apart from column order)
denorm = norm.pivot_table(values='value', index=['Site', 'Process'], columns='variable').reset_index()
# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1)
# show denormalized df
denorm
# note: the only difference between denorm and df is that the column names are in alphabetic order
Out[5]:
Site | Process | cap-lo | cap-up | depreciation | fix-cost | inst-cap | inv-cost | var-cost | wacc | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Mid | Biomass plant | 0.0 | 5000.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
1 | Mid | Coal plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
2 | Mid | Gas plant | 0.0 | 80000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
3 | Mid | Hydro plant | 0.0 | 1400.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
4 | Mid | Lignite plant | 0.0 | 60000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
5 | Mid | Solar plant | 0.0 | 160000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
6 | Mid | Wind plant | 0.0 | 13000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
7 | North | Biomass plant | 0.0 | 6000.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
8 | North | Coal plant | 0.0 | 100000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
9 | North | Gas plant | 0.0 | 100000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
10 | North | Hydro plant | 0.0 | 20000.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
11 | North | Lignite plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
12 | North | Solar plant | 0.0 | 3000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
13 | North | Wind plant | 0.0 | 60000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
14 | South | Biomass plant | 0.0 | 0.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
15 | South | Coal plant | 0.0 | 100000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
16 | South | Gas plant | 0.0 | 100000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
17 | South | Hydro plant | 0.0 | 0.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
18 | South | Lignite plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
19 | South | Solar plant | 0.0 | 600000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
20 | South | Wind plant | 0.0 | 200000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
In [6]:
Copied!
# In case of columns containing NaN values, pivot_table will omit these columns in the denormalized table.
# Hence, in this case another option is to use unstack():
# denormalizing the norm with unstack
denorm = norm.set_index(['Site', 'Process', 'variable'])['value'].unstack().reset_index()
# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1)
# show denormalized df
denorm
# In case of columns containing NaN values, pivot_table will omit these columns in the denormalized table.
# Hence, in this case another option is to use unstack():
# denormalizing the norm with unstack
denorm = norm.set_index(['Site', 'Process', 'variable'])['value'].unstack().reset_index()
# remove the variable axis name
denorm = denorm.rename_axis(None, axis=1)
# show denormalized df
denorm
Out[6]:
Site | Process | cap-lo | cap-up | depreciation | fix-cost | inst-cap | inv-cost | var-cost | wacc | |
---|---|---|---|---|---|---|---|---|---|---|
0 | Mid | Biomass plant | 0.0 | 5000.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
1 | Mid | Coal plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
2 | Mid | Gas plant | 0.0 | 80000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
3 | Mid | Hydro plant | 0.0 | 1400.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
4 | Mid | Lignite plant | 0.0 | 60000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
5 | Mid | Solar plant | 0.0 | 160000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
6 | Mid | Wind plant | 0.0 | 13000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
7 | North | Biomass plant | 0.0 | 6000.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
8 | North | Coal plant | 0.0 | 100000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
9 | North | Gas plant | 0.0 | 100000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
10 | North | Hydro plant | 0.0 | 20000.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
11 | North | Lignite plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
12 | North | Solar plant | 0.0 | 3000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
13 | North | Wind plant | 0.0 | 60000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
14 | South | Biomass plant | 0.0 | 0.0 | 25.0 | 0.0 | 0.0 | 875000.0 | 1.4 | 0.07 |
15 | South | Coal plant | 0.0 | 100000.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
16 | South | Gas plant | 0.0 | 100000.0 | 30.0 | 0.0 | 0.0 | 450000.0 | 1.6 | 0.07 |
17 | South | Hydro plant | 0.0 | 0.0 | 50.0 | 0.0 | 0.0 | 1600000.0 | 0.0 | 0.07 |
18 | South | Lignite plant | 0.0 | 0.0 | 40.0 | 0.0 | 0.0 | 600000.0 | 0.6 | 0.07 |
19 | South | Solar plant | 0.0 | 600000.0 | 25.0 | 0.0 | 0.0 | 600000.0 | 0.0 | 0.07 |
20 | South | Wind plant | 0.0 | 200000.0 | 25.0 | 0.0 | 0.0 | 1500000.0 | 0.0 | 0.07 |
Please check your final table, depending on your data it might not always work as expected.¶
In [ ]:
Copied!