Visualisierung#

Ziel#

Wir visualisieren die Daten im EM-DAT Datensatz

Datenpipeline

# Install a pip package im Jupyter Notebook
!pip3 install pandas
import pandas as pd
data = pd.read_csv('../../data/clean_emdat.csv', index_col=0)
data
Year Country Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) CPI
1 1900 Cabo Verde Climatological Drought Drought 1 11000.0 11000.0 NaN 3.077091
2 1900 India Climatological Drought Drought 1 1250000.0 1250000.0 NaN 3.077091
3 1900 Jamaica Hydrological Flood No Subtype 1 300.0 300.0 NaN 3.077091
4 1900 Japan Geophysical Volcanic activity Ash fall 1 30.0 30.0 NaN 3.077091
5 1900 Turkey Geophysical Earthquake Ground movement 1 140.0 140.0 NaN 3.077091
... ... ... ... ... ... ... ... ... ... ...
10338 2022 Yemen Hydrological Flood Flash flood 1 3400.0 13.0 NaN NaN
10339 2022 South Africa Hydrological Flood No Subtype 7 143119.0 562.0 3.164000e+09 NaN
10340 2022 Zambia Hydrological Flood No Subtype 1 15000.0 3.0 NaN NaN
10341 2022 Zimbabwe Hydrological Flood No Subtype 1 0.0 0.0 NaN NaN
10342 2022 Zimbabwe Meteorological Storm Tropical cyclone 1 3000.0 0.0 NaN NaN

10342 rows × 10 columns

data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10342 entries, 1 to 10342
Data columns (total 10 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Year                          10342 non-null  int64  
 1   Country                       10342 non-null  object 
 2   Disaster Subroup              10342 non-null  object 
 3   Disaster Type                 10342 non-null  object 
 4   Disaster Subtype              10342 non-null  object 
 5   Total Events                  10342 non-null  int64  
 6   Total Affected                10342 non-null  float64
 7   Total Deaths                  10342 non-null  float64
 8   Total Damage (USD, original)  3796 non-null   float64
 9   CPI                           10149 non-null  float64
dtypes: float64(4), int64(2), object(4)
memory usage: 888.8+ KB

Das Dashboard…#

Recherchefragen#

  • Wie viele Todesopfer gibt es in einem Land?

  • Wie stark ist ein Land von Naturkatastrophen betroffen?

  • Welchen Anteil haben die unterschiedlichen Naturkatastrophentypen daran?

  • Wie hat sich die Anzahl der Naturkatastrophen über die Jahre hin entwickelt?

Wie viele Todesopfer gibt es in einem Land?#

Wie viele Todesopfer gibt es insgesamt?#

data["Total Deaths"].sum()
22845977.0

Wie viele Todesopfer gibt es insgesamt in Deutschland?#

data['Country'] == 'Germany'
1        False
2        False
3        False
4        False
5        False
         ...  
10338    False
10339    False
10340    False
10341    False
10342    False
Name: Country, Length: 10342, dtype: bool
data[data['Country'] == 'Germany']
Year Country Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) CPI
3121 1990 Germany Meteorological Storm No Subtype 6 64.0 64.0 4.440000e+09 48.218797
3289 1991 Germany Meteorological Storm No Subtype 1 0.0 0.0 5.000000e+06 50.260853
3431 1992 Germany Geophysical Earthquake Ground movement 1 1525.0 1.0 5.000000e+07 51.783162
3432 1992 Germany Hydrological Flood No Subtype 1 0.0 0.0 3.010000e+07 51.783162
3582 1993 Germany Hydrological Flood Riverine flood 1 100000.0 5.0 6.000000e+08 53.311620
... ... ... ... ... ... ... ... ... ... ...
9469 2019 Germany Meteorological Storm Convective storm 1 1.0 1.0 NaN 94.349092
9708 2020 Germany Meteorological Storm Extra-tropical storm 1 33.0 0.0 NaN 95.512967
9952 2021 Germany Hydrological Flood No Subtype 1 1000.0 197.0 4.000000e+10 100.000000
9953 2021 Germany Meteorological Storm Convective storm 2 604.0 1.0 NaN 100.000000
10188 2022 Germany Meteorological Storm Extra-tropical storm 3 2.0 7.0 NaN NaN

66 rows × 10 columns

data[data['Country'] == 'Germany']['Total Deaths'].sum()
9918.0
data_germany = data[data['Country'] == 'Germany']
data_germany['Total Deaths'].sum()
9918.0

Wie viele Todesopfer gab es insgesamt in Indien?#

data_land = data[data['Country'] == 'India']
data_land['Total Deaths'].sum()
4591827.0

Allgemeine Funktion die Gesamtzahl der Todesopfer eines Landes ausgibt#

def get_total_deaths(data, country):
    data_land = data[data['Country'] == country]
    return data_land['Total Deaths'].sum()
get_total_deaths(data, 'Japan')
239400.0
get_total_deaths(data, 'Sweden')
43.0
get_total_deaths(data, 'South Africa')
2576.0

Dashboard Teil II#

Welchen Anteil haben die unterschiedlichen Naturkatastrophentypen in Deutschland?#

germany_data = data[data['Country'] == 'Germany']
germany_data['Disaster Type'].value_counts(normalize=True)
Storm                   0.545455
Flood                   0.242424
Extreme temperature     0.166667
Earthquake              0.030303
Landslide               0.015152
Name: Disaster Type, dtype: float64

Visualisierung von DataFrames#

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.

Matplotlib makes easy things easy and hard things possible.

Install matplotlib:

!pip3 install --upgrade pip
!pip3 install --upgrade Pillow
!pip3 install matplotlib

import matplotlib.pyplot as plt
germany_data = data[data['Country'] == 'Germany']
germany_data['Disaster Type'].value_counts()
Storm                   36
Flood                   16
Extreme temperature     11
Earthquake               2
Landslide                1
Name: Disaster Type, dtype: int64
germany_data['Disaster Type'].value_counts().plot(kind='bar')
<AxesSubplot: >
../_images/11d9774e7609c0bcaabd66a5d28195fe397a56881d1a7308022115b9944c73be.png
germany_data['Disaster Type'].value_counts().plot(kind='pie')
<AxesSubplot: ylabel='Disaster Type'>
../_images/e3f15ad9d3b7af74d120c2ed0de32fcf795780f6e26a5f1adedc23d1cca77733.png

Recherchefragen#

Welche Naturkatastrophen fordern die meisten Todesopfer?#

.groupby() gruppiert einen DataFrame nach den Werten einer oder mehreren Spalten.

Die Spalten nach denen man Gruppieren möchte werden als Argument übergeben. Danach folgt die gewünschte Berechnung auf dieser Gruppe. Das Ergebnis wird als DataFrame zurückgegeben.

data
Year Country Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) CPI
1 1900 Cabo Verde Climatological Drought Drought 1 11000.0 11000.0 NaN 3.077091
2 1900 India Climatological Drought Drought 1 1250000.0 1250000.0 NaN 3.077091
3 1900 Jamaica Hydrological Flood No Subtype 1 300.0 300.0 NaN 3.077091
4 1900 Japan Geophysical Volcanic activity Ash fall 1 30.0 30.0 NaN 3.077091
5 1900 Turkey Geophysical Earthquake Ground movement 1 140.0 140.0 NaN 3.077091
... ... ... ... ... ... ... ... ... ... ...
10338 2022 Yemen Hydrological Flood Flash flood 1 3400.0 13.0 NaN NaN
10339 2022 South Africa Hydrological Flood No Subtype 7 143119.0 562.0 3.164000e+09 NaN
10340 2022 Zambia Hydrological Flood No Subtype 1 15000.0 3.0 NaN NaN
10341 2022 Zimbabwe Hydrological Flood No Subtype 1 0.0 0.0 NaN NaN
10342 2022 Zimbabwe Meteorological Storm Tropical cyclone 1 3000.0 0.0 NaN NaN

10342 rows × 10 columns

data['Total Deaths'].sum()
22845977.0
data.groupby('Disaster Type')['Total Deaths'].sum()
Disaster Type
Animal accident                12.0
Drought                  11733889.0
Earthquake                2343594.0
Extreme temperature        194056.0
Flood                     7002950.0
Fog                          4000.0
Glacial lake outburst         262.0
Insect infestation              0.0
Landslide                   67417.0
Mass movement (dry)          4644.0
Storm                     1403607.0
Volcanic activity           86893.0
Wildfire                     4653.0
Name: Total Deaths, dtype: float64
data.groupby('Disaster Type').sum()
/tmp/ipykernel_88481/3612715955.py:1: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  data.groupby('Disaster Type').sum()
Year Total Events Total Affected Total Deaths Total Damage (USD, original) CPI
Disaster Type
Animal accident 2014 1 5.000000e+00 12.0 0.000000e+00 87.366298
Drought 1543555 792 2.869852e+09 11733889.0 2.098881e+11 43039.824471
Earthquake 2134925 1576 2.052983e+08 2343594.0 8.653494e+11 46522.663819
Extreme temperature 1115693 607 1.038223e+08 194056.0 6.886634e+10 38106.532522
Flood 7604541 5730 3.967537e+09 7002950.0 9.771118e+11 239869.775450
Fog 1952 1 4.000000e+03 4000.0 0.000000e+00 9.822074
Glacial lake outburst 4043 3 3.600000e+01 262.0 2.100000e+08 100.000000
Insect infestation 183314 95 2.802200e+06 0.0 2.292000e+08 4880.908507
Landslide 1277348 789 1.480567e+07 67417.0 1.125893e+10 36989.405750
Mass movement (dry) 93320 48 3.105800e+04 4644.0 2.090000e+08 2269.255805
Storm 5473348 4579 1.256494e+09 1403607.0 1.749046e+12 149355.146763
Volcanic activity 457628 269 1.004262e+07 86893.0 6.326912e+09 12030.182594
Wildfire 744227 453 1.836880e+07 4653.0 1.290320e+11 23624.803026

.groupby() kann auch auf mehrere Spalten gleichzeitig angewendet werden

data.groupby(['Disaster Type', 'Disaster Subtype'])['Total Deaths'].sum()
Disaster Type          Disaster Subtype                
Animal accident        No Subtype                                12.0
Drought                Drought                             11733812.0
                       No Subtype                                77.0
Earthquake             Ground movement                      2075949.0
                       No Subtype                               221.0
                       Tsunami                               267424.0
Extreme temperature    Cold wave                              17765.0
                       Heat wave                             172555.0
                       Severe winter conditions                3736.0
Flood                  Coastal flood                           5352.0
                       Flash flood                            70835.0
                       No Subtype                           2480837.0
                       Riverine flood                       4445926.0
Fog                    No Subtype                              4000.0
Glacial lake outburst  No Subtype                               262.0
Insect infestation     Grasshopper                                0.0
                       Locust                                     0.0
                       No Subtype                                 0.0
Landslide              Avalanche                               5227.0
                       Landslide                              41435.0
                       Mudslide                                4913.0
                       No Subtype                             15427.0
                       Rockfall                                 128.0
                       Subsidence                               287.0
Mass movement (dry)    Avalanche                                349.0
                       Landslide                               3774.0
                       No Subtype                                10.0
                       Rockfall                                 477.0
                       Subsidence                                34.0
Storm                  Convective storm                       22730.0
                       Extra-tropical storm                     558.0
                       No Subtype                             25135.0
                       Tropical cyclone                     1355184.0
Volcanic activity      Ash fall                               85831.0
                       Lava flow                                 33.0
                       No Subtype                               516.0
                       Pyroclastic flow                         513.0
Wildfire               Forest fire                             3334.0
                       Land fire (Brush, Bush, Pasture)         966.0
                       No Subtype                               353.0
Name: Total Deaths, dtype: float64

Visualisierung#

data.groupby('Disaster Type')['Total Deaths'].sum().plot(kind='pie')
<AxesSubplot: ylabel='Total Deaths'>
../_images/144c47dd44af3df58a7c64e89aa49031aa9eff92a9443a7cea3928e99f3aa14c.png

Welche Naturkatastrophen fordern die meisten Todesopfer in Deutschland?#

germany_data = data[data['Country'] == 'Germany']
germany_data.groupby('Disaster Type')['Total Deaths'].sum().plot(kind='pie')
<AxesSubplot: ylabel='Total Deaths'>
../_images/562b7072581dc34a6319a926106e49fb160affd6e57cb2d2bd716c97459ab491.png

Generelle Funktion#

country = 'India'
country_data = data[data['Country'] == country]
country_data.groupby('Disaster Type')['Total Deaths'].sum().plot(kind='pie')
<AxesSubplot: ylabel='Total Deaths'>
../_images/cc00518d1b571f4cfe75e982c3a6b593f098c4e455e02af4287960dbf12b472c.png
def plot_death_by_disastertype(data, country):
    country_data = data[data['Country'] == country]
    country_data.groupby('Disaster Type')['Total Deaths'].sum().plot(kind='pie')
plot_death_by_disastertype(data, 'Egypt')
../_images/de62e8ea2ea4454c295b9541de6dcbe493c91a7ca62ee5780b97693098c0d312.png

Dashboard Teil III#

Wie hat sich die Anzahl der Naturkatastrophen über die Jahre hin entwickelt?#

Berechne für jedes Jahr die Summe aus allen Events

data.groupby('Year')['Total Events'].sum()
Year
1900      6
1901      1
1902     10
1903     12
1904      4
       ... 
2018    320
2019    404
2020    395
2021    432
2022    315
Name: Total Events, Length: 123, dtype: int64
data['Total Events'].sum()
14943
data.groupby('Year').min()
Country Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) CPI
Year
1900 Cabo Verde Climatological Drought Ash fall 1 30.0 30.0 30000000.0 3.077091
1901 Japan Geophysical Earthquake Tsunami 1 24.0 18.0 NaN 3.077091
1902 Azerbaijan Geophysical Earthquake Ash fall 1 125.0 86.0 25000000.0 3.200175
1903 Canada Climatological Drought Ash fall 1 0.0 0.0 480000000.0 3.323258
1904 Bangladesh Geophysical Earthquake Ground movement 1 0.0 0.0 NaN 3.323258
... ... ... ... ... ... ... ... ... ...
2018 Afghanistan Climatological Drought Ash fall 1 0.0 0.0 36000.0 92.669670
2019 Afghanistan Biological Drought Ash fall 1 0.0 0.0 4000000.0 94.349092
2020 Afghanistan Biological Drought Ash fall 1 0.0 0.0 1500000.0 95.512967
2021 Afghanistan Climatological Drought Ash fall 1 0.0 0.0 120000.0 100.000000
2022 Afghanistan Climatological Drought Ash fall 1 0.0 0.0 240000.0 NaN

123 rows × 9 columns

data[['Total Events', 'Year']].groupby('Year')['Total Events'].sum()
Year
1900      6
1901      1
1902     10
1903     12
1904      4
       ... 
2018    320
2019    404
2020    395
2021    432
2022    315
Name: Total Events, Length: 123, dtype: int64
yearly_events = data.groupby('Year')['Total Events'].sum()
yearly_events.plot(kind='line', x='Year', y='Total Events', title='Anzahl an Naturkatastrophen pro Jahr')
<AxesSubplot: title={'center': 'Anzahl an Naturkatastrophen pro Jahr'}, xlabel='Year'>
../_images/f72821f0951cdfcafae1ea15e1f0e08a92f8495bb3c46bc892d3b4f75be061eb.png

Plotte für Deutschland die Entwicklung der Anzahl an Naturkatastrophen über die Jahre hinweg

yearly_events =germany_data.groupby('Year')['Total Events'].sum()
yearly_events.plot(kind='line', x='Year', y='Total Events', title='Anzahl an Naturkatastrophen pro Jahr')
<AxesSubplot: title={'center': 'Anzahl an Naturkatastrophen pro Jahr'}, xlabel='Year'>
../_images/73f81321271defa7abfce69fd239ddafb932c598b9a969e05327f8fb866e985d.png

Generelle Funktion

def plot_evolution(data, country):
    land_data = data[data['Country'] == county]
    yearly_events = land_data.groupby('Year')['Total Events'].sum()
    yearly_events.plot(kind='line', x='Year', y='Total Events', title='Anzahl an Naturkatastrophen pro Jahr')