Data Analysis Part I

Natalie Widmann

Winter semester 2023/2024

Data Analysis and Processing in Python

Goals

  • Understanding how data is processed

  • Editing and analyzing structured data with Pandas

  • Visualizing data

  • Using Python packages

  • Reading and saving different data formats (csv, json, excel, txt)

[Figure: Data pipeline]

What is data?

Structured data

Structured data is well organized and formatted so that it is easy to search, to read by machine, and to process. The simplest example is a table in which each column defines a category or a value.

Unstructured data

Unstructured data, in contrast, does not come in a particular format or a fixed structure. Examples include texts, images, social media feeds, and audio files.

Semi-structured data

Semi-structured data is a hybrid of the two. One example is a table of e-mail data in which recipient, subject, date, and sender are structured information, while the actual message text is unstructured.
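To make the distinction concrete, here is a minimal sketch in pandas. It uses a small table with hypothetical example values as structured data and an e-mail record as semi-structured data. All field names and contents are made up.

```python
import pandas as pd

# Structured data: a table in which every column has a fixed meaning
# (hypothetical example values)
structured = pd.DataFrame({
    "country": ["Country A", "Country B"],
    "year": [2021, 2021],
    "deaths": [12, 3],
})

# Semi-structured data: an e-mail whose header fields are structured,
# while the body is free text (hypothetical example)
email = {
    "sender": "alice@example.org",
    "recipient": "bob@example.org",
    "subject": "Flood report",
    "date": "2021-07-15",
    "body": "Hi Bob, the river rose overnight and several streets are flooded.",
}

# The header fields become regular table columns;
# the body is stored as one long, unstructured string
emails = pd.DataFrame([email])
print(structured)
print(emails[["sender", "subject", "date"]])
```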


[Figure: Data]

Pandas

Pandas is a Python package. Its name is derived from “panel data”, an econometrics term, and is also a play on “Python data analysis”.

Pandas provides the core functionality for working with structured data.
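The following minimal sketch shows the two core pandas data structures: a DataFrame, which is a table, and a Series, which is a single column. The values are made up.

```python
import pandas as pd

# A DataFrame is a table with named columns and a row index
df = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"],
    "events": [3, 1, 2],
})

# Selecting one column returns a Series (one-dimensional, with the same index)
events = df["events"]
print(type(events).__name__)  # Series
print(events.sum())           # 6
print(df.shape)               # (3, 2)
```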

[Image: pandas. Photo by Stone Wang on Unsplash]

Python Packages

Packages are ready-made collections of code, organized in modules, whose functions we can use without having to program them ourselves.

Some packages come preinstalled with Python as part of the standard library and only need to be imported, for example:

# Random integer between 10 and 20 (both inclusive) using the random package
import random
random.randint(10,20)
12
# Get the current date and time with datetime
import datetime
datetime.datetime.today()
datetime.datetime(2023, 12, 6, 10, 37, 28, 342404)
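Standard-library modules can be imported in several ways. The sketch below shows three of them: the whole-module import used above, an import with an alias, and an import of a single function. The choice of modules is only illustrative. The seed is fixed only to make the random number reproducible.

```python
# Three common ways to import from the standard library
import random                    # import the whole module
import datetime as dt            # import under a shorter alias
from math import sqrt            # import a single function

random.seed(0)                   # fix the seed so the result is reproducible
value = random.randint(10, 20)   # integer between 10 and 20, both inclusive
today = dt.date.today()          # only the date, without the time
root = sqrt(16)

print(value, today, root)
```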

Installing Python packages

Packages provided by the Python community must be installed before they can be used. pip, Python's standard package manager, can be used for this.

Tips for installing Python packages on Windows, Linux and Mac can be found here: https://packaging.python.org/en/latest/tutorials/installing-packages/

In Jupyter notebooks, packages can be installed as follows:

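# Install pip packages from within a Jupyter notebook
import sys
# sys.executable is the Python of the running kernel, so the packages are
# installed into the environment the notebook actually uses
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install openpyxl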
Requirement already satisfied: pandas in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (2.1.3)
Requirement already satisfied: numpy<2,>=1.23.2 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (1.26.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Requirement already satisfied: openpyxl in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (3.1.2)
Requirement already satisfied: et-xmlfile in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from openpyxl) (1.1.0)
import pandas
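After installation, it can be useful to check which version of a package is installed. The sketch below uses the standard library's importlib.metadata. The helper name installed_version is hypothetical. For a package that is already imported, pandas.__version__ also shows the version.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

print(installed_version("pandas"))
print(installed_version("surely-not-a-real-package-xyz"))  # None
```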

Idea, finding data and verification

[Figure: Data pipeline]

Aggregated figures for Natural Disasters in EM-DAT

Link: https://data.humdata.org/dataset/emdat-country-profiles

In 1988, the Centre for Research on the Epidemiology of Disasters (CRED) launched the Emergency Events Database (EM-DAT). EM-DAT was created with the initial support of the World Health Organisation (WHO) and the Belgian Government.

The main objective of the database is to serve the purposes of humanitarian action at national and international levels. The initiative aims to rationalise decision making for disaster preparedness, as well as provide an objective base for vulnerability assessment and priority setting.

EM-DAT contains essential core data on the occurrence and effects of over 22,000 mass disasters in the world from 1900 to the present day. The database is compiled from various sources, including UN agencies, non-governmental organisations, insurance companies, research institutes and press agencies.

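import pandas as pd
# Display floats with two decimal places (affects only the output, not the data)
pd.set_option('display.float_format', '{:.2f}'.format)

# Read the Excel file; the openpyxl engine handles .xlsx files
data = pd.read_excel('../../data/emdat.xlsx', engine="openpyxl")
data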
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
0 #date +occurred #country +name #country +code #cause +group #cause +subgroup #cause +type #cause +subtype #frequency #affected +ind #affected +ind +killed NaN #value +usd NaN
1 2000 Afghanistan AFG Natural Climatological Drought Drought 1 2580000 37 50000.00 84975 58.84
2 2000 Algeria DZA Natural Hydrological Flood Flash flood 2 105 37 NaN NaN 58.84
3 2000 Algeria DZA Natural Hydrological Flood Flood (General) 2 100 7 NaN NaN 58.84
4 2000 Algeria DZA Natural Meteorological Storm Storm (General) 1 10 4 NaN NaN 58.84
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5834 2023 Viet Nam VNM Natural Meteorological Storm Tropical cyclone 1 3 1 NaN NaN NaN
5835 2023 Yemen YEM Natural Hydrological Flood Flood (General) 1 169035 39 NaN NaN NaN
5836 2023 Zambia ZMB Natural Hydrological Flood Flash flood 1 154608 NaN NaN NaN NaN
5837 2023 Zambia ZMB Natural Hydrological Flood Flood (General) 1 22000 NaN NaN NaN NaN
5838 2023 Zimbabwe ZWE Natural Meteorological Storm Tropical cyclone 1 NaN 2 NaN NaN NaN

5839 rows × 13 columns
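The first data row of the file contains hashtags instead of values. Instead of removing this row later, you can skip it while reading. The sketch below uses a tiny made-up CSV excerpt with the same layout and reads it from memory with io.StringIO. pd.read_excel accepts the same skiprows parameter, so pd.read_excel('../../data/emdat.xlsx', engine="openpyxl", skiprows=[1]) should have the same effect on the Excel file. This has not been tested here.

```python
import io
import pandas as pd

# Tiny made-up excerpt in the same layout as the EM-DAT file:
# header row, then a row of hashtags, then the actual data
csv_text = """Year,Country,Total Events
#date +occurred,#country +name,#frequency
2000,Country A,1
2001,Country B,2
"""

# Without skipping, the hashtag row keeps the numeric columns from being parsed as numbers
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# skiprows=[1] skips the second line of the file (the hashtag row);
# the header in line 0 is kept
clean = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(clean.dtypes)
```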

Data exploration and cleaning

[Figure: Data pipeline]

Overview of the data

# head() returns the first 5 rows
data.head()
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
0 #date +occurred #country +name #country +code #cause +group #cause +subgroup #cause +type #cause +subtype #frequency #affected +ind #affected +ind +killed NaN #value +usd NaN
1 2000 Afghanistan AFG Natural Climatological Drought Drought 1 2580000 37 50000.00 84975 58.84
2 2000 Algeria DZA Natural Hydrological Flood Flash flood 2 105 37 NaN NaN 58.84
3 2000 Algeria DZA Natural Hydrological Flood Flood (General) 2 100 7 NaN NaN 58.84
4 2000 Algeria DZA Natural Meteorological Storm Storm (General) 1 10 4 NaN NaN 58.84

How large is the dataset? How many rows and how many columns does it have?

data.shape
(5839, 13)
print(f'Anzahl an Zeilen: {data.shape[0]}')
print(f'Anzahl an Spalten: {data.shape[1]}')
Anzahl an Zeilen: 5839
Anzahl an Spalten: 13

The column names:

print(data.columns)
Index(['Year', 'Country', 'ISO', 'Disaster Group', 'Disaster Subroup',
       'Disaster Type', 'Disaster Subtype', 'Total Events', 'Total Affected',
       'Total Deaths', 'Total Damage (USD, original)',
       'Total Damage (USD, adjusted)', 'CPI'],
      dtype='object')

info() provides more information about the columns: the number of non-missing values and the data type of each column.

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5839 entries, 0 to 5838
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Year                          5839 non-null   object 
 1   Country                       5839 non-null   object 
 2   ISO                           5839 non-null   object 
 3   Disaster Group                5839 non-null   object 
 4   Disaster Subroup              5839 non-null   object 
 5   Disaster Type                 5839 non-null   object 
 6   Disaster Subtype              5839 non-null   object 
 7   Total Events                  5839 non-null   object 
 8   Total Affected                4693 non-null   object 
 9   Total Deaths                  4119 non-null   object 
 10  Total Damage (USD, original)  2032 non-null   float64
 11  Total Damage (USD, adjusted)  2000 non-null   object 
 12  CPI                           5648 non-null   float64
dtypes: float64(2), object(11)
memory usage: 593.2+ KB
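info() reports the number of non-null values per column. You can also compute the number of missing values per column directly. Here is a sketch on a made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Made-up example with missing values (NaN)
df = pd.DataFrame({
    "Total Affected": [100.0, np.nan, 2500.0, np.nan],
    "Total Deaths": [3.0, 1.0, np.nan, np.nan],
})

# isna() marks missing cells as True; sum() counts the True values per column
missing = df.isna().sum()
print(missing)

# mean() of the True/False values gives the share of missing values per column
print(df.isna().mean())
```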

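describe() shows basic statistical properties of the columns with a numeric data type, i.e. int and float.

The method computes:

  • the number of non-missing values (count)

  • the mean

  • the standard deviation

  • the minimum and maximum

  • the median (50%)

  • the 25% and 75% quartiles

Because most columns were read with the object data type (see info() above), the output below contains only two columns.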

data.describe()
Total Damage (USD, original) CPI
count 2032.00 5648.00
mean 1639896721.42 75.90
std 8550683641.43 11.06
min 0.00 58.84
25% 20000000.00 66.73
50% 123000000.00 74.51
75% 742750000.00 83.76
max 210000000000.00 100.00
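This small made-up Series shows that count includes only non-missing values and that the statistics ignore NaN:

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry
s = pd.Series([10.0, 20.0, np.nan, 40.0])
stats = s.describe()
print(stats)
# count is 3, not 4: the NaN is not counted
# mean is (10 + 20 + 40) / 3, i.e. NaN is skipped
```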

.unique() shows the distinct values of a column:

data['Year'].unique()
array(['#date +occurred', 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
       2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
       2019, 2020, 2021, 2022, 2023], dtype=object)
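unique() lists the distinct values, while nunique() only counts them. Here is a sketch with made-up years. On data['Year'] before cleaning, the hashtag entry would be counted as well.

```python
import pandas as pd

years = pd.Series([2000, 2000, 2001, 2003, 2003, 2003])
print(years.unique())    # distinct values in order of first appearance
print(years.nunique())   # number of distinct values: 3
```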

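Data cleaning: removing the first row of the DataFrame

The first row contains no data. Instead, it holds HXL hashtags (Humanitarian Exchange Language) such as #date +occurred, which describe the columns. Because of this row, pandas stores the numeric columns with the object data type, so the row is removed.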

data.index
RangeIndex(start=0, stop=5839, step=1)
data.drop(index=0)
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
1 2000 Afghanistan AFG Natural Climatological Drought Drought 1 2580000 37 50000.00 84975 58.84
2 2000 Algeria DZA Natural Hydrological Flood Flash flood 2 105 37 NaN NaN 58.84
3 2000 Algeria DZA Natural Hydrological Flood Flood (General) 2 100 7 NaN NaN 58.84
4 2000 Algeria DZA Natural Meteorological Storm Storm (General) 1 10 4 NaN NaN 58.84
5 2000 Angola AGO Natural Hydrological Flood Flood (General) 3 9011 15 NaN NaN 58.84
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5834 2023 Viet Nam VNM Natural Meteorological Storm Tropical cyclone 1 3 1 NaN NaN NaN
5835 2023 Yemen YEM Natural Hydrological Flood Flood (General) 1 169035 39 NaN NaN NaN
5836 2023 Zambia ZMB Natural Hydrological Flood Flash flood 1 154608 NaN NaN NaN NaN
5837 2023 Zambia ZMB Natural Hydrological Flood Flood (General) 1 22000 NaN NaN NaN NaN
5838 2023 Zimbabwe ZWE Natural Meteorological Storm Tropical cyclone 1 NaN 2 NaN NaN NaN

5838 rows × 13 columns

data
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
0 #date +occurred #country +name #country +code #cause +group #cause +subgroup #cause +type #cause +subtype #frequency #affected +ind #affected +ind +killed NaN #value +usd NaN
1 2000 Afghanistan AFG Natural Climatological Drought Drought 1 2580000 37 50000.00 84975 58.84
2 2000 Algeria DZA Natural Hydrological Flood Flash flood 2 105 37 NaN NaN 58.84
3 2000 Algeria DZA Natural Hydrological Flood Flood (General) 2 100 7 NaN NaN 58.84
4 2000 Algeria DZA Natural Meteorological Storm Storm (General) 1 10 4 NaN NaN 58.84
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5834 2023 Viet Nam VNM Natural Meteorological Storm Tropical cyclone 1 3 1 NaN NaN NaN
5835 2023 Yemen YEM Natural Hydrological Flood Flood (General) 1 169035 39 NaN NaN NaN
5836 2023 Zambia ZMB Natural Hydrological Flood Flash flood 1 154608 NaN NaN NaN NaN
5837 2023 Zambia ZMB Natural Hydrological Flood Flood (General) 1 22000 NaN NaN NaN NaN
5838 2023 Zimbabwe ZWE Natural Meteorological Storm Tropical cyclone 1 NaN 2 NaN NaN NaN

5839 rows × 13 columns

# drop() returns a new DataFrame; to keep the change, assign it back to data
data = data.drop(index=0)
# alternatively: data.drop(index=0, inplace=True)
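The sketch below uses a made-up mini-DataFrame. It shows that drop() leaves the original unchanged unless you assign the result back, and that the remaining rows keep their old index labels. reset_index(drop=True) is an optional extra step and not part of the workflow above.

```python
import pandas as pd

# Made-up column in the same shape as 'Year' before cleaning
df = pd.DataFrame({"Year": ["#date +occurred", 2000, 2001]})

result = df.drop(index=0)     # returns a new DataFrame
print(len(df), len(result))   # 3 2: df itself is unchanged

df = df.drop(index=0)         # assign back to keep the change
print(df.index.tolist())      # [1, 2]: the labels are not renumbered

# Optional: renumber the rows starting from 0 again
df_renumbered = df.reset_index(drop=True)
print(df_renumbered.index.tolist())  # [0, 1]
```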

Querying and converting data types

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5838 entries, 1 to 5838
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Year                          5838 non-null   object 
 1   Country                       5838 non-null   object 
 2   ISO                           5838 non-null   object 
 3   Disaster Group                5838 non-null   object 
 4   Disaster Subroup              5838 non-null   object 
 5   Disaster Type                 5838 non-null   object 
 6   Disaster Subtype              5838 non-null   object 
 7   Total Events                  5838 non-null   object 
 8   Total Affected                4692 non-null   object 
 9   Total Deaths                  4118 non-null   object 
 10  Total Damage (USD, original)  2032 non-null   float64
 11  Total Damage (USD, adjusted)  1999 non-null   object 
 12  CPI                           5648 non-null   float64
dtypes: float64(2), object(11)
memory usage: 593.1+ KB
# Query the data type with the dtype attribute
data['Year'].dtype
dtype('O')
# Convert the column to a numeric data type
data["Year"] = pd.to_numeric(data["Year"])
data['Year'].dtype
dtype('int64')
# Apply the conversion to the other numeric columns that are stored as object
cols = ['Total Events', 'Total Affected', 'Total Deaths', 'Total Damage (USD, adjusted)']
for col in cols:
    data[col] = pd.to_numeric(data[col])
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5838 entries, 1 to 5838
Data columns (total 13 columns):
 #   Column                        Non-Null Count  Dtype  
---  ------                        --------------  -----  
 0   Year                          5838 non-null   int64  
 1   Country                       5838 non-null   object 
 2   ISO                           5838 non-null   object 
 3   Disaster Group                5838 non-null   object 
 4   Disaster Subroup              5838 non-null   object 
 5   Disaster Type                 5838 non-null   object 
 6   Disaster Subtype              5838 non-null   object 
 7   Total Events                  5838 non-null   int64  
 8   Total Affected                4692 non-null   float64
 9   Total Deaths                  4118 non-null   float64
 10  Total Damage (USD, original)  2032 non-null   float64
 11  Total Damage (USD, adjusted)  1999 non-null   float64
 12  CPI                           5648 non-null   float64
dtypes: float64(5), int64(2), object(6)
memory usage: 593.1+ KB
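By default, pd.to_numeric raises an error if a value cannot be converted. In the sketch below with a made-up column, errors="coerce" turns such values into NaN instead. NaN also explains why Total Affected and Total Deaths became float64 rather than int64 above: these columns contain missing values, and NaN is a floating-point value.

```python
import pandas as pd

# A column read as strings, with one entry that is not a number
col = pd.Series(["1", "25", "n/a", "7"])

# errors="raise" (the default) would stop with a ValueError at "n/a";
# errors="coerce" turns unparseable entries into NaN instead
numbers = pd.to_numeric(col, errors="coerce")
print(numbers)
print(numbers.dtype)   # float64, because NaN is a float
```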

Overview of the numeric data

data.describe()
Year Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
count 5838.00 5838.00 4692.00 4118.00 2032.00 1999.00 5648.00
mean 2011.19 1.52 962499.69 343.87 1639896721.42 2067216253.87 75.90
std 7.06 1.28 8601461.99 5402.59 8550683641.43 11078428355.46 11.06
min 2000.00 1.00 1.00 1.00 0.00 0.00 58.84
25% 2005.00 1.00 1000.00 4.00 20000000.00 24791922.00 66.73
50% 2011.00 1.00 10000.00 14.00 123000000.00 162699012.00 74.51
75% 2017.00 2.00 100000.00 47.75 742750000.00 992183337.00 83.76
max 2023.00 17.00 330000000.00 222570.00 210000000000.00 273218372541.00 100.00

Overview of the object (text) data

# Distinct countries
countries = data['Country'].unique()
countries
array(['Afghanistan', 'Algeria', 'Angola', 'Argentina', 'Armenia',
       'Australia', 'Austria', 'Azerbaijan', 'Bangladesh', 'Belarus',
       'Belize', 'Bhutan', 'Bolivia (Plurinational State of)',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Bulgaria',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'China',
       'Colombia', 'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czechia',
       "Democratic People's Republic of Korea", 'Ecuador', 'Egypt',
       'El Salvador', 'Eswatini', 'Ethiopia', 'Fiji', 'France',
       'French Guiana', 'Georgia', 'Greece', 'Guatemala', 'Guinea',
       'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia',
       'Iran (Islamic Republic of)', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kyrgyzstan',
       "Lao People's Democratic Republic", 'Madagascar', 'Malawi',
       'Malaysia', 'Mali', 'Mexico', 'Mongolia', 'Morocco', 'Mozambique',
       'Namibia', 'Nepal', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
       'North Macedonia', 'Norway', 'Pakistan', 'Panama',
       'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland',
       'Portugal', 'Republic of Korea', 'Republic of Moldova', 'Romania',
       'Russian Federation', 'Réunion', 'Senegal', 'Serbia Montenegro',
       'Slovakia', 'Somalia', 'South Africa', 'Spain', 'Sri Lanka',
       'Sudan', 'Switzerland', 'Taiwan (Province of China)', 'Tajikistan',
       'Thailand', 'Turkmenistan', 'Türkiye', 'Uganda', 'Ukraine',
       'United Kingdom of Great Britain and Northern Ireland',
       'United Republic of Tanzania', 'United States of America',
       'Uruguay', 'Uzbekistan', 'Venezuela (Bolivarian Republic of)',
       'Viet Nam', 'Zambia', 'Zimbabwe', 'Bahamas', 'Burkina Faso',
       'Canary Islands', 'Cayman Islands', 'Central African Republic',
       'Chad', 'Cook Islands', 'Democratic Republic of the Congo',
       'Djibouti', 'Dominican Republic', 'Gambia', 'Germany', 'Ghana',
       'Kenya', 'Latvia', 'Lesotho', 'Lithuania', 'Mauritania', 'Myanmar',
       'Puerto Rico', 'Rwanda', 'Saint Helena', 'Samoa',
       'Syrian Arab Republic', 'Timor-Leste', 'Tonga', 'Vanuatu', 'Yemen',
       'Albania', 'Barbados', 'Belgium', 'Cabo Verde', 'Congo', 'Denmark',
       'Grenada', 'Guam', 'Guinea-Bissau', 'Lebanon', 'Mauritius',
       'Micronesia (Federated States of)', 'Netherlands (Kingdom of the)',
       'Northern Mariana Islands', 'Oman',
       'Saint Vincent and the Grenadines', 'Saudi Arabia', 'Seychelles',
       'Solomon Islands', 'Sweden', 'American Samoa', 'Bermuda',
       'China, Hong Kong Special Administrative Region', 'Comoros',
       'Eritrea', 'Luxembourg', 'New Caledonia', 'Slovenia', 'Tunisia',
       'Dominica', 'Guadeloupe', 'Iraq', 'Maldives', 'Niue',
       'Saint Lucia', 'Sierra Leone', 'Trinidad and Tobago',
       'Turks and Caicos Islands', 'United States Virgin Islands',
       'Estonia', 'Finland', 'Guyana', 'Tokelau', 'Montserrat',
       'Suriname', 'Togo', 'Benin', 'Côte d’Ivoire', 'Liberia',
       'Martinique', 'Montenegro', 'Serbia', 'Antigua and Barbuda',
       'Kiribati', 'Marshall Islands', 'Saint Kitts and Nevis',
       'South Sudan', 'State of Palestine', 'Gabon', 'French Polynesia',
       'Tuvalu', 'Palau', 'Wallis and Futuna Islands', 'Libya',
       'Anguilla', 'British Virgin Islands',
       'China, Macao Special Administrative Region', 'Saint Barthélemy',
       'Saint Martin (French Part)', 'Sint Maarten (Dutch part)',
       'United Arab Emirates', 'Kuwait', 'Qatar', 'Isle of Man',
       'Sao Tome and Principe', 'Malta'], dtype=object)
len(countries)
216
# Check whether a country occurs in the array
'Germany' in countries
True
# Find all country names that contain 'german' (case-insensitive)
for country in countries:
    if 'german' in country.lower():
        print(country)
Germany
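You can also write the loop above with the pandas string methods. Because unique() returns a NumPy array, the sketch wraps the names in a Series. The country list is a small excerpt for illustration.

```python
import pandas as pd

# unique() returns a NumPy array; wrapping it in a Series enables .str methods
countries = pd.Series(["Germany", "France", "Congo", "Democratic Republic of the Congo"])

# case=False makes the search case-insensitive
mask = countries.str.contains("congo", case=False)
print(countries[mask].tolist())
```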
data['Disaster Group'].unique()
array(['Natural'], dtype=object)

.value_counts() counts how often each distinct value occurs in a column.

data['Disaster Subroup'].value_counts()
Disaster Subroup
Hydrological         2731
Meteorological       1956
Climatological        616
Geophysical           504
Biological             30
Extra-terrestrial       1
Name: count, dtype: int64

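With the argument normalize=True, the counts are converted into relative frequencies (shares of the total). Because the display format was set to two decimal places above, very rare values are shown as 0.00.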

data['Disaster Subroup'].value_counts(normalize=True)
Disaster Subroup
Hydrological        0.47
Meteorological      0.34
Climatological      0.11
Geophysical         0.09
Biological          0.01
Extra-terrestrial   0.00
Name: proportion, dtype: float64
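Here is a sketch with made-up values. It shows that normalize=True corresponds to dividing each count by the total number of values. Missing values are left out unless dropna=False is passed.

```python
import pandas as pd

types = pd.Series(["Flood", "Storm", "Flood", "Drought", "Flood", "Storm"])

counts = types.value_counts()                 # absolute counts: 3, 2, 1
shares = types.value_counts(normalize=True)   # relative shares

# normalize=True corresponds to dividing the counts by their total
print(counts / counts.sum())
print(shares)
```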
data['Disaster Type'].value_counts()
Disaster Type
Flood                          2400
Storm                          1510
Extreme temperature             446
Earthquake                      387
Drought                         384
Mass movement (wet)             331
Wildfire                        229
Volcanic activity               104
Infestation                      29
Mass movement (dry)              13
Glacial lake outburst flood       3
Impact                            1
Animal incident                   1
Name: count, dtype: int64
data['Disaster Type'].value_counts(normalize=True)
Disaster Type
Flood                         0.41
Storm                         0.26
Extreme temperature           0.08
Earthquake                    0.07
Drought                       0.07
Mass movement (wet)           0.06
Wildfire                      0.04
Volcanic activity             0.02
Infestation                   0.00
Mass movement (dry)           0.00
Glacial lake outburst flood   0.00
Impact                        0.00
Animal incident               0.00
Name: proportion, dtype: float64

Sorting DataFrames

DataFrames can be sorted by one or more columns. By default, the order is ascending and missing values (NaN) are placed at the end.

data.sort_values(by="Total Affected")
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
2574 2009 Viet Nam VNM Natural Hydrological Mass movement (wet) Landslide (wet) 1 1.00 13.00 NaN NaN 73.31
5056 2020 Taiwan (Province of China) TWN Natural Meteorological Storm Tropical cyclone 1 1.00 1.00 NaN NaN 88.44
1505 2005 Netherlands (Kingdom of the) NLD Natural Meteorological Storm Extra-tropical storm 1 1.00 NaN NaN NaN 66.73
1876 2007 Barbados BRB Natural Geophysical Earthquake Ground movement 1 1.00 NaN NaN NaN 70.85
147 2000 Mexico MEX Natural Geophysical Earthquake Ground movement 1 1.00 NaN NaN NaN 58.84
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5815 2023 Tajikistan TJK Natural Hydrological Flood Flood (General) 1 NaN 21.00 NaN NaN NaN
5819 2023 Türkiye TUR Natural Hydrological Flood Flood (General) 1 NaN 19.00 25000000.00 NaN NaN
5825 2023 United States of America USA Natural Hydrological Flood Flood (General) 1 NaN NaN NaN NaN NaN
5826 2023 United States of America USA Natural Meteorological Extreme temperature Heat wave 1 NaN 14.00 NaN NaN NaN
5838 2023 Zimbabwe ZWE Natural Meteorological Storm Tropical cyclone 1 NaN 2.00 NaN NaN NaN

5838 rows × 13 columns

# The 10 rows with the most people affected
data.sort_values(by="Total Affected", ascending=False).head(n=10)
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
3778 2015 India IND Natural Climatological Drought Drought 1 330000000.00 NaN 3000000000.00 3704226000.00 80.99
658 2002 India IND Natural Climatological Drought Drought 1 300000000.00 NaN 910722000.00 1481735695.00 61.46
892 2003 China CHN Natural Hydrological Flood Riverine flood 6 155924986.00 662.00 15329640000.00 24387552800.00 62.86
2626 2010 China CHN Natural Hydrological Flood Riverine flood 5 140194136.00 1911.00 18171000000.00 24387512516.00 74.51
1911 2007 China CHN Natural Hydrological Flood Riverine flood 9 108793242.00 967.00 4919155000.00 6943174065.00 70.85
599 2002 China CHN Natural Meteorological Storm Sand/Dust storm 1 100000000.00 NaN NaN NaN 61.46
2885 2011 China CHN Natural Hydrological Flood Riverine flood 5 93360000.00 628.00 10704130000.00 13926499896.00 76.86
4126 2016 United States of America USA Natural Meteorological Storm Blizzard/Winter storm 4 85000057.00 90.00 2125000000.00 2591136966.00 82.01
593 2002 China CHN Natural Hydrological Flood Flash flood 1 80035257.00 793.00 3100000000.00 5043669370.00 61.46
2170 2008 China CHN Natural Meteorological Extreme temperature Severe winter conditions 2 77000000.00 145.00 21100000000.00 28680657590.00 73.57
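For "top n" questions, DataFrame.nlargest is a shorter alternative to sorting followed by head(). In this sketch on made-up data, both approaches return the same rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total Affected": [500.0, np.nan, 12000.0, 80.0],
})

# Sort descending and take the first two rows
top_sorted = df.sort_values(by="Total Affected", ascending=False).head(2)
# The same selection in one step
top_n = df.nlargest(2, "Total Affected")
print(top_n)
```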
# Sort by several columns, each with its own direction:
# Disaster Type ascending, Total Affected descending
data.sort_values(by=["Disaster Type", "Total Affected"], ascending=[True, False]).head(n=10)
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
3592 2014 Niger NER Natural Biological Animal incident Animal incident 1 5.00 12.00 NaN NaN 80.89
3778 2015 India IND Natural Climatological Drought Drought 1 330000000.00 NaN 3000000000.00 3704226000.00 80.99
658 2002 India IND Natural Climatological Drought Drought 1 300000000.00 NaN 910722000.00 1481735695.00 61.46
590 2002 China CHN Natural Climatological Drought Drought 3 64560000.00 NaN 1210000000.00 1968658044.00 61.46
2400 2009 China CHN Natural Climatological Drought Drought 2 60160000.00 NaN 3600000000.00 4910842514.00 73.31
889 2003 China CHN Natural Climatological Drought Drought 2 51000000.00 NaN NaN NaN 62.86
106 2000 India IND Natural Climatological Drought Drought 1 50000000.00 20.00 588000000.00 999309177.00 58.84
2623 2010 China CHN Natural Climatological Drought Drought 1 35000000.00 NaN 2370000000.00 3180804835.00 74.51
3500 2014 China CHN Natural Climatological Drought Drought 2 27726000.00 NaN 3680000000.00 4549240500.00 80.89
3485 2014 Brazil BRA Natural Climatological Drought Drought 1 27000000.00 NaN 5000000000.00 6181033288.00 80.89

Indexing and Retrieving Data

The values of a column can be accessed with <dataframe>['<column name>'].

data['Year']
1       2000
2       2000
3       2000
4       2000
5       2000
        ... 
5834    2023
5835    2023
5836    2023
5837    2023
5838    2023
Name: Year, Length: 5838, dtype: int64

Further operations or methods can then be applied to the column:

data['Year'] + 10
1       2010
2       2010
3       2010
4       2010
5       2010
        ... 
5834    2033
5835    2033
5836    2033
5837    2033
5838    2033
Name: Year, Length: 5838, dtype: int64
data['Year'].mean()
2011.1887632750943

Several columns are selected by passing a list of column names:

data[['Year', 'Country', 'Disaster Type', 'Total Affected']]
Year Country Disaster Type Total Affected
1 2000 Afghanistan Drought 2580000.00
2 2000 Algeria Flood 105.00
3 2000 Algeria Flood 100.00
4 2000 Algeria Storm 10.00
5 2000 Angola Flood 9011.00
... ... ... ... ...
5834 2023 Viet Nam Storm 3.00
5835 2023 Yemen Flood 169035.00
5836 2023 Zambia Flood 154608.00
5837 2023 Zambia Flood 22000.00
5838 2023 Zimbabwe Storm NaN

5838 rows × 4 columns
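Besides selecting columns by name, you can select rows and columns together with .loc (by index label) and .iloc (by position). This matters here: after row 0 was removed, the index of data starts at 1, so data.loc[1] and data.iloc[0] refer to the same row. Here is a sketch on a made-up DataFrame with the same kind of index:

```python
import pandas as pd

df = pd.DataFrame(
    {"Year": [2000, 2000, 2001], "Country": ["A", "B", "A"], "Total Events": [1, 2, 3]},
    index=[1, 2, 3],   # index starts at 1, like data after dropping row 0
)

# .loc selects by index label and column name
print(df.loc[2, "Country"])                 # B
print(df.loc[1:2, ["Year", "Country"]])     # labels 1 to 2, both included

# .iloc selects by integer position (0-based, end excluded)
print(df.iloc[0, 1])                        # A (first row, second column)
```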

Boolean Indexing

The selected data can also be filtered by passing a condition. The comparison produces a True or False value for each row, and only the rows with True are kept.

data[data['Country'] == 'Germany']
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
370 2001 Germany DEU Natural Meteorological Storm Lightning/Thunderstorms 1 NaN 6.00 300000000.00 495838437.00 60.50
636 2002 Germany DEU Natural Hydrological Flood Flood (General) 1 330108.00 27.00 11600000000.00 18873085384.00 61.46
637 2002 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 NaN 11.00 1800000000.00 2928582215.00 61.46
638 2002 Germany DEU Natural Meteorological Storm Storm (General) 2 19.00 11.00 250000000.00 406747530.00 61.46
929 2003 Germany DEU Natural Meteorological Extreme temperature Heat wave 1 NaN 9355.00 1650000000.00 2624945016.00 62.86
930 2003 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 NaN 5.00 300000000.00 477262730.00 62.86
931 2003 Germany DEU Natural Meteorological Storm Lightning/Thunderstorms 1 NaN 10.00 NaN NaN 62.86
1163 2004 Germany DEU Natural Geophysical Earthquake Ground movement 1 150.00 NaN 12000000.00 18592738.00 64.54
1164 2004 Germany DEU Natural Meteorological Storm Storm (General) 1 NaN 2.00 130000000.00 201421324.00 64.54
1434 2005 Germany DEU Natural Hydrological Flood Riverine flood 2 450.00 1.00 220000000.00 329681571.00 66.73
1435 2005 Germany DEU Natural Meteorological Extreme temperature Cold wave 1 165.00 1.00 300000000.00 449565778.00 66.73
1436 2005 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 2.00 2.00 270000000.00 404609200.00 66.73
1694 2006 Germany DEU Natural Hydrological Flood Riverine flood 1 1000.00 NaN NaN NaN 68.88
1695 2006 Germany DEU Natural Meteorological Extreme temperature Heat wave 1 NaN 2.00 NaN NaN 68.88
1696 2006 Germany DEU Natural Meteorological Extreme temperature Severe winter conditions 1 NaN 10.00 NaN NaN 68.88
1697 2006 Germany DEU Natural Meteorological Storm Hail 1 100.00 1.00 NaN NaN 68.88
1698 2006 Germany DEU Natural Meteorological Storm Storm (General) 2 200.00 10.00 NaN NaN 68.88
1948 2007 Germany DEU Natural Hydrological Flood Riverine flood 1 NaN 1.00 NaN NaN 70.85
1949 2007 Germany DEU Natural Meteorological Storm Blizzard/Winter storm 1 NaN 7.00 NaN NaN 70.85
1950 2007 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 130.00 11.00 5500000000.00 7763011606.00 70.85
2205 2008 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 NaN 5.00 1200000000.00 1631127446.00 73.57
2206 2008 Germany DEU Natural Meteorological Storm Severe weather 1 NaN 3.00 1500000000.00 2038909307.00 73.57
2435 2009 Germany DEU Natural Hydrological Flood Riverine flood 1 NaN NaN 20000000.00 27282458.00 73.31
2436 2009 Germany DEU Natural Meteorological Extreme temperature Cold wave 2 NaN 15.00 NaN NaN 73.31
2437 2009 Germany DEU Natural Meteorological Storm Lightning/Thunderstorms 1 NaN 1.00 50000000.00 68206146.00 73.31
2668 2010 Germany DEU Natural Hydrological Flood Flash flood 1 NaN 3.00 NaN NaN 74.51
2669 2010 Germany DEU Natural Meteorological Extreme temperature Cold wave 1 NaN 1.00 NaN NaN 74.51
2670 2010 Germany DEU Natural Meteorological Storm Blizzard/Winter storm 1 NaN NaN NaN NaN 74.51
2671 2010 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 NaN 4.00 1000000000.00 1342111745.00 74.51
2907 2011 Germany DEU Natural Hydrological Flood Riverine flood 1 NaN 4.00 NaN NaN 76.86
3128 2012 Germany DEU Natural Meteorological Extreme temperature Cold wave 2 NaN 6.00 NaN NaN 78.45
3334 2013 Germany DEU Natural Hydrological Flood Riverine flood 1 6350.00 4.00 12900000000.00 16205763565.00 79.60
3335 2013 Germany DEU Natural Meteorological Storm Extra-tropical storm 2 2.00 7.00 NaN NaN 79.60
3336 2013 Germany DEU Natural Meteorological Storm Hail 1 NaN NaN 4800000000.00 6030051559.00 79.60
3528 2014 Germany DEU Natural Meteorological Storm Lightning/Thunderstorms 2 1.00 8.00 400000000.00 494482663.00 80.89
3987 2016 Germany DEU Natural Hydrological Flood Flood (General) 1 NaN 7.00 2000000000.00 2438717145.00 82.01
4219 2017 Germany DEU Natural Hydrological Flood Riverine flood 1 600.00 NaN NaN NaN 83.76
4220 2017 Germany DEU Natural Meteorological Storm Hail 1 NaN 2.00 740000000.00 883505559.00 83.76
4221 2017 Germany DEU Natural Meteorological Storm Severe weather 1 24.00 3.00 159000000.00 189834303.00 83.76
4448 2018 Germany DEU Natural Meteorological Extreme temperature Heat wave 1 NaN NaN NaN NaN 85.80
4449 2018 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 12.00 5.00 588475000.00 685844110.00 85.80
4694 2019 Germany DEU Natural Meteorological Extreme temperature Heat wave 2 NaN 4.00 NaN NaN 87.36
4695 2019 Germany DEU Natural Meteorological Storm Blizzard/Winter storm 1 NaN 1.00 NaN NaN 87.36
4934 2020 Germany DEU Natural Meteorological Storm Extra-tropical storm 1 33.00 NaN NaN NaN 88.44
5194 2021 Germany DEU Natural Hydrological Flood Flood (General) 1 1000.00 197.00 40000000000.00 43201119615.00 92.59
5195 2021 Germany DEU Natural Meteorological Storm Lightning/Thunderstorms 1 600.00 NaN NaN NaN 92.59
5196 2021 Germany DEU Natural Meteorological Storm Storm (General) 1 4.00 1.00 NaN NaN 92.59
5461 2022 Germany DEU Natural Meteorological Extreme temperature Heat wave 1 NaN 8173.00 NaN NaN 100.00
5462 2022 Germany DEU Natural Meteorological Storm Extra-tropical storm 3 2.00 7.00 1023156000.00 1023156000.00 100.00
5707 2023 Germany DEU Natural Meteorological Storm Severe weather 1 NaN 1.00 NaN NaN NaN
data[data['Total Deaths'] >= 1000]
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
109 2000 India IND Natural Hydrological Flood Riverine flood 2 46600000.00 1751.00 734500000.00 1248286718.00 58.84
358 2001 El Salvador SLV Natural Geophysical Earthquake Ground movement 2 1590550.00 1159.00 1848500000.00 3055191171.00 60.50
386 2001 India IND Natural Geophysical Earthquake Ground movement 1 6321812.00 20005.00 2623000000.00 4335280736.00 60.50
542 2002 Afghanistan AFG Natural Geophysical Earthquake Ground movement 3 100891.00 1200.00 NaN NaN 61.46
663 2002 India IND Natural Meteorological Extreme temperature Heat wave 1 NaN 1030.00 NaN NaN 61.46
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5692 2023 Democratic Republic of the Congo COD Natural Hydrological Flood Flash flood 1 50000.00 2970.00 10000000.00 NaN NaN
5745 2023 Libya LBY Natural Meteorological Storm Storm (General) 1 1600000.00 12352.00 NaN NaN NaN
5755 2023 Morocco MAR Natural Geophysical Earthquake Ground movement 1 1002476.00 2497.00 NaN NaN NaN
5811 2023 Syrian Arab Republic SYR Natural Geophysical Earthquake Ground movement 3 4109320.00 4500.00 8900000000.00 NaN NaN
5818 2023 Türkiye TUR Natural Geophysical Earthquake Ground movement 3 9207698.00 50103.00 34000000000.00 NaN NaN

90 rows × 13 columns
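You can combine several conditions with & (and), | (or) and ~ (not). Each condition needs its own parentheses. Python's and/or do not work here, because they cannot evaluate a whole Series at once. Here is a sketch on a few illustrative rows, with values chosen for the example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Country": ["Germany", "Germany", "India", "Türkiye"],
    "Disaster Type": ["Flood", "Storm", "Flood", "Earthquake"],
    "Total Deaths": [197.0, np.nan, 1751.0, 50103.0],
})

# Both conditions must hold
floods_de = df[(df["Country"] == "Germany") & (df["Disaster Type"] == "Flood")]
# At least one condition must hold (NaN >= 1000 evaluates to False)
many_deaths = df[(df["Total Deaths"] >= 1000) | (df["Disaster Type"] == "Earthquake")]
# isin() checks membership in a list of values
selected = df[df["Country"].isin(["India", "Türkiye"])]
print(floods_de, many_deaths, selected, sep="\n\n")
```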

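How many people are affected on average per earthquake? Note that each row combines all events of one disaster subtype in one country and year (see Total Events). The mean computed below is therefore an average per row, and rows without a value are ignored.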

data[data['Disaster Type'] == 'Earthquake']
Year Country ISO Disaster Group Disaster Subroup Disaster Type Disaster Subtype Total Events Total Affected Total Deaths Total Damage (USD, original) Total Damage (USD, adjusted) CPI
22 2000 Azerbaijan AZE Natural Geophysical Earthquake Ground movement 1 3294.00 31.00 10000000.00 16995054.00 58.84
24 2000 Bangladesh BGD Natural Geophysical Earthquake Ground movement 1 1000.00 NaN NaN NaN 58.84
56 2000 China CHN Natural Geophysical Earthquake Ground movement 5 2105050.00 9.00 116983000.00 198813241.00 58.84
64 2000 Colombia COL Natural Geophysical Earthquake Ground movement 1 430.00 2.00 NaN NaN 58.84
95 2000 Greece GRC Natural Geophysical Earthquake Ground movement 1 600.00 NaN NaN NaN 58.84
... ... ... ... ... ... ... ... ... ... ... ... ... ...
5775 2023 Papua New Guinea PNG Natural Geophysical Earthquake Ground movement 1 16274.00 8.00 NaN NaN NaN
5777 2023 Peru PER Natural Geophysical Earthquake Ground movement 1 141.00 1.00 NaN NaN NaN
5811 2023 Syrian Arab Republic SYR Natural Geophysical Earthquake Ground movement 3 4109320.00 4500.00 8900000000.00 NaN NaN
5814 2023 Tajikistan TJK Natural Geophysical Earthquake Ground movement 1 2205.00 NaN NaN NaN NaN
5818 2023 Türkiye TUR Natural Geophysical Earthquake Ground movement 3 9207698.00 50103.00 34000000000.00 NaN NaN

387 rows × 13 columns

data[data['Disaster Type'] == 'Earthquake']['Total Affected']
22        3294.00
24        1000.00
56     2105050.00
64         430.00
95         600.00
          ...    
5775     16274.00
5777       141.00
5811   4109320.00
5814      2205.00
5818   9207698.00
Name: Total Affected, Length: 387, dtype: float64
data[data['Disaster Type'] == 'Earthquake']['Total Affected'].mean()
373040.7386666667
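Because a row can cover several earthquakes, the mean above is an average per row. One way to approximate an average per event is to divide the summed Total Affected by the summed Total Events, using only rows where Total Affected is known. The sketch below does this with made-up numbers. Whether this approach is appropriate depends on how EM-DAT aggregates the figures, so check the dataset documentation.

```python
import numpy as np
import pandas as pd

# Made-up rows: each row aggregates one or more events
quakes = pd.DataFrame({
    "Total Events": [1, 3, 2],
    "Total Affected": [1000.0, 9000.0, np.nan],
})

# mean() skips NaN and averages per row: (1000 + 9000) / 2
per_row = quakes["Total Affected"].mean()

# Per event: only rows with a known value, divided by their number of events
known = quakes.dropna(subset=["Total Affected"])
per_event = known["Total Affected"].sum() / known["Total Events"].sum()  # 10000 / 4

print(per_row, per_event)
```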

Further research questions

  • How many natural disasters were there in Germany?

  • In which year were there the most natural disasters?

  • Which countries are most affected by natural disasters?

  • Which countries are affected by natural disasters but have comparatively few deaths?

  • Which natural disasters are the deadliest?
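A possible starting point for the question about the year with the most natural disasters is groupby, which was not introduced above. The sketch below uses made-up data. It sums Total Events per year and picks the year with the largest sum. On the real data, the same pattern would be data.groupby('Year')['Total Events'].sum().

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2000, 2000, 2001, 2002, 2002],
    "Country": ["A", "B", "A", "A", "C"],
    "Total Events": [1, 2, 4, 1, 1],
})

# Sum the event counts per year, then take the year with the largest total
events_per_year = df.groupby("Year")["Total Events"].sum()
print(events_per_year)
print(events_per_year.idxmax())   # 2001
```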