Data Analysis Part I#
Natalie Widmann#
Winter Semester 2023/2024
Data Analysis and Processing in Python#
Goals#
Understanding data processing
Editing and analyzing structured data with Pandas
Visualizing data
Using Python packages
Reading and saving different data formats (csv, json, excel, txt)
What Is Data?#
Structured Data#
Structured data is well organized and formatted so that it is easy to search, to read by machine, and to process. The simplest example is a table in which each column holds one attribute, either a category or a value.
Unstructured Data#
Unstructured data, in contrast, does not come in a specific format or a fixed structure. Examples include texts, images, social media feeds, and audio files.
Semi-Structured Data#
Semi-structured data is a mixed form. An example is a table of e-mails in which recipient, subject, date, and sender are structured information, while the actual message text is unstructured.
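To make the e-mail example concrete, here is a minimal sketch with made-up e-mail records (hypothetical data, not from the source). The metadata columns can be filtered like any table column, while the message body is free text that can at most be searched:

```python
import pandas as pd

# Hypothetical e-mail records: the metadata fields are structured,
# the "body" field is unstructured free text.
emails = pd.DataFrame(
    {
        "sender": ["anna@example.org", "ben@example.org"],
        "recipient": ["ben@example.org", "anna@example.org"],
        "subject": ["Meeting", "Re: Meeting"],
        "date": pd.to_datetime(["2023-12-01", "2023-12-02"]),
        "body": [
            "Can we meet on Friday to go through the data?",
            "Friday works for me, see you then.",
        ],
    }
)

# Structured part: filtering on a column is straightforward
from_anna = emails[emails["sender"] == "anna@example.org"]

# Unstructured part: the text has no fixed schema; here we can only
# search it, e.g. for a keyword
mentions_friday = emails[emails["body"].str.contains("Friday")]
```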
Pandas#
Pandas is a Python package whose name is derived from “panel data” and “Python data analysis”.
Pandas provides the basic functionality for working with structured data.
Photo by Stone Wang on Unsplash
Python Packages#
Packages are ready-made collections of code (organized in modules) whose functions we can use without having to program them ourselves.
Some packages come preinstalled with Python as part of the standard library and only need to be imported, for example:
# Random integer between 10 and 20 (both inclusive) using the random package
import random
random.randint(10,20)
12
# Output the current date and time via datetime
import datetime
datetime.datetime.today()
datetime.datetime(2023, 12, 6, 10, 37, 28, 342404)
Installing Python Packages#
Packages provided by the Python community must be installed before they can be used. The package manager pip can be used for this.
Tips for installing Python packages on Windows, Linux, and Mac are available here: https://packaging.python.org/en/latest/tutorials/installing-packages/
In Jupyter Notebooks, packages can be installed as follows:
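# Install pip packages from within a Jupyter Notebook.
# sys.executable is the Python interpreter of the running kernel,
# so the packages end up in the same environment as the notebook.
import sys
!{sys.executable} -m pip install pandas
!{sys.executable} -m pip install openpyxl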
Requirement already satisfied: pandas in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (2.1.3)
Requirement already satisfied: numpy<2,>=1.23.2 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (1.26.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2023.3.post1)
Requirement already satisfied: tzdata>=2022.1 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from pandas) (2023.3)
Requirement already satisfied: six>=1.5 in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Requirement already satisfied: openpyxl in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (3.1.2)
Requirement already satisfied: et-xmlfile in /home/natalie/Dokumente/Datenjournalismus in Python/Code/.venv/lib/python3.11/site-packages (from openpyxl) (1.1.0)
import pandas
Idea, Finding Data & Verification#
Aggregated figures for Natural Disasters in EM-DAT#
Link: https://data.humdata.org/dataset/emdat-country-profiles
In 1988, the Centre for Research on the Epidemiology of Disasters (CRED) launched the Emergency Events Database (EM-DAT). EM-DAT was created with the initial support of the World Health Organisation (WHO) and the Belgian Government.
The main objective of the database is to serve the purposes of humanitarian action at national and international levels. The initiative aims to rationalise decision making for disaster preparedness, as well as provide an objective base for vulnerability assessment and priority setting.
EM-DAT contains essential core data on the occurrence and effects of over 22,000 mass disasters in the world from 1900 to the present day. The database is compiled from various sources, including UN agencies, non-governmental organisations, insurance companies, research institutes and press agencies.
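import pandas as pd
# Display floating-point numbers with two decimal places
pd.set_option('display.float_format', '{:.2f}'.format)
# Read the Excel file; the openpyxl engine handles .xlsx files
data = pd.read_excel('../../data/emdat.xlsx', engine="openpyxl")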
data
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | #date +occurred | #country +name | #country +code | #cause +group | #cause +subgroup | #cause +type | #cause +subtype | #frequency | #affected +ind | #affected +ind +killed | NaN | #value +usd | NaN |
1 | 2000 | Afghanistan | AFG | Natural | Climatological | Drought | Drought | 1 | 2580000 | 37 | 50000.00 | 84975 | 58.84 |
2 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flash flood | 2 | 105 | 37 | NaN | NaN | 58.84 |
3 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flood (General) | 2 | 100 | 7 | NaN | NaN | 58.84 |
4 | 2000 | Algeria | DZA | Natural | Meteorological | Storm | Storm (General) | 1 | 10 | 4 | NaN | NaN | 58.84 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5834 | 2023 | Viet Nam | VNM | Natural | Meteorological | Storm | Tropical cyclone | 1 | 3 | 1 | NaN | NaN | NaN |
5835 | 2023 | Yemen | YEM | Natural | Hydrological | Flood | Flood (General) | 1 | 169035 | 39 | NaN | NaN | NaN |
5836 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flash flood | 1 | 154608 | NaN | NaN | NaN | NaN |
5837 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flood (General) | 1 | 22000 | NaN | NaN | NaN | NaN |
5838 | 2023 | Zimbabwe | ZWE | Natural | Meteorological | Storm | Tropical cyclone | 1 | NaN | 2 | NaN | NaN | NaN |
5839 rows × 13 columns
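The goals also include reading and saving other formats. A minimal sketch, assuming a small made-up DataFrame and in-memory text buffers (io.StringIO) in place of real files: to_csv/read_csv and to_json/read_json work the same way with file paths, and read_excel/to_excel additionally need an engine such as openpyxl for .xlsx files.

```python
import io

import pandas as pd

# Small made-up example table (not taken from EM-DAT)
df = pd.DataFrame({"Year": [2000, 2001], "Country": ["Algeria", "Germany"]})

# CSV: write to a text buffer and read it back
csv_buffer = io.StringIO()
df.to_csv(csv_buffer, index=False)  # index=False: do not store the row index
csv_buffer.seek(0)                  # rewind the buffer before reading
df_from_csv = pd.read_csv(csv_buffer)

# JSON: one record (dict) per row; without a path, to_json returns a string
json_text = df.to_json(orient="records")
df_from_json = pd.read_json(io.StringIO(json_text), orient="records")
```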
Data Exploration and Cleaning#
Overview of the Data#
# head() returns the first 5 rows by default
data.head()
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | #date +occurred | #country +name | #country +code | #cause +group | #cause +subgroup | #cause +type | #cause +subtype | #frequency | #affected +ind | #affected +ind +killed | NaN | #value +usd | NaN |
1 | 2000 | Afghanistan | AFG | Natural | Climatological | Drought | Drought | 1 | 2580000 | 37 | 50000.00 | 84975 | 58.84 |
2 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flash flood | 2 | 105 | 37 | NaN | NaN | 58.84 |
3 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flood (General) | 2 | 100 | 7 | NaN | NaN | 58.84 |
4 | 2000 | Algeria | DZA | Natural | Meteorological | Storm | Storm (General) | 1 | 10 | 4 | NaN | NaN | 58.84 |
How large is the dataset? How many rows and how many columns does it have?
data.shape
(5839, 13)
print(f'Number of rows: {data.shape[0]}')
print(f'Number of columns: {data.shape[1]}')
Number of rows: 5839
Number of columns: 13
The column names
print(data.columns)
Index(['Year', 'Country', 'ISO', 'Disaster Group', 'Disaster Subroup',
'Disaster Type', 'Disaster Subtype', 'Total Events', 'Total Affected',
'Total Deaths', 'Total Damage (USD, original)',
'Total Damage (USD, adjusted)', 'CPI'],
dtype='object')
info() provides more information about the columns: the number of non-null values and the data type of each column.
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5839 entries, 0 to 5838
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 5839 non-null object
1 Country 5839 non-null object
2 ISO 5839 non-null object
3 Disaster Group 5839 non-null object
4 Disaster Subroup 5839 non-null object
5 Disaster Type 5839 non-null object
6 Disaster Subtype 5839 non-null object
7 Total Events 5839 non-null object
8 Total Affected 4693 non-null object
9 Total Deaths 4119 non-null object
10 Total Damage (USD, original) 2032 non-null float64
11 Total Damage (USD, adjusted) 2000 non-null object
12 CPI 5648 non-null float64
dtypes: float64(2), object(11)
memory usage: 593.2+ KB
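The Non-Null Count column shows how many values are present. Conversely, the number of missing values per column can be counted with isna().sum(). A minimal sketch on a small made-up DataFrame; in the notebook this would be data.isna().sum():

```python
import numpy as np
import pandas as pd

# Made-up example with missing values (NaN)
df = pd.DataFrame(
    {
        "Total Affected": [2580000, np.nan, 100],
        "Total Deaths": [37, 37, np.nan],
    }
)

# isna() marks missing cells with True; sum() counts the True values per column
missing_per_column = df.isna().sum()

# notna().sum() gives the same numbers as the Non-Null Count of info()
present_per_column = df.notna().sum()
```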
describe() shows the basic statistical properties of columns with a numeric data type, i.e. int and float.
The method computes:
the number of non-missing values (count)
the mean
the standard deviation
the range of values (min and max)
the median (50%)
the 0.25 and 0.75 quartiles (25% and 75%)
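To see what each row of describe() stands for, the statistics can be computed one by one on a single column. A minimal sketch with a made-up Series; it shows that count only counts non-missing values and that std is the sample standard deviation (ddof=1):

```python
import numpy as np
import pandas as pd

# Made-up values, one of them missing
values = pd.Series([1.0, 2.0, 3.0, 4.0, np.nan])

summary = values.describe()

# The same statistics computed individually (NaN values are skipped)
manual = {
    "count": values.count(),          # number of non-missing values
    "mean": values.mean(),
    "std": values.std(),              # sample standard deviation (ddof=1)
    "min": values.min(),
    "25%": values.quantile(0.25),
    "50%": values.median(),
    "75%": values.quantile(0.75),
    "max": values.max(),
}
```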
data.describe()
| | Total Damage (USD, original) | CPI |
---|---|---|
count | 2032.00 | 5648.00 |
mean | 1639896721.42 | 75.90 |
std | 8550683641.43 | 11.06 |
min | 0.00 | 58.84 |
25% | 20000000.00 | 66.73 |
50% | 123000000.00 | 74.51 |
75% | 742750000.00 | 83.76 |
max | 210000000000.00 | 100.00 |
.unique() shows the distinct values of a column.
data['Year'].unique()
array(['#date +occurred', 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018,
2019, 2020, 2021, 2022, 2023], dtype=object)
Data Cleaning: Removing the First Row of the DataFrame#
data.index
RangeIndex(start=0, stop=5839, step=1)
data.drop(index=0)
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2000 | Afghanistan | AFG | Natural | Climatological | Drought | Drought | 1 | 2580000 | 37 | 50000.00 | 84975 | 58.84 |
2 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flash flood | 2 | 105 | 37 | NaN | NaN | 58.84 |
3 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flood (General) | 2 | 100 | 7 | NaN | NaN | 58.84 |
4 | 2000 | Algeria | DZA | Natural | Meteorological | Storm | Storm (General) | 1 | 10 | 4 | NaN | NaN | 58.84 |
5 | 2000 | Angola | AGO | Natural | Hydrological | Flood | Flood (General) | 3 | 9011 | 15 | NaN | NaN | 58.84 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5834 | 2023 | Viet Nam | VNM | Natural | Meteorological | Storm | Tropical cyclone | 1 | 3 | 1 | NaN | NaN | NaN |
5835 | 2023 | Yemen | YEM | Natural | Hydrological | Flood | Flood (General) | 1 | 169035 | 39 | NaN | NaN | NaN |
5836 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flash flood | 1 | 154608 | NaN | NaN | NaN | NaN |
5837 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flood (General) | 1 | 22000 | NaN | NaN | NaN | NaN |
5838 | 2023 | Zimbabwe | ZWE | Natural | Meteorological | Storm | Tropical cyclone | 1 | NaN | 2 | NaN | NaN | NaN |
5838 rows × 13 columns
data
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | #date +occurred | #country +name | #country +code | #cause +group | #cause +subgroup | #cause +type | #cause +subtype | #frequency | #affected +ind | #affected +ind +killed | NaN | #value +usd | NaN |
1 | 2000 | Afghanistan | AFG | Natural | Climatological | Drought | Drought | 1 | 2580000 | 37 | 50000.00 | 84975 | 58.84 |
2 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flash flood | 2 | 105 | 37 | NaN | NaN | 58.84 |
3 | 2000 | Algeria | DZA | Natural | Hydrological | Flood | Flood (General) | 2 | 100 | 7 | NaN | NaN | 58.84 |
4 | 2000 | Algeria | DZA | Natural | Meteorological | Storm | Storm (General) | 1 | 10 | 4 | NaN | NaN | 58.84 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5834 | 2023 | Viet Nam | VNM | Natural | Meteorological | Storm | Tropical cyclone | 1 | 3 | 1 | NaN | NaN | NaN |
5835 | 2023 | Yemen | YEM | Natural | Hydrological | Flood | Flood (General) | 1 | 169035 | 39 | NaN | NaN | NaN |
5836 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flash flood | 1 | 154608 | NaN | NaN | NaN | NaN |
5837 | 2023 | Zambia | ZMB | Natural | Hydrological | Flood | Flood (General) | 1 | 22000 | NaN | NaN | NaN | NaN |
5838 | 2023 | Zimbabwe | ZWE | Natural | Meteorological | Storm | Tropical cyclone | 1 | NaN | 2 | NaN | NaN | NaN |
5839 rows × 13 columns
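The two outputs show that drop() does not change data itself: it returns a new DataFrame with 5838 rows, while data still has 5839. A minimal sketch of this behavior on a small made-up DataFrame:

```python
import pandas as pd

# Made-up column with a tag row at index 0, as in the EM-DAT file
df = pd.DataFrame({"Year": ["#date +occurred", 2000, 2001]})

# drop() returns a new DataFrame; df itself stays unchanged
dropped = df.drop(index=0)

rows_before = len(df)        # still 3 rows
rows_after = len(dropped)    # 2 rows
remaining_index = dropped.index.tolist()  # the original labels 1 and 2 are kept
```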
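# drop() returns a new DataFrame; assign the result to keep the change
data = data.drop(index=0)
# Alternative: modify data in place
# data.drop(index=0, inplace=True)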
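Instead of removing the tag row after loading, it can also be skipped while reading. A minimal sketch, assuming a made-up CSV excerpt with the same layout as the EM-DAT file (header row, then the row of #-tags, then the data): skiprows=[1] skips line 1 of the file, i.e. the tag row directly after the header. pd.read_excel accepts skiprows as well, so for the Excel file this would presumably be pd.read_excel(..., skiprows=[1]) (untested here). Because the tag row is never read, numeric columns get a numeric dtype right away; note that the index then starts at 0.

```python
import io

import pandas as pd

# Made-up excerpt with the same layout as the EM-DAT file:
# line 0 = column names, line 1 = tags, then the data
csv_text = """Year,Country,Total Events
#date +occurred,#country +name,#frequency
2000,Afghanistan,1
2000,Algeria,2
"""

# skiprows=[1] skips line 1 of the file (the tag row); line 0 stays the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
```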
Querying and Converting Data Types#
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5838 entries, 1 to 5838
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 5838 non-null object
1 Country 5838 non-null object
2 ISO 5838 non-null object
3 Disaster Group 5838 non-null object
4 Disaster Subroup 5838 non-null object
5 Disaster Type 5838 non-null object
6 Disaster Subtype 5838 non-null object
7 Total Events 5838 non-null object
8 Total Affected 4692 non-null object
9 Total Deaths 4118 non-null object
10 Total Damage (USD, original) 2032 non-null float64
11 Total Damage (USD, adjusted) 1999 non-null object
12 CPI 5648 non-null float64
dtypes: float64(2), object(11)
memory usage: 593.1+ KB
# Query the data type via the dtype attribute
data['Year'].dtype
dtype('O')
# Convert the data type
data["Year"] = pd.to_numeric(data["Year"])
data['Year'].dtype
dtype('int64')
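pd.to_numeric() only succeeds here because the tag row was removed beforehand: a value like '#date +occurred' cannot be parsed and raises a ValueError. A minimal sketch on a made-up Series: with errors="coerce", values that cannot be converted become NaN instead. This is handy for finding problem values, but it silently discards them, so it should be used deliberately.

```python
import pandas as pd

# Made-up column that still contains the tag value
raw = pd.Series(["#date +occurred", 2000, 2001], dtype=object)

# Default errors="raise": the tag value cannot be parsed
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True

# errors="coerce": values that cannot be converted become NaN
converted = pd.to_numeric(raw, errors="coerce")

# Which entries could not be converted?
problem_values = raw[converted.isna()]
```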
# Apply to all columns that should be integer or float
cols = ['Total Events', 'Total Affected', 'Total Deaths', 'Total Damage (USD, adjusted)']
for col in cols:
    data[col] = pd.to_numeric(data[col])
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5838 entries, 1 to 5838
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Year 5838 non-null int64
1 Country 5838 non-null object
2 ISO 5838 non-null object
3 Disaster Group 5838 non-null object
4 Disaster Subroup 5838 non-null object
5 Disaster Type 5838 non-null object
6 Disaster Subtype 5838 non-null object
7 Total Events 5838 non-null int64
8 Total Affected 4692 non-null float64
9 Total Deaths 4118 non-null float64
10 Total Damage (USD, original) 2032 non-null float64
11 Total Damage (USD, adjusted) 1999 non-null float64
12 CPI 5648 non-null float64
dtypes: float64(5), int64(2), object(6)
memory usage: 593.1+ KB
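The for loop can also be written without an explicit loop: DataFrame.apply() passes each selected column to pd.to_numeric. A minimal sketch on a small made-up DataFrame. It also shows why Total Affected and Total Deaths end up as float64 rather than int64: NaN is a float value, so a column with missing values is stored as float.

```python
import pandas as pd

# Made-up columns stored as text (object dtype), one value missing
df = pd.DataFrame(
    {
        "Total Events": ["1", "2", "1"],
        "Total Deaths": ["37", None, "4"],
    },
    dtype=object,
)

cols = ["Total Events", "Total Deaths"]

# Convert all selected columns at once; apply() calls pd.to_numeric per column
df[cols] = df[cols].apply(pd.to_numeric)

# "Total Events" has no missing values -> int64
# "Total Deaths" contains a missing value (NaN) -> float64
dtypes = df.dtypes
```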
Overview of the Numerical Data#
data.describe()
| | Year | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|
count | 5838.00 | 5838.00 | 4692.00 | 4118.00 | 2032.00 | 1999.00 | 5648.00 |
mean | 2011.19 | 1.52 | 962499.69 | 343.87 | 1639896721.42 | 2067216253.87 | 75.90 |
std | 7.06 | 1.28 | 8601461.99 | 5402.59 | 8550683641.43 | 11078428355.46 | 11.06 |
min | 2000.00 | 1.00 | 1.00 | 1.00 | 0.00 | 0.00 | 58.84 |
25% | 2005.00 | 1.00 | 1000.00 | 4.00 | 20000000.00 | 24791922.00 | 66.73 |
50% | 2011.00 | 1.00 | 10000.00 | 14.00 | 123000000.00 | 162699012.00 | 74.51 |
75% | 2017.00 | 2.00 | 100000.00 | 47.75 | 742750000.00 | 992183337.00 | 83.76 |
max | 2023.00 | 17.00 | 330000000.00 | 222570.00 | 210000000000.00 | 273218372541.00 | 100.00 |
Overview of the Object Data#
# Distinct countries
countries = data['Country'].unique()
countries
array(['Afghanistan', 'Algeria', 'Angola', 'Argentina', 'Armenia',
'Australia', 'Austria', 'Azerbaijan', 'Bangladesh', 'Belarus',
'Belize', 'Bhutan', 'Bolivia (Plurinational State of)',
'Bosnia and Herzegovina', 'Botswana', 'Brazil', 'Bulgaria',
'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Chile', 'China',
'Colombia', 'Costa Rica', 'Croatia', 'Cuba', 'Cyprus', 'Czechia',
"Democratic People's Republic of Korea", 'Ecuador', 'Egypt',
'El Salvador', 'Eswatini', 'Ethiopia', 'Fiji', 'France',
'French Guiana', 'Georgia', 'Greece', 'Guatemala', 'Guinea',
'Haiti', 'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia',
'Iran (Islamic Republic of)', 'Ireland', 'Israel', 'Italy',
'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kyrgyzstan',
"Lao People's Democratic Republic", 'Madagascar', 'Malawi',
'Malaysia', 'Mali', 'Mexico', 'Mongolia', 'Morocco', 'Mozambique',
'Namibia', 'Nepal', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
'North Macedonia', 'Norway', 'Pakistan', 'Panama',
'Papua New Guinea', 'Paraguay', 'Peru', 'Philippines', 'Poland',
'Portugal', 'Republic of Korea', 'Republic of Moldova', 'Romania',
'Russian Federation', 'Réunion', 'Senegal', 'Serbia Montenegro',
'Slovakia', 'Somalia', 'South Africa', 'Spain', 'Sri Lanka',
'Sudan', 'Switzerland', 'Taiwan (Province of China)', 'Tajikistan',
'Thailand', 'Turkmenistan', 'Türkiye', 'Uganda', 'Ukraine',
'United Kingdom of Great Britain and Northern Ireland',
'United Republic of Tanzania', 'United States of America',
'Uruguay', 'Uzbekistan', 'Venezuela (Bolivarian Republic of)',
'Viet Nam', 'Zambia', 'Zimbabwe', 'Bahamas', 'Burkina Faso',
'Canary Islands', 'Cayman Islands', 'Central African Republic',
'Chad', 'Cook Islands', 'Democratic Republic of the Congo',
'Djibouti', 'Dominican Republic', 'Gambia', 'Germany', 'Ghana',
'Kenya', 'Latvia', 'Lesotho', 'Lithuania', 'Mauritania', 'Myanmar',
'Puerto Rico', 'Rwanda', 'Saint Helena', 'Samoa',
'Syrian Arab Republic', 'Timor-Leste', 'Tonga', 'Vanuatu', 'Yemen',
'Albania', 'Barbados', 'Belgium', 'Cabo Verde', 'Congo', 'Denmark',
'Grenada', 'Guam', 'Guinea-Bissau', 'Lebanon', 'Mauritius',
'Micronesia (Federated States of)', 'Netherlands (Kingdom of the)',
'Northern Mariana Islands', 'Oman',
'Saint Vincent and the Grenadines', 'Saudi Arabia', 'Seychelles',
'Solomon Islands', 'Sweden', 'American Samoa', 'Bermuda',
'China, Hong Kong Special Administrative Region', 'Comoros',
'Eritrea', 'Luxembourg', 'New Caledonia', 'Slovenia', 'Tunisia',
'Dominica', 'Guadeloupe', 'Iraq', 'Maldives', 'Niue',
'Saint Lucia', 'Sierra Leone', 'Trinidad and Tobago',
'Turks and Caicos Islands', 'United States Virgin Islands',
'Estonia', 'Finland', 'Guyana', 'Tokelau', 'Montserrat',
'Suriname', 'Togo', 'Benin', 'Côte d’Ivoire', 'Liberia',
'Martinique', 'Montenegro', 'Serbia', 'Antigua and Barbuda',
'Kiribati', 'Marshall Islands', 'Saint Kitts and Nevis',
'South Sudan', 'State of Palestine', 'Gabon', 'French Polynesia',
'Tuvalu', 'Palau', 'Wallis and Futuna Islands', 'Libya',
'Anguilla', 'British Virgin Islands',
'China, Macao Special Administrative Region', 'Saint Barthélemy',
'Saint Martin (French Part)', 'Sint Maarten (Dutch part)',
'United Arab Emirates', 'Kuwait', 'Qatar', 'Isle of Man',
'Sao Tome and Principe', 'Malta'], dtype=object)
len(countries)
216
# Check whether a country occurs in the array
'Germany' in countries
True
# Case-insensitive search for country names containing "german"
for country in countries:
    if 'german' in country.lower():
        print(country)
Germany
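The same case-insensitive search works without a loop using the string methods of a Series. A minimal sketch, assuming the country names are in a Series (in the notebook, e.g. pd.Series(countries) or data['Country']); the names below are a made-up selection:

```python
import pandas as pd

# Made-up selection of country names
countries = pd.Series(["France", "Germany", "Ghana", "Greece"])

# str.contains() checks each entry for the substring; case=False ignores case
matches = countries[countries.str.contains("german", case=False)]

# isin() checks for exact matches against a list of values
selected = countries[countries.isin(["Germany", "France"])]
```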
data['Disaster Group'].unique()
array(['Natural'], dtype=object)
.value_counts() shows how often each distinct value occurs in a column.
data['Disaster Subroup'].value_counts()
Disaster Subroup
Hydrological 2731
Meteorological 1956
Climatological 616
Geophysical 504
Biological 30
Extra-terrestrial 1
Name: count, dtype: int64
With the argument normalize=True, the counts are returned as proportions of the total (relative frequencies).
data['Disaster Subroup'].value_counts(normalize=True)
Disaster Subroup
Hydrological 0.47
Meteorological 0.34
Climatological 0.11
Geophysical 0.09
Biological 0.01
Extra-terrestrial 0.00
Name: proportion, dtype: float64
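normalize=True is equivalent to dividing the counts by their sum. A minimal sketch with made-up values. Note that value_counts() counts rows; since a row in this dataset can aggregate several events (see Total Events), the shares refer to rows, not to individual disasters.

```python
import pandas as pd

# Made-up subgroup values
subgroups = pd.Series(["Hydrological", "Hydrological", "Meteorological", "Geophysical"])

counts = subgroups.value_counts()
shares = subgroups.value_counts(normalize=True)

# The same shares computed by hand: counts divided by the total number of values
manual_shares = counts / counts.sum()
```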
data['Disaster Type'].value_counts()
Disaster Type
Flood 2400
Storm 1510
Extreme temperature 446
Earthquake 387
Drought 384
Mass movement (wet) 331
Wildfire 229
Volcanic activity 104
Infestation 29
Mass movement (dry) 13
Glacial lake outburst flood 3
Impact 1
Animal incident 1
Name: count, dtype: int64
data['Disaster Type'].value_counts(normalize=True)
Disaster Type
Flood 0.41
Storm 0.26
Extreme temperature 0.08
Earthquake 0.07
Drought 0.07
Mass movement (wet) 0.06
Wildfire 0.04
Volcanic activity 0.02
Infestation 0.00
Mass movement (dry) 0.00
Glacial lake outburst flood 0.00
Impact 0.00
Animal incident 0.00
Name: proportion, dtype: float64
Sorting DataFrames#
DataFrames can be sorted by one or more columns.
data.sort_values(by="Total Affected")
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2574 | 2009 | Viet Nam | VNM | Natural | Hydrological | Mass movement (wet) | Landslide (wet) | 1 | 1.00 | 13.00 | NaN | NaN | 73.31 |
5056 | 2020 | Taiwan (Province of China) | TWN | Natural | Meteorological | Storm | Tropical cyclone | 1 | 1.00 | 1.00 | NaN | NaN | 88.44 |
1505 | 2005 | Netherlands (Kingdom of the) | NLD | Natural | Meteorological | Storm | Extra-tropical storm | 1 | 1.00 | NaN | NaN | NaN | 66.73 |
1876 | 2007 | Barbados | BRB | Natural | Geophysical | Earthquake | Ground movement | 1 | 1.00 | NaN | NaN | NaN | 70.85 |
147 | 2000 | Mexico | MEX | Natural | Geophysical | Earthquake | Ground movement | 1 | 1.00 | NaN | NaN | NaN | 58.84 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5815 | 2023 | Tajikistan | TJK | Natural | Hydrological | Flood | Flood (General) | 1 | NaN | 21.00 | NaN | NaN | NaN |
5819 | 2023 | Türkiye | TUR | Natural | Hydrological | Flood | Flood (General) | 1 | NaN | 19.00 | 25000000.00 | NaN | NaN |
5825 | 2023 | United States of America | USA | Natural | Hydrological | Flood | Flood (General) | 1 | NaN | NaN | NaN | NaN | NaN |
5826 | 2023 | United States of America | USA | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | 14.00 | NaN | NaN | NaN |
5838 | 2023 | Zimbabwe | ZWE | Natural | Meteorological | Storm | Tropical cyclone | 1 | NaN | 2.00 | NaN | NaN | NaN |
5838 rows × 13 columns
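In the output, the rows with NaN in Total Affected appear at the end. That is the default of sort_values() (na_position="last"), regardless of the sort direction. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

# Made-up values, one of them missing
df = pd.DataFrame({"Total Affected": [100.0, np.nan, 5.0]})

# Default: missing values are placed last, for ascending and descending order
ascending = df.sort_values(by="Total Affected")["Total Affected"].tolist()
descending = df.sort_values(by="Total Affected", ascending=False)["Total Affected"].tolist()

# na_position="first" moves the missing values to the top
nan_first = df.sort_values(by="Total Affected", na_position="first")["Total Affected"].tolist()
```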
# The 10 entries with the most people affected
data.sort_values(by="Total Affected", ascending=False).head(n=10)
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3778 | 2015 | India | IND | Natural | Climatological | Drought | Drought | 1 | 330000000.00 | NaN | 3000000000.00 | 3704226000.00 | 80.99 |
658 | 2002 | India | IND | Natural | Climatological | Drought | Drought | 1 | 300000000.00 | NaN | 910722000.00 | 1481735695.00 | 61.46 |
892 | 2003 | China | CHN | Natural | Hydrological | Flood | Riverine flood | 6 | 155924986.00 | 662.00 | 15329640000.00 | 24387552800.00 | 62.86 |
2626 | 2010 | China | CHN | Natural | Hydrological | Flood | Riverine flood | 5 | 140194136.00 | 1911.00 | 18171000000.00 | 24387512516.00 | 74.51 |
1911 | 2007 | China | CHN | Natural | Hydrological | Flood | Riverine flood | 9 | 108793242.00 | 967.00 | 4919155000.00 | 6943174065.00 | 70.85 |
599 | 2002 | China | CHN | Natural | Meteorological | Storm | Sand/Dust storm | 1 | 100000000.00 | NaN | NaN | NaN | 61.46 |
2885 | 2011 | China | CHN | Natural | Hydrological | Flood | Riverine flood | 5 | 93360000.00 | 628.00 | 10704130000.00 | 13926499896.00 | 76.86 |
4126 | 2016 | United States of America | USA | Natural | Meteorological | Storm | Blizzard/Winter storm | 4 | 85000057.00 | 90.00 | 2125000000.00 | 2591136966.00 | 82.01 |
593 | 2002 | China | CHN | Natural | Hydrological | Flood | Flash flood | 1 | 80035257.00 | 793.00 | 3100000000.00 | 5043669370.00 | 61.46 |
2170 | 2008 | China | CHN | Natural | Meteorological | Extreme temperature | Severe winter conditions | 2 | 77000000.00 | 145.00 | 21100000000.00 | 28680657590.00 | 73.57 |
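For the "top n" case there is also the shortcut DataFrame.nlargest(), which returns the n rows with the largest values in a column and ignores NaN values. A minimal sketch on made-up values (the country labels are placeholders); in the notebook it would be data.nlargest(10, 'Total Affected'):

```python
import numpy as np
import pandas as pd

# Made-up values with placeholder country labels
df = pd.DataFrame(
    {
        "Country": ["A", "B", "C", "D"],
        "Total Affected": [300.0, np.nan, 155.0, 10.0],
    }
)

# Here this gives the same rows as
# df.sort_values(by="Total Affected", ascending=False).head(n=2)
top_two = df.nlargest(2, "Total Affected")
via_sort = df.sort_values(by="Total Affected", ascending=False).head(n=2)
```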
# Sorting by several columns is possible, each with its own direction
data.sort_values(by=["Disaster Type", "Total Affected"], ascending=[True, False]).head(n=10)
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
3592 | 2014 | Niger | NER | Natural | Biological | Animal incident | Animal incident | 1 | 5.00 | 12.00 | NaN | NaN | 80.89 |
3778 | 2015 | India | IND | Natural | Climatological | Drought | Drought | 1 | 330000000.00 | NaN | 3000000000.00 | 3704226000.00 | 80.99 |
658 | 2002 | India | IND | Natural | Climatological | Drought | Drought | 1 | 300000000.00 | NaN | 910722000.00 | 1481735695.00 | 61.46 |
590 | 2002 | China | CHN | Natural | Climatological | Drought | Drought | 3 | 64560000.00 | NaN | 1210000000.00 | 1968658044.00 | 61.46 |
2400 | 2009 | China | CHN | Natural | Climatological | Drought | Drought | 2 | 60160000.00 | NaN | 3600000000.00 | 4910842514.00 | 73.31 |
889 | 2003 | China | CHN | Natural | Climatological | Drought | Drought | 2 | 51000000.00 | NaN | NaN | NaN | 62.86 |
106 | 2000 | India | IND | Natural | Climatological | Drought | Drought | 1 | 50000000.00 | 20.00 | 588000000.00 | 999309177.00 | 58.84 |
2623 | 2010 | China | CHN | Natural | Climatological | Drought | Drought | 1 | 35000000.00 | NaN | 2370000000.00 | 3180804835.00 | 74.51 |
3500 | 2014 | China | CHN | Natural | Climatological | Drought | Drought | 2 | 27726000.00 | NaN | 3680000000.00 | 4549240500.00 | 80.89 |
3485 | 2014 | Brazil | BRA | Natural | Climatological | Drought | Drought | 1 | 27000000.00 | NaN | 5000000000.00 | 6181033288.00 | 80.89 |
Indexing and Retrieving Data#
The values of a column can be accessed with <dataframe>['<column_name>'].
data['Year']
1 2000
2 2000
3 2000
4 2000
5 2000
...
5834 2023
5835 2023
5836 2023
5837 2023
5838 2023
Name: Year, Length: 5838, dtype: int64
Further operations or methods can be applied to the result:
data['Year'] + 10
1 2010
2 2010
3 2010
4 2010
5 2010
...
5834 2033
5835 2033
5836 2033
5837 2033
5838 2033
Name: Year, Length: 5838, dtype: int64
data['Year'].mean()
2011.1887632750943
Several columns are selected by passing a list of column names:
data[['Year', 'Country', 'Disaster Type', 'Total Affected']]
| | Year | Country | Disaster Type | Total Affected |
---|---|---|---|---|
1 | 2000 | Afghanistan | Drought | 2580000.00 |
2 | 2000 | Algeria | Flood | 105.00 |
3 | 2000 | Algeria | Flood | 100.00 |
4 | 2000 | Algeria | Storm | 10.00 |
5 | 2000 | Angola | Flood | 9011.00 |
... | ... | ... | ... | ... |
5834 | 2023 | Viet Nam | Storm | 3.00 |
5835 | 2023 | Yemen | Flood | 169035.00 |
5836 | 2023 | Zambia | Flood | 154608.00 |
5837 | 2023 | Zambia | Flood | 22000.00 |
5838 | 2023 | Zimbabwe | Storm | NaN |
5838 rows × 4 columns
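The number of brackets matters: a single column name returns a Series, while a list of names (double brackets) returns a DataFrame, even if the list contains only one name. A minimal sketch with made-up values:

```python
import pandas as pd

# Made-up example table
df = pd.DataFrame({"Year": [2000, 2001], "Country": ["Afghanistan", "Algeria"]})

year_series = df["Year"]            # one column name -> Series
year_frame = df[["Year"]]           # list with one name -> DataFrame
subset = df[["Year", "Country"]]    # list with several names -> DataFrame
```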
Boolean Indexing#
Rows can also be filtered by passing a condition in the brackets: the condition yields a boolean Series, and only the rows where it is True are kept.
data[data['Country'] == 'Germany']
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
370 | 2001 | Germany | DEU | Natural | Meteorological | Storm | Lightning/Thunderstorms | 1 | NaN | 6.00 | 300000000.00 | 495838437.00 | 60.50 |
636 | 2002 | Germany | DEU | Natural | Hydrological | Flood | Flood (General) | 1 | 330108.00 | 27.00 | 11600000000.00 | 18873085384.00 | 61.46 |
637 | 2002 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | NaN | 11.00 | 1800000000.00 | 2928582215.00 | 61.46 |
638 | 2002 | Germany | DEU | Natural | Meteorological | Storm | Storm (General) | 2 | 19.00 | 11.00 | 250000000.00 | 406747530.00 | 61.46 |
929 | 2003 | Germany | DEU | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | 9355.00 | 1650000000.00 | 2624945016.00 | 62.86 |
930 | 2003 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | NaN | 5.00 | 300000000.00 | 477262730.00 | 62.86 |
931 | 2003 | Germany | DEU | Natural | Meteorological | Storm | Lightning/Thunderstorms | 1 | NaN | 10.00 | NaN | NaN | 62.86 |
1163 | 2004 | Germany | DEU | Natural | Geophysical | Earthquake | Ground movement | 1 | 150.00 | NaN | 12000000.00 | 18592738.00 | 64.54 |
1164 | 2004 | Germany | DEU | Natural | Meteorological | Storm | Storm (General) | 1 | NaN | 2.00 | 130000000.00 | 201421324.00 | 64.54 |
1434 | 2005 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 2 | 450.00 | 1.00 | 220000000.00 | 329681571.00 | 66.73 |
1435 | 2005 | Germany | DEU | Natural | Meteorological | Extreme temperature | Cold wave | 1 | 165.00 | 1.00 | 300000000.00 | 449565778.00 | 66.73 |
1436 | 2005 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | 2.00 | 2.00 | 270000000.00 | 404609200.00 | 66.73 |
1694 | 2006 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | 1000.00 | NaN | NaN | NaN | 68.88 |
1695 | 2006 | Germany | DEU | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | 2.00 | NaN | NaN | 68.88 |
1696 | 2006 | Germany | DEU | Natural | Meteorological | Extreme temperature | Severe winter conditions | 1 | NaN | 10.00 | NaN | NaN | 68.88 |
1697 | 2006 | Germany | DEU | Natural | Meteorological | Storm | Hail | 1 | 100.00 | 1.00 | NaN | NaN | 68.88 |
1698 | 2006 | Germany | DEU | Natural | Meteorological | Storm | Storm (General) | 2 | 200.00 | 10.00 | NaN | NaN | 68.88 |
1948 | 2007 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | NaN | 1.00 | NaN | NaN | 70.85 |
1949 | 2007 | Germany | DEU | Natural | Meteorological | Storm | Blizzard/Winter storm | 1 | NaN | 7.00 | NaN | NaN | 70.85 |
1950 | 2007 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | 130.00 | 11.00 | 5500000000.00 | 7763011606.00 | 70.85 |
2205 | 2008 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | NaN | 5.00 | 1200000000.00 | 1631127446.00 | 73.57 |
2206 | 2008 | Germany | DEU | Natural | Meteorological | Storm | Severe weather | 1 | NaN | 3.00 | 1500000000.00 | 2038909307.00 | 73.57 |
2435 | 2009 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | NaN | NaN | 20000000.00 | 27282458.00 | 73.31 |
2436 | 2009 | Germany | DEU | Natural | Meteorological | Extreme temperature | Cold wave | 2 | NaN | 15.00 | NaN | NaN | 73.31 |
2437 | 2009 | Germany | DEU | Natural | Meteorological | Storm | Lightning/Thunderstorms | 1 | NaN | 1.00 | 50000000.00 | 68206146.00 | 73.31 |
2668 | 2010 | Germany | DEU | Natural | Hydrological | Flood | Flash flood | 1 | NaN | 3.00 | NaN | NaN | 74.51 |
2669 | 2010 | Germany | DEU | Natural | Meteorological | Extreme temperature | Cold wave | 1 | NaN | 1.00 | NaN | NaN | 74.51 |
2670 | 2010 | Germany | DEU | Natural | Meteorological | Storm | Blizzard/Winter storm | 1 | NaN | NaN | NaN | NaN | 74.51 |
2671 | 2010 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | NaN | 4.00 | 1000000000.00 | 1342111745.00 | 74.51 |
2907 | 2011 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | NaN | 4.00 | NaN | NaN | 76.86 |
3128 | 2012 | Germany | DEU | Natural | Meteorological | Extreme temperature | Cold wave | 2 | NaN | 6.00 | NaN | NaN | 78.45 |
3334 | 2013 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | 6350.00 | 4.00 | 12900000000.00 | 16205763565.00 | 79.60 |
3335 | 2013 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 2 | 2.00 | 7.00 | NaN | NaN | 79.60 |
3336 | 2013 | Germany | DEU | Natural | Meteorological | Storm | Hail | 1 | NaN | NaN | 4800000000.00 | 6030051559.00 | 79.60 |
3528 | 2014 | Germany | DEU | Natural | Meteorological | Storm | Lightning/Thunderstorms | 2 | 1.00 | 8.00 | 400000000.00 | 494482663.00 | 80.89 |
3987 | 2016 | Germany | DEU | Natural | Hydrological | Flood | Flood (General) | 1 | NaN | 7.00 | 2000000000.00 | 2438717145.00 | 82.01 |
4219 | 2017 | Germany | DEU | Natural | Hydrological | Flood | Riverine flood | 1 | 600.00 | NaN | NaN | NaN | 83.76 |
4220 | 2017 | Germany | DEU | Natural | Meteorological | Storm | Hail | 1 | NaN | 2.00 | 740000000.00 | 883505559.00 | 83.76 |
4221 | 2017 | Germany | DEU | Natural | Meteorological | Storm | Severe weather | 1 | 24.00 | 3.00 | 159000000.00 | 189834303.00 | 83.76 |
4448 | 2018 | Germany | DEU | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | NaN | NaN | NaN | 85.80 |
4449 | 2018 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | 12.00 | 5.00 | 588475000.00 | 685844110.00 | 85.80 |
4694 | 2019 | Germany | DEU | Natural | Meteorological | Extreme temperature | Heat wave | 2 | NaN | 4.00 | NaN | NaN | 87.36 |
4695 | 2019 | Germany | DEU | Natural | Meteorological | Storm | Blizzard/Winter storm | 1 | NaN | 1.00 | NaN | NaN | 87.36 |
4934 | 2020 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 1 | 33.00 | NaN | NaN | NaN | 88.44 |
5194 | 2021 | Germany | DEU | Natural | Hydrological | Flood | Flood (General) | 1 | 1000.00 | 197.00 | 40000000000.00 | 43201119615.00 | 92.59 |
5195 | 2021 | Germany | DEU | Natural | Meteorological | Storm | Lightning/Thunderstorms | 1 | 600.00 | NaN | NaN | NaN | 92.59 |
5196 | 2021 | Germany | DEU | Natural | Meteorological | Storm | Storm (General) | 1 | 4.00 | 1.00 | NaN | NaN | 92.59 |
5461 | 2022 | Germany | DEU | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | 8173.00 | NaN | NaN | 100.00 |
5462 | 2022 | Germany | DEU | Natural | Meteorological | Storm | Extra-tropical storm | 3 | 2.00 | 7.00 | 1023156000.00 | 1023156000.00 | 100.00 |
5707 | 2023 | Germany | DEU | Natural | Meteorological | Storm | Severe weather | 1 | NaN | 1.00 | NaN | NaN | NaN |
data[data['Total Deaths'] >= 1000]
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
109 | 2000 | India | IND | Natural | Hydrological | Flood | Riverine flood | 2 | 46600000.00 | 1751.00 | 734500000.00 | 1248286718.00 | 58.84 |
358 | 2001 | El Salvador | SLV | Natural | Geophysical | Earthquake | Ground movement | 2 | 1590550.00 | 1159.00 | 1848500000.00 | 3055191171.00 | 60.50 |
386 | 2001 | India | IND | Natural | Geophysical | Earthquake | Ground movement | 1 | 6321812.00 | 20005.00 | 2623000000.00 | 4335280736.00 | 60.50 |
542 | 2002 | Afghanistan | AFG | Natural | Geophysical | Earthquake | Ground movement | 3 | 100891.00 | 1200.00 | NaN | NaN | 61.46 |
663 | 2002 | India | IND | Natural | Meteorological | Extreme temperature | Heat wave | 1 | NaN | 1030.00 | NaN | NaN | 61.46 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5692 | 2023 | Democratic Republic of the Congo | COD | Natural | Hydrological | Flood | Flash flood | 1 | 50000.00 | 2970.00 | 10000000.00 | NaN | NaN |
5745 | 2023 | Libya | LBY | Natural | Meteorological | Storm | Storm (General) | 1 | 1600000.00 | 12352.00 | NaN | NaN | NaN |
5755 | 2023 | Morocco | MAR | Natural | Geophysical | Earthquake | Ground movement | 1 | 1002476.00 | 2497.00 | NaN | NaN | NaN |
5811 | 2023 | Syrian Arab Republic | SYR | Natural | Geophysical | Earthquake | Ground movement | 3 | 4109320.00 | 4500.00 | 8900000000.00 | NaN | NaN |
5818 | 2023 | Türkiye | TUR | Natural | Geophysical | Earthquake | Ground movement | 3 | 9207698.00 | 50103.00 | 34000000000.00 | NaN | NaN |
90 rows × 13 columns
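The expression `data[data['Total Deaths'] >= 1000]` works in two steps. The comparison produces a boolean Series with one True/False value per row, and indexing the DataFrame with that Series keeps only the rows marked True. The following minimal sketch uses a small, invented DataFrame (the values are hypothetical, not from the dataset). It also shows one detail worth knowing: a comparison with NaN evaluates to False, so rows without a recorded death toll are silently dropped.

```python
import numpy as np
import pandas as pd

# Small hypothetical table; the numbers are invented for illustration only
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Total Deaths": [1751.0, np.nan, 20005.0, 31.0],
})

# Step 1: the comparison produces a boolean mask with one value per row
mask = df["Total Deaths"] >= 1000
print(mask.tolist())  # [True, False, True, False]; NaN >= 1000 is False

# Step 2: indexing with the mask keeps only the rows where it is True
deadly = df[mask]
print(deadly["Country"].tolist())  # ['A', 'C']

# Summing a boolean mask counts the True values, i.e. the matching rows
print(int(mask.sum()))  # 2
```

On the full dataset, `(data['Total Deaths'] >= 1000).sum()` returns the number of matching rows directly, i.e. the 90 rows shown above.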
How many people are affected on average per earthquake?
data[data['Disaster Type'] == 'Earthquake']
| | Year | Country | ISO | Disaster Group | Disaster Subroup | Disaster Type | Disaster Subtype | Total Events | Total Affected | Total Deaths | Total Damage (USD, original) | Total Damage (USD, adjusted) | CPI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
22 | 2000 | Azerbaijan | AZE | Natural | Geophysical | Earthquake | Ground movement | 1 | 3294.00 | 31.00 | 10000000.00 | 16995054.00 | 58.84 |
24 | 2000 | Bangladesh | BGD | Natural | Geophysical | Earthquake | Ground movement | 1 | 1000.00 | NaN | NaN | NaN | 58.84 |
56 | 2000 | China | CHN | Natural | Geophysical | Earthquake | Ground movement | 5 | 2105050.00 | 9.00 | 116983000.00 | 198813241.00 | 58.84 |
64 | 2000 | Colombia | COL | Natural | Geophysical | Earthquake | Ground movement | 1 | 430.00 | 2.00 | NaN | NaN | 58.84 |
95 | 2000 | Greece | GRC | Natural | Geophysical | Earthquake | Ground movement | 1 | 600.00 | NaN | NaN | NaN | 58.84 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
5775 | 2023 | Papua New Guinea | PNG | Natural | Geophysical | Earthquake | Ground movement | 1 | 16274.00 | 8.00 | NaN | NaN | NaN |
5777 | 2023 | Peru | PER | Natural | Geophysical | Earthquake | Ground movement | 1 | 141.00 | 1.00 | NaN | NaN | NaN |
5811 | 2023 | Syrian Arab Republic | SYR | Natural | Geophysical | Earthquake | Ground movement | 3 | 4109320.00 | 4500.00 | 8900000000.00 | NaN | NaN |
5814 | 2023 | Tajikistan | TJK | Natural | Geophysical | Earthquake | Ground movement | 1 | 2205.00 | NaN | NaN | NaN | NaN |
5818 | 2023 | Türkiye | TUR | Natural | Geophysical | Earthquake | Ground movement | 3 | 9207698.00 | 50103.00 | 34000000000.00 | NaN | NaN |
387 rows × 13 columns
data[data['Disaster Type'] == 'Earthquake']['Total Affected']
22 3294.00
24 1000.00
56 2105050.00
64 430.00
95 600.00
...
5775 16274.00
5777 141.00
5811 4109320.00
5814 2205.00
5818 9207698.00
Name: Total Affected, Length: 387, dtype: float64
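Selecting a single column returns a pandas `Series` instead of a DataFrame. It keeps the row labels of the filtered rows (22, 24, 56, … above), which is why the index is not consecutive. The `float64` dtype is expected, because a column that contains missing values (NaN) is stored as floating point. The sketch below uses hypothetical values and also shows that the same selection can be written in one step with `.loc[row_mask, column]`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; labels 22, 23, 24 stand in for labels left over from a larger table
df = pd.DataFrame(
    {
        "Disaster Type": ["Earthquake", "Flood", "Earthquake"],
        "Total Affected": [3294, np.nan, 1000],
    },
    index=[22, 23, 24],
)

is_quake = df["Disaster Type"] == "Earthquake"

# Chained selection as in the notebook: filter rows first, then pick the column
affected = df[is_quake]["Total Affected"]

# Same selection in one step with .loc[row_mask, column_name]
affected_loc = df.loc[is_quake, "Total Affected"]

print(type(affected).__name__)        # Series
print(affected.index.tolist())        # [22, 24], the original row labels
print(affected.dtype)                 # float64, because the column contains NaN
print(affected.equals(affected_loc))  # True
```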
data[data['Disaster Type'] == 'Earthquake']['Total Affected'].mean()
373040.7386666667
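Two details affect how this number should be read. First, `.mean()` skips NaN by default (`skipna=True`), so the average covers only the rows where `Total Affected` is recorded, not all 387 rows. Second, judging by the tables above, a row bundles the events of one disaster subtype for one country and year (see `Total Events`), so the result is an average per row rather than strictly per earthquake. The sketch below uses invented numbers. It assumes that dividing the affected people by the number of events, restricted to rows with a recorded value, is an acceptable approximation of a per-event average.

```python
import numpy as np
import pandas as pd

# Hypothetical earthquake rows (invented values)
quakes = pd.DataFrame({
    "Total Events": [1, 5, 1, 3],
    "Total Affected": [3000.0, np.nan, 600.0, 9000.0],
})

# Default: the NaN row is skipped, so this averages 3 values, not 4
per_row = quakes["Total Affected"].mean()
print(per_row)  # 4200.0

# With skipna=False a single missing value makes the whole result NaN
print(quakes["Total Affected"].mean(skipna=False))  # nan

# Approximate per-event average: keep rows with a recorded value,
# then divide the affected people by the number of events in those rows
known = quakes.dropna(subset=["Total Affected"])
per_event = known["Total Affected"].sum() / known["Total Events"].sum()
print(per_event)  # 2520.0
```

Applied to `data`, the same pattern would start from `data[data['Disaster Type'] == 'Earthquake']` instead of the invented `quakes` table.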
Further research questions#
How many natural disasters were there in Germany?
In which year were there the most natural disasters?
Which countries are most severely affected by natural disasters?
Which countries are affected by natural disasters but have comparatively few deaths?
Which natural disasters are the deadliest?
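One possible starting point for these questions is to aggregate with `groupby` per year, country or disaster type. This is a sketch on a small invented DataFrame that uses the same column names as `data`. Several choices in it are assumptions to check against the real dataset: summing `Total Events` instead of counting rows, and measuring "few deaths" as deaths relative to affected people.

```python
import numpy as np
import pandas as pd

# Invented example rows with the same column names as `data`
df = pd.DataFrame({
    "Year": [2021, 2021, 2022, 2022, 2022],
    "Country": ["Germany", "India", "Germany", "India", "Chile"],
    "Disaster Type": ["Flood", "Flood", "Storm", "Earthquake", "Earthquake"],
    "Total Events": [1, 2, 3, 1, 1],
    "Total Affected": [1000.0, 50000.0, 2.0, 8000.0, 20000.0],
    "Total Deaths": [197.0, 300.0, 7.0, np.nan, 5.0],
})

# 1. Disasters in Germany: sum the events, since one row can hold several
germany_events = df.loc[df["Country"] == "Germany", "Total Events"].sum()

# 2. Year with the most disasters
events_per_year = df.groupby("Year")["Total Events"].sum()
busiest_year = events_per_year.idxmax()

# 3. Countries with the most affected people (sum() skips NaN values)
affected_by_country = (
    df.groupby("Country")["Total Affected"].sum().sort_values(ascending=False)
)

# 4. Deaths per affected person and country (lower ratio = comparatively fewer deaths)
per_country = df.groupby("Country")[["Total Affected", "Total Deaths"]].sum()
death_ratio = (per_country["Total Deaths"] / per_country["Total Affected"]).sort_values()

# 5. Deadliest disaster types by total deaths
deaths_by_type = (
    df.groupby("Disaster Type")["Total Deaths"].sum().sort_values(ascending=False)
)

print(germany_events, busiest_year)
print(affected_by_country.head(3))
print(death_ratio)
print(deaths_by_type)
```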