Data Analysis with pandas#

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Import and Inspect Data#

Import a csv file with the function pd.read_csv. The file should be saved to the same directory as the current notebook:

df = pd.read_csv('wind.csv')

The data is imported as a pandas dataframe object which we think of as spreadsheet:

type(df)
pandas.core.frame.DataFrame

View the first 5 rows of the dataframe:

df.head()
day month year dayofyear avg_wind_speed
0 13 4 2023 103 13.0
1 12 4 2023 102 13.0
2 11 4 2023 101 17.5
3 10 4 2023 100 11.5
4 9 4 2023 99 19.5

View the last 5 rows of the dataframe:

df.tail()
day month year dayofyear avg_wind_speed
9995 1 12 1995 335 19.5
9996 30 11 1995 334 19.5
9997 29 11 1995 333 25.0
9998 28 11 1995 332 21.5
9999 27 11 1995 331 17.5

View a summary of the dataframe:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   day             10000 non-null  int64  
 1   month           10000 non-null  int64  
 2   year            10000 non-null  int64  
 3   dayofyear       10000 non-null  int64  
 4   avg_wind_speed  10000 non-null  float64
dtypes: float64(1), int64(4)
memory usage: 390.8 KB

Select Rows and Columns#

Create New Columns#

Aggregate Data#

df[['month','day','avg_wind_speed']].groupby(['month','day']).mean().unstack()
avg_wind_speed
day 1 2 3 4 5 6 7 8 9 10 ... 22 23 24 25 26 27 28 29 30 31
month
1 14.625000 16.339286 16.714286 15.267857 15.267857 14.446429 16.642857 15.625000 15.910714 14.125000 ... 13.803571 14.160714 13.178571 11.964286 13.214286 12.071429 12.553571 15.142857 14.178571 13.892857
2 13.642857 12.517857 12.964286 14.553571 17.410714 13.732143 14.000000 13.821429 10.660714 12.303571 ... 15.285714 15.410714 13.482143 14.285714 14.500000 12.803571 14.500000 15.357143 NaN NaN
3 15.142857 15.982143 14.785714 15.607143 15.357143 13.392857 13.392857 14.428571 15.303571 15.267857 ... 15.625000 16.267857 12.857143 14.678571 14.410714 16.125000 18.285714 16.303571 14.482143 16.214286
4 15.250000 16.160714 15.589286 13.482143 15.142857 13.017857 14.821429 15.821429 14.321429 14.517857 ... 16.037037 14.962963 13.574074 14.370370 15.259259 15.833333 12.796296 14.388889 14.129630 NaN
5 14.962963 14.962963 14.018519 13.796296 14.351852 13.777778 14.370370 14.648148 12.592593 11.518519 ... 13.185185 13.574074 13.833333 13.648148 13.296296 13.518519 14.277778 13.629630 13.740741 11.092593
6 14.018519 13.444444 12.777778 13.907407 14.296296 11.888889 13.648148 13.759259 14.388889 12.574074 ... 14.092593 13.388889 12.425926 12.074074 14.555556 13.555556 13.537037 13.111111 12.462963 NaN
7 15.611111 14.685185 13.500000 12.777778 13.462963 12.666667 14.777778 12.000000 12.166667 13.814815 ... 14.555556 14.462963 13.185185 12.722222 12.611111 12.444444 13.148148 12.407407 12.962963 12.555556
8 15.000000 13.222222 11.907407 12.629630 13.407407 13.037037 12.777778 13.333333 11.351852 12.518519 ... 12.407407 11.111111 12.166667 12.055556 12.814815 13.333333 11.703704 13.981481 13.555556 12.981481
9 12.462963 11.574074 11.777778 12.481481 11.500000 12.444444 12.833333 14.685185 14.055556 13.129630 ... 12.592593 12.388889 12.703704 13.407407 13.685185 13.166667 13.481481 14.148148 14.944444 NaN
10 12.796296 13.314815 12.574074 12.907407 11.277778 11.629630 13.277778 14.314815 13.629630 15.425926 ... 12.629630 13.500000 12.703704 15.851852 13.185185 13.611111 14.722222 14.388889 14.851852 14.222222
11 11.851852 12.870370 13.611111 15.703704 13.370370 15.870370 13.592593 13.333333 14.129630 12.518519 ... 13.962963 14.962963 15.555556 15.203704 14.611111 16.303571 14.500000 15.535714 15.553571 NaN
12 15.625000 14.357143 12.589286 13.839286 12.017857 12.285714 13.428571 13.392857 14.107143 14.160714 ... 13.053571 13.160714 13.625000 13.375000 14.678571 14.982143 12.303571 14.160714 13.517857 12.357143

12 rows × 31 columns