Chapter 5 - the url template is outdated leading to 404: Not Found #50

Open
jakobkolb opened this issue Jul 15, 2016 · 12 comments

Comments

@jakobkolb

Apparently, the Canadian historical weather data site has changed its URL scheme, so the notebook's url_template now returns 404: Not Found.

@GillesMoyse

GillesMoyse commented Aug 2, 2016

Two things to fix in the notebook:

  • New value for url_template: url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
  • Remove header=True from the read_csv call, so that it reads: weather_mar2012 = pd.read_csv("data/eng-hourly-03012012-03312012.csv", skiprows=15, index_col='Date/Time', parse_dates=True, encoding='latin1')

Sent a PR.
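
For anyone following along, here is a minimal sketch of how the new template expands into one URL per month. The helper name month_urls is hypothetical (it is not in the notebook), and the sketch only builds the strings; it does not check that the server still answers at these addresses.

```python
# Hourly-data URL template quoted in the comment above (station 5415).
url_template = (
    "http://climate.weather.gc.ca/climate_data/bulk_data_e.html"
    "?stationID=5415&Year={year}&Month={month}"
    "&format=csv&timeframe=1&submit=%20Download+Data"
)


def month_urls(year):
    """Return the 12 monthly download URLs for one year (hypothetical helper)."""
    return [url_template.format(year=year, month=month) for month in range(1, 13)]


urls = month_urls(2012)
print(urls[2])  # the March 2012 URL
```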

@andreas-h

Also, encoding='latin1' should be removed (at least on Python 3).

@hsuanie

hsuanie commented Sep 15, 2017

Hello. I tried the updated code, but I got the following error:
File b'data/eng-hourly-03012012-03312012.csv' does not exist

Could someone please help? Thanks!

@Enkerli

Enkerli commented Jul 9, 2018

At this point (July 2018), the following works in Python 3:
In[]: url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"

and:
In[]: url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)

An important change, apart from the URL itself, is that header accepts an integer (row number) instead of a boolean.
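
To illustrate the header point without hitting the network, here is a small sketch on made-up data (the station lines and values are invented, and only two metadata rows are skipped instead of 15). The exception type caught for a boolean header is an assumption about recent pandas versions.

```python
import io

import pandas as pd

# Made-up sample: two metadata lines, then a header row, then data.
sample = (
    "Station Name,EXAMPLE\n"
    "Climate ID,0000000\n"
    "Date/Time,Temp (C)\n"
    "2012-03-01 00:00,-5.5\n"
    "2012-03-01 01:00,-5.7\n"
)

# skiprows drops the metadata lines; header=0 says the first remaining row holds the column names.
df = pd.read_csv(io.StringIO(sample), skiprows=2, header=0,
                 index_col="Date/Time", parse_dates=True)
print(df)

# A boolean header is expected to be rejected; the exception type is an assumption.
try:
    pd.read_csv(io.StringIO(sample), skiprows=2, header=True)
    bool_header_rejected = False
except (TypeError, ValueError):
    bool_header_rejected = True
print(bool_header_rejected)
```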

Because of the encoding change, we need to change this, as well:
In[]: weather_mar2012[u"Temp (°C)"].plot(figsize=(15, 5))

Also, the “Data Quality” column disappeared. This requires tweaks while working with columns.

In[]: weather_mar2012.columns = [ u'Year', u'Month', u'Day', u'Time', u'Temp (C)', u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag', u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag', u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag', u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill', u'Wind Chill Flag', u'Weather']
In[]: weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)

In[]:

def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=0)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
    return weather_data

@mvresh

mvresh commented Apr 18, 2019


When I use the URL template and the weather data to compare temperatures with the bike data, the code does not seem to work. I modified the URL template and made the required changes in the later parts, and everything runs without errors. But when I try to output the first few rows of the data, it shows nothing.

@mvresh

mvresh commented Apr 18, 2019

Here's the code:

# getting weather data to look at temps

def get_weather_data(year):
    url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"

    # airport station is 5415, hence that was used

    data_by_month = []

    for month in range(1, 13):
        url = url_template.format(year=year, month=month)
        weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, encoding='utf-8', header=0)
        weather_data.columns = map(lambda x: x.replace('\xb0', ''), weather_data.columns)

        # \xb0 is the degree symbol

        weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time'], axis=1)
        data_by_month.append(weather_data.dropna())

    return pd.concat(data_by_month).dropna(axis=1, how='all').dropna()

weather_data = get_weather_data(2012)

weather_data[:5]
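
One possible explanation for the empty output, not verified against the live data: the code calls dropna() row-wise on each month, and if any column in the download is empty for every row, every row is dropped. The sketch below reproduces that effect on made-up data; the column names and values are only illustrative.

```python
import numpy as np
import pandas as pd

# Made-up month of data: one usable column and one column that is entirely empty.
month = pd.DataFrame({
    "Temp (C)": [-5.5, -5.7, -5.4],
    "Wind Chill Flag": [np.nan, np.nan, np.nan],
})

# Row-wise dropna() removes every row, because each row has a NaN in the empty column.
rows_first = month.dropna()

# Dropping the all-empty columns first keeps the data.
cols_first = month.dropna(axis=1, how="all").dropna()

print(len(rows_first), len(cols_first))
```

If that is what happens here, dropping empty columns before the row-wise dropna() inside the loop may be worth trying.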

@kbridge

kbridge commented Jun 3, 2022

url_template = "http://climate.weather.gc.ca/climate_data/bulk_data_e.html?stationID=5415&Year={year}&Month={month}&format=csv&timeframe=1&submit=%20Download+Data"
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'
url = url_template.format(month=3, year=2012)
weather_mar2012 = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')

Summary:

  • url_template is the same as the one @GillesMoyse posted. That URL is slow to load, so you can switch to my mirror on GitHub (the commented-out url_template).
  • header=True is removed.
  • skiprows=15 is removed because there is no metadata before the CSV data anymore.
  • index_col is changed from 'Date/Time' to 'Date/Time (LST)'.
  • encoding is changed from 'latin1' to 'utf-8-sig'. We need the -sig variant to skip the UTF-8 BOM; otherwise, the first column will contain weird characters.
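
The BOM point can be seen with the standard library alone. This is a small sketch on a made-up header line, not the actual file contents:

```python
import codecs

# Made-up first line of a CSV file saved with a UTF-8 byte order mark (BOM).
raw = codecs.BOM_UTF8 + b'"Date/Time (LST)","Temp (\xc2\xb0C)"\n'

plain = raw.decode("utf-8")      # the BOM survives as U+FEFF at the start
clean = raw.decode("utf-8-sig")  # the BOM is stripped

print(repr(plain[:3]))
print(repr(clean[:3]))
```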

@kbridge

kbridge commented Jun 5, 2022

Before renaming the columns to eliminate ° characters, drop some unexpected new columns first:

weather_mar2012 = weather_mar2012.drop(['Longitude (x)', 'Latitude (y)', 'Station Name', 'Climate ID', 'Precip. Amount (mm)', 'Precip. Amount Flag'], axis=1)

And the renaming code becomes

weather_mar2012.columns = [
    u'Year', u'Month', u'Day', u'Time', u'Temp (C)', 
    u'Temp Flag', u'Dew Point Temp (C)', u'Dew Point Temp Flag', 
    u'Rel Hum (%)', u'Rel Hum Flag', u'Wind Dir (10s deg)', u'Wind Dir Flag', 
    u'Wind Spd (km/h)', u'Wind Spd Flag', u'Visibility (km)', u'Visibility Flag',
    u'Stn Press (kPa)', u'Stn Press Flag', u'Hmdx', u'Hmdx Flag', u'Wind Chill', 
    u'Wind Chill Flag', u'Weather']

The Data Quality column is removed from the list because the new data no longer contains it.

This also renames the column Time (LST) to Time.
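
Since the column layout has changed more than once, a rule-based variant may be less brittle than assigning a fixed list of names. The following is only a sketch on a made-up frame, using a few of the column names mentioned in this thread; it is not what the notebook does.

```python
import pandas as pd

# Made-up frame using a few of the column names mentioned in this thread.
df = pd.DataFrame({
    "Longitude (x)": [-73.75],
    "Temp (\xb0C)": [-5.5],
    "Dew Point Temp (\xb0C)": [-9.7],
    "Time (LST)": ["00:00"],
})

# errors="ignore" skips names that are absent, so the same list works across file layouts.
unwanted = ["Longitude (x)", "Latitude (y)", "Station Name", "Climate ID", "Data Quality"]
df = df.drop(columns=unwanted, errors="ignore")

# Rename by rule rather than by position: strip the degree sign and the " (LST)" suffix.
df = df.rename(columns=lambda name: name.replace("\xb0", "").replace(" (LST)", ""))

print(list(df.columns))
```

The trade-off is that a fixed list fails loudly when the layout changes, while this version silently adapts.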

@kbridge

kbridge commented Jun 5, 2022

No need to drop the column Data Quality anymore:

-weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time', 'Data Quality'], axis=1)
+weather_mar2012 = weather_mar2012.drop(['Year', 'Month', 'Day', 'Time'], axis=1)

@kbridge

kbridge commented Jun 5, 2022

temperatures.head is a method now, so you should call it with parentheses:

-print(temperatures.head)
+print(temperatures.head())

@kbridge

kbridge commented Jun 5, 2022

Change download_weather_month to this:

# mirror
# url_template = 'https://raw.githubusercontent.com/kbridge/weather-data/main/weather_data_{year}_{month}.csv'

def download_weather_month(year, month):
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, index_col='Date/Time (LST)', parse_dates=True, encoding='utf-8-sig')
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop([
        'Year',
        'Day',
        'Month',
        'Time (LST)',
        'Longitude (x)',
        'Latitude (y)',
        'Station Name',
        'Climate ID',
    ], axis=1)
    return weather_data

which was

def download_weather_month(year, month):
    if month == 1:
        year += 1
    url = url_template.format(year=year, month=month)
    weather_data = pd.read_csv(url, skiprows=15, index_col='Date/Time', parse_dates=True, header=True)
    weather_data = weather_data.dropna(axis=1)
    weather_data.columns = [col.replace('\xb0', '') for col in weather_data.columns]
    weather_data = weather_data.drop(['Year', 'Day', 'Month', 'Time', 'Data Quality'], axis=1)
    return weather_data

@kbridge

kbridge commented Jun 5, 2022

Sorry for using this issue as if it were my own memo, but I will be glad if my comments help you.
