docs: Update IOC notebook
pmav99 committed Jun 26, 2024
1 parent 1497b04 commit 78addc8
Showing 5 changed files with 917 additions and 102 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -27,7 +27,7 @@ clean_notebooks:
pre-commit run nbstripout -a

exec_notebooks:
pytest --nbmake --nbmake-timeout=60 --nbmake-kernel=python3 $$(git ls-files | grep ipynb)
pytest --ff --nbmake --nbmake-timeout=90 --nbmake-kernel=python3 $$(git ls-files | grep ipynb)

docs:
make -C docs html
193 changes: 95 additions & 98 deletions examples/IOC_data.ipynb
@@ -11,24 +11,14 @@
"source": [
"import logging\n",
"\n",
"import shapely\n",
"import hvplot.pandas\n",
"import geopandas as gpd\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"import shapely\n",
"import xarray as xr\n",
"\n",
"from searvey import ioc\n",
"\n",
"logging.basicConfig(\n",
" level=20,\n",
" style=\"{\",\n",
" format=\"{asctime:s}; {levelname:8s}; {threadName:23s}; {name:<25s} {lineno:5d}; {message:s}\",\n",
")\n",
"\n",
"logging.getLogger(\"urllib3\").setLevel(30)\n",
"logging.getLogger(\"parso\").setLevel(30)\n",
"\n",
"logger = logging.getLogger(__name__)"
"import searvey"
]
},
{
@@ -38,7 +28,9 @@
"tags": []
},
"source": [
"## Retrieve Station Metadata"
"## Retrieve Station Metadata\n",
"\n",
"To retrieve station metadata, use the `get_ioc_stations()` function, which returns a `geopandas.GeoDataFrame`:"
]
},
{
@@ -50,8 +42,8 @@
},
"outputs": [],
"source": [
"ioc_stations = ioc.get_ioc_stations()\n",
"ioc_stations"
"ioc_stations = searvey.get_ioc_stations()\n",
"len(ioc_stations)"
]
},
{
@@ -63,13 +55,7 @@
},
"outputs": [],
"source": [
"figure, axis = plt.subplots(1, 1)\n",
"figure.set_size_inches(12, 12 / 1.61803398875)\n",
"\n",
"countries = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n",
"_ = countries.plot(color='lightgrey', ax=axis, zorder=-1)\n",
"_ = ioc_stations.plot(ax=axis)\n",
"_ = axis.set_title(f'all IOC stations')"
"ioc_stations.columns"
]
},
{
@@ -81,33 +67,33 @@
},
"outputs": [],
"source": [
"ioc_stations.columns"
"with pd.option_context('display.max_columns', None):\n",
" ioc_stations.sample(3).sort_index()"
]
},
{
"cell_type": "markdown",
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"## Retrieve station metadata from arbitrary polygon"
"world_plot = ioc_stations.hvplot(geo=True, tiles=True, hover_cols=[\"ioc_code\", \"location\"])\n",
"world_plot.opts(width=800, height=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"cell_type": "markdown",
"id": "6",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"east_coast = shapely.geometry.box(-85, 25, -65, 45)\n",
"east_coast\n",
"## Retrieve station metadata from arbitrary polygon\n",
"\n",
"east_stations = ioc.get_ioc_stations(region=east_coast)\n",
"east_stations"
"We can filter the IOC stations using any shapely object. For example, to select only the stations on the East Coast of the US:"
]
},
{
@@ -119,143 +105,154 @@
},
"outputs": [],
"source": [
"east_stations[~east_stations.contacts.str.contains(\"NOAA\", na=False)]"
]
},
{
"cell_type": "markdown",
"id": "8",
"metadata": {},
"source": [
"## Retrieve IOC station data"
"east_coast = shapely.geometry.box(-85, 25, -65, 45)\n",
"east_coast_stations = searvey.get_ioc_stations(region=east_coast)\n",
"len(east_coast_stations)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9",
"id": "8",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"east_data = ioc.get_ioc_data(\n",
" ioc_metadata=east_stations,\n",
" endtime=\"2020-05-30\",\n",
" period=3,\n",
")\n",
"east_data"
"east_coast_stations.hvplot.points(geo=True, tiles=True)"
]
},
{
"cell_type": "markdown",
"id": "9",
"metadata": {},
"source": [
"## Retrieve IOC station data\n",
"\n",
"The function for retrieving data is called `fetch_ioc_station()` and it returns a dataframe.\n",
"\n",
"In its simplest form it only requires the `station_id` (i.e. the IOC code) and it retrieves the last week of data:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"def drop_all_nan_vars(ds: xr.Dataset) -> xr.Dataset:\n",
" for var in ds.data_vars:\n",
" if ds[var].notnull().sum() == 0:\n",
" ds = ds.drop_vars(var)\n",
" return ds\n",
"\n",
"ds = drop_all_nan_vars(east_data.sel(ioc_code=\"setp1\"))\n",
"ds"
"df = searvey.fetch_ioc_station(\"acap2\")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "11",
"metadata": {
"tags": []
},
"metadata": {},
"source": [
"As you can see not all the data are suitable for use...\n",
"\n",
"More specifically, the `rad` seems to have been re-calibrated in the afternoon of 2020-05-28:"
"We can also explicitly specify the start and the end date. E.g. to retrieve the first 10 days of May 2024:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "12",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"fix, axes = plt.subplots(1, 1)\n",
"\n",
"_ = ds.prs.plot(ax=axes, label=\"prs\")\n",
"_ = ds.rad.plot(ax=axes, label=\"rad\")\n",
"_ = ds.ra2.plot(ax=axes, label=\"ra2\")\n",
"axes.legend()"
"df = searvey.fetch_ioc_station(\n",
" station_id=\"alva\",\n",
" start_date=pd.Timestamp(\"2024-05-01\"),\n",
" end_date=pd.Timestamp(\"2024-05-10\"),\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "13",
"metadata": {
"tags": []
},
"metadata": {},
"source": [
"Similarly some stations might have missing data"
"If we request more than 30 days, then multiple HTTP requests are sent to the IOC servers via multithreading and the responses are merged into a single dataframe. \n",
"\n",
"In this case, setting `progress_bar=True` can be helpful in monitoring the progress of the HTTP requests. \n",
"For example, to retrieve data for the first 5 months of 2020:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14",
"metadata": {
"tags": []
},
"metadata": {},
"outputs": [],
"source": [
"bahamas = ds.where(ds.country == \"Bahamas\")\n",
"bahamas"
"df = searvey.fetch_ioc_station(\n",
" station_id=\"alva\",\n",
" start_date=pd.Timestamp(\"2020-01-01\"),\n",
" end_date=pd.Timestamp(\"2020-06-01\"),\n",
" progress_bar=True,\n",
")\n",
"df"
]
},
{
"cell_type": "markdown",
"id": "15",
"metadata": {},
"source": [
"Keep in mind that each IOC station may return dataframes with different sensors/columns. For example, the `setp1` station in the Bahamas returns several of them:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15",
"id": "16",
"metadata": {},
"outputs": [],
"source": [
"bahamas.ra2.plot()"
"bahamas = searvey.fetch_ioc_station(\n",
" station_id=\"setp1\",\n",
" start_date=pd.Timestamp(\"2020-05-25\"),\n",
" end_date=pd.Timestamp(\"2020-05-30\"),\n",
" progress_bar=False,\n",
")\n",
"bahamas"
]
},
{
"cell_type": "markdown",
"id": "16",
"metadata": {
"tags": []
},
"id": "17",
"metadata": {},
"source": [
"Trying to fill the missing values is not that difficult, but you probably need to review the results"
"Nevertheless, the returned timeseries are **not** ready to be used. \n",
"\n",
"E.g. we see that in the last days of May the `rad` sensor was offline for some time:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17",
"metadata": {
"tags": []
},
"id": "18",
"metadata": {},
"outputs": [],
"source": [
"bahamas.ra2.interpolate_na(dim=\"time\", method=\"linear\").plot()"
"bahamas.rad.hvplot(grid=True)"
]
},
{
"cell_type": "markdown",
"id": "19",
"metadata": {},
"source": [
"So the IOC data **do** need some data-cleaning."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"display_name": "searvey",
"language": "python",
"name": "python3"
"name": "searvey"
},
"language_info": {
"codemirror_mode": {
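
Note on the multi-request behaviour described in the updated notebook: requests longer than 30 days are said to be split into several HTTP requests that run in threads and are then merged. The sketch below only illustrates that general pattern with the standard library. It is not searvey's implementation: `split_into_windows`, `fake_fetch` and `fetch_range` are hypothetical names, and `fake_fetch` fabricates hourly timestamps instead of contacting the IOC servers.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_into_windows(start, end, max_days=30):
    """Split [start, end) into consecutive windows of at most `max_days` days."""
    if end <= start:
        raise ValueError("end must be later than start")
    step = timedelta(days=max_days)
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def fake_fetch(window):
    """Hypothetical stand-in for one HTTP request: one timestamp per hour."""
    window_start, window_end = window
    hours = int((window_end - window_start).total_seconds() // 3600)
    return [window_start + timedelta(hours=i) for i in range(hours)]


def fetch_range(start, end, max_workers=4):
    """Fetch every window in a thread pool and merge the results in order."""
    windows = split_into_windows(start, end)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the merged
        # series stays chronological even if threads finish out of order.
        chunks = list(pool.map(fake_fetch, windows))
    return [timestamp for chunk in chunks for timestamp in chunk]


windows = split_into_windows(datetime(2020, 1, 1), datetime(2020, 6, 1))
merged = fetch_range(datetime(2020, 1, 1), datetime(2020, 6, 1))
```

Half-open windows are used here so that adjacent windows never share a timestamp; whether the real function treats `end_date` as inclusive is not stated in the notebook.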
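
Note on the closing remark that the IOC data need cleaning: the previous version of the notebook dropped all-NaN variables and linearly interpolated gaps with xarray. A rough pandas equivalent might look like the sketch below. It assumes the data arrive as a dataframe indexed by time, uses made-up values rather than real IOC output, and the `clean` helper and its `max_gap` parameter are hypothetical, not part of searvey.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a station dataframe: one sensor with gaps,
# one sensor that never reported anything.
index = pd.date_range("2020-05-25", periods=8, freq=pd.Timedelta(hours=1))
raw = pd.DataFrame(
    {
        "rad": [1.0, 1.1, np.nan, np.nan, 1.4, np.nan, np.nan, np.nan],
        "prs": [np.nan] * 8,
    },
    index=index,
)


def clean(df, max_gap=2):
    """Drop empty sensors and fill short interior gaps (hypothetical helper)."""
    # Remove columns that contain no observations at all.
    df = df.dropna(axis="columns", how="all")
    # Interpolate along the time index, filling at most `max_gap`
    # consecutive values and never extrapolating past the last observation.
    return df.interpolate(method="time", limit=max_gap, limit_area="inside")


cleaned = clean(raw)
```

The trailing gap is left as NaN on purpose: interpolated values should still be reviewed, and extrapolating past the last observation would invent data.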