-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Replace FilePointer with universal pathlib (#413)
* Checkpoint upath * Remove storage options and clean up imports. * Better type hints * Add columns kwarg. * Add pip install s3fs * Clear output again.
- Loading branch information
1 parent
a94b85b
commit cb30209
Showing
15 changed files
with
223 additions
and
128 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Accessing Remote Data\n", | ||
"\n", | ||
"If you're accessing hipscat catalogs on a local file system, a typical path string like `\"/path/to/catalogs\"` will be sufficient. This tutorial will help you get started if you need to access data over HTTP/S, cloud storage, or have some additional parameters for connecting to your data.\n", | ||
"\n", | ||
"We use [`fsspec`](https://github.com/fsspec/filesystem_spec) and [`universal_pathlib`](https://github.com/fsspec/universal_pathlib) to create connections to remote sources for data. Please refer to their documentation for a list of supported filesystems and any filesystem-specific parameters.\n", | ||
"\n", | ||
"Below, we provide some a basic workflow for accessing remote data, as well as filesystem-specific hints." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## HTTP / HTTPS\n", | ||
"\n", | ||
"Firstly, make sure to install the fsspect http package:\n", | ||
"\n", | ||
"```\n", | ||
"pip install aiohttp \n", | ||
"```\n", | ||
"OR\n", | ||
"```\n", | ||
"conda install aiohttp\n", | ||
"```\n", | ||
"\n", | ||
"Occasionally, with HTTPS data, you may see issues with missing certificates. If you encounter a `FileNotFoundError`, but you're pretty sure the file should be found:\n", | ||
"\n", | ||
"1. Check your network and server availability\n", | ||
"2. On Linux be sure that openSSL and ca-certificates are in place\n", | ||
"3. On Mac run `/Applications/Python\\ 3.*/Install\\ Certificates.command`" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from upath import UPath\n", | ||
"\n", | ||
"test_path = UPath(\"https://data.lsdb.io/unstable/gaia_dr3/gaia/\")\n", | ||
"test_path.exists()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import lsdb\n", | ||
"\n", | ||
"cat = lsdb.read_hipscat(\"https://data.lsdb.io/unstable/gaia_dr3/gaia/\")\n", | ||
"cat" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## S3 and cloud resources\n", | ||
"\n", | ||
"You'll want to install the `s3fs` package\n", | ||
"\n", | ||
"```\n", | ||
"pip install s3fs\n", | ||
"```\n", | ||
"\n", | ||
"(or `adlfs` or `gcsfs` for your cloud provider).\n", | ||
"\n", | ||
"You can pass your credentials once when creating the `UPath` instance, and use that configuration for any other paths constructed from that instance. In the case of PanStarrs, this is hosted in a public cloud bucket. You are still required to pass `anon = True` for public data buckets, in lieu of credentials. You can confirm that the path and credentials are good, before incurring any further expensive data reads." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install s3fs --quiet" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"panstarrs_path = UPath(\"s3://stpubdata/panstarrs/ps1/public/hipscat/\", anon=True)\n", | ||
"test_path.exists()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cat = lsdb.read_hipscat(panstarrs_path / \"otmo\")\n", | ||
"cat" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cat = lsdb.read_hipscat(panstarrs_path / \"detection\")\n", | ||
"cat" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "hipscatenv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.14" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.