From 843ec5ff39df2b94b5506f776f329fe3ee8d61aa Mon Sep 17 00:00:00 2001 From: Simon Perkins Date: Tue, 10 Sep 2024 17:17:46 +0200 Subject: [PATCH] Initial documentation --- README.rst | 116 +++++++++++++++++++++++++++++++++++++++++ doc/Makefile | 20 +++++++ doc/make.bat | 35 +++++++++++++ doc/source/api.rst | 8 +++ doc/source/conf.py | 65 +++++++++++++++++++++++ doc/source/index.rst | 20 +++++++ doc/source/install.rst | 59 +++++++++++++++++++++ doc/source/readme.rst | 1 + 8 files changed, 324 insertions(+) create mode 100644 doc/Makefile create mode 100644 doc/make.bat create mode 100644 doc/source/api.rst create mode 100644 doc/source/conf.py create mode 100644 doc/source/index.rst create mode 100644 doc/source/install.rst create mode 100644 doc/source/readme.rst diff --git a/README.rst b/README.rst index e69de29..d6ad515 100644 --- a/README.rst +++ b/README.rst @@ -0,0 +1,116 @@ +xarray-ms +========= + +xarray-ms presents a Measurement Set v4 view (MSv4) over +`CASA Measurement Sets `_ (MSv2). +It provides access to MSv2 data via the xarray API, allowing MSv4 compliant applications +to be developed on well-understood MSv2 data. + +.. code-block:: python + + >>> import xarray_ms + >>> import xarray + >>> ds = xarray.open_dataset("/data/L795830_SB001_uv.MS/", + chunks={"time": 2000, "baseline": 1000}) + >>> ds + Size: 70GB + Dimensions: (time: 28760, baseline: 2775, frequency: 16, + polarization: 4, uvw_label: 3) + Coordinates: + antenna1_name (baseline) object 22kB dask.array + antenna2_name (baseline) object 22kB dask.array + baseline_id (baseline) int64 22kB dask.array + * frequency (frequency) float64 128B 1.202e+08 ... 
1.204e+08 + * polarization (polarization) + FLAG (time, baseline, frequency, polarization) uint8 5GB dask.array + TIME_CENTROID (time, baseline) float64 638MB dask.array + UVW (time, baseline, uvw_label) float64 2GB dask.array + VISIBILITY (time, baseline, frequency, polarization) complex64 41GB dask.array + WEIGHT (time, baseline, frequency, polarization) float32 20GB dask.array + Attributes: + antenna_xds: Size: 4kB\nDimensions: (... + version: 0.0.1 + creation_date: 2024-09-10T14:29:22.587984+00:00 + data_description_id: 0 + +Measurement Set v4 +------------------ + +NRAO_/SKAO_ are developing a new xarray-based `Measurement Set v4 specification `_. +While there are many changes, some of the major highlights are: + +* xarray_ is used to define the specification. +* MSv4 data consists of Datasets of ndarrays on a regular time-channel grid. + MSv2 data is tabular and, while in many instances the time-channel grid is regular, + this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks. + + +xarray_ Datasets are self-describing and therefore easier to reason about and work with. +Additionally, the regularity of data will make writing MSv4-based software less complex. + +xradio +------ + +`casangi/xradio `_ provides a reference implementation that converts +CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore_ +package. + +Why xarray-ms? +-------------- + +* By developing against an MSv4 xarray view over MSv2 data, + developers can build applications on well-understood data, + and then seamlessly transition to newer formats. + Data can also be exported to newer formats (principally zarr_) via xarray's + native I/O routines. + In either case, the xarray view looks the same to the software developer.
+ +* xarray-ms builds on xarray's + `backend API `_. + Implementing a formal CASA MSv2 backend has a number of automatic benefits: + + * Use of xarray's internal I/O routines such as ``open_dataset`` or ``to_zarr``. + * Use of xarray's `lazy loading mechanism `_. + * Automatic access to any `chunked array types `_ + supported by xarray, including, but not limited to, dask_. + * Arbitrary chunking along any xarray dimension. + +* xarray-ms uses arcae_, a high-performance backend to CASA Tables implementing + a subset of python-casacore_'s interface. +* xarray-ms provides limited support for irregular MSv2 data via padding. + +Work in Progress +---------------- + +.. warning:: + + xarray-ms is currently under active development and does not yet + have feature parity with xradio_. + +.. warning:: + + The Measurement Set v4 specification is currently under active development. + +Most measures information and many secondary sub-tables are currently missing. +However, the most important parts of the ``MAIN`` table, +as well as the ``ANTENNA``, ``POLARIZATION`` and ``SPECTRAL_WINDOW`` +sub-tables, are implemented and should be sufficient to start +developing software that uses xarray-ms. + +.. _SKAO: https://www.skao.int/ +.. _NRAO: https://public.nrao.edu/ +.. _msv4-spec: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/ +.. _xradio: https://github.com/casangi/xradio +.. _dask-ms: https://github.com/ratt-ru/dask-ms +.. _arcae: https://github.com/ratt-ru/arcae +.. _dask: https://www.dask.org/ +.. _python-casacore: https://github.com/casacore/python-casacore/ +.. _xarray: https://github.com/pydata/xarray +.. _xarray_backend: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html +.. _xarray_lazy: https://docs.xarray.dev/en/latest/internals/internal-design.html#lazy-indexing-classes +.. _xarray_chunked_arrays: https://docs.xarray.dev/en/latest/internals/chunked-arrays.html +..
_zarr: https://zarr.dev/ diff --git a/doc/Makefile b/doc/Makefile new file mode 100644 index 0000000..d0c3cbf --- /dev/null +++ b/doc/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = source +BUILDDIR = build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/doc/make.bat b/doc/make.bat new file mode 100644 index 0000000..747ffb7 --- /dev/null +++ b/doc/make.bat @@ -0,0 +1,35 @@ +@ECHO OFF + +pushd %~dp0 + +REM Command file for Sphinx documentation + +if "%SPHINXBUILD%" == "" ( + set SPHINXBUILD=sphinx-build +) +set SOURCEDIR=source +set BUILDDIR=build + +%SPHINXBUILD% >NUL 2>NUL +if errorlevel 9009 ( + echo. + echo.The 'sphinx-build' command was not found. Make sure you have Sphinx + echo.installed, then set the SPHINXBUILD environment variable to point + echo.to the full path of the 'sphinx-build' executable. Alternatively you + echo.may add the Sphinx directory to PATH. + echo. + echo.If you don't have Sphinx installed, grab it from + echo.https://www.sphinx-doc.org/ + exit /b 1 +) + +if "%1" == "" goto help + +%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% +goto end + +:help +%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O% + +:end +popd diff --git a/doc/source/api.rst b/doc/source/api.rst new file mode 100644 index 0000000..fc42564 --- /dev/null +++ b/doc/source/api.rst @@ -0,0 +1,8 @@ +API +=== + +Opening Measurement Sets +------------------------ + +.. 
autoclass:: xarray_ms.backend.msv2.entrypoint.MSv2PartitionEntryPoint + :members: open_dataset, open_datatree diff --git a/doc/source/conf.py b/doc/source/conf.py new file mode 100644 index 0000000..37df17a --- /dev/null +++ b/doc/source/conf.py @@ -0,0 +1,65 @@ +# Configuration file for the Sphinx documentation builder. +# +# For the full list of built-in configuration values, see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Project information ----------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information + +# type: ignore + +project = "xarray-ms" +copyright = "2024, Simon Perkins" +author = "Simon Perkins" +release = "0.2.0" + +# -- General configuration --------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration + +extensions = [ + "sphinx.ext.autodoc", + "sphinx.ext.autosummary", + "sphinx.ext.extlinks", + "sphinx_copybutton", + "sphinx.ext.doctest", + "sphinx.ext.napoleon", + "sphinx.ext.intersphinx", +] + +templates_path = ["_templates"] +exclude_patterns = [] + +# Napoleon settings +napoleon_google_docstring = True +napoleon_numpy_docstring = False +napoleon_include_init_with_doc = False +napoleon_include_private_with_doc = False +napoleon_include_special_with_doc = True +napoleon_use_admonition_for_examples = False +napoleon_use_admonition_for_notes = False +napoleon_use_admonition_for_references = False +napoleon_use_ivar = False +napoleon_use_param = True +napoleon_use_rtype = True +napoleon_preprocess_types = False +napoleon_type_aliases = None +napoleon_attr_annotations = True + +# -- Options for HTML output ------------------------------------------------- +# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output + +html_theme = "pydata_sphinx_theme" +html_static_path = ["_static"] + +extlinks = { + "issue": 
("https://github.com/ratt-ru/xarray-ms/issues/%s", "GH#"), + "pr": ("https://github.com/ratt-ru/xarray-ms/pull/%s", "GH#"), +} + +# Example configuration for intersphinx: refer to the Python standard library. +intersphinx_mapping = { + "dask": ("https://dask.pydata.org/en/stable", None), + "numpy": ("https://numpy.org/doc/stable/", None), + "python": ("https://docs.python.org/3/", None), + "xarray": ("https://docs.xarray.dev/en/stable", None), +} diff --git a/doc/source/index.rst b/doc/source/index.rst new file mode 100644 index 0000000..2045570 --- /dev/null +++ b/doc/source/index.rst @@ -0,0 +1,20 @@ +.. xarray-ms documentation master file, created by + sphinx-quickstart on Tue Sep 10 10:36:27 2024. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +xarray-ms documentation +======================= + +Add your content using ``reStructuredText`` syntax. See the +`reStructuredText `_ +documentation for details. + + +.. toctree:: + :maxdepth: 2 + :caption: Contents: + + readme + install + api diff --git a/doc/source/install.rst b/doc/source/install.rst new file mode 100644 index 0000000..673b00e --- /dev/null +++ b/doc/source/install.rst @@ -0,0 +1,59 @@ +Installation +============ + +.. code-block:: bash + + $ pip install xarray-ms + +Development +=========== + +First, install Python `Poetry `_. + +.. _poetry: https://python-poetry.org/ + +Then, the following commands will install the required dependencies, +optional testing dependencies, documentation and development dependencies +in a suitable virtual environment: + +.. code-block:: bash + + $ cd /code/xarray-ms + $ poetry env use 3.11 + $ poetry install -E testing --with doc --with dev + $ poetry run pre-commit install + $ poetry shell + +The pre-commit hooks can be manually executed as follows: + +..
code-block:: bash + + $ poetry run pre-commit run -a + + +Test Suite +---------- + +Run the following commands within the xarray-ms source code directory to +execute the test suite: + +.. code-block:: bash + + $ cd /code/xarray-ms + $ poetry install -E testing --with dev + $ poetry run py.test -s -vvv tests/ + + +Documentation +------------- + +Run the following commands to +build the Sphinx documentation from the doc sub-directory: + +.. code-block:: bash + + $ cd /code/xarray-ms + $ poetry install --with doc + $ poetry shell + $ cd doc + $ make html diff --git a/doc/source/readme.rst b/doc/source/readme.rst new file mode 100644 index 0000000..a6210d3 --- /dev/null +++ b/doc/source/readme.rst @@ -0,0 +1 @@ +.. include:: ../../README.rst