Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Signed-off-by: Ransom Williams <[email protected]>
Showing 13 changed files with 1,223 additions and 62 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -119,28 +119,71 @@ multiple labels can refer to the same bundle via its commit hash. | |
|
||
Current use of Datamon at One Concern with respect to intra-Argo workflow usage relies on the | ||
[kubernetes sidecar](https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/) | ||
pattern where a shared volume is used as the transport layer for application layer | ||
pattern wherein a shared volume serves as the transport layer for application-layer | ||
communication to coordinate between the _main container_, where a data-science program | ||
accesses data provided by Datamon and produces data for Datamon to upload, and the | ||
_sidecar container_, where Datamon provides data for access (via streaming through | ||
main memory directly from GCS) and then, after the main container is done outputting | ||
data to a shared Kubernetes volume, uploads the results of the data-science program | ||
to GCS. Ensuring that, for example, the streaming data is ready for access (sidecar to | ||
main-container messaging) as well as notification that the data-science program has | ||
produced output data to upload (main-container to sidecar messaging), is the responsibility | ||
of a couple of shell scripts that both ship inside the `gcr.io/onec-co/datamon-fuse-sidecar` | ||
container, which is versioned along with | ||
_sidecar container_, where Datamon provides data for access (as hierarchical filesystems, | ||
as SQL databases, etc.). | ||
After the main container's DAG-node-specific data-science program outputs data | ||
(to a shared Kubernetes volume, to a PostgreSQL instance in the sidecar, and so on), | ||
the sidecar container uploads the results of the data-science program to GCS. | ||
|
||
Ensuring that data is ready for access (sidecar to main-container messaging) | ||
as well as notification that the data-science program has | ||
produced output data to upload (main-container to sidecar messaging), | ||
is the responsibility of a few shell scripts shipped in the | ||
Docker images that serve as sidecars. | ||
While there's precisely one application container per Argo node, | ||
a Kubernetes container created from an arbitrary image, | ||
sidecars are additional containers in the same Kubernetes pod | ||
-- or Argo DAG node, we can say, approximately synonymously -- | ||
that coordinate Datamon-based data transfer with the application container. | ||
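
The sidecar-to-main and main-to-sidecar messaging described above can be pictured as marker files in the shared coordination directory. The following is a minimal sketch of that idea, not the scripts shipped in the sidecar images: the function names, the event names (`mountdone`, `uploaddone`), and the one-second polling interval are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of file-based signalling over a shared volume.
# None of these names come from the Datamon sidecar scripts.

# emit_event <coord_dir> <event>
# Signal the other container by creating a marker file.
emit_event() {
    touch "$1/$2"
}

# wait_for_event <coord_dir> <event> [timeout_seconds]
# Poll once per second until the marker file exists.
# Returns 0 if the event was seen, 1 if the timeout (default 60s) elapsed.
wait_for_event() {
    _waited=0
    _timeout=${3:-60}
    while [ ! -f "$1/$2" ]; do
        [ "$_waited" -ge "$_timeout" ] && return 1
        sleep 1
        _waited=$((_waited + 1))
    done
    return 0
}
```

Under these assumptions a sidecar would call `emit_event /tmp/coord mountdone` once its data is available, and the application wrapper would call `wait_for_event /tmp/coord mountdone 300` before starting the data-science program. The reverse direction (output ready for upload) works the same way with a different event name.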
|
||
_Aside_: as additional kinds of data sources and sinks are added, | ||
we may also refer to "sidecars" as "batteries," and so on as semantic drift | ||
of the shell scripts shears away feature creep in the application binary. | ||
|
||
There are currently two batteries-included® images | ||
|
||
* `gcr.io/onec-co/datamon-fuse-sidecar` | ||
provides hierarchical filesystem access | ||
* `gcr.io/onec-co/datamon-pg-sidecar` | ||
provides PostgreSQL database access | ||
|
||
Both are versioned along with | ||
[github releases](https://github.com/oneconcern/datamon/releases/) | ||
of the desktop binary: to access release `0.4` as listed on the github releases page, | ||
use the tag `v0.4` as in `gcr.io/onec-co/datamon-fuse-sidecar:v0.4` when | ||
writing Dockerfiles or Kubernetes-like YAML that accesses the sidecar container image. | ||
of the | ||
[desktop binary](#os-x-install-guide). | ||
To access a recent release listed on the github releases page, | ||
use the git tag as the Docker image tag. | ||
At the time of writing, | ||
[v0.7](https://github.com/oneconcern/datamon/releases/tag/v0.7) | ||
is the latest release tag, and (with some elisions) | ||
```yaml | ||
spec: | ||
... | ||
containers: | ||
- name: datamon-sidecar | ||
  image: gcr.io/onec-co/datamon-fuse-sidecar:v0.7 | ||
... | ||
``` | ||
would be the corresponding Kubernetes YAML to access the sidecar container image. | ||
|
||
_Aside_: historically, and in case it's necessary to roll back to a now-ancient | ||
version of the sidecar image, releases were tagged in git without the `v` prefix, | ||
and Docker tags prepended `v` to the git tag. | ||
For instance, `0.4` is listed on the github releases page, while | ||
the tag `v0.4` as in `gcr.io/onec-co/datamon-fuse-sidecar:v0.4` was used when writing | ||
Dockerfiles or Kubernetes-like YAML to access the sidecar container image. | ||
|
||
Users need only place the `wrap_application.sh` script located in the root directory | ||
of the sidecar container within the main container. This can be accomplished via | ||
an `initContainer` without duplicating the version of the Datamon sidecar image in | ||
both the main application Dockerfile and the YAML. When using a block-storage GCS | ||
product, we might've specified a data-science application's Argo DAG node with something | ||
like | ||
of each of the sidecar containers within the main container. | ||
This | ||
[can be accomplished](https://github.com/oneconcern/datamon/blob/master/hack/k8s/example-coord.template.yaml#L15-L24) | ||
via an `initContainer` without duplicating the version of the Datamon sidecar | ||
image in both the main application Dockerfile and the YAML. | ||
When using a block-storage GCS product, we might've specified a data-science application's | ||
Argo DAG node with something like | ||
|
||
```yaml | ||
command: ["app"] | ||
|
@@ -151,53 +194,98 @@ whereas with `wrap_application.sh` in place, this would be something to the effe | |
|
||
```yaml | ||
command: ["/path/to/wrap_application.sh"] | ||
args: ["-c", "/path/to/coordination_directory", "--", "app", "param1", "param2"] | ||
args: ["-c", "/path/to/coordination_directory", "-b", "fuse", "--", "app", "param1", "param2"] | ||
``` | ||
|
||
That is, `wrap_application.sh` has the following usage | ||
|
||
```shell | ||
wrap_application.sh -c <coordination_directory> -- <application_command> | ||
wrap_application.sh -c <coordination_directory> -b <sidecar_kind> -- <application_command> | ||
``` | ||
|
||
where | ||
* `<coordination_directory>` is an empty directory in a shared volume | ||
(an | ||
[`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) | ||
using memory-backed storage suffices). Each coordination directory (not necessarily the volume) | ||
corresponds to a particular DAG node (i.e. Kubernetes pod) and vice-versa. | ||
* `<sidecar_kind>` names the kind of sidecar container specified in the YAML | ||
and is one of | ||
- `fuse` | ||
- `postgres` | ||
* `<application_command>` is the data-science application command exactly as it | ||
would appear without the wrapper script. That is, the wrapper script relies on the | ||
[conventional UNIX syntax](http://zsh.sourceforge.net/Guide/zshguide02.html#l11) | ||
for stating that options to a command are done being declared. | ||
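
To make the `--` convention concrete, here is a minimal sketch of how a wrapper with this usage could separate its own options from the application command using POSIX `getopts`. It is an illustration only and not the contents of `wrap_application.sh`: the function name and the variables `coord_dir`, `sidecar_kind`, and `app_start` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real wrap_application.sh.
# Parses: -c <coordination_directory> -b <sidecar_kind> -- <application_command>

parse_wrapper_args() {
    coord_dir=
    sidecar_kind=
    OPTIND=1
    while getopts 'c:b:' opt; do
        case "$opt" in
            c) coord_dir=$OPTARG ;;
            b) sidecar_kind=$OPTARG ;;
            *) return 2 ;;
        esac
    done
    # Only the two sidecar kinds named in the usage text are accepted.
    case "$sidecar_kind" in
        fuse|postgres) ;;
        *) return 2 ;;
    esac
    [ -n "$coord_dir" ] || return 2
    # getopts stops at "--" and leaves OPTIND pointing at the first
    # word of the application command.
    app_start=$OPTIND
}
```

A caller would then run `parse_wrapper_args "$@"` followed by `shift $((app_start - 1))`, after which `"$@"` holds the application command exactly as it would appear without the wrapper.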
|
||
Meanwhile, each sidecar's datamon-specific batteries have their corresponding usages. | ||
|
||
##### `gcr.io/onec-co/datamon-fuse-sidecar` -- `wrap_datamon.sh` | ||
|
||
Provides filesystem representations (i.e. a folder) of [datamon bundles](#data-modeling). | ||
Since bundles' filelists are serialized filesystem representations, | ||
the `wrap_datamon.sh` interface is tightly coupled to that of the self-documenting | ||
`datamon` binary itself. | ||
|
||
```shell | ||
./wrap_datamon.sh -c <coord_dir> -d <bin_cmd_I> -d <bin_cmd_J> ... | ||
``` | ||
|
||
where `<coordination_directory>` is an empty directory in a shared volume | ||
(an | ||
[`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) | ||
using memory-backed storage suffices). In the case of Argo workflows in particular, | ||
the empty directory (and not necessarily the volume) ought to be specific to a | ||
particular DAG node (i.e. Kubernetes pod). Each node uses a unique directory. | ||
Meanwhile, `<application_command>` is the data-science application command exactly as it | ||
would appear without the wrapper script. | ||
That is, the wrapper script relies on the | ||
[conventional UNIX syntax](http://zsh.sourceforge.net/Guide/zshguide02.html#l11) | ||
for stating that options to a command are done being declared. | ||
|
||
Meanwhile, `wrap_datamon.sh` similarly accepts a single `-c` option to specify the | ||
location of the coordination directory. | ||
Additionally, `wrap_datamon.sh` accepts a `-d` option. The parameters to this option are | ||
among the standard Datamon CLI commands: | ||
|
||
* `config` | ||
* `bundle mount` | ||
* `bundle upload` | ||
* `-c` the same coordination directory passed to `wrap_application.sh` | ||
* `-d` all parameters, exactly as passed to the datamon binary, except as a | ||
single scalar (quoted) parameter, for one of the following commands | ||
- `config` sets user information associated with any bundles created by the node | ||
- `bundle mount` provides sources for data-science applications | ||
- `bundle upload` provides sinks for data-science applications | ||
|
||
Any number of `bundle mount` and `bundle upload` commands (including none) may be specified, | ||
but at most one `config` command is allowed. An example `wrap_datamon.sh` | ||
YAML might be | ||
|
||
```yaml | ||
command: ["./wrap_datamon.sh"] | ||
args: ["-c", "/tmp/coord", "-d", "config create --name \"Coord\" --email [email protected]", "-d", "bundle upload --path /tmp/upload --message \"result of container coordination demo\" --repo ransom-datamon-test-repo --label coordemo", "-d", "bundle mount --repo ransom-datamon-test-repo --label testlabel --destination /tmp --mount /tmp/mount --stream"] | ||
args: ["-c", "/tmp/coord", "-d", "config create --name \"Coord\" --email [email protected]", "-d", "bundle upload --path /tmp/upload --message \"result of container coordination demo\" --repo ransom-datamon-test-repo --label coordemo", "-d", "bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream"] | ||
``` | ||
|
||
or from the shell | ||
|
||
```shell | ||
./wrap_datamon.sh -c /tmp/coord -d 'config create --name "Coord" --email [email protected]' -d 'bundle upload --path /tmp/upload --message "result of container coordination demo" --repo ransom-datamon-test-repo --label coordemo' -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --destination /tmp --mount /tmp/mount --stream' | ||
./wrap_datamon.sh -c /tmp/coord -d 'config create --name "Coord" --email [email protected]' -d 'bundle upload --path /tmp/upload --message "result of container coordination demo" --repo ransom-datamon-test-repo --label coordemo' -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream' | ||
``` | ||
|
||
where, in particular, the `-d` (Datamon) options passed to the shell wrapper are | ||
scalars. | ||
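
The rule that each `-d` value is one quoted scalar, with at most one `config` command among them, can be checked mechanically. The sketch below is an assumption about how such a check could look, not code taken from `wrap_datamon.sh`; the function name and the prefix-matching on the command string are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real wrap_datamon.sh.
# Accepts -c <coord_dir> and any number of -d '<datamon command>' options,
# and succeeds only if at most one of the -d values is a `config` command.

validate_datamon_cmds() {
    n_config=0
    OPTIND=1
    while getopts 'c:d:' opt; do
        case "$opt" in
            c) : ;;  # coordination directory, not inspected here
            d)
                # Each -d value arrives as a single word because the
                # caller quoted it; match on the leading command name.
                case "$OPTARG" in
                    config|config\ *) n_config=$((n_config + 1)) ;;
                    bundle\ mount*|bundle\ upload*) : ;;
                    *) echo "unsupported datamon command: $OPTARG" >&2
                       return 2 ;;
                esac
                ;;
            *) return 2 ;;
        esac
    done
    [ "$n_config" -le 1 ]
}
```

If a `-d` value were passed unquoted, `getopts` would see only its first word (for example `bundle`) as the option argument, so this check would reject it as unsupported.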
##### `gcr.io/onec-co/datamon-pg-sidecar` -- `wrap_datamon_pg.sh` | ||
|
||
Provides Postgres databases as bundles and vice versa. | ||
Since the datamon binary does not include any Postgres-specific notions, | ||
the UI here is more decoupled than that of `wrap_datamon.sh`. | ||
|
||
```shell | ||
./wrap_datamon_pg.sh -c <coord_dir> -x [db_opts_1] -x [db_opts_2] ... | ||
``` | ||
|
||
where `-c` is as before and each `-x` flag delimits per-database options. | ||
Every database created in the sidecar is uploaded to datamon, and the `db_opts` | ||
that affect the availability of the database from the application container | ||
or the upload of the database to datamon are | ||
|
||
* `-p` IP port used to connect to the database | ||
* `-m` message written to the database's bundle | ||
* `-l` label written to the bundle | ||
* `-r` repo containing bundle | ||
|
||
Additionally, databases may be initialized with databases previously | ||
stored in datamon by this sidecar. The slightly expanded form of a | ||
`[db_opts]` in the above usage is then | ||
|
||
``` | ||
-p <port> [sink_opts] -s [source_opts] | ||
``` | ||
where `<port>` and `[sink_opts]` are as above, and `[source_opts]` are | ||
* `-r` repo containing the source bundle | ||
* `-l` label of the source bundle | ||
* `-b` bundle id | ||
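
As an illustration of how `-x` delimits per-database option groups, the following sketch splits an argument list into one group per database. It is not taken from `wrap_datamon_pg.sh`: the function name is hypothetical, the port numbers and labels in the usage note are placeholders, and joining words with spaces is a simplification that would lose quoting in values such as `-m` messages.

```shell
#!/bin/sh
# Hypothetical sketch, NOT the real wrap_datamon_pg.sh.
# Prints one line per database: the options found between consecutive -x flags.
# Anything before the first -x (such as -c <coord_dir>) is skipped.

split_db_opts() {
    group=
    started=0
    for arg in "$@"; do
        if [ "$arg" = "-x" ]; then
            # A new -x closes the previous group, if there was one.
            [ "$started" -eq 1 ] && printf '%s\n' "$group"
            group=
            started=1
        elif [ "$started" -eq 1 ]; then
            group="${group:+$group }$arg"
        fi
    done
    # Emit the final group.
    [ "$started" -eq 1 ] && printf '%s\n' "$group"
    return 0
}
```

For example, `split_db_opts -c /tmp/coord -x -p 5430 -l out1 -r repo -x -p 5431 -l out2 -r repo` prints two lines, one holding the options for each database.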
# OS X install guide | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
FROM debian | ||
|
||
RUN apt-get update &&\ | ||
curl -sL https://deb.nodesource.com/setup_10.x | bash &&\ | ||
apt-get install -y \ | ||
curl \ | ||
postgresql \ | ||
ca-certificates \ | ||
gnupg \ | ||
zsh \ | ||
vim \ | ||
&&\ | ||
apt-get autoremove -yqq &&\ | ||
apt-get clean -y &&\ | ||
apt-get autoclean -yqq &&\ | ||
rm -rf \ | ||
/tmp/* \ | ||
/var/tmp/* \ | ||
/var/lib/apt/lists/* \ | ||
/usr/share/doc/* \ | ||
/usr/share/locale/* \ | ||
/var/cache/debconf/*-old | ||
|
||
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add - | ||
|
||
## BEGIN tini | ||
|
||
ENV TINI_VERSION v0.18.0 | ||
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-amd64 /tmp/tini-static-amd64 | ||
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-amd64.asc /tmp/tini-static-amd64.asc | ||
|
||
# omitting gpg verification during development/demo | ||
# RUN for key in \ | ||
# 595E85A6B1B4779EA4DAAEC70B588DFF0527A9B7 \ | ||
# ; do \ | ||
# gpg --keyserver hkp://pgp.mit.edu:80 --recv-keys "$key" || \ | ||
# gpg --keyserver hkp://ipv4.pool.sks-keyservers.net --recv-keys "$key" || \ | ||
# gpg --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys "$key" ; \ | ||
# done | ||
|
||
# RUN gpg --verify /tmp/tini-static-amd64.asc | ||
|
||
RUN install -m 0755 /tmp/tini-static-amd64 /bin/tini | ||
|
||
## END tini | ||
|
||
RUN useradd -u 1020 -ms /bin/bash developer | ||
RUN groupadd -g 2000 developers | ||
RUN usermod -g developers developer | ||
RUN chown -R developer:developers /bin/tini | ||
|
||
ADD hack/fuse-demo/mock_application_pg.sh . | ||
RUN chmod a+x mock_application_pg.sh | ||
|
||
USER developer | ||
|
||
RUN touch ~/.zshrc | ||
|
||
ENTRYPOINT [ "/tmp/coord/.scripts/wrap_application.sh"] | ||
CMD [ "./mock_application_pg.sh"] |