Commit

init PG sidecar
Signed-off-by: Ransom Williams <[email protected]>
ransomw1c committed Oct 2, 2019
1 parent cd3b2d6 commit 08787f1
Showing 13 changed files with 1,223 additions and 62 deletions.
2 changes: 1 addition & 1 deletion .circleci/config.yml
@@ -42,7 +42,7 @@ jobs:
- run:
name: Run golang linter
command: |
golangci-lint run --disable golint,funlen,godox,whitespace,stylecheck --build-tags fuse_cli --max-same-issues 0 --verbose
golangci-lint run --disable golint,funlen,godox,whitespace,stylecheck,unparam --build-tags fuse_cli --max-same-issues 0 --verbose
- run:
name: Lint shell scripts
command: |
43 changes: 38 additions & 5 deletions Makefile
@@ -46,9 +46,9 @@ help:
} \
{ lastLine = $$0 }' $(MAKEFILE_LIST)

.PHONY: build-and-push-fuse-sidecar
## build sidecar container used in Argo workflows
build-and-push-fuse-sidecar: build-datamon-binaries
.PHONY: build-and-push-fuse-sidecar-img
## build FUSE sidecar container used in Argo workflows
build-and-push-fuse-sidecar-img:
@echo 'building fuse sidecar container'
docker build \
--progress plain \
@@ -60,6 +60,27 @@ build-and-push-fuse-sidecar: build-datamon-binaries
.
docker push gcr.io/onec-co/datamon-fuse-sidecar

.PHONY: build-and-push-fuse-sidecar
## build FUSE sidecar container used in Argo workflows
build-and-push-fuse-sidecar: build-datamon-binaries build-and-push-fuse-sidecar-img

.PHONY: build-and-push-pg-sidecar-img
## build postgres sidecar container used in Argo workflows
build-and-push-pg-sidecar-img:
@echo 'building pg sidecar container'
docker build \
--progress plain \
-t gcr.io/onec-co/datamon-pg-sidecar \
-t gcr.io/onec-co/datamon-pg-sidecar:${GITHUB_USER}-$$(date '+%Y%m%d') \
-t gcr.io/onec-co/datamon-pg-sidecar:$(subst /,_,$(GIT_BRANCH)) \
--ssh default \
-f sidecar-pg.Dockerfile \
.
docker push gcr.io/onec-co/datamon-pg-sidecar

.PHONY: build-and-push-pg-sidecar
## build postgres sidecar container used in Argo workflows
build-and-push-pg-sidecar: build-datamon-binaries build-and-push-pg-sidecar-img

.PHONY: build-and-push-datamover
## build sidecar container used in Argo workflows
@@ -245,7 +266,7 @@ fuse-demo-ro: fuse-demo-build-shell fuse-demo-build-sidecar
.PHONY: fuse-demo-coord-build-app
## build shell container used in fuse demo
fuse-demo-coord-build-app:
@echo 'building fuse demo container'
@echo 'building fuse demo application container'
docker build \
--progress plain \
-t gcr.io/onec-co/datamon-fuse-demo-coord-app \
@@ -257,7 +278,7 @@ fuse-demo-coord-build-app:
.PHONY: fuse-demo-coord-build-datamon
## build shell container used in fuse demo
fuse-demo-coord-build-datamon:
@echo 'building fuse demo container'
@echo 'building fuse demo sidecar container'
docker build \
--progress plain \
-t gcr.io/onec-co/datamon-fuse-demo-coord-datamon \
@@ -266,6 +287,18 @@ fuse-demo-coord-build-datamon:
.
docker push gcr.io/onec-co/datamon-fuse-demo-coord-datamon

.PHONY: pg-demo-coord-build-app
## build application container used in pg demo
pg-demo-coord-build-app:
@echo 'building pg demo application container'
docker build \
--progress plain \
-t gcr.io/onec-co/datamon-pg-demo-coord-app \
--ssh default \
-f ./hack/fuse-demo/coord-app-pg.Dockerfile \
.
docker push gcr.io/onec-co/datamon-pg-demo-coord-app

.PHONY: profile-metrics
## Build the metrics collection binary and write output
profile-metrics:
174 changes: 131 additions & 43 deletions README.md
@@ -119,28 +119,71 @@ multiple labels can refer to the same bundle via its commit hash.

Current use of Datamon at One Concern with respect to intra-Argo workflow usage relies on the
[kubernetes sidecar](https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/)
pattern where a shared volume is used as the transport layer for application layer
pattern wherein a shared volume (transport layer) ramifies application layer
communication to coordinate between the _main container_, where a data-science program
accesses data provided by Datamon and produces data for Datamon to upload, and the
_sidecar container_, where Datamon provides data for access (via streaming through
main memory directly from GCS) and then, after the main container is done outputting
data to a shared Kubernetes volume, uploads the results of the data-science program
to GCS. Ensuring that, for example, the streaming data is ready for access (sidecar to
main-container messaging) as well as notification that the data-science program has
produced output data to upload (main-container to sidecar messaging), is the responsibility
of a couple of shell scripts that both ship inside the `gcr.io/onec-co/datamon-fuse-sidecar`
container, which is versioned along with
_sidecar container_, where Datamon provides data for access (as hierarchical filesystems,
as SQL databases, etc.).
After the main container's DAG-node-specific data-science program outputs data
(to a shared Kubernetes volume, to a PostgreSQL instance in the sidecar, and so on),
the sidecar container uploads the results of the data-science program to GCS.

Ensuring that data is ready for access (sidecar to main-container messaging)
as well as notification that the data-science program has
produced output data to upload (main-container to sidecar messaging),
is the responsibility of a few shell scripts shipped as part and parcel of the
Docker images that practicably constitute sidecars.
While there's precisely one application container per Argo node,
a Kubernetes container created from an arbitrary image,
sidecars are additional containers in the same Kubernetes pod
-- or Argo DAG node, we can say, approximately synonymously --
that coordinate datamon-based data-ferrying setups with the application container.

_Aside_: as additional kinds of data sources and sinks are added,
we may also refer to "sidecars" as "batteries," and so on as semantic drift
of the shell scripts shears away feature creep in the application binary.

There are currently two batteries-included® images

* `gcr.io/onec-co/datamon-fuse-sidecar`
provides hierarchical filesystem access
* `gcr.io/onec-co/datamon-pg-sidecar`
provides PostgreSQL database access

Both are versioned along with
[github releases](https://github.com/oneconcern/datamon/releases/)
of the desktop binary: to access release `0.4` as listed on the github releases page,
use the tag `v0.4` as in `gcr.io/onec-co/datamon-fuse-sidecar:v0.4` when
writing Dockerfiles or Kubernetes-like YAML that accesses the sidecar container image.
of the
[desktop binary](#os-x-install-guide).
To access recent releases listed on the github releases page,
use the git tag as the Docker image tag.
At time of writing,
[v0.7](https://github.com/oneconcern/datamon/releases/tag/v0.7)
is the latest release tag, and (with some elisions)
```yaml
spec:
  ...
  containers:
  - name: datamon-sidecar
    image: gcr.io/onec-co/datamon-fuse-sidecar:v0.7
  ...
```
would be the corresponding Kubernetes YAML to access the sidecar container image.

_Aside_: historically, and in case it's necessary to roll back to a now-ancient
version of the sidecar image, releases were tagged in git without the `v` prefix,
and Docker tags prepended `v` to the git tag.
For instance, `0.4` is listed on the github releases page, while
the tag `v0.4` as in `gcr.io/onec-co/datamon-fuse-sidecar:v0.4` was used when writing
Dockerfiles or Kubernetes-like YAML to access the sidecar container image.
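
The tagging convention described above can be sketched as a small shell helper. This is only an illustration: the function name `sidecar_image_ref` is hypothetical and not part of datamon, and the sketch assumes nothing beyond the image names and the `v`-prefix convention stated in the text.

```shell
# Hypothetical helper (not part of datamon): build a sidecar image reference
# from a sidecar kind and a release tag from the github releases page.
sidecar_image_ref() {
  kind="$1"      # "fuse" or "pg"
  git_tag="$2"   # release tag as listed on the github releases page
  case "$git_tag" in
    v*) docker_tag="$git_tag" ;;    # recent releases: the git tag is used as-is
    *)  docker_tag="v$git_tag" ;;   # older releases: the Docker tag prepends "v"
  esac
  printf 'gcr.io/onec-co/datamon-%s-sidecar:%s\n' "$kind" "$docker_tag"
}

sidecar_image_ref fuse v0.7
sidecar_image_ref fuse 0.4
```

Both calls yield a `v`-prefixed Docker tag, which is the point of the aside: only the git-side naming changed between old and recent releases.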

Users need only place the `wrap_application.sh` script located in the root directory
of the sidecar container within the main container. This can be accomplished via
an `initContainer` without duplicating version of the Datamon sidecar image in
both the main application Dockerfile as well as the YAML. When using a block-storage GCS
product, we might've specified a data-science application's Argo DAG node with something
like
of each of the sidecar containers within the main container.
This
[can be accomplished](https://github.com/oneconcern/datamon/blob/master/hack/k8s/example-coord.template.yaml#L15-L24)
via an `initContainer` without duplicating the version of the Datamon sidecar
image in both the main application Dockerfile and the YAML.
When using a block-storage GCS product, we might've specified a data-science application's
Argo DAG node with something like

```yaml
command: ["app"]
@@ -151,53 +194,98 @@ whereas with `wrap_application.sh` in place, this would be something to the effe

```yaml
command: ["/path/to/wrap_application.sh"]
args: ["-c", "/path/to/coordination_directory", "--", "app", "param1", "param2"]
args: ["-c", "/path/to/coordination_directory", "-b", "fuse", "--", "app", "param1", "param2"]
```

That is, `wrap_application.sh` has the following usage

```shell
wrap_application.sh -c <coordination_directory> -- <application_command>
wrap_application.sh -c <coordination_directory> -b <sidecar_kind> -- <application_command>
```

where
* `<coordination_directory>` is an empty directory in a shared volume
(an
[`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir)
using memory-backed storage suffices). Each coordination directory (not necessarily the volume)
corresponds to a particular DAG node (i.e. Kubernetes pod) and vice-versa.
* `<sidecar_kind>` is in correspondence with the containers specified in the YAML
and may be among
- `fuse`
- `postgres`
* `<application_command>` is the data-science application command exactly as it
would appear without the wrapper script. That is, the wrapper script relies on the
[conventional UNIX syntax](http://zsh.sourceforge.net/Guide/zshguide02.html#l11)
for stating that options to a command are done being declared.
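
The usage above can be illustrated with a short option-parsing sketch. This is not the `wrap_application.sh` shipped in the sidecar images; the function name `parse_wrapper_args` and the variable names are hypothetical, and the sketch only assumes the `-c`, `-b`, and `--` conventions described in the list.

```shell
# Hypothetical sketch of the option handling described above; this is NOT
# the wrap_application.sh shipped in the sidecar images.
parse_wrapper_args() {
  coord_dir=''
  sidecar_kind=''
  OPTIND=1
  # getopts stops at "--", the conventional end-of-options marker.
  while getopts 'c:b:' opt; do
    case "$opt" in
      c) coord_dir="$OPTARG" ;;
      b) sidecar_kind="$OPTARG" ;;
      *) return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  # Only the sidecar kinds named in the usage are accepted.
  case "$sidecar_kind" in
    fuse|postgres) ;;
    *) echo "unknown sidecar kind: $sidecar_kind" >&2; return 2 ;;
  esac
  # Everything after "--" is the application command. It is joined into one
  # string here for display only; a real wrapper would run "$@" unchanged.
  app_cmd="$*"
}

parse_wrapper_args -c /tmp/coord -b fuse -- app param1 param2
echo "coord=$coord_dir kind=$sidecar_kind app=$app_cmd"
```

Because option parsing ends at `--`, application parameters that themselves begin with a dash are never mistaken for wrapper options.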

Meanwhile, each sidecar's datamon-specific batteries have their corresponding usages.

##### `gcr.io/onec-co/datamon-fuse-sidecar` -- `wrap_datamon.sh`

Provides filesystem representations (i.e. a folder) of [datamon bundles](#data-modeling).
Since bundles' filelists are serialized filesystem representations,
the `wrap_datamon.sh` interface is tightly coupled to that of the self-documenting
`datamon` binary itself.

```shell
./wrap_datamon.sh -c <coord_dir> -d <bin_cmd_I> -d <bin_cmd_J> ...
```

where `<coordination_directory>` is an empty directory in a shared volume
(an
[`emptyDir`](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir)
using memory-backed storage suffices). In the case of Argo workflows in particular,
the empty directory (and not necessarily the volume) ought to be specific to a
particular DAG node (i.e. Kubernetes pod). Each node uses a unique directory.
Meanwhile, `<application_command>` is the data-science application command exactly as it
would appear without the wrapper script.
That is, the wrapper script relies on the
[conventional UNIX syntax](http://zsh.sourceforge.net/Guide/zshguide02.html#l11)
for stating that options to a command are done being declared.

Meanwhile, `wrap_datamon.sh` similarly accepts a single `-c` option to specify the
location of the coordination directory.
Additionally, `wrap_datamon.sh` accepts a `-d` option. The parameters to this option are
among the standard Datamon CLI commands:

* `config`
* `bundle mount`
* `bundle upload`
* `-c` the same coordination directory passed to `wrap_application.sh`
* `-d` all parameters, exactly as passed to the datamon binary, except as a
single scalar (quoted) parameter, for one of the following commands
- `config` sets user information associated with any bundles created by the node
- `bundle mount` provides sources for data-science applications
- `bundle upload` provides sinks for data-science applications

Multiple (or none) `bundle mount` and `bundle upload` commands may be specified,
and at most one `config` command is allowed so that an example `wrap_datamon.sh`
YAML might be

```yaml
command: ["./wrap_datamon.sh"]
args: ["-c", "/tmp/coord", "-d", "config create --name \"Coord\" --email [email protected]", "-d", "bundle upload --path /tmp/upload --message \"result of container coordination demo\" --repo ransom-datamon-test-repo --label coordemo", "-d", "bundle mount --repo ransom-datamon-test-repo --label testlabel --destination /tmp --mount /tmp/mount --stream"]
args: ["-c", "/tmp/coord", "-d", "config create --name \"Coord\" --email [email protected]", "-d", "bundle upload --path /tmp/upload --message \"result of container coordination demo\" --repo ransom-datamon-test-repo --label coordemo", "-d", "bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream"]
```

or from the shell

```shell
./wrap_datamon.sh -c /tmp/coord -d 'config create --name "Coord" --email [email protected]' -d 'bundle upload --path /tmp/upload --message "result of container coordination demo" --repo ransom-datamon-test-repo --label coordemo' -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --destination /tmp --mount /tmp/mount --stream'
./wrap_datamon.sh -c /tmp/coord -d 'config create --name "Coord" --email [email protected]' -d 'bundle upload --path /tmp/upload --message "result of container coordination demo" --repo ransom-datamon-test-repo --label coordemo' -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream'
```
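
The constraints on `-d` (each value is one quoted scalar, any number of `bundle mount` and `bundle upload` commands, at most one `config` command) can be sketched as a validation pass. This is an assumption-laden illustration rather than the shipped `wrap_datamon.sh`: the function name `check_datamon_cmds` and the counter variables are hypothetical.

```shell
# Hypothetical sketch (not the shipped wrap_datamon.sh): classify each -d
# value and enforce the "at most one config command" rule.
check_datamon_cmds() {
  n_config=0
  n_mount=0
  n_upload=0
  OPTIND=1
  while getopts 'c:d:' opt; do
    case "$opt" in
      c) : ;;  # coordination directory; not inspected in this sketch
      d)
        # $OPTARG holds one whole datamon command line as a single scalar.
        case "$OPTARG" in
          'config '*)        n_config=$((n_config + 1)) ;;
          'bundle mount '*)  n_mount=$((n_mount + 1)) ;;
          'bundle upload '*) n_upload=$((n_upload + 1)) ;;
          *) echo "unsupported datamon command: $OPTARG" >&2; return 2 ;;
        esac
        ;;
      *) return 2 ;;
    esac
  done
  if [ "$n_config" -gt 1 ]; then
    echo 'at most one config command is allowed' >&2
    return 2
  fi
}

check_datamon_cmds -c /tmp/coord \
  -d 'config create --name "Coord"' \
  -d 'bundle mount --repo ransom-datamon-test-repo --label testlabel --mount /tmp/mount --stream' \
  && echo "config=$n_config mount=$n_mount upload=$n_upload"
```

Quoting each command as a single word is what lets the wrapper tell where one datamon invocation ends and the next begins.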

where, in particular, the `-d` (Datamon) options passed to the shell wrapper are
scalars.
##### `gcr.io/onec-co/datamon-pg-sidecar` -- `wrap_datamon_pg.sh`

Provides Postgres databases as bundles and vice versa.
Since the datamon binary does not include any Postgres-specific notions,
the UI here is more decoupled than that of `wrap_datamon.sh`.

```shell
./wrap_datamon_pg.sh -c <coord_dir> -x [db_opts_1] -x [db_opts_2] ...
```

where `-c` is as before and each `-x` flag delimits per-database options.
Every database created in the sidecar is uploaded to datamon, and the `db_opts`
that affect the availability of the database from the application container
or the upload of the database to datamon are

* `-p` IP port used to connect to the database
* `-m` message written to the database's bundle
* `-l` label written to the bundle
* `-r` repo containing bundle

Additionally, databases may be initialized with databases previously
stored in datamon by this sidecar. The slightly expanded form of a
`[db_opts]` in the above usage is then

```
-p <port> [sink_opts] -s [source_opts]
```
where `<port>` and `[sink_opts]` are as above, and `[source_opts]` are
* `-r` repo containing the source bundle
* `-l` label of the source bundle
* `-b` bundle id
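
How each `-x` flag delimits a group of per-database options can be sketched as follows. This is a hypothetical illustration, not the logic of `wrap_datamon_pg.sh`: the function name `split_db_opts`, the port numbers, and the repo and label values are all made up for the example.

```shell
# Hypothetical sketch (not the shipped wrap_datamon_pg.sh): print one line
# per database, holding the options that follow each -x flag.
split_db_opts() {
  group=''
  started=0
  for arg in "$@"; do
    if [ "$arg" = '-x' ]; then
      # A new -x closes the previous group, if there was one.
      [ "$started" -eq 1 ] && printf '%s\n' "$group"
      group=''
      started=1
    elif [ "$started" -eq 1 ]; then
      # Joined with spaces for display only; word boundaries are not kept.
      group="${group:+$group }$arg"
    fi
    # Arguments before the first -x (such as -c <coord_dir>) are skipped.
  done
  [ "$started" -eq 1 ] && printf '%s\n' "$group"
  return 0
}

split_db_opts -c /tmp/coord \
  -x -p 5430 -m 'first database' -l db-one -r example-repo \
  -x -p 5429 -s -r example-repo -l db-one
```

In the second group the options after `-s` describe the source bundle, which is why `-r` and `-l` can appear with a different meaning than in the sink options before it.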
# OS X install guide
60 changes: 60 additions & 0 deletions hack/fuse-demo/coord-app-pg.Dockerfile
@@ -0,0 +1,60 @@
FROM debian

RUN apt-get update &&\
curl -sL https://deb.nodesource.com/setup_10.x | bash &&\
apt-get install -y \
curl \
postgresql \
ca-certificates \
gnupg \
zsh \
vim \
&&\
apt-get autoremove -yqq &&\
apt-get clean -y &&\
apt-get autoclean -yqq &&\
rm -rf \
/tmp/* \
/var/tmp/* \
/var/lib/apt/lists/* \
/usr/share/doc/* \
/usr/share/locale/* \
/var/cache/debconf/*-old

RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -

## BEGIN tini

ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-amd64 /tmp/tini-static-amd64
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini-static-amd64.asc /tmp/tini-static-amd64.asc

# omitting gpg verification during development/demo
# RUN for key in \
# 595E85A6B1B4779EA4DAAEC70B588DFF0527A9B7 \
# ; do \
# gpg --keyserver hkp://pgp.mit.edu:80 --recv-keys "$key" || \
# gpg --keyserver hkp://ipv4.pool.sks-keyservers.net --recv-keys "$key" || \
# gpg --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys "$key" ; \
# done

# RUN gpg --verify /tmp/tini-static-amd64.asc

RUN install -m 0755 /tmp/tini-static-amd64 /bin/tini

## END tini

RUN useradd -u 1020 -ms /bin/bash developer
RUN groupadd -g 2000 developers
RUN usermod -g developers developer
RUN chown -R developer:developers /bin/tini

ADD hack/fuse-demo/mock_application_pg.sh .
RUN chmod a+x mock_application_pg.sh

USER developer

RUN touch ~/.zshrc

ENTRYPOINT [ "/tmp/coord/.scripts/wrap_application.sh"]
CMD [ "./mock_application_pg.sh"]