Update readme and arch diagram
amoeba committed Aug 7, 2022
1 parent d8fbec1 commit 102941f
Showing 3 changed files with 2,431 additions and 49 deletions.
82 changes: 33 additions & 49 deletions README.md
# Slinky, the DataONE Graph Store

A Linked Open Data interface to [DataONE](https://dataone.org) designed to run on [Kubernetes](https://kubernetes.io).

## Overview

Slinky is essentially just a background job system hooked up to an RDF triplestore that converts DataONE's holdings into Linked Open Data.

It's made up of five main components:

1. `web`: Provides a public-facing API over Slinky
2. `virtuoso`: Acts as the backend graph store
3. `scheduler`: An [RQScheduler](https://github.com/rq/rq-scheduler) process that enqueues repeated jobs in a cron-like fashion
4. `worker`: One or more [RQ](http://python-rq.org/) worker processes that run enqueued jobs
5. `redis`: A [Redis](http://redis.io) instance to act as a persistent store for the `worker` and for saving application state

![slinky architecture diagram showing the components in the list above connected with arrows](./docs/slinky-architecture.png)
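The scheduler/queue/worker flow above can be sketched with Python's standard library. This is an illustrative stand-in for RQ and Redis (an in-process queue and thread instead of a Redis-backed queue and worker process), not Slinky's actual code:

```python
import queue
import threading

job_queue = queue.Queue()  # stands in for the Redis-backed RQ queue
results = []


def update_graph():
    # A real Slinky job would fetch new DataONE datasets and insert triples;
    # here we just return a marker value.
    return "graph updated"


def worker():
    # Stands in for an RQ worker: pull jobs off the queue and run them.
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: shut down
            break
        results.append(job())
        job_queue.task_done()


# The scheduler's role is to enqueue the same job on a repeating schedule.
for _ in range(3):
    job_queue.put(update_graph)

t = threading.Thread(target=worker)
t.start()
job_queue.put(None)  # tell the worker to stop once the queue drains
t.join()
print(results)  # → ['graph updated', 'graph updated', 'graph updated']
```

In the real deployment, the queue lives in Redis so the scheduler and any number of workers can run as separate pods.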

As the service runs, the graph store will be continuously updated as datasets are added/updated on [DataONE](https://www.dataone.org/).
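For illustration, a dataset in the graph might be described with statements like the following. This is a hypothetical Turtle snippet using schema.org terms; the identifier is made up and the vocabulary Slinky actually uses may differ:

```turtle
@prefix schema: <https://schema.org/> .

<https://dataone.org/datasets/example-pid>
    a schema:Dataset ;
    schema:name "Example dataset title" ;
    schema:dateModified "2022-08-07" .
```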

### Contents of This Repository

```text
.
├── d1lod # Python package used by services
├── docs # Documentation
├── helm # A Helm chart for deploying on Kubernetes
```

## What's in the graph?
However, a [Docker Compose](https://docs.docker.com/compose/) file has been provided.

### Deployment on Kubernetes

To make installing Slinky straightforward, we provide a [Helm](https://helm.sh) chart.

Prerequisites are:

- A [Kubernetes](https://kubernetes.io) cluster
- [Helm](https://helm.sh)

Install the Chart by running:

```sh
cd helm
helm install $YOUR_NAME .
```

See the [README](./helm/README.md) for more information, including how to customize installation of the Chart to support Ingress and persistent storage.

### Local Deployment with Docker Compose

After a few minutes, you should be able to visit http://localhost:9181 to see the RQ dashboard.

### Virtuoso

#### Deployment

The virtuoso deployment is a custom image that includes a runtime script
for enabling SPARQL updates. This script runs alongside the Virtuoso
startup script in a separate process and completes when the Virtuoso
server comes online. This subsystem is fully automated and shouldn't need
manual intervention during deployments.

#### Protecting the Virtuoso SPARQL Endpoint
_This should be done for all new production deployments_.
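Once deployed, the endpoint can be queried over HTTP. A minimal sketch in Python's standard library, assuming Virtuoso's default HTTP port (8890) and `/sparql` path; adjust both for your deployment:

```python
import urllib.parse
import urllib.request


def sparql_request(endpoint, query):
    """Build a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


req = sparql_request(
    "http://localhost:8890/sparql",  # assumed endpoint; not confirmed by this repo
    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
)
# urllib.request.urlopen(req) would execute the query against a running endpoint.
```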

### Scaling Workers

To scale the number of workers processing datasets beyond the default, run:

```sh
kubectl scale --replicas=3 deployments/{dataset-pod-name}
kubectl scale --replicas=3 deployments/{default-pod-name}
```

## Testing

A test suite is provided for the `d1lod` Python package used by workers.
Tests are written using [pytest](http://pytest.org).
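For illustration, a pytest test is just a function whose name starts with `test_` containing plain assertions. This example is hypothetical, not taken from the `d1lod` suite:

```python
# test_example.py -- run with `pytest test_example.py`


def normalize_identifier(pid: str) -> str:
    # Hypothetical helper: trim whitespace from a persistent identifier.
    return pid.strip()


def test_normalize_identifier():
    assert normalize_identifier("  doi:10.5063/EXAMPLE  ") == "doi:10.5063/EXAMPLE"
```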

See the [d1lod README](./d1lod/README.md) for more information.