Update readme and arch diagram
amoeba committed Aug 7, 2022
1 parent d8fbec1 commit 102941f
Showing 3 changed files with 2,431 additions and 49 deletions.
82 changes: 33 additions & 49 deletions README.md
# Slinky, the DataONE Graph Store

A Linked Open Data interface to [DataONE](https://dataone.org) designed to run on [Kubernetes](https://kubernetes.io).

## Overview

Slinky is essentially just a background job system hooked up to an RDF triplestore that converts DataONE's holdings into Linked Open Data.

It's made up of five main components:

1. `web`: Provides a public-facing API over Slinky
2. `virtuoso`: Acts as the backend graph store
3. `scheduler`: An [RQScheduler](https://github.com/rq/rq-scheduler) process that enqueues repeated jobs in a cron-like fashion
4. `worker`: One or more [RQ](http://python-rq.org/) worker processes that run enqueued jobs
5. `redis`: A [Redis](http://redis.io) instance to act as a persistent store for the `worker` and for saving application state

![slinky architecture diagram showing the components in the list above connected with arrows](./docs/slinky-architecture.png)
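The scheduler/queue/worker flow above can be sketched with Python's standard library. This is an illustrative stand-in for RQ and Redis (an in-process queue and thread instead of a Redis-backed queue and worker process), not Slinky's actual code:

```python
import queue
import threading

job_queue = queue.Queue()  # stands in for the Redis-backed RQ queue
results = []


def update_graph():
    # A real Slinky job would fetch new DataONE datasets and insert triples;
    # here we just return a marker value.
    return "graph updated"


def worker():
    # Stands in for an RQ worker: pull jobs off the queue and run them.
    while True:
        job = job_queue.get()
        if job is None:  # sentinel value: shut down
            break
        results.append(job())
        job_queue.task_done()


# The scheduler's role is to enqueue the same job on a repeating schedule.
for _ in range(3):
    job_queue.put(update_graph)

t = threading.Thread(target=worker)
t.start()
job_queue.put(None)  # tell the worker to stop once the queue drains
t.join()
print(results)  # → ['graph updated', 'graph updated', 'graph updated']
```

In the real deployment, the queue lives in Redis so the scheduler and any number of workers can run as separate pods.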

As the service runs, the graph store will be continuously updated as datasets are added/updated on [DataONE](https://www.dataone.org/).
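For illustration, a dataset in the graph might be described with statements like the following. This is a hypothetical Turtle snippet using schema.org terms; the identifier is made up and the vocabulary Slinky actually uses may differ:

```turtle
@prefix schema: <https://schema.org/> .

<https://dataone.org/datasets/example-pid>
    a schema:Dataset ;
    schema:name "Example dataset title" ;
    schema:dateModified "2022-08-07" .
```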

### Contents of This Repository

```text
.
├── d1lod # Python package used by services
├── docs # Documentation
├── helm # A Helm chart for deploying on Kubernetes
```

## What's in the graph?
However, a [Docker Compose](https://docs.docker.com/compose/) file has been provided.

### Deployment on Kubernetes

To make installing Slinky straightforward, we provide a [Helm](https://helm.sh) chart.

Prerequisites are:

- A [Kubernetes](https://kubernetes.io) cluster
- [Helm](https://helm.sh)

Install the Chart by running:

```sh
cd helm
helm install $YOUR_NAME .
```

See the [README](./helm/README.md) for more information, including how to customize installation of the Chart to support Ingress and persistent storage.

### Local Deployment with Docker Compose

After a few minutes, you should be able to visit http://localhost:9181 to see the RQ dashboard.

### Virtuoso

#### Deployment

The virtuoso deployment is a custom image that includes a runtime script
for enabling SPARQL updates. This script runs alongside the Virtuoso
startup script in a separate process and completes when the Virtuoso
server comes online. This subsystem is fully automated and shouldn't need
manual intervention during deployments.

#### Protecting the Virtuoso SPARQL Endpoint
_This should be done for all new production deployments_.
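Once deployed, the endpoint can be queried over HTTP. A minimal sketch in Python's standard library, assuming Virtuoso's default HTTP port (8890) and `/sparql` path; adjust both for your deployment:

```python
import urllib.parse
import urllib.request


def sparql_request(endpoint, query):
    """Build a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


req = sparql_request(
    "http://localhost:8890/sparql",  # assumed endpoint; not confirmed by this repo
    "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
)
# urllib.request.urlopen(req) would execute the query against a running endpoint.
```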

### Scaling Workers

To scale the number of workers processing datasets beyond the default, run:

```sh
kubectl scale --replicas=3 deployments/{dataset-pod-name}
kubectl scale --replicas=3 deployments/{default-pod-name}
```

## Testing

A test suite is provided for the `d1lod` Python package used by workers.
Tests are written using [pytest](http://pytest.org).
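For illustration, a pytest test is just a function whose name starts with `test_` containing plain assertions. This example is hypothetical, not taken from the `d1lod` suite:

```python
# test_example.py -- run with `pytest test_example.py`


def normalize_identifier(pid: str) -> str:
    # Hypothetical helper: trim whitespace from a persistent identifier.
    return pid.strip()


def test_normalize_identifier():
    assert normalize_identifier("  doi:10.5063/EXAMPLE  ") == "doi:10.5063/EXAMPLE"
```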

See the [d1lod README](./d1lod/README.md) for more information.