Add data provider #69

Open · wants to merge 24 commits into base: main
23 changes: 23 additions & 0 deletions .github/workflows/ci.yml
@@ -49,6 +49,29 @@ jobs:
env:
TARGETPLATFORM: linux/amd64

data-provider-codegen-check:
> **Contributor Author:** Run the codegen script and fail if there's any git diff. Tested this by making a small change in one of the individual config_schema.json files without running the codegen script and confirming the GitHub Action fails.

runs-on: ubuntu-latest

steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: 3.10.16

- name: Run code generation script
run: |
python ./apps/scripts/update_shared_data_provider_code.py

- name: Check for changes
run: |
if [[ $(git status --porcelain) ]]; then
echo "Generated code is out of sync. Please run the script and commit the changes."
exit 1
fi

test-evm:
runs-on: ubuntu-latest

10 changes: 10 additions & 0 deletions apps/README.md
@@ -17,3 +17,13 @@ The Stork Network receives signed data feeds from publishers and aggregates them
The easiest way to become a Stork Publisher is to run the Stork Publisher Agent docker container on your infrastructure and send price updates to the Agent through a local websocket. The Stork Publisher Agent will sign your price updates with your private key and send them to the Stork Network.

See [Stork Publisher Agent Docs](docs/publisher_agent.md).

## Data Provider

To publish data into the Stork Network, a Publisher first needs to fetch that data from some data source.

The Stork Data Provider is an app that lets users configure a list of data feeds, from various sources, which they would like to output. These data streams are output in a format the Publisher Agent can easily receive, so a user can run the Data Provider alongside the Publisher Agent to source data, sign it, and send it to the Stork Network without writing any code.

It is also an open-source framework to which users can easily contribute new data integrations.

See [Stork Data Provider Docs](docs/data_provider.md).
47 changes: 47 additions & 0 deletions apps/cmd/data_provider/main.go
@@ -0,0 +1,47 @@
package main

import (
"log"
"os"
"time"

"github.com/Stork-Oracle/stork-external/apps/lib/data_provider"
"github.com/rs/zerolog"
"github.com/rs/zerolog/pkgerrors"
"github.com/spf13/cobra"
)

var verbose bool

func main() {
rootCmd := &cobra.Command{
Use: "stork-data-provider",
Short: "Stork CLI tool for fetching prices from data sources",
CompletionOptions: cobra.CompletionOptions{
HiddenDefaultCmd: true,
},
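		// Configure global zerolog settings before any subcommand runs.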
PersistentPreRun: func(cmd *cobra.Command, args []string) {
zerolog.TimeFieldFormat = time.RFC3339Nano
zerolog.DurationFieldUnit = time.Nanosecond
zerolog.ErrorStackMarshaler = pkgerrors.MarshalStack

var logLevel zerolog.Level
if verbose {
logLevel = zerolog.DebugLevel
} else {
logLevel = zerolog.InfoLevel
}

// set global log level
zerolog.SetGlobalLevel(logLevel)
},
}
rootCmd.PersistentFlags().BoolVar(&verbose, "verbose", false, "Enable verbose logging")

rootCmd.AddCommand(data_provider.DataProviderCmd)

	if err := rootCmd.Execute(); err != nil {
		log.Fatal(err) // log.Fatal already exits with status 1
	}
}
52 changes: 52 additions & 0 deletions apps/docs/data_provider.md
@@ -0,0 +1,52 @@
# Data Provider
The Stork Data Provider is a framework for pulling arbitrary numeric data from many sources. It can be used on its own, or run alongside the Stork Publisher Agent to sign the data and send it to the Stork Network.

## Adding a New Data Source
If you want to report data from a data source which does not already have an [integration](../lib/data_provider/sources), you can add your own.

To add a new source:
1. Add a [package](../lib/data_provider/sources/random) in the [sources directory](../lib/data_provider/sources) with your data source's name
1. Run `python3 ./apps/scripts/update_shared_data_provider_code.py` to generate some framework code so that the framework is aware of your new source.
> **Contributor:** Numbers here seem to have gotten messed up somehow.

> **Contributor Author:** They're intentionally all 1 so that Markdown interprets them as an ordered list (it might be easier to review the Markdown preview for this file). This way we can reorder the steps or add/remove steps without needing to update every number.

> **Contributor:** Oh interesting. Is that a common practice? Not sure I've seen that before.

1. Add a [data_source.go](../lib/data_provider/sources/random/data_source.go) and implement a DataSource object conforming to the [DataSource interface](../lib/data_provider/types/model.go). This object will contain most of your source-specific logic, but it can leverage tools like the [scheduler](../lib/data_provider/sources/scheduler.go) or [ethereum_utils](../lib/data_provider/sources/ethereum_utils.go) as needed.
1. Add a [data_source_test.go](../lib/data_provider/sources/random/data_source_test.go) to unit test your data source.
1. Add a [config.go](../lib/data_provider/sources/random/config.go) which defines a configuration object corresponding to a single data feed in your source
1. This config object must include a `DataSource` field.
1. Add a [JSON Schema](https://json-schema.org/) [config](../lib/data_provider/configs/resources/source_config_schemas/random.json) in the configs package defining the structure of the configuration object in [config.go](../lib/data_provider/sources/random/config.go)
1. Add a [config test](../lib/data_provider/configs/source_config_tests/random_test.go) to the configs package which tests that a valid Data Provider config json using your source:
1. Passes schema validations
1. Can be deserialized into your configuration object correctly
1. Can be used to extract your DataSourceId using `GetSourceSpecificConfig`
1. Add an [init.go](../lib/data_provider/sources/random/init.go) to your package. This file can be almost identical for every source (a self-contained sketch of the registration pattern appears after this list). This file is responsible for:
1. Defining the DataSourceId variable for this source (which must be the same as the package name)
1. Defining and registering a DataSourceFactory (which will just call to your DataSource constructor)
1. Asserting the source's DataSource and DataSourceFactory satisfy our interfaces
1. Defining a function to deserialize the source's config object
1. Submit a Pull Request so other developers can use your new data source!
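
To make the registration pattern concrete, here is a self-contained toy sketch of what an init.go contributes, as referenced above. The names used here (`DataSourceId`, `DataSource`, `DataSourceFactory`, `RegisterDataSourceFactory`, `RunDataSource`) and the `minValue`/`maxValue` config fields are simplified stand-ins, not the framework's real API — see [types/model.go](../lib/data_provider/types/model.go) and an existing package like [random](../lib/data_provider/sources/random) for the actual signatures:
```
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the real interfaces in types/model.go.
type DataSourceId string

type DataSource interface {
	RunDataSource(updates chan<- float64)
}

type DataSourceFactory func(rawConfig json.RawMessage) (DataSource, error)

// A registry keyed by DataSourceId, mimicking what the generated
// framework code is responsible for.
var factories = map[DataSourceId]DataSourceFactory{}

func RegisterDataSourceFactory(id DataSourceId, factory DataSourceFactory) {
	factories[id] = factory
}

// --- What a source package's init.go contributes ---

// Must match the package name.
const RandomDataSourceId DataSourceId = "random"

type randomConfig struct {
	DataSource string  `json:"dataSource"` // required field per the docs
	MinValue   float64 `json:"minValue"`   // hypothetical source-specific fields
	MaxValue   float64 `json:"maxValue"`
}

type randomDataSource struct{ config randomConfig }

// Compile-time assertion that randomDataSource satisfies DataSource.
var _ DataSource = (*randomDataSource)(nil)

func (r *randomDataSource) RunDataSource(updates chan<- float64) {
	// A real source would fetch from an API or a chain here.
	updates <- (r.config.MinValue + r.config.MaxValue) / 2
}

func init() {
	RegisterDataSourceFactory(RandomDataSourceId, func(raw json.RawMessage) (DataSource, error) {
		var config randomConfig
		if err := json.Unmarshal(raw, &config); err != nil {
			return nil, fmt.Errorf("failed to deserialize random config: %v", err)
		}
		return &randomDataSource{config: config}, nil
	})
}

func main() {
	source, err := factories[RandomDataSourceId](json.RawMessage(`{"dataSource": "random", "minValue": 1, "maxValue": 3}`))
	if err != nil {
		panic(err)
	}
	updates := make(chan float64, 1)
	source.RunDataSource(updates)
	fmt.Println(<-updates) // prints 2
}
```
The key point of the pattern is that each source package registers its factory at import time, so the framework can construct any configured source by its id without a central switch statement.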

## Configuration
The Data Provider can report many feeds, each sourced from any of the data sources implemented in [sources](../lib/data_provider/sources).

You can configure the Data Provider by passing it a [config json file](../../sample.data-provider.config.json) which can be deserialized into a [DataProviderConfig](../lib/data_provider/types/model.go) object.

The `sources` tag is a list of configurations for different feeds, where each feed has a unique `id` and a `config` which can be deserialized into the appropriate [source config](../lib/data_provider/sources/random/config.go).
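
For illustration only, a config using the [random source](../lib/data_provider/sources/random) might look roughly like the following. The key names inside `config` are hypothetical — each source's JSON schema defines the real fields, so check the [sample config](../../sample.data-provider.config.json) and the relevant schema rather than copying this verbatim:
```
{
  "sources": [
    {
      "id": "MY_RANDOM_FEED",
      "config": {
        "dataSource": "random",
        "minValue": 0,
        "maxValue": 100
      }
    }
  ]
}
```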

## Running Local Code
You can test the Data Provider locally by running:
```
go run apps/cmd/data_provider/main.go start -c ./sample.data-provider.config.json --verbose
```
You will most likely want to replace `./sample.data-provider.config.json` with a more useful config JSON. Also make sure any required environment variables (e.g. API keys) are set in your local environment.

Running in `--verbose` mode with no output address set will just log every price update. If you want to actually send updates somewhere (like the websocket server of your local Publisher Agent), you can pass an output address flag:
```
go run apps/cmd/data_provider/main.go start -c ./sample.data-provider.config.json -o ws://localhost:5216/
```

## Running Published Docker Image
If all the data sources you want to use are already merged into Stork's repo, you can just pull the latest published Data Provider docker image and supply your own config:
```
docker run --platform linux/arm64 --pull always --restart always --name data-provider -v ./sample.data-provider.config.json:/etc/config.json -d --log-opt max-size=1g storknetwork/data-provider:v1.0.4 start -c /etc/config.json -o ws://localhost:5216/
```



51 changes: 51 additions & 0 deletions apps/lib/data_provider/command.go
@@ -0,0 +1,51 @@
package data_provider

import (
"fmt"
"time"

"github.com/Stork-Oracle/stork-external/apps/lib/data_provider/utils"
"github.com/rs/zerolog"
"github.com/rs/zerolog/pkgerrors"
"github.com/spf13/cobra"
)

var DataProviderCmd = &cobra.Command{
Use: "start",
Short: "Start a process to fetch prices from data sources",
RunE: runDataProvider,
}

// required
const ConfigFilePathFlag = "config-file-path"

// optional; with no output address set, updates are just logged
const OutputAddressFlag = "output-address"

func init() {
DataProviderCmd.Flags().StringP(ConfigFilePathFlag, "c", "", "the path of your config json file")
DataProviderCmd.Flags().StringP(OutputAddressFlag, "o", "", "a string representing an output address (e.g. ws://localhost:5216/)")

DataProviderCmd.MarkFlagRequired(ConfigFilePathFlag)
}

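// runDataProvider loads and validates the config file, then starts a runner
// that fetches updates from the configured sources and sends them to the
// output address (or just logs them if no output address is set).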
func runDataProvider(cmd *cobra.Command, args []string) error {
configFilePath, _ := cmd.Flags().GetString(ConfigFilePathFlag)
outputAddress, _ := cmd.Flags().GetString(OutputAddressFlag)

mainLogger := utils.MainLogger()

zerolog.TimeFieldFormat = time.RFC3339Nano
zerolog.DurationFieldUnit = time.Nanosecond
zerolog.ErrorStackMarshaler = pkgerrors.MarshalStack

mainLogger.Info().Msg("Starting data provider")

config, err := LoadConfig(configFilePath)
if err != nil {
return fmt.Errorf("error loading config: %v", err)
}

runner := NewDataProviderRunner(*config, outputAddress)
runner.Run()

return nil
}
18 changes: 18 additions & 0 deletions apps/lib/data_provider/config.go
@@ -0,0 +1,18 @@
package data_provider

import (
"fmt"
"os"

"github.com/Stork-Oracle/stork-external/apps/lib/data_provider/configs"
"github.com/Stork-Oracle/stork-external/apps/lib/data_provider/types"
)

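// LoadConfig reads the config json at configPath, validates it against the
// embedded schemas and deserializes it into a DataProviderConfig.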
func LoadConfig(configPath string) (*types.DataProviderConfig, error) {
configBytes, err := os.ReadFile(configPath)
if err != nil {
return nil, fmt.Errorf("failed to read config file: %v", err)
}

return configs.LoadConfigFromBytes(configBytes)
}
90 changes: 90 additions & 0 deletions apps/lib/data_provider/configs/config.go
@@ -0,0 +1,90 @@
package configs

import (
"embed"
"encoding/json"
"fmt"
"path/filepath"

"github.com/Stork-Oracle/stork-external/apps/lib/data_provider/types"
"github.com/xeipuuv/gojsonschema"
)

//go:embed resources
var resourcesFS embed.FS

const configSchemaPath = "resources/data_provider_config.schema.json"

// exposed for testing
func LoadConfigFromBytes(configBytes []byte) (*types.DataProviderConfig, error) {
schema, err := loadSchema(resourcesFS)
if err != nil {
return nil, fmt.Errorf("error loading schema: %v", err)
}

err = validateConfig(configBytes, schema)
if err != nil {
return nil, fmt.Errorf("config file is invalid: %v", err)
}

var config types.DataProviderConfig
if err := json.Unmarshal(configBytes, &config); err != nil {
return nil, fmt.Errorf("failed to unmarshal config file: %v", err)
}
return &config, nil
}

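// loadSchema compiles the top-level config schema together with every
// per-source schema embedded under resources/source_config_schemas.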
func loadSchema(resourcesFS embed.FS) (*gojsonschema.Schema, error) {
schemaContent, err := resourcesFS.ReadFile(configSchemaPath)
if err != nil {
return nil, fmt.Errorf("failed to read schema file for %s: %v", configSchemaPath, err)
}

loader := gojsonschema.NewSchemaLoader()

// add all source schema configs to schema loader
sourceSchemaDir := "resources/source_config_schemas"
sourceSchemaFiles, err := resourcesFS.ReadDir(sourceSchemaDir)
if err != nil {
return nil, err
}
for _, sourceSchemaFile := range sourceSchemaFiles {
sourceSchemaPath := filepath.Join(sourceSchemaDir, sourceSchemaFile.Name())
schemaBytes, err := resourcesFS.ReadFile(sourceSchemaPath)
if err != nil {
return nil, err
}
schemaFileLoader := gojsonschema.NewBytesLoader(schemaBytes)
err = loader.AddSchema(sourceSchemaPath, schemaFileLoader)
if err != nil {
return nil, err
}
}

topLevelSchemaLoader := gojsonschema.NewStringLoader(string(schemaContent))

schema, err := loader.Compile(topLevelSchemaLoader)
if err != nil {
return nil, fmt.Errorf("failed to parse schema for %s: %v", configSchemaPath, err)
}

return schema, nil
}

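// validateConfig checks the raw config JSON against the compiled schema and
// returns an error describing any violations.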
func validateConfig(configBytes []byte, schema *gojsonschema.Schema) error {
var dataProviderConfig map[string]interface{}
if err := json.Unmarshal(configBytes, &dataProviderConfig); err != nil {
return fmt.Errorf("failed to parse config JSON: %v", err)
}

configLoader := gojsonschema.NewGoLoader(dataProviderConfig)
result, err := schema.Validate(configLoader)
if err != nil {
return fmt.Errorf("error validating config: %v", err)
}
if !result.Valid() {
return fmt.Errorf("config is invalid: %v", result.Errors())
}

return nil
}