Version 1: https://github.com/cowprotocol/dune-sync
Currently, it is narrowly scoped, but could easily be generalized. For example:
- it is specific to syncing AppHash (and a few other structures) from a local DB to Dune.
- it contains some general logic for when to truncate/write without duplication using block numbers, although this is still very specific to the use case.
The goal is to extend this into a general-purpose tool that can "sync" in either direction:
- Dune to Local: archives data from Dune via their API, such as historical prices or arbitrary aggregated data returned from a Dune Query. This is, presumably, intended to save end users on API costs and give them quick/easy/local access to well-structured and readily available data.
- Local to Dune: some off-chain server collects application-specific metadata (e.g. CowSwap's App Data, Solver Competition, MEV Blocker's requests or even ERC4337 UserOperation requests) that is relevant to the community for the sake of transparency, availability, application insight and overall protocol-data completeness.
Version 1 of this product implements the Local-to-Dune direction for the specific case of app data, order rewards, batch rewards and internalized token imbalances (cf. https://github.com/cowprotocol/dune-sync/blob/ce33b767face38303d6dc834b35603ae1a3081a5/src/models/tables.py#L8-L11); however, these are each "distinct" enough to warrant more general-purpose data handling.
Version 2 aims to improve on V1 by implementing bidirectional data "syncing" and by separating the fetch-insert logic from the Source-Destination configuration. This would allow anyone to run the service from a plain configuration file and a pre-existing binary image capable of ingesting that configuration and "doing the thing".
A sketch configuration file is proposed here, providing some insight into the expected final result.
Loosely, a configuration would consist of:
{Source, Destination}, along with whatever credentials are required for read/write access to them (i.e. read from source, write to destination),
and the program would implement fetch & insert logic for at least DuneAPI and Postgres.
Where, for example, V1 implements fetch for { DuneAPI, Postgres, IPFS } while insert is currently limited to S3 buckets.
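As a rough illustration, a minimal configuration might be structured like the following Python sketch (the field names and shape here are assumptions, not a finalized schema):

```python
from dataclasses import dataclass


@dataclass
class Source:
    """Where data is read from (e.g. a Dune query or a Postgres connection)."""
    kind: str          # "dune" or "postgres" -- hypothetical discriminator
    credentials: str   # e.g. a Dune API key or a DB connection string
    query: str         # a Dune query ID or raw SQL, depending on `kind`


@dataclass
class Destination:
    """Where data is written to (e.g. a Postgres table or a Dune upload)."""
    kind: str
    credentials: str
    table_name: str


@dataclass
class SyncJob:
    """One {Source, Destination} pair the binary would execute."""
    source: Source
    destination: Destination
```

The binary would then read a list of such jobs from a plain configuration file (format to be decided) and dispatch to the appropriate fetch & insert implementations. Beyond the plain fetch & insert logic, the tool will also need: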
- Result type inference and serialization into insert statements.
- Duplication and missing-data protection (see the sketch after this list).
- [optional] Destination table creation on non-existence.
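For duplication protection against a Postgres destination, one plausible approach is a uniqueness constraint (e.g. on block number) combined with `ON CONFLICT DO NOTHING`, as in this sketch (table and column names are hypothetical):

```python
import psycopg2
from psycopg2.extras import execute_values

# Hypothetical destination table keyed by block_number so that re-runs are idempotent.
INSERT_SQL = """
    INSERT INTO app_data (block_number, contents)
    VALUES %s
    ON CONFLICT (block_number) DO NOTHING
"""


def insert_without_duplicates(dsn: str, rows: list[tuple]) -> None:
    """Write rows, silently skipping block numbers that are already present."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        execute_values(cur, INSERT_SQL, rows)
```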
Query results from the Dune API provide the result columns, their types, and the rows returned by the query.
Type mapping: either configured or inferred? (A sketch of an inferred mapping follows the table below.)
| Trino (Dune) | Postgres |
|----------------------|-------------------|
| timestamp with time zone | TIMESTAMPTZ |
| bigint | BIGINT |
| decimal(38, 0) | NUMERIC(38, 0) |
| boolean | BOOLEAN |
| varbinary | BYTEA |
| varchar | VARCHAR |
| date | DATE |
| array(varbinary) | BYTEA[] |
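If the mapping is inferred rather than configured, a first pass could be a static lookup from the Trino type names reported in the Dune result metadata, roughly as follows (the `array(...)` handling and the fallback behaviour are assumptions):

```python
# Static Trino -> Postgres type lookup, mirroring the table above.
TRINO_TO_POSTGRES = {
    "timestamp with time zone": "TIMESTAMPTZ",
    "bigint": "BIGINT",
    "decimal(38, 0)": "NUMERIC(38, 0)",
    "boolean": "BOOLEAN",
    "varbinary": "BYTEA",
    "varchar": "VARCHAR",
    "date": "DATE",
}


def to_postgres_type(trino_type: str) -> str:
    """Map a Trino type name from Dune result metadata to a Postgres column type."""
    trino_type = trino_type.strip().lower()
    # array(varbinary) -> BYTEA[], array(bigint) -> BIGINT[], etc.
    if trino_type.startswith("array(") and trino_type.endswith(")"):
        return to_postgres_type(trino_type[len("array("):-1]) + "[]"
    try:
        return TRINO_TO_POSTGRES[trino_type]
    except KeyError:
        raise ValueError(f"no known Postgres mapping for Trino type {trino_type!r}")
```

The same mapping would feed the optional destination-table creation step when the table does not yet exist.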
We may eventually want to support custom types such as:
```sql
CREATE TYPE custom_type AS (
    address BYTEA,
    storagekeys BYTEA[]
);

CREATE TABLE example_table (
    timestamp_column TIMESTAMPTZ,
    bigint_column BIGINT,
    numeric_column NUMERIC(38, 0),
    boolean_column BOOLEAN,
    varbinary_column BYTEA,
    varchar_column VARCHAR,
    custom_array_column custom_type[], -- assuming custom_type has been created
    date_column DATE,
    varbinary_array_column BYTEA[]
);
```
Here is a sample result with variously typed columns that can be read from DuneAPI:
```sh
curl -X GET https://api.dune.com/api/v1/query/4108446/results \
-H "x-dune-api-key: $DUNE_API_KEY" \
| jq '{column_names: .result.metadata.column_names, row_count: .result.metadata.row_count}'
```
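The same endpoint can be hit from Python; here is a minimal sketch using `requests` directly (rather than the official client package), assuming the response shape shown by the `jq` filter above and the standard `result.rows` field:

```python
import os

import requests

DUNE_API_KEY = os.environ["DUNE_API_KEY"]


def fetch_latest_results(query_id: int) -> tuple[list[str], list[dict]]:
    """Return (column_names, rows) from the latest stored execution of a Dune query."""
    response = requests.get(
        f"https://api.dune.com/api/v1/query/{query_id}/results",
        headers={"x-dune-api-key": DUNE_API_KEY},
        timeout=30,
    )
    response.raise_for_status()
    result = response.json()["result"]
    return result["metadata"]["column_names"], result["rows"]
```

Together with the type mapping above, the column names and row values are enough to derive the insert statements for the destination.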
- User provides read access to a DB and a raw SQL statement that can be executed there (this could be a string or a file path).
- Data retrieved from Postgres is transformed into a CSV and posted to the Dune API's `upload_csv` endpoint (see the sketch below).
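A minimal sketch of that flow, assuming psycopg2 on the read side and the `upload_csv` method of the official `dune-client` Python package on the write side (the exact signature should be checked against the current package):

```python
import csv
import io

import psycopg2
from dune_client.client import DuneClient


def sync_local_to_dune(dsn: str, raw_sql: str, dune_api_key: str, table_name: str) -> None:
    """Run a user-supplied SQL statement against Postgres and push the rows to Dune as CSV."""
    # 1. Fetch rows (and column names) from the local database.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(raw_sql)
        columns = [col.name for col in cur.description]
        rows = cur.fetchall()

    # 2. Serialize to CSV in memory.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(columns)
    writer.writerows(rows)

    # 3. Upload via the Dune API's upload_csv endpoint.
    dune = DuneClient(dune_api_key)
    dune.upload_csv(table_name=table_name, data=buffer.getvalue())
```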
The Dune API client is currently released as a Python and a Node package.
There is also a Rust package that doesn't yet support `upload_csv` (although this is a minor detail).
- V1 is pure Python, which gives us a leg to stand on.
- All languages have suitable Postgres libraries for the task, with varying levels of type safety & convenience.
- We will want to have loose-ish control over the raw SQL statements because of the dynamic nature of the query results.