
@Oskang09
Last active December 21, 2019 16:50
golang dataloader

Description

A utility function for building a data loader that loads repeated data across relations. It has many other use cases as well.

Explanation

DataLoader is a generic utility that abstracts request batching and caching. It solves the N+1 problem when loading related data. The batch loader first collects all of the required keys (preload), then loads them together and returns each result to the place where it was requested (postload).
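To make the batching and caching idea concrete, here is a minimal, dependency-free sketch. It is not the graph-gophers/dataloader API. `Loader`, `NewLoader`, `BatchFunc` and `Calls` are hypothetical names used only for illustration. It also makes a simplifying assumption: it batches the keys passed to a single `LoadMany` call, whereas the real library collects keys from many concurrent `Load` calls over a short time window.

```go
package main

import "fmt"

// BatchFunc (hypothetical) loads many keys in one round trip, e.g. one SQL IN query.
type BatchFunc func(keys []int) map[int]string

// Loader is a hypothetical single-goroutine sketch of a data loader:
// it batches the keys of one call and caches every value it has loaded.
type Loader struct {
	batch BatchFunc
	cache map[int]string
	Calls int // number of times batch has been invoked
}

func NewLoader(batch BatchFunc) *Loader {
	return &Loader{batch: batch, cache: map[int]string{}}
}

// LoadMany returns one value per key, in the same order as keys.
func (l *Loader) LoadMany(keys ...int) []string {
	// Collect keys that are not cached yet, de-duplicated, in first-seen order.
	var missing []int
	seen := map[int]bool{}
	for _, k := range keys {
		if _, ok := l.cache[k]; !ok && !seen[k] {
			seen[k] = true
			missing = append(missing, k)
		}
	}
	// A single batch call for all missing keys.
	if len(missing) > 0 {
		l.Calls++
		for k, v := range l.batch(missing) {
			l.cache[k] = v
		}
	}
	// Answer every key (including repeats) from the cache.
	out := make([]string, len(keys))
	for i, k := range keys {
		out[i] = l.cache[k]
	}
	return out
}

func main() {
	var batches [][]int
	loader := NewLoader(func(keys []int) map[int]string {
		batches = append(batches, keys)
		res := map[int]string{}
		for _, k := range keys {
			res[k] = fmt.Sprintf("user-%d", k)
		}
		return res
	})
	fmt.Println(loader.LoadMany(1, 2, 3, 1, 2, 3)) // [user-1 user-2 user-3 user-1 user-2 user-3]
	fmt.Println(loader.LoadMany(2, 3))             // [user-2 user-3], served from cache
	fmt.Println(loader.Calls, batches)             // 1 [[1 2 3]]
}
```

Six requested keys produce one batch call with three distinct keys, and the second call never reaches the batch function at all.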

Example: Populating Data

Suppose there are 3 tables: Team stores team data, User stores user data, and TeamUser stores which users belong to which team. With the data below, we usually have to run at least 9 queries (almost every ORM does this):

```sql
SELECT * FROM Team; -- you will have 1, 2 as team id

SELECT userId FROM TeamUser WHERE id = 1; -- you will have 1, 2, 3 as userId for team id 1
SELECT userId FROM TeamUser WHERE id = 2; -- you will have 1, 2, 3 as userId for team id 2

SELECT * FROM User WHERE id = 1;
SELECT * FROM User WHERE id = 2;
SELECT * FROM User WHERE id = 3;

SELECT * FROM User WHERE id = 1;
SELECT * FROM User WHERE id = 2;
SELECT * FROM User WHERE id = 3;
```

With a data loader, the same requests complete in only 4 queries:

```sql
SELECT * FROM Team; -- you will have 1, 2 as team id
SELECT userId FROM TeamUser WHERE id IN ( 1 );
SELECT userId FROM TeamUser WHERE id IN ( 2 );
SELECT * FROM User WHERE id IN ( 1, 2, 3 );
```
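The last query is what the batch function ends up issuing: the userIds returned for every team are merged, duplicates are dropped, and one `IN` query is sent. The sketch below shows that step. `uniqueIDs` and `buildInQuery` are hypothetical helpers. The `?` placeholder style (used by MySQL and SQLite) is an assumption, since other drivers use different placeholders. The table name is assumed to be a trusted constant because it is formatted directly into the SQL.

```go
package main

import (
	"fmt"
	"strings"
)

// uniqueIDs (hypothetical) flattens several ID lists into one
// de-duplicated list, keeping first-seen order.
func uniqueIDs(groups ...[]int) []int {
	seen := make(map[int]bool)
	var ids []int
	for _, g := range groups {
		for _, id := range g {
			if !seen[id] {
				seen[id] = true
				ids = append(ids, id)
			}
		}
	}
	return ids
}

// buildInQuery (hypothetical) builds one parameterized IN query for all ids.
// Callers should skip the query when ids is empty: "IN ()" is not valid SQL.
func buildInQuery(table string, ids []int) (string, []interface{}) {
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = "?"
		args[i] = id
	}
	q := fmt.Sprintf("SELECT * FROM %s WHERE id IN (%s)", table, strings.Join(placeholders, ", "))
	return q, args
}

func main() {
	// userIds returned by the TeamUser queries for team 1 and team 2.
	ids := uniqueIDs([]int{1, 2, 3}, []int{1, 2, 3})
	q, args := buildInQuery("User", ids)
	fmt.Println(q)    // SELECT * FROM User WHERE id IN (?, ?, ?)
	fmt.Println(args) // [1 2 3]
}
```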

Example of response data

```
teams: [
    {
        id: 1,
        name: 'Cool Team',
        teammates: [
            {
                user: {
                    id: 2,
                    username: 'yuzylam',
                    display_name: 'Yuzy Lam',
                },
            },
            {
                user: {
                    id: 1,
                    username: 'wllee',
                    display_name: 'Lee Wang Lin',
                },
            },
            {
                user: {
                    id: 3,
                    username: 'oska',
                    display_name: 'Ng Sze Chen',
                },
            },
        ],
    },
    {
        id: 2,
        name: 'Another Cool Team',
        teammates: [
            {
                user: {
                    id: 2,
                    username: 'yuzylam',
                    display_name: 'Yuzy Lam',
                },
            },
            {
                user: {
                    id: 1,
                    username: 'wllee',
                    display_name: 'Lee Wang Lin',
                },
            },
            {
                user: {
                    id: 3,
                    username: 'oska',
                    display_name: 'Ng Sze Chen',
                },
            },
        ],
    },
]
```

This works because the loader only runs after the Team and TeamUser queries have finished and every userId belonging to each team has been passed to it. Each result can then be looked up by its id, and a repeated id is not queried again: the loader returns the result it already has.
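After the loader returns, each team's teammates are filled in by looking the loaded users up by id, which is how one `User` query serves both teams in the response above. The following is a minimal sketch of that assembly step. The `Team`, `Teammate` and `User` structs and the `attachUsers` function are hypothetical and only mirror the shape of the example response.

```go
package main

import "fmt"

type User struct {
	ID          int
	Username    string
	DisplayName string
}

type Teammate struct{ User User }

type Team struct {
	ID        int
	Name      string
	Teammates []Teammate
}

// attachUsers (hypothetical) fills each team's Teammates by looking up the
// already-loaded users by id; a user shared by several teams is reused,
// not queried again. Ids missing from users are skipped.
func attachUsers(teams []*Team, userIDs map[int][]int, users map[int]User) {
	for _, t := range teams {
		for _, id := range userIDs[t.ID] {
			if u, ok := users[id]; ok {
				t.Teammates = append(t.Teammates, Teammate{User: u})
			}
		}
	}
}

func main() {
	// Result of the single "SELECT * FROM User WHERE id IN ( 1, 2, 3 )" query.
	users := map[int]User{
		1: {ID: 1, Username: "wllee", DisplayName: "Lee Wang Lin"},
		2: {ID: 2, Username: "yuzylam", DisplayName: "Yuzy Lam"},
		3: {ID: 3, Username: "oska", DisplayName: "Ng Sze Chen"},
	}
	teams := []*Team{{ID: 1, Name: "Cool Team"}, {ID: 2, Name: "Another Cool Team"}}
	// userIds per team, as returned by the TeamUser queries.
	userIDs := map[int][]int{1: {2, 1, 3}, 2: {2, 1, 3}}

	attachUsers(teams, userIDs, users)
	for _, t := range teams {
		fmt.Println(t.Name, len(t.Teammates), t.Teammates[0].User.Username)
	}
	// Cool Team 3 yuzylam
	// Another Cool Team 3 yuzylam
}
```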

Library

```go
package util

import (
	"context"
	"reflect"
	"strconv"

	"cloud.google.com/go/datastore"
	"github.com/graph-gophers/dataloader"
	"github.com/ivpusic/grpool"
	"github.com/si3nloong/goloquent/db"
)

// DataLoader returns a function that batches and caches lookups by ID.
// 'table' is the table (datastore kind) to query.
// 'dataModel' is a value of the model type; a new instance of it is
// constructed for every key that is loaded.
func DataLoader(table string, dataModel interface{}) func(...string) ([]interface{}, []error) {
	// The batch function receives every key collected by the loader.
	batch := func(ctx context.Context, keys dataloader.Keys) []*dataloader.Result {
		// dataloader expects exactly one result per key, in the same order as
		// keys. Writing into a pre-sized slice by index keeps that order and
		// avoids a data race between the pool's goroutines.
		results := make([]*dataloader.Result, len(keys))
		pool := grpool.NewPool(20, 20)
		defer pool.Release()
		pool.WaitCount(len(keys))
		for i, key := range keys {
			i, key := i, key // capture the loop variables for the closure
			pool.JobQueue <- func() {
				defer pool.JobDone()
				// Use `reflect` to construct a new instance of dataModel.
				data := reflect.New(reflect.TypeOf(dataModel)).Interface()
				// You might change these to your own query.
				id, err := strconv.ParseInt(key.String(), 10, 64)
				if err != nil {
					results[i] = &dataloader.Result{Error: err}
					return
				}
				// Check the query's error here if your query API returns one.
				db.NewQuery().Find(
					datastore.IDKey(table, id, nil),
					data,
				)
				results[i] = &dataloader.Result{Data: data}
			}
		}
		pool.WaitAll()
		return results
	}
	loader := dataloader.NewBatchedLoader(batch)
	return func(keys ...string) ([]interface{}, []error) {
		// You can create another function that loads a single key;
		// LoadMany() covers every situation here.
		return loader.LoadMany(context.TODO(), dataloader.NewKeysFromStrings(keys))()
	}
}
```
Example usage, with a mock loader that serves in-memory data:

```go
package example

import (
	"context"
	"fmt"
	"strconv"

	"github.com/graph-gophers/dataloader"
)

type ExampleData struct {
	ID           int
	PopulateID   int
	PopulateData Populate
}

type Populate struct {
	ID   int
	Data string
}

func Example() {
	result := []*ExampleData{
		{ID: 1, PopulateID: 1},
		{ID: 2, PopulateID: 1},
		{ID: 3, PopulateID: 2},
		{ID: 4, PopulateID: 1},
		{ID: 5, PopulateID: 2},
	}
	loadData := DataLoader()
	// Pass every PopulateID in one call so the loader can batch them;
	// repeated IDs are served from the loader's cache.
	ids := make([]int, len(result))
	for i, res := range result {
		ids[i] = res.PopulateID
	}
	populates, errs := loadData(ids...)
	for i, res := range result {
		// errs may be nil when every key loaded successfully.
		if len(errs) > i && errs[i] != nil {
			continue
		}
		res.PopulateData = *populates[i].(*Populate)
	}
}

// DataLoader returns a mock loader that serves Populate values from memory,
// so no worker pool is needed here.
func DataLoader() func(...int) ([]interface{}, []error) {
	batch := func(ctx context.Context, keys dataloader.Keys) []*dataloader.Result {
		// One result per key, in the same order as keys.
		results := make([]*dataloader.Result, len(keys))
		for i, key := range keys {
			id, err := strconv.Atoi(key.String())
			if err != nil {
				results[i] = &dataloader.Result{Error: err}
				continue
			}
			switch id {
			case 1:
				results[i] = &dataloader.Result{Data: &Populate{ID: 1, Data: "One"}}
			case 2:
				results[i] = &dataloader.Result{Data: &Populate{ID: 2, Data: "Two"}}
			default:
				results[i] = &dataloader.Result{Error: fmt.Errorf("populate %d not found", id)}
			}
		}
		return results
	}
	loader := dataloader.NewBatchedLoader(batch)
	return func(ids ...int) ([]interface{}, []error) {
		// dataloader keys are strings, so convert the int IDs first.
		keys := make([]string, len(ids))
		for i, id := range ids {
			keys[i] = strconv.Itoa(id)
		}
		return loader.LoadMany(context.TODO(), dataloader.NewKeysFromStrings(keys))()
	}
}
```