@bobvanluijt
Last active October 25, 2018 04:47
Idea for grouping
###
# The query below returns the sum of the population of all cities with a population greater than 1000000.
###
{
  Local {
    Get(where: {
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan,
        valueInt: 1000000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          population
        }
      }
    }
  }
}

@laura-ham

@bobvanluijt, can you give an example result? What might the result be if you are asking for 'name' and 'population' of a city, and you group by the sum of the population?

@bobvanluijt (author)

Sure @laura-ham,

Assume our DB has:

  • city-a with a population of 100.000
  • city-b with a population of 100.001
  • city-c with a population of 200.000

Option 1

The query below will result in:

  • population = 300.001
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 100000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          population
        }
      }
    }
  }
}
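A minimal Python sketch of what Option 1 is meant to compute, assuming the `where` filter is applied to each City first and `SUM` is then taken over the remaining `population` values. The `cities` list mirrors the example data above as plain integers; the threshold of 100000 and the helper `sum_population` are assumptions chosen so that the filter matches the listed results.

```python
# Example data from the comment above, as (name, population) pairs.
cities = [
    ("city-a", 100_000),
    ("city-b", 100_001),
    ("city-c", 200_000),
]

# Assumed threshold: population GreaterThan 100000.
THRESHOLD = 100_000


def sum_population(rows, threshold):
    """Hypothetical helper: filter on population > threshold, then SUM."""
    return sum(population for _, population in rows if population > threshold)


total = sum_population(cities, THRESHOLD)
print(total)  # 300001
```

Under these assumptions city-a is excluded (100.000 is not greater than 100.000), so the sum is city-b plus city-c, i.e. 300.001.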

Option 2

The query below will result in:

  • name: city-b
  • population: 100.001
  • name: city-c
  • population: 200.000
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 100000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          name
          population
        }
      }
    }
  }
}
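Option 2 implies that selecting `name` next to an aggregated `population` makes `name` the grouping key, so `SUM` is computed per name. A sketch of that reading, assuming the filter runs on each City before grouping; the threshold of 100000 and the function `group_sum_by_name` are hypothetical.

```python
from collections import defaultdict

cities = [
    ("city-a", 100_000),
    ("city-b", 100_001),
    ("city-c", 200_000),
]
THRESHOLD = 100_000  # assumed filter value


def group_sum_by_name(rows, threshold):
    """Keep rows with population > threshold, then SUM population per name."""
    totals = defaultdict(int)
    for name, population in rows:
        if population > threshold:
            totals[name] += population
    return dict(totals)


for name, population in group_sum_by_name(cities, THRESHOLD).items():
    print(f"name: {name}, population: {population}")
```

Because every name is unique in this data, each group's sum is simply that city's own population.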

Option 3

The query below will result in:

  • name = 2
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 100000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: COUNT # other options: SUM, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          name
        }
      }
    }
  }
}

Option 4

The query below will result in:

  • population = 2
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 100000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: COUNT # other options: SUM, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          population
        }
      }
    }
  }
}
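Options 3 and 4 both return 2, which suggests that `COUNT` counts the matching City objects regardless of whether `name` or `population` is selected. A sketch under that reading, again assuming a threshold of 100000; `count_matching` is a hypothetical name.

```python
cities = [
    ("city-a", 100_000),
    ("city-b", 100_001),
    ("city-c", 200_000),
]
THRESHOLD = 100_000  # assumed filter value


def count_matching(rows, threshold):
    """COUNT the rows whose population is > threshold."""
    return sum(1 for _, population in rows if population > threshold)


print(count_matching(cities, THRESHOLD))  # 2
```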

Option 5

Change the DB to:

  • city-a with a population of 100.000
  • city-b with a population of 100.001
  • city-c with a population of 200.000
  • city-c with a population of 100.000 (there are two city-c's)

The query below will result in:

  • name: city-b
  • population: 100.001
  • name: city-c
  • population: 300.000 ⬅️
{
  Local {
    Get(where:{
      operands: [{
        path: ["Things", "City", "population"],
        operator: GreaterThan
        valueInt: 100000
      }]
    }, group: {
      operands: [{
        path: ["Things", "City", "population"],
        aggregate: SUM # other options: COUNT, MAX, MIN, AVG
      }]
    }) {
      Things {
        City {
          name
          population
        }
      }
    }
  }
}
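The 300.000 total for city-c depends on when the `where` filter runs. Assuming a threshold of 100000, the second city-c has exactly 100.000 inhabitants and fails a per-City `GreaterThan` check. This sketch compares filtering each City before grouping (like SQL `WHERE`) with filtering the grouped totals (like SQL `HAVING`); the function names are hypothetical, and only the second order reproduces the listed result.

```python
from collections import defaultdict

# Option 5 data: two entries named city-c.
cities = [
    ("city-a", 100_000),
    ("city-b", 100_001),
    ("city-c", 200_000),
    ("city-c", 100_000),
]
THRESHOLD = 100_000  # assumed filter value


def sum_by_name(rows):
    """SUM population per name, without any filtering."""
    totals = defaultdict(int)
    for name, population in rows:
        totals[name] += population
    return dict(totals)


def filter_then_group(rows, threshold):
    """Apply the filter to each row, then group (like SQL WHERE)."""
    return sum_by_name([row for row in rows if row[1] > threshold])


def group_then_filter(rows, threshold):
    """Group first, then filter on each group's total (like SQL HAVING)."""
    return {name: total for name, total in sum_by_name(rows).items() if total > threshold}


print(filter_then_group(cities, THRESHOLD))  # {'city-b': 100001, 'city-c': 200000}
print(group_then_filter(cities, THRESHOLD))  # {'city-b': 100001, 'city-c': 300000}
```

If the proposal intends the second behavior, the filter effectively acts on aggregated values, which is worth stating explicitly in the query syntax.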

@moretea commented Oct 12, 2018

To me these feel like 'statistical' functions, and not 'Get' functions.
I'd argue that a separate 'Aggregated' or 'Statistics' field under Local would do wonders for keeping the 'Get' function simple.

{
  Local {
    Aggregated() {
      Things { City { .... } }
    }

    # or
    Statistics(...) {
      Things { City { .... } }
    }
  }
}

I imagine that such a field could be translated to Network queries too.


I find it hard to understand, from the GraphQL query alone, what these different aggregations are supposed to do.

I believe that we should distinguish between simple counts with conditions and more complex operations like groupBy.
Each different function/operation should ideally correspond to a field below 'Aggregated' or 'Statistics'.

This would make it very simple for end users to start doing some operations.
First impressions count, and an early exposure to simple statistics is very good for the demo-ability of Weaviate.

Simple sum

{
  Local {
    Statistics(where: { ... }) {
      Sum {
        Things {
          City {
            population
          }
        }
      }
    }
  }
}
Output
{ "Local": { "Statistics": { "Sum": { "Things": { "City": { "population": 42 } } } } } }

95% percentile range

{
  Local {
    Statistics(where: { ... }) {
      # Compute 95% range of data.
      Percentile(from: 0.025, to: 0.975) {
        Things {
          City {
            population
          }
        }
      }
    }
  }
}
Output
{ "Local": { "Statistics": { "Percentile": { "Things": { "City": { "population": {
  "min": 1000,
  "max": 2000
} } } } } } }
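`Percentile(from: 0.025, to: 0.975)` is meant to return the bounds of the central 95% of the data. A minimal sketch of that computation, assuming linear interpolation between ranks (the same default method as NumPy's `percentile`); the interpolation method, the helper names and the `populations` values are assumptions for illustration only.

```python
def percentile(values, q):
    """Linear-interpolation percentile for 0 <= q <= 1."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of empty data")
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def percentile_range(values, low=0.025, high=0.975):
    """Return the range between two percentiles as {"min", "max"}."""
    return {"min": percentile(values, low), "max": percentile(values, high)}


# Hypothetical population values: 1000, 1001, ..., 2000.
populations = list(range(1000, 2001))
print(percentile_range(populations))
```

Other interpolation methods (nearest rank, lower, higher) give slightly different bounds on small data sets, so the API would need to pin one down.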

Group By

{
  Local {
    Statistics(where: { ... }) {
      GroupBy() {
        Things {
          City {
            country @groupBy(fn: GROUP_BY)
            totalPopulation: population @groupBy(fn: SUM)
            smallestCity: population @groupBy(fn: MIN)
            biggestCity: population @groupBy(fn: MAX)
          }
        }
      }
    }
  }
}
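One reading of the `@groupBy` directives above: `country` is the grouping key, and every other aliased field is an aggregate computed within each group, with the alias becoming the output key. A minimal sketch of that reading; the directive is only proposed here, and the function name and country/population rows are made-up values.

```python
from collections import defaultdict

# Hypothetical (country, population) rows, for illustration only.
cities = [
    ("NL", 850_000),
    ("NL", 620_000),
    ("NL", 230_000),
    ("BE", 520_000),
    ("BE", 260_000),
]


def group_by_country(rows):
    """GROUP_BY country, then SUM / MIN / MAX population within each group."""
    groups = defaultdict(list)
    for country, population in rows:
        groups[country].append(population)
    return [
        {
            "country": country,
            "totalPopulation": sum(pops),
            "smallestCity": min(pops),
            "biggestCity": max(pops),
        }
        for country, pops in groups.items()
    ]


for row in group_by_country(cities):
    print(row)
```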

@moretea commented Oct 15, 2018

An extra advantage of doing this is that you'll be able to clearly argue that these are different functions, with different pricing than just slurping the data out of Weaviate or a network.

@bobvanluijt (author)

That indeed sounds reasonable. I'm not necessarily in favor of one or the other, but syntactically it might well be preferable to introduce a Stats{} function.

Naming-wise, maybe Aggregate{} would suit better. Any thoughts @laura-ham and @moretea?

It would also be possible to add all aggregation functions as GQL-functions.

{
  Local {
    Aggregate(where: { ... }) { # or Stats...
      Sum{}
      Percentile{}
      Count{}
      Average{}
      Maximum{}
      Median{}
      Minimum{}
      Mode{}
      GroupBy() {} # Would be used for more complex group-by functions.
    }
  }
}
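Splitting the aggregates into separate sub-fields means each one is an independent function over the same matching values. A minimal sketch, assuming each sub-field maps onto a Python standard-library function; the `AGGREGATES` mapping and the `aggregate` helper are hypothetical and only describe the intended results, not a storage-level implementation.

```python
import statistics

# Hypothetical mapping from the proposed Aggregate{} sub-fields to functions.
# Percentile and GroupBy are left out because they need extra parameters.
AGGREGATES = {
    "Sum": sum,
    "Count": len,
    "Average": statistics.mean,
    "Maximum": max,
    "Median": statistics.median,
    "Minimum": min,
    "Mode": statistics.mode,
}


def aggregate(values, fields):
    """Evaluate each requested aggregate field over the same list of values."""
    return {field: AGGREGATES[field](values) for field in fields}


populations = [100_000, 100_001, 200_000, 100_000]
print(aggregate(populations, ["Sum", "Count", "Average", "Median", "Mode"]))
```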

@moretea, would it be fair to say that splitting out these aggregate functions (except for GroupBy()) would make them relatively easy to implement?

@moretea commented Oct 25, 2018

Maybe they are simple to implement; based on my experience with SQL, I would expect so. However, Gremlin is not as well-rounded, and I have not yet researched this for Gremlin.
