Skip to content

Instantly share code, notes, and snippets.

@debasishg

debasishg/join.md

Last active Aug 30, 2020
Embed
What would you like to do?
Join based queries using skunk decoders

Join based queries with skunk decoders - folding with a Semigroup

Domain Model

Let's take a simple example that models an employee along with the salaries accrued in a year. Here's the domain model in Scala:

case class Salary(empId: String, month: Int, amount: BigDecimal)
case class Employee(id: String, name: String, salaries: List[Salary])

Data Model

We implement the data model with a join on 2 tables - employees and salaries:

CREATE TABLE IF NOT EXISTS employees (
  id varchar NOT NULL PRIMARY KEY,
  name varchar NOT NULL
);
 
CREATE TABLE IF NOT EXISTS salaries (
  id serial PRIMARY KEY,
  empId varchar NOT NULL references employees(id),
  month smallint NOT NULL,
  amount decimal NOT NULL
);

Clearly to build an Employee from the above 2 tables in the database we would need to do a join and then combine all joined records together for one employee to collect the salaries and make the composition a List[Salary].

Decoders in skunk

Here's how we would build the query using decoders in skunk:

private object EmployeeQueries {
  val empSalaryDecoder = varchar ~ int4 ~ numeric ~ varchar
  val selectAll =
    sql"""
        SELECT e.name,
               s.month,
               s.amount,
               e.id
        FROM employees e, salaries s
        WHERE e.id = s.empId
      """.query(empSalaryDecoder)
  }

User API for the Query in Repository

Now we can use the above machinery to implement the final user API for selecting all employees from the database:

final class EmployeeRepository[M[_]: Sync] (
  sessionPool: Resource[M, Session[M]]
) {
    import EmployeeQueries._

    def query: M[List[Employee]] = {
      sessionPool.use { session =>
        session
          .execute(selectAll)
          .map(_.groupBy(_._2))
          .map { m =>
            m.map {
              case (empId, list) =>
                val employeeSalaryJoinRecords =
                  makeSingleEmployeeSalary(empId, list)
                employeeSalaryJoinRecords
                  .tail
                  .foldLeft(employeeSalaryJoinRecords.head)(
                    Semigroup[Employee].combine
                   )
            }.toList
          }
        }
    }

    private def makeSingleEmployeeSalary(
    
      empId: String,
      empSalaries: List[String ~ Int ~ BigDecimal ~ String]
      
    ): List[Employee] = {
      empSalaries.map {
        case ename ~ month ~ amount ~ eid =>
          Employee(eid, ename, List(Salary(eid, month, amount)))
      }
    }
}

Note what happens in the above implementation:

  • we execute the database query using the selectAll combinator that we defined earlier
  • we then group the records based on employee id. Note this is the reason we picked up id as the last column in the select statement - in that case we can extract id from the twiddle list, which is left-associated chain of pairs. We could not have extracted the id from the list of pairs easily had we selected it as the first column where it naturally occurs in the employees table.
  • now we have a Map which we then iterate over key value pairs - the key being the id and the value being a single joined record with the layout as described in the decoder.
  • the actual trick is to fold over these records using a Semigroup[Employee], which needs to be defined suitably so that salaries belonging to one employee get collected in a list with that specific employee. Here's how we can do that. Note that this Semigroup instance is only for this query (general Employees cannot be combined like this) - hence we will have the Semigroup instance NOT as part of the companion object.
implicit val employeeConcatSemigroup: Semigroup[Employee] = new Semigroup[Employee] {
  def combine(x: Employee, y: Employee): Employee =
    x.copy(salaries = x.salaries ++ y.salaries)
}
@cb372

This comment has been minimized.

Copy link

@cb372 cb372 commented Jul 18, 2020

Maybe using a NonEmptyList is slightly tidier?

    def query: M[List[Employee]] = {
      sessionPool.use { session =>
        session
          .execute(selectAll)
          .map(_.groupBy(_._2))
          .map { m =>
            m.map {
              case (empId, list) =>
                // we know the list is non-empty because it is the result of a groupBy
                val nel = NonEmptyList.fromListUnsafe(list)

                val employeeSalaryJoinRecords =
                  makeSingleEmployeeSalary(empId, nel)
                employeeSalaryJoinRecords.reduce
            }.toList
          }
        }
    }

    private def makeSingleEmployeeSalary(
    
      empId: String,
      empSalaries: NonEmptyList[String ~ Int ~ BigDecimal ~ String]
      
    ): NonEmptyList[Employee] = {
      empSalaries.map {
        case ename ~ month ~ amount ~ eid =>
          Employee(eid, ename, List(Salary(eid, month, amount)))
      }
    }

I haven't checked if this even compiles but hopefully the idea makes sense.

@nigredo-tori

This comment has been minimized.

Copy link

@nigredo-tori nigredo-tori commented Jul 20, 2020

As @cb372 said, a NonEmptyList would be better. I'd further suggest replacing groupBy with groupByNel to avoid fromListUnsafe.

Also, instead of building a custom Semigroup, I'd do something like this:

private def mkEmployee(
  // all the rows for a single employee
  empSalaries: NonEmptyList[String ~ Int ~ BigDecimal ~ String]
): Employee = {
  val salaries = empSalaries.toList.map {
    case _ ~ month ~ amount ~ eid => Salary(eid, month, amount)
  }

  empSalaries.head match {
    case ename ~ _ ~ _ ~ eid =>
      Employee(eid, ename, salaries)
  }
}

def query: M[List[Employee]] =
  sessionPool.use { session =>
    session
      .execute(selectAll)
      .map(
        _.groupByNel(_._2)
          .map {
            case (_@empId, nel) => mkEmployee(nel)
          }.toList
      )
  }
@debasishg

This comment has been minimized.

Copy link
Owner Author

@debasishg debasishg commented Jul 20, 2020

Yes, using a NonEmptyList is definitely a better option (along with groupByNel). Regarding building the Employee instance directly (as suggested by @nigredo-tori) it makes things simpler by eschewing the Semigroup. Then I was thinking how this will scale when we have > 2 tables in the join with cascading parent-child relationships - those pattern matches will be complex. I haven't tried it yet but the Semigroup option may result in more modular code for such cases. What do u think ?

@nigredo-tori

This comment has been minimized.

Copy link

@nigredo-tori nigredo-tori commented Jul 20, 2020

I'd still rather use plain functions rather than write ad-hoc instances. E.g. in your original example you don't actually use the instance - only its combine function.

Regarding complicated join "trees" - you would still need to write the same logic regardless. Semigroup doesn't win you anything here. I think the least messy way to handle that would be to build a structure like Map[Id1, NonEmptyMap[Id2, NonEmptyList[Row]] (which has a Semigroup instance), and build the resulting data type from it similarly to what I have done above (relying on non-emptiness).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment