Skip to content

Instantly share code, notes, and snippets.

@zzzeek
Last active August 15, 2023 10:00
Show Gist options
  • Star 15 You must be signed in to star a gist
  • Fork 2 You must be signed in to fork a gist
  • Save zzzeek/5f58d007698c4a0c372edd95ab8e0267 to your computer and use it in GitHub Desktop.
Save zzzeek/5f58d007698c4a0c372edd95ab8e0267 to your computer and use it in GitHub Desktop.
The SQL is just as easy as an ORM challenge
""" "Writing SQL is just as fast as using an ORM" proof of concept
Below is a simple Python object model, where we represent a database that
stores the names of employees at a company, some of whom are "engineers",
and a list of the jobs they do and the programming languages they use.
We'd like to persist the state represented by this Python object model
in a relational database, using our Python objects as a start. Then we'd
like to write SQL queries for rows in this database, and we get back instances
of Python objects exactly as they were created.
This is of course the job that an ORM does. We'd like to show that ORMs
are unnecessary if one knows SQL, that the work is just as fast and easy
writing raw SQL as long as one knows SQL.
The assigment is as follows:
1. create a schema to store the Python object model below relationally.
a. There should be six tables: "employee", "engineer", "employee_job",
"job", "engineer_language", and "language".
b. An object of type Engineer should have a row in the
"employee" and "engineer" tables, which are associated via foreign
key.
c. "employee_job" has foreign key constraints referring to "employee"
and "job".
d. "engineer_language" has foreign key constraints referring to
"engineer" and "language"
2. store the "emp1" and "emp2" object structures in the database.
Primary keys should use Postgresql SERIAL columns using an auto-incrementing
integer that comes from the sequence implicitly associated with a SERIAL
column. It's likely best to use RETURNING with INSERT statements to get
these back.
3. run this SQL query, or one like it, and get back "emp1" and "emp2" as Python
objects:
SELECT emp.id, emp.name FROM employee
4. The "engineer" table will also need to be queried in some way (either
part of the first query, or as some additional query) to determine which row is
an "Engineer" and which is an "Employee"; emp1 should come back as an Engineer
object instance and emp2 should come back as an Employee object instance.
The Python objects returned should also have the appropriate Job and
Language collections present on them as well, which would require additional
SQL to be emitted (again, either as joins or separate statements, whatever.
With SQLAlchemy, these different kinds of schemes can be selected based
on simple configuration settings.).
5. Things that aren't part of the challenge: setting up transaction blocks, rolling
back when errors occur, connection pooling, dealing with tricky datatypes,
ensuring in-memory uniqueness of objects (after all if two parts of my program
update ProgrammingLanguage in the same transaction do they conflict?), optimistic
object versioning, all things that are also not "writing SQL" that the ORM does,
we'll leave those out.
There's a psycopg2 Connection object near the end of this program.
Are you ready to show how easy writing raw SQL is and how there's nothing
else you need you need when programming real applications? Great! Let's go!
"""
class Employee(object):
def __init__(self, name, jobs):
self.name = name
self.jobs = jobs
class Engineer(Employee):
def __init__(self, name, jobs, languages):
super(Engineer, self).__init__(name, jobs)
self.languages = languages
class Job(object):
def __init__(self, name):
self.name = name
class ProgrammingLanguage(object):
def __init__(self, name):
self.name = name
j1, j2 = Job("programming"), Job("meeting")
python, java = ProgrammingLanguage("python"), ProgrammingLanguage("java")
emp1 = Engineer(name="some engineer", languages=[python, java], jobs=[j1, j2])
emp2 = Employee(name="some employee", jobs=[j2])
# here's a start:
import psycopg2
connection = psycopg2.connect(
user="scott", password="tiger", host="localhost", database="test"
)
# go! create the schema, store the data, load it back! efficiently!
# lets read the data now
for emp in employees:
print(emp.name)
print([j.name for j in emp.jobs])
if isinstance(emp, Engineer):
print([l.name for l in emp.languages])
@zzzeek
Copy link
Author

zzzeek commented Jun 12, 2019

Here's the raw SQL version. in some ways, it's able to do a couple of things a little better than current SQLAlchemy releases (EDIT: 1.3 only. In 1.4 we've improved PostgreSQL returning support) do (though you have to write it all out), and in other ways, it's not at all able to capture important behavioral concepts like maintaining an identity map or handling transactions in any reasonable way. It is of course vastly more verbose.

"""plain SQL solution
"""


class Employee(object):
    def __init__(self, name, jobs):
        self.name = name
        self.jobs = jobs


class Engineer(Employee):
    def __init__(self, name, jobs, languages):
        super(Engineer, self).__init__(name, jobs)
        self.languages = languages


class Job(object):
    def __init__(self, name):
        self.name = name


class ProgrammingLanguage(object):
    def __init__(self, name):
        self.name = name


j1, j2 = Job("programming"), Job("meeting")
python, java = ProgrammingLanguage("python"), ProgrammingLanguage("java")
emp1 = Engineer(name="some engineer", languages=[python, java], jobs=[j1, j2])
emp2 = Employee(name="some employee", jobs=[j2])


# here's a start:
import psycopg2

connection = psycopg2.connect(
    user="scott", password="tiger", host="localhost", database="test"
)

# go!

cursor = connection.cursor()
cursor.execute(
    """
    CREATE TABLE employee (
        id SERIAL,
        name VARCHAR,
        PRIMARY KEY (id)
    )
"""
)
cursor.execute(
    """
    CREATE TABLE engineer (
        id INTEGER,
        FOREIGN KEY (id) REFERENCES employee(id),
        PRIMARY KEY (id)
    )
"""
)
cursor.execute(
    """
    CREATE TABLE job (
        id SERIAL,
        name VARCHAR,
        PRIMARY KEY (id)
    )
"""
)
cursor.execute(
    """
    CREATE TABLE language (
        id SERIAL,
        name VARCHAR,
        PRIMARY KEY (id)
    )
"""
)
cursor.execute(
    """
    CREATE TABLE employee_job (
        employee_id INTEGER,
        job_id INTEGER,
        FOREIGN KEY(employee_id) REFERENCES employee(id),
        FOREIGN KEY(job_id) REFERENCES job(id),
        PRIMARY KEY(employee_id, job_id)
    )
"""
)
cursor.execute(
    """
    CREATE TABLE engineer_language (
        engineer_id INTEGER,
        language_id INTEGER,
        FOREIGN KEY(engineer_id) REFERENCES engineer(id),
        FOREIGN KEY(language_id) REFERENCES language(id),
        PRIMARY KEY(engineer_id, language_id)
    )
"""
)


cursor.execute(
    "INSERT INTO job (name) VALUES (%s), (%s) RETURNING id", (j1.name, j2.name)
)
j1.id = cursor.fetchone()[0]
j2.id = cursor.fetchone()[0]


cursor.execute(
    "INSERT INTO language (name) VALUES (%s), (%s) RETURNING id",
    (python.name, java.name),
)
python.id = cursor.fetchone()[0]
java.id = cursor.fetchone()[0]

cursor.execute(
    "INSERT INTO employee (name) VALUES (%s) RETURNING id", (emp1.name,)
)
row = cursor.fetchone()
emp1.id = row[0]

cursor.execute("INSERT INTO engineer (id) VALUES (%s)", (emp1.id,))


cursor.execute(
    "INSERT INTO employee (name) VALUES (%s) RETURNING id", (emp2.name,)
)
row = cursor.fetchone()
emp2.id = row[0]

cursor.executemany(
    "INSERT INTO employee_job (employee_id, job_id) VALUES (%s, %s)",
    [(emp1.id, j1.id), (emp1.id, j2.id), (emp2.id, j2.id)],
)

cursor.executemany(
    "INSERT INTO engineer_language (engineer_id, language_id) VALUES (%s, %s)",
    [(emp1.id, python.id), (emp1.id, java.id)],
)


cursor.execute(
    "SELECT emp.id, emp.name, eng.id FROM employee AS emp "
    "LEFT OUTER JOIN engineer AS eng ON emp.id = eng.id"
)
employees = []
for row in cursor:
    if row[2]:
        emp = Engineer(name=row[1], jobs=[], languages=[])
        emp.id = row[0]
    else:
        emp = Employee(name=row[1], jobs=[])
        emp.id = row[0]
    employees.append(emp)

cursor.execute(
    "SELECT ej.employee_id, ej.job_id, job.name FROM employee_job AS ej "
    "JOIN job ON ej.job_id=job.id WHERE ej.employee_id = ANY (%s) ",
    ([emp.id for emp in employees], )
)
emps_by_id = {emp.id: emp for emp in employees}
for row in cursor:
    emp = emps_by_id[row[0]]
    job = Job(row[2])
    job.id = row[1]
    emp.jobs.append(job)

cursor.execute(
    "SELECT el.engineer_id, el.language_id, language.name "
    "FROM engineer_language AS el JOIN language "
    "ON el.engineer_id=language.id WHERE el.engineer_id = ANY (%s) ",
    ([emp.id for emp in employees if isinstance(emp, Engineer)], ),
)
for row in cursor:
    emp = emps_by_id[row[0]]
    lang = ProgrammingLanguage(row[2])
    lang.id = row[1]
    emp.languages.append(lang)

for emp in employees:
    print(emp.name)
    print([j.name for j in emp.jobs])
    if isinstance(emp, Engineer):
        print([l.name for l in emp.languages])

@zzzeek
Copy link
Author

zzzeek commented Jun 12, 2019

Here's the ORM version. Deep SQLAlchemy users might notice I used a little trick to get joined table inheritance without a discriminator column :)

"""ORM solution.

As folks will point out, there's something in the SQL version that the ORM
version doesn't do yet, which is calls INSERT .. VALUES (), (), ()...RETURNING
to get primary keys back.  SQLAlchemy might want to add this feature

(edit: this feature is now in 1.4,
see https://docs.sqlalchemy.org/en/14/changelog/migration_14.html#change-5263 )

even though it only works on Postgresql (MySQL and SQLite have VALUES (), (), () but don't
support RETURNING).  But anyway that's not the goal on this exercise.  This is
the "SQL is just as quick and easy as an ORM" exercise.

"""

from sqlalchemy import case
from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import ForeignKey
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy import Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import column_property
from sqlalchemy.orm import relationship
from sqlalchemy.orm import Session

Base = declarative_base()


class Employee(Base):
    __tablename__ = "employee"

    id = Column(Integer, primary_key=True)
    name = Column(String)

    jobs = relationship("Job", secondary="employee_job", lazy="selectin")

    __mapper_args__ = {
        "polymorphic_identity": "employee",
        "with_polymorphic": "*",
    }


class Engineer(Employee):
    __tablename__ = "engineer"

    id = Column(ForeignKey("employee.id"), primary_key=True)

    languages = relationship(
        "ProgrammingLanguage", secondary="engineer_language", lazy="selectin"
    )

    __mapper_args__ = {"polymorphic_identity": "engineer"}


Employee.is_engineer = column_property(
    case([(Engineer.id.isnot(None), "engineer")], else_="employee")
)
Employee.__mapper__.polymorphic_on = Employee.is_engineer


class Job(Base):
    __tablename__ = "job"

    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, name):
        self.name = name


class ProgrammingLanguage(Base):
    __tablename__ = "language"

    id = Column(Integer, primary_key=True)
    name = Column(String)

    def __init__(self, name):
        self.name = name


Table(
    "employee_job",
    Base.metadata,
    Column("employee_id", ForeignKey("employee.id"), primary_key=True),
    Column("job_id", ForeignKey("job.id"), primary_key=True),
)

Table(
    "engineer_language",
    Base.metadata,
    Column("engineer_id", ForeignKey("employee.id"), primary_key=True),
    Column("language_id", ForeignKey("language.id"), primary_key=True),
)


j1, j2 = Job("programming"), Job("meeting")
python, java = ProgrammingLanguage("python"), ProgrammingLanguage("java")
emp1 = Engineer(name="some engineer", languages=[python, java], jobs=[j1, j2])
emp2 = Employee(name="some employee", jobs=[j2])

# here's a start
e = create_engine("postgresql+psycopg2://scott:tiger@localhost/test")

# go!
Base.metadata.create_all(e)

s = Session(e)

s.add_all([emp1, emp2])
s.commit()

for emp in s.query(Employee):
    print(emp.name)
    print([j.name for j in emp.jobs])
    if isinstance(emp, Engineer):
        print([l.name for l in emp.languages])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment