@omegaml
Last active June 30, 2023 13:21
Easily deploy shared code with omega-ml

Using shared modules in omega-ml

Shared modules allow data scientists to implement complex functionality as virtual functions, classes, installable scripts, or packaged apps.

A motivating example

Consider a use case where we want to leverage sqlalchemy's ORM Models
so that we can use them in multiple virtualobj functions.

# models.py
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = "user_account"
    id = Column(Integer, primary_key=True)
    name = Column(String(30))
    fullname = Column(String)

    def __repr__(self):
        return f"User(id={self.id!r}, name={self.name!r}, fullname={self.fullname!r})"

    def as_dict(self):
        return {c.name: getattr(self, c.name) for c in self.__table__.columns}


Define the SQL dataset as follows:

om.datasets.put('mssql+pyodbc://user:password/db', 'sqldb')

If the database does not exist yet, you can create the tables as follows:

connection = om.datasets.get('sqldb', raw=True)
Base.metadata.create_all(connection.engine)

Now we can use the User model by creating a session from this dataset:

from sqlalchemy import select
from sqlalchemy.orm import Session

connection = om.datasets.get('sqldb', raw=True)
Base.metadata.create_all(connection.engine)
# add a user
with Session(connection.engine) as session:
    user = User(name='Jane', fullname='Walker')
    session.add(user)
    session.commit()
# query users
with Session(connection.engine) as session:
    result = session.execute(select(User))
    users = result.scalars().all()

Unfortunately, when we have several virtualobj functions, we would have to repeat the Model definition in every function. This is because virtualobj functions require all code to be in their local scope, due to the way Python's pickle serialization works.
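
The serialization constraint can be seen with the standard library alone. The following is a minimal sketch using plain pickle; whether omega-ml uses pickle directly or a different serializer is an assumption not covered here. It shows that a module-level function is stored as a reference to its module and name, not as code, so the receiving side must be able to import that module.

```python
import json
import pickle

# Pickle a function that lives in an importable module.
payload = pickle.dumps(json.dumps)

# The payload holds only the module name and the function name,
# not the body of the function.
has_module_name = b"json" in payload
has_function_name = b"dumps" in payload

# Unpickling re-imports the module and looks the name up again.
restored = pickle.loads(payload)
same_object = restored is json.dumps

print(has_module_name, has_function_name, same_object)
```

If the module named in the payload is not installed where the object is loaded, the lookup fails. That is why shared code either has to live inside the function or be made available to the runtime in one of the ways described in this article.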

How to modularize code

Let's look at several practical and powerful options for sharing common code:

  • Option 1: ORM Models as a VirtualObjectHandler dataset
  • Option 2: Use a common VirtualObjectHandler with virtualobj subclasses
  • Option 3: Package code as a script
  • Option 4: Implement a Flask or Dash App
  • Option 5: Install third-party packages

Below we show example code for each option. At the end of the article we evaluate the pros and cons of each option and look at the best ways to organize your code.

Option 1: ORM Models as a VirtualObjectHandler dataset

One way to modularize common code is to implement a VirtualObjectHandler subclass:

# mymodule.py
from omegaml.backends.virtualobj import VirtualObjectHandler
__name__ = '__code__' if __name__ != '__main__' else __name__

class Models(VirtualObjectHandler):
    def models(self, *args, **kwargs):
        from sqlalchemy import Column
        from sqlalchemy import Integer
        from sqlalchemy import String
        from sqlalchemy.orm import declarative_base
        
        Base = declarative_base()

        class User(Base):
            __tablename__ = "user_account"
            id = Column(Integer, primary_key=True)
            name = Column(String(30))
            fullname = Column(String)

            def __repr__(self):
                return f"User(id={self.id!r}, name={self.name!r}, fullname={self.fullname!r})"

            def as_dict(self):
                return {c.name: getattr(self, c.name) for c in self.__table__.columns}

        self.User = User
        return self

    def __call__(self, *args, **kwargs):
        # enable e.g. om.datasets.get('orm/models').User
        return self.models()

We can store this as a dataset or a script:

# always use replace=True (to avoid calling the class)
om.datasets.put(Models, 'orm/models', replace=True)

Now in our virtualobj function we can use the models as before:

from omegaml.backends.virtualobj import virtualobj

@virtualobj
def myfunc(*args, **kwargs):
    import omegaml as om
    from sqlalchemy.orm import Session
    from sqlalchemy import select
    # load the ORM models and db connection
    models = om.datasets.get('orm/models')
    connection = om.datasets.get('sqldb', raw=True)
    # use the ORM models and db connection
    with Session(connection.engine) as session:
        result = session.execute(select(models.User))
        objs = result.all()
    # convert to dict so result can be serialized
    # -- note the custom as_dict() method on the User model
    return [o[0].as_dict() for o in objs]

Option 2: Use a common VirtualObjectHandler with virtualobj subclasses

Sometimes we may have virtualobj functions that are the same except for some specific detail. As an example, consider the module loading code from Option 3 below. In this situation it is convenient to implement a common base class, as a custom VirtualObjectHandler, and make each virtualobj function part of a subclass.

The following SharedBaseModel implements the module loading logic:

from omegaml.backends.virtualobj import virtualobj, VirtualObjectHandler
# this line explained in section "Avoid module not found errors", at the end
__name__ = '__code__' if __name__ != '__main__' else __name__

class SharedBaseModel(VirtualObjectHandler):
    def __call__(self, *args, update=False, **kwargs):
        self.load_modules(update=update)
        return super().__call__(*args, **kwargs)

    def load_modules(self, update=False):
        import omegaml as om
        import shutil
        packages = ['shared']
        path = '/tmp/local/packages'
        if update:
            shutil.rmtree(path, ignore_errors=True)
        for pkg in packages:
            mod = om.scripts.get(pkg, install=True, keep=True, localpath=path)
            setattr(self, pkg, mod)

    def predict(self, **kwargs):
        raise NotImplementedError

Now we transform our virtualobj function to a subclass of the SharedBaseModel:

class MyModel(SharedBaseModel):
    def predict(self, **kwargs):
        import omegaml as om
        from sqlalchemy.orm import Session
        from sqlalchemy import select
        # get the models module from the shared package
        models = self.shared.models
        # work with models as before
        connection = om.datasets.get('sqldb', raw=True)
        with Session(connection.engine) as session:
            result = session.execute(select(models.User))
            objs = result.all()
        return [o[0].as_dict() for o in objs]

Deploying this is essentially as before, except now we store the MyModel class instead of the function itself.

om.models.put(MyModel, 'myfunc', replace=True)
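
The structure of this option can be sketched without omega-ml. In the following sketch HandlerStandIn is a hypothetical stand-in for VirtualObjectHandler, and the assumption that it dispatches to a method chosen by a `method` keyword is ours, not taken from the omega-ml documentation. The point is only to show how the shared setup in `__call__` runs before each subclass's own `predict`.

```python
class HandlerStandIn:
    """Hypothetical stand-in for VirtualObjectHandler (dispatch by method name)."""

    def __call__(self, *args, method="predict", **kwargs):
        return getattr(self, method)(*args, **kwargs)


class SharedBase(HandlerStandIn):
    """Shared setup that every subclass inherits."""

    def __call__(self, *args, update=False, **kwargs):
        # run the common logic first, then hand over to the subclass
        self.load_modules(update=update)
        return super().__call__(*args, **kwargs)

    def load_modules(self, update=False):
        # the article installs the shared package here; we only record the call
        self.loaded = {"update": update}

    def predict(self, **kwargs):
        raise NotImplementedError


class Greeter(SharedBase):
    def predict(self, name="world", **kwargs):
        return f"hello {name}"


class Counter(SharedBase):
    def predict(self, items=(), **kwargs):
        return len(items)


greeting = Greeter()(name="Jane")
count = Counter()(items=[1, 2, 3], update=True)
print(greeting, count)
```

Each subclass only implements what differs. The `update` keyword is consumed by the base class and never reaches `predict`.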

Option 3: Package code as a script

For larger, more complex code it is considered a best practice to modularize the code and distribute it as an installable "pip" package.

Package Structure

Create a shared module as a normal Python package, e.g.

project
+ shared
  + __init__.py
  + models.py
  + common.py 
setup.py

In __init__.py, be sure to import any dependent modules, so they can be accessed as attributes.

# __init__.py
import shared.models as models
import shared.common as common
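
Why these imports matter can be checked with a throwaway package. The sketch below uses only the standard library; the package names bare_pkg and shared_pkg are hypothetical and exist only for this demonstration. A submodule becomes an attribute of its package only once something has imported it.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_package(root, name, init_source):
    """Create a throwaway package with a single `models` submodule."""
    pkg_dir = Path(root) / name
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(init_source)
    (pkg_dir / "models.py").write_text("TABLE = 'user_account'\n")


with tempfile.TemporaryDirectory() as root:
    # one package with an empty __init__.py, one that imports its submodule
    make_package(root, "bare_pkg", "")
    make_package(root, "shared_pkg", "import shared_pkg.models as models\n")
    sys.path.insert(0, root)
    importlib.invalidate_caches()
    try:
        bare = importlib.import_module("bare_pkg")
        shared = importlib.import_module("shared_pkg")
        bare_has_models = hasattr(bare, "models")
        shared_has_models = hasattr(shared, "models")
        table = shared.models.TABLE
    finally:
        sys.path.remove(root)

print(bare_has_models, shared_has_models, table)
```

Only the package whose `__init__.py` imports the submodule exposes it as `shared_pkg.models`, which is the access pattern used when the package is loaded as a single object.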

Deployment

We can deploy this code by running

$ om scripts put ./project/shared shared

Once deployed we can load the module in our virtualobj function:

from omegaml.backends.virtualobj import virtualobj

@virtualobj
def myfunc(*args, update=False, **kwargs):
    import omegaml as om
    from sqlalchemy.orm import Session
    from sqlalchemy import select

    # load the shared package
    # -- this is equivalent to Python's import statement
    # -- optionally, we reinstall before using this
    if update:
        from shutil import rmtree
        rmtree('/tmp/local/packages', ignore_errors=True)
    shared = om.scripts.get('shared', keep=True, install=True, localpath='/tmp/local/packages')

    # now we can use the models, and any other shared code, as before
    models = shared.models
    connection = om.datasets.get('sqldb', raw=True)
    with Session(connection.engine) as session:
        result = session.execute(select(models.User))
        objs = result.all()
    return [o[0].as_dict() for o in objs]

Note that when the shared module is updated, we need to force a reload by all runtime workers:

# run this every time you have updated the package
om.runtime.model('myfunc').predict([], update=True)
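
The update flag follows a simple pattern: remove the local installation directory to force a reinstall, otherwise reuse what is already there. The sketch below illustrates this pattern with the standard library only. get_package and fake_installer are hypothetical names, and the assumption that an existing installation is reused when keep=True is ours.

```python
import shutil
import tempfile
from pathlib import Path


def get_package(localpath, installer, update=False):
    """Reuse an installed package directory unless `update` forces a reinstall."""
    path = Path(localpath)
    if update:
        # same idea as rmtree('/tmp/local/packages') in the function above
        shutil.rmtree(path, ignore_errors=True)
    if path.exists():
        return "cached"
    path.mkdir(parents=True)
    installer(path)
    return "installed"


def fake_installer(path):
    # hypothetical installer: writes a marker file instead of running pip
    (path / "shared.txt").write_text("v1")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "packages"
    first = get_package(target, fake_installer)
    second = get_package(target, fake_installer)
    forced = get_package(target, fake_installer, update=True)

print(first, second, forced)
```

Only the first call and the forced call pay the installation cost. Because each runtime worker has its own local directory, the forced call has to reach every worker before all of them see the new version.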

Sometimes it may be necessary to restart all runtime workers in order to install an update:

# allow a few minutes for the runtime to restart
$ om runtime celery control shutdown

# check periodically to see if the runtime is up by running again:
$ om runtime ping
{'message': 'ping return message', 'time': '2023-06-10T14:44:05.715022', 'args': (), 'kwargs': {}, 'worker': 'celery@worker-system-worker-omdemo'}

# --OR--
$ om runtime status
{'celery@eowyn': [], 'celery@worker-system-worker-omdemo': []}
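
Checking periodically can be automated with a small polling helper. This is a sketch under assumptions: wait_until and runtime_is_up are hypothetical helpers, and treating exit status 0 of `om runtime ping` as "the runtime is up" is an assumption that should be verified against your installation.

```python
import subprocess
import time


def wait_until(check, timeout=300.0, interval=10.0, sleep=time.sleep):
    """Call `check()` until it returns True or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)


def runtime_is_up():
    """Assumption: `om runtime ping` exits with status 0 once a worker responds."""
    try:
        result = subprocess.run(
            ["om", "runtime", "ping"], capture_output=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        # the om command is missing or did not answer in time
        return False
    return result.returncode == 0


# usage (not executed here): wait_until(runtime_is_up)
```

Passing the check as a callable keeps the polling loop independent of how the runtime is probed, so the same helper works with `om runtime status` or an HTTP health check.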

Option 4: Implement a Flask or Dash App

To implement a UI or dashboard application, it is often most useful to leverage a web application framework like Flask or Plotly Dash. In practice this is similar to Option 3, packaging code as a script; however, we now call this an "app".

The difference between an "app" and a "script" is that your code not only contains a model's logic that is served via a REST API, but it also provides the user interface as a web application. In the backend of the application we can still leverage options 1 - 3 as we see fit.

Package Structure

Create an app module as a normal Python package, e.g.

project
+ helloworld
  + static
    . base.css
  + __init__.py
  + app.py
  + routes.py
  + models.py
  + common.py 
setup.py

Deployment

We can directly deploy this app to the omega-ml runtime:

  $ om scripts put ./project/helloworld apps/helloworld
  $ om runtime restart app helloworld

Option 5: Install third-party packages

Note this is for temporary installations and testing only. For permanent installation, use your organization's deployment process for the container base images that run the omega-ml platform.

We can install third-party Python packages using the following command:

$ om runtime env install <package name>

This requires that the omegaml runtime can access the PyPI package index. If this is not the case, for example in an internal network that blocks outside URLs, we can still install packages by uploading them to the runtime worker and running a small job to install them. To do this, follow these steps:

  1. Download the packages as wheel files, and create a tar file

    $ mkdir ./packages && cd packages
    $ pip download <name>
    $ tar -czf mypackages.tgz *.whl
    

    Be sure to download these packages using the same Python version and runtime platform as the omegaml runtime worker. Unless the packages match the runtime's Python version and platform, the installation process will fail.

  2. Save the tarfile to om.datasets

    $ om datasets put mypackages.tgz packages/mypackages.tgz
    
  3. Create the following notebook and run it on the runtime

    $ om shell
    [] code = """
       import omegaml as om 
       om.datasets.get('packages/mypackages.tgz', local='/tmp/packages/mypackages.tgz')
       !tar -C /tmp/packages -xf /tmp/packages/mypackages.tgz 
       !ls /tmp/packages
       %pip install --no-index --find-links /tmp/packages -U pandas
       """
       om.jobs.create(code, 'install-packages')
       om.runtime.job('install-packages').get()
    
  4. Inside an application deployed using apphub, you can use the same process, by running the install-packages notebook on startup of the application:

    # app.py
    def create_app(...):
      import omegaml as om 
      om.jobs.run('install-packages')
    
    # other modules
    import <package-name>
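
Step 1 requires that the download machine matches the runtime worker. One way to compare the two is to collect the same details on both sides. The sketch below uses only the standard library; environment_fingerprint and mismatches are hypothetical helper names. The fields listed are a reasonable approximation of what decides wheel compatibility, not the exact rules pip applies.

```python
import platform
import sys
import sysconfig


def environment_fingerprint():
    """Collect the details that decide which wheel files are compatible."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "implementation": platform.python_implementation(),
        "platform": sysconfig.get_platform(),
        "machine": platform.machine(),
    }


def mismatches(local, remote):
    """Return the keys on which two fingerprints differ (empty means they match)."""
    return [key for key in local if local[key] != remote.get(key)]


local = environment_fingerprint()
print(local)
```

Run environment_fingerprint() once on the workstation and once in a job on the runtime, then pass both results to mismatches(). Any key it returns points to a difference that can make the offline installation fail.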
    

How do I know which option fits my use case?

All options are valid. Choose by considering your objectives and the size and complexity of your code:

Objective | (1) Virtual dataset or script | (2) subclasses (common base class) | (3) Packaged Script | (4) Packaged App
--- | --- | --- | --- | ---
Small number of shared objects or functions | X | (X) | |
Same logic with a few specifics for each function | (X) | X | |
Complex or large amount of shared code | | | X |
Combined logic and user interface | | | | X

A key distinction between virtual objects (functions/VirtualObjectHandler subclasses) and packaged scripts/apps is that the latter take longer to package, deploy and install. While virtual objects are fast and easy to deploy, scripts and apps provide a workflow that is more amenable to a traditional software engineering process. Apps can provide a runtime performance advantage because all the scripts are loaded at startup time, while virtual objects and packaged scripts are loaded for each execution.

The following table lists the trade-offs for each option.

Trade-Off | (1) Virtual dataset or script | (2) subclasses (common base class) | (3) Packaged Script | (4) Packaged App
--- | --- | --- | --- | ---
When loaded | each request | each request | each request | App startup
Time to load during request processing | 10-100ms *) | 10-100ms *) | > 10 seconds (first-time); < 50ms (subsequent) | already loaded
Accessible from REST API | yes | yes | yes | no (app provides its own REST API)
Scalable by adding more runtime instances | yes | yes | yes | yes

How to best organize and deploy my code?

omega-ml is designed with simplicity and fast deployment in mind. That is, when you develop a model or a script, it can be instantly deployed and used by the runtime without delay. This results in a fast feedback development model, where you can develop and run your code as a backend, very much the same way as you do in a Jupyter Notebook on your local workstation or laptop.

While this is great for an exploratory and iterative style of working, when we deploy a production application, we want a stable and repeatable process. omega-ml provides the same seamless experience in this case, adding a repeatable process based on deployable artifacts.

There are two phases to writing and deploying code with omega-ml:

  1. Develop your code and run it in the omega-ml runtime

    omega-ml does not dictate a particular way of organizing your code. You may use Jupyter Notebooks, modularize your code in separate .py files, or leverage packaged scripts. The best way depends on the complexity of your application and your working style. When your work is of an exploratory nature, Jupyter Notebooks are a great fit. If you want to modularize your code and create a maintainable code base, applying software engineering best practices like modularity, versioning and CI/CD, packaged scripts are the best fit.

  2. Export artifacts and deploy to the target environment

    Once you have tested your code in your omega-ml development environment, you want to export and deploy it to your production system. This works by exporting all artifacts and then importing them again in the target environment. This is achieved by running the om runtime export and om runtime import commands, respectively.

Since these two phases can each consist of multiple steps, we can combine all steps in a deployfile.yaml. This serves as both the repeatable process and the documentation of all of our deployments.

A sample deployfile looks like this:

# deployfile.yaml
# -- get a complete example by running: om runtime deploy example
datasets:
  - name: mydata
    local: data/mydata.csv
models:
  - name: mymodel
    local: package.mymodel
scripts:
  - name: apps/helloworld
    local: ./helloworld

We can run this deployfile by running om runtime deploy. Get more details by running om runtime deploy example and om help runtime.

Avoid "module not found" errors

If the runtime raises a "module not found" error, it means that your virtual object contains a reference to the module that defines the class or function. This is a result of the way that Python serializes objects. We can avoid this error by adding the following line of code to all modules declaring a base class, a @virtualobj function or a VirtualObjectHandler class.

# first line of code to add to all your modules that define
# a base class, a @virtualobj function or a VirtualObjectHandler class
__name__ = '__code__' if __name__ != '__main__' else __name__

For example our mymodule.py code will look like this:

# mymodule.py
from omegaml.backends.virtualobj import VirtualObjectHandler
__name__ = '__code__' if __name__ != '__main__' else __name__

class BaseModel(VirtualObjectHandler):
    ...

class Model(BaseModel):
    ...
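
The observable effect of this line can be checked with plain Python. The sketch below executes a small piece of source as if it were a module named mymodule (a hypothetical name) and reports which module name the class and function record. How omega-ml's serializer then treats the name `__code__` is not shown in this article, so the sketch only demonstrates what the line changes.

```python
SOURCE = """
class Model:
    pass

def myfunc():
    pass
"""

RENAME = "__name__ = '__code__' if __name__ != '__main__' else __name__\n"


def defined_in(source, module_name):
    """Run `source` as if it were module `module_name`; return the recorded __module__ values."""
    namespace = {"__name__": module_name}
    exec(source, namespace)
    return namespace["Model"].__module__, namespace["myfunc"].__module__


plain = defined_in(SOURCE, "mymodule")
renamed = defined_in(RENAME + SOURCE, "mymodule")
as_main = defined_in(RENAME + SOURCE, "__main__")
print(plain, renamed, as_main)
```

Classes and functions record the value of `__name__` at the moment they are defined. Without the line they point at mymodule, which the runtime may not be able to import. With the line they no longer reference that module, and code run as `__main__` is left unchanged.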