
@imrehg
Last active March 17, 2020 16:59
GitHub Actions with Faculty Platform

Using GitHub Actions with the Faculty Platform

GitHub Actions enables users to automate their infrastructure and run code checks, tests, builds, model training, etc., on code changes, such as pull requests and merges.

The Faculty Platform has several tasks that can be automated in this fashion, and this guide aims to give some initial guidance on how to get started with integrating the two services. We will look into choosing the right type of runner, installing a self-hosted runner on the platform itself, and showing an example use case of triggering jobs on pull requests.

Choosing the right GitHub Runner setup

There are two kinds of "runners" available for GitHub actions:

  • the public, GitHub-hosted runners
  • self-hosted runners

There is a trade-off between them; in the case of the Faculty Platform, the main points are as follows.

Public runners are managed, run, and maintained by GitHub, so they are always up to date. On the other hand, they run on GitHub's own infrastructure, so if they interact with the Faculty Platform, additional administration is needed to whitelist the relevant GitHub infrastructure. The actions themselves also need more detailed setup to provide the runners with the relevant authorization keys to be able to issue commands.

Self-hosted runners can live inside the existing Faculty Platform infrastructure; in fact, they can run as "apps" on the platform. This makes them very easy to set up (as apps already have all the required libraries and tokens to interact with the platform itself), though they do require ongoing maintenance (updating the runner software). It is also recommended to use self-hosted runners only with private repositories: in a public repository, anyone could control the code run on your runner simply by opening a PR, and could potentially extract secrets from your environment.

Thus we recommend using self-hosted runners with private code repositories, especially when the Faculty Platform deployment is firewalled from the rest of the Internet.

Installing self-hosted GitHub Actions Runners

You can install and use self-hosted runners directly on the platform as well. This only works for JavaScript-type actions (the "docker" types need access to the Docker socket, which is neither a good idea nor currently possible).

Let's say one wants to create GitHub Actions that interact with one of the projects (say, running jobs). The easiest setup is the following:

  • enable third-party actions for the given GitHub repository
  • on Faculty, run a single server in the project you want to apply the action to (for ease of use)
  • using that server, go through the self-hosted runner setup (found in the "Settings > Actions" section of the repository)

The first of these steps is enabling third-party actions:

Start a small server in the project that you want to add a runner to, as the first steps need to add a few things to your workspace. Follow the setup steps shown under "Add runner" for Linux:

... in the terminal of the server on the Faculty Platform:

The config.sh script has a number of settings, e.g. setting the runner's name and suppressing the interactive questions, compared to the default invocation shown in the "Add runner" popup. See more with config.sh --help:

./config.sh --url ... --token ... --name "somename" --unattended

As per the above console screenshot, we have our runner created:

Now, to start it, we have to run run.sh. That can be done from a Custom app:
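The Custom app's command can be as simple as a one-line entry point that changes into the runner's directory and starts it. The path below is an assumption about where the runner was unpacked in the earlier step; adjust it to wherever you ran config.sh:

```shell
# Hypothetical Custom app command: assumes the runner was configured
# under /project/actions-runner in the previous step.
cd /project/actions-runner && ./run.sh
```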

Which should then show:

Example: Triggering a job

Here I'll outline one specific use case for GitHub Actions with the platform and its setup. It should provide inspiration for setting up other scenarios as well.

In our case we would like to use self-hosted runners to trigger a specific job on the Faculty Platform for each pull request received.

The basic setup is outlined in the connection diagram below:

Conceptually the following flow happens:

  • a GitHub runner is deployed as an app inside the project that it is going to interact with. It listens for updates from GitHub for a specific repository.
  • when GitHub receives a new PR, it checks the defined Action workflows (faculty.yml below) and allows the relevant (e.g. self-hosted) runners to pick up that change. Since the runner polls GitHub, it doesn't need to be accessible from the wider Internet to receive notifications.
  • the runner receives the steps to be taken from the workflow, including code checkout, script runs, etc.
  • in our case, the workflow uses a Python script (jobrun.py later), which does the actual job triggering and monitoring (including setting the status of the GitHub Action to succeeded or failed based on the job run).
  • the script triggers a pre-set-up job, but through a special script (basic-job-action.sh below). When the job runs, that task is no longer within the runner, but on a server spun up by the Faculty Platform, as jobs normally are. That extra script can take a commit variable to check out the relevant code on the job server, using a deployment key (deployment_ssh_key below), and run the actual job with the remaining parameters (the actual job is somejob.sh below).

We are using the following files in a repository (and examples for these files are attached to this gist):

├── .github
│   └── workflows
│       └── faculty.yml
├── deployment_ssh_key
├── jobs
│   ├── basic-job-action.sh
│   └── somejob.sh
└── workflow
    └── jobrun.py

Workflow definition: faculty.yml

The faculty.yml file sets what actions GitHub Actions will take. For more details, you can check the relevant GitHub documentation as well. The name of the file is arbitrary; here we've chosen it to be easy to distinguish. See the attached example.

In that file:

name: Faculty

is a name that will be shown in GitHub, an arbitrary value. The on section sets when the action will trigger:

on:
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

This results in triggering on pushes and pull requests that target the master branch. The actual job definition is:

jobs:
  jobrun-selfhosted:
    name: Trigger Job on Self-hosted Runner
    runs-on: self-hosted
    env:
      FACULTY_JOB_NAME: ${{ secrets.FACULTY_JOB_NAME }}
    steps:
      - uses: actions/checkout@v2
      # We already have python/pip/... installed
      - name: Python version
        run: python -V
      - name: Run a job
        run: python workflow/jobrun.py

Here the action is set to run on a self-hosted runner. The job name (jobrun-selfhosted) is arbitrary; it just has to be unique. The name is a description shown later in GitHub, such as this:

The env section uses GitHub secrets to pass on information, such as the job name, but this is optional and could be hard-coded in this case as well. Here we are adding that name in the given repository's "Settings > Secrets" section:
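Inside jobrun.py, such a secret arrives as a plain environment variable. A minimal sketch of reading it defensively (the require_env helper is hypothetical; the attached script simply calls os.getenv directly):

```python
import os

def require_env(name: str) -> str:
    """Read a required environment variable, exiting loudly if it is unset."""
    value = os.getenv(name)
    if not value:
        raise SystemExit(f"Error: {name} is not set; check the workflow's env section.")
    return value

# In the runner, this would pick up the value injected from the repository secret:
# job_name = require_env("FACULTY_JOB_NAME")
```

Failing fast like this gives a clearer error in the Action's log than a later lookup failure against the Faculty API.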

The last part of the workflow is the steps taken in the action, which include checking out the code, logging the Python version used (optional), and running the actual payload, jobrun.py.

The logs from each of the workflow jobs can be expanded in the GitHub interface, giving for example a view like this (where the steps are visible, and here the logs from jobrun.py, described in the next section, are expanded):

Actions runner payload: jobrun.py

The attached example works with a job set up like this:

with command as:

bash jobs/basic-job-action.sh "$COMMIT" "$MESSAGE" "$CYCLES"

where COMMIT is the value used by the code checkout, while the other parameters are passed on to the actual job, somejob.sh, as described later.

jobrun.py then follows this flow:

  • loads the relevant environment variables:
    • project ID, from the default env vars of a Faculty environment,
    • job name, set by the workflow as shown above,
    • commit-ish value (in practice most often the PR's branch name), set automatically by GitHub Actions
  • resolves the job ID from the job's name
  • sets up the job runs; here it's an array run with two parameter sets:
parameter_value_sets = [
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "10"},
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "15"},
]
  • triggers the run with the given parameters
run_id = job_client.create_run(project_id, myjob.id, parameter_value_sets)
  • waits for it to finish
while run_data.state not in COMPLETED_RUN_STATES:
    run_data = job_client.get_run(project_id, myjob.id, run_id)
    sleep(1)
  • if the run was successful, it returns a success; otherwise (failed, cancelled) it returns a failure to GitHub and shows the result.
if run_data.state == RunState.COMPLETED:
    print("Job completed successfully.")
else:
    sys.exit(f"Job has not finished correctly: {run_data.state}")
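Note that the waiting loop above will spin forever if the job never reaches a completed state. A sketch of the same polling step with a timeout added; the wait_for_run helper is an assumption, not part of the attached jobrun.py, and get_state stands in for calling job_client.get_run(...).state:

```python
from time import monotonic, sleep

def wait_for_run(get_state, completed_states, timeout=3600, poll_interval=1.0):
    """Poll get_state() until it returns a completed state or the timeout passes."""
    deadline = monotonic() + timeout
    state = get_state()
    while state not in completed_states:
        if monotonic() > deadline:
            raise TimeoutError(f"job still in state {state} after {timeout}s")
        sleep(poll_interval)
        state = get_state()
    return state
```

A timeout matters here because a hung job would otherwise keep the GitHub Action (and the self-hosted runner) busy indefinitely.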

Faculty job wrapper: basic-job-action.sh

The role of this wrapper is to check out the given state of the repository and run the actual job with the settings passed on (see the attached example).

This requires one piece of additional setup, a deployment SSH key, so that the job will be able to pull the code from the (private) repository.

Start up a server in the given project, and run:

ssh-keygen -t ed25519 -f /project/deployment_ssh_key -N ""

which will generate a new key with an empty passphrase:

(Python3) /project$ ssh-keygen -t ed25519 -f /project/deployment_ssh_key -N ""
Generating public/private ed25519 key pair.
Your identification has been saved in /project/deployment_ssh_key.
Your public key has been saved in /project/deployment_ssh_key.pub.
The key fingerprint is:
SHA256:p0c2enRYpYX7FrDCmTrUlRdU7T3PimIJBrA53DNCVjU faculty@cube-a83ccbb7-0ff9-4eb5-bea5-c4ea797aadcc-554588db99-fr26f
The key's randomart image is:
+--[ED25519 256]--+
|     ...E    +=oo|
|    +    .  =+. .|
|   + =   o +o= ..|
|    * = . *oo ..o|
|     o =S.B... oo|
|        =B o  o o|
|       .ooo. o . |
|         o+ . .  |
|         . .     |
+----[SHA256]-----+
(Python3) /project$ cat deployment_ssh_key.pub
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIML7ONplcN/rlynNZccUDFlapQLpVBKQ/9I56XsKHMZY faculty@cube-a83ccbb7-0ff9-4eb5-bea5
-c4ea797aadcc-554588db99-fr26f

Then copy the contents of deployment_ssh_key.pub and add it as a new deploy key in the "Settings > Deploy Keys" section of your GitHub repository:

and save it:

The job wrapper is then set up to use that key and the given repository to pull the code when a job is triggered:

COMMIT=$1
REMOTE="git@github.com:imrehg/faculty-github-actions.git"
DEPLOYMENT_KEY_PATH="/project/deployment_ssh_key"
# Private repo related setup
export GIT_SSH_COMMAND="/usr/bin/ssh -i ${DEPLOYMENT_KEY_PATH} -o StrictHostKeyChecking=no"

where the REMOTE value needs to be updated to the correct repository's SSH clone link, and if a different name is used for the key file, that can be changed too.

The next section of the wrapper clones the code to /code, checks out the given commit, and then performs the rest of the steps as your job requires:

  • if any Python requirements need to be installed, it can do that step
  • call the actual job script with the remaining command line flags, here:
bash jobs/somejob.sh "${@:2}"

Note that this script runs before the given code is checked out, so it's kept simple, and it has to be hosted in the workspace before it can be used. Also, any changes to the script will only take effect once they are present in the workspace.
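The "${@:2}" expansion used in that last call forwards every argument except the first (the commit), preserving each argument as its own word. A quick illustration (show_args is just a demo function, not part of the attached scripts):

```shell
show_args() {
    # "$1" is the commit; "${@:2}" expands to everything after it,
    # one word per original argument.
    echo "commit: $1"
    printf 'job arg: %s\n' "${@:2}"
}

show_args abc123 "hello world" 10
# prints:
# commit: abc123
# job arg: hello world
# job arg: 10
```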

The actual job: somejob.sh

This part depends completely on your application; here it's a very simple example just to show running something and using the passed flags correctly. It receives two variables (message and cycles) and will simply idle for the given number of cycles, with the log lines prepended by the message.
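A hypothetical sketch of such a job body (written as a shell function here for illustration; the attached somejob.sh is a standalone script, and SLEEP_SECONDS is an added knob, not part of the original):

```shell
somejob() {
    local message=$1 cycles=$2
    # Idle for the requested number of cycles, prefixing every log line
    # with the message passed in by the wrapper.
    for ((i = 1; i <= cycles; i++)); do
        echo "${message}: cycle ${i} of ${cycles}"
        sleep "${SLEEP_SECONDS:-1}"
    done
}
```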

In practical use your job is most likely a Python script, and thus in the basic-job-action.sh wrapper the call would be, instead of bash ..., rather python somejob.py "${@:2}", etc.

The results

When everything is set up, a new PR will trigger a job like this:

with the correct parameters, such as shown here for the example setup:

and the given run's log is available (both the wrapper's and the actual job's logs):

Further information and Links

basic-job-action.sh:

#!/bin/bash
set -eux

COMMIT=$1
REMOTE="git@github.com:imrehg/faculty-github-actions.git"
DEPLOYMENT_KEY_PATH="/project/deployment_ssh_key"

# Private repo related setup
export GIT_SSH_COMMAND="/usr/bin/ssh -i ${DEPLOYMENT_KEY_PATH} -o StrictHostKeyChecking=no"

sudo rm -fr /code
sudo mkdir /code
sudo chown faculty:faculty /code

git clone "${REMOTE}" /code
cd /code
git checkout "$COMMIT"

if [ -f "requirements.txt" ]; then
    pip install -r requirements.txt
fi

# Run the actual job
bash jobs/somejob.sh "${@:2}"
faculty.yml:

name: Faculty

on:
  # Trigger the workflow on push or pull request,
  # but only for the master branch
  push:
    branches:
      - master
  pull_request:
    branches:
      - master

jobs:
  jobrun-selfhosted:
    name: Trigger Job on Self-hosted Runner
    runs-on: self-hosted
    env:
      FACULTY_JOB_NAME: ${{ secrets.FACULTY_JOB_NAME }}
    steps:
      - uses: actions/checkout@v2
      # We already have python/pip/... installed
      - name: Python version
        run: python -V
      - name: Run a job
        run: python workflow/jobrun.py
jobrun.py:

import os
import sys
from time import sleep

import faculty
from faculty.clients.job import RunState

COMPLETED_RUN_STATES = {
    RunState.COMPLETED,
    RunState.FAILED,
    RunState.CANCELLED,
    RunState.ERROR,
}

profile = faculty.config.resolve_profile()
dashboard_url = f"{profile.protocol}://{profile.domain.replace('services.', '')}"

project_id = os.getenv("FACULTY_PROJECT_ID")
jobname = os.getenv("FACULTY_JOB_NAME")
# https://help.github.com/en/actions/configuring-and-managing-workflows/using-environment-variables
# GITHUB_SHA is the most relevant value, but that doesn't seem reliable at the moment,
# thus we are checking things out by reference.
commit = os.getenv("GITHUB_HEAD_REF")

job_client = faculty.client("job")
jobs = job_client.list(project_id)
try:
    myjob = [j for j in jobs if j.metadata.name == jobname][0]
except IndexError:
    sys.exit(
        f"Error: Couldn't find job {jobname} in project {project_id}, please check the name."
    )

# Trigger run
parameter_value_sets = [
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "10"},
    {"COMMIT": commit, "MESSAGE": "automating", "CYCLES": "15"},
]
print(f"Parameters: {parameter_value_sets}")
run_id = job_client.create_run(project_id, myjob.id, parameter_value_sets)
print(f"Run triggered with id {run_id}")

run_data = job_client.get_run(project_id, myjob.id, run_id)
print(f"Run number: {run_data.run_number}")
print("Waiting for job to finish...")
while run_data.state not in COMPLETED_RUN_STATES:
    run_data = job_client.get_run(project_id, myjob.id, run_id)
    sleep(1)

# job_link = join(str(dashboard_url), "project", str(project_id), "jobs", "manage", str(myjob.id), "history")
# print(f"Check results at {job_link}")
if run_data.state == RunState.COMPLETED:
    print("Job completed successfully.")
else:
    sys.exit(f"Job has not finished correctly: {run_data.state}")