

@thedave42
Last active June 22, 2021 23:47
Using CodeQL with 3rd Party CI


Running CodeQL code scans is not limited to GitHub Actions. You can use your existing CI system to run CodeQL scans against your repo, and the code scanning results are viewable in the GitHub repo that the scan ran against. This is done with a tool provided by GitHub called the codeql-runner. The codeql-runner can retrieve the latest CodeQL queries for the languages you want to scan, run those queries, and upload the results back to GitHub.

Our documentation site has more information about where to download the codeql-runner and which commands to run.

How it works

CodeQL needs to understand how your code runs in order to analyze it. To do this, it creates a relational representation of each source file in your codebase. For interpreted languages such as JavaScript and Python, CodeQL runs an extractor directly against your source code to create that representation. Compiled languages work a little differently. CodeQL monitors your build process and extracts information from the compiler about how your code works. This information is stored in temporary files and imported during the analyze step.
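
CodeQL's real extractors and database schema are far richer than anything shown here. The following is only a toy Python sketch of the idea of a "relational representation". It uses the standard `ast` module to turn a source file into rows, then runs a tiny "query" over those rows. The relation names (`functions`, `calls`) and the query itself are hypothetical and do not reflect CodeQL's actual schema.

```python
import ast
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]  # (file, name, line)

def extract_relations(source: str, filename: str) -> Dict[str, List[Row]]:
    """Toy extractor: emit function-definition rows and simple call rows."""
    tree = ast.parse(source, filename=filename)
    functions: List[Row] = []
    calls: List[Row] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            functions.append((filename, node.name, node.lineno))
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            # Only plain calls like f(x); method calls such as obj.m() are skipped.
            calls.append((filename, node.func.id, node.lineno))
    return {"functions": functions, "calls": calls}

def calls_to_undefined(relations: Dict[str, List[Row]]) -> List[Row]:
    """Toy query: calls whose callee has no definition among the rows."""
    defined = {name for _, name, _ in relations["functions"]}
    return [row for row in relations["calls"] if row[1] not in defined]

if __name__ == "__main__":
    src = "def greet(name):\n    return name.upper()\n\ngreet('a')\nrun()\n"
    rels = extract_relations(src, "example.py")
    print(rels["functions"])         # [('example.py', 'greet', 1)]
    print(calls_to_undefined(rels))  # [('example.py', 'run', 5)]
```

The point is the split it illustrates: extraction produces data once, and queries are separate programs that run over that data.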

Running a code scan using your 3rd party CI will typically involve three steps:

  1. Initialize
  2. Build
  3. Analyze
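
To see how the three steps fit together, here is a minimal Python sketch that plans which steps a CI job would run for a given set of CodeQL language identifiers. The split between compiled and interpreted identifiers is an assumption for illustration. TypeScript is scanned under the `javascript` identifier. `plan_scan` and the step labels are hypothetical names.

```python
from typing import List, Optional

# Assumed language classification; adjust it to match your CodeQL version.
COMPILED = {"cpp", "csharp", "go", "java"}
INTERPRETED = {"javascript", "python"}  # TypeScript is scanned as "javascript"

def plan_scan(languages: List[str], build_command: Optional[str] = None) -> List[str]:
    """Return the ordered CI steps for the requested languages."""
    unknown = [lang for lang in languages if lang not in COMPILED | INTERPRETED]
    if unknown:
        raise ValueError(f"unsupported language(s): {unknown}")
    steps = ["init"]
    if any(lang in COMPILED for lang in languages):
        # Compiled code must be built while the variables from init are set.
        steps.append("load-env")
        steps.append(f"build: {build_command}" if build_command else "build: autobuild")
    steps.append("analyze")
    return steps
```

For example, `plan_scan(["python"])` skips the build entirely. `plan_scan(["java"], "mvn -B package")` inserts the environment and build steps between init and analyze.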

Initialize

During the initialize step you configure the environment used for the code scan. This step downloads the CodeQL bundle from either GitHub.com or GitHub Enterprise Server. CodeQL needs to know which languages you want to scan so that it can retrieve the latest queries for those languages. It also creates empty database files to hold the data it is going to collect. Compiled languages need some additional configuration, because CodeQL needs to know which compilers to monitor. When you run the codeql-runner init command, it generates script files (for Windows, Linux, and macOS) and a JSON file. These contain the environment variables that must be set for CodeQL to extract information during compilation.
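
A CI step often wraps the init call in a small script. The Python sketch below builds the argument list for it. It assumes the release binary names (`codeql-runner-linux`, `codeql-runner-macos`, `codeql-runner-win.exe`) and the `init` options `--repository`, `--github-url`, `--github-auth-stdin` and `--languages` as I recall them from the codeql-runner documentation. Check `init --help` for your version before relying on them. The helper names are hypothetical.

```python
import platform
from typing import List, Optional

# Assumed release binary names; verify against the release you download.
RUNNER_BINARIES = {
    "Linux": "codeql-runner-linux",
    "Darwin": "codeql-runner-macos",
    "Windows": "codeql-runner-win.exe",
}

def runner_binary(system: Optional[str] = None) -> str:
    """Pick the runner binary for this OS (or for the OS name passed in)."""
    system = system or platform.system()
    try:
        return RUNNER_BINARIES[system]
    except KeyError:
        raise RuntimeError(f"no codeql-runner build for {system!r}") from None

def init_command(repository: str, github_url: str, languages: List[str],
                 system: Optional[str] = None) -> List[str]:
    """Argument list for `codeql-runner init`.

    The token is expected on stdin (--github-auth-stdin) rather than on the
    command line, where it could leak into CI logs.
    """
    return [
        runner_binary(system), "init",
        "--repository", repository,
        "--github-url", github_url,
        "--github-auth-stdin",
        "--languages", ",".join(languages),
    ]
```

You would then run it with something like `subprocess.run(cmd, input=token, text=True, check=True)`. This assumes the binary is on `PATH`; otherwise, prefix the first element with its directory.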

When you are scanning a compiled language, these environment variables must be set for the build step. Some CI systems require you to explicitly persist environment variables between steps. Make sure the variables defined in the script files or JSON file from this step are still set during the build step. We have some example workflow files that show how this is done in Jenkins and Azure Pipelines.
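
One way to carry the variables into the build step is to read the JSON file and merge its contents into the build's environment. The Python sketch below assumes the file is a flat JSON object of `NAME: value` pairs, for example `codeql-runner/codeql-env.json`. That shape is an assumption, so inspect the file your runner produces. It also shows the Azure Pipelines `##vso[task.setvariable ...]` logging command, which makes a variable visible to later steps. All function names are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Dict, List, Mapping, Optional

def parse_codeql_env(text: str) -> Dict[str, str]:
    """Parse the (assumed) flat JSON object of variables written by init."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object mapping variable names to values")
    return {str(name): str(value) for name, value in data.items()}

def merged_env(codeql_vars: Mapping[str, str],
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and overlay the CodeQL variables."""
    env = dict(os.environ if base is None else base)
    env.update(codeql_vars)
    return env

def load_codeql_env(json_path: Path) -> Dict[str, str]:
    """Full environment for the build: current process env plus CodeQL vars."""
    return merged_env(parse_codeql_env(json_path.read_text(encoding="utf-8")))

def azure_setvariable_lines(codeql_vars: Mapping[str, str]) -> List[str]:
    """Azure Pipelines logging commands that persist each variable to later steps."""
    return [f"##vso[task.setvariable variable={name}]{value}"
            for name, value in codeql_vars.items()]
```

Merging onto a copy of the current environment keeps `PATH` and other existing variables intact. This matters if the result is later passed as a complete `env=` to a build subprocess.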

Note: The first time you run the codeql-runner init command on a new machine, it may take longer than normal because it has to download the CodeQL bundle. You can avoid this by pre-installing the necessary bundle files.

Build

Interpreted languages get off easily here because they don't need to be built. For JavaScript, TypeScript, and Python, there is nothing to do at this step. The codeql-runner analyze step takes care of extracting the information CodeQL needs to run its queries.

Compiled languages need to be built in order for CodeQL to perform its analysis. Your standard build command should be run here. The environment variables set during the initialization step will give CodeQL the information it needs to monitor the build process and extract the necessary information from the compiler. During the build, CodeQL will store the information it's extracting in temporary files that will be used during the analyze step.
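
The key detail is that the variables must reach the compiler processes, not just the CI script. The minimal Python sketch below runs an existing build command in a subprocess with an explicit environment. It assumes `env` is a complete environment, such as the current one plus the CodeQL variables. `run_build` is a hypothetical helper name.

```python
import subprocess
from typing import Mapping, Optional, Sequence

def run_build(build_cmd: Sequence[str], env: Mapping[str, str],
              cwd: Optional[str] = None,
              capture: bool = False) -> subprocess.CompletedProcess:
    """Run the project's normal build with the given environment.

    `env` replaces the child's environment entirely, so it must already include
    PATH and anything else the build needs. check=True makes a failed build
    fail this CI step instead of silently continuing to analyze.
    """
    return subprocess.run(list(build_cmd), env=dict(env), cwd=cwd, check=True,
                          capture_output=capture, text=True)
```

Leaving `capture` off streams the build log straight into the CI output, which is usually what you want. Turn it on only when a script needs to inspect the output.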

Note: The codeql-runner provides an autobuild command that attempts to build your application. It was designed as a starting point for your build when you're creating a workflow from scratch, and it works well when an application follows the standard build practices for its language. If your CI system already has a build workflow, we recommend using that workflow and its build command as the starting point for your code scanning workflow. For new workflows, you can start with autobuild and switch to a manual build command if autobuild doesn't work.
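
The recommendation above amounts to a simple rule: prefer your existing build command, and fall back to autobuild. The Python sketch below encodes that rule. It assumes `autobuild` accepts `--language` to choose which compiled language to build, as I recall from the codeql-runner documentation. `build_command` is a hypothetical name. `shlex.split` follows POSIX shell quoting, so on Windows you may prefer to pass the command as a list.

```python
import shlex
from typing import List, Optional

def build_command(runner: str, manual_build: Optional[str],
                  language: Optional[str] = None) -> List[str]:
    """Prefer the CI system's existing build; fall back to `autobuild`."""
    if manual_build:
        # Your existing, known-good build command wins.
        return shlex.split(manual_build)
    cmd = [runner, "autobuild"]
    if language:
        # Assumed option: which compiled language autobuild should build.
        cmd += ["--language", language]
    return cmd
```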

Analyze

The analyze step is when CodeQL populates the empty database files with the extracted information, then runs all of its queries. For compiled languages, that information comes from the build. For interpreted languages, it is extracted directly from the source. When the queries finish, the codeql-runner uploads the results as a SARIF file to the GitHub repo that you scanned. The results are then visible on the repo's Security tab.
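
Because analyze also uploads, it needs to know which commit and ref the results belong to. The Python sketch below builds that call. It assumes the `analyze` options `--repository`, `--github-url`, `--github-auth-stdin`, `--commit` and `--ref`, as I recall them from the codeql-runner documentation, so verify them against your version. The helper names are hypothetical.

```python
import subprocess
from typing import List

def analyze_command(runner: str, repository: str, github_url: str,
                    commit_sha: str, branch: str) -> List[str]:
    """Argument list for `codeql-runner analyze`, which also uploads the SARIF."""
    # Accept either a full ref (refs/pull/1/merge) or a bare branch name.
    ref = branch if branch.startswith("refs/") else f"refs/heads/{branch}"
    return [
        runner, "analyze",
        "--repository", repository,
        "--github-url", github_url,
        "--github-auth-stdin",
        "--commit", commit_sha,
        "--ref", ref,
    ]

def run_analyze(cmd: List[str], token: str) -> None:
    """Run analyze, feeding the token on stdin so it stays out of logs."""
    subprocess.run(cmd, input=token, text=True, check=True)
```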

Note: The CodeQL analyze step may display the message "No code found during the build" if it is unable to extract information during the build process. Several factors can cause this, and they are addressed in our troubleshooting documentation.

Containerized Builds with 3rd Party CI

It's becoming common practice in software development for build environments to live in containers. Teams create pre-configured containers with everything needed to run their builds so that the process is easily portable. CodeQL scans can run in these environments as well. You just need to add the codeql-runner tool to the container and run it inside the same container that runs the build process.

Note: There is a known issue running the codeql-runner on Alpine Linux because Alpine uses a different C library (musl rather than glibc). Ubuntu is GitHub's recommended Linux distribution for running the codeql-runner.
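
If your images come from several base distributions, a preflight check can fail fast on Alpine instead of failing partway through a scan. The Python sketch below reads the standard `/etc/os-release` file and checks its `ID` and `ID_LIKE` fields. The quote handling is simplified (no escape sequences), and the function names are hypothetical.

```python
from pathlib import Path
from typing import Dict

def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines of an os-release file (simple quoting only)."""
    info: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"').strip("'")
    return info

def is_alpine(info: Dict[str, str]) -> bool:
    """True if the distro is Alpine or declares itself Alpine-like."""
    ids = {info.get("ID", "")} | set(info.get("ID_LIKE", "").split())
    return "alpine" in ids

def check_runner_host(path: str = "/etc/os-release") -> None:
    """Exit early on Alpine, where the codeql-runner has a known issue."""
    p = Path(path)
    if p.exists() and is_alpine(parse_os_release(p.read_text())):
        raise SystemExit("codeql-runner has a known issue on Alpine (musl libc); "
                         "use an Ubuntu-based image instead")
```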

To ensure the codeql-runner is part of your build environment, you must download and install the tool as part of the Dockerfile that defines the container. Then the scan is run within the container in the same way described above. The initialize step is run prior to the build, and the script that is generated to set up the necessary environment variables is also run before the build. Then the build is run as it normally would be (if you're scanning an interpreted language, there is no need to do anything at this stage). Finally the analyze step is run. The analyze step will also upload the results from the container to the associated GitHub repo, so the container must be able to connect to GitHub.
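
Since the upload happens from inside the container, a network problem otherwise only shows up at the very end of a long build. The Python sketch below is a cheap preflight you could run before init. It only checks that a TCP connection to the GitHub host opens; it does not validate the token or permissions. `can_reach` is a hypothetical name.

```python
import socket
from urllib.parse import urlparse

def can_reach(github_url: str, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(github_url)
    host = parsed.hostname
    if not host:
        raise ValueError(f"not a URL: {github_url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts.
        return False
```

For example, calling `can_reach("https://github.com")` at the start of the containerized job, and exiting with a clear message if it returns False, is much easier to diagnose than a failed upload after the build.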

Our documentation has additional information about running code scanning in a container. You will also need to ensure that your container has the necessary dependencies for the codeql-runner.
