@mvriel
Created November 22, 2011 21:59
RFC: Adding support for external artifacts for Travis

In order to display the output of various source code analysis and build tools, Travis needs the ability to store that output.

QA tools can output in two different ways:

  • To STDOUT
  • To a physical disk as file(s)

The latter output type can be split into two sub-types:

  • Raw files, or logs
  • Human-readable output

The first part of this RFC focuses on raw files, while the second section proposes a means to store human-readable output within the current architecture of Travis.

Raw files

Raw files are files containing machine-readable output that Travis can transform into human-readable output. It is assumed that it does not matter whether the output arrives via STDOUT or as a physical file, as long as there is just one entity to record.

During the processing stage of Travis it should be possible to execute an arbitrary tool, as defined in the project build file, and have its output written to an artifact record in the Travis database. The artifact record should also carry an artifact type matching the output format of the given tool.

For example:

PHPUnit outputs JUnit XML; this output can be stored with type JUnit.

PHP_CodeSniffer outputs Checkstyle-formatted XML; this output can be stored with type Checkstyle.
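
As an illustration, a minimal PHP sketch of what this capture step could look like. `storeArtifact()` is a hypothetical stand-in for whatever persistence layer Travis uses; `--log-junit` (PHPUnit) and `--report=checkstyle` (PHP_CodeSniffer) are the tools' real reporting flags:

```php
<?php
// Hypothetical sketch: run a QA tool and record its raw output as a
// typed artifact. storeArtifact() stands in for Travis' persistence
// layer and is not part of Travis itself.
function storeArtifact($type, $content)
{
    // In Travis this would become an INSERT into the artifacts table;
    // here we only report what would be stored.
    printf("artifact type=%s, %d bytes\n", $type, strlen($content));
}

// PHPUnit writes its results as JUnit-formatted XML via --log-junit.
$logFile = tempnam(sys_get_temp_dir(), 'junit');
exec('phpunit --log-junit ' . escapeshellarg($logFile));
storeArtifact('junit', file_get_contents($logFile));

// PHP_CodeSniffer emits Checkstyle XML on STDOUT via --report=checkstyle.
exec('phpcs --report=checkstyle src/', $output);
storeArtifact('checkstyle', implode("\n", $output));
```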

Getting Human-readable output

Based on the type of these raw files, Travis can transform them into human-readable output using a transformation script or, for example, XSLT. This human-readable output can either be stored in the database as a second artifact record during the build, or the transformation can be applied ad hoc when the user opens the page, rather than being a by-product of the build process.

The first approach (generating during a post-processing phase of the build) has the benefit of faster viewing (some large projects cannot have their human-readable artifacts generated ad hoc due to time constraints) and of self-contained builds. The downsides are increased disk usage in the database and longer build times.

The second approach reduces build time and database storage, but runs the risk of performance issues when the user wants to view the results.

Based on the arguments above I recommend the first approach.
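
To illustrate the transformation step itself: using PHP's ext/xsl, turning a stored Checkstyle artifact into HTML could look like the sketch below, where `checkstyle-to-html.xsl` is a hypothetical stylesheet:

```php
<?php
// Sketch: transform a raw Checkstyle artifact into human-readable HTML
// with PHP's XSLTProcessor. File names are illustrative only.
$xml = new DOMDocument();
$xml->loadXML(file_get_contents('checkstyle.xml')); // the raw artifact

$xsl = new DOMDocument();
$xsl->load('checkstyle-to-html.xsl');

$processor = new XSLTProcessor();
$processor->importStylesheet($xsl);

// With the first approach the result is stored as a second artifact
// record; with the second approach it is returned to the browser.
echo $processor->transformToXml($xml);
```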

Human-readable output

In addition to the human-readable output mentioned in the previous chapter, some tools only allow human-readable artifacts to be created that consist of multiple files.

In the next section I propose a solution for storing these kinds of artifacts within the current architecture of Travis. By no means is this a best practice, but it could be used as a simple starting point from which to expand in the future.

Phar

PHP has an executable archive format called Phar. By using a micro-framework inside a Phar archive to route all URLs to the static files contained inside that archive, it is possible to pack the compressed output of a QA tool into a single file. This in turn makes it possible to store it in an artifact record and have an action in Travis route traffic to the Phar file, which then serves the static content itself.
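
A minimal sketch of how such an archive could be built, assuming phar.readonly is disabled and using PHP's built-in Phar::webPhar() request router in place of a full micro-framework (the paths are illustrative):

```php
<?php
// Sketch: pack a multi-file HTML report into a single Phar whose stub
// routes incoming web requests to the static files inside the archive.
// Requires phar.readonly=0 when building; all paths are illustrative.
$phar = new Phar('report.phar');
$phar->buildFromDirectory('build/coverage-report');

// Phar::webPhar() maps request paths onto files within the archive, so
// the entire report can be served through this single file.
$phar->setStub(
    "<?php Phar::webPhar(null, 'index.html'); __HALT_COMPILER();"
);
```

Travis would then only need a single action that forwards requests for a build's report to this file.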

Conclusion

With relatively minor adjustments it should be possible to add support for several types of artifacts produced by source code analysis tools. The focus should be on storing the single-entity output of specific QA tools in the artifact model of Travis and displaying it as human-readable content.

Appendix A: List of source code analysis tools per language

In this appendix I hope to document as many analysis tools as possible, together with their output formats, per language. _Author's note: since my area of expertise is PHP, I hope that others will append to this list._

Please note that documentation generators are explicitly not included, as they do not match the primary scope of this RFC.

Ruby

PHP

  • PHPUnit: xUnit
  • PHP_CodeCoverage: Clover
  • PHP_CodeSniffer: Checkstyle
  • phpmd: PMD
  • pdepend: JDepend
  • phploc: custom format
  • phpcpd: custom format

Java

  • JUnit: xUnit
  • PMD: PMD
  • JDepend: JDepend

mvriel commented Nov 22, 2011

mvriel: I am not sure I understand the phar part
mvriel: does it mean we will have to use phar the data format or host a small php app for that?
antares_: phar is php's jar format
yes, what are the implications of using it for build artifacts?
antares_: each phar file can contain all static files generated by a tool; to make these files accessible to you would need a conduit. A small PHP script inside the Phar acting as router can do that
mvriel: my point is, we don't expect travis to be all Ruby all the time but I doubt using php will be worth it
mvriel: what if we use .jar files?
mvriel: we can easily manipulate them using JRuby, it can be hosted on heroku as well (cuts down our maintenance time)
antares_: If you know how to expose the static files contained inside the .jar; then that will work too
mvriel: where will said .jar files be stored?
mvriel: antares_: should be fine with a .war
mvriel: right now all the Web-accessible parts are hosted on heroku and it would be really great to keep it that way
antares_: my proposal would be in the artifact record, just as you would log a log file
it has been 0 maintenance for us so far
mvriel: ok, so as a binary in the database
mvriel: I like this idea more than anything I could come up with
antares_: the concept is to expose the static contents of an archive; whether that is Jar, war or phar is the same to me
mvriel: so we just need to figure out a way to serve that stuff
mvriel: I understand
loicfrering: actually, it may be a good use case for a small Play! app :)
loicfrering: if using Play will make serving .war contents easier (which I am not sure is the case)
antares_: no, they just use netty internally
mvriel: I like it. We may tweak some details but this is better than anything I can propose right now. My only question is, how will we go about single file tools or stdout?
loicfrering: so you think you have an idea of how we can easily do that?
loicfrering: alternatively we can use JRuby or Clojure but I think it will largely be all the same
antares_: I'm gonna read the gist and let you know :)
antares_: as described in the RFC I would propose storing the human-readable output in the artifact model (and perhaps before that store the raw artifact in another record) during a post-processing stage
loicfrering: the problem boils down to serving static files in a .war or .jar from a web app without trying to reimplement half of apache or nginx
mvriel: ok
mvriel: right now logs are stored in a separate table
antares_: I have not fully analyzed the Travis source; the artifacts table looked like a good starting point
mvriel: for several reasons, but largely because they are huge compared to everything else, and because we want to run certain migrations without them being stored together with other things
Define huge?
mvriel: so artifacts then will be stored just as another association in another table
svenfuch_: you might want to be a part of this conversation
mvriel: well, some are tens of megabytes
mvriel: compared to other rows this is pretty significant
Single file artifacts can in some cases reach such levels as well for large projects
mvriel: travis database dump is ~ 1 GB right now, this already makes some ALTER TABLEs take longer than we would like
mvriel: I know, the solution for both is to wipe out old logs
mvriel: say, older than 4 months
antares_: perhaps we should store the output in a separate table
it is sad that we cannot keep them forever but having huge database dumps makes it harder for people to import them locally if needed
the artifact table only contains the meta-data and another table the actual output
mvriel: i was thinking about this more, and i think we should store all output on s3
mvriel: hm, I think they can be combined
if we don't care about displaying multi-file artifacts inline we can store them on S3
mvriel: by the way, some Ruby metric tools are Flog, Flay, Heckle, rcov and there is a couple new ones I cannot immediately recall
mvriel: http://ruby.sadi.st/Flog.html, http://ruby.sadi.st/Flay.html, http://ruby.sadi.st/Heckle.html
if we store allll files on s3, raw and post processed, then this would work best in cases where multiple files are uploaded
heckle is mutation testing so I am not sure if it fits
and images as well
just launching a crazy idea here: we could push the result into the repo (rights management problem here) in a gh-pages branch and let these static files be served by github
Unfortunately I have to cut this short; otherwise my night's sleep will be really short. I will read up tomorrow morning :)
(ok really crazy idea)
loicfrering: we will quickly run over the repo size limit
mvriel: can you add any notes from this discussion to your gist?
loicfrering: which github has, I think even for private repos

@svenfuchs commented

I prefer aiming for something way more simple: just upload files from a predefined location either to S3 directly or to the app, then expose these through the API and put links on the build details page. We might want to display images at some point.
