@karenc
Last active December 2, 2019 21:05

AP Biology Rebaking on Production

Issue

This is the history of baking for all versions of AP Biology: https://cnx.org/a/content-status/6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2

AP Biology 18.4 was created on 2019-08-01 and it baked successfully.

It was then rebaked a number of times:

  • on 2019-09-11 12:43 but failed with a connection error to mathmlcloud.cnx.org,
  • on 2019-09-25 08:53 but failed again with a connection error to mathmlcloud.cnx.org,
  • on 2019-09-26 08:33 but failed again with a connection error to mathmlcloud.cnx.org,
  • on 2019-10-09 09:08 but failed again with a connection error to mathmlcloud.cnx.org,
  • on 2019-10-23 10:02 succeeded,
  • on 2019-11-07 11:03 succeeded.

We would not have noticed this, except that the titles of some of the composite pages changed when the book rebaked successfully on 2019-10-23. This caused a huge problem with rex redirects, so we rolled back a recipe change and rebaked on 2019-11-07.

The question is: if the book baked successfully when it was first published, why was it rebaked?

Deployment History

We can check when production was updated at https://cnx.org/history.txt

At the time of writing, production was last updated on 2019-11-20.

These are the dates of the previous deployments:

  • 2019-11-07 09:19:15 CST
  • 2019-10-23 09:48:34 CDT
  • 2019-10-09 08:57:09 CDT
  • 2019-09-26 08:20:46 CDT
  • 2019-09-25 08:39:46 CDT
  • 2019-09-11 09:47:10 CDT

Compare these dates and times with when the book was rebaked. It's pretty clear that the book was rebaked whenever there was a deployment, so the trigger is most likely one of the migrations.
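
The comparison can be done by eye, but here is a minimal sketch of it in Python. It only compares calendar dates, because the deployment times carry a time zone (CST/CDT) and the rebake times listed above do not. The variable names are made up for this example.

```python
from datetime import date

# Dates copied from the two lists above (times and time zones dropped on purpose).
rebake_dates = [
    date(2019, 9, 11), date(2019, 9, 25), date(2019, 9, 26),
    date(2019, 10, 9), date(2019, 10, 23), date(2019, 11, 7),
]
deployment_dates = {
    date(2019, 11, 7), date(2019, 10, 23), date(2019, 10, 9),
    date(2019, 9, 26), date(2019, 9, 25), date(2019, 9, 11),
}

# Rebakes that did NOT happen on a deployment day.
unmatched = [d for d in rebake_dates if d not in deployment_dates]
print(unmatched)
```

An empty list means every rebake of AP Biology 18.4 happened on a day when production was deployed.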

CNX Content Transforms

Content on cnx is written in cnxml, which is transformed to raw html using xslt. The raw html is then baked using a recipe to create the final baked html, which is what the site displays.

Most of the transformation code is in rhaptos.cnxmlutils. We store the version of rhaptos.cnxmlutils used to transform the cnxml in the raw html. For example, the Preface of AP Biology has:

<body ... data-cnxml-to-html-ver="1.7.3">

The data-cnxml-to-html-ver attribute on the body tag holds the version of rhaptos.cnxmlutils, so the Preface was transformed using rhaptos.cnxmlutils 1.7.3.

This is also available in the baked html:

<div data-type="page" id="1dde1b61-63d4-4d4c-9137-d18fc12b56b3" data-cnxml-to-html-ver="1.7.3">
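
To check which version produced a given piece of html, the attribute can be read back out. This is a rough sketch using only the Python standard library; `VersionCollector` and `cnxml_to_html_versions` are names invented for this example and are not part of rhaptos.cnxmlutils or the migration code.

```python
from html.parser import HTMLParser


class VersionCollector(HTMLParser):
    """Collect every data-cnxml-to-html-ver attribute value in a document."""

    def __init__(self):
        super().__init__()
        self.versions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with lowercased names.
        for name, value in attrs:
            if name == "data-cnxml-to-html-ver":
                self.versions.append(value)


def cnxml_to_html_versions(html):
    """Return the versions found in html, in document order."""
    collector = VersionCollector()
    collector.feed(html)
    collector.close()
    return collector.versions


# The baked html snippet quoted above, closed so it is a complete element.
baked = (
    '<div data-type="page" id="1dde1b61-63d4-4d4c-9137-d18fc12b56b3" '
    'data-cnxml-to-html-ver="1.7.3"></div>'
)
print(cnxml_to_html_versions(baked))
```

A page with no such attribute returns an empty list, which is worth treating as "outdated" rather than "up to date".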

Migration that Re-transforms CNXML to HTML

Back in 2017, we decided that whenever rhaptos.cnxmlutils is updated, all the (raw) html should be re-transformed. So there's a migration that looks for all the outdated html (content that doesn't have data-cnxml-to-html-ver="<current-rhaptos-cnxmlutils-version>") and re-transforms it.

This migration is one of the deferred migrations, which run after everything is deployed and update the content behind the scenes.

Fast forward to 2019: REX wanted CNX to return html instead of xhtml, so we added a hack to stop certain tags from self-closing. The migration ran and all the raw html was re-transformed correctly. The problem was that the books were not rebaked, so the baked html was still outdated.

So I changed this migration to also rebake a book if any of the baked html does not contain data-cnxml-to-html-ver="<current-rhaptos-cnxmlutils-version>".

The first time we deployed this to staging, we immediately noticed a problem. The logic is correct in theory, but baking takes a lot longer than the cnxml-to-html transforms. This migration added hundreds (?) of books to the baking queue and the system just could not finish baking all the books within a reasonable amount of time.

We changed the migration to only rebake "current" books (the version that is redirected to when a version is not in the url) authored by OpenStax.
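
The resulting rule can be sketched roughly as follows. This is not the actual migration code: the `book` dict and its keys are hypothetical stand-ins for whatever the database stores, and `None` stands for a baked page that has no data-cnxml-to-html-ver attribute at all.

```python
CURRENT_VERSION = "1.7.3"  # the rhaptos.cnxmlutils version quoted in the examples above


def needs_rebake(book, current_version=CURRENT_VERSION):
    """Decide whether the migration should queue this book for baking.

    `book` is a hypothetical dict, not the real database row:
      is_current            True if this is the version served when the url has no version
      authored_by_openstax  True if OpenStax is the author
      baked_page_versions   data-cnxml-to-html-ver of each baked page (None if missing)
    """
    # Only "current" OpenStax books are considered, to keep the baking queue small.
    if not (book["is_current"] and book["authored_by_openstax"]):
        return False
    # A single outdated page is enough to rebake the whole book.
    return any(v != current_version for v in book["baked_page_versions"])


example_book = {
    "is_current": True,
    "authored_by_openstax": True,
    "baked_page_versions": ["1.7.3", None, "1.7.3"],
}
print(needs_rebake(example_book))
```

Note that the rule looks at every baked page, so one stale page keeps triggering a rebake on each deployment until that page ends up with the current version.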

Why was AP Biology rebaked?

People have speculated that AP Biology was rebaked because the AP Biology recipe was updated.

According to what I know and see in the existing migrations, we don't rebake books because recipes were updated.

Going back to the cnxml-to-html transforms migration: it looks like at least one page of baked html in AP Biology 18.4 was not produced with the latest rhaptos.cnxmlutils (that is, its data-cnxml-to-html-ver was not 1.7.3), and that caused the migration to rebake the book.

Why was it a problem that AP Biology rebaked?

Rebaking a book with the same recipe should yield the same result, so why was it a problem this time? The reason is that the AP Biology recipe had been updated, so the content actually changed (the titles of some composite pages changed). The migration rebakes books with the latest recipe instead of the recipe that was originally used.

Actions to take

Rebake with the same recipe that was used to bake that version of the book instead of using the latest version.
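
As a rough illustration of the difference, assuming the recipe used for each bake is recorded somewhere alongside the book version. Every name here (`recipe_id`, `recipes_by_id`, the two functions) is hypothetical and not taken from the real schema or migration.

```python
def recipe_for_rebake(book_version, recipes_by_id):
    """Proposed behaviour: reuse the recipe recorded for this book version."""
    return recipes_by_id[book_version["recipe_id"]]


def latest_recipe(recipes_by_id):
    """Behaviour described above: always take the newest recipe."""
    return recipes_by_id[max(recipes_by_id)]


# Hypothetical data: recipe 1 was used for the original bake, recipe 2 came later.
recipes_by_id = {1: "recipe as of the original bake", 2: "updated recipe"}
book_version = {"name": "AP Biology", "version": "18.4", "recipe_id": 1}

print(recipe_for_rebake(book_version, recipes_by_id))
print(latest_recipe(recipes_by_id))
```

With the pinned recipe, a rebake triggered by the cnxml-to-html migration only picks up the transform changes, and recipe updates stay a separate, deliberate step.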
