This is the history of baking for all versions of AP Biology: https://cnx.org/a/content-status/6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2
AP Biology 18.4 was created on 2019-08-01 and it baked successfully.
It was then rebaked a number of times:
- on 2019-09-11 12:43, but failed with a connection error to mathmlcloud.cnx.org,
- on 2019-09-25 08:53, but failed again with a connection error to mathmlcloud.cnx.org,
- on 2019-09-26 08:33, but failed again with a connection error to mathmlcloud.cnx.org,
- on 2019-10-09 09:08, but failed again with a connection error to mathmlcloud.cnx.org,
- on 2019-10-23 10:02, succeeded,
- on 2019-11-07 11:03, succeeded.
We would not have noticed this, except that the titles of some of the composite pages changed when the book rebaked successfully on 2019-10-23. This caused a huge problem with REX redirects, so we rolled back a recipe change and rebaked on 2019-11-07.
The question is: the book baked successfully when it was first published, so why was it rebaked?
We can check when production was updated at https://cnx.org/history.txt
At the time of writing, production was last updated on 2019-11-20.
These are the dates of the previous deployments:
2019-11-07 09:19:15 CST
2019-10-23 09:48:34 CDT
2019-10-09 08:57:09 CDT
2019-09-26 08:20:46 CDT
2019-09-25 08:39:46 CDT
2019-09-11 09:47:10 CDT
Compare these dates and times with when the book was rebaked. It's pretty clear that the book was rebaked whenever there was a deployment, so the trigger is most likely one of the migrations.
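The comparison can be sketched in a few lines of Python. This is only an illustration using the timestamps listed above. It compares calendar dates only, because the timezone of the rebake timestamps is not stated here.

```python
from datetime import date

# Rebake timestamps from the baking history (timezone not stated).
rebakes = [
    "2019-09-11 12:43",
    "2019-09-25 08:53",
    "2019-09-26 08:33",
    "2019-10-09 09:08",
    "2019-10-23 10:02",
    "2019-11-07 11:03",
]

# Deployment timestamps from history.txt.
deployments = [
    "2019-11-07 09:19:15 CST",
    "2019-10-23 09:48:34 CDT",
    "2019-10-09 08:57:09 CDT",
    "2019-09-26 08:20:46 CDT",
    "2019-09-25 08:39:46 CDT",
    "2019-09-11 09:47:10 CDT",
]


def calendar_dates(stamps):
    """Keep only the YYYY-MM-DD part of each timestamp."""
    return {date.fromisoformat(stamp.split()[0]) for stamp in stamps}


rebake_dates = calendar_dates(rebakes)
deploy_dates = calendar_dates(deployments)

# Rebakes that did not happen on a deployment day.
unmatched = sorted(rebake_dates - deploy_dates)
print("rebakes without a same-day deployment:", unmatched)
```

Every rebake date in the list falls on a deployment date, which is what points at the migrations.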
Content on cnx is written in cnxml, which is transformed to raw html using xslt. The raw html is then baked using a recipe to create the final baked html, which is what the site displays.
Most of the transformation code is in rhaptos.cnxmlutils. We store the version of rhaptos.cnxmlutils used to transform the cnxml in the raw html. For example, the Preface of AP Biology:
<body ... data-cnxml-to-html-ver="1.7.3">
data-cnxml-to-html-ver in the body tag contains the version of rhaptos.cnxmlutils. So the Preface was transformed using rhaptos.cnxmlutils 1.7.3.
This is also available in the baked html:
<div data-type="page" id="1dde1b61-63d4-4d4c-9137-d18fc12b56b3" data-cnxml-to-html-ver="1.7.3">
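As a rough illustration of how this attribute can be read back out of raw or baked html, here is a minimal sketch using only the Python standard library. The class and function names are made up for this example and are not part of rhaptos.cnxmlutils.

```python
from html.parser import HTMLParser


class VersionCollector(HTMLParser):
    """Collects every data-cnxml-to-html-ver value seen in a document."""

    def __init__(self):
        super().__init__()
        self.versions = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        for name, value in attrs:
            if name == "data-cnxml-to-html-ver":
                self.versions.append(value)


def transform_versions(html):
    """Return the recorded transform versions, in document order."""
    collector = VersionCollector()
    collector.feed(html)
    collector.close()
    return collector.versions


baked = '<div data-type="page" data-cnxml-to-html-ver="1.7.3"><p>Preface</p></div>'
print(transform_versions(baked))
```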
Back in 2017, we decided that whenever rhaptos.cnxmlutils is updated, all the (raw) html should be transformed again. So there's this migration that looks for all the outdated html (content that doesn't have data-cnxml-to-html-ver="<current-rhaptos-cnxmlutils-version>") and re-transforms it.
This migration is one of the deferred migrations, which run after everything is deployed and update the content behind the scenes.
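The "outdated" check boils down to a version comparison. The sketch below is a simplified stand-in, not the real migration: the page ids, the dictionary layout, and the function name are all hypothetical, and the real migration works against the database.

```python
def outdated_pages(recorded_versions, current_version):
    """Return ids of pages whose recorded transform version is not current.

    recorded_versions maps a page id to the data-cnxml-to-html-ver value
    found in its raw html, or None when the attribute is missing.
    """
    return [
        page_id
        for page_id, version in recorded_versions.items()
        if version != current_version
    ]


# Hypothetical pages: one current, one stale, one never stamped.
pages = {"page-a": "1.7.3", "page-b": "1.7.2", "page-c": None}
print(outdated_pages(pages, "1.7.3"))
```

A page with no recorded version counts as outdated here, which matches the "doesn't have the current version" wording above.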
Fast forward to 2019: REX wanted CNX to return html instead of xhtml, so we added a hack to stop certain tags from self-closing. The migration ran and all the raw html was transformed correctly. The problem was that the books were not rebaked, so the baked html was still outdated.
So I changed this migration to also rebake a book if any of the baked html does not contain data-cnxml-to-html-ver="<current-rhaptos-cnxmlutils-version>".
The first time we deployed this to staging, we immediately noticed a problem. The logic was correct in theory, but baking takes a lot longer than the cnxml-to-html transforms. This migration added hundreds (?) of books to the baking queue and the system just could not finish baking all the books within a reasonable amount of time.
We changed the migration to only rebake "current" books (the version that is redirected to when a version is not in the url) authored by OpenStax.
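Put together, the selection rule looks roughly like the sketch below. It is an assumption-laden illustration, not the actual migration: the Book fields, the author string, and the function names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Book:
    """Hypothetical in-memory stand-in for a published book version."""
    title: str
    version: str
    is_current: bool  # the version served when the url has no version
    authors: List[str]
    baked_page_versions: List[Optional[str]] = field(default_factory=list)


def has_outdated_baked_html(book, current_version):
    """True if any baked page lacks the current transform version."""
    return any(v != current_version for v in book.baked_page_versions)


def books_to_rebake(books, current_version, author="OpenStax"):
    """Only current, OpenStax-authored books with outdated baked html."""
    return [
        book
        for book in books
        if book.is_current
        and author in book.authors
        and has_outdated_baked_html(book, current_version)
    ]


books = [
    Book("AP Biology", "18.4", True, ["OpenStax"], ["1.7.3", "1.7.2"]),
    Book("AP Biology", "18.3", False, ["OpenStax"], ["1.7.2"]),
    Book("Community Book", "1.1", True, ["Someone Else"], ["1.7.2"]),
    Book("Up To Date", "2.1", True, ["OpenStax"], ["1.7.3"]),
]
queued = books_to_rebake(books, "1.7.3")
print([(b.title, b.version) for b in queued])
```

Restricting the queue this way keeps the number of bakes small enough to finish after a deployment.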
People have speculated that AP Biology was rebaked because the AP Biology recipe was updated.
As far as I know, and from what I see in the existing migrations, we don't rebake books just because a recipe was updated.
Going back to the cnxml-to-html transforms migration, it looks like at least one page with baked html in AP Biology 18.4 did not use the latest rhaptos.cnxmlutils (as in, data-cnxml-to-html-ver was not 1.7.3), and that caused the migration to rebake the book.
Rebaking a book with the same recipe should yield the same result, so why was it a problem this time? The reason is that this time the AP Biology recipe had been updated, so the content actually changed (the titles of some composite pages changed). The migration uses the latest recipe to rebake books instead of the recipe that was originally used to bake them.
The fix is to rebake with the same recipe that was used to bake that version of the book, instead of using the latest version of the recipe.
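In code, the proposed behaviour amounts to preferring the recipe recorded at the original bake. The sketch below is hypothetical: the baked_with_recipe key and the recipe ids are invented, and falling back to the latest recipe when nothing was recorded is an assumption of this example, not something we have decided.

```python
def recipe_for_rebake(book, latest_recipe_id):
    """Pick the recipe to use when a migration rebakes a book.

    book is a dict; "baked_with_recipe" (a hypothetical key) holds the id
    of the recipe used for the original bake, if one was recorded.
    """
    recorded = book.get("baked_with_recipe")
    if recorded is not None:
        return recorded
    # Assumed fallback: nothing recorded, so use the latest recipe.
    return latest_recipe_id


ap_biology = {"title": "AP Biology", "version": "18.4", "baked_with_recipe": "recipe-old"}
print(recipe_for_rebake(ap_biology, "recipe-new"))
```

With this rule, a rebake triggered by a transform update would leave composite page titles unchanged, because the recipe stays the same.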