@karenc
Last active November 3, 2019 19:29

What happened?

On 2019-11-01, we discovered that page slugs had changed in the AP Biology book while looking at the rex redirects PR: openstax/cnx-deploy#1213

For example, the change turns the rex URL

/books/biology-ap-courses/pages/1-introduction

into

/books/biology-ap-courses/pages/chapter-1-introduction

or

/books/biology-ap-courses/pages/a-the-periodic-table-of-elements

to

/books/biology-ap-courses/pages/appendix-a-the-periodic-table-of-elements

Since we redirect cnx users to rex using 301 (permanent) redirects, which browsers cache, a user who was redirected before the change will be sent straight to the old URL, which no longer exists on rex.
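To make the caching problem concrete, here is a toy simulation. It is only a sketch: the `Browser` class and the `'cnx-intro'` key are made up for illustration, and real browsers decide for themselves how long to keep a 301.

```python
class Browser:
    """Toy browser that remembers permanent (301) redirects."""

    def __init__(self):
        self.redirect_cache = {}

    def visit(self, url, server_redirects):
        # A cached 301 is followed without asking the server again.
        if url not in self.redirect_cache:
            self.redirect_cache[url] = server_redirects[url]
        return self.redirect_cache[url]


OLD = '/books/biology-ap-courses/pages/1-introduction'
NEW = '/books/biology-ap-courses/pages/chapter-1-introduction'

# 'cnx-intro' is a hypothetical stand-in for the cnx URL of the page.
server_redirects = {'cnx-intro': OLD}

returning_user = Browser()
returning_user.visit('cnx-intro', server_redirects)  # caches the old target

# The slug changes, and the server-side redirect is updated to match.
server_redirects['cnx-intro'] = NEW

print(returning_user.visit('cnx-intro', server_redirects))  # still the old URL
print(Browser().visit('cnx-intro', server_redirects))       # a new user gets the new URL
```

Updating the redirect on the cnx side only helps users who have never been redirected before, which is why the old rex URLs themselves need to keep working.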

Tom told me that the problem was between versions 18.3 and 18.4.

Investigation

Look for content changes

I wanted to first check the content changes between 18.3 and 18.4, so I wrote this script called compare_books.py:

#!/usr/bin/env python3

import json
import unittest
from urllib.request import urlopen


class TestContentChanges(unittest.TestCase):
    # Show the full diff instead of a truncated one.
    maxDiff = None

    def test(self):
        ARCHIVE_API = 'https://archive.cnx.org/contents'
        BOOK_ID = '6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2'
        BOOK_VERSION_1 = '18.3'
        BOOK_VERSION_2 = '18.4'

        trees = []

        for version in (BOOK_VERSION_1, BOOK_VERSION_2):
            # as_collated=false asks for the raw (unbaked) book tree.
            url = f'{ARCHIVE_API}/{BOOK_ID}@{version}?as_collated=false'
            with urlopen(url) as resp:
                trees.append(json.loads(resp.read().decode('utf-8')))

        # On failure, assertEqual prints a diff of the two dictionaries.
        self.assertEqual(*trees)


if __name__ == '__main__':
    unittest.main()

(I didn't know how to diff dictionaries in python other than using the unittest package so that's what I used.)

I requested the raw book trees of the two versions using the archive API so that I could compare just the content, without the effects of baking or the recipe.

Running the script shows:

  {'abstract': '<div xmlns="http://www.w3.org/1999/xhtml" '
               'xmlns:c="http://cnx.rice.edu/cnxml" '
               'xmlns:md="http://cnx.rice.edu/mdml" '
               'xmlns:qml="http://cnx.rice.edu/qml/1.0" '
               'xmlns:mod="http://cnx.rice.edu/#moduleIds" '
               'xmlns:bib="http://bibtexml.sf.net/" '
               'xmlns:data="http://www.w3.org/TR/html5/dom.html#custom-data-attribute" '
               'data-cnxml-to-html-ver="1.3.2"/>',
   'authors': [{'firstname': 'OpenStax',
                'fullname': 'OpenStax Biology for AP Courses',
                'id': 'cnxapbio',
                'suffix': None,
                'surname': 'Biology for AP Courses',
                'title': ''}],
-  'baked': '2019-07-22T11:49:51.87133-05:00',
?                  -  ^  ^ ^^ ^   ^^ ^

+  'baked': '2019-10-23T10:05:21.825349-05:00',
?                 +   ^  ^ ^^ ^   ^^ ^^

   'buyLink': None,
   'canon_url': 'https://cnx.org/contents/6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2/Biology-for-AP%C2%AE-Courses',
   'canonical': None,
   'collated': False,
   'created': '2016-09-30T08:26:59Z',
   'doctype': '',
   'googleAnalytics': ['UA-30227798-20', 'UA-101537094-1'],
+  'history': [{'changes': 'period added',
+               'publisher': {'firstname': '',
+                             'fullname': 'OpenStax',
+                             'id': 'OpenStaxCollege',
+                             'suffix': '',
+                             'surname': 'OpenStax College',
+                             'title': ''},
+               'revised': '2019-08-01T18:41:41Z',
+               'version': '18.4'},
-  'history': [{'changes': 'removed duplicated text -TM',
?  ^^^^^^^^^^^^

+              {'changes': 'removed duplicated text -TM',
?  ^^^^^^^^^^^^
 ...
   'resources': [{'filename': 'collection.xml',
-                 'id': '1ae48be89902b8253b95af1b3f6c41d9145f0477',
+                 'id': '8853ce2839ac0fee79067f1085b72693cb92be99',
                  'media_type': 'text/xml'}],
-  'revised': '2019-07-22T16:49:39Z',
?                    ^ ^^  ^  ^ ^^

+  'revised': '2019-08-01T18:41:41Z',
?                    ^ ^^  ^  ^ ^^

   'roles': None,
   'shortId': 'bDIuMp-w',
   'stateid': 1,
   'subjects': [],
-  'submitlog': 'removed duplicated text -TM',
+  'submitlog': 'period added',
-  'submitter': {'firstname': 'OpenStax',
?                              --------

+  'submitter': {'firstname': '',
-                'fullname': 'OpenStax Biology for AP Courses',
?                                     -----------------------

+                'fullname': 'OpenStax',
-                'id': 'cnxapbio',
?                       ^  ^^^^

+                'id': 'OpenStaxCollege',
?                       ^^^ +++ ^ +++++

-                'suffix': None,
?                          ^^^^

+                'suffix': '',
?                          ^^

-                'surname': 'Biology for AP Courses',
+                'surname': 'OpenStax College',
                 'title': ''},
   'title': 'Biology for AP® Courses',
-  'tree': {'contents': [{'id': '1dde1b61-63d4-4d4c-9137-d18fc12b56b3@15',
?                                                                      ^

+  'tree': {'contents': [{'id': '1dde1b61-63d4-4d4c-9137-d18fc12b56b3@16',
?                                                                      ^

-                         'shortId': 'Hd4bYWPU@15',
?                                               ^

+                         'shortId': 'Hd4bYWPU@16',
?                                               ^

                          'slug': None,
                          'title': 'Preface'},
 ...
-           'id': '6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2@18.3',
?                                                          ^

+           'id': '6c322e32-9fb0-4c4d-a1d7-20c95c5c7af2@18.4',
?                                                          ^

-           'shortId': 'bDIuMp-w@18.3',
?                                   ^

+           'shortId': 'bDIuMp-w@18.4',
?                                   ^

            'slug': None,
            'title': 'Biology for AP® Courses'},
-  'version': '18.3'}
?                 ^

+  'version': '18.4'}
?                 ^

This shows that:

  • 18.3 was baked on 2019-07-22 and 18.4 was baked on 2019-10-23
  • 18.4 was published ("revised") on 2019-08-01
  • 18.4 was submitted by "OpenStax" and the message was "period added"
  • the collection.xml has changed
  • the submitter changed from "OpenStax Biology for AP Courses" to "OpenStax"
  • the version of "Preface" changed from 15 to 16
  • the version of the book changed from 18.3 to 18.4

From that I concluded that the content didn't really change. The only content change was in "Preface", and that wasn't one of the URLs that changed anyway.
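For what it's worth, the same kind of diff can be produced without unittest. This is a minimal sketch using only the standard library (json and difflib); `diff_dicts` is a name I made up, and it assumes both dictionaries are JSON-serializable, which the archive responses are since they arrive as JSON.

```python
import difflib
import json


def diff_dicts(old, new, old_name='old', new_name='new'):
    """Return a unified diff of two JSON-serializable dicts as a string."""
    # Sorted keys keep key order from showing up as a spurious difference.
    old_lines = json.dumps(old, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True, ensure_ascii=False).splitlines()
    return '\n'.join(difflib.unified_diff(
        old_lines, new_lines, fromfile=old_name, tofile=new_name, lineterm=''))


if __name__ == '__main__':
    # Tiny stand-ins for the two book trees, using values from the output above.
    tree_18_3 = {'version': '18.3', 'submitlog': 'removed duplicated text -TM'}
    tree_18_4 = {'version': '18.4', 'submitlog': 'period added'}
    print(diff_dicts(tree_18_3, tree_18_4, '18.3', '18.4'))
```

In compare_books.py this would replace the assertEqual call with something like print(diff_dicts(*trees)), and an empty result would mean the two trees are identical.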

Look for recipe changes

GOB shows the baking history of each book. I clicked on "Biology for AP® Courses" to see the history for this one.

18.4 was baked multiple times; a few attempts failed because mathmlcloud.cnx.org was offline.

To download the recipe used for baking, you can look at the info on the page:

Version: 18.4     Created: 2019-10-23 10:02:20.373495-05:00     Recipe: 2a26a01d5b12e9ac2674901f246b2b5cefb7c411     State: SUCCESS

The recipe hash is the last part of the resource URL, so this recipe is at https://cnx.org/resources/2a26a01d5b12e9ac2674901f246b2b5cefb7c411

To compare the difference between the recipes used in 18.3 and 18.4, I did:

wget -O recipe18.3.css 'https://cnx.org/resources/bcd7bd618d63fed711014994c4108974229c23b5'
wget -O recipe18.4.css 'https://cnx.org/resources/2a26a01d5b12e9ac2674901f246b2b5cefb7c411'

and then compared the CSS files with diff -u recipe18.3.css recipe18.4.css:

--- recipe18.3.css 2019-11-01 15:51:21.366172008 +0100
+++ recipe18.4.css 2019-11-01 15:51:28.878238258 +0100
@@ -1420,7 +1420,7 @@

 :pass(3) div[data-type="chapter"]::before {
   container: span;
-  content: counter(chapter);
+  content: "Chapter " counter(chapter);
   class: os-number;
   move-to: bChapterLabel; }

@@ -1442,13 +1442,13 @@

 :pass(3) div.appendix::before {
   container: span;
-  content: counter(appendix, upper-alpha);
+  content: "Appendix " counter(appendix, upper-alpha);
   class: os-number;
   move-to: bAppendixLabel; }

 :pass(3) div.appendix::before {
   container: span;
-  content: " | ";
+  content: " ";
   class: os-divider;
   move-to: bAppendixLabel; }

This is exactly it. The new page slugs have an additional chapter- or appendix- prefix, and "Chapter " and "Appendix " are precisely the strings added to the number labels between the recipes used for 18.3 and 18.4.
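To show how a label change like this ends up in the URL, here is a rough sketch. The `slugify` function is my guess at the general idea (lowercase the title, turn every run of other characters into a hyphen), not the actual slug generation code, and the titles are reconstructed from the recipe diff and the URLs above.

```python
import re


def slugify(title):
    """Hypothetical slug generation: lowercase, other character runs become '-'."""
    return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')


# Assumed baked titles: number label + divider + page title.
titles = [
    ('1 Introduction', 'Chapter 1 Introduction'),
    ('A | The Periodic Table of Elements',
     'Appendix A The Periodic Table of Elements'),
]

for old_title, new_title in titles:
    print(slugify(old_title), '->', slugify(new_title))
```

Under these assumptions the old and new titles produce the same slugs as the old and new URLs shown at the top, including the appendix divider going from " | " to " " without affecting the slug.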

Solutions

We talked to the CE styles team and found out that they need this change for the PDF, so even if we reverted it now, we would have to enable it again later.

The CE tech team could possibly change the slug generation code to avoid this URL change.

Ultimately, though, this could happen again in the future, and the most robust fix is simply to redirect old URLs to the new ones within rex.
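As a rough idea of what such a redirect could look like, here is a sketch of an old-slug to new-slug lookup. None of this is rex code: `SLUG_REDIRECTS` and `resolve` are hypothetical names, and the table only contains the two examples from this write-up.

```python
BOOK_PREFIX = '/books/biology-ap-courses/pages/'

# Old slug -> new slug, taken from the two examples in this write-up.
SLUG_REDIRECTS = {
    '1-introduction': 'chapter-1-introduction',
    'a-the-periodic-table-of-elements': 'appendix-a-the-periodic-table-of-elements',
}


def resolve(path):
    """Return the current path for a possibly outdated rex page path."""
    if not path.startswith(BOOK_PREFIX):
        return path  # not a page of this book, leave it alone
    slug = path[len(BOOK_PREFIX):]
    # Unknown slugs pass through unchanged.
    return BOOK_PREFIX + SLUG_REDIRECTS.get(slug, slug)


print(resolve('/books/biology-ap-courses/pages/1-introduction'))
```

A full table could be generated by pairing the slugs of the same page in the 18.3 and 18.4 baked trees, since the page ids stay the same across versions.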
