Skip to content

Instantly share code, notes, and snippets.

@karenc
Last active December 19, 2019 19:43
Show Gist options
  • Save karenc/e072e708c78240768d73f56da64a7377 to your computer and use it in GitHub Desktop.
Save karenc/e072e708c78240768d73f56da64a7377 to your computer and use it in GitHub Desktop.

The Problem

When downloading a new collection on katalyst01 col30381 using neb get, Ryan got this error:

karen@yachiyo:~/src/nebuchadnezzar$ neb get katalyst01 col30381 latest
Traceback (most recent call last):
  File "./venv/bin/neb", line 11, in <module>
    load_entry_point('nebuchadnezzar', 'console_scripts', 'neb')()
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/karen/src/nebuchadnezzar/nebu/cli/_common.py", line 59, in wrapper
    return func(*args, **kwargs)
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/click/decorators.py", line 17, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/home/karen/src/nebuchadnezzar/nebu/cli/get.py", line 102, in get
    version = resp.json()['headVersion']
  File "/home/karen/src/nebuchadnezzar/venv/lib/python3.6/site-packages/requests/models.py", line 897, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Investigation

I added some print statements in neb get so we can see the url and the response:

diff --git a/nebu/cli/get.py b/nebu/cli/get.py
index cdec743..e9c791c 100644
--- a/nebu/cli/get.py
+++ b/nebu/cli/get.py
@@ -95,10 +95,12 @@ def get(ctx, env, col_id, col_version, output_dir, book_tree, get_resources):

     # Fetch extras (includes head and downloadable file info)
     url = '{}/extras/{}@{}'.format(base_url, uuid, version)
+    print(url)
     resp = session.get(url)

     # Latest defaults to successfully baked - we need headVersion
     if col_version == 'latest':
+        print(resp.text)
         version = resp.json()['headVersion']
         url = '{}/extras/{}@{}'.format(base_url, uuid, version)
         resp = session.get(url)

This is the debug output:

https://archive-katalyst01.cnx.org/extras/a919011d-956e-4db2-8c1a-37ad293bbfc9@1.1
<html>
 <head>
  <title>404 Not Found</title>
 </head>
 <body>
  <h1>404 Not Found</h1>
  The resource could not be found.<br/><br/>



 </body>
</html>

Then I had a look in cnx-archive for the view function for this url and found:

    add_route('content-extras', '/extras/{ident_hash:([^:/@.]+(@[0-9.]*[0-9]+)?)}{separator:(:?)}{page_ident_hash:([^:/@.]+(@[0-9.]*[0-9]+)?)?}')  # noqa cnxarchive.views:get_extra

The route is called content-extras and you can find the code by doing:

karen@yachiyo:~/src/cnx-archive$ git grep content-extras
...
cnxarchive/views/content.py:@view_config(route_name='content-extras', request_method='GET',

Looking at the source code:

@view_config(route_name='content-extras', request_method='GET',
             http_cache=(600, {'public': True}))
def get_extra(request):
    """Return information about a module / collection that cannot be cached."""
    settings = get_current_registry().settings
    exports_dirs = settings['exports-directories'].split()
    args = request.matchdict
    if args['page_ident_hash']:
        context_id, context_version = split_ident_hash(args['ident_hash'])
        try:
            id, version = split_ident_hash(args['page_ident_hash'])
        except IdentHashShortId as e:
            id = get_uuid(e.id)
            version = e.version
        except IdentHashMissingVersion as e:
            # Ideally we would find the page version
            # that is in the book instead of latest
            id = e.id
            version = get_latest_version(e.id)
...

My feeling is that may be get_latest_version was failing. So next, I sshed into katalyst01.cnx.org and had a look at the database:

repository=> select moduleid, version, name, created from modules where portal_type = 'Collection' order by created desc limit 3;
 moduleid | version |                                                  name                                                  |        created
----------+---------+--------------------------------------------------------------------------------------------------------+------------------------
 col30381 | 1.1     | Physics                                                                                                | 2019-12-13 20:02:00+00
 col30328 | 1.1     | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
 col30328 | 1.2     | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
(3 rows)

repository=> select moduleid, version, name, created from latest_modules where portal_type = 'Collection' order by created desc limit 3;
 moduleid | version |                                                  name                                                  |        created
----------+---------+--------------------------------------------------------------------------------------------------------+------------------------
 col30328 | 1.3     | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:59:56+00
 col30327 | 1.3     | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-06 17:45:44+00
 col30326 | 1.8     | My Final Analysis of Everything - including Emotions, Feelings and Thoughts - By Mark Rozen Pettinelli | 2019-11-05 15:23:07+00
(3 rows)

So col30381 is in the modules table but not in the latest_modules table.

Looking at the update_latest trigger:

CREATE OR REPLACE FUNCTION update_latest() RETURNS trigger AS '
BEGIN
-- lastest content is the highest version that has successfully baked - states 1 and 8 (current and fallback)
-- represent some sort of success (fallback used an old recipe due to errors)
  IF (TG_OP = ''INSERT'' OR TG_OP = ''UPDATE'') AND
          ARRAY [NEW.major_version, NEW.minor_version] >= (SELECT ARRAY [major_version, minor_version]
            FROM latest_modules WHERE uuid = NEW.uuid UNION ALL SELECT ARRAY[0, NULL] LIMIT 1) AND
          NEW.stateid in (1, 8) THEN -- current and fallback
      LOCK TABLE latest_modules IN SHARE ROW EXCLUSIVE MODE;
...

latest_modules is not updated if stateid isn't 1 or 8 and looking at the stateid:

repository=> select moduleid, version, stateid from modules where portal_type = 'Collection' order by created desc limit 3;
 moduleid | version | stateid
----------+---------+---------
 col30381 | 1.1     |       6
 col30381 | 1.1     |       6
 col30328 | 1.1     |       1
(3 rows)

That was exactly the problem. stateid 6 means baking failed.

Looking at the GOB page for physics, we get this traceback:

Traceback (most recent call last):
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/tasks.py", line 31, in __call__
    return super(PyramidAwareTask, self).__call__(*args, **kwargs)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/db.py", line 74, in wrapped
    return func(*args, **kwargs)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/subscribers.py", line 168, in baking_processor
    bake(binder, recipe_id, publisher, message, cursor=cursor)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/db.py", line 70, in wrapped
    return func(*args, **kwargs)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxpublishing/bake.py", line 74, in bake
    binder = collate_models(binder, ruleset=recipe, includes=includes)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/collation.py", line 71, in collate
    collated_binder = reconstitute(collated_html)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/collation.py", line 51, in reconstitute
    return adapt_single_html(xhtml)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 360, in adapt_single_html
    _adapt_single_html_tree(binder, body[0], nav_tree, top_metadata=metadata)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 517, in _adapt_single_html_tree
    id_map=id_map, depth=depth+1)
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/adapters.py", line 475, in _adapt_single_html_tree
    child.xpath('./*[@data-type="metadata"]')[0])
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 74, in parse_metadata
    return parser()
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 108, in __call__
    return self.metadata
  File "/var/cnx/venvs/publishing/local/lib/python2.7/site-packages/cnxepub/html_parsers.py", line 128, in metadata
    "A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.

Reproducing the Error

Download the recipe used (you can find the resource hash on the GOB page):

wget -O katalyst01-physics-recipe.css 'https://katalyst01.cnx.org/resources/a1f636ac26f5bdf29a78bade3f9f499ee20e0212'

Download and bake the book (after commenting out the bit that neb get tries to get the head version):

mkdir -p physics/raw
neb get -d physics/raw katalyst01 col30381 1.1
neb assemble physics/raw physics  # this creates physics/collection.assembled.xhtml
cnx-easybake katalyst01-physics-recipe.css physics/collection.assembled.xhtml physics/collection.baked.xhtml

Looking at the traceback, the error happens when running adapt_single_html:

karen@yachiyo:~/src/cnx-epub$ python
Python 2.7.15+ (default, Oct  7 2019, 17:39:04)
[GCC 7.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import logging
>>> logging.getLogger('cnxepub').addHandler(logging.StreamHandler())
>>> from cnxepub import adapt_single_html
>>> adapt_single_html(open('physics/collection.baked.xhtml').read())
Error when parsing metadata for composite-page (id: None, parent: "1 What is Physics?")
Traceback (most recent call last):
  File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 476, in _adapt_single_html_tree
    child.xpath('./*[@data-type="metadata"]')[0])
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 74, in parse_metadata
    return parser()
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 108, in __call__
    return self.metadata
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 128, in metadata
    "A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.
No handlers could be found for logger "cnxepub"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 360, in adapt_single_html
    _adapt_single_html_tree(binder, body[0], nav_tree, top_metadata=metadata)
  File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 529, in _adapt_single_html_tree
    id_map=id_map, depth=depth+1)
  File "/home/karen/src/cnx-epub/cnxepub/adapters.py", line 476, in _adapt_single_html_tree
    child.xpath('./*[@data-type="metadata"]')[0])
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 74, in parse_metadata
    return parser()
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 108, in __call__
    return self.metadata
  File "/home/karen/src/cnx-epub/cnxepub/html_parsers.py", line 128, in metadata
    "A value for '{}' could not be found.".format(key))
ValueError: A value for 'title' could not be found.

Tada! We have reproduced the error seen on the server.

Specifically it's the composite page after "1.3 The Language of Physics: Physical Quantities and Units"

  <div class="os-eoc os-ost-chapter-review.problem-container" data-type="composite-page"><div data-type="metadata" style="display: none;">
      <h1 data-type="document-title" itemprop="name"></h1>

Temporary Solution

Since this is a composite page, it's probably a problem with the recipe and is not somthing Ryan can fix by changing the content.

I tried baking the same book with the physics recipe and that worked, so Ryan will see if she can change the recipe and publish for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment