Ryan was unable to publish m64279
on katalyst01 and
https://legacy-katalyst01.cnx.org/content/m64279/latest/ was broken. r2 found
this traceback in the postgresql logs:
2019-12-06 16:30:08 UTC STATEMENT:
INSERT INTO modules
(moduleid, portal_type, version, name, created, revised, authors, maintainers, licensors, abstractid, stateid, licenseid, doctype, submitter, submitlog, language, parent, google_analytics, print_style)
VALUES
(
'm64279',
'Module',
'1.18',
'Preface',
'2017-07-07T19:55:51Z',
'2019-12-06T16:30:04Z',
ARRAY ['cnxapbio'],
ARRAY ['cnxapbio', 'OpenStaxCollege', 'OSCRiceUniversity'],
ARRAY ['cnxapbio', 'OpenStaxCollege', 'OSCRiceUniversity'],
299,
1,
10,
'',
'OpenStaxCollege',
'preface class',
'en',
null,
null,
null
)
2019-12-06 16:30:08 UTC ERROR: spiexceptions.CardinalityViolation: more than one row returned by a subquery used as an expression
2019-12-06 16:30:08 UTC CONTEXT: Traceback (most recent call last):
PL/Python function "add_module_file", line 3, in <module>
return add_module_file(plpy, TD)
PL/Python function "add_module_file", line 765, in add_module_file
PL/Python function "add_module_file", line 204, in produce_html_for_module
PL/Python function "add_module_file", line 306, in produce_transformed_file
PL/Python function "add_module_file"
2019-12-06 16:30:08 UTC STATEMENT:
INSERT INTO module_files (module_ident, fileid, filename)
SELECT module_ident, '2640147',
'index.cnxml'
FROM modules
WHERE moduleid = 'm64279'
AND version = '1.18'
Ross rolled back m64279
version 1.18
in the database and
https://legacy-katalyst01.cnx.org/content/m64279/latest/ was fixed.
Ross was able to replicate it in the database by doing the equivalent of a "no change" publish, so it was not legacy and likely to be the cnxml-to-html transforms trigger.
Ryan tried to publish a different module m42936
and got the same error.
New module publish attempt gets new error: The id "m80187" is invalid - it is already in use.
The full slack conversation: https://openstax.slack.com/archives/CBW9DFKGF/p1575549860372300
In /var/lib/cnx/cnx-buildout/var/log/instance0.log
, I found:
2019-12-11T15:39:36 ERROR Zope.SiteErrorLog https://legacy-katalyst01.cnx.org/GroupWorkspaces/wg2275/m42936/module_publish_description
Traceback (innermost last):
Module ZPublisher.Publish, line 115, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 41, in call_object
Module Products.CMFFormController.FSControllerPageTemplate, line 90, in __call__
Module Products.CMFFormController.BaseControllerPageTemplate, line 28, in _call
Module Products.CMFFormController.ControllerBase, line 244, in getNext
Module Products.CMFFormController.Actions.TraverseTo, line 38, in __call__
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 41, in call_object
Module Products.CMFFormController.FSControllerPythonScript, line 104, in __call__
Module Products.CMFFormController.Script, line 145, in __call__
Module Products.CMFCore.FSPythonScript, line 108, in __call__
Module Shared.DC.Scripts.Bindings, line 311, in __call__
Module Products.PloneHotfix20110531, line 106, in _patched_bindAndExec
Module Products.PloneHotfix20130618.spamProtect, line 35, in _patched_bindAndExec
Module Products.PloneHotfix20121106.python_scripts, line 63, in _patched_bindAndExec
Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
Module Products.CMFCore.FSPythonScript, line 164, in _exec
Module None, line 39, in publishContent
- <FSControllerPythonScript at /plone/publishContent used for /plone/GroupWorkspaces/wg2275/m42936>
- Line 39
Module Products.CNXCaching.module, line 27, in publishRevision
Module Products.RhaptosRepository.Repository, line 694, in publishRevision
Module Products.RhaptosModuleStorage.ModuleVersionFolder, line 176, in checkinResource
Module Products.RhaptosModuleStorage.ModuleDBTool, line 190, in insertModuleVersion
Module Products.RhaptosModuleStorage.ZSQLFile, line 44, in __call__
Module Shared.DC.ZRDB.DA, line 494, in __call__
- <ExtZSQLMethod at /plone/portal_moduledb/20191209150952.980889418597>
Module Products.ZPsycopgDA.db, line 207, in query
ProgrammingError: spiexceptions.CardinalityViolation: more than one row returned by a subquery used as an expression
CONTEXT: Traceback (most recent call last):
PL/Python function "add_module_file", line 3, in <module>
return add_module_file(plpy, TD)
PL/Python function "add_module_file", line 765, in add_module_file
PL/Python function "add_module_file", line 208, in produce_html_for_module
PL/Python function "add_module_file", line 310, in produce_transformed_file
PL/Python function "add_module_file"
In /var/log/postgresql/postgresql-9.4.main.log
:
2019-12-11 15:39:35 UTC CONTEXT: PL/Python function "republish_module"
2019-12-11 15:39:35 UTC STATEMENT:
INSERT INTO modules
(moduleid, portal_type, version, name, created, revised, authors, maintainers, licensors, abstractid, stateid, licenseid, doctype, submitter, submitlog, language, parent, google_analytics, print_style)
VALUES
(
'm42936',
'Module',
'1.13',
'Urbanization',
'2012-01-30T15:32:41Z',
'2019-12-11T15:39:23Z',
ARRAY ['OpenStaxCollege'],
ARRAY ['OpenStaxCollege', 'soccap'],
ARRAY ['OSCRiceUniversity'],
59257,
1,
10,
'',
'OpenStaxCollege',
'no change',
'en',
null,
'UA-30227798-2',
null
)
2019-12-11 15:39:36 UTC ERROR: spiexceptions.CardinalityViolation: more than one row returned by a subquery used as an expression
2019-12-11 15:39:36 UTC CONTEXT: Traceback (most recent call last):
PL/Python function "add_module_file", line 3, in <module>
return add_module_file(plpy, TD)
PL/Python function "add_module_file", line 765, in add_module_file
PL/Python function "add_module_file", line 208, in produce_html_for_module
PL/Python function "add_module_file", line 310, in produce_transformed_file
PL/Python function "add_module_file"
2019-12-11 15:39:36 UTC STATEMENT:
INSERT INTO module_files (module_ident, fileid, filename)
SELECT module_ident, '2640174',
'index.cnxml'
FROM modules
WHERE moduleid = 'm42936'
AND version = '1.13'
In eggs/Products.RhaptosModuleStorage-1.4.0-py2.4.egg/Products/RhaptosModuleStorage/sql/insertModuleFile.sql
:
INSERT INTO module_files (module_ident, fileid, filename)
SELECT module_ident, <dtml-sqlvar fileid type="string">,
<dtml-sqlvar filename type="string">
FROM modules
WHERE <dtml-sqltest moduleid type="string">
AND <dtml-sqltest version type="string">
Try creating a new version of m42936
manually (using the statement from the log):
repository=> INSERT INTO modules
repository-> (moduleid, portal_type, version, name, created, revised, authors, maintainers, licensors, abstractid, stateid, licenseid, doctype, submitter, submitlog, language, parent, google_analytics, print_style)
repository-> VALUES
repository-> (
repository(> 'm42936',
repository(> 'Module',
repository(> '1.13',
repository(> 'Urbanization',
repository(> '2012-01-30T15:32:41Z',
repository(> '2019-12-11T15:39:23Z',
repository(> ARRAY ['OpenStaxCollege'],
repository(> ARRAY ['OpenStaxCollege', 'soccap'],
repository(> ARRAY ['OSCRiceUniversity'],
repository(> 59257,
repository(> 1,
repository(> 10,
repository(> '',
repository(> 'OpenStaxCollege',
repository(> 'no change',
repository(> 'en',
repository(> null,
repository(> 'UA-30227798-2',
repository(> null
repository(> )
repository-> ;
INSERT 0 1
Get all the module files of m42936
version 1.13
:
repository=> select * from module_files where module_ident = 660935;
module_ident | fileid | filename
--------------+---------+----------------------
660935 | 226493 | Figure_20_02_01.jpg
660935 | 229970 | Figure_20_02_02.jpg
660935 | 226492 | Figure_20_02_03a.jpg
660935 | 226491 | Figure_20_02_03b.jpg
660935 | 226495 | Figure_20_02_04.jpg
660935 | 226496 | Figure_20_02_06.jpg
660935 | 1630848 | index.cnxml
660935 | 2544140 | index.cnxml.html
(8 rows)
Insert all images:
repository=> insert into module_files (module_ident, fileid, filename) select 1094902, fileid, filename from module_files where module_ident = 660935 and filename not in ('index.cnxml', 'index.cnxml.html');
INSERT 0 6
repository=> select * from module_files where module_ident = 1094902; module_ident | fileid | filename
--------------+--------+----------------------
1094902 | 226493 | Figure_20_02_01.jpg
1094902 | 229970 | Figure_20_02_02.jpg
1094902 | 226492 | Figure_20_02_03a.jpg
1094902 | 226491 | Figure_20_02_03b.jpg
1094902 | 226495 | Figure_20_02_04.jpg
1094902 | 226496 | Figure_20_02_06.jpg
(6 rows)
Insert the index.cnxml
:
repository=> insert into module_files (module_ident, fileid, filename) values (1094902, 1630848, 'index.cnxml');
INFO: produce cnxml->html for module_ident = 1094902
ERROR: spiexceptions.CardinalityViolation: more than one row returned by a subquery used as an expression
CONTEXT: Traceback (most recent call last):
PL/Python function "add_module_file", line 3, in <module>
return add_module_file(plpy, TD)
PL/Python function "add_module_file", line 765, in add_module_file
PL/Python function "add_module_file", line 208, in produce_html_for_module
PL/Python function "add_module_file", line 310, in produce_transformed_file
PL/Python function "add_module_file"
Access the database as postgres
and create a function to check plpython path:
administrator@katalyst01:/var/lib/cnx/cnx-buildout$ PGHOST=localhost psql -U postgres repository
psql (9.4.19)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
repository=# create function karen() returns text language plpythonu as $$ import sys; return(str(sys.path)) $$;
CREATE FUNCTION
Check the path as rhaptos
:
repository=> select karen();
['/var/cnx/venvs/archive/lib/python2.7/site-packages', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat']
We can confirm that plpython
uses the packages in /var/cnx/venvs/archive/lib/python2.7/site-packages
.
Drop the function as postgres
:
administrator@katalyst01:/var/lib/cnx/cnx-buildout$ PGHOST=localhost psql -U postgres repository
psql (9.4.19)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
repository=# drop function karen();
DROP FUNCTION
Add this to
/var/cnx/venvs/archive/lib/python2.7/site-packages/cnxtransforms/producers.py
line 305 to see exactly what values we are inserting to module_files
:
plpy.info('INSERT INTO module_files (module_ident, fileid, filename) VALUES ({}, {}, {});'.format(ident, destination_file_id, filename))
Start psql
as rhaptos
again:
administrator@katalyst01:/var/lib/cnx/cnx-buildout$ PGHOST=localhost psql -U rhaptos repository
psql (9.4.19)
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
repository=> insert into module_files (module_ident, fileid, filename) values (1094902, 1630848, 'index.cnxml');
INFO: produce cnxml->html for module_ident = 1094902
INFO: INSERT INTO module_files (module_ident, fileid, filename) VALUES (1094902, 2544140, index.cnxml.html);
ERROR: spiexceptions.CardinalityViolation: more than one row returned by a subquery used as an expression
CONTEXT: Traceback (most recent call last):
PL/Python function "add_module_file", line 3, in <module>
return add_module_file(plpy, TD)
PL/Python function "add_module_file", line 765, in add_module_file
PL/Python function "add_module_file", line 208, in produce_html_for_module
PL/Python function "add_module_file", line 311, in produce_transformed_file
PL/Python function "add_module_file"
Try inserting index.cnxml.html
manually:
repository=> INSERT INTO module_files (module_ident, fileid, filename) VALUES (1094902, 2544140, 'index.cnxml.html');
ERROR: more than one row returned by a subquery used as an expression
CONTEXT: SQL statement "SELECT (SELECT ab.abstract FROM abstracts ab INNER JOIN modules m
ON ab.abstractid = m.abstractid
WHERE m.module_ident = NEW.module_ident)"
PL/pgSQL function index_fulltext_trigger() line 21 at assignment
Look for the source code of function index_fulltext_trigger
:
administrator@katalyst01:/var/cnx/venvs/archive/lib/python2.7/site-packages/cnxdb$ grep -r fulltext .
...
./archive-sql/schema/fulltext-indexing.sql:CREATE OR REPLACE FUNCTION index_fulltext_trigger()
...
Have a look at fulltext-indexing.sql
, line 21 of index_fulltext_trigger
:
_abstract := (SELECT ab.abstract FROM abstracts ab INNER JOIN modules m
ON ab.abstractid = m.abstractid
WHERE m.module_ident = NEW.module_ident);
Doing the select manually returned:
repository=> select ab.abstract from abstracts ab inner join modules m on ab.abstractid = m.abstractid where m.module_ident = 1094902;
abstract
-------------------------------------------------------------------------------
<list> +
<item>Describe the process of urbanization in the United States</item> +
<item>Understand the function of suburbs, exurbs, and concentric zones</item>+
<item>Discuss urbanization from various sociological perspectives</item> +
</list>
<list> +
<item>Describe the process of urbanization in the United States</item> +
<item>Understand the function of suburbs, exurbs, and concentric zones</item>+
<item>Discuss urbanization from various sociological perspectives</item> +
</list>
(2 rows)
So finally, we found the problem! This is what the error is about.
Have a look at the abstractid
and abstracts
table:
repository=> select abstractid from modules where module_ident = 1094902;
abstractid
------------
59257
(1 row)
repository=> select * from abstracts where abstractid = 59257;
abstractid | abstract | html
------------+-------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
59257 | <list> +| <div xmlns="http://www.w3.org/1999/xhtml" xmlns:c="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml" xmlns:qml="http://cnx.rice.edu/qml/1.0" xmlns:mod="http://cnx.rice.edu/#moduleIds" xmlns:bib="http://bibtexml.sf.net/" xmlns:data="http://www.w3.org/TR/html5/dom.html#custom-data-attribute" data-cnxml-to-html-ver="1.3.2"><ul>+
| <item>Describe the process of urbanization in the United States</item> +| <li>Describe the process of urbanization in the United States</li> +
| <item>Understand the function of suburbs, exurbs, and concentric zones</item>+| <li>Understand the function of suburbs, exurbs, and concentric zones</li> +
| <item>Discuss urbanization from various sociological perspectives</item> +| <li>Discuss urbanization from various sociological perspectives</li> +
| </list> | </ul></div>
59257 | <list> +| <div xmlns="http://www.w3.org/1999/xhtml" xmlns:c="http://cnx.rice.edu/cnxml" xmlns:md="http://cnx.rice.edu/mdml" xmlns:qml="http://cnx.rice.edu/qml/1.0" xmlns:mod="http://cnx.rice.edu/#moduleIds" xmlns:bib="http://bibtexml.sf.net/" xmlns:data="http://www.w3.org/TR/html5/dom.html#custom-data-attribute" data-cnxml-to-html-ver="1.3.2"><ul>+
| <item>Describe the process of urbanization in the United States</item> +| <li>Describe the process of urbanization in the United States</li> +
| <item>Understand the function of suburbs, exurbs, and concentric zones</item>+| <li>Understand the function of suburbs, exurbs, and concentric zones</li> +
| <item>Discuss urbanization from various sociological perspectives</item> +| <li>Discuss urbanization from various sociological perspectives</li> +
| </list> | </ul></div>
(2 rows)
There are 2 entries for abstractid
59257
!
Now, check how many abstracts are duplicate:
repository=> select abstractid, count(*) from abstracts group by abstractid having count(*) > 1;
...
(445570 rows)
repository=> select abstractid, count(*) from abstracts group by abstractid having count(*) = 1;
...
(106 rows)
(I checked the staging and prod databases and this problem doesn't exist)
Ross told me about the built in ctid
column which can be used to distinguish between 2 identical rows so that it's possible to delete the duplicate rows:
repository=> select ctid from abstracts where abstractid = 59257;
ctid
-----------
(14514,9)
(2249,8)
(2 rows)
repository=> delete from abstracts where abstractid = 59257 and ctid != (select min(ctid) from abstracts where abstractid = 59257);
DELETE 1
So now insert index.cnxml
again and:
repository=> insert into module_files (module_ident, fileid, filename) values (1094902, 1630848, 'index.cnxml');
INFO: produce cnxml->html for module_ident = 1094902
INFO: produce {}->{} for module_ident = {}
INSERT 0 1
So m42936
is fixed!
Ross suggested an easy way of fixing the duplicate rows in the abstracts
table by creating a new table with distinct rows and then renaming the tables:
repository=> CREATE TABLE new_abstracts AS SELECT DISTINCT * FROM abstracts;
SELECT 445676
repository=> select abstractid, count(*) from new_abstracts group by abstractid having count(*) > 1;
abstractid | count
------------+-------
(0 rows)
repository=> select count(*) from new_abstracts;
count
--------
445676
(1 row)
repository=> select count(*) from abstracts;
count
--------
891245
(1 row)
repository=> select count(distinct abstractid) from abstracts;
count
--------
445676
(1 row)
repository=> alter table abstracts rename to old_abstracts; alter table new_abstracts rename to abstracts;
ALTER TABLE
ALTER TABLE
A quick check on the invalid moduleid
error to see if the moduleid_seq
sequence needs updating:
repository=> select last_value from moduleid_seq;
last_value
------------
80187
(1 row)
repository=> select max(substring(moduleid, 2)::int) from modules where portal_type = 'Module';
max
-------
80186
(1 row)
The "maximum" module id in the modules table is m80186
and the last value
from the moduleid sequence is 80187
, which looks ok to me. Not sure why we
would get the error The id "m80187" is invalid - it is already in use.
.
Looking at legacy /var/lib/cnx/cnx-buildout/bin/instance debug
:
>>> app.plone.content.m80187 # m80187 exists on legacy (but not on the database)
2019-12-11 19:12:26 WARNING ZODB Could not import class 'PloneboardTool' from module 'Products.Ploneboard.PloneboardTool'
2019-12-11 19:12:26 WARNING ZODB Could not import class 'OLSForumTool' from module 'Products.OLSForumConnector.OLSForumTool'
<ModuleVersionStub at /plone/content/m80187>
>>> app.plone.content.m90000 # a module that doesn't exist
Traceback (most recent call last):
File "<stdin>", line 1, in ?
AttributeError: m90000
In /var/lib/cnx/cnx-buildout/var/log/instance1.log
, I found:
2019-12-11T16:14:44 ERROR Zope.SiteErrorLog https://legacy-katalyst01.cnx.org/Members/OpenStaxCollege/module.2019-12-11.4919547500/publishContent
Traceback (innermost last):
Module ZPublisher.Publish, line 115, in publish
Module ZPublisher.mapply, line 88, in mapply
Module ZPublisher.Publish, line 41, in call_object
Module Products.CMFFormController.FSControllerPythonScript, line 104, in __call__
Module Products.CMFFormController.Script, line 145, in __call__
Module Products.CMFCore.FSPythonScript, line 108, in __call__
Module Shared.DC.Scripts.Bindings, line 311, in __call__
Module Products.PloneHotfix20110531, line 106, in _patched_bindAndExec
Module Products.PloneHotfix20130618.spamProtect, line 35, in _patched_bindAndExec
Module Products.PloneHotfix20121106.python_scripts, line 63, in _patched_bindAndExec
Module Shared.DC.Scripts.Bindings, line 348, in _bindAndExec
Module Products.CMFCore.FSPythonScript, line 164, in _exec
Module None, line 35, in publishContent
- <FSControllerPythonScript at /plone/publishContent used for /plone/Members/OpenStaxCollege/module.2019-12-11.4919547500>
- Line 35
Module Products.RhaptosRepository.Repository, line 645, in publishObject
Module Products.RhaptosModuleStorage.ModuleVersionFolder, line 71, in createVersionFolder
Module Products.BTreeFolder2.BTreeFolder2, line 417, in _setObject
Module Products.BTreeFolder2.BTreeFolder2, line 529, in _checkId
Module OFS.ObjectManager, line 95, in checkValidId
BadRequest: The id "m80187" is invalid - it is already in use.
>>> getattr(app.plone.Members.OpenStaxCollege['module.2019-12-11.4919547500'].aq_base, 'objectId')
>>> getattr(app.plone.Members.OpenStaxCollege['module.2019-12-11.4919547500'].aq_base, 'version')
'**new**'
I'm not sure why we get the "already in use" error.