@lsloan
Forked from IanHopkinson/lxml_examples.py
Last active April 10, 2023 21:11
QTI data processing in Python; examples using pyslet, beautifulsoup4, and lxml.

Examples of processing QTI data with Python.

I first tried pyslet, which was designed for this purpose, but I found it awkward to use and its documentation unclear. Next I tried beautifulsoup4, but I learned that it doesn't support XPath for querying specific elements of the data. Finally, I turned to the simple XML processing library lxml. It is similar to other XML parsing libraries I've used before, but it also has many unique features of its own.

Note that each of the examples below does something a little different, so they don't all produce the
same output. That's because they were mostly tests to see whether we preferred working with one library
over another. Each library had features that were better or worse than those of the others. At some point,
these examples should be revisited to make them all do the same thing, like a simple dump of all questions,
their answers, and an indication of which answer is the correct one.
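
One possible shape for that common output is sketched here. To keep it runnable without extra installs, it uses the standard library's `xml.etree.ElementTree` rather than any of the three libraries compared in this gist. The inline document is a trimmed, hypothetical excerpt modelled on `sample-test.xml`, and the names `dump_questions`, `SAMPLE`, and the prefix `q` are illustrative only. It assumes single-answer items.

```python
import xml.etree.ElementTree as ET

# Prefix "q" is arbitrary; the URI is the document's default namespace.
NS = {'q': 'http://www.imsglobal.org/xsd/ims_qtiasiv1p2'}

# Hypothetical, trimmed excerpt modelled on sample-test.xml
SAMPLE = """<questestinterop xmlns="http://www.imsglobal.org/xsd/ims_qtiasiv1p2">
  <assessment ident="a1" title="Crossing the Bridge of Death">
    <section ident="root_section">
      <item ident="i1" title="Name">
        <presentation>
          <material><mattext>What is your name?</mattext></material>
          <response_lid ident="response1" rcardinality="Single">
            <render_choice>
              <response_label ident="9071">
                <material><mattext>Launcelot</mattext></material>
              </response_label>
              <response_label ident="6500">
                <material><mattext>Robin</mattext></material>
              </response_label>
            </render_choice>
          </response_lid>
        </presentation>
        <resprocessing>
          <respcondition continue="No">
            <conditionvar>
              <varequal respident="response1">9071</varequal>
            </conditionvar>
          </respcondition>
        </resprocessing>
      </item>
    </section>
  </assessment>
</questestinterop>"""


def dump_questions(xml_text):
    """Return [(title, question, [(choice_id, text, is_correct), ...]), ...]."""
    root = ET.fromstring(xml_text)
    results = []
    for item in root.iterfind('.//q:item', NS):
        question = item.findtext(
            'q:presentation/q:material/q:mattext', default='', namespaces=NS)
        # Assumes exactly one <varequal> per item (single-answer questions)
        correct_id = item.findtext(
            './/q:conditionvar/q:varequal', default='', namespaces=NS).strip()
        choices = []
        for label in item.iterfind('.//q:response_label', NS):
            choice_id = label.get('ident', '').strip()
            text = label.findtext(
                'q:material/q:mattext', default='', namespaces=NS)
            choices.append((choice_id, text, choice_id == correct_id))
        results.append((item.get('title'), question, choices))
    return results


for title, question, choices in dump_questions(SAMPLE):
    print(f'{title}: {question}')
    for choice_id, text, is_correct in choices:
        print(f'  [{"x" if is_correct else " "}] {choice_id}: {text}')
```

Returning plain tuples keeps the parsing separate from the printing, so each library's version could be checked against the same expected data.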

List of examples:

  1. pyslet-test.py — An attempt at using pyslet as recommended by colleagues at IMS
    Global. I thought it would be ideal to use a library written specifically for QTI, but it leaves a lot to be desired.
  2. bs4-test.py — Tried using beautifulsoup4, but I found that this approach would be
    very loop-intensive because it doesn't support something like XPath.
  3. lxml-test-etree.py — Uses the etree module of lxml to access data in the document and XPath to find
    specific elements. The notable thing is that the XPath queries must specify the namespace of the elements searched, even though they are in the default namespace of the document.
  4. lxml-test-objectify.py — Uses the objectify module of lxml to parse the XML into Pythonic objects and
    provide typical techniques for accessing the data. That module provides ObjectPath as a simple XPath-like query of the data, but it is a little awkward.
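
The namespace requirement described in item 3 can be seen in a few lines. This is a minimal sketch, assuming lxml is installed. The inline document is a cut-down stand-in for `sample-test.xml`, and the prefix `q` is an arbitrary name chosen for illustration.

```python
from lxml import etree

QTI_NS = 'http://www.imsglobal.org/xsd/ims_qtiasiv1p2'

# Cut-down, hypothetical stand-in for sample-test.xml
SAMPLE = (
    f'<questestinterop xmlns="{QTI_NS}">'
    '<assessment title="Demo"><section>'
    '<item title="Name"/><item title="Quest"/>'
    '</section></assessment>'
    '</questestinterop>'
)
root = etree.fromstring(SAMPLE)

# XPath 1.0 has no notion of a default namespace: an unprefixed name
# only matches elements that are in no namespace at all.
no_prefix = root.xpath('//item')

# Binding any prefix to the document's default namespace URI works.
with_prefix = root.xpath('//q:item', namespaces={'q': QTI_NS})

# Alternative that needs no prefix map, at the cost of a wordier query.
by_local_name = root.xpath('//*[local-name() = "item"]')

# Unprefixed attributes are in no namespace, so this needs no prefix.
titles = root.xpath('//@title')

print(len(no_prefix), len(with_prefix), len(by_local_name), titles)
```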
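# bs4-test.py
from bs4 import BeautifulSoup, ResultSet, Tag

# Read the raw bytes and let the parser work out the encoding
with open('sample-test.xml', 'rb') as f:
    qtiFile = f.read()

doc = BeautifulSoup(qtiFile, 'lxml')

items: ResultSet = doc.find_all('item')
print(type(items[0]))

# Tag (not the more general PageElement) supports item['title'] and .attrs
item: Tag
for item in items:
    print(item['title'], item.attrs)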
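
Although beautifulsoup4 has no XPath support, it does accept CSS selectors through `select()` and `select_one()`, which can replace some of the looping. The following is a minimal sketch under a few assumptions: it uses the standard library's `html.parser` backend instead of the `lxml` parser used above (so it needs only bs4 itself), and the inline item is a trimmed, hypothetical excerpt modelled on `sample-test.xml`.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed excerpt modelled on sample-test.xml
SAMPLE = """<item ident="i1" title="Name">
  <presentation>
    <response_lid ident="response1" rcardinality="Single">
      <render_choice>
        <response_label ident="9071">
          <material><mattext>Launcelot</mattext></material>
        </response_label>
        <response_label ident="6500">
          <material><mattext>Robin</mattext></material>
        </response_label>
      </render_choice>
    </response_lid>
  </presentation>
  <resprocessing>
    <respcondition continue="No">
      <conditionvar>
        <varequal respident="response1">9071</varequal>
      </conditionvar>
    </respcondition>
  </resprocessing>
</item>"""

doc = BeautifulSoup(SAMPLE, 'html.parser')

# Descendant selector: every <mattext> under a <response_label> under <item>
choice_texts = [tag.get_text(strip=True)
                for tag in doc.select('item response_label mattext')]

# Child selector plus attribute selector to look up the correct answer
correct_id = doc.select_one('conditionvar > varequal').get_text(strip=True)
correct_text = doc.select_one(
    f'response_label[ident="{correct_id}"] mattext').get_text(strip=True)

print(choice_texts, correct_id, correct_text)
```

CSS selectors are less expressive than XPath (for example, they cannot select attribute values directly), so this narrows the gap rather than closing it.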
#!/usr/bin/env python
# encoding: utf-8
# lxml-test-etree.py
from lxml import etree
# In lxml, etree.ElementTree is a factory function; the class is _ElementTree
from lxml.etree import _ElementTree


def main():
    filename = 'sample-test.xml'
    tree: _ElementTree = etree.parse(filename)
    root = tree.getroot()

    # XPath cannot use a None prefix, so map the document's default
    # namespace to an arbitrary prefix ('_')
    defaultNamespace = {'_': root.nsmap[None]}

    # all elements in the document
    # for element in root.iter():
    #     print(element.tag)

    # xpath to find attributes doesn't need NS
    titleAttr = root.xpath('//@title')
    print(f'@title: {titleAttr}')

    # xpath to find elements requires NS
    # it would be nice if this could be defined once
    items = root.xpath('//_:item', namespaces=defaultNamespace)
    for item in items:
        print(f'Element name: {item.tag}, element text "{item.text}"')


if '__main__' == __name__:
    main()
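
On the wish to define the namespace only once: lxml provides `etree.XPathEvaluator` and `etree.XPath`, both of which accept a `namespaces` mapping at construction time. The sketch below shows one way they might be used. The inline document is a cut-down, hypothetical stand-in for `sample-test.xml`, and the prefix `q` and the variable names are illustrative.

```python
from lxml import etree

QTI_NS = 'http://www.imsglobal.org/xsd/ims_qtiasiv1p2'

# Cut-down, hypothetical stand-in for sample-test.xml
SAMPLE = f"""<questestinterop xmlns="{QTI_NS}">
  <assessment title="Demo">
    <section>
      <item title="Name">
        <response_label ident="9071"/>
        <response_label ident="6500"/>
      </item>
      <item title="Quest">
        <response_label ident="145"/>
      </item>
    </section>
  </assessment>
</questestinterop>"""

root = etree.fromstring(SAMPLE)
namespaces = {'q': QTI_NS}

# Option 1: an evaluator bound to one element and one prefix map;
# every later query reuses both.
xpath = etree.XPathEvaluator(root, namespaces=namespaces)
item_titles = [item.get('title') for item in xpath('//q:item')]
name_choice_ids = xpath('//q:item[@title="Name"]/q:response_label/@ident')

# Option 2: a precompiled expression that can be applied to any element.
find_items = etree.XPath('//q:item', namespaces=namespaces)
same_titles = [item.get('title') for item in find_items(root)]

print(item_titles)
print(name_choice_ids)
```

The evaluator suits many different queries against one document, while the precompiled expression suits one query run against many elements or documents.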
#!/usr/bin/env python
# encoding: utf-8
# lxml-test-objectify.py
from lxml import objectify


def main():
    filename = 'sample-test.xml'
    tree = objectify.parse(filename)
    root = tree.getroot()

    # ObjectPath addresses elements by a dotted path starting at the root
    path = objectify.ObjectPath('questestinterop.assessment.section')
    items = path.find(root).getchildren()

    for item in items:
        print(f'ident: "{item.attrib["ident"]}"')
        print(f'title: "{item.attrib["title"]}"')
        print(f'question: "{item.presentation.material.mattext}"')

        # FIXME: don't skip
        if item.presentation.response_lid.attrib['rcardinality'] == 'Multiple':
            print('SKIPPING Multiple response question!')
        else:
            # FIXME: fails if contents of "conditionvar" is complex,
            # for example, when "response_lid" has `rcardinality="Multiple"`
            correctChoiceID = str(item.resprocessing.respcondition.conditionvar.
                                  varequal).strip()

            choices = item.presentation.response_lid.render_choice.getchildren()
            for choice in choices:
                choiceID = str(choice.attrib['ident']).strip()
                print(f'choice (ident={choiceID}'
                      f'{", correct" if choiceID == correctChoiceID else ""}): '
                      f'"{choice.material.mattext}"')

        # separator between items
        print('¯-_-' * 15)


if '__main__' == __name__:
    main()
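
The two FIXMEs above could perhaps be addressed by collecting every `varequal` value instead of reading a single one. The sketch below uses `lxml.etree` with XPath rather than objectify's attribute access. The sample item is hypothetical: `sample-test.xml` contains no multiple-response question, so the `<and>`/`<not>` layout inside `conditionvar` is an assumption about how such an item might be exported, and `correct_choice_ids` is an illustrative helper name.

```python
from lxml import etree

QTI_NS = 'http://www.imsglobal.org/xsd/ims_qtiasiv1p2'
NS = {'q': QTI_NS}

# Hypothetical multiple-response item. The <and>/<not> structure is an
# assumption; it does not appear in sample-test.xml.
SAMPLE = f"""<item xmlns="{QTI_NS}" ident="i1" title="Knights">
  <presentation>
    <response_lid ident="response1" rcardinality="Multiple">
      <render_choice>
        <response_label ident="1">
          <material><mattext>Launcelot</mattext></material>
        </response_label>
        <response_label ident="2">
          <material><mattext>Tim</mattext></material>
        </response_label>
        <response_label ident="3">
          <material><mattext>Robin</mattext></material>
        </response_label>
      </render_choice>
    </response_lid>
  </presentation>
  <resprocessing>
    <respcondition continue="No">
      <conditionvar>
        <and>
          <varequal respident="response1">1</varequal>
          <not><varequal respident="response1">2</varequal></not>
          <varequal respident="response1">3</varequal>
        </and>
      </conditionvar>
    </respcondition>
  </resprocessing>
</item>"""


def correct_choice_ids(item):
    """Return the set of varequal values that are not wrapped in <not>."""
    values = item.xpath(
        './/q:conditionvar//q:varequal[not(ancestor::q:not)]/text()',
        namespaces=NS)
    return {str(value).strip() for value in values}


item = etree.fromstring(SAMPLE)
correct = correct_choice_ids(item)
for label in item.xpath('.//q:response_label', namespaces=NS):
    ident = label.get('ident')
    text = label.findtext('q:material/q:mattext', namespaces=NS)
    print(f'choice (ident={ident}{", correct" if ident in correct else ""}): "{text}"')
```

Because a single-answer item has just one `varequal` directly inside `conditionvar`, the same helper would return a one-element set for the questions in `sample-test.xml`, so the `rcardinality` check might no longer be needed.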
# pyslet-test.py
from typing import Optional, Any, Sequence

import pyslet.qtiv2.xml as qti
from pyslet.xml.namespace import NSElement
from pyslet.xml.structures import Element


def getChildElements(node: NSElement) -> Sequence[NSElement]:
    # skips strings not enclosed within elements (e.g., linebreaks in XML file)
    return [child for child in node.get_children() if
            isinstance(child, NSElement)]


def getAttr(node: NSElement, attrName: str, default: Any = None) -> \
        Optional[Any]:
    # gracefully return attribute value or substitute a default value
    try:
        return node.get_attribute(attrName)
    except KeyError:
        return default


def traverseChildTree(node: NSElement, depth=0, indentSize=2) -> None:
    children = getChildElements(node)

    if len(children) == 0:
        return

    indentation = ' ' * depth * indentSize

    for child in children:
        interestingAttributes = ['title', 'ident', 'respident']
        interestingAttrValues = {key: getAttr(child, key) for key in
                                 interestingAttributes if getAttr(child, key)}

        print(indentation + '; '.join([
            f'<{child.xmlname}>',
            f'attributes: {interestingAttrValues}',
            # ignore values that contain other elements,
            # those elements will be seen when traversing the next level
            f'value: "{child.get_value(ignore_elements=True).strip()}"']))

        # pass indentSize along so a non-default value applies at every level
        traverseChildTree(child, depth=depth + 1, indentSize=indentSize)


if '__main__' == __name__:
    doc = qti.QTIDocument()

    with open('sample-test.xml', 'rb') as f:
        doc.read(src=f)

    # doc.root is the document's root element, not the document itself
    root: Element = doc.root
    traverseChildTree(root)
<?xml version="1.0" encoding="UTF-8"?>
<questestinterop xmlns="http://www.imsglobal.org/xsd/ims_qtiasiv1p2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.imsglobal.org/xsd/ims_qtiasiv1p2 http://www.imsglobal.org/xsd/ims_qtiasiv1p2p1.xsd">
  <assessment ident="g0b648e96ac6d927a9d6431533dbadcb2" title="Crossing the Bridge of Death">
    <qtimetadata>
      <qtimetadatafield>
        <fieldlabel>cc_maxattempts</fieldlabel>
        <fieldentry>unlimited</fieldentry>
      </qtimetadatafield>
    </qtimetadata>
    <section ident="root_section">
      <item ident="gad2a567bd83d5666a648499127b67bfe" title="Name">
        <itemmetadata>
          <qtimetadata>
            <qtimetadatafield>
              <fieldlabel>question_type</fieldlabel>
              <fieldentry>multiple_choice_question</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>points_possible</fieldlabel>
              <fieldentry>1.0</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>original_answer_ids</fieldlabel>
              <fieldentry>9071,6500,4802,1355</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>assessment_question_identifierref</fieldlabel>
              <fieldentry>gdf6d7b7650101f8e5ec9df94414b85eb</fieldentry>
            </qtimetadatafield>
          </qtimetadata>
        </itemmetadata>
        <presentation>
          <material>
            <mattext texttype="text/html">&lt;div&gt;&lt;p&gt;What is your name?&lt;/p&gt;&lt;/div&gt;</mattext>
          </material>
          <response_lid ident="response1" rcardinality="Single">
            <render_choice>
              <response_label ident="9071">
                <material>
                  <mattext texttype="text/plain">Launcelot</mattext>
                </material>
              </response_label>
              <response_label ident="6500">
                <material>
                  <mattext texttype="text/plain">Robin</mattext>
                </material>
              </response_label>
              <response_label ident="4802">
                <material>
                  <mattext texttype="text/plain">Gawain</mattext>
                </material>
              </response_label>
              <response_label ident="1355">
                <material>
                  <mattext texttype="text/plain">Arthur</mattext>
                </material>
              </response_label>
            </render_choice>
          </response_lid>
        </presentation>
        <resprocessing>
          <outcomes>
            <decvar maxvalue="100" minvalue="0" varname="SCORE" vartype="Decimal"/>
          </outcomes>
          <respcondition continue="No">
            <conditionvar>
              <varequal respident="response1">9071</varequal>
            </conditionvar>
            <setvar action="Set" varname="SCORE">100</setvar>
          </respcondition>
        </resprocessing>
      </item>
      <item ident="gf7b0332279f5882d0f0fbcceb3f559f9" title="Quest">
        <itemmetadata>
          <qtimetadata>
            <qtimetadatafield>
              <fieldlabel>question_type</fieldlabel>
              <fieldentry>multiple_choice_question</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>points_possible</fieldlabel>
              <fieldentry>1.0</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>original_answer_ids</fieldlabel>
              <fieldentry>145,5514,2772,5811</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>assessment_question_identifierref</fieldlabel>
              <fieldentry>ged724b2f3667dfa46926fb6328bf9d2c</fieldentry>
            </qtimetadatafield>
          </qtimetadata>
        </itemmetadata>
        <presentation>
          <material>
            <mattext texttype="text/html">&lt;div&gt;&lt;p&gt;What is your quest?&lt;/p&gt;&lt;/div&gt;</mattext>
          </material>
          <response_lid ident="response1" rcardinality="Single">
            <render_choice>
              <response_label ident="145">
                <material>
                  <mattext texttype="text/plain">To seek the Holy Grail</mattext>
                </material>
              </response_label>
              <response_label ident="5514">
                <material>
                  <mattext texttype="text/plain">To go to White Castle</mattext>
                </material>
              </response_label>
              <response_label ident="2772">
                <material>
                  <mattext texttype="text/plain">To go to Camelot</mattext>
                </material>
              </response_label>
              <response_label ident="5811">
                <material>
                  <mattext texttype="text/plain">To graduate from U-M</mattext>
                </material>
              </response_label>
            </render_choice>
          </response_lid>
        </presentation>
        <resprocessing>
          <outcomes>
            <decvar maxvalue="100" minvalue="0" varname="SCORE" vartype="Decimal"/>
          </outcomes>
          <respcondition continue="No">
            <conditionvar>
              <varequal respident="response1">145</varequal>
            </conditionvar>
            <setvar action="Set" varname="SCORE">100</setvar>
          </respcondition>
        </resprocessing>
      </item>
      <item ident="gaa34fb0f17f5b27785e67d5ba8838757" title="Colour">
        <itemmetadata>
          <qtimetadata>
            <qtimetadatafield>
              <fieldlabel>question_type</fieldlabel>
              <fieldentry>multiple_choice_question</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>points_possible</fieldlabel>
              <fieldentry>1.0</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>original_answer_ids</fieldlabel>
              <fieldentry>5793,8708,2546,7153</fieldentry>
            </qtimetadatafield>
            <qtimetadatafield>
              <fieldlabel>assessment_question_identifierref</fieldlabel>
              <fieldentry>g4cad9057fa7c22fc53fe801d739a1e5d</fieldentry>
            </qtimetadatafield>
          </qtimetadata>
        </itemmetadata>
        <presentation>
          <material>
            <mattext texttype="text/html">&lt;div&gt;&lt;p&gt;What is your favourite colour?&lt;/p&gt;&lt;/div&gt;</mattext>
          </material>
          <response_lid ident="response1" rcardinality="Single">
            <render_choice>
              <response_label ident="5793">
                <material>
                  <mattext texttype="text/plain">Blue</mattext>
                </material>
              </response_label>
              <response_label ident="8708">
                <material>
                  <mattext texttype="text/plain">Yellow</mattext>
                </material>
              </response_label>
              <response_label ident="2546">
                <material>
                  <mattext texttype="text/plain">Maize and blue</mattext>
                </material>
              </response_label>
              <response_label ident="7153">
                <material>
                  <mattext texttype="text/plain">Blue . . . no, yellow!</mattext>
                </material>
              </response_label>
            </render_choice>
          </response_lid>
        </presentation>
        <resprocessing>
          <outcomes>
            <decvar maxvalue="100" minvalue="0" varname="SCORE" vartype="Decimal"/>
          </outcomes>
          <respcondition continue="No">
            <conditionvar>
              <varequal respident="response1">5793</varequal>
            </conditionvar>
            <setvar action="Set" varname="SCORE">100</setvar>
          </respcondition>
        </resprocessing>
      </item>
    </section>
  </assessment>
</questestinterop>