Clifford Anderson CliffordAnderson

Artistic Influence

Introduction

This graph studies the relations of influence between artists. The data comes from this query of Wikidata:

for $doc in fn:collection("bpp-quarterly")//FullText[.//text() contains text {"jury", "law"} any using stemming]
let $hits := ft:extract($doc[.//text() contains text {"jury", "law"} all using stemming])
let $count := fn:count($hits//mark)
let $record := fn:doc(fn:base-uri($doc))
order by $count descending
return <hit count="{$count}" url="{$record//URLDocView}" title="{$record//RecordTitle}">{$hits}</hit>
CliffordAnderson / reports.xqy
Created October 15, 2019 17:24
For Jim Duran & co.
let $data := <column name="Reporters">Williams, Brian|Maceda, Jim|Keith, Brian Williams,|Thompson, Anne</column>
let $contributor := $data[@name='Reporters']/text() => fn:tokenize("\|")
return $contributor
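Once the names are tokenized, each "Last, First" token can be split again on the comma. The following is a minimal sketch under assumptions: the `reporter` element and its `last` and `first` attributes are hypothetical names chosen for illustration, and the sample data is a cleaned subset of the column above.

```xquery
xquery version "3.1";
(: Hypothetical follow-up: split each "Last, First" token and trim whitespace. :)
let $data := <column name="Reporters">Williams, Brian|Maceda, Jim|Thompson, Anne</column>
for $name in fn:tokenize($data/text(), "\|")
(: Split on the comma, then normalize the whitespace of each part. :)
let $parts := fn:tokenize($name, ",") ! fn:normalize-space(.)
return <reporter last="{$parts[1]}" first="{$parts[2]}"/>
```

Irregular entries, such as a token with a trailing comma, would produce extra or empty parts and need separate handling.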
CliffordAnderson / 00-introduction.md
Last active May 20, 2020 23:54
Text Mining at Scale (XQuery Working Group) Natural Language Processing

Natural Language Processing

Today, we’ll explore patterns in a corpus of genuine and fake news collected in 2016 by BuzzFeed and scored for veracity by professional journalists. As you might imagine, the corpus contains very partisan perspectives; individual articles may contain disturbing language and viewpoints. To run the initial code example below, you will need to download the data set and create a database called articles.

We’ll begin our investigation of natural language processing by using Aylien, which bills itself as a “News Intelligence Platform,” to classify these articles, analyze their topics, identify the people, places, and things they discuss, and discern the sentiment or tone of the articles. If you would like to follow along, please sign up for a free API key.

CliffordAnderson / hashtag.xqy
Created October 10, 2019 22:44
NLP with Aylien
let $appid := "###"
let $key := "###"
let $endpoint := "https://api.aylien.com/api/v1/"
let $service := "hashtags"
let $text := fn:encode-for-uri("Are you serious? Do you think anyone cares about that crazy plan? Get back to Arizona!")
let $request :=
  <http:request method="get" href="{$endpoint || $service || '?text=' || $text}">
    <http:header name="Accept" value="text/xml"/>
    <http:header name="X-AYLIEN-TextAPI-Application-Key" value="{$key}"/>
    <http:header name="X-AYLIEN-TextAPI-Application-ID" value="{$appid}"/>
  </http:request>
(: Assumed completion: close the request and send it with the EXPath HTTP Client. :)
return http:send-request($request)
xquery version "3.1";
declare function local:seriesTitle($Network as xs:string?) as xs:string
{
switch ($Network)
case "ABC" return "ABC World News Tonight"
case "CBS" return "CBS Evening News"
case "NBC" return "NBC Nightly News"
default return "unknown network"
};
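A short usage sketch may help show how the function behaves, including the default branch. The function is repeated so the query runs on its own; "PBS" is simply an illustrative value that has no matching case.

```xquery
xquery version "3.1";
declare function local:seriesTitle($Network as xs:string?) as xs:string
{
  switch ($Network)
    case "ABC" return "ABC World News Tonight"
    case "CBS" return "CBS Evening News"
    case "NBC" return "NBC Nightly News"
    default return "unknown network"
};
(: Map a few network codes to series titles; "PBS" falls through to the default. :)
for $network in ("ABC", "NBC", "PBS")
return $network || ": " || local:seriesTitle($network)
```

Because the parameter is typed `xs:string?`, calling `local:seriesTitle(())` with an empty sequence also reaches the default branch.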
CliffordAnderson / 00-introduction.md
Last active May 20, 2020 23:54
Code snippets for XQuery Working Group (Text Mining at Scale)

XQuery Working Group

XQuery and XPath Full Text 1.0

In this session, we will be exploring the XQuery and XPath Full Text 1.0 standard. Our goal is to take the records that we created during our prior class from the Victorian Women Writers Project and persist them to another database where we will analyze their contents for textual patterns.

The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
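As a first taste of the Full Text syntax, here is a minimal sketch of a `contains text` filter with stemming. It uses inline sample lines instead of the vwwp_tei database so it can run in any processor that implements XQuery Full Text, such as BaseX; the sample lines and the search term are made up for illustration.

```xquery
xquery version "3.1";
(: Inline sample lines stand in for the vwwp_tei database (an assumption for illustration). :)
let $lines := (
  <l>The gardens were silent at dusk</l>,
  <l>She walked home along the river</l>,
  <l>A garden grows behind the wall</l>
)
(: Keep only lines whose text matches "garden"; stemming should let "gardens" match too. :)
for $line in $lines[. contains text "garden" using stemming]
return $line
```

Without the `using stemming` option, only the exact token "garden" would be expected to match.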

XQuery Working Group

Text Mining at Scale

In this session, we will extract poems from the Victorian Women Writers Project. The electronic editions of these documents are maintained in TEI P5 format on GitHub. You can also download a zip file of the entire corpus.

The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database. It is also assumed that you have named that database vwwp_tei.
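Before extracting poems, it can help to see which documents contain verse at all. The sketch below assumes that verse in this corpus is encoded with TEI line groups (`tei:lg`) containing lines (`tei:l`); that encoding is an assumption about the markup, and the `doc` element with its attributes is a hypothetical report format.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: Assumption: verse is encoded in tei:lg (line group) elements containing tei:l lines. :)
for $doc in fn:collection("vwwp_tei")
let $groups := $doc//tei:lg[tei:l]
where fn:exists($groups)
order by fn:count($groups) descending
(: Report each document's URI together with its number of line groups. :)
return <doc uri="{fn:base-uri($doc)}" lineGroups="{fn:count($groups)}"/>
```

If the corpus marks poems differently, for example with a typed `tei:div`, the path in the `let` clause would need to change accordingly.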

CliffordAnderson / start-query.xqy
Last active September 24, 2019 20:12
Starter Query
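declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "csv";
declare option output:csv "header=yes, separator=comma";
<csv>
{
  for $doc in fn:collection("bpp-quarterly")
  let $fullText := $doc/Record/FullText/text()
  let $title := $doc/Record/Publication/Title/text()
  let $articleTitle := $doc/Record/RecordTitle/text()
  (: Assumed completion: emit one record per document; the column names are illustrative. :)
  return
    <record>
      <Title>{$title}</Title>
      <ArticleTitle>{$articleTitle}</ArticleTitle>
      <FullText>{$fullText}</FullText>
    </record>
}
</csv>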
CliffordAnderson / xpath.xqy
Last active September 20, 2019 13:24
Practice XPath expressions for USC
distinct-values(//self::element()[text()]/name())
/child::usc:uscDoc/child::usc:main/child::usc:title/(child::usc:heading|child::usc:num)
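The second expression spells out the `child::` axis on every step. As a sketch, the query below checks that the abbreviated syntax selects the same nodes. The inline document is invented sample data, and the binding of the `usc` prefix to the USLM namespace URI is an assumption, since the snippet above does not show its namespace declaration.

```xquery
xquery version "3.1";
(: Assumption: the usc prefix is bound to the USLM namespace. :)
declare namespace usc = "http://xml.house.gov/schemas/uslm/1.0";
(: Invented sample document, just large enough to exercise the path. :)
let $doc := document {
  <usc:uscDoc>
    <usc:main>
      <usc:title>
        <usc:num>Title 1</usc:num>
        <usc:heading>GENERAL PROVISIONS</usc:heading>
      </usc:title>
    </usc:main>
  </usc:uscDoc>
}
let $verbose := $doc/child::usc:uscDoc/child::usc:main/child::usc:title/(child::usc:heading|child::usc:num)
let $short := $doc/usc:uscDoc/usc:main/usc:title/(usc:heading | usc:num)
(: Both paths should select the same nodes in document order. :)
return fn:deep-equal($verbose, $short)
```

The abbreviated form works because `child::` is the default axis for a name test in a path step.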