Clifford Anderson CliffordAnderson

## graph_gist_artists.adoc

      
Last active May 20, 2020 23:53, forked from jexp/graph_gist_template.adoc

Influential Artists

Artistic Influence


Introduction


This graph studies the relations of influence between artists. The data comes from this query of Wikidata:


## basic-full-text-query.xqy
for $doc in fn:collection("bpp-quarterly")//FullText[.//text() contains text {"jury", "law"} any using stemming]
let $hits := ft:extract($doc[.//text() contains text {"jury", "law"} all using stemming])
let $count := fn:count($hits//mark)
let $record := fn:doc(fn:base-uri($doc))
order by $count descending
return <hit count="{$count}" url="{$record//URLDocView}" title="{$record//RecordTitle}">{$hits}</hit>

## reports.xqy
let $data := <column name="Reporters">Williams, Brian|Maceda, Jim|Keith, Brian Williams,|Thompson, Anne</column>
let $contributor := $data[@name='Reporters']/text() => fn:tokenize("\|")
return $contributor
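
For readers who want to check the tokenizing step outside an XQuery processor, here is a minimal Python sketch of the same idea. The function name `split_reporters` is hypothetical and not part of the original gist; the sample string is the one used in the query above. As in `fn:tokenize`, the separator is a regular expression, so the pipe has to be escaped.

```python
import re


def split_reporters(cell: str) -> list[str]:
    """Split a pipe-delimited cell into one string per reporter."""
    # The pattern is a regular expression: an unescaped "|" would mean
    # alternation, so the literal pipe is written as "\|".
    return re.split(r"\|", cell)


reporters = split_reporters(
    "Williams, Brian|Maceda, Jim|Keith, Brian Williams,|Thompson, Anne"
)
print(reporters)
```

Note that the third entry keeps its trailing comma, exactly as it appears in the source data; cleaning such values would be a separate step.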

## 00-introduction.md

      
Last active May 20, 2020 23:54

Text Mining at Scale (XQuery Working Group) Natural Language Processing

Natural Language Processing

Today, we’ll be exploring patterns in a corpus of genuine and fake news collected in 2016 by BuzzFeed and scored for veracity by professional journalists. As you might imagine, the corpus contains very partisan perspectives; individual articles may contain disturbing language and viewpoints. For the initial code example below, you will need to have downloaded the data set and created a database called articles.
We’ll begin our investigation of natural language processing by using Aylien, which bills itself as a “News Intelligence Platform,” to classify these articles, analyze their topics, identify the people, places, and things they discuss, and discern the sentiment or tone of the articles. If you would like to follow along, please sign up for a free API key.

  
## hashtag.xqy
let $appid := "###"
let $key := "###"
let $endpoint := "https://api.aylien.com/api/v1/"
let $service := "hashtags"
let $text := fn:encode-for-uri("Are you serious? Do you think anyone cares about that crazy plan? Get back to Arizona!")
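let $request :=
  <http:request method="get" href="{$endpoint || $service || '?text=' || $text}">
    <http:header name="Accept" value="text/xml"/>
    <http:header name="X-AYLIEN-TextAPI-Application-Key" value="{$key}"/>
    <http:header name="X-AYLIEN-TextAPI-Application-ID" value="{$appid}"/>
  </http:request>
(: The preview is cut off here; sending the request with the EXPath HTTP client is an assumed completion. :)
return http:send-request($request)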
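
To make the shape of the request easier to inspect, the following Python sketch assembles the same URL and headers without sending anything over the network. The helper name `build_hashtag_request` is hypothetical; the endpoint, service name, and header names are copied from the query above, and `urllib.parse.quote` with `safe=""` plays a role similar to `fn:encode-for-uri`.

```python
from urllib.parse import quote

ENDPOINT = "https://api.aylien.com/api/v1/"  # taken from the query above


def build_hashtag_request(text: str, app_id: str, key: str) -> tuple[str, dict[str, str]]:
    """Return the GET URL and headers for the hashtags service (nothing is sent)."""
    # safe="" percent-encodes every reserved character, including "?" and "!".
    url = ENDPOINT + "hashtags" + "?text=" + quote(text, safe="")
    headers = {
        "Accept": "text/xml",
        "X-AYLIEN-TextAPI-Application-Key": key,
        "X-AYLIEN-TextAPI-Application-ID": app_id,
    }
    return url, headers


# "###" stands in for real credentials, as in the original snippet.
url, headers = build_hashtag_request("Get back to Arizona!", "###", "###")
print(url)
```

Encoding the text before appending it matters because the sample sentence contains a question mark, which would otherwise be read as the start of a second query string.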

## seriesTitle.xq
xquery version "3.1";

declare function local:seriesTitle($Network as xs:string?) as xs:string
  {
  switch ($Network)
    case "ABC" return "ABC World News Tonight"
    case "CBS" return "CBS Evening News"
    case "NBC" return "NBC Nightly News"
    default    return "unknown network"
 };
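
The same network-to-title mapping can be sketched in Python as a dictionary lookup with a fallback. The names `SERIES_TITLES` and `series_title` are hypothetical; the three titles and the default string come from the function above. Like the XQuery version, which accepts an optional string, this sketch returns the default when no network is supplied.

```python
from typing import Optional

# Mapping copied from the switch expression above.
SERIES_TITLES = {
    "ABC": "ABC World News Tonight",
    "CBS": "CBS Evening News",
    "NBC": "NBC Nightly News",
}


def series_title(network: Optional[str]) -> str:
    """Return the evening news series for a network, or a default label."""
    # dict.get covers both unknown networks and a missing (None) value.
    return SERIES_TITLES.get(network, "unknown network")


print(series_title("CBS"))
```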

## 00-introduction.md

      
Last active May 20, 2020 23:54

Code snippets for XQuery Working Group (Text Mining at Scale)

XQuery Working Group

XQuery and XPath Full Text 1.0

In this session, we will explore the XQuery and XPath Full Text 1.0 standard. Our goal is to take the records that we created from the Victorian Women Writers Project during our prior class and persist them to another database, where we will analyze their contents for textual patterns.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database named vwwp_tei.

  
## 0-data-source.md

      
Last active May 20, 2020 23:54

XQuery Working Group

Text Mining at Scale

In this session, we will extract poems from the Victorian Women Writers Project. The electronic editions of these documents are maintained in TEI P5 format on GitHub. You can also download a zip file of the entire corpus.
The following exercises assume that you have loaded the documents from the Victorian Women Writers Project into a BaseX database named vwwp_tei.

  
## start-query.xqy
declare namespace output = "http://www.w3.org/2010/xslt-xquery-serialization";
declare option output:method "csv";
declare option output:csv "header=yes, separator=comma";

<csv>
{
  for $doc in fn:collection("bpp-quarterly")
  let $fullText := $doc/Record/FullText/text()
  let $title := $doc/Record/Publication/Title/text()
  let $articleTitle := $doc/Record/RecordTitle/text()

## xpath.xqy
distinct-values(//self::element()[text()]/name())
/child::usc:uscDoc/child::usc:main/child::usc:title/(child::usc:heading|child::usc:num)