Skip to content

Instantly share code, notes, and snippets.

@davidamichelson
Created April 7, 2020 18:03
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save davidamichelson/d6f8a727313026a79c10fe390bab05cd to your computer and use it in GitHub Desktop.
Save davidamichelson/d6f8a727313026a79c10fe390bab05cd to your computer and use it in GitHub Desktop.
Data To Concatenate
Headword,Syriaca_URI,Map_1,Map_2,Map_3,Map_4,Map_5,Map_6,Map_7,Map_8,Map_9,Map_10,Map_11,Map_12,Map_13,Map_14
Abila,syriaca.org/place/4355,1,,,4,,,,,,,,,,
Abivard,syriaca.org/place/4222,,,,,,,,,9,,,,,
Abr Shahr,syriaca.org/place/4220,,,,,,,,,9,10,,,,
Acre|Akko,syriaca.org/place/14,1,2,,,5,,,,,,,,,
Headword,Syriaca_URI,Map_1,Map_2,Map_3,Map_4,Map_5,Map_6,Map_7,Map_8,Map_9,Map_10,Map_11,Map_12,Map_13,Map_14
Abila,syriaca.org/place/4355,1,,,4,,,,,,,,,,
Abivard,syriaca.org/place/4222,,,,,,,,,9,,,,,
Abr Shahr,syriaca.org/place/4220,,,,,,,,,9,10,,,,
Acre,syriaca.org/place/14,,2,,,5,,,,,,,,,
Akko,syriaca.org/place/14,1,,,,5,,,,,,,,,
@davidamichelson
Copy link
Author

davidamichelson commented Apr 7, 2020

We have two goals, we want the script to concatenate and merge headwords (Acre|Akko) based on URI. We also want to merge the Map numbers (1,2,,,5,,,,,,,,,) based on URI. The map numbers should not duplicate.

The end goal is further transformation in XML so if the result is in XML that is fine. In that case the concatenation could look like:
<headword><label>Acre</label><label>Akko</label></headword> rather than a concatenation into one element.

@CliffordAnderson
Copy link

CliffordAnderson commented Apr 7, 2020

Hi Dave,

Here's a query expression that should do the trick of merging rows with the same identifiers:

xquery version "3.1";

(: Merges content from rows with the same identifiers :)

let $uri := "https://gist.githubusercontent.com/CliffordAnderson/592c2c43870a9ef3335d26714e51b38a/raw/2b1c467d2bafdbba13e0a1dd91abafebfd2e7722/Original.csv"
let $csv := fetch:text($uri) => csv:parse(map{"header":fn:true()})
let $ids := fn:distinct-values($csv/csv/record/Syriaca_URI)
let $rows :=  fn:distinct-values($csv/csv/record/* ! fn:node-name(.))
for $id in $ids
return 
  element record {
    for $cell in $rows
    return element {$cell} {
      let $contents := $csv/csv/record[Syriaca_URI = $id]/*[fn:name(.) = fn:string($cell)]/text()
      return
        if (count($contents) ne count(fn:distinct-values($contents)))
        then $contents[1]
        else fn:string-join($contents, "|")
      }
  }

Output:

<record>
  <Headword>Abila</Headword>
  <Syriaca_URI>syriaca.org/place/4355</Syriaca_URI>
  <Map_1>1</Map_1>
  <Map_2/>
  <Map_3/>
  <Map_4>4</Map_4>
  <Map_5/>
  <Map_6/>
  <Map_7/>
  <Map_8/>
  <Map_9/>
  <Map_10/>
  <Map_11/>
  <Map_12/>
  <Map_13/>
  <Map_14/>
</record>
<record>
  <Headword>Abivard</Headword>
  <Syriaca_URI>syriaca.org/place/4222</Syriaca_URI>
  <Map_1/>
  <Map_2/>
  <Map_3/>
  <Map_4/>
  <Map_5/>
  <Map_6/>
  <Map_7/>
  <Map_8/>
  <Map_9>9</Map_9>
  <Map_10/>
  <Map_11/>
  <Map_12/>
  <Map_13/>
  <Map_14/>
</record>
<record>
  <Headword>Abr Shahr</Headword>
  <Syriaca_URI>syriaca.org/place/4220</Syriaca_URI>
  <Map_1/>
  <Map_2/>
  <Map_3/>
  <Map_4/>
  <Map_5/>
  <Map_6/>
  <Map_7/>
  <Map_8/>
  <Map_9>9</Map_9>
  <Map_10>10</Map_10>
  <Map_11/>
  <Map_12/>
  <Map_13/>
  <Map_14/>
</record>
<record>
  <Headword>Acre|Akko</Headword>
  <Syriaca_URI>syriaca.org/place/14</Syriaca_URI>
  <Map_1>1</Map_1>
  <Map_2>2</Map_2>
  <Map_3/>
  <Map_4/>
  <Map_5>5</Map_5>
  <Map_6/>
  <Map_7/>
  <Map_8/>
  <Map_9/>
  <Map_10/>
  <Map_11/>
  <Map_12/>
  <Map_13/>
  <Map_14/>
</record>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment