Skip to content

Instantly share code, notes, and snippets.

@VladimirAlexiev
Created January 21, 2017 15:12
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save VladimirAlexiev/e51e256be18870ac5033901197ee8277 to your computer and use it in GitHub Desktop.
Save VladimirAlexiev/e51e256be18870ac5033901197ee8277 to your computer and use it in GitHub Desktop.
How to add Museum Identifiers to Wikidata

How to Add Museum IDs to Wikidata

17 January 2017 I get this nice reminder by Wikidata user Pasleim: Unused properties This is a kind reminder that the following properties were created more than six months ago: Artsy gene (P2411), J. Paul Getty Museum object id (P2582), dataset distribution (P2702). As of today, these properties are used on less than five items. As the proposer of these properties you probably want to change the unfortunate situation by adding a few statements to items.

I went ahead and added a few Getty Museum IDs by hand. Then I looked more closely at the available data and it hit me: let's do some SPARQL queries to generate the missing data.

Find J. Paul Getty Museum objects

First, there are 471 objects in collection J. Paul Getty Museum (JPGM):

SELECT (count(*) as ?c) {
  ?item wdt:P195 wd:Q731126
}

Don't be scared by the P and Q numbers: the Wikidata query service has two nice features to explain them:

  • mouse-over over the P and Q identifiers shows what they mean
  • the "query explanation" box on the right displays "collection J. Paul Getty Museum"

Note: Consider that JPGM has some 800k objects: 471 in Wikidata is very few, so a Data Donation from JPGM would be very much appreciated. See WikiProject_Cultural_heritage and Partnerships_and_data_imports.

Get the IDs

Most of these objects have a property "described at URL" that include the object ID I seek, eg: http://www.getty.edu/art/collection/objects/103928/adelaide-labille-guiard-head-of-a-young-woman-french-1779/.

I can get the ID with the following query

SELECT ?item ?id {
  ?item wdt:P195 wd:Q731126; wdt:P973 ?url.
  filter not exists {?item wdt:P2582 ?objId}
  bind(strbefore(strafter(str(?url),"http://www.getty.edu/art/collection/objects/"),"/") as ?id)
}

This can be decoded as follows:

  • find ?item in collection J. Paul Getty Museum
  • fetch its "described at" ?url
  • strafter() discards the common website prefix http://www.getty.edu/art/collection/objects/
  • strbefore() discards the human-readable page suffix eg /adelaide-labille-guiard-head-of-a-young-woman-french-1779/
  • which leaves only the ?id I need 103928

I further noticed that:

Insert the IDs in Wikidata

I'll use the Quick Statements tool by Magnus "the magnificent" Manske. I need to create a simple text file in format item <tab> prop <tab> value. Since P2582 is a string but the values are purely numeric, I need to surround them with quotes to indicate a string, eg:

Q20181112	P2582	"103928"

So I modify the query as follows:

SELECT ?itm ?prop ?idStr {
  ?item wdt:P195 wd:Q731126; wdt:P973 ?url.
  filter not exists {?item wdt:P2582 ?objId}
  bind(strafter(str(?item),"http://www.wikidata.org/entity/") as ?itm)
  bind("P2582" as ?prop)
  bind(strbefore(strafter(str(?url),"http://www.getty.edu/art/collection/objects/"),"/") as ?id1)
  bind(strafter(str(?url),"http://www.getty.edu/art/gettyguide/artObjectDetails?artobj=") as ?id2)
  bind(if(?id1="",?id2,?id1) as ?id)
  filter(?id != "")
  bind(concat('"',?id,'"') as ?idStr)
}
  • remove the common prefix http://www.wikidata.org/entity/ of ?item
  • return the constant string P2582 as property name
  • match/remove one form of Getty URLs as ?id1, and the other form as ?id2
  • "coalesce" either ?id1 or ?id2 (the first one that is non-empty) into ?id. (There is a SPARQL function coalesce() but it checks whether the var is bound, not if it's empty)
  • remove rows where ?id is empty (thes rows correspond to URLs that match neither Getty pattern)
  • surround the numeric ?id in quotes to indicate it's a string ?idStr

Download the result as Simple TSV (don't want each data cell surrounded by extra quotes).

All Wikidata updates are recorded with the user who made them, no matter what tool he used. So you need to register with WiDaR if you haven't yet. Finally, I go to Quick Statements, paste all but the header line and press Do It.

And voila! I've added 438 "J. Paul Getty Museum object IDs" to Wikidata.

15:10, 21 January 2017 (diff | hist) . . (+344)‎ . . Landscape with Classical Ruins and Figures (Q20178204)
‎ (‎Created claim: J. Paul Getty Museum object id (P2582): 559, #quickstatements) (current) (Tag: Widar [1.4])
15:10, 21 January 2017 (diff | hist) . . (+345)‎ . . Still Life with Game, Vegetables, Fruit and a Cockatoo (Q20178199)
‎ (‎Created claim: J. Paul Getty Museum object id (P2582): 535, #quickstatements) (current) (Tag: Widar [1.4])

For a more detailed explanation of a similar technique, see Editing_Data_in_Spreadsheet_Mode

itm prop idStr
Q20181066 P2582 245361
Q20181182 P2582 878
Q20184662 P2582 717
Q20184700 P2582 801
Q20184753 P2582 240287
Q20184774 P2582 616
Q20184808 P2582 623
Q20184820 P2582 692
Q20184843 P2582 700
Q20184860 P2582 705
Q20184876 P2582 685
Q20184895 P2582 697
Q20184905 P2582 767
Q20184937 P2582 239595
Q20184950 P2582 522
Q20184964 P2582 534
Q20184982 P2582 541
Q20184994 P2582 546
Q20185009 P2582 527
Q20185027 P2582 539
Q20185050 P2582 654
Q20185069 P2582 635
Q20185092 P2582 642
Q20185111 P2582 743
Q20185129 P2582 736
Q20185150 P2582 731
Q20185166 P2582 748
Q20185189 P2582 818
Q20185214 P2582 844
Q20185228 P2582 813
Q20185242 P2582 820
Q20185262 P2582 851
Q20185275 P2582 225119
Q20185296 P2582 875
Q20185339 P2582 221225
Q20185359 P2582 577
Q20185380 P2582 591
Q20185393 P2582 589
Q20185415 P2582 584
Q20185434 P2582 596
Q20185454 P2582 604
Q20185466 P2582 609
Q20185488 P2582 762
Q20185498 P2582 755
Q20185510 P2582 899
Q20185518 P2582 244695
Q20185530 P2582 254442
Q20185543 P2582 129325
Q20185552 P2582 225594
Q20185562 P2582 863
Q20185576 P2582 729
Q20185587 P2582 560
Q20185598 P2582 630
Q20185609 P2582 681
Q20185622 P2582 720
Q20185632 P2582 775
Q20185641 P2582 198580
Q20185646 P2582 245502
Q20185649 P2582 263439
Q20185652 P2582 198579
Q20185655 P2582 198581
Q20185658 P2582 682
Q20392035 P2582 42
Q21407867 P2582 235047
Q23901829 P2582 133975
Q23920940 P2582 776
Q23920981 P2582 245501
Q23921029 P2582 855
Q23921051 P2582 856
Q3203172 P2582 811
Q3208973 P2582 310277
Q3212521 P2582 825
Q3230385 P2582 806
Q3605538 P2582 781
Q3605567 P2582 643
Q3607621 P2582 824
Q3612418 P2582 943
Q3649290 P2582 686
Q3797773 P2582 646
Q3823474 P2582 823
Q3842508 P2582 647
Q3892936 P2582 707
Q3937665 P2582 789
Q3948700 P2582 683
Q4009400 P2582 881
Q4009569 P2582 846
Q6888851 P2582 251762
Q7960534 P2582 710
Q11076793 P2582 548
Q11815524 P2582 809
Q12899367 P2582 247767
Q14940737 P2582 628
Q15992486 P2582 617
Q16038511 P2582 777
Q16687283 P2582 104858
Q16741678 P2582 563
Q16839893 P2582 225593
Q17341978 P2582 719
Q17459905 P2582 106011
Q17519490 P2582 706
Q17591169 P2582 265936
Q18043298 P2582 902
Q18043307 P2582 779
Q18177370 P2582 741
Q18177373 P2582 134031
Q18177374 P2582 740
Q18209775 P2582 750
Q18602622 P2582 1046
Q18620351 P2582 268843
Q18627387 P2582 68569
Q18627390 P2582 68541
Q18627391 P2582 68557
Q18627447 P2582 68417
Q18627849 P2582 44795
Q18685319 P2582 102383
Q18687903 P2582 808
Q19015418 P2582 600
Q19609415 P2582 225586
Q19609453 P2582 798
Q19660379 P2582 567
Q19681606 P2582 843
Q19693208 P2582 796
Q19758850 P2582 225690
Q19758888 P2582 225691
Q19759372 P2582 225692
Q19863044 P2582 821
Q19935926 P2582 734
Q19947166 P2582 691
Q19953571 P2582 690
Q19989799 P2582 733
Q20017925 P2582 233693
Q20019881 P2582 133228
Q20019889 P2582 133227
Q20082059 P2582 786
Q20087445 P2582 651
Q20087469 P2582 670
Q20088065 P2582 223011
Q20178042 P2582 221229
Q20178048 P2582 561
Q20181245 P2582 676
Q20181314 P2582 537
Q20181364 P2582 533
Q20181398 P2582 521
Q20181430 P2582 538
Q20181478 P2582 519
Q20181517 P2582 540
Q20181560 P2582 526
Q20181600 P2582 595
Q20181625 P2582 583
Q20181666 P2582 590
Q20181716 P2582 588
Q20181778 P2582 639
Q20181814 P2582 641
Q20181854 P2582 653
Q20181896 P2582 658
Q20181919 P2582 709
Q20181956 P2582 716
Q20181996 P2582 723
Q20182042 P2582 711
Q20182081 P2582 817
Q20182142 P2582 812
Q20182178 P2582 901
Q20182230 P2582 879
Q20182269 P2582 850
Q20182297 P2582 867
Q20182366 P2582 494
Q20182407 P2582 552
Q20182459 P2582 545
Q20182501 P2582 557
Q20182532 P2582 564
Q20182577 P2582 622
Q20182621 P2582 627
Q20182670 P2582 634
Q20182721 P2582 704
Q20182753 P2582 689
Q20182810 P2582 696
Q20182834 P2582 735
Q20182871 P2582 730
Q20182898 P2582 728
Q20182930 P2582 778
Q20182958 P2582 766
Q20182992 P2582 761
Q20183023 P2582 773
Q20183048 P2582 759
Q20183082 P2582 780
Q20183106 P2582 226156
Q20183144 P2582 226430
Q20183180 P2582 223049
Q20183206 P2582 571
Q20183244 P2582 576
Q20183313 P2582 569
Q20183355 P2582 615
Q20183389 P2582 610
Q20183421 P2582 603
Q20183443 P2582 608
Q20183469 P2582 677
Q20183486 P2582 660
Q20183508 P2582 672
Q20183524 P2582 684
Q20183541 P2582 665
Q20183563 P2582 754
Q20183581 P2582 747
Q20183607 P2582 742
Q20183642 P2582 797
Q20183676 P2582 792
Q20183718 P2582 805
Q20183750 P2582 800
Q20183782 P2582 785
Q20183820 P2582 225595
Q20183867 P2582 258736
Q20183919 P2582 127289
Q20183964 P2582 128631
Q20183990 P2582 126642
Q20184013 P2582 130326
Q20184030 P2582 130756
Q20184067 P2582 131420
Q20184101 P2582 132182
Q20184139 P2582 219656
Q20184167 P2582 221226
Q20184209 P2582 125556
Q20184231 P2582 127970
Q20184253 P2582 128454
Q20184277 P2582 131471
Q20184308 P2582 132392
Q20184330 P2582 105375
Q20184360 P2582 106462
Q20184389 P2582 258867
Q20184401 P2582 220342
Q20184420 P2582 553
Q20184438 P2582 558
Q20184456 P2582 565
Q20184476 P2582 572
Q20184501 P2582 673
Q20184523 P2582 661
Q20184549 P2582 678
Q20184568 P2582 659
Q20184600 P2582 666
Q20184617 P2582 724
Q20184643 P2582 712
Q20178058 P2582 566
Q20178062 P2582 580
Q20178069 P2582 578
Q20178076 P2582 629
Q20178083 P2582 631
Q20178088 P2582 624
Q20178094 P2582 636
Q20178100 P2582 679
Q20178103 P2582 667
Q20178134 P2582 749
Q20178156 P2582 883
Q20178167 P2582 516
Q20178172 P2582 528
Q20178179 P2582 530
Q20178182 P2582 542
Q20178189 P2582 554
Q20178194 P2582 547
Q20178199 P2582 535
Q20178204 P2582 559
Q20178210 P2582 693
Q20178216 P2582 701
Q20178222 P2582 763
Q20178228 P2582 751
Q20178234 P2582 756
Q20178241 P2582 857
Q20178248 P2582 225120
Q20178255 P2582 852
Q20178261 P2582 585
Q20178265 P2582 597
Q20178292 P2582 592
Q20178332 P2582 605
Q20178344 P2582 648
Q20178356 P2582 650
Q20178370 P2582 655
Q20178387 P2582 725
Q20178397 P2582 713
Q20178407 P2582 782
Q20178419 P2582 770
Q20178426 P2582 768
Q20178436 P2582 799
Q20178446 P2582 787
Q20178455 P2582 794
Q20178470 P2582 242201
Q20178484 P2582 244132
Q20178501 P2582 250519
Q20178515 P2582 127301
Q20178523 P2582 129840
Q20178528 P2582 129922
Q20178534 P2582 130518
Q20178541 P2582 102125
Q20178547 P2582 102106
Q20178559 P2582 105080
Q20178571 P2582 106381
Q20178589 P2582 108894
Q20178606 P2582 174864
Q20178619 P2582 201125
Q20178634 P2582 258873
Q20178660 P2582 819
Q20178677 P2582 127267
Q20178687 P2582 224761
Q20178698 P2582 698
Q20178712 P2582 573
Q20178729 P2582 523
Q20178743 P2582 612
Q20178757 P2582 662
Q20178771 P2582 807
Q20178788 P2582 102380
Q20178817 P2582 221228
Q20178829 P2582 531
Q20178837 P2582 524
Q20178853 P2582 517
Q20178866 P2582 529
Q20178891 P2582 543
Q20178903 P2582 555
Q20178914 P2582 550
Q20178929 P2582 562
Q20178945 P2582 574
Q20178953 P2582 579
Q20178963 P2582 593
Q20178974 P2582 581
Q20178987 P2582 586
Q20179003 P2582 598
Q20179020 P2582 601
Q20179035 P2582 613
Q20179047 P2582 606
Q20179059 P2582 632
Q20179068 P2582 618
Q20179084 P2582 625
Q20179097 P2582 649
Q20179121 P2582 637
Q20179133 P2582 644
Q20179142 P2582 668
Q20179151 P2582 663
Q20179179 P2582 656
Q20179195 P2582 687
Q20179211 P2582 675
Q20179227 P2582 699
Q20179257 P2582 694
Q20179271 P2582 702
Q20179282 P2582 721
Q20179293 P2582 726
Q20179310 P2582 738
Q20179323 P2582 745
Q20179335 P2582 764
Q20179348 P2582 757
Q20179363 P2582 771
Q20179377 P2582 752
Q20179392 P2582 769
Q20179407 P2582 106010
Q20179421 P2582 105081
Q20179436 P2582 126955
Q20179460 P2582 130406
Q20179479 P2582 132394
Q20179493 P2582 132933
Q20179504 P2582 795
Q20179517 P2582 790
Q20179536 P2582 783
Q20179553 P2582 788
Q20179566 P2582 810
Q20179579 P2582 803
Q20179595 P2582 815
Q20179607 P2582 827
Q20179624 P2582 839
Q20179637 P2582 872
Q20179650 P2582 860
Q20179662 P2582 225528
Q20179671 P2582 240153
Q20179689 P2582 714
Q20179702 P2582 106426
Q20179717 P2582 620
Q20179732 P2582 104848
Q20179743 P2582 865
Q20179757 P2582 103264
Q20179770 P2582 105121
Q20179777 P2582 105912
Q20179787 P2582 240912
Q20179796 P2582 233804
Q20179805 P2582 129847
Q20179815 P2582 198568
Q20179825 P2582 127911
Q20179834 P2582 133627
Q20179847 P2582 220349
Q20179857 P2582 221227
Q20179866 P2582 544
Q20179876 P2582 532
Q20179884 P2582 525
Q20179892 P2582 575
Q20179900 P2582 582
Q20179906 P2582 587
Q20179916 P2582 570
Q20179930 P2582 626
Q20179947 P2582 640
Q20179963 P2582 633
Q20179976 P2582 638
Q20179995 P2582 688
Q20180013 P2582 671
Q20180027 P2582 669
Q20180042 P2582 715
Q20180050 P2582 722
Q20180061 P2582 727
Q20180077 P2582 784
Q20180102 P2582 791
Q20180111 P2582 900
Q20180125 P2582 866
Q20180141 P2582 518
Q20180157 P2582 520
Q20180173 P2582 551
Q20180192 P2582 549
Q20180207 P2582 563
Q20180221 P2582 568
Q20180235 P2582 556
Q20180251 P2582 594
Q20180268 P2582 602
Q20180291 P2582 599
Q20180307 P2582 645
Q20180328 P2582 657
Q20180350 P2582 652
Q20180371 P2582 664
Q20180398 P2582 753
Q20180426 P2582 739
Q20180458 P2582 746
Q20180494 P2582 816
Q20180555 P2582 804
Q20180589 P2582 223019
Q20180616 P2582 607
Q20180643 P2582 619
Q20180669 P2582 621
Q20180698 P2582 614
Q20180723 P2582 703
Q20180756 P2582 708
Q20180779 P2582 695
Q20180816 P2582 772
Q20180844 P2582 760
Q20180879 P2582 224721
Q20180916 P2582 765
Q20180965 P2582 758
Q20181028 P2582 859
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment