Beck Davis Beck-Davis

Beck-Davis / 653_lc_terms_matching.md
Last active June 3, 2024 18:53
Matching 653 with LC terms

These scripts look for exact matches between uncontrolled index terms (653 fields) and LC subject headings, to determine whether LC terms can be added programmatically as new 6xx fields. The code adapts methods from pulibrary/authority_control. These methods reduce the subject headings in LC authority MARC records, and the 653 fields in bibliographic MARC records, to simple normalized strings. The scripts then compare those strings for exact matches.
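A minimal sketch of the "reduce to simple strings, then compare" idea follows. The normalizer here (simple_normalize) is hypothetical and written only for illustration. It is not normalize_heading_for_local_search from pulibrary/authority_control, whose exact rules may differ. It assumes that diacritics, case, and punctuation should not block a match.

```ruby
# Hypothetical normalizer for illustration only (NOT the authority_control method):
# 1. decompose accented letters (NFD) and drop the combining marks,
# 2. lowercase,
# 3. turn punctuation (including "--" subdivision separators) into spaces,
# 4. collapse repeated spaces and trim.
def simple_normalize(heading)
  heading.unicode_normalize(:nfd)
         .gsub(/\p{Mn}/, '')
         .downcase
         .gsub(/[[:punct:]]/, ' ')
         .squeeze(' ')
         .strip
end

# A 653 term and an LC heading "match" only if their normalized strings are identical.
def exact_match?(term_653, lc_heading)
  simple_normalize(term_653) == simple_normalize(lc_heading)
end
```

With these assumed rules, "Cooking -- Italy" and "Cooking--Italy" match, while "Cooking" and "Cookery" do not; only exact string equality counts after normalization.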

The methods get_heading_from_authority_field and normalize_heading_for_local_search are borrowed from pulibrary/authority_control.

The hash authority_hash is built from the normalized forms of the headings in the authority records.
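One way to build such a hash is sketched below. It assumes each authority heading field is available as a list of [subfield code, value] pairs (a plain-Ruby stand-in for a MARC field object), that lettered subfields are joined with "--", and that each key maps to the LC heading as displayed. heading_string, normalize_key, and build_authority_hash are hypothetical names, not the borrowed authority_control methods.

```ruby
# Stand-in normalizer for this sketch only: lowercase, punctuation -> spaces.
def normalize_key(text)
  text.downcase.gsub(/[[:punct:]]/, ' ').squeeze(' ').strip
end

# Joins the lettered subfields ($a, $x, $z, ...) with "--"; numeric
# subfields such as $0 are skipped (an assumption for this sketch).
def heading_string(subfields)
  subfields.select { |code, _value| /\A[a-z]\z/.match?(code) }
           .map { |_code, value| value }
           .join('--')
end

# normalized heading => LC heading as displayed
def build_authority_hash(headings)
  headings.each_with_object({}) do |subfields, hash|
    display = heading_string(subfields)
    hash[normalize_key(display)] = display
  end
end
```

Keying on the normalized form makes each 653 lookup a single hash access instead of a scan over every authority heading.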

The array unwanted_terms is loaded from a .txt list of 653 terms that should be ignored during matching.
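A sketch of loading that list is shown below, assuming one term per line and case-insensitive comparison. It uses a Set rather than an Array so that each lookup does not scan the whole list. load_unwanted_terms and unwanted? are hypothetical helpers, and the file name in the usage line is only an example.

```ruby
require 'set'

# Reads one term per line from anything that responds to each_line
# (an open File or a String), dropping blank lines. Terms are downcased
# so the later lookup is case-insensitive (an assumption).
def load_unwanted_terms(io)
  io.each_line.map { |line| line.strip.downcase }.reject(&:empty?).to_set
end

# True if the 653 term is on the ignore list.
def unwanted?(term, unwanted_terms)
  unwanted_terms.include?(term.strip.downcase)
end
```

Usage: `unwanted_terms = File.open('unwanted_terms.txt') { |f| load_unwanted_terms(f) }`.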

While iterating over the 653 fields in each record, the script stores the original field value so that it can be written out.
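The loop can be sketched as follows: normalize each 653, skip ignored terms, look up the normalized string, and keep the untouched original text next to any LC heading found. The sketch assumes the 653 values are already extracted as strings, that the ignore list is normalized the same way as the authority keys, and that output is tab-separated. match_653s, normalize_653, and report_lines are hypothetical names.

```ruby
require 'set'

# Stand-in normalizer; assumed to match how the authority_hash keys were built.
def normalize_653(text)
  text.downcase.gsub(/[[:punct:]]/, ' ').squeeze(' ').strip
end

# Returns [record_id, original 653 text, LC heading] for each exact match.
# unwanted_terms may be a Set or an Array of already-normalized terms.
def match_653s(record_id, fields_653, authority_hash, unwanted_terms)
  fields_653.map do |original|
    normalized = normalize_653(original)
    next if unwanted_terms.include?(normalized)
    lc_heading = authority_hash[normalized]
    next if lc_heading.nil?
    [record_id, original, lc_heading] # original text kept untouched for output
  end.compact
end

# One tab-separated line per match, ready to be written to a report file.
def report_lines(rows)
  rows.map { |row| row.join("\t") }
end
```

Keeping the original value, rather than only the normalized key, lets a reviewer see exactly which 653 text would be replaced or supplemented.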

```ruby
# This is the most recent code for the ISO 639-3 project and has been the most effective.
# It uses a text list of terms that might appear in field 546 and that
# make a record ineligible for new 041$a fields with ISO 639-3 codes.
require_relative './../lib/marc_cleanup'
require 'nokogiri'
require 'set'
require 'pry'

name_to_code = {} # New hash with language names as keys and ISO 639-3 codes as values
```
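The 546 screening described in the comments above might look like the sketch below. It assumes a 546 note is ineligible if it contains any term from the text list (case-insensitive substring match), and that candidate codes come from whole-word matches of the language names in name_to_code. Both helpers and the matching rules are hypothetical; the project's actual logic may differ.

```ruby
# Hypothetical: true if the 546 note contains any blocking term from the list.
def ineligible_546?(note, blocking_terms)
  text = note.downcase
  blocking_terms.any? { |term| text.include?(term.downcase) }
end

# Hypothetical: ISO 639-3 codes for language names that appear as whole words.
def codes_for_546(note, name_to_code)
  text = note.downcase
  name_to_code.select { |name, _code| text.match?(/\b#{Regexp.escape(name.downcase)}\b/) }
              .values
              .uniq
end
```

Checking ineligibility first avoids adding 041$a codes to records whose 546 note signals a case the code list cannot represent.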
```ruby
require_relative './../lib/marc_cleanup'
require 'set'

# Returns true as soon as any field (compared by its string form) appears
# more than once in the record; returns false if every field is unique.
def duplicate_fields?(record)
  seen = Set.new
  record.fields.each do |field|
    # Set#add? returns nil when the value is already in the set
    return true unless seen.add?(field.to_s)
  end
  false
end
```
```ruby
require_relative './../lib/marc_cleanup'

ISBN13PREFIX = '978'.freeze

# True if 020$c contains an error note such as "Invalid data in 1st $a in ... 020".
def contains_020error_indicator?(field)
  field['c'].to_s.match?(/Invalid data in 1st \$a in .* 020/)
end

# True if any 020$a is a single token followed by a parenthetical qualifier,
# e.g. "9780306406157 (pbk.)", instead of a bare ISBN.
def field_020_error?(record)
  record.fields('020').any? { |field| field['a'].to_s.match?(/^\s*([^\s]+)\s+(\(.*?\))\s*$/) }
end
```
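To see what the 020$a pattern above actually captures, the hypothetical helper below applies the same regular expression and returns the number and the parenthetical qualifier separately. It is a sketch for inspecting data, not part of the original scripts.

```ruby
# Same pattern as field_020_error?: a non-space token, whitespace,
# then a parenthesized qualifier at the end of the string.
QUALIFIED_ISBN = /^\s*([^\s]+)\s+(\(.*?\))\s*$/

# Hypothetical helper: returns [number, qualifier], or nil for a bare value.
def split_020a(value)
  m = QUALIFIED_ISBN.match(value.to_s)
  m ? [m[1], m[2]] : nil
end
```

Because `.*?` can extend past the first closing parenthesis, a value with two qualifiers, such as "0306406152 (v. 1) (pbk.)", returns both as one string: "(v. 1) (pbk.)".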