@kspurgin
Created March 31, 2017 20:34
MARC upsides
"MARC is concise as a physical format (something that is less important today)"
^--- I know it used to be a lot MORE important, but it doesn't feel unimportant when I'm extracting 6.5 million records from one system and transferring them to another server! (And I'm aware there are more concise ways to express data than the XML serializations we usually see.)
-=-=-=-=-=-=-=-=-=-
Ease of transforming/exporting/analyzing
-=-=-=-=-=-=-=-=-=-
I work with big batches of MARC (in MARC-binary and MARC-XML) and other metadata formats (DC, EAD, DDI, MODS, Oracle Endeca format for indexed data) expressed in XML.
I whip up XSLT to do stuff when I have to. But Terry Reese gave us MARCEdit, which makes it easy to do fairly complex transformations across a set of MARC records, or just get an overview of which fields are in a record set and with what frequency.
Perhaps tools to do this for XML data exist, but I have not had luck finding them. What MARC records don't have a 26X $c? That is a snap to find with free tools ready to hand. What MODS records are lacking "mods:dateIssued"? Not so fast and easy...
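To make the "snap to find" claim concrete, here's a rough sketch of that 26X $c check in plain Ruby. The hash-of-fields representation is a hypothetical stand-in for illustration; a real script would read binary MARC with the ruby-marc gem (or just use MARCEdit's reports).

```ruby
# Minimal stand-in for MARC records: id => { tag => [[code, value], ...] }.
# A real version would iterate MARC::Reader from the ruby-marc gem.
records = {
  'rec1' => { '260' => [['a', 'New York :'], ['c', '1999.']] },
  'rec2' => { '264' => [['a', 'Chapel Hill :']] },           # 26X, but no $c
  'rec3' => { '245' => [['a', 'A title']] }                  # no 26X at all
}

# Which records lack a date of publication (26X $c)?
missing_26x_c = records.select do |_id, fields|
  fields.none? do |tag, subfields|
    tag =~ /26./ && subfields.any? { |code, _value| code == 'c' }
  end
end.keys

puts missing_26x_c.inspect
```

Because every field is just a tag plus subfield codes, the whole question is one regex match and one subfield lookup; the equivalent question against nested MODS means XPath plus namespace handling.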
Of course, tools develop to meet the needs at hand.
It's super easy to export MARC fields/subfields to a spreadsheet/delimited text format. This is partly because of MARC's flatness.
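As a sketch of how trivial that flat export is, again with a hypothetical minimal record structure standing in for real MARC parsing (in practice you'd use MARCEdit's tab-delimited export or ruby-marc):

```ruby
require 'csv'

# Each record as tag => [[code, value], ...]; a real export would read
# these fields from binary MARC via the ruby-marc gem.
records = [
  { '245' => [['a', 'First title']],  '100' => [['a', 'Author, One']] },
  { '245' => [['a', 'Second title']], '100' => [['a', 'Author, Two']] }
]

# Because MARC is flat, "title and author to a spreadsheet" is one
# subfield lookup per column -- no tree-walking required.
csv = CSV.generate(col_sep: "\t") do |out|
  out << ['245$a', '100$a']
  records.each do |rec|
    out << ['245', '100'].map do |tag|
      sf = (rec[tag] || []).find { |code, _value| code == 'a' }
      sf ? sf[1] : ''
    end
  end
end

puts csv
```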
All the nesting in our XML metadata makes it a nightmare to deal with. That may say more about how our metadata evolved than about the formats themselves, but just trying to separate the ETDs described as composite objects (with data and other attachments) from the ones that are just the thesis basically requires a programming project.
-=-=-=-=-=-=-=-=-=-
Field structure/semantics = shorthands for processing
-=-=-=-=-=-=-=-=-=-
I write a lot of code to process MARC. It is a beautiful thing to be able to grab all LCSH headings, regardless of subject, name, geog, chron type, with a quick:
tag =~ /6../ && (i2 == '0' || (i2 == '7' && field_string.include?('$2 lcsh')))
Or to know that certain subfields of 111, 711, and 811 can be processed with the same method, because they are all defined the same way, as meeting names.
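Spelled out as runnable Ruby, that LCSH filter might look like the sketch below. The `Field` struct is a hypothetical stand-in for ruby-marc's `MARC::DataField`, which carries the same information (tag, second indicator, subfields):

```ruby
# Hypothetical minimal field object; subfields is a {code => value} hash.
Field = Struct.new(:tag, :i2, :subfields)

# An LCSH heading is any 6XX field with second indicator 0, or second
# indicator 7 with $2 lcsh -- regardless of whether it's a topical,
# name, geographic, or chronological heading.
def lcsh?(field)
  field.tag =~ /\A6..\z/ &&
    (field.i2 == '0' ||
     (field.i2 == '7' && field.subfields['2'] == 'lcsh'))
end

fields = [
  Field.new('650', '0', { 'a' => 'Cats' }),                     # topical
  Field.new('651', '7', { 'a' => 'Durham (N.C.)', '2' => 'lcsh' }),
  Field.new('650', '7', { 'a' => 'Chats', '2' => 'rvm' }),      # not LCSH
  Field.new('600', '0', { 'a' => 'Austen, Jane' })              # name as subject
]

headings = fields.select { |f| lcsh?(f) }.map { |f| f.subfields['a'] }
puts headings.inspect
```

One three-line predicate covers every subject vocabulary question above, precisely because the tag/indicator/subfield semantics are uniform across record sets.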
I've heard a lot of complaining from developers about how terrible MARC is, but honestly, if you understand it, it gives you a lot of shortcuts for working with the data in code.