drjwbaker/116 Metadata for Electronic Resources notes

## 116 Metadata for Electronic Resources notes
<h>What is Metadata?</h>

Data about data.
But not a very helpful definition.
Better: data that describes the content, format or attributes of an information resource. Structured data has metadata.

Metadata can refer to a standard (such as Dublin Core), a data element (such as Author/Creator), or the data contained in a specific field.

Metadata describes an information object, not a can of soup.

Metadata can be searched quickly to tell you things about large datasets.

Types:
* Descriptive metadata (for searching)
* Structural metadata (how a resource was put together, eg how pages are ordered to form chapters)
* Administrative metadata (file type, how created, security, circulation data)

Purposes:
* Description
* Retrieval (Dublin Core focused on this)
* Management
* Rights/ownership/licences
* Interoperability
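To make the three types concrete, here is a minimal Python sketch of one hypothetical record for a digitised book, with its metadata grouped by type. The field names and values are invented for illustration and do not come from any particular standard; the two helper functions show descriptive metadata driving search and structural metadata driving page order.

```python
# A hypothetical record for a digitised book, grouped by metadata type.
# Field names are illustrative only, not taken from any standard.
record = {
    "descriptive": {  # supports searching and discovery
        "title": "The Hound of the Baskervilles",
        "creator": "Doyle, Arthur Conan",
        "subject": ["Detective stories"],
    },
    "structural": {  # how the parts fit together
        "chapters": [
            {"label": "Chapter 1", "pages": [1, 2, 3]},
            {"label": "Chapter 2", "pages": [4, 5]},
        ],
    },
    "administrative": {  # management, rights, technical details
        "file_type": "image/tiff",
        "created_by": "in-house scanning",
        "rights": "Public domain",
    },
}


def searchable_text(rec):
    """Flatten only the descriptive metadata into a lowercase search string."""
    parts = []
    for value in rec["descriptive"].values():
        if isinstance(value, list):
            parts.extend(value)
        else:
            parts.append(value)
    return " ".join(parts).lower()


def page_order(rec):
    """Use the structural metadata to rebuild the reading order of pages."""
    return [page for chapter in rec["structural"]["chapters"] for page in chapter["pages"]]


print(searchable_text(record))
print(page_order(record))
```

Keeping administrative fields out of `searchable_text` is a design choice: users search on what a resource is about, not on how it was scanned.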


<h>Dublin Core</h>

Consensus-driven standard.
Aims to be as simple as possible, for discoverability on the internet (designed for webpages).
Often generated automatically by a CMS.

Content in fields should be tailored to the audience you expect to use the metadata.
So, for the 'Coverage' of an exhibition you could record the date of the exhibition, the place of the exhibition, and/or the period the exhibition is about..
..depending on whether the metadata is aimed at an internal or an external audience.
The Identifier element must be unique (eg, URL, DOI, ISBN).

Elements can be refined and extended, eg date.created, date.revised.
You can define your own refinements, but in the interests of interoperability it is better to keep to the standard ones.
Extensions such as eGMS (the UK e-Government Metadata Standard) are used by government to aid the discoverability of resources.

Dublin Core is a permissive standard: you can put what you like within the container.
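A minimal sketch of a simple Dublin Core record, built with Python's standard `xml.etree.ElementTree`. The `<record>` wrapper element, the identifier URL and all values are hypothetical. The `date.created`/`date.revised` refinements above use the older dot notation; here they are expressed in the DCMI Terms namespace, where the closest terms are `dcterms:created` and `dcterms:modified`.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"   # Dublin Core element set
DCTERMS = "http://purl.org/dc/terms/"     # DCMI terms (used here for refinements)
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)


def dc_record(fields):
    """Build a DC record from (namespace, element, value) tuples.

    The <record> wrapper is a hypothetical container: DC defines the
    elements, not the document that holds them.
    """
    root = ET.Element("record")
    for ns, name, value in fields:
        ET.SubElement(root, f"{{{ns}}}{name}").text = value
    return root


record = dc_record([
    (DC, "title", "Example exhibition web page"),
    (DC, "identifier", "https://example.org/exhibitions/42"),  # must be unique
    (DC, "coverage", "London"),            # place of exhibition, for this audience
    (DCTERMS, "created", "2013-05-01"),    # refines dc:date
    (DCTERMS, "modified", "2013-06-10"),   # refines dc:date
])
print(ET.tostring(record, encoding="unicode"))
```

Because DC is permissive, nothing here stops you putting a date in `dc:coverage` instead of a place; that choice is left to local practice.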

<h>XML/RDF</h>
How the terminology used in these areas relates to metadata standards.

Document Type Definitions (DTDs)
A means of expressing a metadata standard.
More widely used now, though, are XML Schemas (W3C).
These are more flexible, and can refer to the locations where components of the schema are defined.
RELAX NG: an alternative schema language that is more human-readable (and still machine-readable).

XML
Fields can be repeated as needed, and a schema can define the minimum number of occurrences required for a field.
A 'namespace' identifies the schema or standard an element comes from.
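A sketch of how namespaces and repeatable fields look in practice. The `dc:` prefix is bound to the Dublin Core namespace URI, which is what ties each element to its standard. The `MIN_OCCURS` rules are hypothetical and only mimic what `minOccurs` does in an XML Schema: Python's standard library does not validate against XSDs, so a real workflow would use a schema-aware validator instead.

```python
import xml.etree.ElementTree as ET

DOC = """\
<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Hound of the Baskervilles</dc:title>
  <dc:subject>Detective stories</dc:subject>
  <dc:subject>Dartmoor (England)</dc:subject>
</record>"""

# Prefix -> namespace URI, i.e. which standard the elements come from.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical local rules, standing in for minOccurs in an XML Schema.
MIN_OCCURS = {"dc:title": 1, "dc:subject": 1, "dc:creator": 1}


def check_min_occurs(xml_text, rules, namespaces):
    """Return {element: count} for elements occurring fewer times than required."""
    root = ET.fromstring(xml_text)
    problems = {}
    for path, minimum in rules.items():
        count = len(root.findall(path, namespaces))  # repeated fields are counted
        if count < minimum:
            problems[path] = count
    return problems


print(check_min_occurs(DOC, MIN_OCCURS, NS))  # {'dc:creator': 0}
```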

RDF
A standard model for data interchange on the web.
Its structure allows data from different sources to be merged.
One thing is related to another by a relationship (this statement is called a triple).
Eg:
Resource - Hound of the Baskervilles
Element/Property - Creator
Value/Content - Author Catalogue ID 0001
It is a recursive system, so the value here could itself be a resource, linked to elements (name/lifespan) and values (Conan Doyle, 1859-1930).
[So, in essence, RDF builds a network of structured relationships]
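The Hound of the Baskervilles example can be sketched as plain Python tuples, one per triple. The identifiers `book:hound` and `author:0001` are hypothetical stand-ins for the URIs real RDF would use, and `merge` assumes every node has such an identifier (real RDF tools also have to handle blank nodes when merging graphs).

```python
# (resource, property, value) triples; identifiers are hypothetical stand-ins for URIs.
TRIPLES = [
    ("book:hound", "creator", "author:0001"),
    ("book:hound", "title", "The Hound of the Baskervilles"),
    ("author:0001", "name", "Conan Doyle"),
    ("author:0001", "lifespan", "1859-1930"),
]


def describe(resource, triples, depth=0, seen=None):
    """Collect statements about a resource, following values that are resources too."""
    seen = set() if seen is None else seen
    seen.add(resource)
    lines = []
    for subject, prop, value in triples:
        if subject != resource:
            continue
        lines.append("  " * depth + f"{prop}: {value}")
        # Recursion: a value that is the subject of other triples is itself a resource.
        if value not in seen and any(s == value for s, _, _ in triples):
            lines.extend(describe(value, triples, depth + 1, seen))
    return lines


def merge(*graphs):
    """Merge sets of triples; identical statements collapse into one."""
    return sorted(set().union(*map(set, graphs)))


print("\n".join(describe("book:hound", TRIPLES)))
```

Merging is where the triple structure pays off: a second source that also says `author:0001` has the name Conan Doyle adds nothing new, while any new statements simply join the network.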


<h>Metadata Standards</h>
Library materials (MARC, RDA)
Archives and manuscripts (EAD, EAC)
Records management (use international standards)
Digitised and surrogate material (MODS)
Web Resources (Dublin Core, eGMS)
Learning Objects (LOM)
Born Digital Material (MODS, PREMIS)

Ingest...
..the point at which metadata is created (though some metadata is likely to have been included at the point of creation).
..Review the purpose of the metadata.
..Adapt the schema for local or subject-specific requirements.

MODS
Metadata Object Description Schema
Based on MARC
Richer than Dublin Core
Used for sharing metadata.
There is also a MODS Lite, which corresponds to Dublin Core.
METS is the container within which more specific metadata, such as MODS, can go.

Converting MODS into DC requires us to make decisions about how fields map. An example of a challenging fit:
http://lcweb2.loc.gov/diglib/ihas/loc.natlib.ihas.200031106/mods.xml
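A deliberately simplified MODS-to-DC crosswalk, to show where the decisions come in. The sample record (including the illustrator's name) and the mapping choices are assumptions for illustration, and only title, name and date are handled; the Library of Congress publishes its own MODS-to-DC mapping, which may make different choices.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Made-up sample record for illustration.
SAMPLE = """\
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Hound of the Baskervilles</title></titleInfo>
  <name type="personal">
    <namePart>Doyle, Arthur Conan</namePart>
    <role><roleTerm type="text">author</roleTerm></role>
  </name>
  <name type="personal">
    <namePart>Illustrator, Example</namePart>
    <role><roleTerm type="text">illustrator</roleTerm></role>
  </name>
  <originInfo><dateIssued>1902</dateIssued></originInfo>
</mods>"""


def mods_to_dc(xml_text):
    """Map a subset of MODS fields onto flat DC elements (lossy by design)."""
    root = ET.fromstring(xml_text)
    dc = {"title": [], "creator": [], "contributor": [], "date": []}
    for title in root.findall("mods:titleInfo/mods:title", MODS_NS):
        dc["title"].append(title.text)
    for name in root.findall("mods:name", MODS_NS):
        part = name.find("mods:namePart", MODS_NS)
        if part is None:
            continue
        role = name.find("mods:role/mods:roleTerm", MODS_NS)
        # Decision point: DC has no role attribute, so we choose which MODS roles
        # count as "creator" and push everyone else into "contributor".
        is_author = role is not None and role.text == "author"
        dc["creator" if is_author else "contributor"].append(part.text)
    for date in root.findall("mods:originInfo/mods:dateIssued", MODS_NS):
        dc["date"].append(date.text)
    return dc


print(mods_to_dc(SAMPLE))
```

Note what is lost: the role terms and the `type="personal"` distinction have nowhere to go in simple DC.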


<h>EAD/EAC-CPF</h>

Bill is an archivist, not a librarian.

Encoded Archival Description
Encoded Archival Context-Corporate Bodies, Persons, Families

Because archives are format-independent, cataloguing them works differently from cataloguing books.
They have a hierarchical structure: an embedded tree.
We also have to describe those who were responsible for creating the archives...
..and what activities they were carrying out that created the material.
Archival records include two main things. Descriptions of:
* the archival material
* the creating entities [ergo, people]

EAD and EAC-CPF are the XML standards for describing archives and their creators.

EAD dates from 1993 and is used worldwide, particularly by federated services (eg Archives Hub).
You can catalogue directly to EAD XML.
Its main use, though, is for transmitting metadata in XML, for sharing.
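A sketch of the embedded tree: a heavily trimmed, hypothetical finding aid in the style of EAD 2002 (nested `<c01>`, `<c02>` components inside `<dsc>`), walked recursively to print each level of description. It omits the required `<eadheader>` and any namespace to stay short, so it is not a valid EAD file as written.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed EAD 2002-style finding aid (no <eadheader>, no namespace).
FINDING_AID = """\
<ead>
  <archdesc level="fonds">
    <did><unittitle>Papers of an example family</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file"><did><unittitle>Letters, 1890-1899</unittitle></did></c02>
        <c02 level="file"><did><unittitle>Letters, 1900-1909</unittitle></did></c02>
      </c01>
      <c01 level="series">
        <did><unittitle>Estate records</unittitle></did>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# Component tags: unnumbered <c> or numbered <c01> ... <c12>.
COMPONENT = re.compile(r"c(?:0[1-9]|1[0-2])?")


def outline(unit, depth=0):
    """Yield (depth, level, title) for a unit and, recursively, its components."""
    yield depth, unit.get("level"), unit.findtext("did/unittitle")
    # At the top, components sit inside <dsc>; below that they nest directly.
    parents = unit.findall("dsc") or [unit]
    for parent in parents:
        for child in parent:
            if COMPONENT.fullmatch(child.tag):
                yield from outline(child, depth + 1)


archdesc = ET.fromstring(FINDING_AID).find("archdesc")
for depth, level, title in outline(archdesc):
    print("  " * depth + f"[{level}] {title}")
```

Each level (fonds, series, file) inherits its context from the levels above, which is why archival description is a tree rather than a flat list of records.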

EAC-CPF is less used (it was only published in 2010).
It has potential beyond the archival world, as it describes people and their relationships to other objects.
(A good example of its use: http://trove.nla.gov.au/people?q=)

At the BL, EAD is being wrapped up in METS files to transform IAMS data for the Qatar project.


<h>METS</h>

Metadata Encoding and Transmission Standard.
A container you can use to hold more specific metadata.
Covers structural and technical aspects: descriptive material but also relational data.
Maintained by LoC (like MODS).
7 sections:
1) METS Header > info about the resource you are describing
2) Descriptive Metadata > MODS/MARC type data
3) Administrative Metadata > IP, provenance.
4) File Section > components that make up a resource.
5) Structural Map > eg, chapters
6) Structural Links > links between, for eg, chapters.
7) Behaviour > such as executable behaviour

METS allows you to use MODS, DC et al side by side in one wrapper, which helps if interoperability with certain systems is required.
A big container which holds other containers (whose fields are themselves containers).
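A skeleton METS document built with `ElementTree`, holding MODS and DC descriptions of the same resource in two `dmdSec`s, plus a file section and a structural map that point back to them. The title, file URL and IDs are hypothetical, and the header is left empty; the administrative, structural links and behaviour sections are omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
MODS = "http://www.loc.gov/mods/v3"
DC = "http://purl.org/dc/elements/1.1/"
XLINK = "http://www.w3.org/1999/xlink"
for prefix, uri in [("mets", METS), ("mods", MODS), ("dc", DC), ("xlink", XLINK)]:
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return ElementTree's {namespace}tag form of a qualified name."""
    return f"{{{ns}}}{tag}"


def dmd_sec(mets, dmd_id, mdtype):
    """Add a dmdSec/mdWrap/xmlData chain and return the xmlData element."""
    sec = ET.SubElement(mets, q(METS, "dmdSec"), ID=dmd_id)
    wrap = ET.SubElement(sec, q(METS, "mdWrap"), MDTYPE=mdtype)
    return ET.SubElement(wrap, q(METS, "xmlData"))


def wrap_in_mets(title, file_url):
    """Build a skeleton METS document with MODS and DC side by side."""
    mets = ET.Element(q(METS, "mets"))
    ET.SubElement(mets, q(METS, "metsHdr"))  # 1) header (details omitted)

    # 2) Descriptive metadata: the same title in MODS and in DC.
    mods = ET.SubElement(dmd_sec(mets, "dmd-mods", "MODS"), q(MODS, "mods"))
    title_info = ET.SubElement(mods, q(MODS, "titleInfo"))
    ET.SubElement(title_info, q(MODS, "title")).text = title
    dc_data = dmd_sec(mets, "dmd-dc", "DC")
    ET.SubElement(dc_data, q(DC, "title")).text = title

    # 4) File section: the component files that make up the resource.
    file_grp = ET.SubElement(ET.SubElement(mets, q(METS, "fileSec")), q(METS, "fileGrp"))
    file_el = ET.SubElement(file_grp, q(METS, "file"), ID="file1", MIMETYPE="image/jpeg")
    ET.SubElement(file_el, q(METS, "FLocat"), {"LOCTYPE": "URL", q(XLINK, "href"): file_url})

    # 5) Structural map: one division pointing at both descriptions and the file.
    struct_map = ET.SubElement(mets, q(METS, "structMap"))
    div = ET.SubElement(struct_map, q(METS, "div"), TYPE="page", DMDID="dmd-mods dmd-dc")
    ET.SubElement(div, q(METS, "fptr"), FILEID="file1")
    return mets


doc = wrap_in_mets("Hound of the Baskervilles", "https://example.org/page1.jpg")
print(ET.tostring(doc, encoding="unicode"))
```

A system that only understands DC can read the `dmd-dc` section and ignore the MODS one, which is the interoperability point made above.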


<h>Encoding Schemes</h>

Familiar to those with a cataloguing background: authority lists/files, taxonomies, controlled vocabularies.
VIAF (Virtual International Authority File)
Metadata Authority Description Schema (MADS) > maintained by LoC

Vocabulary Encoding Scheme: a controlled list of terms appropriate for a particular context (for example, the ePrints Type Vocabulary Encoding Scheme for institutional repositories (IRs)).
Syntax Encoding Scheme: a rule for how a value is written, eg a cataloguing rule such as SURNAME, INITIAL.INITIAL.

Both are powerful tools for creating good-quality metadata.
It is one thing to stick to a standard, another to create high-quality metadata that improves discoverability.
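A small sketch of both kinds of encoding scheme as checks applied to values. The regular expression is one interpretation of the SURNAME, INITIAL.INITIAL. rule, and `TYPE_VOCAB` is a hypothetical three-term list, not the actual ePrints Type scheme.

```python
import re

# Syntax encoding scheme: one reading of "SURNAME, INITIAL.INITIAL." as a regex.
NAME_SYNTAX = re.compile(r"[A-Z][A-Z'\-]*, (?:[A-Z]\.)+")

# Vocabulary encoding scheme: a hypothetical controlled list of resource types.
TYPE_VOCAB = {"Book", "JournalArticle", "Thesis"}


def to_syntax(forenames, surname):
    """Format a name according to the SURNAME, INITIAL.INITIAL. rule."""
    initials = "".join(f"{part[0].upper()}." for part in forenames.split())
    return f"{surname.upper()}, {initials}"


def validate(name, resource_type):
    """Return a list of problems with a name value and a type value."""
    problems = []
    if not NAME_SYNTAX.fullmatch(name):
        problems.append(f"name {name!r} does not follow SURNAME, INITIAL.INITIAL.")
    if resource_type not in TYPE_VOCAB:
        problems.append(f"type {resource_type!r} is not in the controlled vocabulary")
    return problems


print(to_syntax("Arthur Conan", "Doyle"))
print(validate("DOYLE, A.C.", "Book"))
print(validate("Arthur Conan Doyle", "novel"))
```

Passing both checks only shows the values conform to the schemes; whether "Book" was the right type for the resource is still a cataloguing judgement.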

When evaluating controlled vocabularies, ask: is it up to date? authoritative? detailed enough? flexible? available? compatible with your systems?


<h>Access</h>

Who are your users? And why are they using your collection?
How do they retrieve information from your systems? [do we need metadata for big data collections? (eg MS books)]