@padraic7a
Last active August 29, 2015 14:23

#Paper Session 4: Policy, Practice, and Research Data (Hogan)

Chair: Jane Gray, Senior Lecturer, Sociology, Maynooth University

####DRI as a Bilingual Digital Repository; Processes and Challenges; Turning Policy into Practice

Rosemary Coll, NUI Galway @rosemary_coll

DRI requirements - develop an Irish language interface, and be able to use metadata created as gaeilge.

Metadata:

RnaG collection was a DRI demonstrator collection.

Original metadata was as gaeilge, stored on an MS database. A decision was taken both to translate this into English and to ingest it in Irish. Records were enriched to make them more useful to the user / add detail.
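One common way to hold the same description in both languages is to repeat each element once per language and tag it with a language code. The sketch below assumes a Dublin Core style record with `xml:lang` attributes. It is not DRI's actual schema or ingest code, and the helper names (`build_record`, `values_for`) and the sample values are invented placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a flat record in which every value carries an xml:lang tag.

    `fields` maps a Dublin Core element name to a {language code: value} dict.
    """
    record = ET.Element("record")
    for element_name, values_by_lang in fields.items():
        for lang, value in values_by_lang.items():
            el = ET.SubElement(record, f"{{{DC_NS}}}{element_name}")
            el.set(f"{{{XML_NS}}}lang", lang)
            el.text = value
    return record


def values_for(record, element_name, lang):
    """Return the values of one element in one language."""
    return [
        el.text
        for el in record.findall(f"{{{DC_NS}}}{element_name}")
        if el.get(f"{{{XML_NS}}}lang") == lang
    ]


# Hypothetical sample values, not taken from the RnaG collection.
record = build_record({
    "title": {"ga": "Clár raidió (sampla)", "en": "Radio programme (example)"},
})
xml_text = ET.tostring(record, encoding="unicode")
```

An interface can then ask for `values_for(record, "title", "ga")` or `values_for(record, "title", "en")` depending on the language the user has selected, while both versions stay in the one record.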

Issues around specific Irish language conventions, e.g. variant spellings of surnames. Also cases where people would be known locally by surname and local names. This isn't something the DC legislates for. Response was to use the 'Irish guidelines for indexing archives'.

Question (Christophe): are there cases where translations aren't identical? A: This often came up with technical metadata - 'cuma' means to keep. If you're using it for preservation then you can't use it elsewhere. Another issue was that in English technical metadata might have four nouns but no definite articles, which just doesn't happen in Irish. [did I get this right?]

####Identifying HSS research data for preservation - a snapshot of current policy and guidelines

Rebecca Grant, Royal Irish Academy @Beck_Grant

PhD question: how can professional archivists engage with research data?

Lit review: how is research data defined for researchers? Who provides these definitions?

Research Data is published to:

  • fulfill OA requirements
  • share ideas
  • allow for reproducible research

Humanities Research Data:

####Preserving the essence: Identifying the significant properties of social science research data

Astrid Recker, Leibniz Institute for the Social Sciences @CESSDAtraining

Steffan Müller, Leibniz Institute for the Social Sciences

How do we preserve the essence of the data as the medium changes?

Intro: Significant Properties.

Paradoxically digital preservation means changing the digital object. What level of change is possible before the object ceases to be authentic?

The significant properties are those which "must be maintained ... to be accepted as evidence of what it purports to record"

Different ways to determine significant properties:

  • people centric
  • process centric
  • data centric

What are the objects to be looked at?

Understanding the data requires understanding the research process which created the data. As a consequence, data received into the archive is accompanied by questionnaires, methods reports, codebooks.

Data model of archival object drawn up using PREMIS 2.2. Incorporates data sets, reports, methods and field work reports.

Question

Why not store everything? A: the problem isn't storage - the exercise anticipates change and is planning for a way to deal with the consequences of that change. An example would be the hoped-for move from an SPSS db to ASCII.
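To make the SPSS-to-ASCII example concrete: a plain-text export keeps the values, but properties such as variable labels, value labels and missing-value codes only survive if they are written out alongside. The sketch below is an illustration under stated assumptions, not the speakers' method. It does not read real SPSS files, the choice of three properties is my own, and all names (`Variable`, `Dataset`, `migrate_to_text`, `load_from_text`, `lost_properties`) and the sample data are hypothetical.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class Variable:
    name: str
    label: str
    value_labels: dict   # code -> human-readable label, codes kept as strings
    missing_codes: list  # codes that mean "no answer"


@dataclass
class Dataset:
    variables: list
    rows: list


def migrate_to_text(ds):
    """Export to CSV text plus a JSON codebook sidecar."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([v.name for v in ds.variables])
    writer.writerows(ds.rows)
    codebook = json.dumps([asdict(v) for v in ds.variables])
    return buf.getvalue(), codebook


def load_from_text(csv_text, codebook_json=None):
    """Rebuild a Dataset; without a codebook only names and values survive."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if codebook_json is None:
        variables = [Variable(name, "", {}, []) for name in header]
    else:
        variables = [Variable(**entry) for entry in json.loads(codebook_json)]
    return Dataset(variables, data)


def lost_properties(before, after):
    """Name the variable-level properties that differ after migration."""
    lost = []
    for old, new in zip(before.variables, after.variables):
        for prop in ("label", "value_labels", "missing_codes"):
            if getattr(old, prop) != getattr(new, prop):
                lost.append(f"{old.name}.{prop}")
    return lost


# Hypothetical one-variable survey dataset.
original = Dataset(
    variables=[Variable("voted", "Voted in last election",
                        {"1": "Yes", "2": "No"}, ["9"])],
    rows=[["1"], ["2"], ["9"]],
)
csv_text, codebook = migrate_to_text(original)
with_codebook = load_from_text(csv_text, codebook)
data_only = load_from_text(csv_text)
```

Comparing `lost_properties(original, with_codebook)` with `lost_properties(original, data_only)` shows the point of the exercise: the data values are identical in both cases, and what differs is whether the context needed to interpret them made it through the migration.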

Ruth Gerathy: Do you interact with the researcher when deciding what is significant? A: No. They do interact when ingesting the material in the first place however.

Emphasis on sig props will change with move to emulation and virtualisation. A: emulation may be a game-changer but future users may not know how to interact with emulations. Contextual information will have to be added to facilitate this interaction.

Jane: the assumption in SS is that researchers will interact with entire datasets. However, researchers may only use parts of a dataset, out of context. How much does that threaten our understanding of how we archive SS data? A: We determine sig props with a designated user community in mind. Also worried about objects being chosen from diff datasets - how can we judge research quality without the original context?

#Paper Session 5 (Short Papers): Text, Images, and Objects (Hogan)

Chair: Dermot Frost, Manager, Trinity Centre for High Performance Computing, TCD @astarmain

####Towards a Conceptual Emulation Framework for the Preservation of Archaeological 3D Visualisations

Panagiotis Papageorgiou, University of Portsmouth

I missed the start of this.

advantages of emulation:

  • original bitstreams can be preserved
  • cost efficient - migration every 4 years
  • emulation via Java - which is platform independent

Another mention of the 'Preserving Complex Objects' book:

Question

Dermot Frost: won't you need to emulate your emulators? Why not bite the bullet and just migrate the data? A: emulators have to be migrated every 4 years, data has to be migrated every year.

Q: Virtualization as a parallel to emulation? A: Some people consider them to be the same, it's an area we need to research.

Q: how did you choose your project data? A: Based on emulation framework called ??? Data comes from DANS, Kings College Labs

Q: Use cases: is emulation closing an interesting use case for 3D datasets? Example: scans of carved stones in Scotland, to be repeated in 25 years and compared to measure erosion / change. A: Me: once you have the data, what difference does the preservation process make?

####Irish Archaeological Data: Towards a framework

Anthony Corns, The Discovery Programme @DiscProg

Louise Kennedy, The Discovery Programme

ARIADNE

A survey of Irish archaeological data:

  • National Monuments Service
  • National Museum of Ireland
  • National Roads Authority

Examined:

  • data held
  • how it was managed
  • how it might be integrated

Found that:

  • < 25% of records are digital

Metadata used

  • licence number
  • townland / site name
  • national monument number
  • names of excavators / companies

Backup is more common than deliberate preservation

In terms of quality / standards there are issues around lack of controlled vocabs.

In all cases survey respondents regard their material as being open, however much has to be consulted in hardcopy. Fears around damage to records by this use.

'Cultural Data Framework'

Developing an Irish archaeological thesaurus.

Dangers; no requirement for archaeological datasets to be stored.

Q: how will you engage with the commercial sector, who create 80 - 90% of the data? A: the survey showed the massive gap in data. Discovery want to build something to deal with this gap. Follow-on Q: the private sector can only do things that are statutory or contractual - otherwise they can't bill clients for the work. D. Frost: this isn't unique to archaeology.

Q: shouldn't we introduce the potential user into this conversation?

Ellen Murphy: commenting on Dublin City archaeological archive - succeeded because the requirement for donation was implemented by a city bylaw. Firms have also stepped up and deposited records which were created prior to the requirement.

####Do No Harm: Mitigating Unintentional Errors When Curating Data

Jared Lyle, Inter-University Consortium for Political and Social Research @ICPSR

Intro references preservation fail: http://www.independent.co.uk/arts-entertainment/art/news/so-bad-it-was-brilliant-botched-fresco-restoration-answers-spanish-towns-prayers-with-tourism-boom-8762069.html

What if metadata enhancements or format migrations cause harm / mutation to data?

Converting Excel to TSV loses any colour formatting that might be present.

As opposed to paintings, data is mediated by machines and you rely on them working as expected. But novices running point-and-click environments can make changes they don't understand.

Case study: data from the 80s. Going back to check how the data was altered, it is difficult to audit the changes and verify its integrity.

It is important to document changes to data, and to comment in programmes.

Software: Universal Numerical Fingerprint (UNF) checksums across different formats
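The idea of a checksum that works across formats can be sketched as follows: hash the normalised content of the table rather than the bytes of the file, so that a CSV and a TSV holding the same data get the same fingerprint. This is a simplified illustration only and not the actual UNF algorithm. The function names, the rounding to 7 significant digits and the sample data are all my own assumptions.

```python
import csv
import hashlib
import io


def parse_table(text, delimiter):
    """Read delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def normalise(cell):
    """Give numbers one canonical spelling; leave other text as it is."""
    cell = cell.strip()
    try:
        return format(float(cell), ".7g")
    except ValueError:
        return cell


def content_fingerprint(rows):
    """Hash normalised cell values, ignoring how the file was encoded."""
    digest = hashlib.sha256()
    for row in rows:
        for cell in row:
            digest.update(normalise(cell).encode("utf-8"))
            digest.update(b"\x00")  # marks the end of a cell
        digest.update(b"\n")        # marks the end of a row
    return digest.hexdigest()


def file_checksum(text):
    """Ordinary checksum over the raw file contents, for comparison."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical data: same values, different formats and number spellings.
csv_text = "id,score\n1,1.50\n2,3\n"
tsv_text = "id\tscore\n1\t1.5\n2\t3.0\n"
altered_text = "id,score\n1,1.50\n2,4\n"

csv_print = content_fingerprint(parse_table(csv_text, ","))
tsv_print = content_fingerprint(parse_table(tsv_text, "\t"))
altered_print = content_fingerprint(parse_table(altered_text, ","))
```

Here `csv_print` and `tsv_print` match even though `file_checksum(csv_text)` and `file_checksum(tsv_text)` do not, while `altered_print` differs because a value changed. A check like this before and after a format migration gives curators a way to confirm that only the container changed.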

Question: Are there data vis checks for catching changes? A: No, but more sophisticated tools would be cool, also for the original researchers.

####Sustaining data archives over time: Lessons from the organisational studies literature

Kalpana Shankar, University College Dublin

Kristin Eschenfelder, University of Wisconsin-Madison @UWMadisonSLIS

How do we maintain trust in preservation if funding is on recurring cycles? Problem of short term goals.

Social science data archives have survived fairly successfully since the 1920s. They have also been successful in shaping the practice of the wider field of social science.

So what makes these examples successful? Case studies were carried out to find out, by going through the archives of the archives, carrying out interviews, and doing field-level analyses.

Aspects coded for: business model changes, minutes, organisational minutiae.

Lit Analysis: practitioner lit vs[?] organisational studies.

Results, interesting emerging data:

  • Business models: better if business model aligns with organisational mission
  • Multi-institutional relationships: more successful institutions have more and better relationships. Sometimes creating a model for their data - getting people into using and then requiring their data.
  • Value of Data Services: arguing for value added curation

Problems with the research:

  • hard to know what success actually looks like
  • timescales don't match for different factors: funding cycles, contract durations

Questions

Can you talk about the tension between national and international setups? A: American institutions used to go it alone, UK more likely to plug into networks. Probably reflects broader researcher cultures.

####Músgraí WYSIWYM WP: a Simple Plug-in to Add Semantically Meaningful Functionality to the Graphical Editor of the WordPress CMS, with Emphasis on Digitally Archiving Irish and Scottish Gaelic Texts

Mícheál Mac Lochlainn, Ollscoil na hÉireann, Gaillimh

WP as an excellent candidate for building online archives. Plugin created to address WP shortcomings.

TinyMCE altered to turn it into a WYSIWYM (what you see is what you mean) editor.

TinyMCE as semantically compromised.

Is the point that the ability to insert formatting and even visual design decisions should be considered a form of data corruption?

This plugin removes all TinyMCE buttons allowing insertion of formatting tags and options. User can still create valid HTML.

This plugin also inserts semantically meaningful buttons - things like 'cite'. There are also 3nd order class attributes which can ref cited doc.

Module available to add special characters - all Unicode standards.

Basically the plugin turns TinyMCE into a more customised visual editor - is all code HTML compliant? Removes things like tag in place of . It allows the use of a wider range of characters and fonts. Cool project, if you have a suitable use case.
