Data Science in Libraries Workshop, University of Pittsburgh, 16-17 May 2017

Live notes, so an incomplete, partial record of what actually happened.

Tags: dsinl

My asides in {}

Day 1

Orienting Talks (9:30am - 10:00am) - Perspectives on Data Science in Libraries. Speakers: Chris Erdmann, Bonnie Tijerina, & Liz Lyon

Chris: DST4L .. practical but with speakers to provide context .. something like DST4L as a service would, potentially, support both librarians and those they serve.

Bonnie: there are larger societal issues at play here .. big data ethics - who has access to data, how are they using it, do they discriminate in their behaviour .. in big data research, lots of grey areas in the middle between formal ethical requirements and later stage data curation. Examples: data collection, storage, reuse, consent/re-identification, unknown/emerging issues .. students who go on to work in tech sector need stronger data ethics training during their degrees .. working on embedding data ethics in data science programmes showcases the values of librarians

Liz: issues - awareness among senior management and scope (both intra- and extra-library) ..

Within libraries, there is varying mgmt gaps. And varying ways to fill those gaps. Gotta know the context of each library. #DSinL

— UC Curation Center (@UC3CDL) May 16, 2017

.. culture: is data science what librarians do?!

Data training in Libraries turf wars? remember old saying: pick big battles with your foes, not small ones with your friends. #DSinL

— UC Curation Center (@UC3CDL) May 16, 2017

Lyon and Mattern, Education for Real-World Data Science Roles (Part 2): A Translational Approach to Curriculum Development (2016), IJDC .. future data scientists need a blend of maths skills and social skills (and even PwC agree!) ..

Data analyst? Data engineer? Data journalist? What are the Data roles we serve in the @LibCarpentry world? #DSinL

— Library Carpentry (@LibCarpentry) May 16, 2017

PwC report on data analytics talent - #DSinL

— Keith Webster (@CMKeithW) May 16, 2017

Data roles in librarianship roughly split by disciplinary background.

Domain disconnect: Data science training focuses on science & computer sci but our workforce tends to come from Arts & Humanaties. #DSinL

— UC Curation Center (@UC3CDL) May 16, 2017

Big Group Discussion (10:00am - 11:30am) - Open Discussion & Affinity Exercise

Good CS people want access to rich, diverse questions (which £££ roles might now provide) .. by emphasising the public good, libraries can attract those who would rather do public good than chase $$$ ..

Makes sense, especially given we already do modules on stereotype threat, unconscious bias in our workshops #umichswc #DSinL

— Alix Keener (@alix_rae) May 16, 2017

Funder’s Perspective (11:30am - Noon) - Funding Opportunities for data science and libraries. Speakers: Josh Greenberg (Sloan) & Ashley Sands (IMLS)

Up now at #DSinL is @ashley247 talking about work of @US_IMLS

— Keith Webster (@CMKeithW) May 16, 2017

IMLS: Values of librarians should shine through in the services/tools we build: open, access, preservation, privacy

Sloan: started to work more closely with universities as preservation plans for great projects were often unsatisfactory .. how do we support DS in universities leads invariably to how do we support DS in libraries .. DS in libraries can play a strong part in libraries proving and improving their competitiveness ..

We need to help LIS faculty - and ourselves - do computational research - and have the best data sciences toolkit at our disposal #DSinL

— Keith Webster (@CMKeithW) May 16, 2017

Josh Greenberg: data science is intersection of computation and statistics in context of more data than people are accustomed to #DSinL

— Keith Webster (@CMKeithW) May 16, 2017

Keynote Talk (1:00 - 2:30pm) - “Data Science & Critical Thinking” Speaker: Alistair Croll

We used to ask questions then collect data. We now collect data and ask questions later .. role of librarians in taking people outside of their bubble .. we are about to have a permanent life feed ..

Data + Code = Legacy - incredibly important for librarians to get this right when we’re upgrading human cognition #DSinL

— Keith Webster (@CMKeithW) May 16, 2017

Lightning talks (3pm - 4pm) - 5 minute talks by participants doing Data Science in libraries

Harriet Green: HTRC Digging Deeper, Reaching Further project using the HathiTrust dataset as a cornerstone for teaching librarians basic text analysis skills .. non-consumptive use policy

Eleanor Tutt: tackling problems of open data not being usable by the public (and therefore not fulfilling its public purpose of holding government to account)

Lauren Di Monti: working with graduate students .. win-win: for library (knowledge/embed of DS) and the student (showcase skills, understand need, learn to teach) ..

Working on a massive (web) dataset for equivalent cost to a textbook is achievable if libraries handle the transfer/preservation costs.

Small Breakout Sessions (4pm - 5pm) - Digging deeper into key issues.

Group Discussion (5pm-5:30pm) - Reporting back from the breakout sessions and a larger group discussion reflecting on the day’s conversation.

Day 2 May 17th (9am - Noon)

Orienting Talk (9am - 9:30am) - From laboratories to libraries: mapping the skills and competences for data professionals, lessons from the EDISON project. Speaker: Steve Brewer

European Open Science Cloud is happening ..

Edison Data Science Framework can be found at @EdisonEU #DSinL

— UC Curation Center (@UC3CDL) May 17, 2017

Are we looking for unicorns? Great obsv by @tracykteal: typical Data Science venn diagrams, big lists of soft skills seem to imply it #DSinL

— Aaron Brenner (@abrennr) May 17, 2017

Case Studies Breakout (9:30am - 11:00am)

  • Skills Gap: Training librarians to be data savvy.
  • Skills Gap: How data savvy librarians can support their communities.
  • Management Gap: How to manage data savvy librarians & data science teams.
  • Management Gap: How to use data in library operations.

Report Back & Closing Discussion (11:00 - noon) - Hear back from the breakout groups and close out the workshop. Speaker: Matt Burton

Next steps: working on a report and a roadmap for DS in libraries .. more workshops? .. focused research projects?

Some admin...

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Exceptions: embeds to and from external sources, and direct quotations from speakers
