@birkin
Last active April 1, 2024 18:18
highlights of 2023 work -- #stuff

stuff I did - 2023

Timeframe: April-2023 through March-2024.


Project improvements

BDR

(most BDR work was with combinations of CM, JM, and JU)

  • w/team, deepened knowledge of extracting text from PDFs, and indexing it, to improve discovery
  • experimented with jupyter-notebooks to create interactive API documentation
  • implemented a "queue"-check to alert us if our various queues and workers aren't as expected, and if there are failures
  • updated the django-bulstyle webapp, which the BDR uses, to incorporate YF's google-analytics-4 upgrades
  • actively participated in Stakeholders group
  • shared info on a Rapid Assessment Model of the Digital Preservation Coalition
  • shared "Desirable Characteristics of Digital Publication Repositories" from a webinar
  • with JU, presented on the BDR in a BUL-Session
  • performed infrastructure improvements such as adding a url endpoint to the Workshop to check whether error-emails are being sent
  • w/JU, shared with multiple sets of folk that the BDR (and web more broadly) cannot reliably make something public-but-not-downloadable
  • w/JU, investigated images-of-unusual-size queries from microscopy folk
  • encouraged small-script work that could contribute to dashboard work
  • introduced New-Projects gdocs to facilitate communication between DT-dev, org-member, and relevant-librarian
  • implemented wave-accessibility improvement
  • created repository of small-scripts for documentation and re-usable future code
  • experimented with 3 different libraries to improve text-extraction, which hadn't been working well for BDH PDFs

DPLA

  • supported JM's work learning about DPLA architecture and issues

Hall-Hoag

  • created Hall-Hoag slack-channel
  • created code to explore the initial FileMakerPro xml export
  • shared articles on producing PDFs w/team, reviewed as a group
  • figured out how to export complete data from FileMakerPro; added script to facilitate export
  • explored image-orientation-detection code
  • created demo-code for applying stylesheets to xml to illustrate xslt features
  • suggested python multiprocessing library for project
  • suggested python tempfile module
  • showed team my old code for ingesting via BDR-APIs, which ended up being the chosen ingest-method
  • tested different libraries to summarize extracted text
  • demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
  • w/team, upgraded numerous parts of the BDR to enable MODS 3.8 validation
  • added logging to the ocr-pipeline code
  • added logging to pytesseract experimentation

Leganto

  • enhanced extracts with more movie info
  • ran multiple reading-list exports
  • deployed a self-service webapp for staff, which has proved useful for getting data out of OCRA in a ready-to-upload-to-Leganto format

Inscriptions of Israel and Palestine

  • implemented minor updates to the IIP-listener/indexer
  • w/PR, multiple meetings with prof and external developer to transition the project off Brown servers

Stolen Relations

  • shared example of auto-creating a json-schema from actual browse.json data
  • improved query that was unnecessarily generating thousands of selects
  • improved a mysqldump script to update a repository used by other SR applications
  • further evolved the mysqldump script to export events and procedures
  • further evolved the mysqldump script to address permissions issues and use fewer server resources
  • working with PR, added age-category handling to the API

Theater That Was Rome

  • w/JU, met with prof EL; implemented a fix for problematic roles

VIVO

  • deployed updated packages
  • worked with CM, JM, and JU on other troubleshooting

Other

  • participated in Aeon meetings
  • Primo: began bookplates script to ensure that database and Alma info are in-sync
  • maintained regular pair-programming practices to facilitate shared knowledge/practices
  • w/JM and KH, ran Hathi script
  • worked with LW's team to identify no-longer-active django webapps, for Drupal migration
  • shared lots of info with CM after he joined the team, including django, github, solr, and other practices
  • w/JM, troubleshot normally-stable CDL system and communicated good-practices with staff

Non project-specific activity

General

  • fostered cross-team communication

    • participated in Data Services group; facilitated an ex-DT-developer & colleague from Princeton sharing their data-services work
    • shared CDS google-analytics work
    • shared CDS thought-provoking questions about how we measure "success"
    • facilitated EY sharing static-site work with DT
    • facilitated PR sharing ChatGPT and Arc-browser experimentation with DT
  • AI/ML:

    • started ai_libtools slack-channel
    • regularly added links to posts
    • communicated with Library colleagues on other AI committees
    • for Hall-Hoag project, demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
    • w/JM and JU experimented with using machine-learning to predict missing values in a dataset, based on patterns learned from related fields in the same dataset
  • supported team-building and good team practices:

    • supported the Friday meetings by regularly sharing, and by periodically updating the friday-meeting slack-channel with summaries of who shared what
    • supported folk experimenting with new tools such as github "Projects", and Notion task-management
    • led a 6-3-5 DT brainstorming exercise
    • initiated sprint experimentation
    • initiated new BDR-project tracking
    • started short DT Project-Videos to visually document improvements and retired-sites
  • misc:

    • set up Django 4.2.x (the new long-term-support version), and created a template to make it easy to deploy new projects
    • attended numerous candidate presentations
    • attended Library Town Halls
    • attended first post-COVID conference, ENUG in Philly
    • w/team, shared VSCode settings/features/usage/tips

Professional-development

  • attended DH-Salon presentations

  • researched static-site generators, and used one to convert my personal site into a static site -- gleaned insights into possibilities for new DT work and archival work

  • experimented with using jupyter-notebooks for API documentation

  • began using trello for keeping track of my own work

  • reviewed concurrency approaches and coded a concurrency-template-project to refer to when working on future projects that might require it

  • outside of work, deepened Rust knowledge and experience

    • explored embedding sqlite in a binary for possible offloading of Alma-API lookups
    • learned practices for embedding things like the current branch & commit into a binary
  • learned about advent-of-code puzzles

  • learned a bit of zig, and nim, and mojo programming languages

  • deepened AI knowledge

    • w/JM and JU, went through tutorials on using machine-learning to create data -- began training a model to predict missing values in a BDR dataset
    • advanced work on a Whisper neural-network transcriber tool that could be useful for Library staff
    • for Hall-Hoag, researched a few ways to use large-language-models for summarization; demoed successful use on Hall-Hoag extracted text
    • shared ai4Libraries virtual conference notes
  • deepened knowledge of VSCode features

  • extended use of ChatGPT for work

  • continued to contribute to team-knowledge through slack and Friday dev-meetings. Shared:

    • OpenAI's API info
    • past work on "Stella", a 2006 Library chatbot with a Brown connection
    • lots of AI resources
    • python "retry" library, for auto-retrying temporarily-unsuccessful function/network-calls
    • useful new git-clone commands, and commands to directly get at current commit and branch
    • how to avoid password-prompts in automated mysqldump scripts
    • data-validation post to integrate data-validation into workflow
    • "source-of-truth" post relevant for many of our projects
    • "rounding" in programs behaving differently from assumptions
    • "rocfl" -- a command-line tool that lets us inspect and edit our OCFL storage
    • updated phpMyAdmin unicode collation
    • the importance of "red-green-refactor" in testing
    • dependency-issues like new versions of requests not working on our servers, and why
    • the WebP image-format
    • code4lib journal articles and interesting posts
    • our IIIF server transformation-urls
    • VSCode new diff features and new Copilot features
    • htmx javascript library, bringing dynamic-interaction to regular html/server architectures
    • HexTuples, an extension of standard linked-data triples
    • Canadian Access conference info
    • code-demo of programmatically applying a stylesheet to an xml file to get transformed output
    • large-language-model notes
    • concept of in project-architecture
    • mechanism for embedding git-commit into a binary -- applicable to other compiled processes like static-site-generators
  • regularly perused posts and e-newsletters to keep abreast of interesting code/techniques. Two primary regular sources:

    • python-weekly
    • programmer-weekly
  • began perusing the code4lib slack-channels in addition to the code4lib email-list to be aware of others' work
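
As a reminder of what the auto-retry idea in the "Shared" list above looks like, here is a minimal standard-library sketch. It is not the "retry" library's own API. The `retry` decorator, its `tries`/`delay`/`exceptions` parameters, and the `flaky_lookup` function are all hypothetical names invented for illustration.

```python
import functools
import time


def retry(tries=3, delay=0.0, exceptions=(Exception,)):
    """Hypothetical helper: re-run a function up to `tries` times on failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the last error
                    time.sleep(delay)  # pause before the next attempt
        return wrapper
    return decorator


# Track how many times the stand-in function has been called.
calls = {"count": 0}


@retry(tries=3, delay=0.0, exceptions=(ConnectionError,))
def flaky_lookup():
    """Stand-in for a network call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = flaky_lookup()
```

Only the listed exception types trigger another attempt, so genuine bugs (a `TypeError`, say) still fail immediately instead of being retried.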


[end]