birkin/stuff_i_did_2023.md

## stuff_i_did_2023.md

      
    stuff I did - 2023

Timeframe: April-2023 through March-2024.
On this page...

Project improvements

BDR
DPLA
Hall-Hoag
Leganto
IIP
SR
TTWR
VIVO
Other


Non project-specific activity

General
Professional-development


Project improvements

BDR

(most BDR work was with combinations of CM, JM, and JU)

w/team, deepened knowledge of extracting text from PDFs, and indexing it, to improve discovery
experimented with jupyter-notebooks to create interactive API documentation
implemented a "queue"-check to alert us if our various queues and workers aren't as expected, and if there are failures
updated the django-bulstyle webapp, which the BDR uses, to incorporate YF's google-analytics-4 upgrades
actively participated in Stakeholders group
shared info on a Rapid Assessment Model of the Digital Preservation Coalition
shared "Desirable Characteristics of Digital Publication Repositories" from a webinar
with JU, presented on the BDR in a BUL-Session
performed infrastructure improvements such as adding a url endpoint to the Workshop to check whether error-emails are being sent
w/JU, shared with multiple sets of folk that the BDR (and web more broadly) cannot reliably make something public-but-not-downloadable
w/JU, investigated images-of-unusual-size queries from microscopy folk
encouraged small-script work that could contribute to dashboard work
introduced New-Projects gdocs to facilitate communication between DT-dev, org-member, and relevant-librarian
implemented wave-accessibility improvement
created repository of small-scripts for documentation and re-usable future code
experimented with 3 different libraries to improve PDF text-extraction, which hadn't been working well for BDH PDFs

DPLA


supported JM's work learning about DPLA architecture and issues

Hall-Hoag


created Hall-Hoag slack-channel
created code to explore the initial FileMakerPro xml export
shared articles on producing PDFs w/team, reviewed as a group
figured out how to export complete data from FileMakerPro; added script to facilitate export
explored image-orientation-detection code
created demo-code for applying stylesheets to xml to illustrate xslt features
suggested python multiprocessing library for project
suggested python tempfile module
showed team my old code for ingesting via BDR-APIs, which ended up being the chosen ingest-method
tested different libraries to summarize extracted text
demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
w/team, upgraded numerous parts of the BDR to enable MODS 3.8 validation
added logging to the ocr-pipeline code
added logging to pytesseract experimentation
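
A minimal sketch of the kind of use the tempfile suggestion above points at, assuming a pipeline step that needs scratch files for intermediate output. The function name, the `hh_scratch_` prefix, and the per-page file layout are hypothetical illustrations, not the project's actual code.

```python
import tempfile
from pathlib import Path


def process_with_scratch_dir(page_texts: list[str]) -> str:
    """Write each page to a scratch file, then combine the pages.

    The scratch directory and everything in it is deleted automatically
    when the `with` block exits, even if an exception is raised.
    """
    with tempfile.TemporaryDirectory(prefix="hh_scratch_") as tmp:
        scratch = Path(tmp)
        # Hypothetical intermediate step: one zero-padded file per page.
        for i, text in enumerate(page_texts):
            (scratch / f"page_{i:04d}.txt").write_text(text, encoding="utf-8")
        # Read the pages back in filename order.
        parts = [
            p.read_text(encoding="utf-8")
            for p in sorted(scratch.glob("page_*.txt"))
        ]
    return "\n".join(parts)


if __name__ == "__main__":
    print(process_with_scratch_dir(["first page", "second page"]))
```

The appeal is that cleanup does not depend on the script remembering to delete anything, which matters when a long batch run fails partway through.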

Leganto


enhanced extracts with more movie info
ran multiple reading-list exports
deployed a self-service webapp for staff which has proved useful to get data out of OCRA in a ready-to-upload-to-Leganto format

Inscriptions of Israel and Palestine


implemented minor updates to the IIP-listener/indexer
w/PR, held multiple meetings with prof and external developer to transition the project off Brown servers

Stolen Relations


shared example of auto-creating a json-schema from actual browse.json data
improved query that was unnecessarily generating thousands of selects
improved a mysqldump script to update a repository used by other SR applications
further evolved the mysqldump script to export events and procedures
further evolved the mysqldump script to address permissions issues and use fewer server resources
working with PR, added age-category handling to the API
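
A rough sketch of what auto-creating a json-schema from actual data can look like, using only the standard library. This is not the example that was shared; `infer_schema` and the sample record are hypothetical, and the real browse.json structure is not reproduced here. It also assumes every item in a list looks like the first one, which real data may not honor.

```python
import json


def infer_schema(value):
    """Return a minimal JSON-Schema-style description of `value`."""
    # bool must be checked before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if value is None:
        return {"type": "null"}
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: assume all items look like the first one.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value.keys()),
        }
    raise TypeError(f"unsupported type: {type(value).__name__}")


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = json.loads('{"id": 1, "name": "example", "tags": ["a", "b"]}')
    print(json.dumps(infer_schema(sample), indent=2))
```

A schema inferred this way is a starting point to hand-edit, since a single sample can't show which fields are really optional.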

Theater That Was Rome


w/JU, met with prof EL; implemented a fix for problematic roles

VIVO


deployed updated packages
worked with CM, JM, and JU on other troubleshooting

Other


participated in Aeon meetings
Primo: began bookplates script to ensure that database and Alma info are in-sync
maintained regular pair-programming practices to facilitate shared knowledge/practices
w/JM and KH, ran Hathi script
worked with LW's team to identify no-longer-active django webapps, for Drupal migration
shared lots of info with CM after he joined the team, including django, github, solr, and other practices
w/JM, troubleshot normally-stable CDL system and communicated good-practices with staff


Non project-specific activity

General


fostered cross-team communication

participated in Data Services group; facilitated ex-DT-developer & colleague from Princeton to share their data-services work
shared CDS google-analytics work
shared CDS thought-provoking questions about how we measure "success"
facilitated EY sharing static-site work with DT
facilitated PR sharing ChatGPT and Arc-browser experimentation with DT


AI/ML:

started ai_libtools slack-channel
regularly added links to posts
communicated with Library colleagues on other AI committees
for Hall-Hoag project, demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
w/JM and JU experimented with using machine-learning to predict missing values in a dataset, based on patterns learned from related fields in the same dataset


supported team-building and good team practices:

supported the Friday meetings by regularly sharing, and by periodically updating the friday-meeting slack-channel with summaries of who shared what
supported folk experimenting with new tools such as github "Projects", and Notion task-management
led a 6-3-5 DT brainstorming exercise
initiated sprint experimentation
initiated new BDR-project tracking
started short DT Project-Videos to visually document improvements and retired-sites


misc:

got Django 4.2.x (the new long-term-support version) running, and created a template to make it easy to deploy new projects
attended numerous candidate presentations
attended Library Town Halls
attended first post-COVID conference, ENUG in Philly
w/team, shared VSCode settings/features/usage/tips


Professional-development


attended DH-Salon presentations


researched static-site generators, and used one to convert my personal site into a static site -- gleaned insights into possibilities for new DT work and archival work


experimented with using jupyter-notebooks for API documentation


began using trello for keeping track of my own work


reviewed concurrency approaches and coded a concurrency-template-project to refer to when working on future projects that might require it
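
For reference, a small sketch of one approach a template like that might cover: a thread pool for I/O-bound work. The notes above don't say which approaches the template-project actually uses, so this is an assumption; `fetch` is a hypothetical stand-in that sleeps instead of making a network call.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fetch(item_id: int) -> dict:
    """Hypothetical stand-in for an I/O-bound call; sleeps instead of using a network."""
    time.sleep(0.05)
    return {"id": item_id, "ok": True}


def fetch_all(item_ids: list[int], max_workers: int = 4) -> list[dict]:
    """Run fetch() over item_ids in a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(fetch, item_ids))


if __name__ == "__main__":
    print(fetch_all([1, 2, 3, 4]))
```

Threads suit waiting-on-the-network work; CPU-heavy work would more likely call for processes instead.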


outside of work, deepened Rust knowledge and experience

explored embedding sqlite in a binary for possible offloading of Alma-API lookups
learned practices for how to embed things like current branch & commit into a binary.


learned about advent-of-code puzzles


learned a bit of zig, and nim, and mojo programming languages


deepened AI knowledge

w/JM and JU, went through tutorials on using machine-learning to create data -- began training a model to predict missing values in a BDR dataset
advanced work on a Whisper neural-network transcriber tool that could be useful for Library staff
for Hall-Hoag, researched a few ways to use large-language-models for summarization; demoed successful use on Hall-Hoag extracted text
shared ai4Libraries virtual conference notes


deepened knowledge of VSCode features


extended use of ChatGPT for work


continued to contribute to team-knowledge through slack and Friday dev-meetings. Shared:

OpenAI's API info
past work on "Stella", a 2006 Library chatbot with a Brown connection
lots of AI resources
python "retry" library, for auto-retrying temporarily-unsuccessful function/network-calls
useful new git-clone commands, and commands to directly get at current commit and branch
how to avoid password-prompts in automated mysqldump scripts
data-validation post to integrate data-validation into workflow
"source-of-truth" post relevant for many of our projects
"rounding" in programs behaving differently from common assumptions
"rocfl" -- a command-line tool that lets us inspect and edit our OCFL storage
updated phpMyAdmin unicode collation
the importance of "red-green-refactor" in testing
dependency-issues like new versions of requests not working on our servers, and why
the WebP image-format
code4lib journal articles and interesting posts
our IIIF server transformation-urls
VSCode new diff features and new Copilot features
htmx javascript library, bringing dynamic-interaction to regular html/server architectures
HexTuples, an extension of standard linked-data triples
Canadian Access conference info
code-demo of programmatically applying a stylesheet to an xml file to get transformed output
large-language-model notes
concept of in project-architecture
mechanism for embedding git-commit into a binary -- applicable to other compiled processes like static-site-generators
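
To illustrate the auto-retry idea from the list above, here is a hand-rolled sketch using only the standard library. It is not the "retry" library's API; `retry_on`, its parameters, and `flaky_lookup` are hypothetical names chosen for the illustration.

```python
import functools
import time


def retry_on(exceptions, tries=3, delay=0.1, backoff=2):
    """Decorator: re-call the function if it raises one of `exceptions`.

    Waits `delay` seconds after the first failure and multiplies the wait
    by `backoff` after each later one. The final failure is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, tries + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == tries:
                        raise  # out of attempts: surface the error
                    time.sleep(wait)
                    wait *= backoff
        return wrapper
    return decorator


calls = {"count": 0}


@retry_on(ConnectionError, tries=3, delay=0.01)
def flaky_lookup():
    """Hypothetical network call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


if __name__ == "__main__":
    print(flaky_lookup(), "after", calls["count"], "calls")
```

Limiting the retried exceptions to ones that are plausibly temporary keeps real bugs from being silently retried.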


regularly perused posts and e-newsletters to keep abreast of interesting code/techniques. Two primary regular sources:

python-weekly
programmer-weekly


began perusing the code4lib slack-channels in addition to the code4lib email-list to be aware of others' work


[end]