Timeframe: April-2023 through March-2024.
(most BDR work was with combinations of CM, JM, and JU)
- w/team, deepened knowledge of extracting text from PDFs, and indexing it, to improve discovery
- experimented with jupyter-notebooks to create interactive API documentation
- implemented a "queue"-check to alert us if our various queues and workers aren't as expected, and if there are failures
- updated the django-bulstyle webapp, which the BDR uses, to incorporate YF's google-analytics-4 upgrades
- actively participated in Stakeholders group
- shared info on the Rapid Assessment Model of the Digital Preservation Coalition
- shared "Desirable Characteristics of Digital Publication Repositories" from a webinar
- with JU, presented on the BDR in a BUL-Session
- performed infrastructure improvements such as adding a url endpoint to the Workshop to check whether error-emails are being sent
- w/JU, shared with multiple sets of folk that the BDR (and web more broadly) cannot reliably make something public-but-not-downloadable
- w/JU, investigated images-of-unusual-size queries from microscopy folk
- encouraged small-script work that could contribute to dashboard work
- introduced New-Projects gdocs to facilitate communication between DT-dev, org-member, and relevant-librarian
- implemented wave-accessibility improvement
- created repository of small-scripts for documentation and re-usable future code
- experimented with 3 different libraries to improve text-extraction from PDFs that didn't work well for BDH PDFs
- supported JM's work learning about DPLA architecture and issues
- created Hall-Hoag slack-channel
- created code to explore the initial FileMakerPro xml export
- shared articles on producing PDFs w/team, reviewed as a group
- figured out how to export complete data from FileMakerPro; added script to facilitate export
- explored image-orientation-detection code
- created demo-code for applying stylesheets to xml to illustrate xslt features
- suggested python multiprocessing library for project
- suggested python tempfile module
- showed team my old code for ingesting via BDR-APIs, which ended up being the chosen ingest-method
- tested different libraries to summarize extracted text
- demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
- w/team, upgraded numerous parts of the BDR to enable MODS 3.8 validation
- added logging to the ocr-pipeline code
- added logging to pytesseract experimentation
- enhanced extracts with more movie info
- ran multiple reading-list exports
- deployed a self-service webapp for staff which has proved useful to get data out of OCRA in a ready-to-upload-to-Leganto format
- implemented minor updates to the IIP-listener/indexer
- w/PR, multiple meetings with prof and external developer to transition the project off Brown servers
- shared example of auto-creating a json-schema from actual browse.json data
- improved query that was unnecessarily generating thousands of selects
- improved a mysqldump script to update a repository used by other SR applications
- further evolved the mysqldump script to export events and procedures
- further evolved the mysqldump script to address permissions issues and use fewer server resources
- working with PR, added age-category handling to the API
- w/JU, met with prof EL; implemented a fix for problematic roles
- deployed updated packages
- worked with CM, JM, and JU on other troubleshooting
- participated in Aeon meetings
- Primo: began bookplates script to ensure that database and Alma info are in-sync
- maintained regular pair-programming practices to facilitate shared knowledge/practices
- w/JM and KH, ran Hathi script
- worked with LW's team to identify no-longer-active django webapps, for Drupal migration
- shared lots of info with CM after he joined the team, including django, github, solr, and other practices
- w/JM, troubleshot normally-stable CDL system and communicated good-practices with staff
- fostered cross-team communication
  - participated in Data Services group; facilitated ex-DT-developer & colleague from Princeton to share their data-services work
  - shared CDS google-analytics work
  - shared CDS thought-provoking questions about how we measure "success"
  - facilitated EY sharing static-site work with DT
  - facilitated PR sharing ChatGPT and Arc-browser experimentation with DT
- AI/ML:
  - started ai_libtools slack-channel
  - regularly added links to posts
  - communicated with Library colleagues on other AI committees
  - for Hall-Hoag project, demoed successful use of applying large-language-model to extracted-text -- to summarize descriptions and create titles to improve browseability
  - w/JM and JU, experimented with using machine-learning to predict missing values in a dataset, based on patterns learned from related fields in the same dataset
- supported team-building and good team practices:
  - supported the Friday meetings by regularly sharing, and by periodically updating the friday-meeting slack-channel with summaries of who shared what
  - supported folk experimenting with new tools such as github "Projects", and Notion task-management
  - led a 6-3-5 DT brainstorming exercise
  - initiated sprint experimentation
  - initiated new BDR-project tracking
  - started short DT Project-Videos to visually document improvements and retired-sites
- misc:
  - got Django 4.2.x (the new long-term-support version) running, and created a template to make it easy to deploy new projects
  - attended numerous candidate presentations
  - attended Library Town Halls
  - attended first post-COVID conference, ENUG in Philly
  - w/team, shared VSCode settings/features/usage/tips
- attended DH-Salon presentations
- researched static-site generators, and used one to convert my personal site into a static site -- gleaned insights into possibilities for new DT work and archival work
- experimented with using jupyter-notebooks for API documentation
- began using trello for keeping track of my own work
- reviewed concurrency approaches and coded a concurrency-template-project to refer to when working on future projects that might require it
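To illustrate the kind of pattern a concurrency template might capture, here is a minimal sketch using the standard-library `concurrent.futures` module. It is not the actual template-project; `process_item` and `run_concurrently` are hypothetical names, and the doubling step is only a stand-in for a real I/O-bound task such as an API call.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_item(item: int) -> int:
    """Hypothetical stand-in for an I/O-bound task (e.g. an API lookup)."""
    return item * 2


def run_concurrently(items, max_workers=4):
    """Run process_item over items in a thread pool.

    Returns (results, errors), each a dict keyed by the input item,
    so one failing task does not hide the results of the others.
    """
    results = {}
    errors = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its input so results can be labeled.
        future_to_item = {executor.submit(process_item, item): item for item in items}
        for future in as_completed(future_to_item):
            item = future_to_item[future]
            try:
                results[item] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[item] = exc
    return results, errors


if __name__ == "__main__":
    done, failed = run_concurrently([1, 2, 3])
    print(done, failed)
```

Threads suit I/O-bound work; for CPU-bound work, swapping in `ProcessPoolExecutor` is the usual alternative, with the caveat that the task function and its arguments must then be picklable.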
- outside of work, deepened Rust knowledge and experience
  - explored embedding sqlite in a binary for possible offloading of Alma-API lookups
  - learned practices for how to embed things like current branch & commit into a binary
- learned about advent-of-code puzzles
- learned a bit of zig, and nim, and mojo programming languages
- deepened AI knowledge
  - w/JM and JU, went through tutorials on using machine-learning to create data -- began training a model to predict missing values in a BDR dataset
  - advanced work on a Whisper neural-network transcriber tool that could be useful for Library staff
  - for Hall-Hoag, researched a few ways to use large-language-models for summarization; demoed successful use on Hall-Hoag extracted text
  - shared ai4Libraries virtual conference notes
- deepened knowledge of vs_code features
- extended use of ChatGPT for work
- continued to contribute to team-knowledge through slack and Friday dev-meetings. Shared:
  - OpenAI's API info
  - past work on "Stella", a 2006 Library chatbot with a Brown connection
  - lots of AI resources
  - python "retry" library, for auto-retrying temporarily-unsuccessful function/network-calls
  - useful new git-clone commands, and commands to directly get at current commit and branch
  - how to avoid password-prompts in automated mysqldump scripts
  - data-validation post to integrate data-validation into workflow
  - "source-of-truth" post relevant for many of our projects
  - "rounding" in programs different from assumptions
  - "rocfl" -- a command-line tool that lets us inspect and edit our OCFL storage
  - updated phpMyAdmin unicode collation
  - the importance of "red-green-refactor" in testing
  - dependency-issues like new versions of requests not working on our servers, and why
  - the WebP image-format
  - code4lib journal articles and interesting posts
  - our IIIF server transformation-urls
  - VSCode new diff features and new Copilot features
  - htmx javascript library, bringing dynamic-interaction to regular html/server architectures
  - HexTuples, an extension of standard linked-data triples
  - Canadian Access conference info
  - code-demo of programmatically applying a stylesheet to an xml file to get transformed output
  - large-language-model notes
  - concept of in project-architecture
  - mechanism for embedding git-commit into a binary -- applicable to other compiled processes like static-site-generators
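On the "rounding" item in the list above: one common instance, assuming this is the kind of surprise that was meant, is that Python's built-in `round()` sends exact ties to the nearest even integer rather than always rounding halves up. The sketch below contrasts that with explicit half-up rounding via the standard-library `decimal` module; `round_half_up` is a hypothetical helper name used only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: str, places: str = "1") -> Decimal:
    """Round a decimal string 'half up'.

    `places` is a quantize pattern: "1" for whole numbers, "0.01" for cents.
    Passing strings avoids binary floating-point representation error.
    """
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_UP)


# Built-in round(): ties go to the nearest even integer.
builtin_results = [round(0.5), round(1.5), round(2.5)]  # [0, 2, 2]

# Explicit half-up rounding on the same values.
half_up_results = [round_half_up(v) for v in ("0.5", "1.5", "2.5")]  # 1, 2, 3

# A separate effect: 2.675 has no exact binary representation and is stored
# slightly below 2.675, so rounding to two places gives 2.67.
float_result = round(2.675, 2)          # 2.67
decimal_result = round_half_up("2.675", "0.01")  # Decimal("2.68")
```

Where exact decimal behavior matters (money, reported percentages), working in `Decimal` from string input and naming the rounding mode explicitly avoids both effects.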
- regularly perused posts and e-newsletters to keep abreast of interesting code/techniques. Two primary regular sources:
  - python-weekly
  - programmer-weekly
- began perusing the code4lib slack-channels in addition to the code4lib email-list to be aware of others' work
[end]