Matti Lassila mjlassila

@mjlassila
mjlassila / luontoon-downloader.py
Last active April 29, 2025 16:17
Downloader for old Luontoon.fi PDF maps and brochures
import os
import time
import random
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin, urlparse
# Base URL
BASE_URL = 'https://julkaisut.metsa.fi'
SEARCH_URL = 'https://julkaisut.metsa.fi/julkaisut/?publication-search&publication-languages&publication-series&publication-product-categories=9764'
@mjlassila
mjlassila / create-xml.py
Last active March 21, 2025 08:20
Convert form-feed separated, line-based metadata records to XML
import os
import re
import argparse
import xml.etree.ElementTree as ET
import xml.dom.minidom
def parse_record(record_text):
    metadata = {}
    content_elements = []
@mjlassila
mjlassila / write_metadata.py
Last active March 28, 2025 13:58
Write metadata from CSV to OGG files
import csv
import os
import re
import sys
from collections import defaultdict
from mutagen.oggvorbis import OggVorbis
def extract_number(text):
    numbers = re.findall(r'\d+', text)
    return f"{int(numbers[0]):05d}" if numbers else "99999"
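The preview above is truncated, so how the gist goes on to use `extract_number` is not shown. As a minimal sketch, assuming it serves as a sort key for file names (the file names below are hypothetical), the zero-padded result makes names order numerically, and names without any digits fall to the end:

```python
import re


def extract_number(text):
    # First run of digits, zero-padded to five characters; "99999" if none.
    numbers = re.findall(r'\d+', text)
    return f"{int(numbers[0]):05d}" if numbers else "99999"


# Hypothetical file names, for illustration only.
names = ["track 12.ogg", "track 3.ogg", "intro.ogg"]

# Sorting on the padded string gives numeric order: 3 sorts before 12.
sorted_names = sorted(names, key=extract_number)
print(sorted_names)
```

Padding to a fixed width lets a plain string comparison behave like a numeric one, provided the numbers stay below 100000.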
@mjlassila
mjlassila / create-price-change-table.qmd
Created November 7, 2024 11:03
Create Excel worksheet of yearly price changes
---
title: "Tietokantojen hinnanmuutokset"
format: html
---
```{r}
#| label: setup
#| echo: false
@mjlassila
mjlassila / primo_matomo_example.js
Created April 18, 2023 10:49 — forked from gpeterso/primo_ga_example.js
Tracking Primo pageviews in Matomo Analytics
const gaTrackingId = '...';
class GoogleAnalytics {
  constructor($rootScope, $location, $window) {
    this.$rootScope = $rootScope;
    this.$location = $location;
    this.$window = $window;
    this.loadAnalytics(this.$window);
  }
@mjlassila
mjlassila / count-available-affiliation-data.xq
Last active February 8, 2023 13:04
Count affiliation data availability in Journal.fi data and create list for all relevant publications.
declare option output:method "csv";
declare option output:csv "header=yes, separator=;";
let $docs_with_affils:=
<records>{
for $record in //oai_marc
where $record/controlfield[@tag eq "001"] contains text {'article.*'} using wildcards
let $parent:=substring-before(substring-after(base-uri($record),"/journalfi-data/"),".xml")
let $affiliation_count:=count($record//datafield[@tag eq '100' or @tag eq '700']/subfield[@code eq 'u'])
let $orcid_count:=count($record//datafield[@tag eq '100' or @tag eq '700']/subfield[@code eq '0' and . contains text {"orcid.*"} using wildcards])
@mjlassila
mjlassila / create-simplestats-csv.xq
Created April 20, 2022 15:28
Convert DSpace Simplestats statistics to analysis-friendly CSV
declare option output:method "csv";
declare option output:csv "header=yes, separator=comma";
declare function local:to-datestamp($raw_date){
let $parts:=tokenize($raw_date," / ")
return $parts[2] || '-' || $parts[1] || '-1'
};
declare function local:not-total($string) {
@mjlassila
mjlassila / README.md
Created May 11, 2021 10:49
Established conference names from the JUFO database

JUFO conferences

Established conference names removed from the Julkaisufoorumi (Publication Forum) database. Converted from a PDF file downloaded from the Julkaisufoorumi website using the Tabula tool, with character encodings cleaned up using ftfy.

# vim:set ft=perl:ts=4:sw=4
#
# Ref.: http://librecat.org/Catmandu/
# https://github.com/LibreCat/Catmandu/wiki/Example%20Fix%20Script
# https://github.com/scriptotek/simplemarcparser/blob/master/src/BibliographicRecord.php
# For ElasticSearch 2.0
# See https://github.com/LibreCat/Catmandu-Store-Elasticsearch/commit/63795416d2585eab7af1d5263f5823b4cae94251
# <s>Note that we use _identifier over _id to cover deleted records which do not have _id</s>
# UPDATE: When importing from a MARC dump, we don't have OAI IDs, so use the simple _ids instead.
@mjlassila
mjlassila / item-status-query.sql
Created October 29, 2019 06:39
List item statuses by BIBID.
SELECT
biblio.title as 'Nimeke',
CONCAT('<a href=\"/cgi-bin/koha/catalogue/moredetail.pl?',
'biblionumber=', biblio.biblionumber, '&itemnumber=', items.itemnumber,
'\">', items.barcode, '</a>' ) AS 'Viivakoodi',
items.datelastborrowed as 'Lainattu viimeksi',
items.datelastseen as 'Käsitelty viimeksi',
CONCAT(
'<a target="_blank" href=\"/cgi-bin/koha/members/moremember.pl?borrowernumber=',
borrower.borrowernumber,