@rafaelpezzuto
rafaelpezzuto / check-last-record.py
Created June 10, 2021 00:05
check last records - SciELO OAI-PMH
import requests.exceptions as r_exceptions
import urllib3.exceptions as u_exceptions
from articlemeta.client import RestfulClient
from datetime import datetime, timedelta
from sickle import Sickle
from sickle.oaiexceptions import NoRecordsMatch
URL_STATIC_PDF_FILES = 'http://%s/static_pdf_files.txt'
@rafaelpezzuto
rafaelpezzuto / check-static-files.ipynb
Created June 10, 2021 00:10
check static files update date
@rafaelpezzuto
rafaelpezzuto / doaj_dump_parser.py
Created January 17, 2022 22:44
Parse doaj dump
import csv
import json
import os
DOAJ_DUMP_FILES = [f for f in os.listdir() if f.endswith('json')] # JSON files of the DOAJ dump found in the current directory
DOAJ_KEYS = ['publisher', 'doi', 'czu', 'pissn', 'issn', 'eissn', 'elocationid'] # keys expected in the DOAJ data
def extract_links(data):
@rafaelpezzuto
rafaelpezzuto / parse_scielo_dump.py
Created January 17, 2022 22:46
Parse SciELO Brazil Mongo dump
import csv
import json
import os
PATH_ISSNS = 'issns.txt'
PATH_DOAJ_DUMP_PARSED = 'doaj-dump-parsed.csv'
issns = {i.strip() for i in open(PATH_ISSNS)}
scielo_doaj_docs = []
@rafaelpezzuto
rafaelpezzuto / match_scielo_doaj.py
Last active January 18, 2022 00:34
Match SciELO and DOAJ
import urllib.parse
PATH_SCL_DOAJ_DOCS = 'scl-doaj-docs.csv'
PATH_SCL_PIDS_DOIS = 'scl-pids-dois.csv'
PATH_SCL_PIDS_DOAJ_LAST = 'scl-doaj-docs-jan-22.csv'
PATH_ISSNS = 'issns.txt'
scl_doaj_docs = [i.strip().split('|') for i in open(PATH_SCL_DOAJ_DOCS)]
scl_pids_dois = [i.strip().split(',') for i in open(PATH_SCL_PIDS_DOIS)]
issns = {i.strip().upper() for i in open(PATH_ISSNS)}
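The loading lines above open each file without closing it explicitly. A minimal sketch of equivalent loaders that use context managers is given here, assuming the same layouts as in the preview (one ISSN per line, and delimiter-separated rows). The helper names `load_issns` and `load_rows` are hypothetical and not part of the gist, and unlike the original lines they skip blank lines.

```python
import os
import tempfile


def load_issns(path):
    """Return the set of stripped, upper-cased, non-empty lines in `path`."""
    with open(path, encoding='utf-8') as handle:
        return {line.strip().upper() for line in handle if line.strip()}


def load_rows(path, delimiter):
    """Return each non-empty line of `path` split on `delimiter`."""
    with open(path, encoding='utf-8') as handle:
        return [line.strip().split(delimiter) for line in handle if line.strip()]


if __name__ == '__main__':
    # Demonstration with a throwaway file; the real inputs would be the
    # paths named in the gist (for example PATH_ISSNS).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'issns.txt')
        with open(path, 'w', encoding='utf-8') as handle:
            handle.write('1234-567x\n\n0001-3765\n')
        print(sorted(load_issns(path)))
```

With these helpers, the three assignments could be written as `load_rows(PATH_SCL_DOAJ_DOCS, '|')`, `load_rows(PATH_SCL_PIDS_DOIS, ',')` and `load_issns(PATH_ISSNS)`.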
@rafaelpezzuto
rafaelpezzuto / gen-journal-tables.ipynb
Created June 27, 2022 14:42
Generates bibliometrics CSV files
@rafaelpezzuto
rafaelpezzuto / count-unconformities.ipynb
Last active July 4, 2022 20:37
Counts nonconformities between StatBiblio and NewBiblio