Ettore Rizza (ettorerizza)

ettorerizza / csv_to_sqlserver.ps
Last active April 25, 2020 16:49
Bulk import a folder of CSV files into SQL Server, creating tables on the fly
# Install-Module dbatools
# If running scripts is disabled, first start a session with:
# powershell -noprofile -ExecutionPolicy bypass
Import-Module dbatools
# Import every file in the folder; -AutoCreateTable creates the target table if it does not exist
Get-ChildItem -Path "C:\CSV_PATH" | ForEach-Object {
    Import-DbaCsv -Csv $_.FullName -SqlInstance "DESKTOP-C5EUKT9" -Database "stagging" -AutoCreateTable
}
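Creating tables on the fly boils down to reading each file's header row and emitting a matching table definition. The Python sketch below only illustrates that idea and does not reproduce dbatools' own type handling; the helper name `create_table_sql`, the square-bracket quoting, and typing every column as `NVARCHAR(MAX)` are assumptions made for simplicity.

```python
import csv
import io


def create_table_sql(table_name, csv_text, delimiter=","):
    """Build a CREATE TABLE statement from the header row of CSV text.

    Hypothetical helper: every column is typed NVARCHAR(MAX) and
    identifiers are quoted with SQL Server square brackets.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)  # the first row holds the column names

    def quote(name):
        # A closing bracket inside an identifier is escaped by doubling it
        return "[" + name.strip().replace("]", "]]") + "]"

    columns = ",\n    ".join(f"{quote(col)} NVARCHAR(MAX)" for col in header)
    return f"CREATE TABLE {quote(table_name)} (\n    {columns}\n);"


print(create_table_sql("customers", "id,name,city\n1,Ann,Liege\n"))
```

Loading everything as text first and converting types later in SQL is a common staging pattern, which is why the sketch does not try to guess column types.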
ettorerizza / most_common.py
Created March 12, 2020 08:10
Most common elements in a list (with ties)
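def most_commons(List):
    """Return a new list with the most common elements
    in a list
    """
    from collections import Counter
    count = Counter(List)
    # list() is needed on Python 3, where values() returns a view without .count()
    freq_list = list(count.values())
    max_cnt = max(freq_list)
    # number of elements tied for the highest count
    total = freq_list.count(max_cnt)
    most_commons = count.most_common(total)
    return most_commons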
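`Counter.values()` returns a list on Python 2 but a view without a `.count()` method on Python 3. A self-contained Python 3 variant that returns only the tied elements (without their counts) could look like this; the name `most_common_with_ties` and the choice to drop the counts are assumptions.

```python
from collections import Counter


def most_common_with_ties(items):
    """Return every element that shares the highest frequency in items.

    Elements come back in first-seen order (Counter keeps insertion
    order on Python 3.7+). An empty input yields an empty list.
    """
    counts = Counter(items)
    if not counts:
        return []
    top = max(counts.values())
    return [elem for elem, n in counts.items() if n == top]


print(most_common_with_ties(["a", "b", "b", "c", "a"]))  # ['a', 'b']
```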
# Source : https://pythonprogramminglanguage.com/logistic-regression-spam-filter/
# dataset : https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score
df = pd.read_csv(r'C:/Users/student/Desktop/spam detect logistic regression python/SMSSpamCollection', delimiter='\t',header=None)
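`TfidfVectorizer` turns each SMS into a vector of term weights before `LogisticRegression` sees it. As a rough illustration of that weighting, here is a pure-Python sketch of scikit-learn's default TF-IDF settings: lowercased tokens of two or more word characters, raw term counts, smoothed IDF `ln((1 + n) / (1 + df)) + 1`, then L2 normalization. It is a simplified re-implementation for understanding, not scikit-learn's code; the function name `tfidf` is hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [TOKEN_RE.findall(doc.lower()) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed IDF, as if one extra document contained every term
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}

    vectors = []
    for tokens in tokenised:
        weights = {t: c * idf[t] for t, c in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else weights)
    return vectors


vecs = tfidf(["Win a FREE prize now", "Are we still on for lunch now?"])
```

Here "now" appears in both messages, so it gets a lower weight than "win", "free" or "prize", which is exactly the signal the classifier then learns from.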
ettorerizza / post_request.py
Created January 2, 2020 06:47
How to use a POST API with Jython in OpenRefine
import urllib
import urllib2
import json
url = 'https://api.monkeylearn.com/v3/classifiers/cl_pi3C7JiL/classify/'
headers = {
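Jython runs Python 2, hence `urllib2`. For comparison, the same kind of JSON POST in CPython 3 can be built with `urllib.request`; the helper below only constructs the request so it can be inspected before sending. The endpoint, the payload shape and the `Token` authorization scheme are placeholders: check the MonkeyLearn API documentation for the exact format it expects.

```python
import json
import urllib.request


def build_json_post(url, payload, token):
    """Return a urllib Request that POSTs payload as JSON.

    The 'Token <key>' authorization scheme is an assumption; adapt it
    to whatever the target API documents.
    """
    body = json.dumps(payload).encode("utf-8")  # request data must be bytes
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Token " + token,
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_json_post("https://api.example.com/classify/", {"data": ["some text"]}, "MY_API_KEY")
# To send it: urllib.request.urlopen(req) returns a response whose .read() is the reply body.
```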
import os
CHROME_PATH = r"/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome"
url = "https://www.tidytextmining.com/tfidf.html"
def url_to_pdf(url, filename):
    chrome_args = [CHROME_PATH,
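Headless Chrome can print a page straight to PDF with its `--headless` and `--print-to-pdf=<file>` switches. A minimal sketch, assuming the arguments are passed as a list to `subprocess.run` (no shell involved), in which case the spaces in the Chrome path must not be backslash-escaped. The helper `chrome_pdf_args` is hypothetical, and `--disable-gpu` is included only because older headless guides recommended it.

```python
import subprocess

# Unescaped path: no shell parses this string, so spaces need no backslashes
CHROME_PATH = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_pdf_args(url, filename, chrome_path=CHROME_PATH):
    """Build the command line that prints url to filename with headless Chrome."""
    return [
        chrome_path,
        "--headless",
        "--disable-gpu",
        "--print-to-pdf=" + filename,
        url,
    ]


def url_to_pdf(url, filename):
    # check=True raises CalledProcessError if Chrome exits with an error code
    subprocess.run(chrome_pdf_args(url, filename), check=True)


args = chrome_pdf_args("https://www.tidytextmining.com/tfidf.html", "tfidf.pdf")
```

Keeping the argument construction separate from the call makes the command easy to inspect or log without launching Chrome.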
ettorerizza / textfolder_to_csv.py
Last active November 3, 2019 13:52
Import the contents of every file in a folder into a single CSV file, where each row holds the contents of one file
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Import the contents of every file in a folder into a single CSV file,
where each row holds the contents of one file
Arguments:
-i or --inputfolder : path to the folder containing the files
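The core of such a script fits in a few lines of the standard library. This sketch assumes UTF-8 text files, writes one row per file with hypothetical `filename` and `content` columns, and skips subdirectories; the function name `folder_to_csv` is also an assumption. Write the output CSV outside the input folder, or it may be picked up as an input.

```python
import csv
import os


def folder_to_csv(input_folder, output_csv, encoding="utf-8"):
    """Write one CSV row (filename, content) per regular file in input_folder.

    Returns the number of rows written, header excluded.
    """
    rows = 0
    with open(output_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "content"])
        for name in sorted(os.listdir(input_folder)):
            path = os.path.join(input_folder, name)
            if not os.path.isfile(path):
                continue  # skip subdirectories and other non-files
            with open(path, encoding=encoding) as f:
                # csv quotes the field, so newlines inside the content survive
                writer.writerow([name, f.read()])
            rows += 1
    return rows
```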
ettorerizza / scraping_4instance.py
Created October 6, 2019 12:24
Script for scraping the names of ministerial cabinet staff from an old website with really messy HTML
#!/usr/bin/env python
#-*- coding: utf-8 -*-
"""
Script for scraping the names of ministerial cabinet staff from 4instance's old website, whose HTML is really messy
"""
# Import the external modules that will be needed
# Install them beforehand from the command line (or in the VS Code terminal)
# example: pip install bs4 ; pip install requests ; pip install pandas ; pip install regex
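Beautiful Soup is a common choice for this kind of job because it tolerates broken markup. As a dependency-free illustration of the same idea, the standard library's `html.parser` also does not require well-formed HTML. The sketch below collects the text of every occurrence of one tag; the tag name (`td`), the sample markup and the names in it are hypothetical, since the structure of the 4instance site is not shown.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every <tag> element."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.texts.append("")  # start a new entry for this element

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data


def extract(html, tag):
    parser = TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace and drop empty entries
    return [" ".join(t.split()) for t in parser.texts if t.strip()]


# Unclosed <td> and <tr> tags, as often found on old sites
messy = "<table><tr><td>Jean  Dupont<td>Marie\nMartin</table>"
names = extract(messy, "td")  # ['Jean Dupont', 'Marie Martin']
```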
#!/usr/bin/env python
import csv
from pymarc import MARCReader
from os import listdir
from re import search
# change this line to match your folder structure
SRC_DIR = '/path/to/mrc/records'
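The imports suggest the script walks `SRC_DIR` with `listdir` and picks out MARC files with `re.search`. A minimal sketch of that selection step, assuming the files use a `.mrc` extension in any letter case; the helper name `mrc_paths` is hypothetical. Each resulting path could then be opened in binary mode and handed to `pymarc.MARCReader`.

```python
import os
from os import listdir
from re import IGNORECASE, search


def mrc_paths(src_dir):
    """Return sorted full paths of the .mrc files directly inside src_dir."""
    return sorted(
        os.path.join(src_dir, name)
        for name in listdir(src_dir)
        if search(r"\.mrc$", name, IGNORECASE)  # extension must end the name
    )
```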
ettorerizza / urls_to_pdf.py
Last active April 15, 2024 16:50
Convert a list of URLs to PDF with headless Chrome (Mac)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import requests
from bs4 import BeautifulSoup
import glob
from PyPDF2 import PdfFileMerger
#Todo: debug this function
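`glob.glob` returns paths in arbitrary, filesystem-dependent order, so numbered PDFs can end up merged out of sequence; plain `sorted()` is not enough either, since it puts `page10.pdf` before `page2.pdf`. A natural-sort key is one way to fix the order before appending each file to the merger. The helper names and the numbered-filename scheme are assumptions.

```python
import glob
import os
import re


def natural_key(path):
    """Split out digit runs so 'page2' sorts before 'page10'."""
    # re.split with a capturing group alternates text and digit parts,
    # so strings are only ever compared with strings and ints with ints
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", path)]


def ordered_pdfs(folder):
    """Return the PDFs in folder in natural (human) order."""
    return sorted(glob.glob(os.path.join(folder, "*.pdf")), key=natural_key)


names = ["page10.pdf", "page2.pdf", "page1.pdf"]
print(sorted(names, key=natural_key))  # ['page1.pdf', 'page2.pdf', 'page10.pdf']
```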
ettorerizza / import_viaf.pl
Created May 2, 2019 21:26 — forked from phochste/import_viaf.pl
Match authors against VIAF using Catmandu and Linked Data Fragments
#!/usr/bin/env perl
#
# Match authors against VIAF
#
# License: http://dev.perl.org/licenses/artistic.html
#
# Author: Patrick Hochstenbach <Patrick.Hochstenbach@UGent.be>
#
# Apr 2015
$|++;