Mattias Östmar mattiasostmar
{
"metadata": {
"name": "Playing with MySql"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
@mattiasostmar
mattiasostmar / Pandas CSV
Last active December 18, 2015 07:19
Learning pandas in an IPython Notebook
{
"metadata": {
"name": "CSV read with Pandas"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
import re
import sys
_, infile, outfile = sys.argv
s_pat_row = r'''
"([^"]+)" # match column; this is group 1
\s*\t\s* # match separating tab and any optional white space
([^\t]+) # match a string of non-tab chars; this is group 2
\s*\t\s* # match separating tab and any optional white space
mattiasostmar / chunker.py
Last active September 16, 2018 13:53
A slight modification to get the code from http://bdurblg.blogspot.se/2011/06/python-split-any-file-binary-to.html working (darn, need to get those indentations right...)
# define the function to split the file into smaller chunks
def splitFile(inputFile,chunkSize):
    #read the contents of the file
    f = open(inputFile, 'rb')
    data = f.read()
    f.close()
    # get the length of data, ie size of the input file in bytes
    bytes = len(data)
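
The preview above is cut off before the chunks are written, so the following is only a minimal sketch of the same idea, not the gist's actual code. It assumes a hypothetical function name `split_file` and a hypothetical `<name>.chunkNNNN` naming scheme for the output files. Unlike the preview, which reads the whole file into memory, this version reads one chunk at a time.

```python
from pathlib import Path


def split_file(input_path, chunk_size, out_dir=None):
    """Split input_path into numbered files of at most chunk_size bytes.

    Hypothetical helper for illustration; the name and the output naming
    scheme are assumptions, not taken from the original gist.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive number of bytes")

    src = Path(input_path)
    # Default to writing the chunks next to the input file.
    dest = Path(out_dir) if out_dir is not None else src.parent
    dest.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Read at most chunk_size bytes; an empty result means end of file.
            data = f.read(chunk_size)
            if not data:
                break
            chunk_path = dest / f"{src.name}.chunk{index:04d}"
            chunk_path.write_bytes(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Concatenating the returned files in order should reproduce the original file byte for byte, since each chunk is written exactly as it was read.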
mattiasostmar / Vincent ipython notebook examples
Created February 3, 2014 23:13
Vincent ipython notebook examples
{
"metadata": {
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
import regex
import logging
import gensim
from gensim import corpora, models
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
class MySentences(object):
    def __init__(self, fname):
        self.fname = fname
mattiasostmar / tree_versions_code_and_traceback.md
Last active January 8, 2016 21:36
JSON-STAGGER UnicodeDecodeError

Working:

import requests
text = "Fördomen har alltid sin rot i vardagslivet - Olof Palme 🙈🙈🙌🙆👪👏yüá🎧ÖÅÄê"
r = requests.post("http://json-tagger.herokuapp.com/tag",data=dict(data=text))
r.json()

{'entities': [{'token_ids': ['tok:0:8', 'tok:0:9', 'tok:0:10'], 'word_form': 'Olof Palme 🙈🙈🙌🙆👪👏yüá🎧ÖÅÄê'}], 'sentences': [[{'morph_feat': 'UTR|SIN|DEF|NOM',

"0307.n.0003": {
"cats": null,
"meta_cats": null,
"filename": "Utgrävningar_i_Teotihuacan_(1932)_-_SMVK_-_0307.n.0003",
"info": "{{photograph\n|photographer = {{creator:Sigvald_Linné}}\n|title = \n|description = {{sv|Chichén Itzá, Dzitas. Utgrävningar i Teotihuacan (1932).}}\n{{en|Images from the 1932 Sigvald Linné archeological expedition at Teotihuacán, Mexico.}}\n|depicted place = Chichén Itzá, Dzitas\n|date = 1932\n|medium = \n|dimensions = \n|institution = {{Institution:Statens museer för världskultur}}\n|department = [[:d:Q1371375|Etnografiska muséet]]\n|references = \n|object history = \n|exhibition history = \n|credit line = \n|inscriptions = \n|notes = \n|accession number = {{SMVK-EM-link|1=foto|2=2803890|3=0307.n.0003}}\n|source = Original file name, as received from SMVK: <br /> '''0307.n.0003.tif'''\n{{SMVK_cooperation_project|COH|museu
/Users/mos/anaconda/bin/python /Users/mos/PycharmProjects/Medelhavsmuseet_2016-08/check_pages_without_images_smvk-em.py
--- [[commons:File:Från utgrävningarna vid Xolalpan - SMVK - 0307.a.0154.tif]]
{{speedydelete|broken file upload}}
{{photograph
|photographer = {{creator:Sigvald_Linné}}
|title =
|description = {{sv|Från utgrävningarna vid Xolalpan. Teotihuacan. Utgrävningar i Teotihuacan (1932).}}
{{en|Images from the 1932 Sigvald Linné archeological expedition at Teotihuacán, Mexico.}}
|depicted place = Q172613