@PietrH
PietrH / intermed.R
Created June 18, 2019 11:31
Save intermediate result in dplyr pipe
library(tidyverse)
head(test) %>%
  {. ->> intermed_result} %>% # save intermediate result
  print()
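
For comparison, a rough Python analogue of the same idea: store a value halfway through a chain of steps and keep going. This is only a sketch; the list `test` is a made-up stand-in for the data frame, and the helper `tap` and the dictionary `saved` are hypothetical names, not part of any library.

```python
# Hypothetical stand-in for the data frame `test` used above.
test = list(range(10))

# Holds intermediate results by name.
saved = {}


def tap(value, name):
    """Store `value` under `name` and pass it through unchanged."""
    saved[name] = value
    return value


# Take the first six items (similar to head()), save them, then print them.
result = tap(test[:6], "intermed_result")
print(result)
```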
PietrH / date_day_not_equal.txt
Created June 19, 2019 10:36
Check whether a date falls on one of a set of day-of-month values (here 1 or 15) in OpenRefine GREL (General Refine Expression Language)
#parse the value as a date, format it back to a string and extract the day of the month,
#compare it to the string values '1' and '15'; if it equals either, return true, otherwise return false
or((value.toDate('dd MMM yyyy').toString('d'))==toString(1),(value.toDate('dd MMM yyyy').toString('d'))==toString(15))
#to be used as a Facet in OpenRefine
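
The same check can be sketched in Python, assuming the cell values look like `15 Jun 2019` with abbreviated English month names. The function name `day_is_1_or_15` is made up for this example, and `%b` depends on the active locale.

```python
from datetime import datetime


def day_is_1_or_15(value):
    """Return True if the day of the month is 1 or 15, otherwise False."""
    # '%d %b %Y' corresponds to the 'dd MMM yyyy' pattern used in the GREL expression.
    parsed = datetime.strptime(value, "%d %b %Y")
    return parsed.day in (1, 15)


print(day_is_1_or_15("15 Jun 2019"))  # True
print(day_is_1_or_15("19 Jun 2019"))  # False
```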
PietrH / distinct_rows.R
Created June 25, 2019 11:58
Keep only distinct rows of a single column in a dataframe in R, but return all columns of these rows
library(dplyr)
distinct(DataFrame, Column_To_Filter_On, .keep_all = TRUE) %>%
  view('Dataframe distinct')
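
To make the behaviour concrete, here is a plain-Python sketch of what keeping distinct rows on one column while returning all columns amounts to: the first row seen for each value is kept. The rows and the function name `distinct_rows` are invented for illustration.

```python
def distinct_rows(rows, key):
    """Keep the first row for each distinct value of `key`, with all its columns."""
    seen = set()
    kept = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            kept.append(row)
    return kept


# Made-up example data: two rows share the same 'species' value.
rows = [
    {"species": "fox", "site": "A"},
    {"species": "fox", "site": "B"},
    {"species": "owl", "site": "A"},
]
print(distinct_rows(rows, "species"))
```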
PietrH / grepl.R
Created June 26, 2019 08:01
R grepl Function, REGEX examples
#Original source: http://www.endmemo.com/program/R/grepl.php
#grepl returns TRUE if a string contains the pattern, otherwise FALSE; if the parameter is a string vector,
#returns a logical vector (match or not for each element of the vector).
#Usage:
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)
#pattern: regular expression, or string for fixed=TRUE
#x: string, the character vector
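
A rough Python counterpart, for readers who want to try the behaviour outside R. This is a sketch, not a port: it uses Python's `re` syntax, which differs from R's default regex flavour, and the parameter names `ignore_case` and `fixed` are chosen here to mirror the R arguments.

```python
import re


def grepl(pattern, x, ignore_case=False, fixed=False):
    """Return one True/False per string in `x`: does it contain `pattern`?"""
    if isinstance(x, str):
        x = [x]  # treat a single string as a one-element vector
    if fixed:
        pattern = re.escape(pattern)  # match the pattern literally
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [regex.search(s) is not None for s in x]


print(grepl("^a", ["apple", "banana", "Avocado"]))  # [True, False, False]
print(grepl("^a", ["apple", "banana", "Avocado"], ignore_case=True))  # [True, False, True]
```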
PietrH / check_string_ending.txt
Created June 26, 2019 09:13
Checks if a string ends with a certain character, and if this is the case, omits that character in OpenRefine GREL
if(value.endsWith(';'),substring(value,0,-1),value)
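
The same logic as a Python sketch (the function name is made up for this example). Only one trailing `;` is removed, and values without it pass through unchanged.

```python
def drop_trailing_semicolon(value):
    """Remove a single trailing ';' if present, otherwise return the value as is."""
    if value.endswith(";"):
        return value[:-1]
    return value


print(drop_trailing_semicolon("a;b;"))  # a;b
print(drop_trailing_semicolon("a;b"))   # a;b
```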
PietrH / xpath_basic_syntax.md
Last active July 4, 2019 10:23
XPath basic syntax
| Expression | Description |
| --- | --- |
| `nodename` | Selects all nodes with the name "nodename" |
| `/` | Selects from the root node |
| `//` | Selects nodes in the document from the current node that match the selection no matter where they are |
| `.` | Selects the current node |
| `..` | Selects the parent of the current node |
| `@` | Selects attributes |
| `[]` | Condition on a selection |
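
A few of these expressions can be tried with Python's standard library. This is a minimal sketch on an invented XML document; note that `xml.etree.ElementTree` only implements a subset of XPath, so the paths below are written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Invented example document.
xml_text = """
<library>
  <shelf id="s1">
    <book lang="en"><title>First</title></book>
    <book lang="nl"><title>Tweede</title></book>
  </shelf>
  <book lang="en"><title>Loose</title></book>
</library>
"""
root = ET.fromstring(xml_text)

# nodename: "book" elements that are direct children of the current node
direct_books = root.findall("book")

# . and //: all "book" elements below the current node, at any depth
all_books = root.findall(".//book")

# [] with @: only the books whose lang attribute equals 'en'
english = root.findall(".//book[@lang='en']")

# ..: the parent of every "title" element
parents = root.findall(".//title/..")

print(len(direct_books), len(all_books))
print([b.find("title").text for b in english])
print([p.tag for p in parents])
```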
PietrH / xpath_wildcards.md
Created July 4, 2019 10:31
Wildcards for XPath queries
| Wildcard | Description |
| --- | --- |
| `*` | matches any element node |
| `@*` | matches any attribute node |
| `node()` | matches any node |
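
The `*` wildcard can be tried with Python's standard library on an invented snippet. This is only a sketch: as far as I know, `xml.etree.ElementTree` does not handle `@*` or `node()` in its path subset, so the attributes are read from the element's `attrib` dictionary instead.

```python
import xml.etree.ElementTree as ET

# Invented example document.
root = ET.fromstring("<a><b x='1'/><c><d/></c></a>")

# *: any element that is a direct child of the current node
children = [e.tag for e in root.findall("*")]

# .//*: any element below the current node, at any depth
descendants = [e.tag for e in root.findall(".//*")]

# Stand-in for @*: all attributes of one element as a dictionary
attributes = root.find("b").attrib

print(children)     # ['b', 'c']
print(descendants)  # ['b', 'c', 'd']
print(attributes)   # {'x': '1'}
```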
PietrH / number_from_string_re.py
Created July 5, 2019 12:12
Extract a number from a string in Python using regular expressions
import re  # regular expressions in Python

def number_from_string(string):
    # raw string, so that \d reaches the regex engine unchanged
    return re.findall(r"\d+", string)
#The function will return a list of all digit sequences in the string (as strings, not integers)
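
A short usage sketch with a made-up input string. The matches come back as strings, so they need converting before doing arithmetic with them.

```python
import re


def number_from_string(string):
    """Return every run of digits in `string` as a list of strings."""
    return re.findall(r"\d+", string)


matches = number_from_string("Room 12, floor 3")
print(matches)  # ['12', '3']

# Convert the digit strings to integers when numbers are needed.
numbers = [int(m) for m in matches]
print(numbers)  # [12, 3]
```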
PietrH / ChangeExt.ps1
Created July 8, 2019 09:59
Bulk change the extension of files in a folder in PowerShell
Get-ChildItem -Path C:\Demo -Filter *.txt | Rename-Item -NewName {[System.IO.Path]::ChangeExtension($_.Name, ".old")}
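
For systems without PowerShell, roughly the same rename can be sketched with Python's `pathlib`. The function name `change_extension` and its defaults are choices made for this example; like the one-liner above, it only looks at files directly inside the folder.

```python
from pathlib import Path


def change_extension(folder, old_ext=".txt", new_ext=".old"):
    """Rename every file in `folder` ending in `old_ext` so it ends in `new_ext`."""
    renamed = []
    # sorted() collects the matches first, so renaming does not disturb the iteration.
    for path in sorted(Path(folder).glob(f"*{old_ext}")):
        if path.is_file():
            target = path.with_suffix(new_ext)
            path.rename(target)
            renamed.append(target)
    return renamed
```

Calling `change_extension(r"C:\Demo")` would mirror the command above; the function returns the new paths so the result can be checked.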
PietrH / count_nchar.R
Created July 15, 2019 06:59
Frequency table of character length
library(dplyr)
#we want to produce a frequency table of the length (nchar) of a column 'column' of our dataframe 'dataset'
mutate(dataset,nchar=nchar(column)) %>% count(nchar)
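
The same frequency table can be sketched in plain Python with `collections.Counter`, here on a made-up list of strings standing in for the column. The function name is invented for this example.

```python
from collections import Counter


def length_frequencies(values):
    """Return (length, count) pairs, sorted by length."""
    counts = Counter(len(v) for v in values)
    return sorted(counts.items())


print(length_frequencies(["ab", "cd", "efg", "h"]))  # [(1, 1), (2, 2), (3, 1)]
```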