Romain romainx

@romainx
romainx / python-test_import.md
Last active January 31, 2022 08:40
python - test import

Python Test Package

To check that a Python package works as expected, one solution is to try to import its main (top-level) modules. The list of all its top-level modules can be read from the [top_level.txt file][1], and each of them can then be imported.

This file is a list of the top-level module or package names provided by the project, one Python identifier per line.

Here is a test to illustrate this solution.
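The test itself is not shown here, so the following is a minimal sketch of the idea, assuming Python 3.8+ (for `importlib.metadata`) and a distribution built with setuptools, which is what writes `top_level.txt`. The helper names `top_level_modules`, `failed_imports` and `check_package` are hypothetical.

```python
from __future__ import annotations

import importlib
from importlib import metadata


def top_level_modules(dist_name: str) -> list[str]:
    """Return the names listed in a distribution's top_level.txt.

    Distributions built without setuptools may not ship this file, in which
    case an empty list is returned. Raises metadata.PackageNotFoundError if
    the distribution is not installed.
    """
    text = metadata.distribution(dist_name).read_text("top_level.txt")
    if text is None:
        return []
    return [line.strip() for line in text.splitlines() if line.strip()]


def failed_imports(module_names: list[str]) -> list[tuple[str, str]]:
    """Try to import each module; return (name, error message) for each failure."""
    failures = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # any error raised at import time counts as a failure
            failures.append((name, str(exc)))
    return failures


def check_package(dist_name: str) -> None:
    """Fail if any top-level module of the distribution cannot be imported."""
    failures = failed_imports(top_level_modules(dist_name))
    assert not failures, f"{dist_name}: cannot import {failures}"
```

In a pytest suite this could be wrapped as `def test_import(): check_package("my-package")`, where `my-package` is a placeholder for the distribution under test.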

romainx / pytest.ini
Created May 15, 2020 05:45
[pytest best practices] Some pytest best practices #python #pytest
[pytest]
# Default options:
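# - `--tb=line` selects a shorter traceback format (only one line per failure)
# - `-rfEPpxX` selects which outcomes appear in the short test summary:
#   f = failed, E = error, P = passed with output, p = passed, x = xfailed, X = xpassed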
# See https://docs.pytest.org/en/latest/usage.html#modifying-python-traceback-printing
addopts = --tb=line -rfEPpxX
log_cli = 1
log_cli_level = INFO
log_cli_format = %(asctime)s [%(levelname)8s] %(message)s (%(filename)s:%(lineno)s)
log_cli_date_format = %Y-%m-%d %H:%M:%S
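With `log_cli` enabled, pytest shows log records emitted through the standard `logging` module live while the tests run, at `log_cli_level` (here INFO) and above, formatted with `log_cli_format`. A minimal test module to see this in action; the `add` function is hypothetical and only exists to emit a log record:

```python
import logging

logger = logging.getLogger(__name__)


def add(a, b):
    # With log_cli_level = INFO, this record is printed live in the terminal
    logger.info("Adding %s and %s", a, b)
    return a + b


def test_add():
    assert add(2, 3) == 5
```

Running `pytest` next to the `pytest.ini` above should print the INFO line with a timestamp, level and `filename:lineno`, followed by the short summary selected by `-rfEPpxX`.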
romainx / lump.R
Last active April 19, 2020 13:55
[Reduce number of categories] Reduce the number of categories by lumping some of them together into an "Other" category #R
# Refs: https://forcats.tidyverse.org/reference/fct_lump.html
library(dplyr)   # tibble() and %>%
library(forcats) # fct_lump()
# A test data frame
set.seed(2)
df <- tibble(cat = factor(sample(c("a", "b", "c", "d"), 10, replace = TRUE)),
val = sample(1:10))
df %>% head()
# A tibble: 6 x 2
# cat val
# <fct> <int>
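For comparison, here is a rough pandas sketch of the same idea: keep the `n` most frequent values and replace everything else with `"Other"`. The helper name `lump` is hypothetical, it assumes a plain string column (convert a categorical column with `.astype(str)` first), and it may differ from `fct_lump()` in how ties at the cutoff are handled.

```python
import pandas as pd


def lump(values: pd.Series, n: int, other: str = "Other") -> pd.Series:
    """Keep the n most frequent values; replace all other values with `other`."""
    # value_counts() counts each distinct value; nlargest(n) keeps the n most frequent
    keep = values.value_counts().nlargest(n).index
    # where() keeps values for which the condition holds and substitutes `other` elsewhere
    return values.where(values.isin(keep), other)


s = pd.Series(["a", "a", "a", "b", "b", "c", "d"])
print(lump(s, 2).tolist())
```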
romainx / drop_na.R
Last active April 16, 2020 05:32
[Drop NA] Drop rows containing missing values (NA) in a data frame #python #R
# Refs: https://tidyr.tidyverse.org/reference/drop_na.html
library(dplyr)
library(tidyr)
df <- tibble(x = c(1, 2, NA), y = c("a", NA, "b"))
# Drop each row containing at least one NA
df %>% drop_na()
# # A tibble: 1 x 2
# x y
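Since the gist is also tagged #python, here is a sketch of the pandas counterpart, `DataFrame.dropna()`, on the same toy data. Using `None` for the missing entries is a choice made for this example; pandas treats it as missing just like `NaN`.

```python
import pandas as pd

# None is treated as a missing value by pandas (x becomes a float column with NaN)
df = pd.DataFrame({"x": [1, 2, None], "y": ["a", None, "b"]})

# Drop each row containing at least one missing value, like drop_na()
print(df.dropna())

# Only look at column x, like drop_na(x)
print(df.dropna(subset=["x"]))
```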
romainx / test_ec2.yml
Last active May 13, 2018 09:48
Sample Ansible Playbook listing EC2 t2.micro instances
---
# Source: https://gist.github.com/romainx/681f9ea6a96ebe79ea970289cae1a59f
- name: List EC2 instances
  hosts: localhost
  # To run the playbook locally
  # http://docs.ansible.com/ansible/latest/user_guide/playbooks_delegation.html#local-playbooks
  connection: local
  # There is no need to gather facts here
  gather_facts: false
  vars:
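For comparison, the same listing could be sketched in Python with boto3, assuming boto3 is installed and AWS credentials are configured. The function names `instance_ids` and `list_t2_micro` are hypothetical; the filter uses the EC2 `instance-type` filter of `describe_instances`, paginated because large accounts return results over several pages.

```python
from __future__ import annotations


def instance_ids(pages) -> list[str]:
    """Collect InstanceId values from describe_instances response pages."""
    ids = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                ids.append(instance["InstanceId"])
    return ids


def list_t2_micro(region: str) -> list[str]:
    """Return the IDs of t2.micro instances in `region` (needs boto3 and credentials)."""
    import boto3  # imported lazily so instance_ids() works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[{"Name": "instance-type", "Values": ["t2.micro"]}]
    )
    return instance_ids(pages)
```

Keeping the parsing in `instance_ids` separate from the AWS call makes it easy to test with a hand-built response.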
romainx / python_common.py
Last active October 7, 2018 16:22
Python snippets
# Compare strings case-insensitively
string1 = 'Hello'
string2 = 'hello'
if string1.lower() == string2.lower():
    print("The strings are the same (case insensitive)")
else:
    print("The strings are not the same (case insensitive)")
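For non-ASCII text, `str.casefold()` (Python 3.3+) is more aggressive than `lower()`: for example, the German `ß` folds to `ss`. A small variant using it; the helper name `equal_ignore_case` is hypothetical:

```python
def equal_ignore_case(a: str, b: str) -> bool:
    # casefold() removes all case distinctions, e.g. "ß" -> "ss"
    return a.casefold() == b.casefold()


print(equal_ignore_case("Hello", "hello"))     # True
print(equal_ignore_case("Straße", "STRASSE"))  # True
print("Straße".lower() == "STRASSE".lower())   # False: lower() keeps "ß"
```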
romainx / blinker_demo.py
Last active April 27, 2018 06:36
Blinker demo
# The only import needed
from blinker import signal
# Signals definition
number_generator_number = signal('number_generator_number', doc='Return a generated number')
number_generator_start = signal('number_generator_start', doc='The number generator has started')
number_generator_end = signal('number_generator_end', doc='The number generator has ended')
class NumberGenerator:
    """A dumb number generator"""

# Ambari API - Run all Service Checks

In order to check the status and stability of your cluster, it makes sense to run the service checks that are included in Ambari. Usually each Ambari service provides its own service check, but there might be services that don't include any service check at all. To run a service check, select the service (e.g. HDFS) in Ambari and click "Run Service Check" in the "Actions" dropdown menu.

Service checks can be started via the Ambari API, and it is also possible to start all available service checks with a single API command. To run these checks in bulk, use the same API that is used to trigger a rolling restart of DataNodes ("request_schedules"). The "request_schedules" API starts all defined commands in the specified order; it's even possible to specify a pause between the commands.

Available service checks:

| Service Name | service_name | Command |
|---|---|---|
| HDFS | HDFS | HDFS_SERVICE_CHECK |
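A sketch of how a "request_schedules" body could be built for a list of checks. The payload layout below (a `batch` holding `requests` with `order_id`, `uri` and `RequestBodyInfo`, followed by `batch_settings`) is an assumption based on how batched rolling-restart requests are usually written, so verify it against your Ambari version. The function name `service_check_schedule` and the cluster name `mycluster` are hypothetical.

```python
from __future__ import annotations

import json


def service_check_schedule(cluster: str, checks: list[tuple[str, str]],
                           pause_seconds: int = 1) -> list[dict]:
    """Build a request_schedules body running each (service_name, command) in order."""
    total = len(checks)
    requests = []
    for order_id, (service_name, command) in enumerate(checks, start=1):
        requests.append({
            "order_id": order_id,  # commands run in this order
            "type": "POST",
            "uri": f"/api/v1/clusters/{cluster}/requests",
            "RequestBodyInfo": {
                "RequestInfo": {
                    "context": f"{service_name} Service Check (batch {order_id} of {total})",
                    "command": command,
                },
                "Requests/resource_filters": [{"service_name": service_name}],
            },
        })
    return [{
        "RequestSchedule": {
            "batch": [
                {"requests": requests},
                # Pause between commands (assumed field names)
                {"batch_settings": {
                    "batch_separation_in_seconds": str(pause_seconds),
                    "task_failure_tolerance": "1",
                }},
            ]
        }
    }]


payload = service_check_schedule("mycluster", [("HDFS", "HDFS_SERVICE_CHECK")])
print(json.dumps(payload, indent=2))
```

The resulting JSON would then be POSTed to `/api/v1/clusters/<cluster>/request_schedules` with the `X-Requested-By` header that Ambari requires for write requests.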
romainx / paste.py
Last active May 5, 2018 14:38
One of the tedious things on Stack Overflow is grabbing the example data provided by users in order to reproduce the case and try to solve it. Here is how to do it efficiently in R and Python; hopefully this helps you be the first to answer!
import pandas as pd
import io
# Paste the text inside triple quotes so that the string literal can span multiple lines
zz = """Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
"""
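The step that turns the pasted text into a data frame is not shown; a common way is to wrap the string in `io.StringIO` and let `pd.read_csv` split on whitespace. A sketch, repeating the data so the block runs on its own:

```python
import io

import pandas as pd

zz = """Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
"""

# Split on any run of whitespace. The header has one field fewer than the
# data rows, so pandas uses the first column (1, 2, 3) as the row index.
df = pd.read_csv(io.StringIO(zz), sep=r"\s+")
print(df)
```

If the data is still on the clipboard, `pd.read_clipboard(sep=r"\s+")` skips the pasting step entirely.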
romainx / 0_reuse_code.js
Created June 25, 2016 17:23
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console