Julio Martinez liopic

@liopic
liopic / java_and_scala_installation.md
Created October 29, 2021 18:32
Java and Scala installation

Installing Java 8 or 11 on Linux

sudo apt-get install openjdk-8-jdk
sudo apt-get install openjdk-11-jdk
update-java-alternatives --list
sudo update-java-alternatives --set /path/to/java-8
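After switching alternatives, it can help to confirm which major version is actually active. The following is a minimal sketch, not part of the original gist: the helper names `parse_java_major` and `active_java_major` are hypothetical, and it assumes the usual `java -version` banner, where Java 8 reports itself as `1.8.x` and Java 11 as `11.x`.

```python
import re
import subprocess


def parse_java_major(version_output: str) -> int:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # Java 8 and earlier use the legacy 1.x scheme, e.g. "1.8.0_292".
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def active_java_major() -> int:
    """Run `java -version`; the JVM prints its version banner to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_major(result.stderr)
```

Calling `active_java_major()` requires `java` on the `PATH`; `parse_java_major` can be checked on its own with a sample banner string.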

It is also recommended to follow the instructions on installing sbt on Linux.

@liopic
liopic / describe_table.sql
Last active June 10, 2021 09:10
Ways to display a table's columns in different databases
-- redshift/postgres
select column_name,data_type
from information_schema.columns
where table_name = 'table_name';
-- databricks
describe table tablename;
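For quick local experiments, the same kind of lookup can be sketched against SQLite, which is not one of the databases in the original snippet. This assumes Python's built-in `sqlite3` module and SQLite's `pragma_table_info` table-valued function; `describe_table` and the `events` table are hypothetical names used only for illustration.

```python
import sqlite3


def describe_table(conn, table_name):
    """Return (column_name, data_type) pairs for one table, in column order."""
    rows = conn.execute(
        "SELECT name, type FROM pragma_table_info(?) ORDER BY cid",
        (table_name,),
    ).fetchall()
    return [(name, data_type) for name, data_type in rows]


# Usage with a throwaway in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, name TEXT)")
columns = describe_table(conn, "events")
print(columns)
```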
@liopic
liopic / tensorflow_notes.md
Last active July 28, 2020 18:35
Notes for Coursera's TensorFlow in Practice Specialization

Introduction to TensorFlow

Basic code

import tensorflow as tf
from tensorflow import keras

# Fully connected classifier for 28x28 inputs with 10 output classes
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(64, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
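To get a feel for the size of this network, the trainable parameters of each Dense layer can be counted by hand: one weight per input-unit pair plus one bias per unit. The sketch below uses plain Python rather than TensorFlow, and `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


flatten_outputs = 28 * 28  # Flatten turns each 28x28 input into 784 values
layer_units = [128, 64, 10]  # the three Dense layers, in order

counts = []
n_inputs = flatten_outputs
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units  # each layer's output feeds the next layer

print(counts)       # [100480, 8256, 650]
print(sum(counts))  # 109386
```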
@liopic
liopic / notes.md
Last active June 13, 2020 10:16
DataEngConf 2018

DataEngConf 2018 Barcelona

Data Lake

  • Health metrics (performance, bottlenecks)
  • Metadata: table sizes; numeric columns (min, max, ...); string columns (distinct, most common)
  • Costs: which models are expensive, why real time?
  • Dashboard for monitoring
  • Query annotations
  • Data validation + anomalies
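The metadata bullet above can be sketched as a small column profiler: min and max for numeric columns, distinct count and most common value for string columns. This is an illustrative sketch in plain Python, not something shown at the conference; `profile_column` and its output keys are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Summarise one column: null count plus numeric or string statistics."""
    present = [v for v in values if v is not None]
    profile = {"count": len(values), "nulls": len(values) - len(present)}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: track the value range.
        profile.update(min=min(present), max=max(present))
    else:
        # String (or mixed) column: track cardinality and the top value.
        counts = Counter(str(v) for v in present)
        profile.update(distinct=len(counts), most_common=counts.most_common(1))
    return profile


print(profile_column([3, 1, None, 7]))
print(profile_column(["a", "b", "a"]))
```

Storing such profiles per table and per run gives a baseline against which later validation and anomaly checks can compare.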
@liopic
liopic / notes.md
Created October 14, 2019 21:14
PyCon DE & PyData Berlin 2019

PyCon DE & PyData Berlin, notes

Lots and lots of people!

Algo.Rules - Ethics in code

  • List of rules

Airflow for beginners

  • Operator (worker), DAG (instructions), Task (job), Connection (credentials), Hooks (common interfaces to external services, Slack Hook), Variables (envs), XComs (small messages between Tasks)
  • github.com/karpenkovarya/airflow_for_beginners
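The DAG, Task, and XCom ideas above can be illustrated without Airflow itself. The sketch below uses only the standard library's `graphlib` (Python 3.9+) to order tasks by their dependencies, with a plain dict standing in for XComs. It is not Airflow's API, and the task names and messages are made up for illustration.

```python
from graphlib import TopologicalSorter

# DAG: task name -> set of upstream tasks that must finish first.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "notify": {"transform"},
}

xcom = {}  # stand-in for XComs: small messages passed between tasks


def extract():
    xcom["rows"] = 3


def transform():
    xcom["summary"] = f"{xcom['rows']} rows processed"


def notify():
    # In Airflow, a hook (for example a Slack hook) would send this out.
    xcom["message"] = f"Slack: {xcom['summary']}"


tasks = {"extract": extract, "transform": transform, "notify": notify}

# Run every task after all of its upstream tasks.
run_order = list(TopologicalSorter(dependencies).static_order())
for name in run_order:
    tasks[name]()

print(run_order)
print(xcom["message"])
```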
@liopic
liopic / notes.md
Last active October 14, 2019 20:32
PyConEs 2019

PyconES, notes

Lots of people!

Keynote Safia

  • You are the average of the 5 people you spend the most time with
  • Conway's law (organization structure)
  • Like code ideas

Mom, I want to be a data artist

  • Copying and following inspiring people
@liopic
liopic / diputados_13.csv
Last active July 31, 2019 21:30
Deputies of the 13th legislature
id,nombre,grupo,twitter
1,"Pastor Julián, Ana María",PP,https://twitter.com/anapastorjulian
2,"Canales Duque, Mariana de Gracia",PSOE,https://twitter.com/Graciacanales3
3,"Sahuquillo García, Luis Carlos",PSOE,https://twitter.com/lcsahuquillo
4,"Pita Cárdenes, María del Carmen",UP,https://twitter.com/meripita44
5,"Morlà Florit, Pau",PSOE,https://twitter.com/pmorla68
6,"Pons Sampietro, Pere Joan",PSOE,https://twitter.com/perejoanpons
7,"Movellán Lombilla, Diego",PP,https://twitter.com/DiegoMovellan
8,"Carrillo de los Reyes, Beatriz Micaela",PSOE,https://twitter.com/beamcarrillo
9,"Píriz Maya, Víctor Valentín",PP,https://twitter.com/vicpiriz1975
@liopic
liopic / deterministic.py
Last active July 25, 2020 19:09
Keep Keras model deterministic
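# tf 1.x: seed NumPy and TensorFlow before building the model
from numpy.random import seed
seed(42)
from tensorflow import set_random_seed  # only available in TF 1.x
set_random_seed(42)

# tf 2.x: reset Keras state, then seed TensorFlow and NumPy
import numpy as np
import tensorflow as tf
tf.keras.backend.clear_session()
tf.random.set_seed(51)
np.random.seed(51)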
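The idea behind these calls is that a fixed seed makes a pseudo-random generator replay the same sequence. A minimal sketch of that principle, using only the standard library's `random` module instead of TensorFlow or NumPy (the `sample` helper is hypothetical):

```python
import random


def sample(seed_value, n=3):
    """Draw n pseudo-random floats from a generator seeded with seed_value."""
    rng = random.Random(seed_value)
    return [rng.random() for _ in range(n)]


first_run = sample(42)
second_run = sample(42)
other_seed = sample(51)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # different seed, different sequence
```

Seeding alone may not be enough for full reproducibility in a real training run, since other sources of nondeterminism (for example some GPU operations) are outside the seed's control.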
@liopic
liopic / notes.md
Last active July 14, 2018 06:31
PyDataBerlin 2018

PyData Berlin

FRIDAY

  • Text analysis
    • libraries
      • nltk - not as well maintained, old academic code
      • spacy - has language models
      • gensim - includes a corpus (Lee Background)
  • NLP
@liopic
liopic / notes.md
Last active May 21, 2017 17:09
PyDataBcn 2017

keynote Travis Oliphant

  • NumFOCUS (NGO)
  • ML -> Python
  • Anaconda makes ML magic available to mortals
    • modeling, predicting, classification, visualization
    • feature labeling, data cleaning, data extraction, scaling, deployment
  • Spyder IDE
  • recommends: scikit-learn, TF, Keras, XGBoost
  • intros to numpy, scipy (stats helpers), matplotlib
  • numba (bigger nodes, scale up) vs dask (more nodes, scale out), blaze (best of both: GPU cluster)