lonly197 / manage_spring_boot_app.sh
Created October 4, 2018 06:18
A script to manage a Spring Boot app
#!/bin/bash
# Set App Name
AppName="bumblebee-api"
# Set Version
Version="1.0.0"
# Set Jar Package File Name
SpringBoot="$AppName-$Version.jar"
# Set The Home Path
HomePath="/opt/bumblebee"
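The preview stops at the variable definitions. A minimal sketch of how the start/stop logic of such a script typically continues; the PID-file handling, log path, and case dispatch below are assumptions, not part of the original gist:
# --- hypothetical continuation (not from the original gist) ---
JarFile="$HomePath/$SpringBoot"
PidFile="$HomePath/$AppName.pid"

start() {
    nohup java -jar "$JarFile" > "$HomePath/$AppName.log" 2>&1 &
    echo $! > "$PidFile"
    echo "$AppName started, pid $(cat "$PidFile")"
}

stop() {
    [ -f "$PidFile" ] && kill "$(cat "$PidFile")" && rm -f "$PidFile" && echo "$AppName stopped"
}

case "$1" in
    start)   start ;;
    stop)    stop ;;
    restart) stop; start ;;
    *)       echo "Usage: $0 {start|stop|restart}" ;;
esac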
lonly197 / import_big_sql_file_to_mysql.sh
Last active October 4, 2018 06:19
Import a large SQL dump file into a MySQL database from the command line
#!/bin/sh
# store the start time in a variable
imeron=$(date)
echo "Import started: OK"
# set sql dump file
dumpfile="/home/lonly/big.sql"
ddl="set names utf8; "
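The preview cuts off after the DDL prefix. A plausible continuation (the localhost/root/mydb connection details are placeholders): append a source command for the dump, pass everything to the mysql client, and report when the import finishes:
# --- hypothetical continuation (connection details are placeholders) ---
ddl="$ddl source $dumpfile;"
mysql -h localhost -u root -p mydb -e "$ddl"
# store the end time and confirm completion
fimeron=$(date)
echo "Import completed: OK"
echo "Started:  $imeron"
echo "Finished: $fimeron"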
lonly197 / Md5Utils.java
Created August 29, 2018 16:21
Utility methods for computing MD5 sums.
import com.google.inject.Singleton;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.codec.digest.DigestUtils;
import org.glassfish.jersey.internal.util.Base64;
import java.io.*;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
/** Utility methods for computing MD5 sums. */
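The preview ends at the class Javadoc. A minimal sketch of the kind of helpers such a class exposes, built on the imported commons-codec DigestUtils (the method names are assumptions, not the gist's actual API):
@Slf4j
@Singleton
public class Md5Utils {

    /** Hex-encoded MD5 of a UTF-8 string. */
    public static String md5Hex(String input) {
        return DigestUtils.md5Hex(input);
    }

    /** Hex-encoded MD5 of a file's contents, streamed so large files never sit in memory. */
    public static String md5Hex(File file) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
            return DigestUtils.md5Hex(in);
        }
    }
}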
lonly197 / spark_udfs.scala
Created August 29, 2018 16:19
Some custom Spark UDFs
import scala.collection.mutable.WrappedArray
import scala.collection.JavaConversions._
import scala.collection.JavaConverters._
import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
import org.apache.spark.ml.linalg.{DenseVector, Matrices, Matrix, SparseVector, Vector, Vectors}
import org.apache.spark.mllib.linalg.{Vectors => OldVectors}
import org.apache.spark.sql.UDFRegistration
import streaming.common.UnicodeUtils
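The preview shows only the imports. As an illustration of the registration pattern the imports suggest (this particular UDF is an assumption, not necessarily one from the gist):
// Hypothetical example: expose an ml Vector's values as a plain array in SQL
def register(udf: UDFRegistration): Unit = {
  udf.register("vec_to_array", (v: Vector) => v.toArray)
}
// once registered: SELECT vec_to_array(features) FROM samples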
lonly197 / spark_udf_dataframe_dropDuplicateCols.scala
Created August 29, 2018 16:15
Drop duplicate columns from a DataFrame in Spark
import org.apache.spark.sql.DataFrame
import scala.annotation.tailrec

implicit class DataFrameOperations(df: DataFrame) {
  def dropDuplicateCols(rmvDF: DataFrame): DataFrame = {
    // column names that occur more than once (e.g. after a join)
    val cols = df.columns.groupBy(identity).mapValues(_.size).filter(_._2 > 1).keySet.toSeq
    @tailrec
    def deleteCol(df: DataFrame, cols: Seq[String]): DataFrame = {
      if (cols.isEmpty) df else deleteCol(df.drop(rmvDF(cols.head)), cols.tail)
    }
    deleteCol(df, cols) // closing call reconstructed; the preview is truncated here
  }
}
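A quick usage sketch (the sample frames are made up): after a join that duplicates a column, drop the copies that came from the right-hand frame:
// Hypothetical usage; requires a SparkSession and spark.implicits._
val left  = Seq((1, "a")).toDF("id", "x")
val right = Seq((1, "b")).toDF("id", "y")
left.join(right, left("id") === right("id")).dropDuplicateCols(right).show()
// result columns: id, x, y -- the duplicate right-hand "id" is gone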
lonly197 / spark_udf_concat_dataframe.scala
Created August 29, 2018 16:08
Union two DataFrames with different numbers of columns in Spark
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._
def concat(df1: DataFrame, df2: DataFrame): DataFrame = {
  val cols1 = df1.columns.toSet
  val cols2 = df2.columns.toSet
  val total = cols1 ++ cols2 // union
  // select every column in `total`, filling missing ones with nulls (reconstructed; preview truncated)
  def expr(myCols: Set[String], allCols: Set[String]) =
    allCols.toList.map {
      case x if myCols.contains(x) => col(x)
      case x => lit(null).as(x)
    }
  df1.select(expr(cols1, total): _*).union(df2.select(expr(cols2, total): _*))
}
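A quick usage sketch (the sample frames are made up); columns missing from one side come back as nulls:
// Hypothetical usage; requires a SparkSession and spark.implicits._
val a = Seq((1, "x1")).toDF("id", "x")
val b = Seq((2, 2.0)).toDF("id", "y")
concat(a, b).show() // columns: id, x, y; y is null in a's row, x is null in b's row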
// --- separate Base64 snippet (java.util.Base64) ---
// Base64 encode
val text = "This is plaintext."
val bytesEncoded = java.util.Base64.getEncoder.encode(text.getBytes())
// Base64 decode
val textDecoded = new String(java.util.Base64.getDecoder.decode(bytesEncoded))
println(textDecoded)
lonly197 / ExcelUtils.java
Created June 26, 2018 06:38
Excel parsing utility class (depends on poi-ooxml)
import com.google.common.collect.Lists;
import org.apache.commons.lang3.StringUtils;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.CellType;
import org.apache.poi.xssf.usermodel.XSSFRow;
import org.apache.poi.xssf.usermodel.XSSFSheet;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;
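The preview shows only the imports. A minimal sketch of the kind of parsing such a utility performs with poi-ooxml (the method name and the coerce-to-string cell handling are assumptions; the java.io and java.util imports it needs are not in the preview):
public static List<List<String>> readFirstSheet(InputStream in) throws IOException {
    try (XSSFWorkbook workbook = new XSSFWorkbook(in)) {
        XSSFSheet sheet = workbook.getSheetAt(0);
        List<List<String>> rows = Lists.newArrayList();
        for (int r = 0; r <= sheet.getLastRowNum(); r++) {
            XSSFRow row = sheet.getRow(r);
            if (row == null) continue; // skip physically missing rows
            List<String> cells = Lists.newArrayList();
            for (int c = 0; c < row.getLastCellNum(); c++) {
                Cell cell = row.getCell(c);
                if (cell == null) {
                    cells.add("");
                } else {
                    cell.setCellType(CellType.STRING); // read every cell as text
                    cells.add(StringUtils.trim(cell.getStringCellValue()));
                }
            }
            rows.add(cells);
        }
        return rows;
    }
}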
lonly197 / MapUtils.java
Last active June 26, 2018 06:34
A utility class that uses the @FieldMap annotation to identify fields, converting the JSON the front end sends (after @RequestBody has bound it to an object) into a Map
import com.google.common.base.CaseFormat;
import com.google.common.base.Preconditions;
import com.google.common.collect.Maps;
import java.beans.BeanInfo;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
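A minimal sketch of the bean-to-Map conversion the imports point to, using the JavaBeans Introspector (the method name, the snake_case key conversion, and the LocalDateTime formatting are assumptions; java.util.Map is not in the preview's imports):
public static Map<String, Object> toMap(Object bean) throws Exception {
    Preconditions.checkNotNull(bean, "bean must not be null");
    Map<String, Object> map = Maps.newHashMap();
    BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
    for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
        Method getter = pd.getReadMethod();
        if (getter == null) continue;
        Object value = getter.invoke(bean);
        if (value instanceof LocalDateTime) {
            // render timestamps as strings so the Map serializes cleanly
            value = ((LocalDateTime) value).format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"));
        }
        // camelCase property name -> snake_case key
        map.put(CaseFormat.LOWER_CAMEL.to(CaseFormat.LOWER_UNDERSCORE, pd.getName()), value);
    }
    return map;
}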
lonly197 / find_correlation.py
Created January 2, 2018 16:15
Correlation threshold: removes features that are highly correlated with others (i.e., whose values vary almost identically to another feature's), as they provide redundant information.
import pandas as pd
import numpy as np
def find_correlation(df, thresh=0.9):
    """
    Given a numeric pd.DataFrame, this will find highly correlated features,
    and return a list of features to remove
    params:
    - df : pd.DataFrame
    - thresh : correlation threshold; one of each pair of features whose
      correlation exceeds this value is flagged for removal
    """
    # body reconstructed below -- the gist preview is truncated here
    corr_mat = df.corr().abs()
    corr_mat.loc[:, :] = np.tril(corr_mat, k=-1)  # keep the strict lower triangle
    return [col for col in corr_mat.columns if (corr_mat[col] > thresh).any()]
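A quick usage sketch (the sample frame is made up): columns a and b are perfectly correlated, so one of the pair is flagged for removal:
features = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 0, 1, 0]})
print(find_correlation(features, thresh=0.9))  # ['a']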