You could do this with a positive lookahead, which lets consecutive matches overlap:
>>> import re
>>> s = "My name is really nice. This is so awesome."
>>> m = re.findall(r'(?=(\b\w+\b \S+))', s)
>>> m
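To make the effect of the lookahead easier to see, here is a minimal sketch on a shorter string. The helper name `overlapping_word_pairs` and the sample input are assumptions for illustration and are not part of the original answer. Because `(?=...)` matches without consuming characters, the scan can start the next match inside the previous pair.

```python
import re

def overlapping_word_pairs(text):
    # Hypothetical helper: same pattern as the snippet above.
    # The capture group sits inside a zero-width lookahead, so each
    # match consumes nothing and adjacent pairs may share a word.
    return re.findall(r'(?=(\b\w+\b \S+))', text)

sample = "one two three"

# With the lookahead, every adjacent pair is reported.
print(overlapping_word_pairs(sample))         # ['one two', 'two three']

# Without it, "two" is consumed by the first match and cannot start a second.
print(re.findall(r'\b\w+\b \S+', sample))     # ['one two']
```

Note that `\S+` keeps trailing punctuation, so a word followed by a period is captured together with the period.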
# -*- coding: utf-8 -*-
# Python 2 only: itertools.izip and the HTMLParser module do not exist under these names in Python 3.
from itertools import izip, islice
from math import sqrt
from HTMLParser import HTMLParser
import MongoDBConn  # not a standard-library module
from bson import ObjectId

# Open the database connection.
dbconn = MongoDBConn.DBConn()
dbconn.connect()
# Custom plugins may be added to ~/.oh-my-zsh/custom/plugins/
# Example format: plugins=(rails git textmate ruby lighthouse)
# Add wisely, as too many plugins slow down shell startup.
plugins=(git autojump)
# User configuration
export PATH="/Users/dongjian/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
# export MANPATH="/usr/local/man:$MANPATH"
def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]
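The one-liner in `f7` relies on `set.add` returning `None`: for a first occurrence, `x in seen` is false, so `seen_add(x)` runs as a side effect and the whole `or` expression stays falsy. A more explicit sketch of the same order-preserving de-duplication follows; the name `unique_in_order` is an assumption for illustration, not part of the original snippet.

```python
def unique_in_order(seq):
    # Hypothetical, more explicit equivalent of f7:
    # keep the first occurrence of each item, preserving input order.
    seen = set()
    result = []
    for x in seq:
        if x not in seen:
            seen.add(x)
            result.append(x)
    return result

print(unique_in_order([3, 1, 3, 2, 1]))        # [3, 1, 2]

# On Python 3.7+, dicts preserve insertion order, so this also works
# for hashable items:
print(list(dict.fromkeys("abracadabra")))      # ['a', 'b', 'r', 'c', 'd']
```

Both variants require hashable items, as `f7` does.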
sqlContext.createDataFrame(rs, ["white_value"]).registerTempTable("stray_user_white_list_af_expand")
find . -size +500k | grep -v ipynb >> .gitignore
"""
This script provides reusable code for generating lead/lag time
delta features (using epoch time) for an arbitrary choice of lead/lag orders.
You can use it to generate visit time delta features for
this competition, and it should be fairly straightforward to
apply the functions to other datasets as well. Feel free to
take the output from this kernel as features; they match the original
row order of train and test. I hope it's helpful!
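The script body is not shown here, so the following is only a rough sketch of what lead/lag time deltas of a given order could look like, not the kernel's actual implementation. The function name `time_deltas`, the `(key, epoch_seconds)` row layout, and grouping by a visitor key are all assumptions. Results are written back by original index, so they stay aligned with the input order.

```python
from collections import defaultdict

def time_deltas(rows, order=1):
    """Hypothetical helper: rows is a list of (key, epoch_seconds) pairs.

    Returns (lag, lead), two lists aligned with the input order.
    lag[i] is the time since the visit `order` steps earlier for the same key;
    lead[i] is the time until the visit `order` steps later.
    Missing neighbours are reported as None.
    """
    by_key = defaultdict(list)
    for idx, (key, ts) in enumerate(rows):
        by_key[key].append((ts, idx))

    lag = [None] * len(rows)
    lead = [None] * len(rows)
    for visits in by_key.values():
        visits.sort()  # chronological order within each key
        for pos, (ts, idx) in enumerate(visits):
            if pos - order >= 0:
                lag[idx] = ts - visits[pos - order][0]
            if pos + order < len(visits):
                lead[idx] = visits[pos + order][0] - ts
    return lag, lead

rows = [("u1", 100), ("u2", 50), ("u1", 160), ("u1", 130)]
lag1, lead1 = time_deltas(rows, order=1)
print(lag1)   # [None, None, 30, 30]
print(lead1)  # [30, None, None, 30]
```

Calling the function once per order (1, 2, ...) would give one lag column and one lead column per order.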
This is what you could do: split the string on the pipe character and explode the resulting array using Spark functions.
import org.apache.spark.sql.functions._
import spark.implicits._
val df = Seq(("a1", "b1", "c1|c2|c3|c4")).toDF("A", "B", "C")
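The rest of the answer is cut off, so here is a plain-Python sketch of what "split, then explode" does to the rows. It is not Spark code; the function `split_and_explode` and the list-of-dicts row layout are assumptions for illustration. One detail worth keeping in mind on the Spark side: `split` in `org.apache.spark.sql.functions` takes a regular expression, so a literal pipe has to be escaped there, whereas Python's `str.split` treats the separator literally.

```python
def split_and_explode(rows, column, sep="|"):
    # Hypothetical helper: emit one output row per element obtained by
    # splitting rows[i][column] on sep; the other columns are copied unchanged.
    exploded = []
    for row in rows:
        for part in row[column].split(sep):
            new_row = dict(row)
            new_row[column] = part
            exploded.append(new_row)
    return exploded

rows = [{"A": "a1", "B": "b1", "C": "c1|c2|c3|c4"}]
for r in split_and_explode(rows, "C"):
    print(r)
# {'A': 'a1', 'B': 'b1', 'C': 'c1'}
# {'A': 'a1', 'B': 'b1', 'C': 'c2'}
# {'A': 'a1', 'B': 'b1', 'C': 'c3'}
# {'A': 'a1', 'B': 'b1', 'C': 'c4'}
```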