
Daniel Hnyk hnykda

hnykda / tribeca.log
Last active August 21, 2017 06:55
coinbase keeps reconnecting
tribeca_1 | n: --minUptime not set. Defaulting to: 1000ms
tribeca_1 | warn: --spinSleepTime not set. Your script will exit if it does not stay up for at least 1000ms
tribeca_1 | {"name":"tribeca:config","hostname":"ec40165fa73a","pid":16,"level":30,"msg":"TradedPair = BTC/USD","time":"2017-08-19T16:23:21.732Z","v":0}
tribeca_1 | {"name":"tribeca:config","hostname":"ec40165fa73a","pid":16,"level":30,"msg":"WebClientUsername = dan","time":"2017-08-19T16:23:21.746Z","v":0}
tribeca_1 | {"name":"tribeca:config","hostname":"ec40165fa73a","pid":16,"level":30,"msg":"WebClientPassword = seznam12","time":"2017-08-19T16:23:21.748Z","v":0}
tribeca_1 | {"name":"tribeca:main","hostname":"ec40165fa73a","pid":16,"level":30,"msg":"Requiring authentication to web client","time":"2017-08-19T16:23:21.748Z","v":0}
tribeca_1 | {"name":"tribeca:config","hostname":"ec40165fa73a","pid":16,"level":30,"msg":"WebClientListenPort = 3000","time":"2017-08-19T16:23:21.755Z","v":0}
tribe
hnykda / errors.log
Last active January 3, 2018 16:39
all errors
$ for i in $(ls -d */); do PYXECUTOR_LOG_LEVEL=ERROR pyxecutor spss2py -g $i/spss $i; done
20180103 17:32:50:spss2py.parser:ERROR - `Encoding: UTF-8` on lines 1:1 not parsed!
20180103 17:32:50:spss2py.parser:ERROR - `FILTER OFF
USE ALL` on lines 8:9 not parsed!
20180103 17:32:50:spss2py.parser:ERROR - `alter type q4(f20)
alter type q6_1_Count(f20)
alter type q6_2.3.4_Count(f20)
alter type q6_5.6(f20)
alter type h_q6(f20)
alter type q13Ire(f20)
hnykda / hdf_compression.log
Created January 3, 2018 19:15
Different compression between versions in hdf (blosc:snappy)
dan at think460s in ~/load/FinalWave422122017/tst
$ ll
total 68M
-rw-r--r-- 1 dan dan 68M Jan 3 19:54 a.hdf
(pygwi)
dan at think460s in ~/load/FinalWave422122017/tst
$ python -c "import pandas; df = pandas.read_hdf('a.hdf'); df.to_hdf('b.hdf', 'df', complib='blosc:snappy'); pandas.show_versions()"
INSTALLED VERSIONS
------------------
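The command above rewrites an existing HDF file with blosc compression via `to_hdf`'s `complib` argument. A minimal sketch of the same round-trip, assuming pandas with PyTables (`tables`) installed; plain `"blosc"` (blosclz) is used here because `"blosc:snappy"` additionally requires the snappy codec in the local blosc build:

```python
import os
import numpy as np
import pandas as pd

# a highly compressible frame: one constant float64 column, ~4 MB raw
df = pd.DataFrame({"x": np.zeros(500_000)})
df.to_hdf("a.hdf", key="df")                                # no compression
df.to_hdf("b.hdf", key="df", complib="blosc", complevel=9)  # compressed

# the compressed file should be dramatically smaller for data like this
print(os.path.getsize("a.hdf"), os.path.getsize("b.hdf"))
```

Note that `complevel` must be set to a non-zero value for `complib` to take effect.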
$ pyxecutor exec_py data/input/q4_2017/reduced/core.hdf ../lagoon/core/q4_2017/desktop_1/ data/output/q4_2017/desktop_1.hdf <<<
20180105 22:24:58:git.cmd:DEBUG - Popen(['git', 'version'], cwd=/home/dan/prac/gwi/pyxecutor, universal_newlines=False, shell=None)
20180105 22:24:58:git.cmd:DEBUG - Popen(['git', 'version'], cwd=/home/dan/prac/gwi/pyxecutor, universal_newlines=False, shell=None)
20180105 22:24:58:s3fs.core:DEBUG - Open S3 connection. Anonymous: False
20180105 22:24:58:pyxecutor.io:DEBUG - Getting data/input/q4_2017/reduced/core.hdf
20180105 22:24:58:pyxecutor.io:DEBUG - Loading file from local file system
20180105 22:25:01:pyxecutor.dsl.ops:DEBUG - Assigning only to 85284 rows
20180105 22:25:01:pyxecutor.dsl.pipeline:DEBUG - 1: Assign(condition=(q5d_1==1 | q5d_2==1 | q5d_3==1 | q5d_4==1 | q5d_5==1), target=q999_99, value=1)
20180105 22:25:01:pyxecutor.dsl.ops:INFO - 85284 rows were selected
20180105 22:25:01:pyxecut
dan at think460s in ~/prac/gwi/pyxecutor (feature/add-logging-and-skip-exceptions●)
$ SKIP_ERRORS=True INIT_DEBUG=True SAVE_INTERMEDIATE_DATASETS=True pyxecutor exec_py ../pyxecutor/data/input/q4_2017/reduced/core.hdf ../lagoon/core/q4_2017/desktop_1/ data/output/q4_2017/desktop_1.hdf
20180108 22:48:54:git.cmd:DEBUG - Popen(['git', 'version'], cwd=/home/dan/prac/gwi/pyxecutor, universal_newlines=False, shell=None)
20180108 22:48:54:git.cmd:DEBUG - Popen(['git', 'version'], cwd=/home/dan/prac/gwi/pyxecutor, universal_newlines=False, shell=None)
20180108 22:48:54:s3fs.core:DEBUG - Open S3 connection. Anonymous: False
20180108 22:48:54:pyxecutor.io:INFO - Loading '...lagoon.core.q4_2017.desktop_1.main' module ...
20180108 22:48:54:pyxecutor.main:ERROR - the 'package' argument is required to perform a relative import for '...lagoon.core.q4_2017.desktop_1.main'
Traceback (most recent call last):
File "/home/dan/prac/gwi/pyxecutor/pyxecutor/main.py", line 72, in respond
res = actions.env_exec(action, *arg
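The ERROR above is raised by `importlib` when it is handed a relative module name (leading dots, as in `...lagoon.core...`) without a `package` anchor to resolve it against. A minimal standard-library reproduction, using `json.decoder` as a stand-in module:

```python
import importlib

# a relative name (leading dot) with no `package` argument raises TypeError
try:
    importlib.import_module(".decoder")
except TypeError as e:
    print(e)  # the 'package' argument is required to perform a relative import ...

# anchored against a real package, the same relative name resolves fine
mod = importlib.import_module(".decoder", package="json")
print(mod.__name__)  # json.decoder
```

The fix in a caller like the one in the traceback is to pass the anchoring package explicitly (or use an absolute module path).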
20180118 14:03:39:pyxecutor.actions:INFO - Processing desktop_mobile_ext_4///spss/2.5 - Straightlining.sps -> desktop_mobile_ext_4//straightlining.py ...
20180118 14:03:39:spss2py.preprocessing:INFO - Preprocessed 784 lines
20180118 14:03:39:spss2py.preprocessing:INFO - Total number of lines in the output file: 705
20180118 14:03:39:spss2py.preprocessing:INFO - Total number of lines skipped: 79
20180118 14:03:39:spss2py.parser:WARNING - `q1108a_1_5,
q1108j_1_5,
q1108t_1_5,
q1108u_1_5,
q1108a2_1_5,
q1108v_1_5,
20180204 02:07:02:pyxecutor.dsl.utils:DEBUG [country_specific_screener:677] Assign(condition=((q3<18 | (s2_1==1 & q3<21) | s2_966==1 | s2_971==1 | s2_20==1) & q1017_1_1==1), target=q1017_1_1, value=0)
20180204 02:07:02:pyxecutor.dsl.ops:DEBUG Ouch. Could not use simple mask. Falling back to pandas one.
20180204 02:07:03:pyxecutor.dsl.utils:DEBUG [country_specific_screener:678] Assign(condition=((q3<18 | (s2_1==1 & q3<21) | s2_966==1 | s2_971==1 | s2_20==1) & q1017_1_2==1), target=q1017_1_2, value=0)
20180204 02:07:03:pyxecutor.dsl.ops:DEBUG Ouch. Could not use simple mask. Falling back to pandas one.
20180204 02:07:03:pyxecutor.dsl.utils:DEBUG [country_specific_screener:679] Assign(condition=((q3<18 | (s2_1==1 & q3<21) | s2_966==1 | s2_971==1 | s2_20==1) & q1017_1_3==1), target=q1017_1_3, value=0)
20180204 02:07:03:pyxecutor.dsl.ops:DEBUG Ouch. Could not use simple mask. Falling back to pandas one.
20180204 02:07:04:pyxecutor.dsl.utils:DEBUG [country_specific_screener:680] Assign(condition=((q3<18 | (s
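Each `Assign(condition, target, value)` logged above is, in pandas terms, a boolean-mask `.loc` assignment — presumably what the "falling back to pandas" path amounts to. A hedged sketch with made-up data (the actual pyxecutor DSL internals are not shown in this log):

```python
import pandas as pd

df = pd.DataFrame({"q3": [16, 25, 19], "s2_1": [1, 0, 1], "q1017_1_1": [1, 1, 1]})

# Assign(condition=((q3<18 | (s2_1==1 & q3<21)) & q1017_1_1==1),
#        target=q1017_1_1, value=0)
mask = ((df.q3 < 18) | ((df.s2_1 == 1) & (df.q3 < 21))) & (df.q1017_1_1 == 1)
df.loc[mask, "q1017_1_1"] = 0
print(df.q1017_1_1.tolist())  # [0, 1, 0]
```

Rows 1 and 3 match the screener condition (under 18, or under 21 with `s2_1` set) and are zeroed; row 2 is untouched.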
gwi-123
# raw data from qualtrics:
```
respondent_id,q2,q3,s2,panelprovider
respid-1,1,0,44,ondevice
respid-2,2,,1,usamp
respid-3,2,,3,usamp
```
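Reading the export above with pandas turns the empty `q3` cells into `NaN` and upcasts the column to float (since `NaN` is a float). A quick check, with the literal CSV inlined for self-containment:

```python
import io
import pandas as pd

raw = """respondent_id,q2,q3,s2,panelprovider
respid-1,1,0,44,ondevice
respid-2,2,,1,usamp
respid-3,2,,3,usamp
"""
df = pd.read_csv(io.StringIO(raw))
print(df.q3.isna().sum())  # 2 -- the empty cells become NaN
print(df.q3.dtype)         # float64, because NaN forces a float column
```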
hnykda / convert.py
Created February 21, 2018 21:26
Quick and dirty conversion of ZSH history into fish
# run as `python convert.py <path-to-your-zsh-history-file>`
import sys
output_file = 'fish_converted_history'
zsh_history_file = sys.argv[1]
with open(zsh_history_file, 'r', errors='ignore') as ifile:
    result = []
    for cmd in ifile:
        # zsh extended history lines look like `: <timestamp>:<duration>;<command>`;
        # fish stores history as `- cmd: ...` / `  when: <epoch>` pairs
        if cmd.startswith(': ') and ';' in cmd:
            when, _, command = cmd[2:].partition(';')
            result.append('- cmd: %s\n  when: %s\n' % (command.strip(), when.split(':')[0]))
with open(output_file, 'w') as ofile:
    ofile.writelines(result)
In [24]: %time df.loc[:, 'cc'] = np.full(df.shape[0], pd.np.nan, dtype='float16')
CPU times: user 286 µs, sys: 5.47 ms, total: 5.75 ms
Wall time: 4.41 ms
In [25]: %time df.loc[:, 'ccc'] = np.full(df.shape[0], pd.np.nan, dtype='float16')
CPU times: user 24.5 ms, sys: 19.2 ms, total: 43.7 ms
Wall time: 39.5 ms
In [26]: %time df.loc[:, 'cccc'] = np.full(df.shape[0], pd.np.nan, dtype='float16')
CPU times: user 492 ms, sys: 1.2 s, total: 1.69 s
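The timings above show each successive `df.loc[:, col] = ...` column insert getting slower (4 ms → 40 ms → 1.7 s): each insert can trigger a copy/consolidation of the frame's internal blocks, so a loop of inserts goes quadratic. A sketch of the usual workaround — build the new columns separately and `concat` once (`pd.np` in the transcript is just a deprecated alias for `numpy`; use `numpy` directly):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.zeros((100_000, 50)))

# slow pattern: one potential block consolidation per inserted column
# for i in range(100):
#     df.loc[:, f"c{i}"] = np.full(len(df), np.nan, dtype="float16")

# faster: build all new columns up front, then concatenate a single time
new = pd.DataFrame(
    {f"c{i}": np.full(len(df), np.nan, dtype="float16") for i in range(100)},
    index=df.index,
)
df = pd.concat([df, new], axis=1)
print(df.shape)  # (100000, 150)
```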