shaypal5/pdpipe_2nd_look.py

## pdpipe_2nd_look.py
>>> import pandas as pd
>>> import pdpipe as pdp
>>> df = pd.DataFrame(
...   [[23, 'Jo', 45], [19, 'Bo', 72], [15, 'Di', 12], [5, 'Jo', 0]],
...   columns=['age', 'name', 'salary'])
>>> df
   age name  salary
0   23   Jo      45
1   19   Bo      72
2   15   Di      12
3    5   Jo       0
>>> pipeline = pdp.DropDuplicates('name').Bin({'salary': [0, 20, 50]}) \
...   + pdp.SetIndex('name').ColDrop('name')
>>> pipeline
A pdpipe pipeline:
[ 0]  Drop duplicates in columns 'name'
[ 1]  Bin salary by [0, 20, 50].
[ 2]  Set indexes.
[ 3]  Drop columns 'name'
>>> pipeline(df)
FailedPreconditionError: Pipeline stage failed because not all columns 'name' were found in the input dataframe.
The above exception was the direct cause of the following exception:
...
PipelineApplicationError: Exception raised in stage [ 3] PdPipelineStage: Drop columns 'name'
>>> pipeline[0:3](df, verbose=True)
- Drop duplicates in columns 'name'
1 rows dropped.
- Bin salary by [0, 20, 50].
salary: 100%|████████████████| 1/1 [00:00<00:00, 338.39it/s]
- Set indexes.
      age salary
name
Jo     23  20-50
Bo     19    50≤
Di     15   0-20
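The failure above is caused by stage ordering. `SetIndex('name')` moves `name` out of the columns and into the index, as the output of `pipeline[0:3]` shows. The later `ColDrop('name')` therefore finds no `name` column, its precondition fails, and the error is reported against stage `[ 3]`. The sketch below is a toy model in plain Python, not pdpipe's implementation. It tracks only column names (no rows or values) to illustrate how a per-stage precondition check produces a chained error that names the failing stage, and why the `[0:3]` slice runs cleanly. The names `ToyStage`, `ToyPipeline`, `ToyPreconditionError` and `ToyPipelineError` are hypothetical.

```python
# Toy model (NOT pdpipe's implementation): a "frame" is reduced to its list of column names.


class ToyPreconditionError(Exception):
    """Raised when a stage's input lacks the columns the stage needs."""


class ToyPipelineError(Exception):
    """Raised by the pipeline, chained to the stage's precondition error."""


class ToyStage:
    def __init__(self, desc, needs, transform):
        self.desc = desc            # human-readable description of the stage
        self.needs = needs          # column names that must be present in the input
        self.transform = transform  # function: list of columns -> list of columns

    def __call__(self, columns):
        missing = [c for c in self.needs if c not in columns]
        if missing:
            raise ToyPreconditionError(
                f"not all columns {missing} were found in the input")
        return self.transform(columns)


class ToyPipeline:
    def __init__(self, stages):
        self.stages = list(stages)

    def __getitem__(self, key):
        # Slicing returns a shorter pipeline, mirroring pipeline[0:3] in the session.
        if isinstance(key, slice):
            return ToyPipeline(self.stages[key])
        return self.stages[key]

    def __call__(self, columns):
        for i, stage in enumerate(self.stages):
            try:
                columns = stage(columns)
            except ToyPreconditionError as e:
                # Chain the errors, like "direct cause of the following exception".
                raise ToyPipelineError(
                    f"Exception raised in stage [{i}]: {stage.desc}") from e
        return columns


def without_name(cols):
    return [c for c in cols if c != "name"]


pipeline = ToyPipeline([
    ToyStage("Drop duplicates in columns 'name'", ["name"], lambda cols: cols),
    ToyStage("Bin salary by [0, 20, 50].", ["salary"], lambda cols: cols),
    ToyStage("Set indexes.", ["name"], without_name),          # 'name' moves to the index
    ToyStage("Drop columns 'name'", ["name"], without_name),   # 'name' is already gone
])

cols = ["age", "name", "salary"]
print(pipeline[0:3](cols))  # prints ['age', 'salary']
try:
    pipeline(cols)
except ToyPipelineError as err:
    print(err)  # prints Exception raised in stage [3]: Drop columns 'name'
```

In this model, as in the session, either applying only the first three stages or removing the redundant drop step avoids the error, because the index-setting stage has already taken `name` out of the columns.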