Darin Plutchok dplutcho

## ai-deep-dive-wordcloud-analysis.html

<style>
  @import url('https://fonts.googleapis.com/css2?family=DM+Serif+Display:ital@0;1&family=DM+Sans:wght@300;400;500&display=swap');

  * { box-sizing: border-box; margin: 0; padding: 0; }

  :root {
    --scene-bg: var(--color-background-primary);
    --surface: var(--color-background-secondary);
    --border: var(--color-border-tertiary);

## pyspark_nlp_error_11_22_2019
ipdb>  n
> <ipython-input-73-5e4223e8e1f9>(31)tokenizer_unigram()
     30
---> 31             print("Groubpy aggregat on mongoid.")
     32             df_tokes = df_tokes.groupBy('_id').agg(collect_list('finished_tokes').alias('finished_tokes'))

ipdb>  df_tokes.count()
*** py4j.protocol.Py4JJavaError: An error occurred while calling o2186.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 10 in stage 23.0 failed 1 times, most recent failure: Lost task 10.0 in stage 23.0 (TID 130, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$dfAnnotate$1: (array<array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>>) => array<struct<annotatorType:string,begin:int,end:int,result:string,metadata:map<string,string>,embeddings:array<float>>>)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
