Maziyar Panahi maziyarpanahi

## hf-vit-pytorch.py
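# Image classification with a pre-trained Vision Transformer (ViT) from Hugging Face
from transformers import ViTFeatureExtractor, ViTForImageClassification
from PIL import Image
import requests

# Download a sample image from the COCO 2017 validation set
url = 'http://images.cocodataset.org/val2017/000000039769.jpg'
image = Image.open(requests.get(url, stream=True).raw)

# Load the matching preprocessing config and the ViT-Base classifier (16x16 patches, 224x224 input)
# Note: newer transformers releases deprecate ViTFeatureExtractor in favor of ViTImageProcessor
feature_extractor = ViTFeatureExtractor.from_pretrained('google/vit-base-patch16-224')
model = ViTForImageClassification.from_pretrained('google/vit-base-patch16-224')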
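The checkpoint name encodes the input geometry: 224×224 images cut into non-overlapping 16×16 patches. The arithmetic below shows the resulting sequence length (one token per patch plus the [CLS] token the classifier reads) and how a pixel is normalized, assuming the checkpoint's preprocessing uses a per-channel mean and std of 0.5 (worth confirming in its `preprocessor_config.json`). `vit_sequence_length` and `normalize_pixel` are illustrative names, not transformers APIs.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> int:
    """Number of tokens ViT sees: one per square patch, plus [CLS]."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size  # 224 // 16 = 14
    return patches_per_side ** 2 + 1             # 14 * 14 = 196 patches, +1 for [CLS]


def normalize_pixel(value: int, mean: float = 0.5, std: float = 0.5) -> float:
    """Rescale 0..255 to 0..1, then standardize; mean = std = 0.5 maps it to -1..1."""
    return (value / 255 - mean) / std


print(vit_sequence_length())                      # 197
print(normalize_pixel(0), normalize_pixel(255))   # -1.0 1.0
```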

## gist:0d67a7ee858da20ce94317358d0f5a2a
Vivek Gupta Sep 2nd, 2020 at 10:02 AM
I am new to Spark NLP. I am writing a custom transformer that removes tokens of length <= 2 from the text. The transformer works, but its output does not have the right structure: it returns only an array of strings. I am struggling to get the output into the following structure:
ArrayType(
    StructType([
        StructField("annotatorType", StringType(), False),
        StructField("begin", IntegerType(), False),
        StructField("end", IntegerType(), False),
        StructField("result", StringType(), False),
        StructField("metadata", MapType(StringType(), StringType()), True)
    ])
)
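One way to keep that structure is to filter whole annotation records instead of extracting their `result` strings: dropping a short token then leaves `annotatorType`, `begin`, `end` and `metadata` of every surviving token intact. The sketch below models annotations as plain Python dicts with the five fields of the schema above (an assumption for illustration; in Spark they would be struct rows). `tokenize` and `drop_short_tokens` are hypothetical helpers, not Spark NLP APIs, and the inclusive `end` offset is assumed to match Spark NLP's convention.

```python
import re
from typing import Any, Dict, List

# One annotation = one element of the ArrayType(StructType(...)) column
Annotation = Dict[str, Any]


def tokenize(text: str) -> List[Annotation]:
    """Hypothetical whitespace tokenizer emitting annotation-shaped records."""
    return [
        {
            "annotatorType": "token",
            "begin": m.start(),
            "end": m.end() - 1,  # inclusive end offset (assumed convention)
            "result": m.group(),
            "metadata": {"sentence": "0"},
        }
        for m in re.finditer(r"\S+", text)
    ]


def drop_short_tokens(tokens: List[Annotation], min_len: int = 3) -> List[Annotation]:
    """Drop tokens shorter than min_len characters (min_len=3 removes length <= 2),
    returning each kept record whole so it still matches the schema."""
    return [t for t in tokens if len(t["result"]) >= min_len]


tokens = tokenize("I am new to Spark NLP")
kept = drop_short_tokens(tokens)
print([(t["result"], t["begin"], t["end"]) for t in kept])
# [('new', 5, 7), ('Spark', 12, 16), ('NLP', 18, 20)]
```

Adapting this into a PySpark UDF whose declared return type is the `ArrayType(StructType(...))` schema above is one option; that wiring is not shown here and would need checking against the Spark NLP version in use.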

## wikipedia-iso-country-codes.csv
English short name lower case,Alpha-2 code,Alpha-3 code,Numeric code,ISO 3166-2
Afghanistan,AF,AFG,004,ISO 3166-2:AF
Åland Islands,AX,ALA,248,ISO 3166-2:AX
Albania,AL,ALB,008,ISO 3166-2:AL
Algeria,DZ,DZA,012,ISO 3166-2:DZ
American Samoa,AS,ASM,016,ISO 3166-2:AS
Andorra,AD,AND,020,ISO 3166-2:AD
Angola,AO,AGO,024,ISO 3166-2:AO
Anguilla,AI,AIA,660,ISO 3166-2:AI
Antarctica,AQ,ATA,010,ISO 3166-2:AQ
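Because the numeric codes carry leading zeros (004, 008), reading them as integers would lose information. A minimal sketch with Python's standard `csv` module keeps every field as a string and indexes rows by alpha-2 code. It parses a few rows copied from the table above from an in-memory string; `load_codes` is an illustrative name, and for the real file one would pass `open(path, newline="", encoding="utf-8")` instead.

```python
import csv
import io
from typing import Dict, TextIO

# Header plus three rows copied from the table above
SAMPLE = """English short name lower case,Alpha-2 code,Alpha-3 code,Numeric code,ISO 3166-2
Afghanistan,AF,AFG,004,ISO 3166-2:AF
Åland Islands,AX,ALA,248,ISO 3166-2:AX
Albania,AL,ALB,008,ISO 3166-2:AL
"""


def load_codes(f: TextIO) -> Dict[str, Dict[str, str]]:
    """Index rows by alpha-2 code; csv yields str values, so "004" keeps its zeros."""
    return {row["Alpha-2 code"]: row for row in csv.DictReader(f)}


codes = load_codes(io.StringIO(SAMPLE))
print(codes["AX"]["Alpha-3 code"], codes["AF"]["Numeric code"])  # ALA 004
```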

## readme.md
Created September 4, 2019 12:32 (forked from baraldilorenzo/readme.md)
VGG-16 pre-trained model for Keras

### VGG16 model for Keras
This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.
It was obtained by directly converting the Caffe model provided by the authors.
Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
arXiv:1409.1556
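The "16 layers" count only weight layers: 13 convolutional and 3 fully connected. As a sanity check, the sketch below tallies them and their parameters for configuration D of the paper, assuming 3×3 convolutions with biases, 2×2 max-pooling that halves the spatial size, a 224×224 RGB input and 1000 ImageNet classes. `vgg16_params` is an illustrative name, not part of Keras.

```python
from typing import List, Tuple, Union

# Configuration D: numbers are conv output channels, "M" is a 2x2 max-pool
VGG16_CFG: List[Union[int, str]] = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                                    512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_params(cfg=VGG16_CFG, in_channels: int = 3, input_size: int = 224,
                 num_classes: int = 1000) -> Tuple[int, int, int]:
    """Return (weight layers, conv parameters, total parameters)."""
    conv_params, weight_layers = 0, 0
    channels, size = in_channels, input_size
    for v in cfg:
        if v == "M":
            size //= 2  # 224 -> 112 -> 56 -> 28 -> 14 -> 7
        else:
            conv_params += 3 * 3 * channels * v + v  # 3x3 kernel weights + bias
            channels = v
            weight_layers += 1
    # Fully connected head: flattened 7x7x512 -> 4096 -> 4096 -> classes
    fc_dims = [channels * size * size, 4096, 4096, num_classes]
    fc_params = sum(a * b + b for a, b in zip(fc_dims, fc_dims[1:]))
    weight_layers += len(fc_dims) - 1
    return weight_layers, conv_params, conv_params + fc_params


print(vgg16_params())  # (16, 14714688, 138357544)
```

Most of the roughly 138M parameters sit in the first fully connected layer (25088 × 4096 weights), not in the convolutions.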

  
## zeppelin-pyspark-yarn.txt
DEBUG [2019-02-18 11:27:25,397] ({YARN application state monitor} ProtobufRpcEngine.java[invoke]:249) - Call: getApplicationReport took 2ms
DEBUG [2019-02-18 11:27:25,878] ({FIFOScheduler-Worker-1} InterpreterOutputStream.java[processLine]:81) - Interpreter output:import org.apache.spark.sql.functions._
 INFO [2019-02-18 11:27:25,931] ({pool-6-thread-2} RemoteInterpreterServer.java[getStatus]:818) - job:null
DEBUG [2019-02-18 11:27:25,931] ({pool-6-thread-2} Interpreter.java[getProperty]:204) - key: zeppelin.spark.concurrentSQL, value: false
 INFO [2019-02-18 11:27:25,931] ({pool-6-thread-2} RemoteInterpreterServer.java[getStatus]:818) - job:null
 INFO [2019-02-18 11:27:25,931] ({pool-6-thread-2} RemoteInterpreterServer.java[getStatus]:818) - job:null
 INFO [2019-02-18 11:27:25,931] ({pool-6-thread-2} RemoteInterpreterServer.java[getStatus]:818) - job:org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob@f7c36f41
 INFO [2019-02-18 11:27:25,931] ({pool-6-thread-2} RemoteInterpreterServer.

## zeppelin-pyspark-yarn-client.txt
INFO [2019-02-06 22:23:16,364] ({main} RemoteInterpreterServer.java[<init>]:148) - Starting remote interpreter server on port 0, intpEventServerAddress: IP_ADDRESS:36131
INFO [2019-02-06 22:23:16,384] ({main} RemoteInterpreterServer.java[<init>]:175) - Launching ThriftServer at IP_ADDRESS:46727
INFO [2019-02-06 22:23:16,549] ({pool-6-thread-1} RemoteInterpreterServer.java[createInterpreter]:333) - Instantiate interpreter org.apache.zeppelin.spark.SparkInterpreter
INFO [2019-02-06 22:23:16,553] ({pool-6-thread-1} RemoteInterpreterServer.java[createInterpreter]:333) - Instantiate interpreter org.apache.zeppelin.spark.SparkSqlInterpreter
INFO [2019-02-06 22:23:16,556] ({pool-6-thread-1} RemoteInterpreterServer.java[createInterpreter]:333) - Instantiate interpreter org.apache.zeppelin.spark.DepInterpreter
INFO [2019-02-06 22:23:16,560] ({pool-6-thread-1} RemoteInterpreterServer.java[createInterpreter]:333) - Instantiate interpreter org.apache.zeppelin.spark.PySparkInterpreter
INFO [2019-02-06 22:23:16,563] ({pool
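Both Zeppelin interpreter logs above share one layout: level, bracketed timestamp, `({thread} Source.java[method]:line)`, then ` - ` and the message. A small parser makes them easier to filter by thread or level. The regex is inferred from these excerpts only (other log4j layouts would need a different pattern), and `parse_line` and `LogRecord` are illustrative names.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

LOG_LINE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<ts>[^\]]+)\]\s+"
    r"\(\{(?P<thread>[^}]*)\}\s+(?P<source>\S+)\)\s+-\s+(?P<msg>.*)$"
)


class LogRecord(NamedTuple):
    level: str
    timestamp: datetime
    thread: str
    source: str
    message: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one log line into fields; return None for truncated lines."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S,%f")  # ",397" -> 397000 µs
    return LogRecord(m["level"], ts, m["thread"], m["source"], m["msg"])


sample = [
    "DEBUG [2019-02-18 11:27:25,397] ({YARN application state monitor} "
    "ProtobufRpcEngine.java[invoke]:249) - Call: getApplicationReport took 2ms",
    "INFO [2019-02-06 22:23:16,563] ({pool",  # truncated line from the excerpt
]
for line in sample:
    record = parse_line(line)
    print(record.thread if record else "<unparsed>")
```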

## yarn-cluster-error.txt
org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:393)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2338)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:850)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:849)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
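The trace shows only that Spark's `ClosureCleaner.ensureSerializable` rejected the function passed to `mapPartitionsWithIndex`, not which object was at fault. A frequent cause is that the closure references an object that cannot be serialized; a common fix is to capture only plain configuration and build the heavy object inside the per-partition function. The sketch below illustrates the idea in Python, with `pickle` standing in for Spark's JVM serializer; `Model`, `is_serializable` and `filter_partition` are hypothetical.

```python
import pickle
import threading
from typing import Dict, List


class Model:
    """Hypothetical resource that cannot be serialized because it holds a lock."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self._lock = threading.Lock()  # pickle refuses lock objects

    def keep(self, value: int) -> bool:
        with self._lock:
            return value >= self.threshold


def is_serializable(obj) -> bool:
    """Rough stand-in for Spark's serializability check, using pickle."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


def filter_partition(config: Dict[str, int], rows: List[int]) -> List[int]:
    # Build the non-serializable object inside the task, once per partition,
    # so only the small config dict has to be shipped to executors.
    model = Model(config["threshold"])
    return [r for r in rows if model.keep(r)]


print(is_serializable(Model(3)))          # False: capturing the model would fail
print(is_serializable({"threshold": 3}))  # True: capturing the config is fine
print(filter_partition({"threshold": 3}, [1, 2, 3, 4]))  # [3, 4]
```

In Scala the same pattern usually means creating the object inside the `mapPartitions`/`mapPartitionsWithIndex` function, or holding it in a `@transient lazy val`; the full trace usually also includes a `java.io.NotSerializableException` line naming the offending class.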

## gist:aee182aab3e320749fbc9a81031deab3
    {
      "error": {
        "root_cause": [
          {
            "type": "mapper_parsing_exception",
            "reason": "Root mapping definition has unsupported parameters:  [namespace : {dynamic=false, properties={wiki={analyzer=keyword, type=text, index_options=docs}, name={analyzer=near_match_asciifolding, type=text, index_options=docs}}}] [archive : {dynamic=false, properties={wiki={analyzer=keyword, type=text, index_options=docs}, namespace={type=long}, title={search_analyzer=text_search, similarity=BM25, analyzer=text, position_increment_gap=10, type=text, fields={trigram={similarity=BM25, analyzer=trigram, type=text, index_options=docs}, prefix_asciifolding={search_analyzer=near_match_asciifolding, similarity=BM25, analyzer=prefix_asciifolding, type=text, index_options=docs}, plain={search_analyzer=plain_search, similarity=BM25, analyzer=plain, position_increment_gap=10, type=text}, prefix={search_analyzer=near_match, similarity=BM25, analyzer=prefix, type=text, index_options=docs}, keyword={s

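The error lists two top-level keys, `namespace` and `archive`, that look like mapping type names. Elasticsearch 7.0 removed mapping types, and a typed mapping sent to a 7.x or later cluster is typically rejected with this "Root mapping definition has unsupported parameters" message. Assuming that is the cause here, one option is to create one index per former type with a typeless mapping. The sketch below converts a typed mapping dict into such bodies; `looks_typed` and `split_by_type` are hypothetical helpers, the detection is a heuristic, and the sample mapping is trimmed from the fields shown in the error.

```python
from typing import Any, Dict


def looks_typed(mappings: Dict[str, Any]) -> bool:
    """Heuristic: in a pre-7.0 typed mapping every top-level value is itself a
    mapping body with "properties"; a typeless body has "properties" at the top."""
    return bool(mappings) and "properties" not in mappings and all(
        isinstance(body, dict) and "properties" in body for body in mappings.values()
    )


def split_by_type(mappings: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Return one typeless {"mappings": ...} request body per former type name."""
    if not looks_typed(mappings):
        raise ValueError("mapping is already typeless")
    return {type_name: {"mappings": body} for type_name, body in mappings.items()}


# Trimmed from the two types named in the error above
typed = {
    "namespace": {
        "dynamic": False,
        "properties": {
            "wiki": {"type": "text", "analyzer": "keyword", "index_options": "docs"},
        },
    },
    "archive": {
        "dynamic": False,
        "properties": {"namespace": {"type": "long"}},
    },
}

bodies = split_by_type(typed)
print(sorted(bodies))                              # ['archive', 'namespace']
print(looks_typed(bodies["archive"]["mappings"]))  # False
```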
## Spark-NLP-POS.scala
import com.johnsnowlabs.nlp.{DocumentAssembler, Finisher}
import com.johnsnowlabs.nlp.annotators.{Normalizer, Stemmer, Tokenizer}
import com.johnsnowlabs.nlp.annotator._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.util.Benchmark
import org.apache.spark.ml.feature.NGram

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{StopWordsRemover, IDF, HashingTF, CountVectorizer, Word2Vec}
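The script imports Spark ML's `HashingTF` and `IDF`. Spark's documentation defines the smoothed inverse document frequency as log((m + 1) / (df(t) + 1)), where m is the number of documents and df(t) the number containing term t, so a term present in every document gets weight 0. The pure-Python sketch below reproduces that weighting on raw terms; it skips the feature hashing `HashingTF` would apply, and the toy documents are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def idf(docs: List[List[str]]) -> Dict[str, float]:
    """Smoothed IDF as documented for Spark ML: log((m + 1) / (df(t) + 1))."""
    m = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    return {term: math.log((m + 1) / (n + 1)) for term, n in df.items()}


def tf_idf(doc: List[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Raw term counts scaled by IDF (no hashing, unlike HashingTF)."""
    return {term: count * weights.get(term, 0.0) for term, count in Counter(doc).items()}


# Toy documents, made up for illustration
docs = [["big", "sur", "retreat"], ["big", "coast"], ["central", "coast"]]
weights = idf(docs)
print(weights["big"] == math.log(4 / 3))  # True: "big" is in 2 of 3 documents
print(tf_idf(["coast", "coast", "sur"], weights))
```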

## tours.json
[
   {
      "tourBlurb" : "Big Sur is big country. The Big Sur Retreat takes you to the most majestic part of the Pacific Coast and shows you the secret trails.",
      "tourName" : "Big Sur Retreat",
      "tourPackage" : "Backpack Cal",
      "tourBullets" : "\"Accommodations at the historic Big Sur River Inn, Privately guided hikes through any of the 5 surrounding national parks, Picnic lunches prepared by the River Inn kitchen, Complimentary country breakfast, Admission to the Henry Miller Library and the Point Reyes Lighthouse \"",
      "tourRegion" : "Central Coast",
      "tourDifficulty" : "Medium",
      "tourLength" : 3,
      "tourPrice" : 750,
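The `tourBullets` value stores a list as a single string wrapped in literal double quotes, with items separated by commas. Assuming no bullet contains a comma itself (true for this record, not guaranteed in general), it can be split back into a list as sketched below; `split_bullets` is an illustrative name and the record keeps only some of the fields shown above.

```python
import json
from typing import List

# Part of the first record above (the preview is truncated, so it is rebuilt here)
RECORD = r'''{
  "tourName": "Big Sur Retreat",
  "tourBullets": "\"Accommodations at the historic Big Sur River Inn, Privately guided hikes through any of the 5 surrounding national parks, Picnic lunches prepared by the River Inn kitchen, Complimentary country breakfast, Admission to the Henry Miller Library and the Point Reyes Lighthouse \"",
  "tourLength": 3,
  "tourPrice": 750
}'''


def split_bullets(raw: str) -> List[str]:
    # Drop the embedded quote characters, then split on commas and trim each item
    inner = raw.strip().strip('"')
    return [item.strip() for item in inner.split(",") if item.strip()]


tour = json.loads(RECORD)
bullets = split_bullets(tour["tourBullets"])
print(len(bullets))  # 5
print(bullets[-1])   # Admission to the Henry Miller Library and the Point Reyes Lighthouse
```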