Marek Wiewiórka (mwiewior)
@mwiewior
mwiewior / cache-oblivious.md
Created February 17, 2024 15:03 — forked from debasishg/cache-oblivious.md
Papers related to cache oblivious data structures

Cache-Oblivious and Cache-Aware Data Structures and Algorithms

  1. Cache-Oblivious Algorithms and Data Structures - Erik Demaine (One of the earliest papers on cache-oblivious data structures and algorithms; it introduces the cache-oblivious model in detail and examines the static and dynamic cache-oblivious data structures built between 2000 and 2003; see the layout sketch after this list)

  2. Cache Oblivious B-Trees - Bender, Demaine, Farach-Colton (This paper presents two dynamic search trees attaining near-optimal performance on any hierarchical memory. One of the fundamental papers in the field; both search trees discussed match the optimal search bound of Θ(1 + log_{B+1} N) memory transfers)

  3. Cache Oblivious Search Trees via Binary Trees of Small Height - Brodal, Fagerberg, Jacob (The data structure discussed in this paper works on the version of [2] but avoids the use o…
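
To make the static-layout idea above concrete, here is a small Scala sketch of my own (not code from any of the papers listed) of the van Emde Boas layout that static cache-oblivious search trees are built on: a complete binary search tree of height h is cut at roughly half its height, the top half is stored first, then each bottom subtree, recursively, so that a search touches O(log_{B+1} N) memory blocks for every block size B simultaneously.

// Illustrative sketch only: van Emde Boas (recursive) layout of a complete binary search tree.
// Nodes are identified by their implicit heap index (root = 1, children 2i and 2i+1);
// layout returns those indices in the order they would be stored in memory.
object VanEmdeBoasLayout {
  def layout(root: Int, height: Int): Vector[Int] =
    if (height == 1) Vector(root)
    else {
      val topH = height / 2            // height of the top recursive subtree
      val botH = height - topH         // height of each bottom recursive subtree
      val top  = layout(root, topH)    // lay out the top subtree first
      // the bottom subtrees are rooted at the children of the top subtree's lowest level
      val bottomRoots = (root << topH) until ((root << topH) + (1 << topH))
      top ++ bottomRoots.flatMap(layout(_, botH))
    }
}

// Example: a tree of height 4 (15 nodes) is laid out as
// 1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15
// println(VanEmdeBoasLayout.layout(1, 4).mkString(", "))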

resource "google_project_iam_member" "tbd-editor-member" {
#checkov:skip=CKV_GCP_49: "Ensure no roles that enable to impersonate and manage all service accounts are used at a project level"
#checkov:skip=CKV_GCP_117: "Ensure basic roles are not used at project level."
# This is only used for workshops!!!
project = google_project.tbd_project.project_id
role = "roles/owner"
member = "serviceAccount:${google_service_account.tbd-terraform.email}"
}
@mwiewior
mwiewior / spark-amm.sh
Created July 9, 2020 15:45 — forked from ottomata/spark-amm.sh
spark + ammonite
#!/usr/bin/env bash
export SPARK_HOME="${SPARK_HOME:-/usr/lib/spark2}"
export SPARK_CONF_DIR="${SPARK_CONF_DIR:-"${SPARK_HOME}"/conf}"
source ${SPARK_HOME}/bin/load-spark-env.sh
export HIVE_CONF_DIR=${SPARK_CONF_DIR}
export HADOOP_CONF_DIR=/etc/hadoop/conf
AMMONITE=~/bin/amm # This is amm binary release 2.11-1.6.7
@mwiewior
mwiewior / README.md
Created July 1, 2020 12:23 — forked from bradfordcp/README.md
Setting up Apache Spark to use Apache Shiro for authentication of the Spark Master dashboard.

Securing Apache Spark with Apache Shiro

  1. Download shiro-core-1.2.5.jar from the Apache Shiro Downloads page
  2. Download shiro-web-1.2.5.jar from the Apache Shiro Downloads page
  3. Note the location of the JAR files and shiro.ini. I placed them in the root of my Spark download
  4. Update the spark-env.sh file with the Shiro JARs and add an entry for the path where shiro.ini resides
  5. Start the Spark master: sbin/start-master.sh
  6. Navigate to the Spark master dashboard
  7. Authenticate with the credentials in shiro.ini

Note: this was developed and tested with Apache Spark 1.4.1, but it should work with newer versions as well.

@mwiewior
mwiewior / carbon.scala
Created July 31, 2019 17:12 — forked from agaszmurlo/carbon.scala
Carbon data varia
// ./spark-shell -v --master yarn-client --driver-memory 1G --executor-memory 2G --executor-cores 2 \
// --jars /tmp/apache-carbondata-1.6.0-SNAPSHOT-bin-spark2.3.2-hadoop2.7.2.jar \
// --conf spark.hadoop.hive.metastore.uris=thrift://cdh01.cl.ii.pw.edu.pl:9083 \
// --conf spark.hadoop.yarn.timeline-service.enabled=false \
// --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
// --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
// --conf spark.hadoop.metastore.catalog.default=hive
import org.apache.spark.sql.SparkSession
// spark-shell -v --master=local[$cores] --driver-memory=12g \
//   --conf "spark.sql.catalogImplementation=in-memory" \
//   --packages org.biodatageeks:bdg-sequila_2.11:0.5.3-spark-2.4.0-SNAPSHOT \
//   --repositories http://repo.hortonworks.com/content/repositories/releases/,http://zsibio.ii.pw.edu.pl/nexus/repository/maven-snapshots/
import org.apache.spark.sql.SequilaSession
import org.biodatageeks.utils.{SequilaRegister, UDFRegister,BDGInternalParams}
val ss = SequilaSession(spark)
SequilaRegister.register(ss)
ss.sqlContext.setConf("spark.biodatageeks.bam.useGKLInflate","true")
ss.sqlContext.setConf("spark.biodatageeks.bam.useSparkBAM","false")
@mwiewior
mwiewior / scala-sbt-project-structure.sh
Created June 1, 2018 16:15 — forked from WarFox/scala-sbt-project-structure.sh
Script to create Scala SBT project directory structure
#!/usr/bin/env bash
touch build.sbt README.md
mkdir -p project && touch project/plugins.sbt
mkdir -p src/{main,test}/{scala,resources,java}
@mwiewior
mwiewior / map-pushdow.sc
Created April 20, 2018 19:15 — forked from joao-parana/map-pushdow.sc
Using CatalystExtension Points in Spark
// This script is meant to be run in Ammonite.
// Create the file catalyst_04.sc with this content.
// Inside the Ammonite REPL shell, invoke it like this:
// import $file.catalyst_04, catalyst_04._
//
// But first run the three commands below:
// import coursier.MavenRepository
// interp.repositories() ++= Seq(MavenRepository("file:/Users/admin/.m2/repository"))
// import $ivy.`org.apache.spark::spark-sql:2.3.0`
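
For context, the extension point such a script exercises is Spark's experimental hook. The rule below is a minimal sketch of my own (not the gist's code), assuming the spark-sql 2.3.0 dependency imported above; it shows the general shape of injecting a custom rule rather than the gist's actual map-pushdown logic.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.expressions.{Literal, Multiply}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Illustrative optimizer rule: rewrite `x * 1.0` into `x`.
object SimplifyMultiplyByOne extends Rule[LogicalPlan] {
  def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
    case Multiply(expr, Literal(1.0, _)) => expr
  }
}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
// Register the rule through the experimental extension point.
spark.experimental.extraOptimizations = Seq(SimplifyMultiplyByOne)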
@mwiewior
mwiewior / slack.sh
Created March 4, 2018 18:14 — forked from andkirby/slack.sh
Shell/Bash script for sending Slack messages.
#!/usr/bin/env bash
####################################################################################
# Slack Bash console script for sending messages.
####################################################################################
# Installation
# $ curl -s https://gist.githubusercontent.com/andkirby/67a774513215d7ba06384186dd441d9e/raw --output /usr/bin/slack
# $ chmod +x /usr/bin/slack
####################################################################################
# USAGE
# Send message to slack channel/user
@mwiewior
mwiewior / extraStrategies.md
Created October 13, 2017 08:54 — forked from marmbrus/extraStrategies.md
Example of injecting custom planning strategies into Spark SQL.

First, a disclaimer: this is an experimental API that exposes internals that are likely to change between Spark releases. As a result, most data sources should be written against the stable public API in org.apache.spark.sql.sources. We expose this mostly to get feedback on what optimizations we should add to the stable API in order to get the best performance out of data sources.

We'll start with a simple artificial data source that just returns ranges of consecutive integers.

/** A data source that returns ranges of consecutive integers in a column named `a`. */
case class SimpleRelation(
    start: Int, 
    end: Int)(
    @transient val sqlContext: SQLContext)
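
The preview cuts off mid-definition. As a hedged sketch of where it is heading (assumed from the surrounding text, not the write-up's exact code), such a relation typically extends BaseRelation with TableScan, and a custom planning strategy for it is then injected through the experimental hook the introduction refers to.

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, TableScan}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Assumed completion: a relation whose single column `a` holds the integers start..end.
case class SimpleRelation(
    start: Int,
    end: Int)(
    @transient val sqlContext: SQLContext)
  extends BaseRelation with TableScan {
  override def schema: StructType = StructType(StructField("a", IntegerType) :: Nil)
  override def buildScan() = sqlContext.sparkContext.parallelize(start to end).map(Row(_))
}

// A custom strategy for planning scans over SimpleRelation would then be registered via:
//   sqlContext.experimental.extraStrategies = MyRangeScanStrategy :: Nil   // MyRangeScanStrategy is hypothetical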