Sujith Jay Nair (sujithjay)
@sujithjay
sujithjay / Google Protobuf MacOS Installation for Building the Hadoop Project
Created April 2, 2020 09:13
Google Protobuf MacOS Installation for Building the Hadoop Project
# Download and unpack Protobuf 2.5.0 (the release the Hadoop build expects)
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.bz2
tar xvf protobuf-2.5.0.tar.bz2
cd protobuf-2.5.0

# Configure with clang and libc++ so the build works with the macOS toolchain
./configure CC=clang CXX=clang++ CXXFLAGS='-std=c++11 -stdlib=libc++ -O3 -g' LDFLAGS='-stdlib=libc++' LIBS="-lc++ -lc++abi"
make -j 4
sudo make install

# Verify the installation; this should print "libprotoc 2.5.0"
protoc --version
@sujithjay
sujithjay / 2.4.3.scala
Last active March 25, 2020 23:31
DataFrameNaFunctions.fill in Spark 2.4.5 & Spark 2.4.3
/**
* Returns a new `DataFrame` that replaces null or NaN values in specified
* numeric, string columns. If a specified column is not a numeric, string
* or boolean column it is ignored.
*/
private def fillValue[T](value: T, cols: Seq[String]): DataFrame = {
// the fill[T] which T is Long/Double,
// should apply on all the NumericType Column, for example:
// val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a","b")
// input.na.fill(3.1)
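The comment above references a small reproduction. A runnable version of that snippet, assuming a spark-shell session (so `spark.implicits._` can be imported), is sketched below; how the non-Double column "a" is handled is presumably what the gist compares across the 2.4.3 and 2.4.5 sources.

```scala
import spark.implicits._

// The example cited in the source comment: an Integer column holding a null,
// next to a Double column.
val input = Seq[(java.lang.Integer, java.lang.Double)]((null, 164.3)).toDF("a", "b")

// Numeric fill with a Double value; it replaces the null, and its treatment of
// the integer column "a" is the behaviour under discussion.
input.na.fill(3.1).show()
```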
@sujithjay
sujithjay / UnionOverOr.md
Created October 11, 2019 16:58
Union over Or
/* Part 0: Initialise */
def randomInt = scala.util.Random.nextInt(10000)

val dataframe = sc.parallelize(
  Seq.fill(100000){(randomInt,randomInt,randomInt)}
  ).toDF("cID", "c2", "c3")

val anotherDataframe = sc.parallelize(
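The second DataFrame is cut off above. A hedged sketch of the "union over or" rewrite follows, assuming (hypothetically) that `anotherDataframe` is built the same way with columns `cID`, `c2`, `c4`:

```scala
// Join with an OR of equality predicates: Spark cannot extract equi-join keys
// from this condition, so it falls back to a nested-loop / cartesian strategy.
val orJoin = dataframe.join(anotherDataframe,
  dataframe("cID") === anotherDataframe("cID") ||
  dataframe("c2")  === anotherDataframe("c2"))

// "Union over Or": split the OR into two equi-joins and union the results.
// Each branch can now use a hash or sort-merge join; distinct() removes the
// rows that matched both predicates (and any other duplicates).
val unionJoin =
  dataframe.join(anotherDataframe, dataframe("cID") === anotherDataframe("cID"))
    .union(dataframe.join(anotherDataframe, dataframe("c2") === anotherDataframe("c2")))
    .distinct()
```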
@sujithjay
sujithjay / ExplainOutput.md
Last active June 28, 2018 08:43
Spark SQL Joins: Code Snippets
scala> widetable.explain()
== Physical Plan ==
*SortMergeJoin [customer_id#2], [customer_id#17], Inner
:- *Sort [customer_id#2 ASC NULLS FIRST], false, 0
:  +- Exchange hashpartitioning(customer_id#2, 200)
:     +- *Scan JDBCRelation(foodmart.sales_fact_1998) [numPartitions=1] [product_id#0,time_id#1,customer_id#2,promotion_id#3,store_id#4,store_sales#5,store_cost#6,unit_sales#7] ReadSchema: struct<product_id:int,time_id:int,customer_id:int,promotion_id:int,store_id:int,store_sales:decim...
+- *Sort [customer_id#17 ASC NULLS FIRST], false, 0
   +- Exchange hashpartitioning(customer_id#17, 200)
      +- *Scan JDBCRelation(foodmart.customer) [numPartitions=1] [customer_id#17,fullname#45] ReadSchema: struct
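A sketch of the kind of query that yields a SortMergeJoin plan like the one above, assuming the foodmart tables are read over JDBC (the connection details are placeholders, and the exact projections will differ):

```scala
// Placeholder JDBC connection settings; only the table names come from the plan above.
def jdbcTable(table: String) = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://localhost:3306/foodmart")
  .option("user", "user")
  .option("password", "password")
  .option("dbtable", table)
  .load()

val salesFact = jdbcTable("foodmart.sales_fact_1998")
val customer  = jdbcTable("foodmart.customer")

// An equi-join on customer_id; with neither JDBC side known to be small enough
// to broadcast, Spark plans a SortMergeJoin with an Exchange under each side.
val widetable = salesFact.join(customer, "customer_id")
widetable.explain()
```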
/**
* A tiny class that extends a list with four combinatorial operations:
* ''combinations'', ''subsets'', ''permutations'', ''variations''.
*
 * You can find all the ideas behind this code in the blog post:
*
* http://vkostyukov.ru/posts/combinatorial-algorithms-in-scala/
*
* How to use this class.
*
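The class itself is not included in this excerpt. As a rough illustration of what the four operations mean (a sketch using the standard library, not the gist's implementation), three of them have close built-in analogues:

```scala
val xs = List(1, 2, 3)

// combinations: order-insensitive selections of a given size
xs.combinations(2).toList            // List(List(1, 2), List(1, 3), List(2, 3))

// subsets: every subset, from the empty set up to the full set
xs.toSet.subsets().toList            // 8 subsets in total

// permutations: every ordering of the whole list
xs.permutations.toList               // 6 orderings

// variations (k-permutations) have no direct built-in; one way to obtain them:
xs.combinations(2).flatMap(_.permutations).toList   // 6 ordered pairs
```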
@sujithjay
sujithjay / ASeq.scala
Last active March 13, 2018 11:52 — forked from Mortimerp9/gist:5649109
"Cheap" implementation of an immutable.IndexedSeq backed by an Array. The output of ASeq looks like an Array according to the types, but is not mutable nor cast back to a mutable Array.
import scala.reflect.ClassTag
import scala.collection.mutable.WrappedArray
import scala.collection.mutable.ArrayLike

def ASeq[T](elt: T*)(implicit ct: ClassTag[T]): IndexedSeq[T] = {
  val a = elt.toArray.clone
  a.deep.asInstanceOf[IndexedSeq[T]]
}

val a = Array(1,2,3) //> a : Array[Int] = Array(1, 2, 3)
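The comparison with a plain Array is cut off above. A small usage sketch, assuming Scala 2.12 or earlier (where Array#deep is still available):

```scala
val s = ASeq(1, 2, 3)   // s: IndexedSeq[Int], but it prints like Array(1, 2, 3)
s(0)                    // 1: positional access works as with an Array
// s(0) = 10            // does not compile: IndexedSeq has no update method,
//                      // so the result cannot be mutated in place
```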
@sujithjay
sujithjay / BroadcastHashJoin.md
Last active February 18, 2018 07:21
Spark SQL Joins: Code Snippets
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private scala.collection.Iterator inputadapter_input;
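Generated code like the above can be dumped from a spark-shell; a sketch, assuming Spark 2.x:

```scala
import org.apache.spark.sql.execution.debug._     // adds debugCodegen() to Datasets
import org.apache.spark.sql.functions.broadcast

// Two toy DataFrames; broadcast() nudges the planner towards a BroadcastHashJoin.
val large = spark.range(0, 100000).toDF("customer_id")
val small = spark.range(0, 100).toDF("customer_id")

// Prints the whole-stage-generated Java source, with the "/* NNN */" markers seen above.
large.join(broadcast(small), "customer_id").debugCodegen()
```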
@sujithjay
sujithjay / CoalesceVsCase.md
Last active December 13, 2017 11:43
SQL Queries on Subtyped Data Models using Case Expressions and Coalesce

Disjoint subtyping is a scenario that is often encountered in data modeling. In one frequently used modeling approach, an entity of a certain type is represented by a database table, and each subtype of this entity is represented by another table. Subtyping is disjoint if an instance of a type corresponds to at most one instance of a subtype.

When querying a data model with subtypes, a common (verbose?!) approach is to use case expressions. The idea of this post is to introduce an alternative to this approach using coalesce. I will illustrate it across multiple examples.

Example #1

In the toy schema depicted below, list all orders with their corresponding vendors and customers.

Order Table

| Id | Status | Quantity | Type |
|----|--------|----------|------|
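A minimal runnable sketch of the two query shapes in a spark-shell, using hypothetical tables that follow the disjoint-subtype layout described above (every name beyond the Order columns is an assumption):

```scala
import spark.implicits._

// Toy data: order 1 is a vendor order ('V'), order 2 a customer order ('C').
Seq((1, "OPEN", 10, "V"), (2, "OPEN", 5, "C"))
  .toDF("id", "status", "quantity", "type").createOrReplaceTempView("orders")
Seq((1, "Acme Corp")).toDF("order_id", "vendor_name").createOrReplaceTempView("vendor_orders")
Seq((2, "Jane Doe")).toDF("order_id", "customer_name").createOrReplaceTempView("customer_orders")

// Case-expression version: pick the party name based on the subtype discriminator.
spark.sql("""
  SELECT o.id,
         CASE o.`type` WHEN 'V' THEN v.vendor_name
                       WHEN 'C' THEN c.customer_name END AS party
  FROM orders o
  LEFT JOIN vendor_orders v   ON o.id = v.order_id
  LEFT JOIN customer_orders c ON o.id = c.order_id
""").show()

// Coalesce version: the subtypes are disjoint, so at most one of the outer-joined
// names is non-null and COALESCE picks it without mentioning the type at all.
spark.sql("""
  SELECT o.id, COALESCE(v.vendor_name, c.customer_name) AS party
  FROM orders o
  LEFT JOIN vendor_orders v   ON o.id = v.order_id
  LEFT JOIN customer_orders c ON o.id = c.order_id
""").show()
```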

@sujithjay
sujithjay / MariaDBvsMySQL.md
Last active November 30, 2017 08:49
A Point of Distinction between MariaDB and MySQL

A Point of Distinction between MariaDB and MySQL

TL;DR

In MariaDB, a query with ORDER BY in a FROM subquery produces an unordered result; in effect, the ORDER BY is ignored in FROM subqueries. MySQL, by contrast, does not ignore ORDER BY in FROM subqueries.

Longer Version

Older versions of MariaDB (< 10.2.0) did not have window functions such as rank(), dense_rank(), and row_number(). To understand where you would use such a function, dense_rank() for instance, consider the following example: