Animesh Trivedi (animeshtrivedi)

/*
* MIT License
Copyright (c) 2020-2021
Authors: Sacheendra Talluri, Giulia Frascaria, and Animesh Trivedi
This code is part of the Storage System Course at VU Amsterdam
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
@animeshtrivedi
animeshtrivedi / .vimrc
Created April 22, 2020 19:25 — forked from simonista/.vimrc
A basic .vimrc file that will serve as a good template on which to build.
" Don't try to be vi compatible
set nocompatible
" Helps force plugins to load correctly when it is turned back on below
filetype off
" TODO: Load plugins here (pathogen or vundle)
" Turn on syntax highlighting
syntax on
This patch makes the following changes:
* moves two common functions, "getNullCount" and "splitAndTransferValidityBuffer", to the top-level BaseValueVector. This change requires moving "validityBuffer" to the BaseValueVector class (as recommended in this TODO: https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/BaseFixedWidthVector.java#L89)
* optimizes the implementation of loadValidityBuffer (in BaseValueVector) to just pass the reference to the validity buffer read from storage
* optimizes for the common boundary condition when all values are valid (as done in the C++ code: https://github.com/apache/arrow/blob/master/cpp/src/arrow/array.h#L290)
The optimization improves read performance.
Tests: Read 50M integers from a single Int column (2GB).
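The all-valid fast path can be illustrated with a small, self-contained Java sketch. This is not the actual Arrow code; the class and method names here are hypothetical stand-ins. The idea is the same as in the linked C++ `array.h`: a set bit in the validity bitmap means the value is non-null, and when no validity buffer exists at all, the null count is zero without any bit scanning.

```java
import java.util.BitSet;

// Hypothetical stand-in for a value vector's validity tracking.
// A set bit at index i means value i is non-null (Arrow convention).
final class ValiditySketch {
    private final BitSet validity; // null means "no validity buffer": all values valid
    private final int valueCount;

    ValiditySketch(BitSet validity, int valueCount) {
        this.validity = validity;
        this.valueCount = valueCount;
    }

    // Common-case shortcut: with no validity buffer, every slot is valid,
    // so skip the bitmap scan entirely.
    int getNullCount() {
        if (validity == null) {
            return 0; // fast path: all valid
        }
        return valueCount - validity.cardinality(); // popcount over the bitmap
    }
}

public class Demo {
    public static void main(String[] args) {
        // No validity buffer: fast path, zero nulls.
        System.out.println(new ValiditySketch(null, 5).getNullCount()); // prints 0
        // Bits 0 and 2 set out of 5 slots: 3 nulls.
        BitSet bits = new BitSet(5);
        bits.set(0);
        bits.set(2);
        System.out.println(new ValiditySketch(bits, 5).getNullCount()); // prints 3
    }
}
```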
@animeshtrivedi
animeshtrivedi / gist:9341ce0c14664ea8139076a3fb63324e
Created September 17, 2019 06:57 — forked from bobisme/gist:1078482
Build/install custom linux kernel headers in ubuntu.

Linux Headers

  • Install tools to build:
sudo apt-get update
sudo apt-get install kernel-package fakeroot wget bzip2
  • Linux-2.6.39.1-linode34 is the same as regular 2.6.39
@animeshtrivedi
animeshtrivedi / UberTeraSort.scala
Created July 25, 2018 14:01
TeraSort in Scala, so that it can be used on the serverless SQL shell.
// Author: Animesh Trivedi
// atr@zurich.ibm.com
import org.apache.spark.sql.{SaveMode, SparkSession}
import scala.collection.mutable.ListBuffer
import scala.util.Random
private def generateTSRecord(key: Array[Byte], recBuf:Array[Byte], rand: Random): Unit = {
val fixed = 10
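The record layout the snippet hints at (`fixed = 10` key bytes) can be sketched in Java; this is a hypothetical illustration of a TeraSort-style record, not the gist's Scala implementation. Classic TeraSort records are 100 bytes, the first 10 of which form the sort key.

```java
import java.util.Random;

// Hypothetical sketch of a TeraSort-style record: 100 bytes per record,
// the first 10 bytes being the random sort key (the gist's `fixed = 10`).
final class TeraSortRecords {
    static final int KEY_LEN = 10;
    static final int RECORD_LEN = 100;

    // Fill `recBuf` with one record: random key, then deterministic filler.
    static void generateRecord(byte[] recBuf, Random rand) {
        byte[] key = new byte[KEY_LEN];
        rand.nextBytes(key);
        System.arraycopy(key, 0, recBuf, 0, KEY_LEN);
        for (int i = KEY_LEN; i < RECORD_LEN; i++) {
            recBuf[i] = (byte) ('A' + (i % 26)); // payload filler after the key
        }
    }
}
```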
@animeshtrivedi
animeshtrivedi / spark-defaults.conf
Created March 23, 2018 13:51
Spark configuration for a local mode execution
# Command to launch TPCDS:
# ./bin/spark-submit -v --master local[2] --class com.ibm.crail.spark.tools.ParquetGenerator ~/jars/parquet-generator-1.0.jar -c tpcds -o crail://localhost:9060/F1/tpcds/ -p 4 -t 4 -tsf 1 -tdsd /home/atr/zrl/external/github/databricks/tpcds-kit/tools/ -tdd 1
# And you need to put core-site.xml from crail into the conf folder.
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
@animeshtrivedi
animeshtrivedi / crail-site.config
Created March 23, 2018 13:50
A localhost crail site with the TPC tier
crail.blocksize 4096
crail.buffersize 4096
#crail.buffersize 1048576
#crail.buffersize 8192
#crail.slicesize 8192
crail.regionsize 1073741824
crail.cachelimit 1073741824
@animeshtrivedi
animeshtrivedi / ParquetToArrow.java
Last active March 8, 2022 12:04
Example program to convert Apache Parquet data to Apache Arrow
/* This code snippet is a part of the blog at
https://github.com/animeshtrivedi/blog/blob/master/post/2017-12-26-arrow.md
*/
import com.google.common.collect.ImmutableList;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.*;
import org.apache.arrow.vector.dictionary.DictionaryProvider;
import org.apache.arrow.vector.types.FloatingPointPrecision;
import org.apache.arrow.vector.types.pojo.ArrowType;
@animeshtrivedi
animeshtrivedi / HdfsSeekableByteChannel.java
Last active December 26, 2017 15:19
SeekableByteChannel implementation for HDFS. SeekableByteChannel is used in Apache ArrowFileReader
/* This code snippet is a part of the blog at
https://github.com/animeshtrivedi/blog/blob/master/post/2017-12-26-arrow.md
*/
import org.apache.hadoop.fs.FSDataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
/**
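The interface being implemented here can be sketched without Hadoop: below is a minimal, self-contained read-only `SeekableByteChannel` over an in-memory byte array. It is a hypothetical simplification of what the gist does with `FSDataInputStream` (whose `seek()` backs `position(long)`), showing the contract that `ArrowFileReader` relies on.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

// Minimal in-memory SeekableByteChannel, sketching the contract the gist
// implements over HDFS's FSDataInputStream (read-only; hypothetical).
final class ByteArraySeekableChannel implements SeekableByteChannel {
    private final byte[] data;
    private long position;
    private boolean open = true;

    ByteArraySeekableChannel(byte[] data) { this.data = data; }

    @Override public int read(ByteBuffer dst) {
        if (position >= data.length) return -1; // end of stream
        int n = Math.min(dst.remaining(), data.length - (int) position);
        dst.put(data, (int) position, n);
        position += n;
        return n;
    }

    @Override public int write(ByteBuffer src) throws IOException {
        throw new IOException("read-only channel");
    }

    @Override public long position() { return position; }

    @Override public SeekableByteChannel position(long newPosition) {
        position = newPosition; // seek, like FSDataInputStream.seek()
        return this;
    }

    @Override public long size() { return data.length; }

    @Override public SeekableByteChannel truncate(long size) throws IOException {
        throw new IOException("read-only channel");
    }

    @Override public boolean isOpen() { return open; }
    @Override public void close() { open = false; }
}
```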
@animeshtrivedi
animeshtrivedi / HDFSWritableByteChannel.java
Last active July 12, 2021 12:17
WritableByteChannel implementation for HDFS. WritableByteChannel is used in Apache ArrowFileWriter
/* This code snippet is a part of the blog at
https://github.com/animeshtrivedi/blog/blob/master/post/2017-12-26-arrow.md
*/
import org.apache.hadoop.fs.FSDataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
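The write side is even simpler. The sketch below wraps a plain `OutputStream` in a `WritableByteChannel`, mirroring (as a hypothetical simplification) what the gist does with `FSDataOutputStream` for `ArrowFileWriter`: drain the `ByteBuffer`, forward the bytes, and report how many were consumed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Sketch of a WritableByteChannel over a plain OutputStream, mirroring what
// the gist does with HDFS's FSDataOutputStream (hypothetical simplification).
final class OutputStreamByteChannel implements WritableByteChannel {
    private final OutputStream out;
    private boolean open = true;

    OutputStreamByteChannel(OutputStream out) { this.out = out; }

    @Override public int write(ByteBuffer src) throws IOException {
        int n = src.remaining();
        byte[] chunk = new byte[n];
        src.get(chunk);   // drain the buffer
        out.write(chunk); // forward to the underlying stream
        return n;         // bytes consumed, per the channel contract
    }

    @Override public boolean isOpen() { return open; }

    @Override public void close() throws IOException {
        out.close();
        open = false;
    }
}
```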