https://github.com/Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Change the execution engine to Tez or Spark (add the Tez/Spark client jars to HADOOP_CLASSPATH)
Partitioning - the PARTITIONED BY clause divides the table into partitions.
Bucketing - the CLUSTERED BY clause divides the table into buckets.
Map-side join, bucket map-side join, sorted bucket map-side join
Use a suitable file format: ORC (Optimized Row Columnar)
Indexing
Vectorization along with ORC
CBO (cost-based optimizer)
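The options listed above map onto a handful of Hive session settings and DDL clauses. The following is a minimal sketch, not a tuned configuration: the table and column names (`ratings_opt`, `user_id`, `rating_year`) and the bucket count are hypothetical, and the `SET` keys should be checked against the Hive version in use.

```shell
#!/bin/bash
# Print a HiveQL script that applies the optimizations listed above.
# Table name, column names and bucket count are hypothetical.
hive_tuning_sql() {
  cat <<'SQL'
-- Execution engine (the Tez client jars must be on the classpath)
SET hive.execution.engine=tez;
-- Vectorized query execution, listed above alongside ORC
SET hive.vectorized.execution.enabled=true;
-- Cost-based optimizer
SET hive.cbo.enable=true;
-- Automatic map-side join conversion and bucket map joins
SET hive.auto.convert.join=true;
SET hive.optimize.bucketmapjoin=true;

CREATE TABLE IF NOT EXISTS ratings_opt (
  user_id INT,
  movie_id INT,
  rating DOUBLE
)
PARTITIONED BY (rating_year INT)
CLUSTERED BY (user_id) INTO 8 BUCKETS
STORED AS ORC;
SQL
}

# Usage (assumes the hive CLI is installed):
#   hive_tuning_sql > tuning.hql && hive -f tuning.hql
```

Partition on a low-cardinality column that queries filter on, and bucket on the join key so that bucket map-side joins can apply.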
@phaneesh
phaneesh / jenkins-get-git-url.groovy
Last active August 2, 2022 12:18
Jenkins git repository URL for all jobs in a view
import jenkins.model.*
Jenkins.instance.getView('<view name>').items.each {
try {
println(it.fullName +"," + it.getScm().getUserRemoteConfigs()[0].getUrl())
} catch(Exception e) {
println("Error getting git url for: " +it)
}
}
@phaneesh
phaneesh / rundeck-cleanup.sh
Last active August 28, 2020 06:03
Clear rundeck logs and job execution history
#!/bin/bash
RUNDECK_DB_HOST=''
RUNDECK_DB_USER=''
RUNDECK_DB_PASSWORD=''
RUNDECK_DB_NAME=''
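# Abort if the directory is missing so find never deletes from the wrong location
cd /var/log/rundeck/ || exit 1
# Remove rotated service logs older than 3 days
find . -type f \( -name "*.api.log.*" -o -name "*.executions.log.*" -o -name "rundeck.log.*" -o -name "*.service.log.*" -o -name "*.jobs.log.*" -o -name "*.audit.log.*" \) -mtime +3 -exec rm {} \;
cd /var/lib/rundeck/logs/rundeck/ || exit 1
# Remove execution logs and state files older than 10 days
find . -type f \( -name "*.rdlog" -o -name "*.json" -o -name "*.xml" \) -mtime +10 -exec rm {} \;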
find . -type d -empty -delete
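Before running a cleanup like the one above against live directories, it can help to preview what `find` would remove. This is a minimal sketch; `purge_old_files` is a hypothetical helper, not part of the original script, and it defaults to a dry run.

```shell
#!/bin/bash
# List (dry run) or delete files older than N days that match a name pattern.
# purge_old_files is a hypothetical helper, not part of the original script.
purge_old_files() {
  local dir="$1" days="$2" pattern="$3" mode="${4:-dry-run}"
  # Refuse to run if the directory is missing, so find never scans the wrong tree.
  [ -d "$dir" ] || { echo "missing directory: $dir" >&2; return 1; }
  if [ "$mode" = "delete" ]; then
    find "$dir" -type f -name "$pattern" -mtime +"$days" -delete
  else
    find "$dir" -type f -name "$pattern" -mtime +"$days" -print
  fi
}

# Usage:
#   purge_old_files /var/log/rundeck 3 "rundeck.log.*"          # preview only
#   purge_old_files /var/log/rundeck 3 "rundeck.log.*" delete   # actually remove
```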
@phaneesh
phaneesh / jenkins-old-builds-cleanup.groovy
Last active March 10, 2020 05:38
Cleanup Old Jenkins builds
import jenkins.model.Jenkins
import hudson.model.Job
MAX_BUILDS = 3
for (job in Jenkins.instance.items) {
if(job.name.toLowerCase().endsWith("develop") || job.name.toLowerCase().endsWith("stage")) {
println "Deleting old builds: " +job.name
def recent = job.builds.limit(MAX_BUILDS)
for (build in job.builds) {
@phaneesh
phaneesh / jenkins-workspace-cleanup.groovy
Last active March 10, 2020 03:59
Cleanup Jenkins Workspaces (All Projects)
import jenkins.model.*
import hudson.model.AbstractProject
Jenkins.instance.getAllItems(AbstractProject.class)
.each {
try {
println("Wiping workspace for "+it.fullName)
it.doDoWipeOutWorkspace()
} catch(Exception e) {
println("Error wiping workspace for "+it.fullName)
}
}
@phaneesh
phaneesh / jenkins-get-invalid-branch-list.groovy
Created July 29, 2019 06:37
Get Branch invalid/non-standard configuration from Jenkins
import hudson.model.*
import hudson.maven.MavenModuleSet
import hudson.tasks.*
import hudson.plugins.git.GitSCM
//List valid branches that can be built
def validBranches = ["develop","master"]
Hudson.instance.items.each {
if(it.getScm() instanceof GitSCM) {
Set branchNames = []
it.getScm().getBranches().each { b ->
@phaneesh
phaneesh / set-build-history.groovy
Created July 26, 2019 15:27
Jenkins - Set Build History Retention
import hudson.model.*
import hudson.maven.MavenModuleSet
import hudson.tasks.*
Hudson.instance.items.each {
if(it instanceof MavenModuleSet) {
if(it.getBuildDiscarder() != null && it.getBuildDiscarder() instanceof LogRotator) {
def l = new LogRotator(365,-1,-1,-1)
it.setBuildDiscarder(l)
}
@phaneesh
phaneesh / log4j.properties
Created March 19, 2019 08:48
Rundeck log rotation
log4j.appender.cmd-logger=org.apache.log4j.RollingFileAppender
log4j.appender.cmd-logger.append=true
log4j.appender.cmd-logger.layout=org.apache.log4j.PatternLayout
log4j.appender.cmd-logger.layout.ConversionPattern=%d{ISO8601} [%t] %-5p %c - %m%n
log4j.appender.cmd-logger.File=/var/log/rundeck/command.log
log4j.appender.cmd-logger.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
log4j.appender.cmd-logger.rollingPolicy.ActiveFileName=/var/log/rundeck/command.log
log4j.appender.cmd-logger.rollingPolicy.FileNamePattern=/var/log/rundeck/command-%d{yyyy-MM-dd}.log
log4j.appender.cmd-logger.MaxBackupIndex=7
@phaneesh
phaneesh / fstab
Created February 19, 2019 10:04
hadoop-os-tuning
/dev/sdb1 /data/hdp01 xfs defaults,noatime,nodiratime,nobarrier 1 2
/dev/sdc1 /data/hdp02 xfs defaults,noatime,nodiratime,nobarrier 1 2
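To check that every XFS data mount carries the `noatime` option shown above, a small audit can be run against the fstab file. This is a sketch only; `fstab_missing_noatime` is a hypothetical helper. On Linux `noatime` already implies `nodiratime`, and recent kernels may reject `nobarrier` on XFS, so confirm the option list against the target kernel.

```shell
#!/bin/bash
# Print the mount points of xfs entries in an fstab-style file that lack noatime.
# fstab_missing_noatime is a hypothetical helper for illustration.
fstab_missing_noatime() {
  local file="${1:-/etc/fstab}"
  # Field 3 is the filesystem type, field 4 the comma-separated option list.
  awk '$1 !~ /^#/ && $3 == "xfs" && ("," $4 ",") !~ /,noatime,/ { print $2 }' "$file"
}

# Usage:
#   fstab_missing_noatime            # audits /etc/fstab
#   fstab_missing_noatime ./fstab    # audits a local copy
```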
@phaneesh
phaneesh / restart-nn-ha.md
Created December 10, 2018 13:13
Restart NameNode HA

Steps to restart NameNode HA with zero downtime

  • Fail over from nn1 to the standby NameNode (nn2)
$ sudo -u hdfs hdfs haadmin -failover nn1 nn2
  • Restart nn1 from Ambari
  • Fail back to nn1 once it is running again
$ sudo -u hdfs hdfs haadmin -failover nn2 nn1
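
The failover steps above can be wrapped so that each one is verified before moving on. This is a minimal sketch: `failover_and_verify` and the `HDFS_CMD` variable are hypothetical names introduced here, and the check relies on `hdfs haadmin -getServiceState` reporting `active` for the target NameNode.

```shell
#!/bin/bash
# HDFS_CMD is an assumption so the command can be swapped or stubbed;
# by default it matches the invocation used in the steps above.
HDFS_CMD="${HDFS_CMD:-sudo -u hdfs hdfs}"

# Fail over from one NameNode to another, then confirm the target is active.
failover_and_verify() {
  local from="$1" to="$2" state
  $HDFS_CMD haadmin -failover "$from" "$to" || return 1
  state="$($HDFS_CMD haadmin -getServiceState "$to")" || return 1
  if [ "$state" != "active" ]; then
    echo "expected $to to be active, got: $state" >&2
    return 1
  fi
  echo "$to is active"
}

# Usage:
#   failover_and_verify nn1 nn2   # then restart nn1 from Ambari
#   failover_and_verify nn2 nn1   # fail back once nn1 is up again
```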