Steps to create an ETL Pipeline with Cronjob

  1. Build the fat JAR

    mvn clean package -DskipTests
  2. Connect to the Dev server

    ssh -o ServerAliveInterval=10 -i C:\Users\parth.b.desai\Desktop\byte_121.pem centos@172.31.96.121
  3. Move to recon_master

    cd /data/nirav/code/recon_master
  4. Create a directory named <jobname>

    mkdir <jobname>
    cd <jobname>
  5. Create the data, Test_SavetableAPI, and python_script directories

    mkdir data
    mkdir Test_SavetableAPI
    mkdir python_script
  6. Add the paths to data and Test_SavetableAPI in application.conf

    pathvariables{
      zomatopath = "/data/nirav/code/recon_master/zomato_settlement_report/data/"
      tempwriteoutpath = "/data/nirav/code/recon_master/zomato_settlement_report/Test_SavetableAPI/"
    }
  7. Add these paths to ApplicationConfig.scala (a sketch of how they are loaded follows this step)

    //Custom variables for pushing zomato data
    def ZOMATO_PATH = config.getString("monitor.sparketl.sample.job.pathvariables.zomatopath")
    def TEMP_WRITEOUT_PATH = config.getString("monitor.sparketl.sample.job.pathvariables.tempwriteoutpath")
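
    The config value these defs read from is not shown in the gist. As a rough sketch (not the project's actual code): the key prefix monitor.sparketl.sample.job suggests the pathvariables block from step 6 sits nested under that path in the full application.conf, and the config.getString calls imply Typesafe Config. An ApplicationConfig object along those lines might look like this; the object name and load strategy are assumptions:

    import java.io.File
    import com.typesafe.config.{Config, ConfigFactory}

    object ApplicationConfig {
      // Assumption: the config path is handed in at startup; step 12's spark-submit
      // line passes /data/nirav/code/recon_master/config/application.conf as an argument.
      private var config: Config = ConfigFactory.empty()

      def load(path: String): Unit =
        config = ConfigFactory.parseFile(new File(path)).resolve()

      // Custom variables for pushing zomato data (the defs from this step)
      def ZOMATO_PATH = config.getString("monitor.sparketl.sample.job.pathvariables.zomatopath")
      def TEMP_WRITEOUT_PATH = config.getString("monitor.sparketl.sample.job.pathvariables.tempwriteoutpath")
    }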
  8. Copy application.conf from the local system to Dev (run this from a second local terminal, not on the Dev server)

    scp -i C:\Users\parth.b.desai\Desktop\byte_121.pem C:\Users\parth.b.desai\Desktop\Github\SparkSQL_ETL\src\test\resources\application.conf centos@172.31.96.121:/data/nirav/code/recon_master/config/application.conf
  9. Copy the fat JAR from the local system to Dev

    scp -i C:\Users\parth.b.desai\Desktop\byte_121.pem C:\Users\parth.b.desai\Desktop\Github\SparkSQL_ETL\target\spark_etl-1.0-SNAPSHOT-jar-with-dependencies.jar centos@172.31.96.121:/data/nirav/code/recon_master/
  10. Add a <jobname>.sh script to shell_Script

    cd /data/nirav/code/recon_master/shell_Script/
    nano <jobname>.sh
  11. (Optional) Add a Python script if needed

    cd /data/nirav/code/recon_master/<jobname>/python_script
    nano ConvertToCSV.py
  12. Add the following to <jobname>.sh (a sketch of the job class it launches follows the Excel variant below)

    #!/bin/bash
    # Collect the input files matching the expected extension; the array is only
    # used to check whether there is anything to process on this cron cycle.
    myarray=(`find /data/nirav/code/recon_master/<jobname>/data/ -maxdepth 1 -name "*.<fileformat>"`)
    if [ ${#myarray[@]} -gt 0 ]; then
        # Run the ETL job; the fat JAR supplies the classes via --jars, and
        # application.conf is passed through to the job.
        /data/spark-3.1.1-bin-hadoop2.7/bin/spark-submit --class com.byteprophecy.monitor.recon.<classname> --master local[*] --deploy-mode client --jars /data/nirav/code/recon_master/spark_etl-1.0-SNAPSHOT-jar-with-dependencies.jar /data/nirav/code/recon_master/config/application.conf /data/nirav/code/recon_master/config/application.conf
        if [ $? -eq 0 ]; then
            # Delete the inputs only after the job has succeeded, so a failed
            # run can be retried on the next cycle.
            echo "Successfully executed"
            echo "Deleting files..."
            rm /data/nirav/code/recon_master/<jobname>/data/*.<fileformat>
            echo "Files deleted"
        else
            echo "Failed to execute. Please check logs for more information."
        fi
    else
        echo "File not found"
    fi

    For Excel files, use this variant, which adds explicit driver resources (--driver-cores 1 --driver-memory 4G):

    #!/bin/bash
    myarray=(`find /data/nirav/code/recon_master/<jobname>/data/ -maxdepth 1 -name "*.<fileformat>"`)
    if [ ${#myarray[@]} -gt 0 ]; then 
        /data/spark-3.1.1-bin-hadoop2.7/bin/spark-submit --class com.byteprophecy.monitor.recon.<classname> --master local[*] --deploy-mode client --driver-cores 1 --driver-memory 4G --jars /data/nirav/code/recon_master/spark_etl-1.0-SNAPSHOT-jar-with-dependencies.jar /data/nirav/code/recon_master/config/application.conf /data/nirav/code/recon_master/config/application.conf
        if [ $? -eq 0 ]; then
            echo "Successfully executed"
            echo "Deleting files..."
            rm  /data/nirav/code/recon_master/<jobname>/data/*.<fileformat>
            echo "Files deleted"
        else
            echo "Failed to execute. Please check logs for more information."
        fi
    else 
        echo "File not found"
    fi
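
    For orientation, here is a rough sketch of the kind of job class the spark-submit line above launches (com.byteprophecy.monitor.recon.<classname>). This is not the gist's actual class: the object name, input format (CSV), and output format (Parquet) are assumptions. Note that application.conf appears twice on the spark-submit line; the first copy appears to fill spark-submit's primary-resource slot (the classes themselves come from the fat JAR via --jars), while the second arrives as args(0):

    import org.apache.spark.sql.SparkSession

    object ZomatoSettlementReport {
      def main(args: Array[String]): Unit = {
        // args(0): the application.conf path passed by <jobname>.sh
        ApplicationConfig.load(args(0))

        // --master local[*] and --deploy-mode client are set on the spark-submit line
        val spark = SparkSession.builder().appName("zomato_settlement_report").getOrCreate()

        // Pick up whatever this cron cycle dropped into <jobname>/data/ ...
        val df = spark.read.option("header", "true").csv(ApplicationConfig.ZOMATO_PATH)

        // ... and stage the result under Test_SavetableAPI/ before the shell
        // script deletes the input files.
        df.write.mode("overwrite").parquet(ApplicationConfig.TEMP_WRITEOUT_PATH)

        spark.stop()
      }
    }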
  13. Create a folder for <jobname> under logs

    cd /data/nirav/code/recon_master/logs
    mkdir <jobname>
  14. (Optional) Copy test data if needed

    scp -i C:\Users\parth.b.desai\Desktop\byte_121.pem "C:\Users\parth.b.desai\Downloads\Cashless Summary Report-Domino's_Mar'21____This_has_SalesData.xlsb" centos@172.31.96.121:/data/nirav/code/recon_master/cash_summary_report_salesdata/data/
  15. Create the cronjob (open the crontab with crontab -e and add the line below; */10 * * * * runs the script every 10 minutes)

    */10 * * * * sh /data/nirav/code/recon_master/shell_Script/<jobname>.sh > /data/nirav/code/recon_master/logs/<jobname>/`date +\%Y\%m\%d\%H\%M\%S`-cron.log 2>&1

    OPTIONAL: To test the job before scheduling it, run sh /data/nirav/code/recon_master/shell_Script/<jobname>.sh manually. After the first scheduled run, check the newest timestamped log under /data/nirav/code/recon_master/logs/<jobname>/.
