Copy a MySQL table to BigQuery.

Import a MySQL table or database into a BigQuery dataset.

Usage:

~/bin/mysql_table_to_big_query.sh bucket_name schema_name table_name

~/bin/mysql_schema_to_big_query.sh bucket_name [schema_name]
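
For example, to import a hypothetical table orders from a schema shopdb via a bucket named my-transfer-bucket (all three names are placeholders):

~/bin/mysql_table_to_big_query.sh my-transfer-bucket shopdb orders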

Description

mysql_table_to_big_query.sh exports the table from MySQL to CSV and exports the table structure to JSON and SQL files. The files are then uploaded to a folder of an existing Cloud Storage bucket.

These files are then imported into BigQuery.
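
Given the upload_folder variable in the script, the intermediate files land under an UPLOAD/ folder in the bucket, roughly like this (bucket and table names are illustrative, and the split suffix depends on the run time and file size):

gs://my-transfer-bucket/UPLOAD/orders.json
gs://my-transfer-bucket/UPLOAD/orders.sql
gs://my-transfer-bucket/UPLOAD/orders2001150930shopdbaa.gz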

A BigQuery dataset is created in the same project (if it does not already exist) with the name {SCHEMA_NAME}_{DATE}.
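
For example, importing from a schema named shopdb (a placeholder name) on 15 January 2020 would create, if needed, a dataset named shopdb_2020_01_15, matching the _%Y_%m_%d date suffix used in the script.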

If the table has a column with a DATE or TIMESTAMP datatype, the BigQuery table will be partitioned on it.

mysql_schema_to_big_query.sh extracts a list of all tables from the MySQL schema and calls mysql_table_to_big_query.sh for each one.

It creates log files in the local directory for each table. These can be used to review the import errors, then deleted.
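
A minimal sketch of what such a wrapper can look like (an illustration, not the exact script; it assumes mysql can list the tables for the schema and that mysql_table_to_big_query.sh is installed in ~/bin):

#!/bin/bash
# Illustrative wrapper: list every table in the schema and import each one,
# writing a per-table log file to the current directory.
BUCKET_NAME=$1
TABLE_SCHEMA=$2
for t in $(mysql "$TABLE_SCHEMA" -Bse 'show tables'); do
  ~/bin/mysql_table_to_big_query.sh "$BUCKET_NAME" "$TABLE_SCHEMA" "$t" > "${t}_import.log" 2>&1
done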

Attribution

The script is based on work by Shantanuo (shantanuo/mysql_to_big_query.sh). Main differences from the original: this version requires you to create a Google Cloud Storage bucket first, which gives some control over the storage class and also allows tables with underscores (which are not allowed in bucket names), and there are some tweaks to the logging. BigQuery partitioning was added from the fork https://gist.github.com/apenney/2d78fc1fb734bf3f58d5adbd1dfdd461

Dependencies

bash,

MySQL with authenticated access to specified schema.

Google Cloud SDK installed with shell configured for a GCP project.

A bucket belonging to the GCP project with write permissions. Select an appropriate storage class and region. If the bucket is not used for anything else, consider adding a lifecycle rule to delete the intermediary files after a couple of days (see the example rule below); they can't be deleted immediately without affecting the bq load.
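
For example, a rule like the following can be applied with gsutil lifecycle set (a sketch; the bucket name is a placeholder and the age should be tuned to how long your loads take):

cat > lifecycle.json << 'EOF'
{
  "rule": [
    {"action": {"type": "Delete"}, "condition": {"age": 2}}
  ]
}
EOF
gsutil lifecycle set lifecycle.json gs://my-transfer-bucket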

The local drive must have adequate space to hold the table in uncompressed CSV format.

mysql_schema_to_big_query.sh expects to find mysql_table_to_big_query.sh in ~/bin.

Installation

  1. Use wget to download these scripts.
  2. Copy both scripts to ~/bin and chmod them to be executable.
  3. Install gsutil and bigquery (or the Google Cloud SDK) and configure gsutil for your GCP project.
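
For example (the raw gist URLs are placeholders to be substituted):

wget <raw-gist-url>/mysql_table_to_big_query.sh
wget <raw-gist-url>/mysql_schema_to_big_query.sh
mkdir -p ~/bin
cp mysql_table_to_big_query.sh mysql_schema_to_big_query.sh ~/bin/
chmod +x ~/bin/mysql_table_to_big_query.sh ~/bin/mysql_schema_to_big_query.sh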

Limitations

Cannot import tables containing blobs.

Does not delete the intermediate files in the storage bucket. These must remain until the BQ load job completes.

To check the status of a load job, use: bq ls -j

Tested on Ubuntu.

Subject to all the limitations for importing CSV data into BigQuery.

If your date or timestamp data is not in the correct format, import it as STRING by modifying the SELECT statement that creates the schema in json_query.txt.

The dataset location is set by the default rules for the project. Check that the location of the dataset matches the bucket's.
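
One way to check both locations (the dataset and bucket names are placeholders):

bq show --format=prettyjson shopdb_2020_01_15 | grep location
gsutil ls -L -b gs://my-transfer-bucket | grep -i "location constraint"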

#!/bin/bash
BUCKET_NAME=$1
TABLE_SCHEMA=$2
TABLE_NAME=$3
PROJECT_ID=`gcloud config list --format 'value(core.project)' 2>/dev/null`
UIDX=`date '+_%Y_%m_%d'`
DATASET="$TABLE_SCHEMA$UIDX"
upload_folder="$BUCKET_NAME/UPLOAD/"
echo "Import $TABLE_SCHEMA.$TABLE_NAME from MySQL to BigQuery $PROJECT_ID:$DATASET.$TABLE_NAME via bucket $upload_folder"
# uncomment set -x for more printouts
#set -x
mytime=`date '+%y%m%d%H%M'`
hostname=`hostname | tr 'A-Z' 'a-z'`
file_prefix="$TABLE_NAME$mytime$TABLE_SCHEMA"
echo $file_prefix
splitat="4000000000"
bulkfiles=200
maxbad=300
# Cleanup function called on exit
function cleanup_files() {
rm -f ${TABLE_NAME}.*
rm -f ${file_prefix}*
rm -f blob_query.txt json_query.txt
}
# make sure bucket, schema and table names are supplied
if [ $# -ne 3 ];then
echo "Copy a table from MySQL schema to existing bucket in current gcp project then import to BigQuery dataset $DATASET."
echo "usage: $0 BUCKET_NAME SCHEMA_NAME TABLE_NAME"
exit 1
fi
echo "Checking that table doesn't contain blob data."
cat > blob_query.txt << heredoc
select sum(IF((DATA_TYPE LIKE '%blob%'),1, 0)) from INFORMATION_SCHEMA.columns where TABLE_SCHEMA = '$TABLE_SCHEMA' AND TABLE_NAME = '$TABLE_NAME';
heredoc
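# NOTE: $mysqlargs is not set anywhere in this script. It is expected to come
# from the environment and hold MySQL connection options (e.g. -h/-u/-p or
# --defaults-extra-file); if unset, mysql falls back to its default credentials.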
mycount=`mysql $mysqlargs -Bs < blob_query.txt`
if [ $mycount -ne 0 ]; then
echo "Cannot import table $TABLE_NAME. Blob column found."
exit 3
fi
echo "Creating JSON schema from mysql table structure"
##
## DATE and TIMESTAMP format
## If the DATE field is not formatted as 'YYYY-MM-DD' or the TIMESTAMP field is not formatted as 'YYYY-MM-DD HH:MM:SS' then import these as STRING and use a query to transform them. You will lose partitioning. E.g.
#select CONCAT('{"name": "', COLUMN_NAME, '","type":"', IF(DATA_TYPE like "%date%", "STRING",IF(DATA_TYPE like "%int%", "INTEGER",IF(DATA_TYPE = "decimal","FLOAT","STRING"))) , '"},') as json from information_schema.columns where TABLE_SCHEMA = '$TABLE_SCHEMA' AND TABLE_NAME = '$TABLE_NAME';
##
cat > json_query.txt << heredoc
select CONCAT('{"name": "', COLUMN_NAME, '","type":"', IF(DATA_TYPE like "%date%", "TIMESTAMP",IF(DATA_TYPE like "%int%", "INTEGER",IF(DATA_TYPE = "decimal","FLOAT","STRING"))) , '"},') as json from information_schema.columns where TABLE_SCHEMA = '$TABLE_SCHEMA' AND TABLE_NAME = '$TABLE_NAME';
heredoc
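# Build the BigQuery JSON schema: open the array, emit one {"name": ..., "type": ...}
# object per column (date-like types -> TIMESTAMP, int -> INTEGER, decimal -> FLOAT,
# everything else -> STRING), then append an extra "hostname" STRING column that
# corresponds to the extra field the sed step below adds to every CSV row.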
echo '[' > $TABLE_NAME.json
mysql -Bs < json_query.txt | sed '$s/,$//' >> $TABLE_NAME.json
mysql $TABLE_SCHEMA -Bse"show create table $TABLE_NAME\G" > $TABLE_NAME.sql
echo ', {"name": "hostname","type":"STRING"} ]' >> $TABLE_NAME.json
echo "Uploading schemas: $TABLE_NAME.json, $TABLE_NAME.sql to cloud: $upload_folder"
gsutil cp $TABLE_NAME.json gs://$upload_folder
gsutil cp $TABLE_NAME.sql gs://$upload_folder
echo "Exporting table: $TABLE_SCHEMA.$TABLE_NAME to CSV-like file: $TABLE_NAME.txt1"
mysql $TABLE_SCHEMA --quick -Bse"select * from $TABLE_NAME" > $TABLE_NAME.txt1
FILESIZE=$(stat -c%s "$TABLE_NAME.txt1")
if [ -s "$TABLE_NAME.txt1" ]
then
echo "Converting file $TABLE_NAME.txt1 $FILESIZE to CSV format for Big Query"
else
echo "$TABLE_NAME.txt1 is empty or does not exist! Exiting."
exit 3
fi
echo 'tr'
tr -d "\r" < $TABLE_NAME.txt1 > $TABLE_NAME.txt
echo 'sed1'
sed -i "s/$/\t$TABLE_SCHEMA/" $TABLE_NAME.txt
echo 'sed2'
# strip any remaining carriage returns (the original used a literal Ctrl-V Ctrl-M sequence here)
sed -i 's/\r//g' $TABLE_NAME.txt
echo "Spliting large file $TABLE_NAME.txt to files with prefix $file_prefix"
split -C $splitat $TABLE_NAME.txt $file_prefix
#echo "Loop and upload $file_prefix to google cloud"
for file in `ls $file_prefix*`
do
# big query does not seem to like double quotes and NULL
sed -i -e 's/\"//g' -e's/NULL//g' $file
gzip $file
echo "Uploading CSV dumpfile: $file.gz to google cloud: $upload_folder"
gsutil cp $file.gz gs://$upload_folder
if [ $? -ne 0 ];then
echo "$file could not be uploaded to cloud"
cleanup_files
exit 3
fi
#rm -f $file.gz
done
#check if data set exists and create if not
bq_safe_mk() {
dataset=$1
exists=$(bq ls -d | grep -w $dataset)
if [ -n "$exists" ]; then
echo "BigQuery Dataset $dataset already exists"
else
echo "Creating BigQuery Dataset $dataset"
bq mk $dataset
fi
}
# import data to big query
bq_safe_mk $DATASET
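# List the uploaded .gz chunks, group them $bulkfiles at a time with xargs,
# and join each group with commas so bq load receives several URIs per job.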
#echo "Check what's matching in the bucketi: $upload_folder$file_prefix"
filelist=`gsutil ls gs://$upload_folder$file_prefix*.gz | xargs -n$bulkfiles | tr ' ', ','`
echo "Loading files $filelist into Big Query"
for mylist in $filelist
do
# Check if we have a TIMESTAMP or DATE column, if we do then we can use partitioning
BQ_ARGS="--nosync -F \t --job_id=$file --max_bad_record=$maxbad"
if grep "TIMESTAMP" "$TABLE_NAME.json"; then
FIELD=$(grep "TIMESTAMP" "$TABLE_NAME.json" | tail -1 | cut -d'"' -f4)
echo "Partitioning Table based on TIMESTAMP column $FIELD"
BQ_ARGS+=" --time_partitioning_type=DAY --time_partitioning_field=${FIELD}"
elif grep "DATE" "$TABLE_NAME.json"; then
FIELD=$(grep "DATE" "$TABLE_NAME.json" | tail -1 | cut -d'"' -f4)
echo "Partitioning Table based on DATE column $FIELD"
BQ_ARGS+=" --time_partitioning_type=DAY --time_partitioning_field=${FIELD}"
fi
#bq load --nosync -F '\t' --job_id="$file" --max_bad_record=$maxbad $DATASET.$TABLE_NAME $mylist $TABLE_NAME.json
if ! bq load $BQ_ARGS "${DATASET}.${TABLE_NAME}" "$mylist" "${TABLE_NAME}.json"; then
echo "ERROR: bq load failed for $file, check file exists in cloud."
cleanup_files
exit 2
fi
done
echo "$0 completed. Cleaning up"
cleanup_files
exit
# Install google utilities (legacy standalone install)
wget http://commondatastorage.googleapis.com/pub/gsutil.tar.gz
tar xfz gsutil.tar.gz -C $HOME
# add gsutil to the PATH, e.g. by appending this line to ~/.bashrc:
export PATH=${PATH}:$HOME/gsutil
cd $HOME/gsutil
python setup.py install
gsutil config
# install the bq command-line tool
sudo sh
easy_install bigquery
bq init
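
# These days the simpler route is usually to install the Google Cloud SDK,
# which bundles both gsutil and bq (a sketch using Google's documented
# interactive installer):
curl https://sdk.cloud.google.com | bash
exec -l $SHELL
gcloud init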