Long H. Nguyen bio photo

Long H. Nguyen

Knowledge worths sharing!

Email

This post will provide basics about bash scripts, quick guide for running Hive in command lines, and essential tips with example for running Hive script on servers with input parameters and loop.

Bash Scripting Basic

Builtin shell variables

$0             #Name of this shell script itself.
$1             #Value of first command line parameter (similarly $2, $3, etc)
$#             #In a shell script, the number of command line parameters.
$*             #All of the command line parameters.
$-             #Options given to the shell.
$?             #Return the exit status of the last command.
$$             #Process id of script (really id of the shell running the script)

IF and CASE condition statements

if [ "$VAR1" = "$VAR2" ]; then
	echo "expression evaluated as true"
else
	echo "expression evaluated as false"
fi
case "$C" in
"1")
	do_this()
	;;
"2" | "3")
	do_what_you_are_supposed_to_do()
	;;
*)    #otherwise
	do_nothing()
	;;
esac

FOR loop

for i in 1 2 3 4 5    # can also be written for i in {1..5} or {start..end..increment}
do
	echo "Welcome $i times"
done

WHILE loop

while [ condition ]
do
	command
done

Hive CLI (Command Line Interface)

Run Query

hive -e 'select a.col from tab1 a'

Run Non-Interactive Script

hive -f script.sql

Run script inside shell

source file_name

Setting Configuration Property for current Hive

hive --hiveconf year_mm=201808 -f script.sql

Run a Hive Script Even after You Logout

Nohup is very helpful when you have to execute a shell-script or command that take a long time to finish. In that case, you don’t want to be connected to the shell and waiting for the command to complete. Instead, execute it with nohup, exit the shell and continue with your other work.

nohup hive --hiveconf year_mm=201808 -f script.sql 

By default, the standard output will be redirected to nohup.out file in the current directory. And the standard error will be redirected to stdout, thus it will also go to nohup.out. So, your nohup.out will contain both standard output and error messages from the script that you’ve executed using nohup command.

Bash Scripting for Hive

Suppose that your Hive script needs to loop over several months (or years) and requires two parameters, year_mm and pre_year_mm. The Hive script is named as my_hive_script.hql

Below bash script can help to execute your Hive script over the months and notify you by sending an email when the job is done.

EMAIL="nlong@gmail.com"

echo "CMI: Finding common accounts within same months of two years ...."

yyyymm=(201808 201807 201806 201805 201804 201803 201802)
prev_yyyymm=(201807 201806 201805 201804 201803 201802 201801)
for((i=0;i<${#yyyymm[@]};i++))
do
  echo "Running for month: ${yyyymm[i]}, previous month: ${prev_yyyymm[i]}"
  hive --hiveconf year_mm=${yyyymm[i]} --hiveconf pre_year_mm=${prev_yyyymm[i]} -f my_hive_script.hql
done
    
mail -s "Script $0 completed." $EMAIL 
exit

Save the above example as my_hive_script.sh. Remember to use nohup when executing the script

nohup ./my_hive_script.sh

Bugs when switch from Window to Unix

Suppose you are using a Window system and want to execute your scripts on your Unix servers. You may have some bugs of mismatch decodes between the two systesms (it took me a few hours to figure out the bug.) To fix it, you can use Notepad++ to convert the format as follow:

  • Notepad++ > Edit > Unix (LF)

Then, save it and copy to your Unix server before running commands.