Metras/gist:49af7f7be3de9bf2363cb79106d2821f

## gistfile1.txt
SCHEDULING AND COORDINATING OOZIE WORKFLOWS IN HADOOP

After you’ve created a set of workflows, you can use Oozie coordinator jobs to schedule when they run. There are two scheduling options: a specific time, or a specific time combined with the availability of data. Thanks to Dirk deRoos for this.

TIME-BASED SCHEDULING FOR OOZIE COORDINATOR JOBS
Oozie coordinator jobs can be scheduled to start at a specific time and, once started, to run at specified intervals. The following example shows a coordinator job that starts at a specified date and time and repeats daily until its end time. Because the start and end values shown are less than a day apart, this particular definition yields a single run:

<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="2014-06-01T00:01Z"
                 end="2014-06-01T01:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
   <controls>...</controls>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
      </workflow>
   </action>
</coordinator-app>

TIME AND DATA AVAILABILITY-BASED SCHEDULING FOR OOZIE COORDINATOR JOBS
Oozie coordinator jobs can also be scheduled to execute at a certain time if specified data files or directories are available. The following listing shows an example of a coordinator that starts running at a specified start time and date, is executed once a day if the data set identified by triggerDatasetDir exists, and runs until the specified end time:

<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
   <controls>...</controls>
   <datasets>
      <dataset name="input" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="${timeZoneDef}">
         <uri-template>${triggerDatasetDir}</uri-template>
      </dataset>
   </datasets>
   <input-events>
      <data-in name="sampleInput" dataset="input">
         <instance>${startTime}</instance>
      </data-in>
   </input-events>
   <action>
      <workflow>
         <app-path>${workflowAppPath}</app-path>
      </workflow>
   </action>
</coordinator-app>

RUNNING OOZIE COORDINATOR JOBS
Like Oozie workflow jobs, coordinator jobs require a job.properties file, and the coordinator.xml file must be stored in HDFS. To run a coordinator job from the Oozie command-line interface, issue a command like the following, making sure that the job.properties file is available on the local file system:


$ oozie job -config sampleCoordinator/job.properties -run
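
The command above assumes that sampleCoordinator/job.properties already exists. The following is a minimal sketch of how such a file might be created, not a definitive configuration. The property names startTime, endTime, timeZoneDef, triggerDatasetDir, and workflowAppPath are the parameters referenced in the coordinator listings above, and oozie.coord.application.path is the property that tells Oozie where the coordinator definition lives in HDFS. Every value (the NameNode address, the HDFS directories, and the dates) is a hypothetical placeholder.

```shell
#!/bin/sh
# All values below are placeholders (assumptions), not taken from the article.
mkdir -p sampleCoordinator

# The quoted 'EOF' keeps the shell from expanding ${...}; Oozie resolves
# those references itself when it reads the file.
cat > sampleCoordinator/job.properties <<'EOF'
nameNode=hdfs://namenode.example.com:8020
# HDFS directory that contains coordinator.xml
oozie.coord.application.path=${nameNode}/user/${user.name}/sampleCoordinator
# Parameters referenced by the coordinator definition
workflowAppPath=${nameNode}/user/${user.name}/sampleWorkflow
startTime=2014-06-01T00:01Z
endTime=2014-06-30T00:01Z
timeZoneDef=UTC
triggerDatasetDir=${nameNode}/user/${user.name}/input
EOF

# Copy coordinator.xml into HDFS, but only if the Hadoop client and the
# local file are both present. Relative HDFS paths resolve to the user's
# HDFS home directory.
if command -v hdfs >/dev/null 2>&1 && [ -f sampleCoordinator/coordinator.xml ]; then
  hdfs dfs -mkdir -p sampleCoordinator
  hdfs dfs -put -f sampleCoordinator/coordinator.xml sampleCoordinator/
fi
```

The job.properties file stays on the local file system; only coordinator.xml and the workflow application it points to need to be in HDFS.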

After you submit the job, the coordinator is stored in the Oozie object database. On submission, Oozie returns an identifier that you can use to monitor and administer the coordinator, for example job: 0000001-00000001234567-oozie-C.

To check the status of this job, run the following command:

$ oozie job -info 0000001-00000001234567-oozie-C
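
Beyond -info, the same identifier can be passed to the -suspend, -resume, and -kill options of oozie job to administer the coordinator. The sketch below wraps these in a small helper, coord_cmd, which is a hypothetical name introduced here for illustration. It only prints the command line so that you can review it before running it, and the job ID is the sample one from the text above.

```shell
#!/bin/sh
# coord_cmd is a hypothetical helper, not part of Oozie. It prints the
# oozie command for a given action and job ID without executing it.
coord_cmd() {
  case "$1" in
    info|suspend|resume|kill)
      printf 'oozie job -%s %s\n' "$1" "$2"
      ;;
    *)
      echo "unsupported action: $1" >&2
      return 1
      ;;
  esac
}

# Sample ID from the text above; substitute the ID Oozie returned to you.
JOB_ID="0000001-00000001234567-oozie-C"

coord_cmd info "$JOB_ID"
coord_cmd suspend "$JOB_ID"
coord_cmd resume "$JOB_ID"
```

To run a printed command, paste it into a shell on a machine where the oozie client is installed. If no -oozie option is given, the client reads the server address from the OOZIE_URL environment variable.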