Created
April 11, 2019 19:10
How to schedule and coordinate OOZIE Workflows in Hadoop
SCHEDULING AND COORDINATING OOZIE WORKFLOWS IN HADOOP
After you’ve created a set of workflows, you can use Oozie coordinator jobs to schedule when they’re executed. You have two scheduling options: run at a specific time, or run at a specific time only when the required data is available. Thanks to Dirk deRoos for this.
TIME-BASED SCHEDULING FOR OOZIE COORDINATOR JOBS
Oozie coordinator jobs can be scheduled to start at a certain time and, after they’re started, to run at specified intervals. The following example shows a coordinator job that starts running at a specified start time and date. Because the end time in this example falls less than a day after the start time, the daily frequency yields a single run:
<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="2014-06-01T00:01Z"
                 end="2014-06-01T01:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <controls>...</controls>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
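As a sketch of how the frequency and the start/end window interact, the variant below uses coord:hours(6) with a one-month window, so actions would be scheduled four times a day throughout June 2014. The coordinator name and the dates are made up for illustration, ${workflowAppPath} is the same property used above, and the optional controls element is left out.

```xml
<coordinator-app name="sampleSixHourlyCoordinator"
                 frequency="${coord:hours(6)}"
                 start="2014-06-01T00:00Z"
                 end="2014-07-01T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.1">
  <!-- Hypothetical name and dates. One action is scheduled every 6 hours
       from the start time until the end time is reached. -->
  <action>
    <workflow>
      <!-- HDFS path of the workflow application, supplied as a job property. -->
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
```

Other interval functions such as coord:minutes(n) and coord:months(n) can be used in the frequency attribute in the same way.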
TIME AND DATA AVAILABILITY-BASED SCHEDULING FOR OOZIE COORDINATOR JOBS
Oozie coordinator jobs can also be scheduled to execute at a certain time if specified data files or directories are available. The following listing shows an example of a coordinator that starts running at a specified start time and date, is executed once a day if the data set identified by triggerDatasetDir exists, and runs until the specified end time:
<coordinator-app name="sampleCoordinator"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
  <controls>...</controls>
  <datasets>
    <dataset name="input" frequency="${coord:days(1)}" initial-instance="${startTime}" timezone="${timeZoneDef}">
      <uri-template>${triggerDatasetDir}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="sampleInput" dataset="input">
      <instance>${startTime}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
    </workflow>
  </action>
</coordinator-app>
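The listing above waits on a single fixed directory. A common variation, sketched here under assumptions, is a dataset that produces one directory per day, so that each daily action waits for its own day's data. The per-day directory layout under ${triggerDatasetDir}, the coordinator name, and the inputDir workflow parameter are hypothetical; the optional controls element is left out.

```xml
<coordinator-app name="sampleDatedCoordinator"
                 frequency="${coord:days(1)}"
                 start="${startTime}"
                 end="${endTime}"
                 timezone="${timeZoneDef}"
                 xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="input" frequency="${coord:days(1)}"
             initial-instance="${startTime}" timezone="${timeZoneDef}">
      <!-- Hypothetical layout: one directory per day; YEAR, MONTH and DAY
           are filled in by Oozie for each dataset instance. -->
      <uri-template>${triggerDatasetDir}/${YEAR}/${MONTH}/${DAY}</uri-template>
      <!-- An instance counts as available once this file exists in its directory. -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="sampleInput" dataset="input">
      <!-- current(0) refers to the dataset instance for the action's own nominal time. -->
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppPath}</app-path>
      <configuration>
        <property>
          <!-- inputDir is a hypothetical parameter name read by the workflow. -->
          <name>inputDir</name>
          <!-- Resolves to the URI of the dataset instance that triggered this action. -->
          <value>${coord:dataIn('sampleInput')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```

Passing ${coord:dataIn('sampleInput')} into the workflow configuration lets the workflow read the directory that triggered it instead of hard-coding a path.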
RUNNING OOZIE COORDINATOR JOBS
Similar to Oozie workflow jobs, coordinator jobs require a job.properties file, and the coordinator.xml file needs to be loaded into HDFS. To run an Oozie coordinator job from the Oozie command-line interface, issue a command like the following while ensuring that the job.properties file is locally accessible:
$ oozie job -config sampleCoordinator/job.properties -run
After you submit the job, the coordinator is stored in the Oozie object database. On submission, Oozie returns an identifier that you can use to monitor and administer your coordinator, for example job: 0000001-00000001234567-oozie-C.
To check the status of this job, run the command:
$ oozie job -info 0000001-00000001234567-oozie-C
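For reference, the properties that the coordinator definitions above expect could be supplied along the following lines. This is only a sketch: the host name, paths, and dates are hypothetical, and it uses the XML form of the job configuration (the -config option is documented as accepting either a .properties or an .xml file) to stay in the same notation as the listings above. The oozie.coord.application.path property tells Oozie where coordinator.xml is stored in HDFS.

```xml
<configuration>
  <!-- HDFS directory that holds coordinator.xml (hypothetical host and path). -->
  <property>
    <name>oozie.coord.application.path</name>
    <value>hdfs://namenode.example.com:8020/user/sample/sampleCoordinator</value>
  </property>
  <!-- HDFS directory of the workflow application that the coordinator runs. -->
  <property>
    <name>workflowAppPath</name>
    <value>hdfs://namenode.example.com:8020/user/sample/sampleWorkflow</value>
  </property>
  <!-- Scheduling window and time zone referenced by the coordinator. -->
  <property>
    <name>startTime</name>
    <value>2014-06-01T00:01Z</value>
  </property>
  <property>
    <name>endTime</name>
    <value>2014-06-30T00:01Z</value>
  </property>
  <property>
    <name>timeZoneDef</name>
    <value>UTC</value>
  </property>
  <!-- Data location whose availability triggers the coordinator action. -->
  <property>
    <name>triggerDatasetDir</name>
    <value>hdfs://namenode.example.com:8020/data/trigger</value>
  </property>
</configuration>
```

In a job.properties file, each of these entries would instead be a single name=value line.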