@beugley
Last active November 7, 2022 15:23
SAS macro to improve performance/efficiency via parallel processing
The run-time performance of a SAS process can often be improved substantially with parallel execution.
This gist describes an approach in which your input data set(s) are divided into N subsets of approximately equal size and your code is
executed in parallel against each subset. For information on other methods of parallel processing, please see this page
from SAS Support: http://support.sas.com/rnd/scalability/tricks/connect.html
Step 1
Divide your input data set(s) into N subsets that are approximately equal in size. The following macro shows one way to
do this.
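%macro DIVIDE_INPUT_DATA(N);
/*
** Divide MYLIB.my_data into N data sets (MYLIB.my_subset_1 through MYLIB.my_subset_&N).
** Records are dealt out round-robin by observation number (_N_).
*/
%local I;
data %do I = 1 %to &N; MYLIB.my_subset_&I %end;;
set MYLIB.my_data;
%do I = 1 %to &N-1;
%if &I > 1 %then %do; else %end;
if mod(_N_,&N) = (&I-1) then output MYLIB.my_subset_&I;
%end;
/* Emit the final "else" only when an "if" precedes it (N > 1). */
%if &N > 1 %then %do; else %end;
output MYLIB.my_subset_&N;
run;
%mend DIVIDE_INPUT_DATA;
/* &N is the number of subsets and must already be defined. */
%DIVIDE_INPUT_DATA(&N);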
You could also divide your input data on an element that is already part of your source data (instead of the ordinal
record number), although that may not produce equal-sized data sets. Choose whichever method of creating data
subsets you think is most effective for your data.
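As a rough sketch of dividing on an existing data element, the macro below assigns each record to a subset from the remainder of a numeric key. The macro name DIVIDE_BY_KEY and the column name customer_id are hypothetical and not part of the original gist; substitute a key from your own data, and note the sketch assumes that key holds integer values.

```sas
%macro DIVIDE_BY_KEY(N);
%local I;
/*
** Hypothetical example: split MYLIB.my_data on a numeric key column.
** customer_id is an assumed column name; replace it with your own key.
*/
data %do I = 1 %to &N; MYLIB.my_subset_&I %end;;
set MYLIB.my_data;
select (mod(customer_id, &N));
%do I = 1 %to &N;
when (%eval(&I-1)) output MYLIB.my_subset_&I;
%end;
/* Missing or negative keys fall through to the last subset. */
otherwise output MYLIB.my_subset_&N;
end;
run;
%mend DIVIDE_BY_KEY;
%DIVIDE_BY_KEY(&N);
```

All records that share a key value land in the same subset, which helps when later steps group or merge by that key. The subset sizes depend on how the key values are distributed, so they may be uneven.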
Step 2
Write your code so that it will execute on a specific subset of your data. For example:
data MYLIB.my_out_&PROCESS_NUM;
set MYLIB.my_subset_&PROCESS_NUM;
* Add your code here;
run;
Your code will typically contain many data steps and procs. Be sure that every step is written to execute on the specific subset
of data identified by the macro variable &PROCESS_NUM, which holds the subset number. You may also have
multiple input data sets. In that case, each input data set must be divided into N subsets, with corresponding subsets sharing the
same value of &PROCESS_NUM.
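With more than one input, each session reads the matching subset of every input. The sketch below is one possible shape, under several assumptions that are not in the original gist: a second input divided into MYLIB.my_other_subset_1 through MYLIB.my_other_subset_&N, a key column named my_key on which both inputs were divided (so matching records share a subset number), and subsets that are already sorted by that key.

```sas
/*
** Hypothetical second input: MYLIB.my_other_subset_&PROCESS_NUM.
** my_key is an assumed key column; both subsets must be sorted by it.
*/
data MYLIB.my_out_&PROCESS_NUM;
merge MYLIB.my_subset_&PROCESS_NUM (in=IN_MAIN)
MYLIB.my_other_subset_&PROCESS_NUM;
by my_key;
/* Keep only records that exist in the main subset. */
if IN_MAIN;
* Add your code here;
run;
```

If the two inputs are divided by different rules (for example, one by record number and one by key), matching records can end up in different subsets and the merge will silently miss them.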
Step 3
Execute your SAS code in N parallel SAS sessions. The following macro, %Exec_Asynch, will do this for you. Define
this macro and include your code where the comment says "Include code here that you want to run in parallel". Then
execute %Exec_Asynch with an argument that specifies the number of data subsets. %Exec_Asynch will start one SAS
session for each subset, submitting it to the grid unless you pass GRID_ENABLE=NO. The macro variable &PROCESS_NUM is
defined with a different value in each SAS session. You can use %Exec_Asynch in either batch SAS or SAS Enterprise Guide
(SAS-EG). Batch SAS is recommended for long-running code.
%macro Exec_Asynch(PROCESS_COUNT, GRID_ENABLE=YES);
%put %sysfunc(putn(%sysfunc(datetime()),datetime19.)) Started;
/*
** Submit each SAS session to the grid by default.
*/
%if ("&GRID_ENABLE" = "YES") %then
%do;
%let rc=%sysfunc(grdsvc_enable(_ALL_, server=SASApp%str(;) jobopts=sambatch));
%let sambatch='queue=sambatch';
%end;
/*
** Instantiate each child SAS session.
*/
%do PROCESS_NUM = 1 %to &PROCESS_COUNT;
signon task_&PROCESS_NUM sascmd="sas" signonwait=yes;
%end;
%put %sysfunc(putn(%sysfunc(datetime()),datetime19.)) Connections established;
/*
** Submit code asynchronously to each SAS session.
*/
%do PROCESS_NUM = 1 %to &PROCESS_COUNT;
%syslput PROCESS_NUM=&PROCESS_NUM/REMOTE=task_&PROCESS_NUM;
rsubmit task_&PROCESS_NUM connectwait=no;
options errorcheck=strict;
%nrstr(%put SAS SESSION &PROCESS_NUM);
/*
** Include code here that you want to run in parallel.
*/
%include "/prod/user/sam/…/…/…/my_code.sas";
%nrstr(%sysrput TASK_&PROCESS_NUM._SYSCC=&SYSCC);
endrsubmit;
%end;
/*
** Wait for all SAS sessions to complete.
*/
waitfor _ALL_
%do PROCESS_NUM = 1 %to &PROCESS_COUNT;
task_&PROCESS_NUM
%end;
;
signoff _ALL_;
/*
** Check status of each SAS session and abort if any failed.
** 0 = success, 4 = warning, all others = failure.
*/
%do PROCESS_NUM = 1 %to &PROCESS_COUNT;
%put TASK_&PROCESS_NUM._SYSCC = &&TASK_&PROCESS_NUM._SYSCC;
%if (&&TASK_&PROCESS_NUM._SYSCC ^= 0 and &&TASK_&PROCESS_NUM._SYSCC ^= 4) %then
%do;
%abort return 2;
%end;
%end;
%put %sysfunc(putn(%sysfunc(datetime()),datetime19.)) Completed;
%mend Exec_Asynch;
/* Execute N concurrent SAS sessions */
%Exec_Asynch(&N);
Step 4
Each of the parallel SAS sessions will produce output for its data subset. You will probably need additional code
to collect and combine that data for subsequent use. Add and execute that code as necessary.
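A minimal sketch of that collection step, assuming each session wrote MYLIB.my_out_&PROCESS_NUM as in Step 2. The macro name COMBINE_OUTPUT and the combined data set name MYLIB.my_out_all are hypothetical.

```sas
%macro COMBINE_OUTPUT(N);
%local I;
/*
** Concatenate MYLIB.my_out_1 through MYLIB.my_out_&N, in that order.
** MYLIB.my_out_all is a hypothetical name for the combined data set.
*/
data MYLIB.my_out_all;
set %do I = 1 %to &N; MYLIB.my_out_&I %end;;
run;
%mend COMBINE_OUTPUT;
%COMBINE_OUTPUT(&N);
```

Run this only after %Exec_Asynch has completed, so that every output data set exists.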
Sort order is an important consideration. You can divide your input data so that each subset
is sorted by a key column and appending the subsets in order retains the overall sort order. This differs
from the example above, which uses the ordinal record number to create subsets. Creating the subsets in sorted order means
you do not have to re-sort after appending the output data sets at the end. Avoiding that sort can
be a significant performance improvement.
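One way to get subsets that append back in sorted order is to give each subset a contiguous block of records rather than every Nth record. The sketch below is a hypothetical variant of the Step 1 macro; it assumes MYLIB.my_data is already sorted by your key column, is a native SAS data set (so that NOBS= is available), and has no existing variable named BLOCK.

```sas
%macro DIVIDE_SORTED_DATA(N);
%local I;
/*
** Hypothetical variant of Step 1: MYLIB.my_data is assumed to be already
** sorted by the key column. Each subset receives one contiguous block of
** records, so subset 1 holds the lowest keys and subset &N the highest.
*/
data %do I = 1 %to &N; MYLIB.my_subset_&I %end;;
set MYLIB.my_data nobs=TOTAL;
/* Block size is ceil(TOTAL/&N); the last block may be smaller. */
BLOCK = ceil(_N_ / ceil(TOTAL / &N));
select (BLOCK);
%do I = 1 %to &N;
when (&I) output MYLIB.my_subset_&I;
%end;
otherwise;
end;
drop BLOCK;
run;
%mend DIVIDE_SORTED_DATA;
%DIVIDE_SORTED_DATA(&N);
```

Because the blocks are cut by record count, records with the same key value can straddle a block boundary. The combined output keeps the overall order only if each session's output stays sorted and the outputs are appended in subset-number order.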