@Cesar-Urteaga
Created March 7, 2017 03:57
Split a dataset into smaller tables.
/*-----------------------------------------------------------------------------|
| Description : Macro that splits a dataset into smaller datasets.             |
| Assumptions : This macro is based on the paper "Splitting a Large SAS Data   |
|               Set" by John R. Gerlach and Simant Misra.                      |
| Parameters  : InputDataset     - Table to be split. It can include the       |
|                                  library.                                    |
|               NumberOfDatasets - Number of split tables (default: 2).        |
|               OutputDatasets   - Prefix of the split tables. It can include  |
|                                  the library (default: WORK._DS_).           |
| Output      : A set of split tables created from the input table.            |
|-----------------------------------------------------------------------------*/
/* Example:
* Creates the table needed to run the examples. ;
DATA DUMMY_TABLE;
  DO I = 1 TO 48;
    OUTPUT;
  END;
RUN;
* Ex_01;
%MSplitDataset(InputDataset = DUMMY_TABLE);
* Output: Splits the DUMMY_TABLE into two tables (WORK._DS_001 and WORK._DS_002)
each one with 24 records. ;
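* Ex_02 - Illustrative sketch of an uneven split. The prefix WORK.PART_ is a
  hypothetical name chosen for this example only. ;
%MSplitDataset(InputDataset = DUMMY_TABLE, NumberOfDatasets = 5,
               OutputDatasets = WORK.PART_);
* Output: The split point is INT(48 / 5) + 1 = 10, so WORK.PART_001 to
  WORK.PART_004 should hold 10 records each and WORK.PART_005 the remaining
  8. When the input does not divide evenly, the last tables hold fewer
  records. ;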
*/
/*-----------------------------------------------------------------------------|
| Date            Author                         Description                   |
|------------------------------------------------------------------------------|
| March 06, 2016  Cesar R. Urteaga-Reyesvera     Creation.                     |
|-----------------------------------------------------------------------------*/
%MACRO MSplitDataset(InputDataset     =           /* Input table. It can include
                                                     the library. */,
                     NumberOfDatasets = 2         /* Number of tables into which
                                                     the input table is split. */,
                     OutputDatasets   = WORK._DS_ /* Prefix of the split tables.
                                                     It can include the library.
                                                  */
                    );
  %LOCAL _I;  /* Keep the loop counter out of the caller's scope. */
  DATA %DO _I = 1 %TO &NumberOfDatasets.;
         &OutputDatasets.%SYSFUNC(PUTN(&_I., Z3.))(DROP = _SPLITPOINT)
       %END;
       ;
    SET &InputDataset. NOBS = _NOBS;
    RETAIN _SPLITPOINT;
    /* Calculate the split point: the ceiling of _NOBS / NumberOfDatasets. */
    IF _N_ = 1 THEN
      _SPLITPOINT = INT(_NOBS / &NumberOfDatasets.) +
                    (MOD(_NOBS, &NumberOfDatasets.) ~= 0);
    /* Send each record to its designated dataset. */
    IF _N_ <= _SPLITPOINT THEN
      OUTPUT &OutputDatasets.001;
    %DO _I = 2 %TO &NumberOfDatasets.;
      ELSE IF _N_ <= (&_I. * _SPLITPOINT) THEN
        OUTPUT &OutputDatasets.%SYSFUNC(PUTN(&_I., Z3.));
    %END;
  RUN;
%MEND MSplitDataset;