Skip to content

Instantly share code, notes, and snippets.

@Cesar-Urteaga
Created June 22, 2017 23:45
Show Gist options
  • Save Cesar-Urteaga/0d546154fa7303b607a8804f035838ea to your computer and use it in GitHub Desktop.
Save Cesar-Urteaga/0d546154fa7303b607a8804f035838ea to your computer and use it in GitHub Desktop.
Compares two tables, record by record, based on a list of ID variables.
/*-----------------------------------------------------------------------------|
| Description : Macro that compares two tables, record by record, based on a |
| list of ID variables. |
| Assumptions : Each table has at least one ID variable, and ID variables must |
| have the same name between tables. |
| N.B.: This macro is based on the suggestion made in the |
| excellent book "Carpenter's Guide to Innovative SAS |
| Techniques" by Art Carpenter, pages 198 through 200. |
| Parameters : BaseTable - Is the reference table. |
| ReferenceTable - Corresponds to the table that is expected |
| to be the same as the reference table. |
| DifferencesTable - The name of the output table that will |
| contain the differences found between both |
| tables. |
| IDVariables - A list of variables that allows to map each |
| record between tables. |
| Output : A table with the differences found between records and based |
| on the ID variables. |
|-----------------------------------------------------------------------------*/
/* Examples:
* We create the sample datasets. ;
DATA BASE_TABLE;
INFILE DATALINES
DSD DLM = ','
MISSOVER;
LENGTH ID
X $2;
INPUT ID $
X $
Y;
DATALINES;
01,AA,100
02,BB,200
03,CC,20.5
04,DD,800
;
RUN;
DATA REFERENCE_TABLE;
INFILE DATALINES
DSD DLM = ','
MISSOVER;
LENGTH ID
X $2;
INPUT ID $
X $
Y;
DATALINES;
01,AA,100
02,B,200
03,CC,8.5
04,DD,800
;
RUN;
* Ex_01;
%MCompareDataSets(BaseTable = BASE_TABLE,
ReferenceTable = REFERENCE_TABLE,
DifferencesTable = DIFFERENCES_TABLE,
IDVariables = ID)
* Output: Creates a table 'DIFFERENCES_TABLE' with the records and variables
that were found to be different (i.e., records 02 and 03 and variables
X and Y, respectively). ;
*/
/*-----------------------------------------------------------------------------|
| Date Author Description |
|------------------------------------------------------------------------------|
| June 22, 2017 Cesar R. Urteaga-Reyesvera Creation. |
|-----------------------------------------------------------------------------*/
%MACRO MCompareDataSets(BaseTable = /* Reference table. */,
ReferenceTable = /* New table which is expected to be
the same as the base table. */,
DifferencesTable = /* Table with the differences
between the above tables. */,
IDVariables = /* A list of variables to identify
de records that must be compared.
*/
);
* Since we use the ID variables, we need to sort them in order to compare
them. ;
PROC SORT DATA = &BaseTable.; BY &IDVariables.; RUN;
PROC SORT DATA = &ReferenceTable.; BY &IDVariables.; RUN;
PROC COMPARE BASE = &BaseTable.
COMPARE = &ReferenceTable.
OUT = TEMPORAL_01
OUTBASE OUTCOMP
NOVALUES LISTVAR
OUTNOEQUAL; /* We only care about the values which are different
between both tables. */
ID &IDVariables;
RUN;
* We change the table from wide format to long format. ;
PROC SORT DATA = TEMPORAL_01;
BY &IDVariables.
_OBS_;
RUN;
PROC TRANSPOSE DATA = TEMPORAL_01
OUT = TEMPORAL_02(DROP = _LABEL_
RENAME = (_NAME_ = VARIABLE));
BY &IDVariables.
_OBS_;
* Allows to list all variables so as to compare them. ;
VAR _NUMERIC_
_CHARACTER_;
* This statement is used to name the two new columns. ;
ID _TYPE_;
RUN;
* Finally, we get the output table with the differences and delete the
temporal tables. ;
DATA &DifferencesTable.;
SET TEMPORAL_02(WHERE = (VARIABLE ~= "_TYPE_" &
BASE ~= COMPARE)); /* We only want the
variables that are
different. */
RUN;
PROC SQL; DROP TABLE TEMPORAL_01, TEMPORAL_02; QUIT;
%MEND MCompareDataSets;
@Cesar-Urteaga
Copy link
Author

Cesar-Urteaga commented Jun 29, 2017

This snippet runs with errors if the two tables are equal; therefore, I will add to my to-do list the consideration of this event.

@vinzibest
Copy link

Any update for this script, in case the 2 tables are equal?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment