Created
June 22, 2017 23:45
-
-
Save Cesar-Urteaga/0d546154fa7303b607a8804f035838ea to your computer and use it in GitHub Desktop.
Compares two tables, record by record, based on a list of ID variables.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
/*-----------------------------------------------------------------------------| | |
| Description : Macro that compares two tables, record by record, based on a | | |
| list of ID variables. | | |
| Assumptions : Each table has at least one ID variable, and ID variables must | | |
| have the same name between tables. | | |
| N.B.: This macro is based on the suggestion made in the | | |
| excellent book "Carpenter's Guide to Innovative SAS | | |
| Techniques" by Art Carpenter, pages 198 through 200. | | |
| Parameters : BaseTable - Is the reference table. | | |
| ReferenceTable - Corresponds to the table that is expected | | |
| to be the same as the reference table. | | |
| DifferencesTable - The name of the output table that will | | |
| contain the differences found between both | | |
| tables. | | |
| IDVariables - A list of variables that allows to map each | | |
| record between tables. | | |
| Output : A table with the differences found between records and based | | |
| on the ID variables. | | |
|-----------------------------------------------------------------------------*/ | |
/* Examples: | |
* We create the sample datasets. ; | |
DATA BASE_TABLE; | |
INFILE DATALINES | |
DSD DLM = ',' | |
MISSOVER; | |
LENGTH ID | |
X $2; | |
INPUT ID $ | |
X $ | |
Y; | |
DATALINES; | |
01,AA,100 | |
02,BB,200 | |
03,CC,20.5 | |
04,DD,800 | |
; | |
RUN; | |
DATA REFERENCE_TABLE; | |
INFILE DATALINES | |
DSD DLM = ',' | |
MISSOVER; | |
LENGTH ID | |
X $2; | |
INPUT ID $ | |
X $ | |
Y; | |
DATALINES; | |
01,AA,100 | |
02,B,200 | |
03,CC,8.5 | |
04,DD,800 | |
; | |
RUN; | |
* Ex_01; | |
%MCompareDataSets(BaseTable = BASE_TABLE, | |
ReferenceTable = REFERENCE_TABLE, | |
DifferencesTable = DIFFERENCES_TABLE, | |
IDVariables = ID) | |
* Output: Creates a table 'DIFFERENCES_TABLE' with the records and variables | |
that were found to be different (i.e., records 02 and 03 and variables | |
X and Y, respectively). ; | |
*/ | |
/*-----------------------------------------------------------------------------| | |
| Date Author Description | | |
|------------------------------------------------------------------------------| | |
| June 22, 2017 Cesar R. Urteaga-Reyesvera Creation. | | |
|-----------------------------------------------------------------------------*/ | |
%MACRO MCompareDataSets(BaseTable = /* Reference table. */, | |
ReferenceTable = /* New table which is expected to be | |
the same as the base table. */, | |
DifferencesTable = /* Table with the differences | |
between the above tables. */, | |
IDVariables = /* A list of variables to identify | |
de records that must be compared. | |
*/ | |
); | |
* Since we use the ID variables, we need to sort them in order to compare | |
them. ; | |
PROC SORT DATA = &BaseTable.; BY &IDVariables.; RUN; | |
PROC SORT DATA = &ReferenceTable.; BY &IDVariables.; RUN; | |
PROC COMPARE BASE = &BaseTable. | |
COMPARE = &ReferenceTable. | |
OUT = TEMPORAL_01 | |
OUTBASE OUTCOMP | |
NOVALUES LISTVAR | |
OUTNOEQUAL; /* We only care about the values which are different | |
between both tables. */ | |
ID &IDVariables; | |
RUN; | |
* We change the table from wide format to long format. ; | |
PROC SORT DATA = TEMPORAL_01; | |
BY &IDVariables. | |
_OBS_; | |
RUN; | |
PROC TRANSPOSE DATA = TEMPORAL_01 | |
OUT = TEMPORAL_02(DROP = _LABEL_ | |
RENAME = (_NAME_ = VARIABLE)); | |
BY &IDVariables. | |
_OBS_; | |
* Allows to list all variables so as to compare them. ; | |
VAR _NUMERIC_ | |
_CHARACTER_; | |
* This statement is used to name the two new columns. ; | |
ID _TYPE_; | |
RUN; | |
* Finally, we get the output table with the differences and delete the | |
temporal tables. ; | |
DATA &DifferencesTable.; | |
SET TEMPORAL_02(WHERE = (VARIABLE ~= "_TYPE_" & | |
BASE ~= COMPARE)); /* We only want the | |
variables that are | |
different. */ | |
RUN; | |
PROC SQL; DROP TABLE TEMPORAL_01, TEMPORAL_02; QUIT; | |
%MEND MCompareDataSets; |
Any update for this script, in case the 2 tables are equal?
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
This snippet runs with errors if the two tables are equal; therefore, I will add to my to-do list the consideration of this event.