SAS Base (52)

Given the SAS data set WORK.EMP_NAME:

Name EmpID
Jill 1864
Jack 2121
Joan 4698
John 5463

Given the SAS data set WORK.EMP_DEPT:

EmpID Department
2121 Accounti
3567 Finance
4698 Marketin
5463 Accounti

The following program is submitted:

data WORK.ALL;
merge WORK.EMP_NAME(in=Emp_N)
WORK.EMP_DEPT(in=Emp_D);
by Empid;
if (Emp_N and not Emp_D) or (Emp_D and not Emp_N);
run;

How many observations are in data set WORK.ALL after submitting the program?

A. 1
B. 2
C. 3
D. 5

Check Answer
Answer: B

注解:The new data set WORK.ALL only contains observation which exits in EMP_Name but not in EMP_DEPT, or the other way around. WORK.ALL is:

Obs Name EmpID Department
1 Jill 1864
2 3567 Finance

For the details of MERGE statement and IN option, please check SAS Base (28).

SAS Base (28)

Given the SAS data set WORK.P2000:

Location Pop2000
Alaska 626931
Delaware 783595
Vermont 608826
Wyoming 493782

and the SAS data set WORK.P2008:

State Pop2008
Alaska 686293
Delaware 873092
Wyoming 532668

The following output is desired:

Obs State Pop2000 Pop2008 Difference
1 Alaska 626931 686293 59362
2 Delaware 783595 873092 89497
3 Wyoming 493782 532668 38886

Which SAS program correctly combines the data?

A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

Check Answer
Answer: C

注解:MERGE的作用是将来自于多个data set中的观测值合并为一个观测值(相当于SQL中的JOIN)。IN option用于生成一个布尔型变量,变量名为等号右边的变量(题目中分别为_a和_b),当Merge后的观测值有来自于当前data set的数据时,该变量的值为1,否则为0。比如题目中新data set的第一个观测值的State为Alaska,WORK.P2000和WORK.P2008中均有Alaska这个观测值,那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值(由于_a和_b并不会被输出到新的data set中,需要把其值赋值给新的变量,以下程序中分别为a和b):
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;

Obs State Pop2000 Pop2008 a b Difference
1 Alaska 626931 686293 1 1 59362
2 Delaware 783595 873092 1 1 89497
3 Vermont 608826 . 1 0 .
4 Wyoming 493782 532668 1 1 38886

RENAME option的作用是重新对变量命名,格式为:RENAME = (旧变量名 = 新变量名)。

本题MERGE statement的语法格式,MERGE dataset1 dataset2 (option1 option2);

最后的IF statement则用于输出_a和_b都为1的那些观测值,即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。