Given the SAS data set WORK.P2000:
Location | Pop2000 |
Alaska | 626931 |
Delaware | 783595 |
Vermont | 608826 |
Wyoming | 493782 |
and the SAS data set WORK.P2008:
State | Pop2008 |
Alaska | 686293 |
Delaware | 873092 |
Wyoming | 532668 |
The following output is desired:
Obs | State | Pop2000 | Pop2008 | Difference |
1 | Alaska | 626931 | 686293 | 59362 |
2 | Delaware | 783595 | 873092 | 89497 |
3 | Wyoming | 493782 | 532668 | 38886 |
Which SAS program correctly combines the data?
A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
注解:MERGE的作用是将来自于多个data set中的观测值合并为一个观测值(相当于SQL中的JOIN)。IN option用于生成一个布尔型变量,变量名为等号右边的变量(题目中分别为_a和_b),当Merge后的观测值有来自于当前data set的数据时,该变量的值为1,否则为0。比如题目中新data set的第一个观测值的State为Alaska,WORK.P2000和WORK.P2008中均有Alaska这个观测值,那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值(由于_a和_b并不会被输出到新的data set中,需要把其值赋值给新的变量,以下程序中分别为a和b):
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;
Obs | State | Pop2000 | Pop2008 | a | b | Difference |
1 | Alaska | 626931 | 686293 | 1 | 1 | 59362 |
2 | Delaware | 783595 | 873092 | 1 | 1 | 89497 |
3 | Vermont | 608826 | . | 1 | 0 | . |
4 | Wyoming | 493782 | 532668 | 1 | 1 | 38886 |
RENAME option的作用是重新对变量命名,格式为:RENAME = (旧变量名 = 新变量名)。
本题MERGE statement的语法格式,MERGE dataset1 dataset2 (option1 option2);
最后的IF statement则用于输出_a和_b都为1的那些观测值,即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。