Given the SAS data set WORK.P2000:
Location |
Pop2000 |
Alaska |
626931 |
Delaware |
783595 |
Vermont |
608826 |
Wyoming |
493782 |
and the SAS data set WORK.P2008:
State |
Pop2008 |
Alaska |
686293 |
Delaware |
873092 |
Wyoming |
532668 |
The following output is desired:
Obs |
State |
Pop2000 |
Pop2008 |
Difference |
1 |
Alaska |
626931 |
686293 |
59362 |
2 |
Delaware |
783595 |
873092 |
89497 |
3 |
Wyoming |
493782 |
532668 |
38886 |
Which SAS program correctly combines the data?
A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;
Check AnswerHide AnswerAnswer: C
注解:MERGE的作用是将来自于多个data set中的观测值合并为一个观测值(相当于SQL中的JOIN)。IN option用于生成一个布尔型变量,变量名为等号右边的变量(题目中分别为_a和_b),当Merge后的观测值有来自于当前data set的数据时,该变量的值为1,否则为0。比如题目中新data set的第一个观测值的State为Alaska,WORK.P2000和WORK.P2008中均有Alaska这个观测值,那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值(由于_a和_b并不会被输出到新的data set中,需要把其值赋值给新的变量,以下程序中分别为a和b):
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;
Obs |
State |
Pop2000 |
Pop2008 |
a |
b |
Difference |
1 |
Alaska |
626931 |
686293 |
1 |
1 |
59362 |
2 |
Delaware |
783595 |
873092 |
1 |
1 |
89497 |
3 |
Vermont |
608826 |
. |
1 |
0 |
. |
4 |
Wyoming |
493782 |
532668 |
1 |
1 |
38886 |
RENAME option的作用是重新对变量命名,格式为:RENAME = (旧变量名 = 新变量名)。
本题MERGE statement的语法格式,MERGE dataset1 dataset2 (option1 option2);
最后的IF statement则用于输出_a和_b都为1的那些观测值,即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。