SAS Base (28)

Given the SAS data set WORK.P2000:

Location Pop2000
Alaska 626931
Delaware 783595
Vermont 608826
Wyoming 493782

and the SAS data set WORK.P2008:

State Pop2008
Alaska 686293
Delaware 873092
Wyoming 532668

The following output is desired:

Obs State Pop2000 Pop2008 Difference
1 Alaska 626931 686293 59362
2 Delaware 783595 873092 89497
3 Wyoming 493782 532668 38886

Which SAS program correctly combines the data?

A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

Check Answer
Answer: C

注解:MERGE的作用是将来自于多个data set中的观测值合并为一个观测值(相当于SQL中的JOIN)。IN option用于生成一个布尔型变量,变量名为等号右边的变量(题目中分别为_a和_b),当Merge后的观测值有来自于当前data set的数据时,该变量的值为1,否则为0。比如题目中新data set的第一个观测值的State为Alaska,WORK.P2000和WORK.P2008中均有Alaska这个观测值,那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值(由于_a和_b并不会被输出到新的data set中,需要把其值赋值给新的变量,以下程序中分别为a和b):
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;

Obs State Pop2000 Pop2008 a b Difference
1 Alaska 626931 686293 1 1 59362
2 Delaware 783595 873092 1 1 89497
3 Vermont 608826 . 1 0 .
4 Wyoming 493782 532668 1 1 38886

RENAME option的作用是重新对变量命名,格式为:RENAME = (旧变量名 = 新变量名)。

本题MERGE statement的语法格式,MERGE dataset1 dataset2 (option1 option2);

最后的IF statement则用于输出_a和_b都为1的那些观测值,即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。

7 thoughts to “SAS Base (28)”

  1. Why isn’t it if _a = _b but if _a and _b? I am still confused for if statement in this question since I think the statement “_a and _b” has no meaning. Can you explain this? Thank you in advance.

    1. _a and _b will be either 0 or 1. When the WORK.P2008 contains current observation _a will be 1, and 0 otherwise. We only need observations that both datasets have, _a = 1 and _b = 1.

      if _a and _b is if _a = 1 and _b = 1 in shorthand.

    2. I guess here when _a and _b all equals 1, it means TRUE & TRUE, which again returns TRUE. And IF Statement basically needs TRUE or FALSE.

Leave a Reply to Yi Cancel reply

Your email address will not be published. Required fields are marked *