SAS Base (28)

Given the SAS data set WORK.P2000:

Location	Pop2000
Alaska	626931
Delaware	783595
Vermont	608826
Wyoming	493782

and the SAS data set WORK.P2008:

State	Pop2008
Alaska	686293
Delaware	873092
Wyoming	532668

The following output is desired:

Obs	State	Pop2000	Pop2008	Difference
1	Alaska	626931	686293	59362
2	Delaware	783595	873092	89497
3	Wyoming	493782	532668	38886

Which SAS program correctly combines the data?

A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

Check Answer

Answer: C

注解：MERGE的作用是将来自于多个data set中的观测值合并为一个观测值（相当于SQL中的JOIN）。IN option用于生成一个布尔型变量，变量名为等号右边的变量（题目中分别为_a和_b），当Merge后的观测值有来自于当前data set的数据时，该变量的值为1，否则为0。比如题目中新data set的第一个观测值的State为Alaska，WORK.P2000和WORK.P2008中均有Alaska这个观测值，那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值（由于_a和_b并不会被输出到新的data set中，需要把其值赋值给新的变量，以下程序中分别为a和b）：
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;

Obs	State	Pop2000	Pop2008	a	b	Difference
1	Alaska	626931	686293	1	1	59362
2	Delaware	783595	873092	1	1	89497
3	Vermont	608826	.	1	0	.
4	Wyoming	493782	532668	1	1	38886

RENAME option的作用是重新对变量命名，格式为：RENAME = (旧变量名 = 新变量名)。

本题MERGE statement的语法格式，MERGE dataset1 dataset2 (option1 option2);

最后的IF statement则用于输出_a和_b都为1的那些观测值，即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。

7 thoughts to “SAS Base (28)”

Pingback: SAS Base (50) | SAS&R&Python
Pingback: SAS Base (52) | SAS&R&Python
Kyle says:

January 27, 2016 at 11:25 am

Why isn’t it if _a = _b but if _a and _b? I am still confused for if statement in this question since I think the statement “_a and _b” has no meaning. Can you explain this? Thank you in advance.

1. Harrison says:
  
  February 3, 2016 at 1:02 pm
  
  _a and _b will be either 0 or 1. When the WORK.P2008 contains current observation _a will be 1, and 0 otherwise. We only need observations that both datasets have, _a = 1 and _b = 1.
  
  if _a and _b is if _a = 1 and _b = 1 in shorthand.
  
  1. Bingwei says:
    
    July 17, 2016 at 5:58 am
    
    Will _a=_b be ambiguous? Since _a=0 and _b=0 also meet the condition.
    
    1. Harrison says:
      
      August 8, 2016 at 6:11 am
      
      Nice pickup. It’s a typo, should be _a and _b. Corrected, thanks.
      
2. Yi says:
  
  July 17, 2017 at 1:51 pm
  
  I guess here when _a and _b all equals 1, it means TRUE & TRUE, which again returns TRUE. And IF Statement basically needs TRUE or FALSE.

SAS Base (28)

Related

7 thoughts to “SAS Base (28)”

Leave a Reply to Yi Cancel reply

Share this:

Related

7 thoughts to “SAS Base (28)”

Leave a Reply to Yi Cancel reply