SAS Base (1)

The following SAS program is submitted:

data WORK.TOTAL;
set WORK.SALARY;
by Department Gender;
if First.<_insert_code_> then Payroll=0;
Payroll+Wagerate;
if Last.<_insert_code_>;
run;

The SAS data set WORK.SALARY is currently ordered by Gender within Department.

Which inserted code will accumulate subtotals for each Gender within Department?
A. Gender
B. Department
C. Gender Department
D. Department Gender

Answer: A

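Explanation: SAS uses FIRST.variable and LAST.variable to detect the start and end of a BY group. FIRST.variable is set to 1 when SAS reads the first observation in a BY group and to 0 otherwise. LAST.variable is set to 1 when SAS reads the last observation in a BY group and to 0 otherwise. The following example illustrates this.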

First, create a sample data set:

data d;
input department $ gender $ wagerate ;
datalines;
D1 F 10
D1 F 12
D1 M 9
D2 F 8
D2 M 5
D3 F 7
D3 F 15
D3 F 3
D4 F 20
;
run;

Next, sort the data:

proc sort data = d out = salary;
by department gender;
run;

The data after sorting:

Obs    department    gender    wagerate
  1    D1            F               10
  2    D1            F               12
  3    D1            M                9
  4    D2            F                8
  5    D2            M                5
  6    D3            F                7
  7    D3            F               15
  8    D3            F                3
  9    D4            F               20
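
To see the flags themselves, a minimal sketch along these lines can copy them into ordinary variables, since FIRST. and LAST. variables are temporary and are not written to the output data set. It assumes WORK.SALARY was created by the PROC SORT step above; the data set name flags and the four new variable names are hypothetical, chosen only for this illustration.

```sas
/* Assumes WORK.SALARY exists and is sorted by department, then gender. */
data flags;                          /* FLAGS is a hypothetical name */
  set salary;
  by department gender;
  /* FIRST./LAST. variables are temporary, so copy them to keep them */
  first_dept   = first.department;
  last_dept    = last.department;
  first_gender = first.gender;
  last_gender  = last.gender;
run;

proc print data = flags;
run;
```

In the printed result, observation 3 (D1 M 9) should show first_dept = 0 and last_dept = 1, with first_gender = 1 and last_gender = 1, because it is the only M row in D1 and also the final row of D1. Observation 9 (D4 F 20) has first_gender = 1 even though Gender is F on the previous row, because Department changed.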

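First, options C and D each supply two variables, but FIRST.variable1 variable2 is not valid SAS syntax. SAS can interpret FIRST.variable1, but the bare variable2 that follows is not joined to it by any logical operator (AND, OR), so the code fails to compile. With C and D eliminated, A and B differ in which BY group they sum over. The question states that the data is ordered by Department and then by Gender within each Department. FIRST.department would therefore sum Wagerate over everyone in each Department, regardless of gender, producing: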

Obs    department    gender    wagerate    payroll
  1    D1            M                9         31
  2    D2            M                5         13
  3    D3            F                3         25
  4    D4            F               20         20
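
For reference, the Department totals above correspond to option B. A sketch of that variant, run against the sample salary data set, might look like the following; the output name total_dept is hypothetical.

```sas
/* Option B: reset and output at the Department level. */
data total_dept;                     /* hypothetical data set name */
  set salary;
  by department gender;
  if first.department then payroll = 0;  /* reset at the start of each Department */
  payroll + wagerate;                    /* sum statement: payroll is retained across rows */
  if last.department;                    /* keep only the last row of each Department */
run;

proc print data = total_dept;
run;
```

Because the subsetting IF keeps only the last observation of each Department, the gender and wagerate columns show that last row's values (for example M and 9 for D1), while payroll holds the total for the whole Department.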

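By contrast, because Gender is the second-level sort variable, FIRST.gender sums Wagerate for each gender within each Department, rather than for each gender overall. FIRST.gender is also set to 1 whenever Department changes, even if Gender has the same value as in the previous observation (as between D3 F and D4 F in the sample data). So Gender satisfies the requirement: accumulate subtotals for each Gender within Department.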

The final output:

Obs    department    gender    wagerate    payroll
  1    D1            F               12         22
  2    D1            M                9          9
  3    D2            F                8          8
  4    D2            M                5          5
  5    D3            F                3         25
  6    D4            F               20         20
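
Putting answer A into the program from the question gives the sketch below, run here against the sample salary data set rather than the exam's WORK.SALARY. The output name total_gender is hypothetical, and the PROC MEANS step is only an optional cross-check added for illustration, not part of the original question.

```sas
/* Option A: reset and output at the Gender-within-Department level. */
data total_gender;                   /* hypothetical data set name */
  set salary;
  by department gender;
  if first.gender then payroll = 0;  /* reset for each Gender within a Department */
  payroll + wagerate;                /* sum statement: payroll is retained across rows */
  if last.gender;                    /* keep the last row of each Gender group */
run;

proc print data = total_gender;
run;

/* Optional cross-check: the same subtotals computed by PROC MEANS */
proc means data = salary sum;
  class department gender;
  var wagerate;
run;
```

The sums reported by PROC MEANS for each Department and Gender combination should agree with the payroll column in the listing above.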
