SAS Base (8)

The SAS data set named WORK.SALARY contains 10 observations for each department, and is currently ordered by Department. The following SAS program is submitted:

data WORK.TOTAL;
set WORK.SALARY(keep=Department MonthlyWageRate);
by Department;
if First.Department=1 then Payroll=0;
Payroll+(MonthlyWageRate*12);
if Last.Department=1;
run;

Which statement is true?
A. The by statement in the DATA step causes a syntax error.
B. The statement Payroll+(MonthlyWageRate*12); in the data step causes a syntax error.
C. The values of the variable Payroll represent the monthly total for each department in the WORK.SALARY data set.
D. The values of the variable Payroll represent a monthly total for all values of WAGERATE in the WORK.SALARY data set.

Answer: C

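Note: This is similar to SAS Base (1). The sum statement accumulates MonthlyWageRate*12 within each BY group, and the subsetting IF keeps only the last observation of each department, so each output row holds one department's total. Because every monthly rate is multiplied by 12, the accumulated figure is really an annualized total despite the wording of option C. C is still the only option that describes a per-department total, and neither the BY statement nor the sum statement causes a syntax error.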
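The accumulation can be made more explicit with a small sketch. The data values below are hypothetical and only serve the illustration; the second DATA step spells out what the sum statement does implicitly (retain the variable and add to it while ignoring missing values).

```sas
/* Hypothetical data: two departments, two employees each */
data work.salary;
input Department $ MonthlyWageRate;
datalines;
A 1000
A 2000
B 1500
B 500
;
run;

data work.total;
set work.salary(keep=Department MonthlyWageRate);
by Department;
retain Payroll;                               /* the sum statement retains implicitly */
if first.Department then Payroll = 0;         /* reset at the start of each BY group */
Payroll = sum(Payroll, MonthlyWageRate * 12); /* same effect as Payroll+(MonthlyWageRate*12); */
if last.Department;                           /* keep one row per department */
run;

proc print data=work.total;
run;
```

With these values, department A ends at 36000 and department B at 24000.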

SAS Base (7)

Which of the following choices is an unacceptable ODS destination for producing output that can be viewed in Microsoft Excel?

A. MSOFFICE2K
B. EXCELXP
C. CSVALL
D. WINXP

Answer: D

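Note: MSOFFICE2K produces HTML that Microsoft Office applications can open (the output is often saved with an .xls extension), EXCELXP produces XML in the Excel spreadsheet format, and CSVALL produces CSV files. Excel can open all three. WINXP is not an ODS destination.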
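As a minimal sketch of how the three valid destinations are opened and closed, the following assumes that SASHELP.CLASS is available and that the output file names (hypothetical) are writable in the current directory.

```sas
/* XML that Excel can open */
ods tagsets.excelxp file='class.xml';
proc print data=sashelp.class;
run;
ods tagsets.excelxp close;

/* Comma-separated values, including titles and notes */
ods csvall file='class.csv';
proc print data=sashelp.class;
run;
ods csvall close;

/* HTML intended for Microsoft Office applications */
ods tagsets.msoffice2k file='class.xls';
proc print data=sashelp.class;
run;
ods tagsets.msoffice2k close;
```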

SAS Base (6)

After a SAS program is submitted, the following is written to the SAS log:

101 data WORK.JANUARY;
102 set WORK.ALLYEAR(keep=product month num_Sold Cost);
103 if Month='Jan' then output WORK.JANUARY;
104 Sales=Cost * Num_Sold;
105 keep=Product Sales;
-----
22
ERROR 22-322: Syntax error, expecting one of the following: !,!!, &, *, **, +, -, , <=, <>, =, >, >=, AND, EQ, GE, GT, IN, LE, LT, MAX, MIN, NE, NG, NL,NOTIN, OR, ^=, |, ||, ~=.
106 run;

What changes should be made to the KEEP statement to correct the errors in the LOG?
A. keep=(Product Sales);
B. keep Product, Sales;
C. keep=Product, Sales;
D. keep Product Sales;

Answer: D

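Note: KEEP specifies the variables to include in a data set. Its opposite, DROP, specifies the variables to exclude. KEEP and DROP can be used either as data set options, following a data set name in the DATA statement or the SET statement, or as standalone statements. As an option, KEEP= goes inside parentheses and takes an equal sign, for example: (keep = variable1 variable2). As a statement, it takes neither parentheses nor an equal sign, for example: keep variable1 variable2;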
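The two forms can be compared side by side. This is only a sketch, assuming the SASHELP.CLASS sample data set is available; the output data set names are hypothetical.

```sas
/* KEEP= as a data set option: parentheses and an equal sign */
data work.girls(keep=Name Age);
set sashelp.class(keep=Name Sex Age);
if Sex = 'F';
run;

/* KEEP as a statement: no parentheses, no equal sign */
data work.boys;
set sashelp.class;
if Sex = 'M';
keep Name Age;
run;
```

Both output data sets contain only Name and Age. The option on the SET statement additionally limits which variables are read in.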

SAS Base (5)

Which statement specifies that records 1 through 10 are to be read from the raw data file customer.txt?

A. infile 'customer.txt' 1-10;
B. input 'customer.txt' [email protected];
C. infile 'customer.txt' obs=10;
D. input 'customer.txt' stop=10;

Answer: C

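Note: OBS specifies the last record to read. Its counterpart, FIRSTOBS, specifies the first record to read. Keep in mind that OBS and FIRSTOBS count records (lines) in the INFILE source, not observations. Consider this case: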
data d;
infile datalines firstobs = 2 obs = 5;
input x;
input y;
datalines;
1
2
3
4
5
6
7
8
9
10
;
run;
Each observation is built from two records, so records 2 through 5 produce two observations. The resulting data set is:

Obs x y
1 2 3
2 4 5

Finally, the INPUT statement has no STOP argument.

SAS Base (4)

The following SAS program is submitted:

data WORK.DATE_INFO;
Month="01" ;
Yr=1960 ;
X=mdy(Month,01,Yr) ;
run;

What is the value of the variable X?
A. the numeric value 0
B. the character value "01011960"
C. a missing value due to syntax errors
D. the step will not compile because of the character argument in the mdy function.

Answer: A

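Note: SAS stores a date as the number of days from the reference date of January 1, 1960, which is 0. December 31, 1959 is -1, January 2, 1960 is 1, and so on. The MDY function builds a SAS date value from a month, a day, and a year. January 1, 1960 is the reference date itself, so X is the numeric value 0. Note that Month in this question is a character variable rather than a number. During execution SAS converts the character value "01" to the number 1 and reports this in the log with the note: "Character values have been converted to numeric values at the places given by:"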
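A short sketch of the date arithmetic described above. The explicit INPUT conversion is an addition for illustration, not part of the exam program; it avoids the automatic conversion note in the log.

```sas
data _null_;
Month = "01";
Yr = 1960;
X = mdy(input(Month, 2.), 1, Yr);  /* explicit character-to-numeric conversion */
Y = mdy(1, 2, 1960);               /* one day after the reference date */
Z = mdy(12, 31, 1959);             /* one day before the reference date */
put X= Y= Z=;                      /* writes X=0 Y=1 Z=-1 to the log */
put X= date9.;                     /* writes X=01JAN1960 to the log */
run;
```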

SAS Base (3)

The Excel workbook REGIONS.XLS contains the following four worksheets:
EAST
WEST
NORTH
SOUTH

The following program is submitted:

libname MYXLS 'regions.xls';

Which PROC PRINT step correctly displays the NORTH worksheet?
A. proc print data=MYXLS.NORTH;run;
B. proc print data=MYXLS.NORTH$;run;
C. proc print data=MYXLS.'NORTH'e;run;
D. proc print data=MYXLS.'NORTH$'n;run;

Answer: D

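Note: When the SAS/ACCESS LIBNAME engine reads an Excel workbook, each worksheet is exposed as a member whose name ends with a dollar sign, so the NORTH worksheet appears as NORTH$. A name without the dollar sign, such as NORTH, refers to a named range of that name if one exists. Because "$" is not a valid character in an ordinary SAS data set name, the worksheet has to be referenced with a name literal (n notation): 'NORTH$'n.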
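One way to see which member names the engine exposes before printing a sheet is sketched below. It assumes that SAS/ACCESS Interface to PC Files is licensed and that regions.xls sits in the current directory; neither is guaranteed in every installation.

```sas
libname MYXLS 'regions.xls';

/* List the members of the library; worksheet names end with $ */
proc contents data=MYXLS._all_ nods;
run;

/* Reference the worksheet with a name literal */
proc print data=MYXLS.'NORTH$'n;
run;

/* Release the workbook when done */
libname MYXLS clear;
```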

SAS Base (2)

Given the following raw data records in TEXTFILE.TXT:

----|----10---|----20---|----30
John,FEB,13,25,14,27,Final
John,MAR,26,17,29,11,23,Current
Tina,FEB,15,18,12,13,Final
Tina,MAR,29,14,19,27,20,Current

The following output is desired:

Obs Name Month Status Week1 Week2 Week3 Week4 Week5
1 John FEB Final $13 $25 $14 $27 .
2 John MAR Current $26 $17 $29 $11 $23
3 Tina FEB Final $15 $18 $12 $13 .
4 Tina MAR Current $29 $14 $19 $27 $20

Which SAS program correctly produces the desired output?

A.
data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dsd;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;
proc print data=WORK.NUMBERS;
run;

B.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dlm=',' missover;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;
proc print data=WORK.NUMBERS;
run;

C.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dlm=',';
input Name $ Month $ @;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;
proc print data=WORK.NUMBERS;
run;

D.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dsd @;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;
proc print data=WORK.NUMBERS;
run;

Answer: C

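Note: Option C is correct because the trailing @ on the first INPUT statement holds the record, so the conditional INPUT statement reads the remaining fields from the same line. The options and specifiers involved are:

DSD: makes the comma the default delimiter and treats two consecutive delimiters as a missing value. For example, the data "a,b,,d" is read as: 'a' 'b' missing value 'd'

DLM: equivalent to DELIMITER; replaces the default delimiter (a blank). For example, DLM='*' changes the delimiter from a blank to '*'

MISSOVER: if a line holds fewer values than the INPUT statement has variables, MISSOVER stops SAS from going to the next line for data and sets the remaining variables to missing. For example, a line holds only the three values "a,b,c" but the INPUT statement names four variables (variable1-variable4). Without MISSOVER, SAS moves to a new line to find a value for variable4. With MISSOVER, SAS stays on the current line and sets variable4 to missing.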
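The DSD and MISSOVER behavior described above can be tried on in-stream data. This is a minimal sketch with hypothetical variable names and values; it is not one of the exam options.

```sas
data work.dsd_demo;
infile datalines dsd missover;
input v1 $ v2 $ v3 $ v4 $;
datalines;
a,b,,d
e,f,g
;
run;

proc print data=work.dsd_demo;
run;
```

The first row should show v3 as missing, because DSD treats the two consecutive commas as an empty field. The second row should show v4 as missing, because MISSOVER keeps SAS from reading a further line.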

@: by default, every INPUT statement makes SAS read from a new line. A trailing @ tells SAS to keep reading from the current line. Take this example:
data d1;
input v1 $ v2 $ @;
input v3 $ v4 $;
datalines;
a b c d
e f g h
;
run;
With the @, the output is:

Obs v1 v2 v3 v4
1 a b c d
2 e f g h

Without the @, the output is:

Obs v1 v2 v3 v4
1 a b e f

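A related specifier is @@. The difference is that a single trailing @ holds the record for the next INPUT statement within the same iteration of the DATA step and releases it when that iteration ends, whereas the double trailing @@ holds the record across iterations. In the example above, one iteration executes both INPUT statements and assigns v1, v2, v3, and v4 once. Here is an example that uses @@: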
data d2;
input v1 $ v2 $ @@;
datalines;
a b c d
;
run;
In this example, each iteration assigns v1 and v2 once, and @@ stops SAS from moving to a new line when the next iteration begins.
With @@, the output is:

Obs v1 v2
1 a b
2 c d

Without any @, or with only a single @, the output is:

Obs v1 v2
1 a b

SAS Base (1)

The following SAS program is submitted:

data WORK.TOTAL;
set WORK.SALARY;
by Department Gender;
if First.<_insert_code_> then Payroll=0;
Payroll+Wagerate;
if Last.<_insert_code_>;
run;

The SAS data set WORK.SALARY is currently ordered by Gender within Department.

Which inserted code will accumulate subtotals for each Gender within Department?
A. Gender
B. Department
C. Gender Department
D. Department Gender

Answer: A

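Note: SAS uses FIRST.variable and LAST.variable to mark the beginning and the end of a BY group. FIRST.variable is set to 1 when SAS reads the first observation of a BY group and to 0 otherwise. LAST.variable is set to 1 when SAS reads the last observation of a BY group and to 0 otherwise. An example makes this concrete.

First, create a made-up data set: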

data d;
input department $ gender $ wagerate ;
datalines;
D1 F 10
D1 F 12
D1 M 9
D2 F 8
D2 M 5
D3 F 7
D3 F 15
D3 F 3
D4 F 20
;
run;

Next, sort the data:

proc sort data = d out = salary;
by department gender;
run;

The data after sorting:

Obs department gender wagerate
1 D1 F 10
2 D1 F 12
3 D1 M 9
4 D2 F 8
5 D2 M 5
6 D3 F 7
7 D3 F 15
8 D3 F 3
9 D4 F 20

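First, C and D each supply two variables, but FIRST.variable1 variable2 is not valid SAS syntax. SAS can parse FIRST.variable1, but a bare variable2 following it, with no logical operator (AND, OR) in between, causes an error when the code is compiled. With C and D ruled out, A and B differ in which BY group they sum over. The question states that the data are ordered by Department and then by Gender within each Department, so FIRST.Department would sum Wagerate over everyone in each Department regardless of gender, giving: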

Obs department gender wagerate payroll
1 D1 M 9 31
2 D2 M 5 13
3 D3 F 3 25
4 D4 F 20 20

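By contrast, because Gender is the second BY variable, FIRST.Gender is also set to 1 whenever Department changes. It therefore sums Wagerate for each gender within each Department rather than by gender alone. So Gender meets the requirement in the question: accumulate subtotals for each Gender within Department.

The final output: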

Obs department gender wagerate payroll
1 D1 F 12 22
2 D1 M 9 9
3 D2 F 8 8
4 D2 M 5 5
5 D3 F 3 25
6 D4 F 20 20
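For reference, the whole walk-through can be run as one program with option A filled in. This sketch reuses the made-up data from the note above; the WORK library and data set names are assumptions for illustration.

```sas
data work.d;
input department $ gender $ wagerate;
datalines;
D1 F 10
D1 F 12
D1 M 9
D2 F 8
D2 M 5
D3 F 7
D3 F 15
D3 F 3
D4 F 20
;
run;

proc sort data=work.d out=work.salary;
by department gender;
run;

data work.total;
set work.salary;
by department gender;
if first.gender then payroll = 0;  /* reset at each Gender within Department */
payroll + wagerate;                /* sum statement: retained running total */
if last.gender;                    /* output one row per subgroup */
run;

proc print data=work.total;
run;
```

Replacing first.gender and last.gender with first.department and last.department gives the per-department totals shown earlier.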