SAS Base (37)

The SAS data set Fed.Banks contains a variable Open_Date which has been assigned a permanent label of “Open Date”. Which SAS program temporarily replaces the label “Open Date” with the label “Starting Date” in the output?

A.
proc print data=SASUSER.HOUSES label;
label Open_Date “Starting Date”;
run;

B.
proc print data=SASUSER.HOUSES label;
label Open_Date=”Starting Date”;
run;

C.
proc print data=SASUSER.HOUSES;
label Open_Date=”Starting Date”;
run;

D.
proc print data=SASUSER.HOUSES;
Open_Date=”Starting Date”;
run;

Check Answer
Answer: B

注解:LABEL statement用于给变量定义标签,格式为:
LABEL variable = ‘label‘;
而PROC PRINT中的LABEL argument则是告诉SAS在打印数据时,输出该变量的label,而非默认的变量名。

SAS Base (36)

Given the contents of the raw data file ‘EMPLOYEE.TXT’:

----|----10---|----20---|----30--
Xing           2 19 2004 ACCT
Bob            5 22 2004 MKTG
Jorge          3 14 2004 EDUC

The following SAS program is submitted:

data WORK.EMPLOYEE;
infile ‘EMPLOYEE.TXT’;
input
@1 FirstName $
@15 StartDate
@25 Department $;
run;

Which SAS informat correctly completes the program?
A. date9.
B. mmddyy10.
C. ddmmyy10.
D. mondayyr10.

Check Answer
Answer: B

注解:FORMAT是指用于输出数据(例如:PROC PRINT)的格式,而INFORMAT则是读取数据采用的格式。’ 2 19 2004’这种格式的日期,月日年由MMDDYY.读取,10则告诉SAS一共有10个字符,其中包括3个空格,2前面一个,19前面一个,2004前面一个。程序中的@用于指定读取数据的位置,@n指从第n个字符开始读。StartDate是从第15个字符开始读,而第15个字符是一个空格,2是第16个字符,这就解释了为什么在读入的日期里,2前有会有一个空格。

SAS Base (35)

given the SAS data set SASDATA.TWO:

X Y
5 2
3 1
5 6

The following SAS program is submitted:

data SASUSER.ONE SASUSER.TWO OTHER;
set SASDATA.TWO;
if X eq 5 then output SASUSER.ONE;
if Y lt 5 then output SASUSER.TWO;
output;
run;

What is the result?

A.
data set SASUSER.ONE has 5 observations
data set SASUSER.TWO has 5 observations
data set WORK.OTHER has 3 observations

B.
data set SASUSER.ONE has 2 observations
data set SASUSER.TWO has 2 observations
data set WORK.OTHER has 1 observations

C.
data set SASUSER.ONE has 2 observations
data set SASUSER.TWO has 2 observations
data set WORK.OTHER has 5 observations

D. No data sets are output. The DATA step fails execution due to syntax errors.

Check Answer
Answer: A

注解:DATA statement可以一次建立多个data set,可以通过OUTPUT statement来指定所需写入数据的data set。如果程序中没有出现OUTPUT statement,那么SAS会自动在DATA step的最后加入OUTPUT,即一下2个程序是一样的:

data d;
infile datalines;
input x;
output;
datalines;
1
;
run;
data d;
infile datalines;
input x;
datalines;
1
;
run;

本题中,出现了3个OUTPUT,第一个将SASDATA.TWO中X等于5的观测值写入SASUSER.ONE中(2个观测值),第二个将SASDATA.TWO中Y小于5的观测值写入SASUSER.TWO中(2个观测值)。第三个OUTPUT statement等价于:
OUTPUT SASUSER.ONE SASUSER.TWO OTHER;
将SASDATA.TWO中所有的数据同时写入SASUSER.ONE、SASUSER.TWO和OTHER这3个data set(3个观测值)。

SASUSER.ONE:

Obs X Y
1 5 2
2 5 2
3 3 1
4 5 6
5 5 6

SASDATA.TWO:

Obs X Y
1 5 2
2 5 2
3 3 1
4 3 1
5 5 6

OTHER:

Obs X Y
1 5 2
2 3 1
3 5 6

SAS Base (34)

A user-defined format has been created using the FORMAT procedure.How is it stored?
A. in a SAS catalog
B. in a memory resident lookup table
C. in a SAS dataset in the WORK library
D. in a SAS dataset in a permanent SAS data library

Check Answer
Answer: A

注解:存储自定义FORMAT的文件格式叫作catalog。FORMAT的存储位置和名字可以通过PROC FORMAT statement中的LIBRARY option设定,缩写为LIB,格式为:LIBRARY = lebref.catalog_name。例如:
PROC FORMAT LIBRARY = SASLIB.self_defined_format;
会在SASLIB library中建立名为self_defined_format的catalog。如果不定义libref,默认library为WORK 。如果不定义catalog_name,默认名字为Formats。如果不设置LIBRARY option,FORMAT会存储在WORK library下,文件名为Formats。

SAS Base (33)

The following SAS program is submitted:

data WORK.TEST;
set WORK.PILOTS;
if Jobcode=’Pilot2′ then Description=’Senior Pilot’;
else Description=’Unknown’;
run;

The value for the variable Jobcode is: PILOT2. What is the value of the variable Description?
A. PILOT2
B. Unknown
C. Senior Pilot
D. ‘ ‘ (missing character value)

Check Answer
Answer: B

注解:字符串的大小写是有区别的,所以PILOT不等于Pilot,执行ELSE之后的表达式。

SAS Base (32)

Consider the following data step:

data WORK.NEW;
set WORK.OLD;
Count+1;
run;

The variable Count is created using a sum statement. Which statement regarding this variable is true?
A. It is assigned a value 0 when the data step begins execution.
B. It is assigned a value of missing when the data step begins execution.
C. It is assigned a value 0 at compile time.
D. It is assigned a value of missing at compile time.

Check Answer
Answer: C

注解:Count + 1为一个Sum Statement,等价于:
RETAIN Count 0;
Count = SUM(Count, 1);

RETAIN的作用是初始化一个变量,并且在每次进入新的DATA step时,保留该变量的值。值得注意的是,RETAIN statement在编译代码时执行,即在执行DATA step(读取数据)之前,所以RETAIN statement只执行一次。SUM function的作用是求合,并且忽略missing value。上面程序第一句的意思为初始化一个叫Count的变量,并且其初始值为0。第二行的作用是将Count的值增加1,与Count = Count + 1类似,但两者有一个重要的区别,SUM function会忽略missing value,而后一种表达式则不会。假设Count的值为missing,Sum(Count, 1)的结果为1,而Count + 1的结果则为missing,missing value加上一个值其结果还是missing。

SAS Base (31)

Given the following raw data records in DATAFILE.TXT:

----|----10---|----20---|----30
Kim,Basketball,Golf,Tennis
Bill,Football
Tracy,Soccer,Track

The following program is submitted:

data WORK.SPORTS_INFO;
length Fname Sport1-Sport3 $ 10;
infile ‘DATAFILE.TXT’ dlm=’,’;
input Fname $ Sport1 $ Sport2 $ Sport3 $;
run;

proc print data=WORK.SPORTS_INFO;
run;

Which output is correct based on the submitted program?

A.

Obs Fname Sport1 Sport2 Sport3
1 Kim Basketball Golf Tennis
2 Bill Football
3 Tracy Soccer Track

B.

Obs Fname Sport1 Sport2 Sport3
1 Kim Basketball Golf Tennis
2 Bill Football  Football  Football
3 Tracy Soccer Track  Track

C.

Obs Fname Sport1 Sport2 Sport3
1 Kim Basketball Golf Tennis
2 Bill Football Tracy Soccer

D.

Obs Fname Sport1 Sport2 Sport3
1 Kim Basketball Golf Tennis
2 Bill Football
Check Answer
Answer: C

注解:当一行数据中变量值的数量少于变量的数量时,SAS会去下一行接着读取数据。当所有变量都得到赋值之后,无论该行数据中是否还有未使用的变量值,SAS都会前往下一行开始新的DATA step,即开始读取新的观测值。所以在题目中,当SAS在执行第二次DATA step时,由于第二行数据只有2个值,SAS会去第三行寻找值并赋给Sport2和Sport3。当4个变量都得到赋值之后,SAS忽略第三行中余下的值Track。如果这时DATAFILE.TXT中还有第四行的话,SAS就会前往第四行开始读取第三个观测值。如果要得到A中的结果,在INFILE statement末尾加上MISSOVER即可。DLM和MISSOVER具体的含义请查看SAS Base (2)

SAS Base (30)

You’re attempting to read a raw data file and you see the following messages displayed in the SAS Log:

NOTE: Invalid data for Salary in line 4 15-23.
RULE:     ----|----10---|----20---|----30---|----40---|----50-
4         120104   F    46#30     11MAY1954 33
Employee_Id=120104 employee_gender=F Salary=. birth_date=-2061 _ERROR_=1 _N_=4
NOTE: 20 records were read from the infile ‘c:employees.dat’.
      The minimum record length was 33.
      The maximum record length was 33.
NOTE: The data set WORK.EMPLOYEES has 20 observations and 4 variables.

What does it mean?
A. A compiler error, triggered by an invalid character for the variable Salary.
B. An execution error, triggered by an invalid character for the variable Salary.
C. The 1st of potentially many errors, this one occurring on the 4th observation.
D. An error on the INPUT statement specification for reading the variable Salary.

Check Answer
Answer: B

注解:Compiler error是指SAS在编译代码时出错,例如代码写错了。Execution error是指代码没错,但是在执行代码是发生错误,例如试图将字符复制给数值型变量。本题中Salary是一个数值型变量,但46#30不是一个数值,所以出现错误。由于程序发生的是Execution error,数据依然会输出并建立data set,Salary的值为missing。例如一下程序:
data execution_error;
input Num1 Num2;
datalines;
1 2#3
3 4
;
run;
生成的data set为:

Obs Num1 Num2
1 1 .
2 3 4

如果是Compiler error,就不会建立新data set,例如以下例子,INPUT statement末尾漏了一个分号:
data compiler_error;
input Num1 Num2
datalines;
1 2
3 4
;
run;

具体的SAS错误类型请参考官方手册

SAS Base (29)

The following SAS program is sumbitted:

data WORK.INFO;
infile ‘DATAFILE.TXT’;
input @1 Company $20. @25 State $2. @;
if State=’ ‘ then input @30 Year;
else input @30 City Year;
input NumEmployees;
run;

How many raw data records are read during each iteration of the DATA step?

A. 1
B. 2
C. 3
D. 4

Check Answer
Answer: B

注解:问几行数据构成一个观测值,每次DATA step循环一共会执行3个INPUT statement,默认需要3行来完成一个观测值。但第一个INPUT statement末尾有一个@,意味着在执行第二个INPUT时不换行读取数据,每个观测值需要2行数据。@的具体解释详见SAS Base (2)

SAS Base (28)

Given the SAS data set WORK.P2000:

Location Pop2000
Alaska 626931
Delaware 783595
Vermont 608826
Wyoming 493782

and the SAS data set WORK.P2008:

State Pop2008
Alaska 686293
Delaware 873092
Wyoming 532668

The following output is desired:

Obs State Pop2000 Pop2008 Difference
1 Alaska 626931 686293 59362
2 Delaware 783595 873092 89497
3 Wyoming 493782 532668 38886

Which SAS program correctly combines the data?

A.
data compare;
merge WORK.P2000(in=_a Location=State)
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

B.
data compare;
merge WORK.P2000(rename=(Location=State))
WORK.P2008;
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

C.
data compare;
merge WORK.P2000(in=_a rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

D.
data compare;
merge WORK.P2000(in=_a) (rename=(Location=State))
WORK.P2008(in=_b);
by State;
if _a and _b;
Difference=Pop2008-Pop2000;
run;

Check Answer
Answer: C

注解:MERGE的作用是将来自于多个data set中的观测值合并为一个观测值(相当于SQL中的JOIN)。IN option用于生成一个布尔型变量,变量名为等号右边的变量(题目中分别为_a和_b),当Merge后的观测值有来自于当前data set的数据时,该变量的值为1,否则为0。比如题目中新data set的第一个观测值的State为Alaska,WORK.P2000和WORK.P2008中均有Alaska这个观测值,那么_a和_b在第一次DATA step循环中均为1。以下程序可以显示各个DATA step中_a和_b的值(由于_a和_b并不会被输出到新的data set中,需要把其值赋值给新的变量,以下程序中分别为a和b):
data compare;
merge P2000(in=_a rename=(Location=State))
P2008(in=_b);
by State;
a = _a;
b = _b;
Difference=Pop2008-Pop2000;
run;

Obs State Pop2000 Pop2008 a b Difference
1 Alaska 626931 686293 1 1 59362
2 Delaware 783595 873092 1 1 89497
3 Vermont 608826 . 1 0 .
4 Wyoming 493782 532668 1 1 38886

RENAME option的作用是重新对变量命名,格式为:RENAME = (旧变量名 = 新变量名)。

本题MERGE statement的语法格式,MERGE dataset1 dataset2 (option1 option2);

最后的IF statement则用于输出_a和_b都为1的那些观测值,即WORK.Pop2000和WORK.Pop2008共同拥有的观测值。