SAS Base (31)

Given the following raw data records in DATAFILE.TXT:

----|----10---|----20---|----30
Kim,Basketball,Golf,Tennis
Bill,Football
Tracy,Soccer,Track

The following program is submitted:

data WORK.SPORTS_INFO;
length Fname Sport1-Sport3 $ 10;
infile 'DATAFILE.TXT' dlm=',';
input Fname $ Sport1 $ Sport2 $ Sport3 $;
run;

proc print data=WORK.SPORTS_INFO;
run;

Which output is correct based on the submitted program?

A.

Obs  Fname  Sport1      Sport2  Sport3
1    Kim    Basketball  Golf    Tennis
2    Bill   Football
3    Tracy  Soccer      Track

B.

Obs  Fname  Sport1      Sport2    Sport3
1    Kim    Basketball  Golf      Tennis
2    Bill   Football    Football  Football
3    Tracy  Soccer      Track     Track

C.

Obs  Fname  Sport1      Sport2  Sport3
1    Kim    Basketball  Golf    Tennis
2    Bill   Football    Tracy   Soccer

D.

Obs  Fname  Sport1      Sport2  Sport3
1    Kim    Basketball  Golf    Tennis
2    Bill   Football

Answer: C

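Note: When a record contains fewer values than the INPUT statement has variables, SAS moves to the next record and keeps reading (the default FLOWOVER behavior). Once every variable has a value, SAS discards whatever is left in the current record and starts the next iteration of the DATA step, that is, the next observation, on a new record. In this question, the second iteration finds only two values in record 2, so SAS goes to record 3 and assigns Tracy to Sport2 and Soccer to Sport3. With all four variables assigned, SAS ignores the remaining value in record 3, Track. If DATAFILE.TXT had a fourth record, SAS would read the third observation from it. To get the output in A, add MISSOVER to the end of the INFILE statement. For what DLM and MISSOVER mean, see SAS Base (2).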
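The MISSOVER fix mentioned in the note can be sketched as follows. This is the program from the question with one option added; it assumes DATAFILE.TXT is in the current working directory, as the question does.

```sas
data WORK.SPORTS_INFO;
length Fname Sport1-Sport3 $ 10;
/* MISSOVER: a short record leaves the remaining variables missing */
/* instead of sending SAS to the next record for more values       */
infile 'DATAFILE.TXT' dlm=',' missover;
input Fname $ Sport1 $ Sport2 $ Sport3 $;
run;

proc print data=WORK.SPORTS_INFO;
run;
```

With this change each record produces its own observation, so Bill and Tracy keep blank values for the sports they do not list, as in output A.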

SAS Base (24)

Given the following raw data records:

----|----10---|----20
Susan*12/29/1970*10
Michael**6

The following output is desired:

Obs employee bdate years
1 Susan 4015 10
2 Michael . 6

Which SAS program correctly reads in the raw data?
A.
data employees;
infile 'file specification' dlm='*';
input employee $ bdate : mmddyy10. years;
run;

B.
data employees;
infile 'file specification' dsd='*';
input employee $ bdate mmddyy10. years;
run;

C.
data employees;
infile 'file specification' dlm dsd;
input employee $ bdate mmddyy10. years;
run;

D.
data employees;
infile 'file specification' dlm='*' dsd;
input employee $ bdate : mmddyy10. years;
run;

Answer: D

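Note: This question tests using DLM and DSD together. DLM='*' makes the asterisk the delimiter, DSD turns the two consecutive asterisks in the second record into a missing bdate, and the colon modifier lets the mmddyy10. informat read only up to the next delimiter. For details, see SAS Base (2).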
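To try answer D without an external file, here is a minimal sketch that reads the same two records in-stream. Replacing 'file specification' with DATALINES is an assumption made only so the step is self-contained.

```sas
data employees;
/* DLM='*' sets the delimiter; DSD reads '**' as a missing value */
infile datalines dlm='*' dsd;
/* The colon modifier stops mmddyy10. at the next delimiter */
input employee $ bdate : mmddyy10. years;
datalines;
Susan*12/29/1970*10
Michael**6
;
run;

proc print data=employees;
run;
```

Because bdate has no FORMAT statement, PROC PRINT shows it as a raw SAS date value (days since January 1, 1960), which is why the desired output lists 4015 rather than a calendar date.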

SAS Base (2)

Given the following raw data records in TEXTFILE.TXT:

----|----10---|----20---|----30
John,FEB,13,25,14,27,Final
John,MAR,26,17,29,11,23,Current
Tina,FEB,15,18,12,13,Final
Tina,MAR,29,14,19,27,20,Current

The following output is desired:

Obs Name Month Status Week1 Week2 Week3 Week4 Week5
1 John FEB Final $13 $25 $14 $27 .
2 John MAR Current $26 $17 $29 $11 $23
3 Tina FEB Final $15 $18 $12 $13 .
4 Tina MAR Current $29 $14 $19 $27 $20

Which SAS program correctly produces the desired output?

A.
data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dsd;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;

proc print data=WORK.NUMBERS;
run;

B.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dlm=',' missover;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;

proc print data=WORK.NUMBERS;
run;

C.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dlm=',';
input Name $ Month $ @;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;

proc print data=WORK.NUMBERS;
run;

D.

data WORK.NUMBERS;
length Name $ 4 Month $ 3 Status $ 7;
infile 'TEXTFILE.TXT' dsd @;
input Name $ Month $;
if Month='FEB' then input Week1 Week2 Week3 Week4 Status $;
else if Month='MAR' then input Week1 Week2 Week3 Week4 Week5 Status $;
format Week1-Week5 dollar6.;
run;

proc print data=WORK.NUMBERS;
run;

Answer: C

Notes:

DSD: makes the comma the default delimiter and treats two consecutive delimiters as a missing value. For example, the data a,b,,d is read as 'a', 'b', a missing value, and 'd'.

DLM: an alias for DELIMITER; it replaces the default delimiter (a blank). For example, DLM='*' changes the delimiter from a blank to '*'.
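The difference between DSD and DLM alone shows up on the a,b,,d example. The sketch below is illustrative only: the data set names with_dsd and dlm_only are made up for this note, and the second step reads just three variables because, without DSD, the record yields only three values.

```sas
/* DSD: comma is the default delimiter, and ',,' gives a missing value */
data with_dsd;
infile datalines dsd;
input v1 $ v2 $ v3 $ v4 $;
datalines;
a,b,,d
;
run;

/* DLM=',' alone: consecutive commas count as a single delimiter */
data dlm_only;
infile datalines dlm=',';
input v1 $ v2 $ v3 $;
datalines;
a,b,,d
;
run;
```

In with_dsd, v3 is missing and v4 is d. In dlm_only, v3 is d, because the empty field between the two commas is never seen.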

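MISSOVER: if a record contains fewer values than the INPUT statement has variables, MISSOVER keeps SAS from going to the next record for data and sets the leftover variables to missing. For example, suppose a record holds only three values, a,b,c, but INPUT defines four variables (variable1-variable4). Without MISSOVER, SAS goes to the next record to find a value for variable4. With MISSOVER, SAS stays on the current record and sets variable4 to missing.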
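The three-value case described above can be sketched with in-stream data. The second record, e,f,g, and the data set names are assumptions added for illustration so that the effect of moving to the next record is visible.

```sas
/* Default behavior: variable4 is taken from the next record */
data without_missover;
infile datalines dlm=',';
input variable1 $ variable2 $ variable3 $ variable4 $;
datalines;
a,b,c
e,f,g
;
run;

/* MISSOVER: SAS stays on the record and variable4 is missing */
data with_missover;
infile datalines dlm=',' missover;
input variable1 $ variable2 $ variable3 $ variable4 $;
datalines;
a,b,c
e,f,g
;
run;
```

The first step produces a single observation (a, b, c, e), since the rest of the second record is discarded once variable4 is filled. The second step produces two observations, each with a missing variable4.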

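@ (trailing @): by default, every INPUT statement reads from a new record. A trailing @ holds the current record so that the next INPUT statement continues reading from it. For example: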
data d1;
input v1 $ v2 $ @;
input v3 $ v4 $;
datalines;
a b c d
e f g h
;
run;
With the @, the output is:

Obs v1 v2 v3 v4
1 a b c d
2 e f g h

Without the @, it is:

Obs v1 v2 v3 v4
1 a b e f

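Similar to @ is @@ (the double trailing @). The difference: @ keeps SAS on the current record within a single iteration of the DATA step, while @@ keeps it there across iterations. What counts as one iteration? In the example above, four variables are declared, so assigning v1, v2, v3, and v4 once is one iteration. Here is an example that uses @@: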
data d2;
input v1 $ v2 $ @@;
datalines;
a b c d
;
run;
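In this example, assigning v1 and v2 once is one iteration of the DATA step; @@ keeps SAS from moving to a new record when the next iteration begins.
With @@, the output is: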

Obs v1 v2
1 a b
2 c d

Without @, or with only a single @, the output is:

Obs v1 v2
1 a b