Character Functions in SAS – I

EXTRACTING TEXT

SCAN Function – is used to separate character values into words. There are several delimiters that separate each word in a character value. SAS recognizes some default delimiters : blank . < ( + | & $ * ) ; / – , %

General form –

SCAN ( argument, n, <delimiters>)

where, argument = character variable,

n = position of word,

delimiter = delimiter specified in single quotes, if no delimiter is specified default delimiters are assumed.

Specifying LENGTH of Target variable – By default, Length of 200 to each target variable. This uses much more space than necessary. Hence, we must use length statement to specify the length of the target variable. Length should be specified before any character function is run as SAS sets the length of a new variable the first time it is encountered.

scan

scan_result

 

 

SUBSTR Function – is similar to SCAN Function. SUBSTR is more suitable when the exact position of the words to be extracted is known. For example – it is best for extraction from ID.

General form –

SUBSTR ( argument, position, <n>)

where, argument = character variable,

position = position of character to start from,

<n> = number of characters to extract, if n is not specified all characters are included

substr

REPLACING TEXT

Using SUBSTR Function

To replace initial values in all values of a variable

substr_replace_all

Using SUBSTR and IF-THEN Statement

To replace certain character value with some other value

Using TRANWRD Function

TRANWRD Function –   is used to replace characters with a string.

TRANWRD ( source, ‘target’, ‘replacement’)

where, source = variable/expression ,

target = substring to be replaced in quotation marks,

replacement = substring that replaces target in quotation marks.

tranwrd

CONCATENATION

Using || Operator

trim

Using CATX Function

  CATX Function – is used to concatenate values of variables

catx

SEARCHING FOR SPECIFIC SUBSTRING

INDEX Function – is used to look for occurrences of a specific string value. It returns 0 if string is not found.

General form –

INDEX ( source, ‘substring’)

where, source = variable/expression ,

substring = substring to be searched in quotation marks

index

 

Note: Index Function is case sensitive. To overcome the problem of case sensitive, use UPCASE or LOWCASE within Index function. Such as in the above example :

index(upcase(keywords),’MEDICAL PATIENT’)  or

index(lowcase(keywords),’medical patient’)

FIND Function – is used to search for specific substring values within a specified character string. FIND function is similar to INDEX function.

General form –

FIND ( source, ‘substring’, modifiers,startpos)

where, source = variable/expression ,

substring = substring to be searched in quotation marks,

modifiers = i to ignore the case of source, t to trim the length,

startpos = starting position. If not mentioned beginning of the source is assumed as starting position. If startpos is positive, FIND searched from startpos to right and vice versa.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s