In the last post, we learnt how to import an external data set into SAS. In this post we walk through a simple analysis of the Diamonds data set, which comes from the ggplot2 package in R. The variables in the data set are described below (http://www.inside-r.org/packages/cran/ggplot2/docs/diamonds):
- price : price in US dollars
- carat : weight of the diamond
- cut : quality of the cut
- color : diamond color, from J (worst) to D (best)
- clarity : a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
- x : length in mm
- y : width in mm
- z : depth in mm
- depth : total depth percentage = z / mean(x, y) = 2 * z / (x + y)
- table : width of top of diamond relative to widest point
Let’s get started.
First, we bring the diamonds data set into the WORK library for temporary use: the program begins by creating a temporary diamonds data set from the permanent data set in the mine library.
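A minimal sketch of this step is given here. It assumes the `mine` libref has already been assigned with a LIBNAME statement and that the permanent data set is named `diamonds`; both names are assumptions, so adjust them to your own setup. The PROC PRINT step is only a quick check of the first few rows.

```sas
/* Assumption: libref MINE is already assigned and contains DIAMONDS */
data work.diamonds;
    set mine.diamonds;   /* copy the permanent data set into WORK */
run;

/* Quick look at the first five observations */
proc print data=work.diamonds (obs=5);
run;
```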
With the data set in place, the next part of the program introduces PROC MEANS for some simple data analysis. By default, PROC MEANS reports the number of observations, mean, standard deviation, minimum and maximum for each numeric variable. If we want the median, mode and range, we can add them as statistic keywords on the PROC MEANS statement. Notice that once we specify statistics explicitly, PROC MEANS prints only the ones we requested and no longer shows the default set.
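The two calls might look like the sketch below, assuming the temporary data set is named `work.diamonds`. The first step uses the defaults; the second requests only the median, mode and range.

```sas
/* Default statistics: N, mean, std dev, minimum, maximum
   for every numeric variable */
proc means data=work.diamonds;
run;

/* Only the requested statistics are printed */
proc means data=work.diamonds median mode range;
run;
```

To keep the default statistics alongside the new ones, list them all explicitly, for example `n mean std min max median mode range`.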
PROC MEANS ignores character variables, so we introduce PROC FREQ here to summarize the frequency and proportion of each level of the character variables.
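A minimal sketch, assuming `cut`, `color` and `clarity` are stored as character variables in `work.diamonds`:

```sas
/* One-way frequency table for each listed variable */
proc freq data=work.diamonds;
    tables cut color clarity;
run;
```

For each variable, PROC FREQ prints the frequency, percent, cumulative frequency and cumulative percent of every level.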
In the next post we will learn some easy data visualization skills in SAS.