Before we start visualizing the data, we should see how the variables relate to each other. Computing correlations is one of the first steps in understanding the relationships between variables. To compute correlations in SAS, we use PROC CORR.
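A minimal sketch of such a call is shown here. It assumes the diamonds data has already been loaded into a dataset named `work.diamonds` (a hypothetical name) and that the numeric columns are named `carat`, `depth`, `table`, `price`, `x`, `y` and `z`; adjust the names to match your own data.

```sas
/* Assumption: the data lives in WORK.DIAMONDS; the dataset name is illustrative. */
proc corr data=work.diamonds;
    /* List the numeric variables to correlate pairwise. */
    /* Omitting the VAR statement would use every numeric variable in the dataset. */
    var carat depth table price x y z;
run;
```

With no further options, PROC CORR reports Pearson correlation coefficients along with the simple statistics for each variable.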
PROC CORR calculates pairwise correlations for numeric variables. By default, the procedure also reports some summary statistics: mean, standard deviation, sum, minimum and maximum. In the following results we can see that carat, i.e. the weight of the diamond, is highly correlated with x, y and z, i.e. its length, width and depth respectively. Together, x, y and z represent the size of the diamond, and it is natural for the size of a diamond to be highly correlated with its weight. Therefore, carat can serve as a good representative of x, y and z. Price and carat are also highly correlated.
The following scatter plot shows how price is related to carat for each type of cut. There is a clear difference in price between diamonds with Very Good and Fair cuts of similar carat.
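A plot of this kind can be sketched with PROC SGPLOT. The example assumes the same hypothetical `work.diamonds` dataset and a categorical column named `cut`; the transparency setting is only a suggestion to reduce overplotting and is not taken from the original plot.

```sas
/* Assumption: WORK.DIAMONDS with columns carat, price and cut. */
proc sgplot data=work.diamonds;
    /* GROUP= draws each cut level in its own color and adds a legend. */
    scatter x=carat y=price / group=cut transparency=0.7;
    xaxis label="Carat";
    yaxis label="Price";
run;
```

Changing `group=cut` to `group=color` or `group=clarity` gives the corresponding plot for the other two features.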
The following scatter plots show how price is related to carat for each color grade. Notice that diamonds of colors D, E and F (the turquoise, purple and violet points) are priced higher than diamonds of colors H, I and J (the green, brown and maroon points) of similar carat.
The following scatter plots show how price is related to carat for different degrees of clarity. We can see that diamonds with I1 clarity are priced lower than other diamonds of similar carat.
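When the groups overlap heavily in a single plot, one alternative is to draw a separate panel per level. The sketch below uses PROC SGPANEL for this; it is one possible approach rather than the method used for the plots above, and it again assumes the hypothetical `work.diamonds` dataset with a `clarity` column.

```sas
/* Assumption: WORK.DIAMONDS with columns carat, price and clarity. */
proc sgpanel data=work.diamonds;
    /* One panel per clarity level; all panels share the same axes by default. */
    panelby clarity;
    scatter x=carat y=price;
run;
```

Because the panels share their axes, a group such as I1 that sits lower in price at a given carat is easy to compare against the others.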
In the next post we will see how the different features of diamonds (cut, clarity and color) are related to each other. Our aim will be to see whether these features are uniformly spread or whether there is a pattern among them. We will also see how depth and table are related to the price of diamonds.