Practice 1.2 – Python Pandas Cookbook by Alfred Essa

import pandas as pd
import datetime as dt
# create a list containing the dates from 09-01 to 09-10
start = dt.datetime(2013, 9, 1)
end = dt.datetime(2013, 9, 11)   # exclusive upper bound of the loop below
step = dt.timedelta(days=1)
dates = []
#populate the list
while start < end:
    dates.append(start.strftime('%m-%d'))
    start += step
dates
['09-01',
 '09-02',
 '09-03',
 '09-04',
 '09-05',
 '09-06',
 '09-07',
 '09-08',
 '09-09',
 '09-10']
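The same list can be built without an explicit loop. A minimal sketch, assuming a pandas version that provides `DatetimeIndex.strftime`; the name `dates_alt` is made up here for illustration:

```python
import pandas as pd

# ten consecutive days starting 2013-09-01, formatted as month-day strings
dates_alt = pd.date_range('2013-09-01', periods=10, freq='D').strftime('%m-%d').tolist()
print(dates_alt)
```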
d = {'Date' : dates, 'Tokyo':[3,4,5,4,6,3,32,2,3,13], 'Paris':[45,2,4,5,46,4,7,85,12,9], 'Mumbai':[23,32,12,45,3,6,7,8,1,9]} 
d
{'Date': ['09-01',
  '09-02',
  '09-03',
  '09-04',
  '09-05',
  '09-06',
  '09-07',
  '09-08',
  '09-09',
  '09-10'],
 'Mumbai': [23, 32, 12, 45, 3, 6, 7, 8, 1, 9],
 'Paris': [45, 2, 4, 5, 46, 4, 7, 85, 12, 9],
 'Tokyo': [3, 4, 5, 4, 6, 3, 32, 2, 3, 13]}
Creating a DataFrame from a dictionary of equal-length lists
temp = pd.DataFrame(d)
temp
Date Mumbai Paris Tokyo
0 09-01 23 45 3
1 09-02 32 2 4
2 09-03 12 4 5
3 09-04 45 5 4
4 09-05 3 46 6
5 09-06 6 4 3
6 09-07 7 7 32
7 09-08 8 85 2
8 09-09 1 12 3
9 09-10 9 9 13
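The columns in the output above come out in alphabetical order, not in the order the keys were written. Newer pandas versions generally keep the dictionary's own order, so if a particular order matters it is safer to request it with the `columns` argument. A small sketch using the first three rows of the data above (`d_small` and `temp_small` are names invented for this example):

```python
import pandas as pd

d_small = {'Date': ['09-01', '09-02', '09-03'],
           'Tokyo': [3, 4, 5],
           'Paris': [45, 2, 4],
           'Mumbai': [23, 32, 12]}

# columns= fixes the column order regardless of how the dict is ordered
temp_small = pd.DataFrame(d_small, columns=['Date', 'Tokyo', 'Paris', 'Mumbai'])
print(temp_small)
```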
temp['Tokyo']
0     3
1     4
2     5
3     4
4     6
5     3
6    32
7     2
8     3
9    13
Name: Tokyo, dtype: int64
temp = temp.set_index('Date')
temp
Mumbai Paris Tokyo
Date
09-01 23 45 3
09-02 32 2 4
09-03 12 4 5
09-04 45 5 4
09-05 3 46 6
09-06 6 4 3
09-07 7 7 32
09-08 8 85 2
09-09 1 12 3
09-10 9 9 13
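Once 'Date' is the index, rows can be looked up by their date label with `.loc`. A short sketch on three of the rows above (only two city columns are kept to stay brief):

```python
import pandas as pd

temp_small = pd.DataFrame({'Date': ['09-06', '09-07', '09-08'],
                           'Tokyo': [3, 32, 2],
                           'Paris': [4, 7, 85]}).set_index('Date')

row = temp_small.loc['09-07']                   # one row, returned as a Series
tokyo_0907 = temp_small.loc['09-07', 'Tokyo']   # a single cell
print(tokyo_0907)
```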
import os
os.getcwd()
'C:\\Anaconda'
tb = pd.read_csv('C:/Anaconda/TB_outcomes.csv')
tb.head()
country iso2 iso3 iso_numeric g_whoregion year rep_meth new_sp_coh new_sp_cur new_sp_cmplt mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
0 Afghanistan AF AFG 4 EMR 1994 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan AF AFG 4 EMR 1995 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Afghanistan AF AFG 4 EMR 1996 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Afghanistan AF AFG 4 EMR 1997 100 2001 786 108 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Afghanistan AF AFG 4 EMR 1998 100 2913 772 199 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 72 columns

tb.tail()
country iso2 iso3 iso_numeric g_whoregion year rep_meth new_sp_coh new_sp_cur new_sp_cmplt mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
4052 Zimbabwe ZW ZWE 716 AFR 2008 100 10370 6973 734 0 NaN NaN NaN NaN 0 NaN NaN NaN NaN
4053 Zimbabwe ZW ZWE 716 AFR 2009 100 10195 7131 868 1 1 0 0 0 0 0 0 0 0
4054 Zimbabwe ZW ZWE 716 AFR 2010 100 11654 8377 1116 6 4 0 2 0 0 0 0 0 0
4055 Zimbabwe ZW ZWE 716 AFR 2011 NaN 12596 9208 995 70 57 0 9 2 0 0 0 0 0
4056 Zimbabwe ZW ZWE 716 AFR 2012 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 72 columns

To get unique values

tb['country'].unique()
array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia',
       'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia (Plurinational State of)',
       'Bonaire, Saint Eustatius and Saba', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei Darussalam',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia',
       'Cameroon', 'Canada', 'Cayman Islands', 'Central African Republic',
       'Chad', 'Chile', 'China', 'China, Hong Kong SAR',
       'China, Macao SAR', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "C\xc3\xb4te d'Ivoire", 'Croatia', 'Cuba',
       'Cura\xc3\xa7ao', 'Cyprus', 'Czech Republic',
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji',
       'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia',
       'Georgia', 'Germany', 'Ghana', 'Greece', 'Greenland', 'Grenada',
       'Guam', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti',
       'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia',
       'Iran (Islamic Republic of)', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati',
       'Kuwait', 'Kyrgyzstan', "Lao People's Democratic Republic",
       'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Lithuania',
       'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives',
       'Mali', 'Malta', 'Marshall Islands', 'Mauritania', 'Mauritius',
       'Mexico', 'Micronesia (Federated States of)', 'Monaco', 'Mongolia',
       'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar',
       'Namibia', 'Nauru', 'Nepal', 'Netherlands Antilles', 'Netherlands',
       'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
       'Niue', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan',
       'Palau', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
       'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
       'Republic of Korea', 'Republic of Moldova', 'Romania',
       'Russian Federation', 'Rwanda', 'Saint Kitts and Nevis',
       'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa',
       'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal',
       'Serbia & Montenegro', 'Serbia', 'Seychelles', 'Sierra Leone',
       'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia',
       'Solomon Islands', 'Somalia', 'South Africa', 'South Sudan',
       'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden',
       'Switzerland', 'Syrian Arab Republic', 'Tajikistan', 'Thailand',
       'The Former Yugoslav Republic of Macedonia', 'Timor-Leste', 'Togo',
       'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
       'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda',
       'Ukraine', 'United Arab Emirates',
       'United Kingdom of Great Britain and Northern Ireland',
       'United Republic of Tanzania', 'United States of America',
       'Uruguay', 'US Virgin Islands', 'Uzbekistan', 'Vanuatu',
       'Venezuela (Bolivarian Republic of)', 'Viet Nam',
       'Wallis and Futuna Islands', 'West Bank and Gaza Strip', 'Yemen',
       'Zambia', 'Zimbabwe'], dtype=object)

Counting how many rows each unique value has

tb.country.value_counts() 
Botswana                            19
Bolivia (Plurinational State of)    19
Greenland                           19
Armenia                             19
China                               19
Togo                                19
Mongolia                            19
Saint Kitts and Nevis               19
Cuba                                19
Benin                               19
Cook Islands                        19
Malawi                              19
Norway                              19
Nauru                               19
Solomon Islands                     19
...
US Virgin Islands                    19
China, Hong Kong SAR                 19
Denmark                              19
Philippines                          19
Canada                               19
China, Macao SAR                     19
Netherlands Antilles                 15
Timor-Leste                          11
Serbia & Montenegro                  10
Montenegro                            8
Serbia                                8
Bonaire, Saint Eustatius and Saba     4
Sint Maarten (Dutch part)             4
Curaçao                               4
South Sudan                           3
Length: 219, dtype: int64
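The "Length: 219" at the bottom is the number of distinct countries, which `nunique()` returns directly. A sketch on a made-up miniature column (the `toy` frame is hypothetical, not the TB data):

```python
import pandas as pd

# hypothetical miniature stand-in for the 'country' column
toy = pd.DataFrame({'country': ['A', 'A', 'A', 'B', 'B', 'C']})

counts = toy['country'].value_counts()   # rows per distinct value, largest first
n_distinct = toy['country'].nunique()    # number of distinct values
print(counts)
print(n_distinct)
```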
tb.describe()
iso_numeric year rep_meth new_sp_coh new_sp_cur new_sp_cmplt new_sp_died new_sp_fail new_sp_def c_new_sp_tsr mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
count 4057.000000 4057.000000 3037.000000 3053.000000 2944.000000 2943.00000 2993.000000 2876.000000 2955.000000 3004.000000 1050.000000 1017.000000 959.000000 1000.000000 987.000000 562.000000 525.000000 524.000000 525.000000 524.000000
mean 433.592310 2003.042149 100.271320 10867.512611 7897.903533 963.62827 430.973939 184.123088 613.043655 75.767643 139.985714 71.208456 14.385819 22.544000 22.217832 6.181495 1.390476 0.837786 2.230476 0.776718
std 254.908076 5.485677 0.647391 45621.976594 37520.862855 3325.39556 1615.996031 812.662201 2386.874910 16.305073 726.653931 342.387797 106.821966 138.383012 113.607426 48.815990 9.570645 5.019886 20.085652 5.790293
min 4.000000 1994.000000 100.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 212.000000 1998.000000 100.000000 124.000000 66.750000 13.00000 7.000000 0.000000 4.000000 69.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 430.000000 2003.000000 100.000000 1229.000000 721.500000 124.00000 60.000000 15.000000 90.000000 79.000000 6.000000 3.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 646.000000 2008.000000 100.000000 5366.000000 3401.500000 580.50000 257.000000 99.000000 393.000000 87.000000 43.000000 24.000000 1.000000 6.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 894.000000 2012.000000 102.000000 642321.000000 544731.000000 64938.00000 27005.000000 12505.000000 35469.000000 100.000000 15896.000000 5895.000000 2916.000000 3037.000000 2344.000000 751.000000 116.000000 64.000000 305.000000 94.000000

8 rows × 68 columns
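describe() reports 68 columns while head() showed 72, because text columns such as country are left out of the numeric summary. The count row also differs per column because NaN entries are not counted. A sketch on a made-up frame (column names borrowed from the TB data, values invented):

```python
import pandas as pd

toy = pd.DataFrame({'country': ['A', 'A', 'B'],
                    'year': [1994, 1995, 1994],
                    'new_sp_coh': [100.0, None, 250.0]})

numeric_summary = toy.describe()             # numeric columns only; NaN is skipped in count
full_summary = toy.describe(include='all')   # also summarises the text column
print(numeric_summary)
```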

 

Practice 1.1 – Python Pandas Cookbook by Alfred Essa

Following Alfred Essa’s Python Pandas Cookbook on YouTube
Different Ways to Construct Series
import pandas as pd
import numpy as np
Using Series Constructor
s1 = pd.Series([463,3,-728,236,32,-773])
s1
0    463
1      3
2   -728
3    236
4     32
5   -773
dtype: int64
type(s1)
pandas.core.series.Series
s1.values
array([ 463,    3, -728,  236,   32, -773], dtype=int64)
type(s1.values)
numpy.ndarray
s1.index
Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')
s1[3]
236

Defining data and index

data1 = [3.5,5,343,9.3,23]
index1 = ['Mon','Tue','Wed','Thur','Fri']

Creating Series

s2 = pd.Series(data1, index = index1)
s2
Mon       3.5
Tue       5.0
Wed     343.0
Thur      9.3
Fri      23.0
dtype: float64
s2[4]
23.0
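s2[4] works here because the index holds strings, so pandas falls back to treating 4 as a position. As far as I know, recent pandas releases warn about this fallback, so being explicit with `.iloc` (position) or `.loc` (label) is the safer habit. A sketch:

```python
import pandas as pd

s2 = pd.Series([3.5, 5, 343, 9.3, 23],
               index=['Mon', 'Tue', 'Wed', 'Thur', 'Fri'])

by_position = s2.iloc[4]    # fifth element, whatever its label is
by_label = s2.loc['Fri']    # element whose label is 'Fri'
print(by_position, by_label)
```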
s2.index
Index([u'Mon', u'Tue', u'Wed', u'Thur', u'Fri'], dtype='object')
s2.name = 'Daily numbers'
s2.index.name = 'Working days'
s2
Working days
Mon               3.5
Tue               5.0
Wed             343.0
Thur              9.3
Fri              23.0
Name: Daily numbers, dtype: float64
Creating a Series from a dictionary
dict1 = {'Jan': -7,'Feb': 2,'March': 12,'April': -9,'May': 3,'June': 4}
s3 = pd.Series(dict1)
s3
April    -9
Feb       2
Jan      -7
June      4
March    12
May       3
dtype: int64
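The output above lists the labels alphabetically, not in the order they were written in dict1. Newer pandas versions on Python 3 generally keep the dictionary's own order instead; `sort_index()` gives the alphabetical order either way. A sketch (`s3_sorted` is a name chosen for this example):

```python
import pandas as pd

dict1 = {'Jan': -7, 'Feb': 2, 'March': 12, 'April': -9, 'May': 3, 'June': 4}

# sort_index() gives the alphabetical label order shown above
s3_sorted = pd.Series(dict1).sort_index()
print(s3_sorted)
```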
Vectorized Operations
s3 * 2
April   -18
Feb       4
Jan     -14
June      8
March    24
May       6
dtype: int64
np.log(s3)
April         NaN
Feb      0.693147
Jan           NaN
June     1.386294
March    2.484907
May      1.098612
dtype: float64
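The NaN entries are the negative values, for which the logarithm is undefined. One way to make that explicit is to mask the non-positive entries before taking the log. A sketch, with the Series rebuilt inline so the block runs on its own:

```python
import numpy as np
import pandas as pd

s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7, 'June': 4, 'March': 12, 'May': 3})

# where() replaces entries that fail the condition with NaN before taking the log
safe_log = np.log(s3.where(s3 > 0))
print(safe_log)
```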
Slicing
s3['Feb':'May']
Feb       2
Jan      -7
June      4
March    12
May       3
dtype: int64
s3[3:5]
June      4
March    12
dtype: int64
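Note the difference between the two slices above: the label slice includes both end labels, while the positional slice stops before position 5. A sketch that makes the two styles explicit with `.loc` and `.iloc`:

```python
import pandas as pd

s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7,
                'June': 4, 'March': 12, 'May': 3}).sort_index()

by_label = s3.loc['Feb':'May']   # label slice: both 'Feb' and 'May' are included
by_position = s3.iloc[3:5]       # position slice: stop position 5 is excluded
print(len(by_label), len(by_position))
```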
Assigning a value by offset (position)
s3[3] = 54
s3
April    -9
Feb       2
Jan      -7
June     54
March    12
May       3
dtype: int64
s3.median()
2.5
s3.min()
-9
s3.max()
54
s3.cumsum()
April    -9
Feb      -7
Jan     -14
June     40
March    52
May      55
dtype: int64

Making looping clearer – enumerate() returns an iterator of (position, value) pairs

for i, v in enumerate(s3):
    print i,v
0 -9
1 2
2 -7
3 54
4 12
5 3
new_s3 = [x**2 for x in s3]
new_s3
[81, 4, 49, 2916, 144, 9]
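The list comprehension returns a plain Python list and loses the labels. The vectorized form does the same arithmetic and keeps the index. A sketch with the values s3 has at this point:

```python
import pandas as pd

s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7, 'June': 54, 'March': 12, 'May': 3})

squared = s3 ** 2   # element-wise; the result is a Series with the same index
print(squared)
```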
Using a Series like a dictionary
s3['Feb']
2
'Feb' in s3
True
Assignment using key
s3['May'] = 45.8
s3
April    -9
Feb       2
Jan      -7
June     54
March    12
May      45
dtype: int64
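In the output above 45.8 was stored as 45, because s3 has dtype int64 and the fractional part was dropped. Other pandas versions may convert the Series to float instead. Converting to float first makes the result predictable. A sketch (`s3_float` is a name chosen for this example):

```python
import pandas as pd

s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7, 'June': 54, 'March': 12, 'May': 3})

s3_float = s3.astype(float)   # convert first so a fractional value fits
s3_float['May'] = 45.8
print(s3_float)
```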

Looping over dictionary keys and values

for k,v in s3.iteritems():
    print k,v
April -9
Feb 2
Jan -7
June 54
March 12
May 45
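The loops above use Python 2 syntax. Under Python 3, print is a function, and pandas 2.0 removed `iteritems()` in favour of `items()`. A sketch of the same loop in that style (the `pairs` list is only there to show what is yielded):

```python
import pandas as pd

s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7})

# items() yields (label, value) pairs, like dict.items()
pairs = []
for k, v in s3.items():
    print(k, v)
    pairs.append((k, v))
```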

Pycurl and Pandas – Get csv file and explore it

Importing libraries

import pandas as pd
import os
import pycurl
import csv

To get the location of current working directory

os.getcwd()

'C:\\Anaconda'

To change the working directory

os.chdir('C:\\Anaconda\\abalone')
os.getcwd()

'C:\\Anaconda\\abalone'

Use pycurl to get a data file over https and write it to a csv file

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
c = pycurl.Curl()
c.setopt(c.URL, url)
# open in binary mode: pycurl passes raw bytes to the write callback
with open('abalone.csv', 'wb') as s:
    c.setopt(c.WRITEFUNCTION, s.write)
    c.perform()
c.close()
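If pycurl is not installed, the Python 3 standard library can do the same job. A sketch under that assumption; `download` is a helper name invented here, not part of any library:

```python
import shutil
import urllib.request

def download(url, path):
    """Stream the resource at `url` into the local file `path`."""
    # binary mode on both sides, so the bytes are written unchanged
    with urllib.request.urlopen(url) as response, open(path, 'wb') as out:
        shutil.copyfileobj(response, out)

# usage (needs network access):
# download('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data',
#          'abalone.csv')
```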

To read the csv file into a DataFrame named abalone

abalone = pd.read_csv('abalone.csv')
abalone
M 0.455 0.365 0.095 0.514 0.2245 0.101 0.15 15
0 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.0700 7
1 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.2100 9
2 M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.1550 10
3 I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.0550 7
4 I 0.425 0.300 0.095 0.3515 0.1410 0.0775 0.1200 8
5 F 0.530 0.415 0.150 0.7775 0.2370 0.1415 0.3300 20
6 F 0.545 0.425 0.125 0.7680 0.2940 0.1495 0.2600 16
7 M 0.475 0.370 0.125 0.5095 0.2165 0.1125 0.1650 9
8 F 0.550 0.440 0.150 0.8945 0.3145 0.1510 0.3200 19
9 F 0.525 0.380 0.140 0.6065 0.1940 0.1475 0.2100 14
10 M 0.430 0.350 0.110 0.4060 0.1675 0.0810 0.1350 10
11 M 0.490 0.380 0.135 0.5415 0.2175 0.0950 0.1900 11
12 F 0.535 0.405 0.145 0.6845 0.2725 0.1710 0.2050 10
13 F 0.470 0.355 0.100 0.4755 0.1675 0.0805 0.1850 10
14 M 0.500 0.400 0.130 0.6645 0.2580 0.1330 0.2400 12
15 I 0.355 0.280 0.085 0.2905 0.0950 0.0395 0.1150 7
16 F 0.440 0.340 0.100 0.4510 0.1880 0.0870 0.1300 10
17 M 0.365 0.295 0.080 0.2555 0.0970 0.0430 0.1000 7
18 M 0.450 0.320 0.100 0.3810 0.1705 0.0750 0.1150 9
19 M 0.355 0.280 0.095 0.2455 0.0955 0.0620 0.0750 11
20 I 0.380 0.275 0.100 0.2255 0.0800 0.0490 0.0850 10
21 F 0.565 0.440 0.155 0.9395 0.4275 0.2140 0.2700 12
22 F 0.550 0.415 0.135 0.7635 0.3180 0.2100 0.2000 9
23 F 0.615 0.480 0.165 1.1615 0.5130 0.3010 0.3050 10
24 F 0.560 0.440 0.140 0.9285 0.3825 0.1880 0.3000 11
25 F 0.580 0.450 0.185 0.9955 0.3945 0.2720 0.2850 11
26 M 0.590 0.445 0.140 0.9310 0.3560 0.2340 0.2800 12
27 M 0.605 0.475 0.180 0.9365 0.3940 0.2190 0.2950 15
28 M 0.575 0.425 0.140 0.8635 0.3930 0.2270 0.2000 11
29 M 0.580 0.470 0.165 0.9975 0.3935 0.2420 0.3300 10
...
4146 M 0.695 0.550 0.195 1.6645 0.7270 0.3600 0.4450 11
4147 M 0.770 0.605 0.175 2.0505 0.8005 0.5260 0.3550 11
4148 I 0.280 0.215 0.070 0.1240 0.0630 0.0215 0.0300 6
4149 I 0.330 0.230 0.080 0.1400 0.0565 0.0365 0.0460 7
4150 I 0.350 0.250 0.075 0.1695 0.0835 0.0355 0.0410 6
4151 I 0.370 0.280 0.090 0.2180 0.0995 0.0545 0.0615 7
4152 I 0.430 0.315 0.115 0.3840 0.1885 0.0715 0.1100 8
4153 I 0.435 0.330 0.095 0.3930 0.2190 0.0750 0.0885 6
4154 I 0.440 0.350 0.110 0.3805 0.1575 0.0895 0.1150 6
4155 M 0.475 0.370 0.110 0.4895 0.2185 0.1070 0.1460 8
4156 M 0.475 0.360 0.140 0.5135 0.2410 0.1045 0.1550 8
4157 I 0.480 0.355 0.110 0.4495 0.2010 0.0890 0.1400 8
4158 F 0.560 0.440 0.135 0.8025 0.3500 0.1615 0.2590 9
4159 F 0.585 0.475 0.165 1.0530 0.4580 0.2170 0.3000 11
4160 F 0.585 0.455 0.170 0.9945 0.4255 0.2630 0.2845 11
4161 M 0.385 0.255 0.100 0.3175 0.1370 0.0680 0.0920 8
4162 I 0.390 0.310 0.085 0.3440 0.1810 0.0695 0.0790 7
4163 I 0.390 0.290 0.100 0.2845 0.1255 0.0635 0.0810 7
4164 I 0.405 0.300 0.085 0.3035 0.1500 0.0505 0.0880 7
4165 I 0.475 0.365 0.115 0.4990 0.2320 0.0885 0.1560 10
4166 M 0.500 0.380 0.125 0.5770 0.2690 0.1265 0.1535 9
4167 F 0.515 0.400 0.125 0.6150 0.2865 0.1230 0.1765 8
4168 M 0.520 0.385 0.165 0.7910 0.3750 0.1800 0.1815 10
4169 M 0.550 0.430 0.130 0.8395 0.3155 0.1955 0.2405 10
4170 M 0.560 0.430 0.155 0.8675 0.4000 0.1720 0.2290 8
4171 F 0.565 0.450 0.165 0.8870 0.3700 0.2390 0.2490 11
4172 M 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10
4173 M 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9
4174 F 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10
4175 M 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12
To add column names
abalone.columns = ['Sex', 'Length','Diameter','Height','Whole weight','Shucked weight','Viscera weight','Shell weight','Rings']
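Because the raw file has no header row, read_csv above used the first record (M, 0.455, ...) as the column names, which is why the frame starts at the second abalone and has 4176 rows. One way to keep that record is to pass `header=None` together with `names`. A sketch using the first two records shown above, read from an in-memory string instead of the file (`cols`, `raw` and `abalone_small` are names chosen for this example):

```python
import io
import pandas as pd

cols = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',
        'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']

raw = io.StringIO("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
                  "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7\n")

# header=None: the first line is data; names= supplies the column labels
abalone_small = pd.read_csv(raw, header=None, names=cols)
print(abalone_small)
```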
To write data to a csv file
abalone.to_csv('abalone.csv')
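By default to_csv also writes the row index as an unnamed first column, so reading the file back adds an extra column. Passing `index=False` avoids that. A small sketch that writes to a string rather than a file (the two-row frame is made up):

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['M', 'F'], 'Rings': [7, 9]})

with_index = df.to_csv()                 # first column holds the index 0, 1
without_index = df.to_csv(index=False)   # only the data columns
print(without_index)
```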
To get the 4 top-most observations
abalone.head(4)
Sex Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
0 M 0.35 0.265 0.090 0.2255 0.0995 0.0485 0.070 7
1 F 0.53 0.420 0.135 0.6770 0.2565 0.1415 0.210 9
2 M 0.44 0.365 0.125 0.5160 0.2155 0.1140 0.155 10
3 I 0.33 0.255 0.080 0.2050 0.0895 0.0395 0.055 7

To get the 4 bottom-most observations

abalone.tail(4)
Sex Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
4172 M 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10
4173 M 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9
4174 F 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10
4175 M 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12
To get basic statistics for all numeric variables
abalone.describe()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
count 4176.000000 4176.000000 4176.000000 4176.000000 4176.00000 4176.000000 4176.000000 4176.000000
mean 0.524009 0.407892 0.139527 0.828818 0.35940 0.180613 0.238852 9.932471
std 0.120103 0.099250 0.041826 0.490424 0.22198 0.109620 0.139213 3.223601
min 0.075000 0.055000 0.000000 0.002000 0.00100 0.000500 0.001500 1.000000
25% 0.450000 0.350000 0.115000 0.441500 0.18600 0.093375 0.130000 8.000000
50% 0.545000 0.425000 0.140000 0.799750 0.33600 0.171000 0.234000 9.000000
75% 0.615000 0.480000 0.165000 1.153250 0.50200 0.253000 0.329000 11.000000
max 0.815000 0.650000 1.130000 2.825500 1.48800 0.760000 1.005000 29.000000
To get covariance
abalone.cov()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
Length 0.014425 0.011763 0.004157 0.054499 0.023938 0.011889 0.015009 0.215697
Diameter 0.011763 0.009850 0.003461 0.045046 0.019678 0.009789 0.012509 0.183968
Height 0.004157 0.003461 0.001749 0.016804 0.007195 0.003660 0.004759 0.075251
Whole weight 0.054499 0.045046 0.016804 0.240515 0.105533 0.051953 0.065225 0.854995
Shucked weight 0.023938 0.019678 0.007195 0.105533 0.049275 0.022678 0.027275 0.301440
Viscera weight 0.011889 0.009789 0.003660 0.051953 0.022678 0.012017 0.013851 0.178196
Shell weight 0.015009 0.012509 0.004759 0.065225 0.027275 0.013851 0.019380 0.281839
Rings 0.215697 0.183968 0.075251 0.854995 0.301440 0.178196 0.281839 10.391606
To get pairwise correlation coefficients for all numeric variables
abalone.corr()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
Length 1.000000 0.986813 0.827552 0.925255 0.897905 0.903010 0.897697 0.557123
Diameter 0.986813 1.000000 0.833705 0.925452 0.893159 0.899726 0.905328 0.575005
Height 0.827552 0.833705 1.000000 0.819209 0.774957 0.798293 0.817326 0.558109
Whole weight 0.925255 0.925452 0.819209 1.000000 0.969403 0.966372 0.955351 0.540818
Shucked weight 0.897905 0.893159 0.774957 0.969403 1.000000 0.931956 0.882606 0.421256
Viscera weight 0.903010 0.899726 0.798293 0.966372 0.931956 1.000000 0.907647 0.504274
Shell weight 0.897697 0.905328 0.817326 0.955351 0.882606 0.907647 1.000000 0.628031
Rings 0.557123 0.575005 0.558109 0.540818 0.421256 0.504274 0.628031 1.000000
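Sex is absent from the tables above because it is not numeric. Some newer pandas versions raise an error instead of silently dropping such columns, so selecting the numeric columns first is a safe habit. A sketch on the four rows shown by head(4), keeping only three of the numeric columns for brevity:

```python
import pandas as pd

small = pd.DataFrame({'Sex': ['M', 'F', 'M', 'I'],
                      'Length': [0.35, 0.53, 0.44, 0.33],
                      'Height': [0.090, 0.135, 0.125, 0.080],
                      'Rings': [7, 9, 10, 7]})

numeric = small.select_dtypes(include='number')   # drops the text column 'Sex'
corr = numeric.corr()                             # pairwise Pearson correlation
# correlation of every column with Rings, strongest first
print(corr['Rings'].sort_values(ascending=False))
```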
To get the unique values of the 'Rings' column
abalone['Rings'].unique()
array([ 7, 9, 10, 8, 20, 16, 19, 14, 11, 12, 15, 18, 13, 5, 4, 6, 21, 17, 22, 1, 3, 26, 23, 29, 2, 27, 25, 24], dtype=int64)
To subset – keep only 'Length', 'Diameter' and 'Height' in a new data set abalone1
abalone1 = abalone[['Length','Diameter','Height']]
Inspect abalone1 by checking head and tail
abalone1.head(3)
Length Diameter Height
0 0.35 0.265 0.090
1 0.53 0.420 0.135
2 0.44 0.365 0.125
abalone1.tail(3)
Length Diameter Height
4173 0.600 0.475 0.205
4174 0.625 0.485 0.150
4175 0.710 0.555 0.195

For code: http://nbviewer.ipython.org/gist/sunakshi132/4791b6838e7bf3fde38b