Review: Tuples

my_list = [2,3]
my_tuple = (4,5)
other_tuple = 6,7
my_list[1] = 10
my_list
[2, 10]
my_tuple[1] = 11  # raises TypeError: a tuple cannot be modified in place
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-ce8c7e54784a> in <module>()
----> 1 my_tuple[1] = 11

TypeError: 'tuple' object does not support item assignment
def sum_product(x, y):
    return (x + y),(x * y)
sp = sum_product(11,12)
sp
(23, 132)
s, p = sum_product(11,12)
s
23
p
132
x, y = 1, 2  # multiple assignment (unpacking) works with both lists and tuples
x,y = 3,4
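
The tuple behaviour reviewed above can be condensed into one short script. This is a minimal sketch in Python 3 syntax; the swap at the end is an extra illustration of unpacking and is not taken from the book.

```python
# Tuples are immutable: item assignment raises TypeError.
my_tuple = (4, 5)
try:
    my_tuple[1] = 11
except TypeError as err:
    message = str(err)

# A function can return several values packed into one tuple ...
def sum_product(x, y):
    return (x + y), (x * y)

# ... and the caller can unpack them in a single assignment.
s, p = sum_product(11, 12)

# Unpacking also gives a concise way to swap two variables.
x, y = 1, 2
x, y = y, x

print(message)
print(s, p, x, y)
```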

Studying Data Science from Scratch by Joel Grus.

Practice 1.2 – Python Pandas Cookbook by Alfred Essa

import pandas as pd
import datetime as dt
#creating list containing dates from 9-01 to 9-10
start  = dt.datetime(2013,9,1)
end = dt.datetime(2013,9,11)
step = dt.timedelta(days = 1)
dates = []
#populate the list
while start < end:
    dates.append(start.strftime('%m-%d'))
    start += step
dates
['09-01',
 '09-02',
 '09-03',
 '09-04',
 '09-05',
 '09-06',
 '09-07',
 '09-08',
 '09-09',
 '09-10']
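
The while loop above can also be written as a list comprehension. This is only a sketch of an alternative, using the same start date and format string; the variable name n_days is an assumption introduced for illustration.

```python
import datetime as dt

start = dt.datetime(2013, 9, 1)
n_days = 10  # hypothetical name: how many consecutive days to generate

# Add 0, 1, ..., 9 days to the start date and format each as 'MM-DD'.
dates = [(start + dt.timedelta(days=i)).strftime('%m-%d') for i in range(n_days)]
print(dates)
```

Unlike the loop, this version leaves start unchanged, which can matter if the start date is needed again later.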
d = {'Date' : dates, 'Tokyo':[3,4,5,4,6,3,32,2,3,13], 'Paris':[45,2,4,5,46,4,7,85,12,9], 'Mumbai':[23,32,12,45,3,6,7,8,1,9]} 
d
{'Date': ['09-01',
  '09-02',
  '09-03',
  '09-04',
  '09-05',
  '09-06',
  '09-07',
  '09-08',
  '09-09',
  '09-10'],
 'Mumbai': [23, 32, 12, 45, 3, 6, 7, 8, 1, 9],
 'Paris': [45, 2, 4, 5, 46, 4, 7, 85, 12, 9],
 'Tokyo': [3, 4, 5, 4, 6, 3, 32, 2, 3, 13]}
Creating a DataFrame from a dictionary whose lists all have the same length
temp = pd.DataFrame(d)
temp
    Date  Mumbai  Paris  Tokyo
0  09-01      23     45      3
1  09-02      32      2      4
2  09-03      12      4      5
3  09-04      45      5      4
4  09-05       3     46      6
5  09-06       6      4      3
6  09-07       7      7     32
7  09-08       8     85      2
8  09-09       1     12      3
9  09-10       9      9     13
temp['Tokyo']
0     3
1     4
2     5
3     4
4     6
5     3
6    32
7     2
8     3
9    13
Name: Tokyo, dtype: int64
temp = temp.set_index('Date')
temp
       Mumbai  Paris  Tokyo
Date
09-01      23     45      3
09-02      32      2      4
09-03      12      4      5
09-04      45      5      4
09-05       3     46      6
09-06       6      4      3
09-07       7      7     32
09-08       8     85      2
09-09       1     12      3
09-10       9      9     13
import os
os.getcwd()
'C:\\Anaconda'
tb = pd.read_csv('C:/Anaconda/TB_outcomes.csv')
tb.head()
country iso2 iso3 iso_numeric g_whoregion year rep_meth new_sp_coh new_sp_cur new_sp_cmplt mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
0 Afghanistan AF AFG 4 EMR 1994 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 Afghanistan AF AFG 4 EMR 1995 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 Afghanistan AF AFG 4 EMR 1996 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 Afghanistan AF AFG 4 EMR 1997 100 2001 786 108 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 Afghanistan AF AFG 4 EMR 1998 100 2913 772 199 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 72 columns

tb.tail()
country iso2 iso3 iso_numeric g_whoregion year rep_meth new_sp_coh new_sp_cur new_sp_cmplt mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
4052 Zimbabwe ZW ZWE 716 AFR 2008 100 10370 6973 734 0 NaN NaN NaN NaN 0 NaN NaN NaN NaN
4053 Zimbabwe ZW ZWE 716 AFR 2009 100 10195 7131 868 1 1 0 0 0 0 0 0 0 0
4054 Zimbabwe ZW ZWE 716 AFR 2010 100 11654 8377 1116 6 4 0 2 0 0 0 0 0 0
4055 Zimbabwe ZW ZWE 716 AFR 2011 NaN 12596 9208 995 70 57 0 9 2 0 0 0 0 0
4056 Zimbabwe ZW ZWE 716 AFR 2012 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 72 columns

To get unique values

tb['country'].unique()
array(['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra',
       'Angola', 'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia',
       'Aruba', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia (Plurinational State of)',
       'Bonaire, Saint Eustatius and Saba', 'Bosnia and Herzegovina',
       'Botswana', 'Brazil', 'British Virgin Islands', 'Brunei Darussalam',
       'Bulgaria', 'Burkina Faso', 'Burundi', 'Cabo Verde', 'Cambodia',
       'Cameroon', 'Canada', 'Cayman Islands', 'Central African Republic',
       'Chad', 'Chile', 'China', 'China, Hong Kong SAR',
       'China, Macao SAR', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "C\xc3\xb4te d'Ivoire", 'Croatia', 'Cuba',
       'Cura\xc3\xa7ao', 'Cyprus', 'Czech Republic',
       "Democratic People's Republic of Korea",
       'Democratic Republic of the Congo', 'Denmark', 'Djibouti',
       'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt', 'El Salvador',
       'Equatorial Guinea', 'Eritrea', 'Estonia', 'Ethiopia', 'Fiji',
       'Finland', 'France', 'French Polynesia', 'Gabon', 'Gambia',
       'Georgia', 'Germany', 'Ghana', 'Greece', 'Greenland', 'Grenada',
       'Guam', 'Guatemala', 'Guinea', 'Guinea-Bissau', 'Guyana', 'Haiti',
       'Honduras', 'Hungary', 'Iceland', 'India', 'Indonesia',
       'Iran (Islamic Republic of)', 'Iraq', 'Ireland', 'Israel', 'Italy',
       'Jamaica', 'Japan', 'Jordan', 'Kazakhstan', 'Kenya', 'Kiribati',
       'Kuwait', 'Kyrgyzstan', "Lao People's Democratic Republic",
       'Latvia', 'Lebanon', 'Lesotho', 'Liberia', 'Libya', 'Lithuania',
       'Luxembourg', 'Madagascar', 'Malawi', 'Malaysia', 'Maldives',
       'Mali', 'Malta', 'Marshall Islands', 'Mauritania', 'Mauritius',
       'Mexico', 'Micronesia (Federated States of)', 'Monaco', 'Mongolia',
       'Montenegro', 'Montserrat', 'Morocco', 'Mozambique', 'Myanmar',
       'Namibia', 'Nauru', 'Nepal', 'Netherlands Antilles', 'Netherlands',
       'New Caledonia', 'New Zealand', 'Nicaragua', 'Niger', 'Nigeria',
       'Niue', 'Northern Mariana Islands', 'Norway', 'Oman', 'Pakistan',
       'Palau', 'Panama', 'Papua New Guinea', 'Paraguay', 'Peru',
       'Philippines', 'Poland', 'Portugal', 'Puerto Rico', 'Qatar',
       'Republic of Korea', 'Republic of Moldova', 'Romania',
       'Russian Federation', 'Rwanda', 'Saint Kitts and Nevis',
       'Saint Lucia', 'Saint Vincent and the Grenadines', 'Samoa',
       'San Marino', 'Sao Tome and Principe', 'Saudi Arabia', 'Senegal',
       'Serbia & Montenegro', 'Serbia', 'Seychelles', 'Sierra Leone',
       'Singapore', 'Sint Maarten (Dutch part)', 'Slovakia', 'Slovenia',
       'Solomon Islands', 'Somalia', 'South Africa', 'South Sudan',
       'Spain', 'Sri Lanka', 'Sudan', 'Suriname', 'Swaziland', 'Sweden',
       'Switzerland', 'Syrian Arab Republic', 'Tajikistan', 'Thailand',
       'The Former Yugoslav Republic of Macedonia', 'Timor-Leste', 'Togo',
       'Tokelau', 'Tonga', 'Trinidad and Tobago', 'Tunisia', 'Turkey',
       'Turkmenistan', 'Turks and Caicos Islands', 'Tuvalu', 'Uganda',
       'Ukraine', 'United Arab Emirates',
       'United Kingdom of Great Britain and Northern Ireland',
       'United Republic of Tanzania', 'United States of America',
       'Uruguay', 'US Virgin Islands', 'Uzbekistan', 'Vanuatu',
       'Venezuela (Bolivarian Republic of)', 'Viet Nam',
       'Wallis and Futuna Islands', 'West Bank and Gaza Strip', 'Yemen',
       'Zambia', 'Zimbabwe'], dtype=object)

Counting the number of rows for each unique value

tb.country.value_counts() 
Botswana                            19
Bolivia (Plurinational State of)    19
Greenland                           19
Armenia                             19
China                               19
Togo                                19
Mongolia                            19
Saint Kitts and Nevis               19
Cuba                                19
Benin                               19
Cook Islands                        19
Malawi                              19
Norway                              19
Nauru                               19
Solomon Islands                     19
...
US Virgin Islands                    19
China, Hong Kong SAR                 19
Denmark                              19
Philippines                          19
Canada                               19
China, Macao SAR                     19
Netherlands Antilles                 15
Timor-Leste                          11
Serbia & Montenegro                  10
Montenegro                            8
Serbia                                8
Bonaire, Saint Eustatius and Saba     4
Sint Maarten (Dutch part)             4
Curaçao                               4
South Sudan                           3
Length: 219, dtype: int64
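
unique() and value_counts() answer slightly different questions, which a toy Series makes easier to see. The sketch below uses made-up data rather than the TB file, is written in Python 3 syntax, and assumes pandas is installed.

```python
import pandas as pd

# Made-up stand-in for tb['country']; not taken from the TB data set.
country = pd.Series(['India', 'Japan', 'India', 'Peru', 'India', 'Japan'])

distinct = country.unique()      # array of the distinct values
n_distinct = country.nunique()   # how many distinct values there are
counts = country.value_counts()  # rows per distinct value, largest first

print(distinct)
print(n_distinct)
print(counts)
```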
tb.describe()
iso_numeric year rep_meth new_sp_coh new_sp_cur new_sp_cmplt new_sp_died new_sp_fail new_sp_def c_new_sp_tsr mdr_coh mdr_succ mdr_fail mdr_died mdr_lost xdr_coh xdr_succ xdr_fail xdr_died xdr_lost
count 4057.000000 4057.000000 3037.000000 3053.000000 2944.000000 2943.00000 2993.000000 2876.000000 2955.000000 3004.000000 1050.000000 1017.000000 959.000000 1000.000000 987.000000 562.000000 525.000000 524.000000 525.000000 524.000000
mean 433.592310 2003.042149 100.271320 10867.512611 7897.903533 963.62827 430.973939 184.123088 613.043655 75.767643 139.985714 71.208456 14.385819 22.544000 22.217832 6.181495 1.390476 0.837786 2.230476 0.776718
std 254.908076 5.485677 0.647391 45621.976594 37520.862855 3325.39556 1615.996031 812.662201 2386.874910 16.305073 726.653931 342.387797 106.821966 138.383012 113.607426 48.815990 9.570645 5.019886 20.085652 5.790293
min 4.000000 1994.000000 100.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 212.000000 1998.000000 100.000000 124.000000 66.750000 13.00000 7.000000 0.000000 4.000000 69.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
50% 430.000000 2003.000000 100.000000 1229.000000 721.500000 124.00000 60.000000 15.000000 90.000000 79.000000 6.000000 3.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
75% 646.000000 2008.000000 100.000000 5366.000000 3401.500000 580.50000 257.000000 99.000000 393.000000 87.000000 43.000000 24.000000 1.000000 6.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000
max 894.000000 2012.000000 102.000000 642321.000000 544731.000000 64938.00000 27005.000000 12505.000000 35469.000000 100.000000 15896.000000 5895.000000 2916.000000 3037.000000 2344.000000 751.000000 116.000000 64.000000 305.000000 94.000000

8 rows × 68 columns

 

Practice 1.1 – Python Pandas Cookbook by Alfred Essa

Following Alfred Essa’s Python Pandas Cookbook on YouTube
Different Ways to Construct Series
import pandas as pd
import numpy as np
Using Series Constructor
s1 = pd.Series([463,3,-728,236,32,-773])
s1
0    463
1      3
2   -728
3    236
4     32
5   -773
dtype: int64
type(s1)
pandas.core.series.Series
s1.values
array([ 463,    3, -728,  236,   32, -773], dtype=int64)
type(s1.values)
numpy.ndarray
s1.index
Int64Index([0, 1, 2, 3, 4, 5], dtype='int64')
s1[3]
236

Defining data and index

data1 = [3.5,5,343,9.3,23]
index1 = ['Mon','Tue','Wed','Thur','Fri']

Creating Series

s2 = pd.Series(data1, index = index1)
s2
Mon       3.5
Tue       5.0
Wed     343.0
Thur      9.3
Fri      23.0
dtype: float64
s2[4]
23.0
s2.index
Index([u'Mon', u'Tue', u'Wed', u'Thur', u'Fri'], dtype='object')
s2.name = 'Daily numbers'
s2.index.name = 'Working days'
s2
Working days
Mon               3.5
Tue               5.0
Wed             343.0
Thur              9.3
Fri              23.0
Name: Daily numbers, dtype: float64
Creating Dictionary
dict1 = {'Jan': -7,'Feb': 2,'March': 12,'April': -9,'May': 3,'June': 4}
s3 = pd.Series(dict1)
s3
April    -9
Feb       2
Jan      -7
June      4
March    12
May       3
dtype: int64
Vectorized Operations
s3 * 2
April   -18
Feb       4
Jan     -14
June      8
March    24
May       6
dtype: int64
np.log(s3)
April         NaN
Feb      0.693147
Jan           NaN
June     1.386294
March    2.484907
May      1.098612
dtype: float64
Slicing
s3['Feb':'May']
Feb       2
Jan      -7
June      4
March    12
May       3
dtype: int64
s3[3:5]
June      4
March    12
dtype: int64
Assigning a value by position (offset)
s3[3] = 54
s3
April    -9
Feb       2
Jan      -7
June     54
March    12
May       3
dtype: int64
s3.median()
2.5
s3.min()
-9
s3.max()
54
s3.cumsum()
April    -9
Feb      -7
Jan     -14
June     40
March    52
May      55
dtype: int64

Making Looping Clearer – enumerate() yields (index, value) pairs

for i, v in enumerate(s3):
    print i,v
0 -9
1 2
2 -7
3 54
4 12
5 3
new_s3 = [x**2 for x in s3]
new_s3
[81, 4, 49, 2916, 144, 9]
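
enumerate() works on any iterable, not only on a Series. Below is a small sketch with a plain list holding the same values, written in Python 3 syntax (the cells above use the Python 2 print statement).

```python
values = [-9, 2, -7, 54, 12, 3]

# enumerate() yields (index, value) pairs lazily; list() materialises them.
pairs = list(enumerate(values))

# The optional start argument changes the first index.
for position, v in enumerate(values, start=1):
    print(position, v)

# A list comprehension builds a new list in one expression.
squares = [x ** 2 for x in values]
print(squares)
```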
Using a Series like a dictionary
s3['Feb']
2
'Feb' in s3
True
Assignment using key (s3 has dtype int64, so 45.8 is stored as 45 in the output below)
s3['May'] = 45.8
s3
April    -9
Feb       2
Jan      -7
June     54
March    12
May      45
dtype: int64

Looping over the keys and values of a Series

for k,v in s3.iteritems():
    print k,v
April -9
Feb 2
Jan -7
June 54
March 12
May 45
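
iteritems() belongs to older pandas releases; Series.items() yields the same (label, value) pairs and is the spelling that current versions keep. This is a sketch assuming a recent pandas and Python 3, with a shortened copy of s3.

```python
import pandas as pd

# Shortened copy of s3, for illustration only.
s3 = pd.Series({'April': -9, 'Feb': 2, 'Jan': -7})

# items() yields (index label, value) pairs, much like dict.items().
collected = []
for k, v in s3.items():
    collected.append((k, int(v)))
    print(k, v)
```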

Pycurl and Pandas – Get csv file and explore it

Importing libraries

import pandas as pd
import os
import pycurl
import csv

To get the location of current working directory

os.getcwd()

'C:\\Anaconda'

To change the working directory

os.chdir('C:\\Anaconda\\abalone')
os.getcwd()

'C:\\Anaconda\\abalone'

Use pycurl to get a data file over HTTPS and write it to a csv file

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data'
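c = pycurl.Curl()
c.setopt(c.URL, url)
# pycurl delivers the response as bytes, so open the file in binary mode
with open('abalone.csv', 'wb') as s:
    c.setopt(c.WRITEFUNCTION, s.write)
    c.perform()
c.close()  # release the curl handle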

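To read the csv file into a DataFrame called abalone. The file has no header row, so read_csv treats the first record as the column names, as the output below shows; passing header=None would keep that record as data.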

abalone = pd.read_csv('abalone.csv')
abalone
M 0.455 0.365 0.095 0.514 0.2245 0.101 0.15 15
0 M 0.350 0.265 0.090 0.2255 0.0995 0.0485 0.0700 7
1 F 0.530 0.420 0.135 0.6770 0.2565 0.1415 0.2100 9
2 M 0.440 0.365 0.125 0.5160 0.2155 0.1140 0.1550 10
3 I 0.330 0.255 0.080 0.2050 0.0895 0.0395 0.0550 7
4 I 0.425 0.300 0.095 0.3515 0.1410 0.0775 0.1200 8
5 F 0.530 0.415 0.150 0.7775 0.2370 0.1415 0.3300 20
6 F 0.545 0.425 0.125 0.7680 0.2940 0.1495 0.2600 16
7 M 0.475 0.370 0.125 0.5095 0.2165 0.1125 0.1650 9
8 F 0.550 0.440 0.150 0.8945 0.3145 0.1510 0.3200 19
9 F 0.525 0.380 0.140 0.6065 0.1940 0.1475 0.2100 14
10 M 0.430 0.350 0.110 0.4060 0.1675 0.0810 0.1350 10
11 M 0.490 0.380 0.135 0.5415 0.2175 0.0950 0.1900 11
12 F 0.535 0.405 0.145 0.6845 0.2725 0.1710 0.2050 10
13 F 0.470 0.355 0.100 0.4755 0.1675 0.0805 0.1850 10
14 M 0.500 0.400 0.130 0.6645 0.2580 0.1330 0.2400 12
15 I 0.355 0.280 0.085 0.2905 0.0950 0.0395 0.1150 7
16 F 0.440 0.340 0.100 0.4510 0.1880 0.0870 0.1300 10
17 M 0.365 0.295 0.080 0.2555 0.0970 0.0430 0.1000 7
18 M 0.450 0.320 0.100 0.3810 0.1705 0.0750 0.1150 9
19 M 0.355 0.280 0.095 0.2455 0.0955 0.0620 0.0750 11
20 I 0.380 0.275 0.100 0.2255 0.0800 0.0490 0.0850 10
21 F 0.565 0.440 0.155 0.9395 0.4275 0.2140 0.2700 12
22 F 0.550 0.415 0.135 0.7635 0.3180 0.2100 0.2000 9
23 F 0.615 0.480 0.165 1.1615 0.5130 0.3010 0.3050 10
24 F 0.560 0.440 0.140 0.9285 0.3825 0.1880 0.3000 11
25 F 0.580 0.450 0.185 0.9955 0.3945 0.2720 0.2850 11
26 M 0.590 0.445 0.140 0.9310 0.3560 0.2340 0.2800 12
27 M 0.605 0.475 0.180 0.9365 0.3940 0.2190 0.2950 15
28 M 0.575 0.425 0.140 0.8635 0.3930 0.2270 0.2000 11
29 M 0.580 0.470 0.165 0.9975 0.3935 0.2420 0.3300 10
4146 M 0.695 0.550 0.195 1.6645 0.7270 0.3600 0.4450 11
4147 M 0.770 0.605 0.175 2.0505 0.8005 0.5260 0.3550 11
4148 I 0.280 0.215 0.070 0.1240 0.0630 0.0215 0.0300 6
4149 I 0.330 0.230 0.080 0.1400 0.0565 0.0365 0.0460 7
4150 I 0.350 0.250 0.075 0.1695 0.0835 0.0355 0.0410 6
4151 I 0.370 0.280 0.090 0.2180 0.0995 0.0545 0.0615 7
4152 I 0.430 0.315 0.115 0.3840 0.1885 0.0715 0.1100 8
4153 I 0.435 0.330 0.095 0.3930 0.2190 0.0750 0.0885 6
4154 I 0.440 0.350 0.110 0.3805 0.1575 0.0895 0.1150 6
4155 M 0.475 0.370 0.110 0.4895 0.2185 0.1070 0.1460 8
4156 M 0.475 0.360 0.140 0.5135 0.2410 0.1045 0.1550 8
4157 I 0.480 0.355 0.110 0.4495 0.2010 0.0890 0.1400 8
4158 F 0.560 0.440 0.135 0.8025 0.3500 0.1615 0.2590 9
4159 F 0.585 0.475 0.165 1.0530 0.4580 0.2170 0.3000 11
4160 F 0.585 0.455 0.170 0.9945 0.4255 0.2630 0.2845 11
4161 M 0.385 0.255 0.100 0.3175 0.1370 0.0680 0.0920 8
4162 I 0.390 0.310 0.085 0.3440 0.1810 0.0695 0.0790 7
4163 I 0.390 0.290 0.100 0.2845 0.1255 0.0635 0.0810 7
4164 I 0.405 0.300 0.085 0.3035 0.1500 0.0505 0.0880 7
4165 I 0.475 0.365 0.115 0.4990 0.2320 0.0885 0.1560 10
4166 M 0.500 0.380 0.125 0.5770 0.2690 0.1265 0.1535 9
4167 F 0.515 0.400 0.125 0.6150 0.2865 0.1230 0.1765 8
4168 M 0.520 0.385 0.165 0.7910 0.3750 0.1800 0.1815 10
4169 M 0.550 0.430 0.130 0.8395 0.3155 0.1955 0.2405 10
4170 M 0.560 0.430 0.155 0.8675 0.4000 0.1720 0.2290 8
4171 F 0.565 0.450 0.165 0.8870 0.3700 0.2390 0.2490 11
4172 M 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10
4173 M 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9
4174 F 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10
4175 M 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12
To add column names
abalone.columns = ['Sex', 'Length','Diameter','Height','Whole weight','Shucked weight','Viscera weight','Shell weight','Rings']
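
Because abalone.data has no header row, the column names can instead be supplied when the file is read, so that the first record stays in the data rather than being used as the header. This is a sketch in Python 3 syntax that reads three records copied from the output above from an in-memory string instead of the downloaded file; the names raw, columns and abalone_fixed are introduced here for illustration.

```python
import io
import pandas as pd

# First three records as they appear in the output above.
raw = (
    "M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n"
    "M,0.35,0.265,0.09,0.2255,0.0995,0.0485,0.07,7\n"
    "F,0.53,0.42,0.135,0.677,0.2565,0.1415,0.21,9\n"
)
columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',
           'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']

# header=None tells read_csv that the file has no header line;
# names= supplies the column labels.
abalone_fixed = pd.read_csv(io.StringIO(raw), header=None, names=columns)
print(abalone_fixed.shape)
```

With the real file, the same two arguments would be passed along with the file name in place of the StringIO object.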
To write data to a csv file
abalone.to_csv('abalone.csv')
To get 4 top-most observations
abalone.head(4)
Sex Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
0 M 0.35 0.265 0.090 0.2255 0.0995 0.0485 0.070 7
1 F 0.53 0.420 0.135 0.6770 0.2565 0.1415 0.210 9
2 M 0.44 0.365 0.125 0.5160 0.2155 0.1140 0.155 10
3 I 0.33 0.255 0.080 0.2050 0.0895 0.0395 0.055 7

To get 4 bottom-most observations

abalone.tail(4)
Sex Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
4172 M 0.590 0.440 0.135 0.9660 0.4390 0.2145 0.2605 10
4173 M 0.600 0.475 0.205 1.1760 0.5255 0.2875 0.3080 9
4174 F 0.625 0.485 0.150 1.0945 0.5310 0.2610 0.2960 10
4175 M 0.710 0.555 0.195 1.9485 0.9455 0.3765 0.4950 12
To get basic statistics for all numeric variables
abalone.describe()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
count 4176.000000 4176.000000 4176.000000 4176.000000 4176.00000 4176.000000 4176.000000 4176.000000
mean 0.524009 0.407892 0.139527 0.828818 0.35940 0.180613 0.238852 9.932471
std 0.120103 0.099250 0.041826 0.490424 0.22198 0.109620 0.139213 3.223601
min 0.075000 0.055000 0.000000 0.002000 0.00100 0.000500 0.001500 1.000000
25% 0.450000 0.350000 0.115000 0.441500 0.18600 0.093375 0.130000 8.000000
50% 0.545000 0.425000 0.140000 0.799750 0.33600 0.171000 0.234000 9.000000
75% 0.615000 0.480000 0.165000 1.153250 0.50200 0.253000 0.329000 11.000000
max 0.815000 0.650000 1.130000 2.825500 1.48800 0.760000 1.005000 29.000000
To get covariance
abalone.cov()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
Length 0.014425 0.011763 0.004157 0.054499 0.023938 0.011889 0.015009 0.215697
Diameter 0.011763 0.009850 0.003461 0.045046 0.019678 0.009789 0.012509 0.183968
Height 0.004157 0.003461 0.001749 0.016804 0.007195 0.003660 0.004759 0.075251
Whole weight 0.054499 0.045046 0.016804 0.240515 0.105533 0.051953 0.065225 0.854995
Shucked weight 0.023938 0.019678 0.007195 0.105533 0.049275 0.022678 0.027275 0.301440
Viscera weight 0.011889 0.009789 0.003660 0.051953 0.022678 0.012017 0.013851 0.178196
Shell weight 0.015009 0.012509 0.004759 0.065225 0.027275 0.013851 0.019380 0.281839
Rings 0.215697 0.183968 0.075251 0.854995 0.301440 0.178196 0.281839 10.391606
To get pairwise correlation coefficients for all numeric variables
abalone.corr()
Length Diameter Height Whole weight Shucked weight Viscera weight Shell weight Rings
Length 1.000000 0.986813 0.827552 0.925255 0.897905 0.903010 0.897697 0.557123
Diameter 0.986813 1.000000 0.833705 0.925452 0.893159 0.899726 0.905328 0.575005
Height 0.827552 0.833705 1.000000 0.819209 0.774957 0.798293 0.817326 0.558109
Whole weight 0.925255 0.925452 0.819209 1.000000 0.969403 0.966372 0.955351 0.540818
Shucked weight 0.897905 0.893159 0.774957 0.969403 1.000000 0.931956 0.882606 0.421256
Viscera weight 0.903010 0.899726 0.798293 0.966372 0.931956 1.000000 0.907647 0.504274
Shell weight 0.897697 0.905328 0.817326 0.955351 0.882606 0.907647 1.000000 0.628031
Rings 0.557123 0.575005 0.558109 0.540818 0.421256 0.504274 0.628031 1.000000
To get unique values of the 'Rings' column
abalone['Rings'].unique()
array([ 7, 9, 10, 8, 20, 16, 19, 14, 11, 12, 15, 18, 13, 5, 4, 6, 21, 17, 22, 1, 3, 26, 23, 29, 2, 27, 25, 24], dtype=int64)
To subset – keep only 'Length', 'Diameter' and 'Height' in a new data set, abalone1
abalone1 = abalone[['Length','Diameter','Height']]
Inspect abalone1 by checking head and tail
abalone1.head(3)
Length Diameter Height
0 0.35 0.265 0.090
1 0.53 0.420 0.135
2 0.44 0.365 0.125
abalone1.tail(3)
Length Diameter Height
4173 0.600 0.475 0.205
4174 0.625 0.485 0.150
4175 0.710 0.555 0.195

For code : http://nbviewer.ipython.org/gist/sunakshi132/4791b6838e7bf3fde38b

Creating a Dictionary in Python

A dictionary is similar to a list, but its items are looked up by key rather than by position. It contains key:value pairs. Here, we will learn about creating and modifying dictionaries.
Creating Dictionary
dictionary_name = {key: value, key: value, ...}

We create a dictionary with colors of jams as keys and fruit names as values.

jam = {'red':'strawberry', 'yellow': 'mango', 'orange':'orange'}
print jam['orange']
print jam['red']

orange
strawberry

Insertion – Here, we add a new key/value pair to the existing dictionary.

jam['blue'] = 'blueberry'

{'blue': 'blueberry', 'red': 'strawberry', 'yellow': 'mango', 'orange': 'orange'}

Deletion – Here, we delete a key/value pair

del jam['orange']
jam

{'blue': 'blueberry', 'red': 'strawberry', 'yellow': 'mango'}

Replacing values – We replace the value for key ‘red’, changing it from strawberry to cherry.

jam['red'] = 'cherry'
jam

{'blue': 'blueberry', 'red': 'cherry', 'yellow': 'mango'}

Nesting – We nest a new dictionary within an existing dictionary.

jam['red'] = {'light': 'cherry', 'dark': 'strawberry'}

{'blue': 'blueberry', 'red': {'dark': 'strawberry', 'light': 'cherry'}, 'yellow': 'mango'}

We set a list as the value for key ‘blue’.

jam['blue'] = ['blueberry','plum']
jam

{'blue': ['blueberry', 'plum'], 'red': {'dark': 'strawberry', 'light': 'cherry'}, 'yellow': 'mango'}

Append to a list

jam['blue'].append('jamun')  
print jam['blue']

['blueberry', 'plum', 'jamun']

Indexing the nested dictionary

print jam['red']['dark']

strawberry

jam

{'blue': ['blueberry', 'plum', 'jamun'], 'red': {'dark': 'strawberry', 'light': 'cherry'}, 'yellow': 'mango'}
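
A few more everyday dictionary operations, sketched in Python 3 syntax on a copy of the jam dictionary. The key 'green' is a made-up example of a key that is not present.

```python
jam = {'blue': ['blueberry', 'plum', 'jamun'],
       'red': {'dark': 'strawberry', 'light': 'cherry'},
       'yellow': 'mango'}

# Membership tests look at keys, not values.
has_red = 'red' in jam

# get() returns a default instead of raising KeyError for a missing key.
green = jam.get('green', 'no such jam')

# items() yields (key, value) pairs for looping.
for colour, fruit in jam.items():
    print(colour, fruit)

# pop() removes a key and hands back its value.
removed = jam.pop('yellow')
print(sorted(jam))
```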

Built-in function in Python: map()

map(function, sequence)

map() takes two parameters: 1. a function and 2. a sequence. It calls the function on each element of the sequence and, in Python 2 (used here), returns the results as a list.

To see how map() works, let us first define a function called square.

def square(i): return i*i
square(4)

16

Now, apply map() on square function for range 0 to 5.

map(square, range(6))

[0, 1, 4, 9, 16, 25]

Now, apply map() on square function for range 13 to 23.

map(square,range(13,24))

[169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529]

Next, we define a remainder function.

def remainder(j): return j % 3
remainder(14)

2

Now, apply map() on remainder function for range 1 to 10.

map(remainder,range(1,11))

[1, 2, 0, 1, 2, 0, 1, 2, 0, 1]

Now, apply map() on remainder function for range 91 to 100.

map(remainder,range(91,101))

[1, 2, 0, 1, 2, 0, 1, 2, 0, 1]
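
The list outputs above are what Python 2 prints. In Python 3, map() returns a lazy iterator, so the results have to be collected explicitly. A sketch of the equivalent Python 3 calls, reusing the two functions defined above:

```python
def square(i):
    return i * i

def remainder(j):
    return j % 3

# In Python 3, wrap map() in list() to see all results at once.
squares = list(map(square, range(6)))
remainders = list(map(remainder, range(1, 11)))

# A list comprehension expresses the same computation without map().
squares_again = [square(i) for i in range(6)]

print(squares)
print(remainders)
```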

Using Lists as Stacks and Queues in Python

Before reading about stacks and queues, you may wish to read about – How to create and modify lists in python.

What is a Stack?

A stack is a data structure that can be pictured as a pile in which items are inserted and deleted only at one end, called the top of the stack. Data in a stack is accessed on a Last In First Out (LIFO) basis. To understand the structure, imagine a pile of books: a book can be added to or removed from the pile only at the top. A stack also does not expose index numbers for its elements, so the elements in the middle cannot be accessed directly.

Using List as a Stack

We create a list called stack.

stack = ["geography","statistics","biology","linear algebra"]
stack

['geography', 'statistics', 'biology', 'linear algebra']

Insertion to a Stack
Now we use append() to add new elements to the stack. Here the last element being added is “economics”.

stack.append("physics")
stack.append("history")
stack.append("economics")
stack

['geography', 'statistics', 'biology', 'linear algebra', 'physics', 'history', 'economics']

Deletion from a Stack
Now we remove the most recently added element by calling pop() without an index number. A Python list does support indexing, but to treat it as a stack we only ever touch its last element.

stack.pop()

'economics'

stack.pop()

'history'

We see that our stack follows the principle of LIFO (as mentioned above). Let's check our stack after removing the last two elements.

stack

['geography', 'statistics', 'biology', 'linear algebra', 'physics']
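
The same push and pop steps can be followed in a short script. This sketch in Python 3 syntax also reads the top element with stack[-1], a common idiom that the walkthrough above does not use.

```python
stack = ["geography", "statistics", "biology", "linear algebra"]

# Push: append() adds to the top (the end of the list).
stack.append("physics")
stack.append("history")

# Peek: stack[-1] reads the top element without removing it.
top = stack[-1]

# Pop: pop() with no argument removes and returns the top element.
popped = []
while len(stack) > 4:
    popped.append(stack.pop())

print(top)
print(popped)   # the most recently pushed item comes out first
print(stack)
```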

What is a Queue?

A queue is a data structure that can be pictured as a queue (a line of people) at a ticket counter. It has a front and a back. At a ticket counter, new people join at the back, and the person at the front buys a ticket first and leaves first. Similarly, the queue data structure follows the principle of First In First Out (FIFO). Adding an element is called “enqueue” and removing an element is called “dequeue”. Enqueue takes place at the back, while dequeue takes place at the front.

 

Using List as a Queue
To use a list as a queue, we use collections.deque. It is designed for fast appends and pops at both ends, whereas removing the first item of a plain list with pop(0) has to shift every remaining item.
from collections import deque

Now, we create a queue by using deque() on a list.

queue = deque(["rohan", "sameer", "adil", "saksham"])
queue

deque(['rohan', 'sameer', 'adil', 'saksham'])

Insertion to a Queue
We use append() to add an element at the end of the queue.

queue.append("priya")
queue.append("aashi")
queue

deque(['rohan', 'sameer', 'adil', 'saksham', 'priya', 'aashi'])

Deletion from a Queue
We use popleft() to remove the element at the beginning of the queue.

queue.popleft()

'rohan'

queue.popleft()

'sameer'

We see that our queue follows the principle of FIFO (as mentioned above). Let's check our queue after removing the first two elements.

queue

deque(['adil', 'saksham', 'priya', 'aashi'])
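
A typical use of a queue is to process items in arrival order until none are left. This is a minimal sketch in Python 3 syntax, with a shorter list of names than the example above.

```python
from collections import deque

queue = deque(["rohan", "sameer", "adil"])
queue.append("priya")          # enqueue at the back

# Serve people in arrival order until the queue is empty.
served = []
while queue:
    served.append(queue.popleft())   # dequeue from the front

print(served)
```

An empty deque is falsy, which is why `while queue:` stops once everyone has been served.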

Creating and Modifying Lists in Python

A list stores pieces of information as an ordered sequence under a single variable name.

Creating – We create a list and store it under a variable called my_list. In Python, we create a list by writing its items inside square brackets “[]”, separated by commas. Here we create a list of string items, so each item is written within double quotes.

my_list = ["frog", "dino", "lion", "rabbit"]
my_list

['frog', 'dino', 'lion', 'rabbit']

Appending – We use append to add an item at the end of the list. Here, we append “tiger” to the list.

my_list.append("tiger")
my_list

['frog', 'dino', 'lion', 'rabbit', 'tiger']

Creating – Now we create another list called new_list.

new_list = ["red","blue","orange"]

Extending – We use extend to add one list to another list. Here we extend new_list by my_list.

new_list.extend(my_list) 
new_list

['red', 'blue', 'orange', 'frog', 'dino', 'lion', 'rabbit', 'tiger']

Insertion – We use insert to add a new item at a specified position in a list. Here, we insert a new item “snake” at position 2 of my_list. Index numbers in Python lists start from 0, so “frog” is at index 0, “dino” at index 1 and so on.

my_list.insert(2,"snake")
my_list

['frog', 'dino', 'snake', 'lion', 'rabbit', 'tiger']

Removing by item name – Remove looks for the first mention of the specified item name in the list and deletes it. Here, remove looks for first mention of “dino” in my_list and removes it.

my_list.remove("dino")
my_list

['frog', 'snake', 'lion', 'rabbit', 'tiger']

Removing by index number – Pop removes the item at the specified index number. Here, item mentioned at the index number 3 i.e. “rabbit” is removed.

my_list.pop(3)
my_list

['frog', 'snake', 'lion', 'tiger']
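
remove() and pop() both raise an exception when the target is absent. Below is a sketch in Python 3 syntax of what to expect; "unicorn" is a made-up item that is not in the list.

```python
my_list = ["frog", "snake", "lion", "tiger"]

# remove() raises ValueError when the item is not in the list.
try:
    my_list.remove("unicorn")
except ValueError:
    remove_failed = True

# Checking membership first avoids the exception.
if "snake" in my_list:
    my_list.remove("snake")

# pop() returns the removed item; an out-of-range index raises IndexError.
last = my_list.pop()          # no index: removes the last item
try:
    my_list.pop(10)
except IndexError:
    pop_failed = True

print(my_list, last)
```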

Finding index number – Index finds the index number of the first mention of specified item. Here, we find index number for item “lion” in my_list.

my_list.index("lion")

2

Counting item mentions – Count gives the number of mentions of the specified item in the list. Here, we get the count of item “snake” in my_list.

my_list.count("snake")

1

Sorting(Ascending) – Sort helps to sort list in ascending or descending order. By default it sorts the list in ascending order. Here, we sort my_list in ascending order.

my_list.sort()
my_list

['frog', 'lion', 'snake', 'tiger']

Sorting(Descending) – To sort the list in descending order, we pass reverse=True. Here, we sort my_list in descending order.

my_list.sort(reverse=True)
my_list

['tiger', 'snake', 'lion', 'frog']

Reversing – We use reverse to reverse the items of a list in place, so that the item at position 0 moves to position len(list)-1, the item at position 1 moves to position len(list)-2, and so on. Here, we reverse new_list.

new_list.reverse()
new_list

['tiger', 'rabbit', 'lion', 'dino', 'frog', 'orange', 'blue', 'red']
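
sort() and reverse() change the list in place. When the original order must be kept, the built-in sorted() and a reversed slice return new lists instead. This is a sketch in Python 3 syntax; the key=len ordering is an extra illustration not covered above.

```python
my_list = ["tiger", "snake", "lion", "frog"]

# sorted() returns a new sorted list and leaves the original untouched.
ascending = sorted(my_list)

# key= controls what is compared; here, the length of each string.
by_length = sorted(my_list, key=len)

# Slicing with a step of -1 gives a reversed copy; reverse() works in place.
reversed_copy = my_list[::-1]

print(ascending)
print(by_length)
print(reversed_copy)
print(my_list)
```

Because Python's sort is stable, items of equal length keep their original relative order in by_length.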

Next, you may like to read about How to use lists as Stacks and Queues.