New York Flights Dataset

Let us understand the New York Flights Dataset. This dataset is available in the package called nycflights13. I already have it installed. If you do not, you can install the package with the command install.packages("nycflights13").

2015-02-4

Package

> library(nycflights13)

The library command loads the package.

Data Inspection

> dim(flights)

The dim command gives the dimensions of the dataset: the number of observations (rows), then the number of variables (columns).

[1] 336776     16
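If you only need one of the two numbers, nrow and ncol return them separately. Here is a minimal sketch; toy_flights is a hypothetical stand-in that I typed in by hand with a few values from the flights data, so the snippet runs even without the package installed.

```r
# toy_flights is a hypothetical stand-in for flights (3 rows, 5 columns)
toy_flights <- data.frame(
  year      = c(2013L, 2013L, 2013L),
  month     = c(1L, 1L, 1L),
  day       = c(1L, 1L, 1L),
  dep_delay = c(2, 4, 2),
  carrier   = c("UA", "UA", "AA"),
  stringsAsFactors = FALSE
)

dims  <- dim(toy_flights)   # integer vector: rows first, then columns
n_obs <- nrow(toy_flights)  # number of observations, same as dims[1]
n_var <- ncol(toy_flights)  # number of variables, same as dims[2]

print(dims)
```

With the real data you would call the same functions on flights instead of toy_flights.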
> head(flights)

head gives the first few observations (six by default).

  year month day dep_time dep_delay arr_time arr_delay carrier tailnum
1 2013     1   1      517         2      830        11      UA  N14228
2 2013     1   1      533         4      850        20      UA  N24211
3 2013     1   1      542         2      923        33      AA  N619AA
4 2013     1   1      544        -1     1004       -18      B6  N804JB
5 2013     1   1      554        -6      812       -25      DL  N668DN
6 2013     1   1      554        -4      740        12      UA  N39463
  flight origin dest air_time distance hour minute
1   1545    EWR  IAH      227     1400    5     17
2   1714    LGA  IAH      227     1416    5     33
3   1141    JFK  MIA      160     1089    5     42
4    725    JFK  BQN      183     1576    5     44
5    461    LGA  ATL      116      762    5     54
6   1696    EWR  ORD      150      719    5     54
> tail(flights)

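tail gives the last few observations (six by default).

With the help of the head and tail commands we can check whether the data has loaded properly. The NA entries in the last rows are missing values in the data itself, not a sign of a loading problem.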

       year month day dep_time dep_delay arr_time arr_delay carrier
336771 2013     9  30       NA        NA       NA        NA      EV
336772 2013     9  30       NA        NA       NA        NA      9E
336773 2013     9  30       NA        NA       NA        NA      9E
336774 2013     9  30       NA        NA       NA        NA      MQ
336775 2013     9  30       NA        NA       NA        NA      MQ
336776 2013     9  30       NA        NA       NA        NA      MQ
       tailnum flight origin dest air_time distance hour minute
336771  N740EV   5274    LGA  BNA       NA      764   NA     NA
336772           3393    JFK  DCA       NA      213   NA     NA
336773           3525    LGA  SYR       NA      198   NA     NA
336774  N535MQ   3461    LGA  BNA       NA      764   NA     NA
336775  N511MQ   3572    LGA  CLE       NA      419   NA     NA
336776  N839MQ   3531    LGA  RDU       NA      431   NA     NA
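Both head and tail take an argument n if six rows is not what you want. A small sketch of that, again on a hypothetical toy_flights data frame that copies three columns from the six rows printed by head above:

```r
# toy_flights is a hypothetical stand-in built from the head(flights) output
toy_flights <- data.frame(
  dep_time  = c(517L, 533L, 542L, 544L, 554L, 554L),
  dep_delay = c(2, 4, 2, -1, -6, -4),
  carrier   = c("UA", "UA", "AA", "B6", "DL", "UA"),
  stringsAsFactors = FALSE
)

first_two <- head(toy_flights, n = 2)  # first 2 rows instead of the default 6
last_two  <- tail(toy_flights, n = 2)  # last 2 rows instead of the default 6

print(first_two)
print(last_two)
```

On the real data, head(flights, n = 10) and tail(flights, n = 10) work the same way.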
> str(flights)

The str command gives the structure of the dataset. It lists each variable's name and type, along with its first few values.

Classes 'tbl_df', 'tbl' and 'data.frame':   336776 obs. of  16 variables:
 $ year     : int  2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
 $ month    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ day      : int  1 1 1 1 1 1 1 1 1 1 ...
 $ dep_time : int  517 533 542 544 554 554 555 557 557 558 ...
 $ dep_delay: num  2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
 $ arr_time : int  830 850 923 1004 812 740 913 709 838 753 ...
 $ arr_delay: num  11 20 33 -18 -25 12 19 -14 -8 8 ...
 $ carrier  : chr  "UA" "UA" "AA" "B6" ...
 $ tailnum  : chr  "N14228" "N24211" "N619AA" "N804JB" ...
 $ flight   : int  1545 1714 1141 725 461 1696 507 5708 79 301 ...
 $ origin   : chr  "EWR" "LGA" "JFK" "JFK" ...
 $ dest     : chr  "IAH" "IAH" "MIA" "BQN" ...
 $ air_time : num  227 227 160 183 116 150 158 53 140 138 ...
 $ distance : num  1400 1416 1089 1576 762 ...
 $ hour     : num  5 5 5 5 5 5 5 5 5 5 ...
 $ minute   : num  17 33 42 44 54 54 55 57 57 58 ...
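If you want the variable types as a value you can work with, rather than printed text, one option is to apply class to every column. This is only a sketch on a hypothetical two-row toy_flights; note that class reports "numeric" where str prints num.

```r
# toy_flights is a hypothetical stand-in with one column of each type seen above
toy_flights <- data.frame(
  year      = c(2013L, 2013L),   # int in the str output
  dep_delay = c(2, 4),           # num in the str output
  carrier   = c("UA", "UA"),     # chr in the str output
  stringsAsFactors = FALSE
)

# sapply calls class() on each column and returns a named character vector
col_types <- sapply(toy_flights, class)

print(col_types)
```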
> summary(flights)

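summary gives the minimum, maximum, mean, median and quartiles of each numeric variable, plus a count of missing values (NA's) where there are any. For character variables it reports only the length, class and mode.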

      year          month             day           dep_time   
 Min.   :2013   Min.   : 1.000   Min.   : 1.00   Min.   :   1  
 1st Qu.:2013   1st Qu.: 4.000   1st Qu.: 8.00   1st Qu.: 907  
 Median :2013   Median : 7.000   Median :16.00   Median :1401  
 Mean   :2013   Mean   : 6.549   Mean   :15.71   Mean   :1349  
 3rd Qu.:2013   3rd Qu.:10.000   3rd Qu.:23.00   3rd Qu.:1744  
 Max.   :2013   Max.   :12.000   Max.   :31.00   Max.   :2400  
                                                 NA's   :8255  
   dep_delay          arr_time      arr_delay          carrier         
 Min.   : -43.00   Min.   :   1   Min.   : -86.000   Length:336776     
 1st Qu.:  -5.00   1st Qu.:1104   1st Qu.: -17.000   Class :character  
 Median :  -2.00   Median :1535   Median :  -5.000   Mode  :character  
 Mean   :  12.64   Mean   :1502   Mean   :   6.895                     
 3rd Qu.:  11.00   3rd Qu.:1940   3rd Qu.:  14.000                     
 Max.   :1301.00   Max.   :2400   Max.   :1272.000                     
 NA's   :8255      NA's   :8713   NA's   :9430                         
   tailnum              flight        origin              dest          
 Length:336776      Min.   :   1   Length:336776      Length:336776     
 Class :character   1st Qu.: 553   Class :character   Class :character  
 Mode  :character   Median :1496   Mode  :character   Mode  :character  
                    Mean   :1972                                        
                    3rd Qu.:3465                                        
                    Max.   :8500                                        

    air_time        distance         hour           minute     
 Min.   : 20.0   Min.   :  17   Min.   : 0.00   Min.   : 0.00  
 1st Qu.: 82.0   1st Qu.: 502   1st Qu.: 9.00   1st Qu.:16.00  
 Median :129.0   Median : 872   Median :14.00   Median :31.00  
 Mean   :150.7   Mean   :1040   Mean   :13.17   Mean   :31.76  
 3rd Qu.:192.0   3rd Qu.:1389   3rd Qu.:17.00   3rd Qu.:49.00  
 Max.   :695.0   Max.   :4983   Max.   :24.00   Max.   :59.00  
 NA's   :9430                   NA's   :8255    NA's   :8255   
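The NA's rows in the summary matter when you compute statistics yourself: mean returns NA as soon as one value is missing, unless you pass na.rm = TRUE. A minimal sketch of counting the missing values per column and taking a mean without them; the toy_flights values here are made up for illustration and are not taken from the dataset.

```r
# toy_flights is a hypothetical stand-in; the values are invented for this example
toy_flights <- data.frame(
  dep_delay = c(2, 4, NA, -1),
  arr_delay = c(11, NA, NA, -18),
  carrier   = c("UA", "UA", "EV", "B6"),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy_flights))

mean_default <- mean(toy_flights$dep_delay)                # NA, because of the missing value
mean_no_na   <- mean(toy_flights$dep_delay, na.rm = TRUE)  # mean of 2, 4 and -1 only

print(na_counts)
print(mean_no_na)
```

summary drops the missing values in the same way before it computes the figures it prints for each numeric variable.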

 

 
