4.1 Head and Tail¶

How to quickly assess dataframe properties, using `head` and `tail`¶

Watch this video from 2:25 to 4:12

# To load the video, execute this cell by pressing shift + enter

from IPython.display import YouTubeVideo
from datetime import timedelta
start=int(timedelta(hours=0, minutes=2, seconds=25).total_seconds())
end=int(timedelta(hours=0, minutes=4, seconds=12).total_seconds())

YouTubeVideo("jEQRU55x0e4",start=start,end=end,width=640,height=360)

The following is a transcript of the video.

💡 Remember: Import pandas and read in the dataset below to complete this lesson.

# Import pandas

import pandas as pd

# Download the dataset from the
# Jupyter Book to read in locally or 
# read in from GitHub, below:

data = pd.read_csv('https://raw.githubusercontent.com/DanChitwood/PlantsAndPython/master/co2_mlo_weekly.csv')

Now let’s see how to look at our dataframe using head and tail.

So it’s important to know what you’re working with, to “see” the dataframe. Head and tail allow us to do that. Remember that the data is stored in the variable data. We use .head() to see a preview of the first five rows and we’ll also get the column names.

# It's important to know what you are working with
# to "see" the dataframe
# .head() shows the first rows and column names

data.head()

	date	running_date	month	year	CO2ppm
0	8/13/17	1	aug	2017	405.2
1	8/14/17	2	aug	2017	405.2
2	8/15/17	3	aug	2017	405.2
3	8/16/17	4	aug	2017	405.2
4	8/17/17	5	aug	2017	405.2

We can also use tail. And tail, if the head is at the beginning, then the tail is the end. You can see that we get the last rows in our dataset. Tail is very useful to see just how many rows that you have overall. Remember that we’re beginning with zero, so we have 714 rows or data points in this dataset.

# .tail() shows the las rows

data.tail()

	date	running_date	month	year	CO2ppm
709	7/23/19	710	jul	2019	410.87
710	7/24/19	711	jul	2019	410.87
711	7/25/19	712	jul	2019	410.87
712	7/26/19	713	jul	2019	410.87
713	7/27/19	714	jul	2019	410.87

Describe is a very useful function that gives you back summary statistics for your continuous variables. If we use describe on our data we get back information for running date, which is just a number that is increasing to keep track of day; year is included as a continuous variable, even though we do not want it to be a continuous variable; and CO2 parts per million, which is of course a continuous variable as well. We get back how many entries of each variable that we have. We have 714 of each. We get the mean of each; the standard deviation; the minimum; the quartiles at the 25th, 50th, and 75th percentiles; and the maximum value as well.

# .describe() is very useful, showing summary statistics
# it provides stats for continuous variables

data.describe() 

	running_date	year	CO2ppm
count	714.000000	714.000000	714.000000
mean	357.500000	2018.093838	408.977059
std	206.258333	0.693299	3.189098
min	1.000000	2017.000000	402.760000
25%	179.250000	2018.000000	406.530000
50%	357.500000	2018.000000	409.010000
75%	535.750000	2019.000000	411.450000
max	714.000000	2019.000000	415.390000

So that’s how to read in a dataframe, how to quickly look at the dataframe and get summary statistics of the continuous variables.

Plants & Python

4.1 Head and Tail¶

How to quickly assess dataframe properties, using head and tail¶

How to quickly assess dataframe properties, using `head` and `tail`¶