Show Menu
Cheatography

PYTHON PANDAS Cheat Sheet by

install and import

installing pandas
pip install pandas
pip install pandas
import pandas as pd

Reading and describing

pd -> pandas
df-> dataframe
to read a file into a dataframe
df= pd.rea­d_c­sv(­'fi­len­ame')
look at the first 5 lines
df.head()
to describe df
df.des­cribe()
df.info()
to print all the column names
teleco­m_d­ata.co­lumns
to get the dimension of df
df.shape

Sorting and filtering

sort
sorting can be done column wise - default is ascending
df.sor­t_v­alu­es(­by=­'Total day charge')
df.sor­t_v­alu­es(­col1)
Sort values by col1 in ascending order (use ascending =False for descending sort)
df.sor­t_v­alu­es(­[co­l1,­col­2],­asc­end­ing­=[T­rue­,Fa­lse])
Sort values by col1 in ascending order then col2 in descending order
Filtering
df[con­dition]  ­ ­   #eg: df[df[­'co­l']­>5]
df[df[­'col'] > 0.5]
Rows where the column col is greater than 0.5
df[(df­[col] > 0.5) & (df[col] < 0.7)]
Rows where 0.7 > col > 0.5

Inplace

NOTE
df.mer­ge(df2)
gives you a copy of df merged with df2. you may save it to a new variable. ex
df3=df.me­rge­(df2)
if you want to merge df2 to df right away use inplace.
df.mer­ge(­df2­,in­pla­ce=­True)
 

Rows and columns

to delete a row - [axis=0 means rows]
new_df = df.dro­p([­2,3­],axis = 0)   #this drops the row with index 2,3
to delete a column- [axis=1 means columns]
new_df = df.dro­p([­'co­l1'­,'c­ol2­'],axis = 0)   #this drops the column with name col1 and col2

Df manipu­lation

create or edit a new column
df['ne­w_c­oln­ame'] = 5   #this creates a new new column with all values as 5
create a new column
df['ne­w_c­oln­ame'] = [list of values]   #this creates a new column with list of values assigned to each corres­ponding row 
NOTE :
df['ne­w_c­oln­ame'] = [list of values]
throws an error if the no of items in [list of values] doesn't match no of rows
create or edit a new row
df.loc­[in­dex­_of­_row] = [list of items]
NOTE :
df.loc­[in­dex­_of­_row] = [list of items]
throws an error if the no of items doesn't match no of rows

Selection

df[col]
Returns column with label col as Series
df[[col1, col2]]
Returns multiple columns as a new DataFrame
 ­  Country  ­ ­Capital  ­ ­ ­ ­ ­ ­Pop­ulation
1 Belgiu­m   Brussels  ­ ­ ­ ­111­90846
2 India  ­ ­ ­ ­ ­ ­ New Delhi  ­130­3171035
3 Brazil  ­ ­ ­ ­ ­ ­Bra­silia  ­ ­ ­ ­ ­ ­207­847528
df.ilo­c([0], [0])
--> 'Belgium'  | s.iloc[0] | Selection by position (0th position on row and column)
df.loc­([0],  ['Coun­try'])
--> 'Belgium'
df.ix[2]
-->
Country  ­ ­ ­ ­ ­  Brazil
Capital         Brasilia
Population   207847528
df.ix[1, 'Capital']
--> 'New Delhi'
df.ilo­c[0,:] | select First row
 

Data Cleaning

df.set­_in­dex­('c­olu­mn_­one')
Change the index with a new column
df.columns = ['new_­col­_na­me1­','­new­_co­l_n­ame­2',­'ne­w_c­ol_­name3']
Rename columns
pd.isn­ull()
Checks for null Values, Returns Boolean Arrray
pd.not­null()
Opposite of pd.isn­ull()
df.dro­pna()
Drop all rows that contain null values
df.dro­pna­(ax­is=1)
Drop all columns that contain null values
df.dro­pna­(ax­is=­1,t­hre­sh=n)
Drop all rows have have less than n non null
df.fil­lna(x)
Replace all null values with x

JOIN/C­OMBINE

df1.ap­pen­d(df2)
Adds the rows in df1 to the end of
df2 (columns should
be identical)
pd.con­cat­([df1, df2],a­xis=1)
Adds the columns in
df1 to the end
of df2 (rows should be
identical)
df1.jo­in(­df2­,on­=co­l1,­how­='i­nner')
joins the columns in df1 with the columns on
df2 where the rows
for col have identical
values. how can be
one of 'left',
'right', 'outer', 'inner'
left = takes the index of left df
right =takes the index of left
outer = union of both keys
inner = inters­ection of both keys
   
 

Comments

No comments yet. Add yours below!

Add a Comment

Your Comment

Please enter your name.

    Please enter your email address

      Please enter your Comment.

          Related Cheat Sheets

          PandasPlotting Cheat Sheet