InfinityCodeX

Unlock the power of Python, Data Science, Machine Learning, and Deep Learning with our comprehensive guides! Whether you're a beginner eager to dive into AI or an experienced professional looking to sharpen your skills, our blog offers easy-to-follow tutorials, insightful tips, and expert advice to fuel your AI career. Dive in today and start mastering the cutting-edge technologies shaping the future!

Pandas Python

In computer programming, pandas is a software library written for the Python programming language for data manipulation & analysis. In particular, it offers data structures & operations for manipulating numerical tables & time series.

It leverages the power & speed of NumPy to make data analysis & preprocessing easy for data scientists.

Before diving into pandas it helps to know NumPy, since that makes pandas much easier to understand & use. pandas also depends on NumPy, so NumPy must be installed; pip installs it automatically along with pandas.

Pandas Python Installation : To install pandas, open the command prompt & type : pip install pandas

Now go to Anaconda > Jupyter Notebook > click on ( New ) > select Python 3

Remember how we imported NumPy? We import pandas the same way. Importing NumPy alongside pandas is a common habit & handy for array operations, although pandas itself does not require it.

import numpy as np
import pandas as pd

Press Shift+Enter to run the code.

Let's keep it simple & start with code :
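A quick way to confirm that the installation worked is to print the installed versions. This is a minimal sketch; the version numbers you see depend on your own machine.

```python
import numpy as np
import pandas as pd

# If both imports succeed, the libraries are installed correctly.
pandas_version = pd.__version__
numpy_version = np.__version__

print("pandas:", pandas_version)
print("numpy:", numpy_version)
```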

code 1 :

import numpy as np
import pandas as pd
data = {                       # named data, because dict would shadow Python's built-in dict type
         "Name" : ['John','Rony','Leo','Kane','Elbert'],
         "Roll_Num" : [111,112,113,114,115],
         "Marks" : [95,86,77,89,97]
}
df = pd.DataFrame(data)        # DataFrame arranges the dictionary into labelled rows & columns, much like an Excel sheet
df                             # display the DataFrame (Jupyter shows the last expression in a cell automatically)

output :
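For readers running this outside a notebook, here is a minimal sketch that builds the same table & prints it along with its shape & column names. The variable names `data` & `students` are illustrative choices, not part of the original code.

```python
import pandas as pd

# Same student records as in code 1; `data` and `students` are illustrative names.
data = {
    "Name": ["John", "Rony", "Leo", "Kane", "Elbert"],
    "Roll_Num": [111, 112, 113, 114, 115],
    "Marks": [95, 86, 77, 89, 97],
}
students = pd.DataFrame(data)

print(students)                # outside Jupyter, print() is needed to show the table
print(students.shape)          # (number of rows, number of columns)
print(list(students.columns))  # column names, taken from the dictionary keys
```

Each dictionary key becomes a column & each list becomes that column's values, with a default index running from 0 to 4.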



code 2 :

If you want to export this DataFrame to a .csv file (which Excel can open) :
syntax : df.to_csv("_path_\\file_name.csv") 
example : df.to_csv("D:\\Dig\\stud_data.csv")
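To see what actually gets written, here is a small sketch that saves the student table to a temporary folder & reads the raw text back. The temporary folder is an assumption made so the example runs on any machine; in practice you would pass your own path, as in the example above.

```python
import tempfile
from pathlib import Path

import pandas as pd

students = pd.DataFrame({
    "Name": ["John", "Rony", "Leo", "Kane", "Elbert"],
    "Roll_Num": [111, 112, 113, 114, 115],
    "Marks": [95, 86, 77, 89, 97],
})

# Write into a throwaway folder so the sketch does not depend on a D: drive.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "stud_data.csv"
    students.to_csv(csv_path)          # the index is written as the first column by default
    csv_text = csv_path.read_text()    # read the raw file contents back as text

print(csv_text)
```

Notice that the first column of the file holds the index values & has an empty header.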

code 3 :

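Let's compute summary statistics for all numerical columns with a single command. For each numerical column it reports :

  • count (number of non-missing values)
  • mean
  • std (standard deviation)
  • min (minimum value)
  • 25% (first quartile)
  • 50% (median)
  • 75% (third quartile)
  • max (maximum value)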


code : df.describe()
output :
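As a minimal sketch, here is describe() applied to the student table from code 1, followed by picking single statistics out of the result. The names `students`, `summary`, `mean_marks` & `median_marks` are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "Name": ["John", "Rony", "Leo", "Kane", "Elbert"],
    "Roll_Num": [111, 112, 113, 114, 115],
    "Marks": [95, 86, 77, 89, 97],
})

# describe() skips the text column "Name" and summarises only numeric columns.
summary = students.describe()
print(summary)

# The result is itself a DataFrame, so single statistics can be looked up by label.
mean_marks = summary.loc["mean", "Marks"]
median_marks = summary.loc["50%", "Marks"]
print("mean marks:", mean_marks)
print("median marks:", median_marks)
```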

code 4 :

What if you want to leave the index out of the .csv file you create?

code : df.to_csv("stud_data.csv",index=False)

Suppose we create this DataFrame :

import numpy as np
import pandas as pd
dict1 = {
         "Name" : ['Ritik','hardik','Roy','Jay','Krish'],
         "Roll_Num" : [111,119,113,114,115],
         "Marks" : [95,86,1,2,3]
}
df1=pd.DataFrame(dict1)        # builds a DataFrame from our data, with an index & named columns

df1

This is the output we will get :



Now we apply the code for removing the index :

df1.to_csv("D:\\Dig\\h_v.csv")               # first, save it with the index (the default)
df1.to_csv("D:\\Dig\\x_h.csv",index=False)   # index=False leaves out the index column; to_csv returns nothing here, so there is no result to store

Now check your x_h.csv :




The index numbers which were present are now gone.
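The difference can also be checked without opening any file. When to_csv() is called without a path it returns the CSV text as a string, so this sketch compares the two versions directly. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Ritik", "hardik", "Roy", "Jay", "Krish"],
    "Roll_Num": [111, 119, 113, 114, 115],
    "Marks": [95, 86, 1, 2, 3],
})

# With no path given, to_csv() returns the CSV content as a string.
with_index = df1.to_csv()
without_index = df1.to_csv(index=False)

print(with_index.splitlines()[0])      # header line including the unnamed index column
print(without_index.splitlines()[0])   # header line with the data columns only
```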

code 5 :

Let's say you have a data set on your machine in .xlsx format, & you want to use that data & perform certain operations. pandas can read .xlsx files directly with pd.read_excel (which needs an extra package such as openpyxl), but .csv is the more common format in pandas work, so here we first convert the file into .csv (Comma delimited) format & then use it, which in turn makes our job easier.

Steps to convert an .xlsx file into .csv :

Open your .xlsx file > click on Save As > Save as type : CSV (Comma delimited)

Congratulations, it's done. Now you can use the file as you want.

Now that the file is in .csv format, suppose the data set is quite big & you only want to see a certain number of rows from the start & a certain number from the end.

To read a .csv file from a folder :

import numpy as np
import pandas as pd
df=pd.read_csv(r"D:\Dig\car_pricing.csv")
df 

output :
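The car_pricing.csv file itself is not included here, so the sketch below reads a few made-up rows from an in-memory string instead. The column names & values are hypothetical stand-ins; only `Engine_HP` is borrowed from the examples later in this post.

```python
import io

import pandas as pd

# Hypothetical stand-in for a .csv file on disk; the rows are invented for illustration.
csv_text = """Make,Model,Engine_HP
Car_1,Model_A,150.0
Car_2,Model_B,200.5
Car_3,Model_C,300.0
Car_4,Model_D,120.0
"""

# read_csv accepts a file path or any file-like object such as StringIO.
cars = pd.read_csv(io.StringIO(csv_text))

print(cars)
print(cars.dtypes)   # pandas infers a type for each column while reading
```

With a real file you would pass the path instead, for example a raw string like r"D:\Dig\car_pricing.csv" so that backslashes are not treated as escape characters.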



Now suppose you want the first N rows :

code :

df.head(5)      # N = 5 (you can use any value; 5 is also the default)

output :



Now suppose you want the last N rows :

code :

df.tail(5)   # N = 5 (you can use any value; 5 is also the default)

output :
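Here is a minimal sketch of head() & tail() on a small made-up table, so the result is easy to check by eye. The data is hypothetical & the variable names are illustrative.

```python
import pandas as pd

# Hypothetical table with 8 rows, indexed 0 to 7.
cars = pd.DataFrame({
    "Model": ["M0", "M1", "M2", "M3", "M4", "M5", "M6", "M7"],
    "Engine_HP": [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0],
})

first_rows = cars.head(3)   # first 3 rows
last_rows = cars.tail(3)    # last 3 rows

print(first_rows)
print(last_rows)
```

Both calls return new DataFrames & keep the original index labels, so the last three rows still carry the labels 5, 6 & 7.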




code 6 :

What if you want to select a specific column, or a single value from that column by its index?

code : (Specific column)

syntax : df['column_name']
example : df['Engine_HP']

code : (Specific column with index number)

syntax : df['column_name'][N]   # N = index label of the row you want; df.loc[N, 'column_name'] gives the same value
example : df['Engine_HP'][15]
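The sketch below shows both lookups on a small made-up table & compares the two ways of reading a single value. The data is hypothetical & the variable names are illustrative.

```python
import pandas as pd

# Hypothetical table; the default index labels are 0, 1, 2, 3.
cars = pd.DataFrame({
    "Model": ["M0", "M1", "M2", "M3"],
    "Engine_HP": [150.0, 200.5, 300.0, 201.0],
})

engine_hp = cars["Engine_HP"]           # a whole column, returned as a Series
value_chained = cars["Engine_HP"][3]    # column first, then the row with label 3
value_loc = cars.loc[3, "Engine_HP"]    # row label and column name in one step

print(engine_hp)
print(value_chained, value_loc)
```

Both forms return the same value when reading. The single-step .loc form is usually the safer habit, because it also behaves predictably when assigning values.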

code 7 :

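Let's say you want to change a particular value in a column at a given index.

code :

syntax : df.loc[N, 'column_name'] = x  # N = index label of the row & x is the new value
example : df.loc[2, 'Engine_HP'] = 222.5

Use .loc here rather than df['column_name'][N] = x. The chained form can raise a SettingWithCopyWarning & may not update the original DataFrame.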

output :

before:  

after :
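As a minimal sketch of the before & after, the code below changes one value with .loc on a small made-up table. The data is hypothetical & the variable names are illustrative.

```python
import pandas as pd

# Hypothetical table; the default index labels are 0, 1, 2, 3.
cars = pd.DataFrame({
    "Model": ["M0", "M1", "M2", "M3"],
    "Engine_HP": [150.0, 200.5, 300.0, 201.0],
})

before = cars.loc[2, "Engine_HP"]   # value before the change
cars.loc[2, "Engine_HP"] = 222.5    # assign in one step: row label 2, column "Engine_HP"
after = cars.loc[2, "Engine_HP"]    # value after the change

print("before:", before)
print("after:", after)
```

Only the selected cell changes; every other row & column keeps its value.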

