Data Structures in Pandas
Pandas has traditionally offered 3 data structures.
(i) Series
(ii) DataFrame
(iii) Panel (deprecated and since removed from pandas, so it is not covered below)
Let's go through each data structure one by one.
(i) Series :
A Series is a one-dimensional array with an index. It stores a single column or
row of data from a DataFrame.
A Series only accepts 1-D data. Recent pandas versions raise an error when given a
2-D array, so a 2-D array has to be flattened to 1-D first.
Let's dive into some examples :
code :
import numpy as np
import pandas as pd
a=np.array([1,2,3,4])  # 1-D array; the nested form [[1,2,3,4]] is 2-D and is rejected
ans=pd.Series(a)
print(ans)
output :
0 1
1 2
2 3
3 4
dtype: int32
code :
import numpy as np
import pandas as pd
a=np.array([[1,2,3],[4,5,6],[7,8,9]])
# pd.Series(a) raises an error because a is 2-D, so flatten it first
ans=pd.Series(a.ravel())
print(ans)
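To check how a Series reacts to 2-D input, here is a minimal sketch. It assumes a recent pandas version, where the 2-D array is rejected rather than flattened automatically; the variable names `grid`, `rejected` and `flat` are made up for this illustration.

```python
import numpy as np
import pandas as pd

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # 2-D, shape (3, 3)

# Assumption: recent pandas versions refuse 2-D data for a Series.
try:
    pd.Series(grid)
    rejected = False
except Exception as err:  # usually a ValueError about 1-dimensional data
    rejected = True
    print("pandas refused the 2-D input:", err)

# ravel() lays the values out as one row-major 1-D array, which Series accepts.
flat = pd.Series(grid.ravel())
print(flat.tolist())
```

Flattening keeps every value but loses the row/column layout, so a DataFrame is usually the better fit when that layout matters.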
How to change the index names?
code :
import numpy as np
import pandas as pd
a=np.array([1,2,3])
ans=pd.Series(a,index=['a','b','c'])
print(ans)
output :
a 1
b 2
c 3
dtype: int32
code :
import numpy as np
import pandas as pd
a=[1,2,3,4,5,6,7,8,9]
ans=pd.Series(a,index=['a','b','c','d','e','f','g','h','i'])
print("All Values :")
print(ans)
print("Values from a to f :")
print(ans['a':'f'])  # slice by index label
output :
All Values :
a 1
b 2
c 3
d 4
e 5
f 6
g 7
h 8
i 9
dtype: int64
Values from a to f :
a 1
b 2
c 3
d 4
e 5
f 6
dtype: int64
Noticed something? In label-based (string) slicing the last value is also
included, which means that a slice from "a" to "f" includes the value at "f".
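The difference from ordinary position-based slicing can be seen side by side. This is a small sketch that uses `.loc` for labels and `.iloc` for positions; the names `by_label` and `by_position` are only for illustration.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], index=list("abcdefghi"))

by_label = ser.loc["a":"f"]   # label slice: the end label "f" is included
by_position = ser.iloc[0:5]   # positional slice: position 5 is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```

Writing `.loc` or `.iloc` explicitly makes it obvious to the reader which of the two rules applies.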
code :
import numpy as np
import pandas as pd
ser=pd.Series(np.random.rand(5))  # 5 random values between 0 and 1
print(ser)
output :
0 0.050999
1 0.573335
2 0.939892
3 0.743563
4 0.882146
dtype: float64
(ii) DataFrame :
A DataFrame is a tabular, spreadsheet-like structure made of rows, each of
which contains one or more columns.
Some of its features are :
· Different Column Types
· Mutable Size
· Labeled Axes
· Arithmetic Operations on rows & columns
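The four features listed above can be tried out in a few lines. This is only a sketch: the "Bonus" values and the row labels "r1" to "r3" are made up for the illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ray", "Joy", "Kane"], "Marks": [87, 95, 88]})

# Different column types: one text column and one integer column.
print(df.dtypes)

# Mutable size: a new column can be added in place (made-up values).
df["Bonus"] = [3, 0, 5]

# Labeled axes: rows and columns are both addressed by label.
df.index = ["r1", "r2", "r3"]
print(df.loc["r2", "Marks"])

# Arithmetic on columns works element by element.
df["Total"] = df["Marks"] + df["Bonus"]
print(df)
```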
code :
import pandas as pd
a=[[1,2,3,4],[5,6,7]]  # the second row has one value fewer
ans=pd.DataFrame(a)
print(ans)
output :
   0  1  2    3
0  1  2  3  4.0
1  5  6  7  NaN
When a value is not specified, pandas shows NaN in its place.
*Note : All the lists in the dictionary must have the same length*
code :
import pandas as pd
data={"Name":['Ray','Joy','Kane'],"Marks":[87,95,88]}
ans=pd.DataFrame(data)
print(ans)
output :
Name Marks
0 Ray 87
1 Joy 95
2 Kane 88
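To see why the lists in the dictionary need the same length, here is a short sketch. The `bad` dictionary is a made-up example with one mark missing.

```python
import pandas as pd

good = {"Name": ["Ray", "Joy", "Kane"], "Marks": [87, 95, 88]}
bad = {"Name": ["Ray", "Joy", "Kane"], "Marks": [87, 95]}  # one mark missing

print(pd.DataFrame(good).shape)

# Lists of unequal length cannot be lined up into rows.
try:
    pd.DataFrame(bad)
    failed = False
except ValueError as err:
    failed = True
    print("Could not build the DataFrame:", err)
```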
np.random.rand in DataFrame :
Syntax (num_of_row, num_of_col and index_range are placeholders; index_range must equal num_of_row) :
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(num_of_row,num_of_col),index=np.arange(index_range))
print(x)
code :
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4),index=np.arange(3))
print(x)
output : a table of 3 rows and 4 columns of random values between 0 and 1 (the numbers differ on every run)
Convert into numpy :
code :
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original form :\n",x)
print("\n")
print("After Conversion into numpy :\n\n",x.to_numpy())
output :
Original form :
           0         1         2         3
0  0.246232  0.883252  0.234517  0.215390
1  0.964039  0.443224  0.511315  0.620338
2  0.052850  0.572867  0.202916  0.838877

After Conversion into numpy :

 [[0.24623215 0.88325169 0.2345175  0.21539004]
 [0.96403939 0.4432239  0.51131478 0.62033791]
 [0.05285017 0.5728666  0.20291621 0.83887688]]
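One thing worth knowing about to_numpy() is that it returns a single array, so columns of different types have to share one dtype. The sketch below uses two small made-up DataFrames, `numeric` and `mixed`, to show this.

```python
import pandas as pd

# Integer and float columns together.
numeric = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
# Text and integer columns together.
mixed = pd.DataFrame({"Name": ["Ray", "Joy"], "Marks": [87, 95]})

numeric_arr = numeric.to_numpy()  # integers are upcast to floats
mixed_arr = mixed.to_numpy()      # falls back to a generic object array

print(numeric_arr.dtype)
print(mixed_arr.dtype)
```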
Swap the position of rows & columns (transpose) :
code :
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original :\n",x)
print("\n")
print("After changing the position of rows & columns :\n",x.T)
output :
Original :
           0         1         2         3
0  0.173452  0.558877  0.756212  0.984668
1  0.467775  0.387533  0.855786  0.104103
2  0.138967  0.629818  0.549208  0.785425

After changing the position of rows & columns :
           0         1         2
0  0.173452  0.467775  0.138967
1  0.558877  0.387533  0.629818
2  0.756212  0.855786  0.549208
3  0.984668  0.104103  0.785425
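What the transpose does to the shape and the labels can be checked with a small sketch. The row labels "r0" to "r2" and column labels "c0" to "c3" are made up so that the swap is easier to follow.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.rand(3, 4),
                 index=["r0", "r1", "r2"],
                 columns=["c0", "c1", "c2", "c3"])
t = x.T

# The shape flips from (3, 4) to (4, 3).
print(x.shape, t.shape)

# Row labels become column labels and the other way round.
print(t.index.tolist(), t.columns.tolist())

# The value at (row, column) moves to (column, row).
print(x.loc["r1", "c2"] == t.loc["c2", "r1"])
```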
Now that we understand DataFrames, let us look at a basic but important topic.
Merge , Join & Concatenate :
1.) Inner Join :
Joins 2 tables on their common columns & keeps only the rows whose key values match in both tables. The common column appears once in the result.
2.) Left Join :
All the rows from the left table are displayed, whether or not they have a match in the right table. Cells without a match are filled with NaN.
3.) Right Join :
All the rows from the right table are displayed, whether or not they have a match in the left table. Cells without a match are filled with NaN.
4.) Outer Join :
All the rows from both tables are displayed, matched up wherever possible.
5.) Merge :
Merging 2 datasets is the process of bringing 2 datasets together into 1, & aligning the rows of each based on common attributes or columns.
6.) Concat :
Concat also combines datasets, but it does not match rows on key values. It stacks the DataFrames along an axis (rows by default) & fills the columns that are missing from one of them with NaN.
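The join types described above can be compared side by side. The sketch below assumes two small tables that share a key column "A"; passing `indicator=True` to `pd.merge` adds a `_merge` column that says which table each row came from. The names `row_counts` and `origin` are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df2 = pd.DataFrame({"A": [3, 4, 5], "C": [1, 9, 10]})

row_counts = {}
for how in ["inner", "left", "right", "outer"]:
    # indicator=True marks each row as left_only, right_only or both.
    result = pd.merge(df1, df2, on="A", how=how, indicator=True)
    row_counts[how] = len(result)
    print(how, "join ->", len(result), "rows")
    print(result, "\n")

# For the outer join, record where each key value came from.
outer = pd.merge(df1, df2, on="A", how="outer", indicator=True)
origin = dict(zip(outer["A"].tolist(), outer["_merge"].astype(str).tolist()))
print(origin)
```

Only the key value 3 exists in both tables, which is why the inner join keeps a single row while the outer join keeps all five key values.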
*Example for all the things mentioned above :
code :
import numpy as np
import pandas as pd
df1=pd.DataFrame({"A":[1,2,3],"B":[2,3,4]})
print("First Data set:\n")
print(df1)
output :
First Data set:

   A  B
0  1  2
1  2  3
2  3  4
code :
import numpy as np
import pandas as pd
df2=pd.DataFrame({"A":[3,4,5],"C":[1,9,10]})  # second column is "C", as the output shows
print("Second Data set:\n")
print(df2)
output :
Second Data set:

   A   C
0  3   1
1  4   9
2  5  10
code :
print("Concatenation is :")
print(pd.concat([df1,df2]))
output :
Concatenation is :
   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   NaN
0  3  NaN   1.0
1  4  NaN   9.0
2  5  NaN  10.0
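Notice that the row labels 0, 1, 2 repeat after concatenation, because each DataFrame keeps its own index. If fresh labels are wanted, one option is the `ignore_index=True` argument of `pd.concat`, sketched here with the same two tables (assuming the second one has columns "A" and "C").

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df2 = pd.DataFrame({"A": [3, 4, 5], "C": [1, 9, 10]})

stacked = pd.concat([df1, df2])                        # keeps the original labels
renumbered = pd.concat([df1, df2], ignore_index=True)  # builds a fresh 0..n-1 index

print(stacked.index.tolist())
print(renumbered.index.tolist())
```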
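code :
pd.concat([df1,df2],axis=1,join="inner")
output :
   A  B  A   C
0  1  2  3   1
1  2  3  4   9
2  3  4  5  10
code :
pd.merge(df1,df2,on=["A"],how="inner")
output :
   A  B  C
0  3  4  1
code :
pd.merge(df1,df2,on=["A"],how="right")
output :
   A    B   C
0  3  4.0   1
1  4  NaN   9
2  5  NaN  10
code :
pd.merge(df1,df2,how="left")
output :
   A  B    C
0  1  2  NaN
1  2  3  NaN
2  3  4  1.0
code :
pd.merge(df1,df2,how="outer")
output :
   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   1.0
3  4  NaN   9.0
4  5  NaN  10.0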
There are many more things we can do with Pandas & NumPy. I'll cover them while creating projects related to data science & machine learning, for your better understanding. These were the basics that you need to understand, which is why I covered only this much. Now try to code some of this yourself too. It will help you become a great data scientist.