Pandas Data Structure | (Data Science ss : 8.1)

Data Structures in Pandas

Pandas has three main data structures:

(i) Series

(ii) DataFrame

(iii) Panel (3-D data; deprecated in pandas 0.20 and removed in pandas 0.25)

Let's look at Series and DataFrame one by one.

(i) Series :

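A Series is a one-dimensional labeled array: a sequence of values together with an index. Each single column (or row) of a DataFrame is a Series.

A Series can only hold 1-D data. If we pass a 2-D NumPy array, pandas does not flatten it for us; it raises a ValueError, so the array has to be flattened first (for example with ravel()).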

Let's dive into some examples :

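code :

import numpy as np
import pandas as pd

a=np.array([[1,2,3,4]])    # 2-D array with shape (1, 4)
ans=pd.Series(a.ravel())   # flatten to 1-D first; pd.Series(a) itself would raise a ValueError
print(ans)

output :

0    1
1    2
2    3
3    4
dtype: int32

(The dtype is int32 or int64 depending on your platform and NumPy version.)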

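code :

import numpy as np
import pandas as pd

a=np.array([[1,2,3],[4,5,6],[7,8,9]])   # 2-D array with shape (3, 3)
ans=pd.Series(a)                         # a Series cannot hold 2-D data
print(ans)

output :

ValueError: Data must be 1-dimensional

(Newer pandas versions also report the shape of the array in this message.)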
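Because pd.Series only accepts 1-D data, the 3 x 3 array above needs a different approach. A minimal sketch of two options, assuming the goal is either a single column holding all nine values or a table that keeps the 3 x 3 shape:

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# pd.Series(a) raises ValueError: a Series can only hold 1-D data.
try:
    pd.Series(a)
    series_rejected_2d = False
except ValueError as err:
    series_rejected_2d = True
    print("ValueError:", err)

# Option 1: flatten to 1-D first; ravel() reads the values row by row.
flat = pd.Series(a.ravel())
print(flat.tolist())   # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Option 2: keep the 3 x 3 shape by using a DataFrame instead.
table = pd.DataFrame(a)
print(table.shape)     # (3, 3)
```

Which option fits depends on whether the row/column structure matters for what you do next.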

How to set custom index labels?

code :

import numpy as np
import pandas as pd

a=np.array([1,2,3])
ans=pd.Series(a,index=['a','b','c'])   # one label per value
print(ans)

output :

a    1
b    2
c    3
dtype: int32
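Passing index= sets the labels when the Series is created. The literal "index name" is a separate property, index.name. The sketch below shows both: replacing the labels of an existing Series and naming its index (the labels 'x', 'y', 'z' and the name 'letters' are just illustrative):

```python
import pandas as pd

ans = pd.Series([1, 2, 3], index=['a', 'b', 'c'])

# Replace the labels of an existing Series (needs exactly one label per value).
ans.index = ['x', 'y', 'z']

# Give the index itself a name; it is printed above the labels.
ans.index.name = 'letters'

print(ans)
print(ans['y'])   # look up a value by its label -> 2
```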

code :

import numpy as np
import pandas as pd

a=[1,2,3,4,5,6,7,8,9]
ans=pd.Series(a,index=['a','b','c','d','e','f','g','h','i'])
print("All Values :")
print(ans)
print("Values from a to f :")
print(ans['a':'f'])   # slice by labels

output :

All Values :
a    1
b    2
c    3
d    4
e    5
f    6
g    7
h    8
i    9
dtype: int64
Values from a to f :
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64

Noticed something? When we slice by labels, the end label is included, so the slice from "a" to "f" also contains "f". This is different from ordinary Python slicing, where the end position is excluded.
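For comparison, here is a small sketch of label-based slicing with .loc against position-based slicing with .iloc on the same nine values. Only the label slice includes its end point:

```python
import pandas as pd

ans = pd.Series(range(1, 10), index=list('abcdefghi'))

by_label = ans.loc['a':'f']     # label slice: the end label 'f' is included
by_position = ans.iloc[0:5]     # position slice: position 5 ('f') is excluded

print(by_label.tolist())        # [1, 2, 3, 4, 5, 6]
print(by_position.tolist())     # [1, 2, 3, 4, 5]
```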

code :

import numpy as np
import pandas as pd

ser=pd.Series(np.random.rand(5))   # 5 random floats drawn uniformly from [0, 1)
print(ser)

output (your values will differ on every run) :

0    0.050999
1    0.573335
2    0.939892
3    0.743563
4    0.882146
dtype: float64
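Since np.random.rand gives new numbers on every run, your output will not match the one above. If you need repeatable values, one option is NumPy's Generator API, used here in place of np.random.rand; the seed 42 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

# A generator created with a fixed seed produces the same numbers on every run.
rng = np.random.default_rng(seed=42)
ser = pd.Series(rng.random(5))   # 5 floats in [0, 1)

rng_again = np.random.default_rng(seed=42)
ser_again = pd.Series(rng_again.random(5))

print(ser)
print(ser.equals(ser_again))     # True: same seed, same values
```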

(ii) DataFrame :

A DataFrame is a two-dimensional, tabular (spreadsheet-like) structure made of rows and columns, where every column is a Series.

Some of its features are :

· Different Column Types

· Mutable Size

· Labeled Axes

· Arithmetic Operations on rows & columns
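A short sketch of the four features listed above on a tiny table (the Bonus and Total columns are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ray", "Joy"], "Marks": [87, 95]})

# Different column types: each column keeps its own dtype.
print(df.dtypes)

# Mutable size: add a new column after the DataFrame has been created.
df["Bonus"] = [5, 3]

# Labeled axes: a cell is addressed by its row label and column label.
print(df.loc[1, "Name"])   # Joy

# Arithmetic on columns: element-wise, aligned by row label.
df["Total"] = df["Marks"] + df["Bonus"]
print(df)
```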

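code :

import pandas as pd

a=[[1,2,3,4],[5,6,7]]   # rows of unequal length, as plain Python lists
ans=pd.DataFrame(a)
print(ans)

output :

   0  1  2    3
0  1  2  3  4.0
1  5  6  7  NaN

The second row has no fourth value, so pandas fills that cell with NaN (and column 3 becomes a float column, because NaN is a float). Pass such ragged lists to pd.DataFrame directly: np.array() rejects them in recent NumPy versions (1.24 and later).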

code :

*Note : All the lists in the dictionary must have the same length.*

code :

import pandas as pd

data={"Name":['Ray','Joy','Kane'],"Marks":[87,95,88]}
ans=pd.DataFrame(data)   # each key becomes a column
print(ans)

output :

   Name  Marks
0   Ray     87
1   Joy     95
2  Kane     88
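To see what the note means in practice, here is a sketch with one mark deliberately missing. pandas refuses the unequal lists; one possible fix is to fill the gap with None, which pandas stores as NaN:

```python
import pandas as pd

data = {"Name": ['Ray', 'Joy', 'Kane'], "Marks": [87, 95]}   # one mark is missing

try:
    pd.DataFrame(data)
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)

# Fill the gap explicitly; None becomes NaN in the Marks column.
data["Marks"].append(None)
ans = pd.DataFrame(data)
print(ans)
```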

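np.random.rand in DataFrame :

Syntax (num_of_row, num_of_col and index_range are placeholders; the index needs one label per row, so index_range should equal num_of_row) :

import numpy as np
import pandas as pd

x=pd.DataFrame(np.random.rand(num_of_row,num_of_col),index=np.arange(index_range))
print(x)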

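code :

import numpy as np
import pandas as pd

x=pd.DataFrame(np.random.rand(3,4),index=np.arange(3))   # 3 rows, 4 columns
print(x)

output :

A 3 x 4 table of random floats between 0 and 1, with rows labeled 0 to 2 and columns labeled 0 to 3. The values change on every run.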
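Following the syntax above, the index needs one label per row. A sketch showing both the working case and the error pandas raises when the lengths disagree (the index length 5 is deliberately wrong):

```python
import numpy as np
import pandas as pd

num_of_row, num_of_col = 3, 4

# One index label per row: works.
x = pd.DataFrame(np.random.rand(num_of_row, num_of_col), index=np.arange(num_of_row))
print(x.shape)   # (3, 4)

# 5 labels for 3 rows: rejected.
try:
    pd.DataFrame(np.random.rand(num_of_row, num_of_col), index=np.arange(5))
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```

Since np.arange(3) gives the same labels 0, 1, 2 as the default index, index= could simply be left out in this particular case.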

Convert into numpy :

code :

import numpy as np
import pandas as pd

x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original form :\n",x)
print("\n")
print("After Conversion into numpy :\n\n",x.to_numpy())   # to_numpy() returns a plain 2-D NumPy array

output :

Original form :
          0         1         2         3
0  0.246232  0.883252  0.234517  0.215390
1  0.964039  0.443224  0.511315  0.620338
2  0.052850  0.572867  0.202916  0.838877

After Conversion into numpy :

[[0.24623215 0.88325169 0.2345175  0.21539004]
 [0.96403939 0.4432239  0.51131478 0.62033791]
 [0.05285017 0.5728666  0.20291621 0.83887688]]
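to_numpy() returns a single NumPy array, so all the columns have to share one dtype. A small sketch (the column names are illustrative) of what happens with all-numeric columns versus mixed text and numbers:

```python
import pandas as pd

nums = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
mixed = pd.DataFrame({"Name": ["Ray", "Joy"], "Marks": [87, 95]})

# All-numeric columns: a common numeric dtype is chosen (int + float -> float64).
print(nums.to_numpy().dtype)    # float64

# Text mixed with numbers: the only common dtype is object.
print(mixed.to_numpy().dtype)   # object
```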

Change rows & column position (transpose) :

code :

import numpy as np
import pandas as pd

x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original :\n",x)
print("\n")
print("After changing the position of rows & columns :\n",x.T)   # .T swaps rows and columns

output :

Original :
          0         1         2         3
0  0.173452  0.558877  0.756212  0.984668
1  0.467775  0.387533  0.855786  0.104103
2  0.138967  0.629818  0.549208  0.785425

After changing the position of rows & columns :
          0         1         2
0  0.173452  0.467775  0.138967
1  0.558877  0.387533  0.629818
2  0.756212  0.855786  0.549208
3  0.984668  0.104103  0.785425
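x.T is shorthand for x.transpose(). A quick sketch confirming that the shape flips and that transposing twice restores the original table:

```python
import numpy as np
import pandas as pd

x = pd.DataFrame(np.random.rand(3, 4), index=np.arange(3))

t = x.T                   # same as x.transpose()
print(x.shape, t.shape)   # (3, 4) (4, 3)

# Row 1, column 2 of x becomes row 2, column 1 of t.
print(t.loc[2, 1] == x.loc[1, 2])

# Transposing twice gives back the original table.
print(t.T.equals(x))      # True
```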

Now that we understand DataFrames, let us look at a very basic topic: combining them.

Merge, Join & Concatenate :

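1.) Inner Join :

Joins 2 tables on their common (key) columns and keeps only the rows whose key values appear in both tables. The key column appears only once in the result.

2.) Left Join :

All the rows from the left table are kept, whether or not they have a match in the right table. Where there is no match, the right table's columns are filled with NaN.

3.) Right Join :

All the rows from the right table are kept, whether or not they have a match in the left table. Where there is no match, the left table's columns are filled with NaN.

4.) Outer Join :

All the rows from both tables are kept, and any gaps are filled with NaN.

5.) Merge :

Merging 2 datasets is the process of bringing them together into 1, aligning the rows of each based on common attributes or columns. In pandas this is pd.merge(), and its how argument selects one of the four joins above.

6.) Concat :

pd.concat() does not match rows on key values the way merge does. It simply stacks DataFrames one after another, along the rows by default or side by side with axis=1, and lines them up by their index and column labels (join="outer" keeps all labels, join="inner" keeps only the shared ones).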
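The four join types map onto the how argument of pd.merge. A sketch with two tiny hypothetical tables (the column names key, L and R are made up); indicator=True adds a _merge column that shows where each row came from:

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "L": ["a", "b", "c"]})
right = pd.DataFrame({"key": [3, 4], "R": ["x", "y"]})

# One merge per join type; _merge is 'left_only', 'right_only' or 'both'.
results = {
    how: pd.merge(left, right, on="key", how=how, indicator=True)
    for how in ["inner", "left", "right", "outer"]
}

for how, result in results.items():
    print(how, result["key"].tolist(), result["_merge"].tolist())
```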

*Examples for all the things mentioned above :*

code :

import numpy as np

import pandas as pd

df1=pd.DataFrame({"A":[1,2,3],"B":[2,3,4]})

print("First Data set:\n")
print(df1)

output :

First Data set:

   A  B
0  1  2
1  2  3
2  3  4

code :

import numpy as np
import pandas as pd

df2=pd.DataFrame({"A":[3,4,5],"C":[1,9,10]})
print("Second Data set:\n")
print(df2)

output :

Second Data set:

   A   C
0  3   1
1  4   9
2  5  10

code :

print("Concatenation is :")
print(pd.concat([df1,df2]))   # stack the rows; columns missing from one table become NaN

output :

Concatenation is :
   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   NaN
0  3  NaN   1.0
1  4  NaN   9.0
2  5  NaN  10.0
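Notice the repeated row labels 0, 1, 2, 0, 1, 2 in the concatenated result. A sketch of two common adjustments, using the same df1 and df2: ignore_index=True renumbers the rows, and join="inner" keeps only the columns both tables share:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df2 = pd.DataFrame({"A": [3, 4, 5], "C": [1, 9, 10]})

# Drop the original row labels and number the rows 0..5.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)

# Keep only the shared columns (here just "A").
shared = pd.concat([df1, df2], join="inner", ignore_index=True)
print(shared)
```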


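code :

print(pd.concat([df1,df2],axis=1,join="inner"))   # side by side; keep only row labels present in both

output :

   A  B  A   C
0  1  2  3   1
1  2  3  4   9
2  3  4  5  10

code :

print(pd.merge(df1,df2,on=["A"],how="inner"))   # only A=3 appears in both tables

output :

   A  B  C
0  3  4  1

code :

print(pd.merge(df1,df2,on=["A"],how="right"))   # every row of df2

output :

   A    B   C
0  3  4.0   1
1  4  NaN   9
2  5  NaN  10

code :

print(pd.merge(df1,df2,how="left"))   # without on=, pandas joins on the shared column "A"; every row of df1

output :

   A  B    C
0  1  2  NaN
1  2  3  NaN
2  3  4  1.0

code :

print(pd.merge(df1,df2,how="outer"))   # every row of both tables

output :

   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   1.0
3  4  NaN   9.0
4  5  NaN  10.0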
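pandas also offers DataFrame.join, which matches rows on the index rather than on a column. A sketch assuming we move the key column A into the index first; it should give the same single matching row as the inner merge above (A=3, B=4, C=1), just with A as the index:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [2, 3, 4]})
df2 = pd.DataFrame({"A": [3, 4, 5], "C": [1, 9, 10]})

# DataFrame.join matches on the index, so make "A" the index of both tables.
joined = df1.set_index("A").join(df2.set_index("A"), how="inner")
print(joined)

# The column-based equivalent, for comparison.
merged = pd.merge(df1, df2, on="A", how="inner")
print(merged)
```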

There are many more things we can do with Pandas & NumPy, and I'll cover them while building data science & machine learning projects, which will make them easier to understand. What we covered here are the basics you must know, which is why I stopped at this point. Now try to write some code yourself; that practice will help you become a great data scientist.

InfinityCodeX
