InfinityCodeX

Data Structures in Pandas

pandas has 3 main data structures.

(i)   Series
(ii)  DataFrame
(iii) Panel (deprecated and later removed from pandas; not covered below)

Let's elaborate on each type of data structure one by one.

(i) Series :

A Series is a one-dimensional labeled array: a sequence of values with an index. A single column (or a single row) of a DataFrame is a Series.
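
As a minimal sketch of the column/row relationship, the snippet below builds a small DataFrame and checks that picking one column or one row gives back a Series. The data and the names `df`, `marks_column` and `first_row` are made up for illustration only.

```python
import pandas as pd

# Hypothetical example data, used only for illustration
df = pd.DataFrame({"Name": ["Ray", "Joy"], "Marks": [87, 95]})

marks_column = df["Marks"]   # one column of the DataFrame
first_row = df.loc[0]        # one row of the DataFrame

print(type(marks_column))    # both objects are pandas Series
print(type(first_row))
print(marks_column.index.tolist())  # row labels become the Series index
print(first_row.index.tolist())     # column names become the Series index
```

For a column, the Series index is the row labels; for a row, the Series index is the column names.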

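A Series always holds 1-D data. A 2-D NumPy array has to be flattened to 1-D first (for example with ravel()); recent pandas versions raise a ValueError if a 2-D array is passed directly.

Let's dive into some examples :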

code :
import numpy as np
import pandas as pd
a=np.array([[1,2,3,4]])      # 2-D array of shape (1, 4)
ans=pd.Series(a.ravel())     # flatten to 1-D before building the Series
print(ans)

output :

0    1
1    2
2    3
3    4
dtype: int32

code :
import numpy as np
import pandas as pd
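a=np.array([[1,2,3],[4,5,6],[7,8,9]])   # 2-D array of shape (3, 3)
ans=pd.Series(a.ravel())                 # flatten row by row into 9 values
print(ans)

output :

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int32
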
How to change the index labels?

code :
import numpy as np
import pandas as pd
a=np.array([1,2,3])
ans=pd.Series(a,index=['a','b','c'])
print(ans)

output :

a    1
b    2
c    3
dtype: int32

code :
import numpy as np
import pandas as pd
a=[1,2,3,4,5,6,7,8,9]
ans=pd.Series(a,index=['a','b','c','d','e','f','g','h','i'])
print("All Values :\n")
print(ans)
print("Values from a to f :")
print(ans['a':'f'])   # label-based slice

output :

All Values :

a    1
b    2
c    3
d    4
e    5
f    6
g    7
h    8
i    9
dtype: int64
Values from a to f :
a    1
b    2
c    3
d    4
e    5
f    6
dtype: int64

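Noticed something? With label-based (string) slicing the last value is also included, which means the slice from "a" to "f" contains "f" as well. This differs from ordinary position-based slicing in Python, where the end position is left out.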
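
A small sketch of this behaviour, using `.loc` for label-based slicing and `.iloc` for position-based slicing. The variable names `by_label` and `by_position` are only illustrative.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9],
                index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])

by_label = ser.loc['a':'f']    # label slice: the end label 'f' is included
by_position = ser.iloc[0:5]    # positional slice: position 5 ('f') is left out

print(by_label.index.tolist())
print(by_position.index.tolist())
```

The label slice returns six values while the positional slice returns five, even though both look like they stop at 'f'.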

code :
import numpy as np
import pandas as pd
ser=pd.Series(np.random.rand(5)) # 5 random floats in [0, 1)
print(ser)

output :

0    0.050999
1    0.573335
2    0.939892
3    0.743563
4    0.882146
dtype: float64



(ii) DataFrame :


A DataFrame is a two-dimensional, spreadsheet-like table made of rows and one or more labeled columns.

Some of its features are :

·        Different column types
·        Mutable size
·        Labeled axes
·        Arithmetic operations on rows & columns
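
The features listed above can be tried out in a few lines. This is only a sketch with made-up data; the column names `Bonus` and `Total` are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"Name": ["Ray", "Joy", "Kane"],   # text column
                   "Marks": [87, 95, 88]})            # integer column

print(df.dtypes)                  # different column types in one table

df["Bonus"] = [1.5, 2.0, 0.5]     # mutable size: add a new column
print(df.shape)

print(df.index.tolist())          # labeled axes: row labels
print(df.columns.tolist())        # labeled axes: column labels

df["Total"] = df["Marks"] + df["Bonus"]   # arithmetic between two columns
print(df)
```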

code :

import numpy as np
import pandas as pd
a=[[1,2,3,4],[5,6,7]]     # the second row has one value fewer
ans=pd.DataFrame(a)
print(ans)

output :

   0  1  2    3
0  1  2  3  4.0
1  5  6  7  NaN

When we do not specify a value, pandas shows NaN in that cell.


Note : All the lists in the dictionary must have the same length.

code :

import pandas as pd
data={"Name":['Ray','Joy','Kane'],"Marks":[87,95,88]}
ans=pd.DataFrame(data)
print(ans)

output :

   Name  Marks
0   Ray     87
1   Joy     95
2  Kane     88
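
The note about equal lengths can be checked directly. In this sketch the dictionary `bad_data` is deliberately wrong (its name and the `built` flag are only illustrative), and pandas refuses to build the DataFrame.

```python
import pandas as pd

# "Marks" has one value fewer than "Name" (deliberately wrong)
bad_data = {"Name": ['Ray', 'Joy', 'Kane'], "Marks": [87, 95]}

try:
    pd.DataFrame(bad_data)
    built = True
except ValueError as err:
    built = False
    print("Could not build the DataFrame:", err)
```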

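np.random.rand in DataFrame :

Syntax (num_of_row, num_of_col and index_range are placeholders; index_range must be equal to num_of_row) :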
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(num_of_row,num_of_col),index=np.arange(index_range))
print(x)

code :
import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4),index=np.arange(3))
print(x)

output :

(a table with 3 rows and 4 columns of random floats in [0, 1); the values change on every run)



Convert into numpy :

code :

import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original form :\n",x)
print("\n")
print("After Conversion into numpy :\n\n",x.to_numpy())

output :

Original form :
           0         1         2         3
0  0.246232  0.883252  0.234517  0.215390
1  0.964039  0.443224  0.511315  0.620338
2  0.052850  0.572867  0.202916  0.838877


After Conversion into numpy :

 [[0.24623215 0.88325169 0.2345175  0.21539004]
 [0.96403939 0.4432239  0.51131478 0.62033791]
 [0.05285017 0.5728666  0.20291621 0.83887688]]



Swap rows & columns (transpose) :

code :

import numpy as np
import pandas as pd
x=pd.DataFrame(np.random.rand(3,4), index=np.arange(3))
print("Original :\n",x)
print("\n")
print("After changing the position of rows & columns :\n",x.T)

output :

Original :
           0         1         2         3
0  0.173452  0.558877  0.756212  0.984668
1  0.467775  0.387533  0.855786  0.104103
2  0.138967  0.629818  0.549208  0.785425


After changing the position of rows & columns :
           0         1         2
0  0.173452  0.467775  0.138967
1  0.558877  0.387533  0.629818
2  0.756212  0.855786  0.549208
3  0.984668  0.104103  0.785425


Now that we understand DataFrames, let us look at a basic but important topic: combining them.

Merge , Join & Concatenate :



1.) Inner Join :

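Joins 2 tables on their common column(s). Only the rows whose key values appear in both tables are kept, and the common column appears once in the result.

2.) Left Join :

All the rows from the left table are kept, whether they have a match or not. Columns coming from the right table are filled with NaN where there is no match.


3.) Right Join :

All the rows from the right table are kept, whether they have a match or not. Columns coming from the left table are filled with NaN where there is no match.


4.) Outer Join :

All the rows from both tables are kept. Missing values are filled with NaN.


5.) Merge :

Merging 2 datasets is the process of bringing 2 datasets together into 1 by aligning their rows on common attributes or columns. The how argument of pd.merge selects one of the join types above.


6.) Concat :

Concat also combines DataFrames, but it does not match rows on key values. It simply stacks the DataFrames along an axis: one below the other by default, or side by side with axis=1.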
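
Before the full examples, here is a compact sketch that compares the join types side by side. The tables `left` and `right` and the column name `key` are hypothetical, chosen only so that one key value (3) is shared by both tables.

```python
import pandas as pd

# Hypothetical tables, for illustration only; "key" is the common column
left = pd.DataFrame({"key": [1, 2, 3], "x": ["a", "b", "c"]})
right = pd.DataFrame({"key": [3, 4], "y": ["p", "q"]})

# Collect the key values kept by each kind of join
keys = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key", how=how)
    keys[how] = merged["key"].tolist()
    print(how, "->", keys[how])

# concat does not match on keys; it just stacks the rows of both tables
stacked = pd.concat([left, right])
print(len(stacked))
```

Only the shared key survives the inner join, while the outer join keeps every key from both tables. The concatenated result has as many rows as the two tables together.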

Examples for all the things mentioned above :

code :
import numpy as np
import pandas as pd
df1=pd.DataFrame({"A":[1,2,3],"B":[2,3,4]})
print("First Data set:\n")
print(df1)


output :

First Data set:

   A  B
0  1  2
1  2  3
2  3  4
code :
import numpy as np
import pandas as pd
df2=pd.DataFrame({"A":[3,4,5],"C":[1,9,10]})
print("Second Data set:\n")
print(df2)

output :

Second Data set:

   A   C
0  3   1
1  4   9
2  5  10
code :
print("Concatenation is :")
print(pd.concat([df1,df2]))

output :

Concatenation is :
   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   NaN
0  3  NaN   1.0
1  4  NaN   9.0
2  5  NaN  10.0
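
code :
print(pd.concat([df1,df2],axis=1,join="inner"))

output :

   A  B  A   C
0  1  2  3   1
1  2  3  4   9
2  3  4  5  10

code :
print(pd.merge(df1,df2,on=["A"],how="inner"))

output :

   A  B  C
0  3  4  1

code :
print(pd.merge(df1,df2,on=["A"],how="right"))

output :

   A    B   C
0  3  4.0   1
1  4  NaN   9
2  5  NaN  10

code :
print(pd.merge(df1,df2,how="left"))

output :

   A  B    C
0  1  2  NaN
1  2  3  NaN
2  3  4  1.0

code :
print(pd.merge(df1,df2,how="outer"))

output :

   A    B     C
0  1  2.0   NaN
1  2  3.0   NaN
2  3  4.0   1.0
3  4  NaN   9.0
4  5  NaN  10.0
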
There are many more things we can do with Pandas & NumPy. I'll cover them while creating projects related to data science & machine learning, for your better understanding. These were the basics that are mandatory for you to understand, which is why I covered only this much. Now try to code some of this yourself too; it will help you become a great data scientist.
