Pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool,
built on top of the Python programming language.

Pandas Series

A Pandas Series is like a column in a table. It is a one-dimensional labeled array that can hold data of any type.

Datasets: Pandas Series (kaggle.com)

Importing necessary libraries

# Importing necessary libraries
import numpy as np
import pandas as pd

Explanation: NumPy (numerical computing) and Pandas (data manipulation) are imported under their conventional aliases, np and pd.


Creating Series from Lists

# Series of string data
country = ['India', 'Pakistan', 'USA', 'Nepal', 'Sri Lanka']
pd.Series(country)

# Series of integer data
runs = [13, 24, 56, 78, 100]
runs_ser = pd.Series(runs)

# Series with custom index
marks = [67, 57, 89, 100]
subjects = ['maths', 'english', 'science', 'Nepali']
pd.Series(marks, index=subjects)

# Series with a specified name
marks = pd.Series(marks, index=subjects, name='Nabin ko marks')
marks

Explanation: This section demonstrates creating Pandas Series from lists, showcasing different scenarios such as string data, integer data, custom indexing, and specifying a name for the series.


Creating Series from a Dictionary

marks_dict = {'maths': 67, 'english': 57, 'science': 89, 'nepali': 100}
marks_series = pd.Series(marks_dict, name='nabin ko marks')
marks_series

Explanation: A Pandas Series is created from a dictionary, where keys become the index, and values become the data in the series.
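The key-to-index mapping can be checked with a small sketch. The dictionary below is hypothetical data made up for illustration; it also shows what appears to happen when an explicit index is passed alongside a dictionary (keys are selected by label, and a label with no matching key becomes NaN).

```python
import pandas as pd

# Hypothetical data for illustration only
scores = {'maths': 67, 'english': 57, 'science': 89}
scores_ser = pd.Series(scores, name='scores')

# Dictionary keys become the index labels, values become the data
index_labels = list(scores_ser.index)
values = list(scores_ser.values)
print(index_labels)
print(values)

# Passing an index selects and reorders keys; a missing key yields NaN
reordered = pd.Series(scores, index=['science', 'maths', 'history'])
print(reordered)
```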


Series Attributes

# Size of the series
marks_series.size

# Data type of the series
marks_series.dtype

# Name of the series
marks_series.name

# Check if values are unique
marks_series.is_unique

# Check uniqueness in another series
pd.Series([1, 1, 2, 3, 4, 5]).is_unique

# Index of the series
marks_series.index

# Values of the series
marks_series.values

Explanation: Various attributes of a Pandas Series, such as size, data type, name, uniqueness, index, and values, are explored.


Creating Series using read_csv

# Reading a CSV file with one column
# (read_csv no longer accepts squeeze=True in pandas 2.0+; squeeze the DataFrame instead)
subs = pd.read_csv('/content/subs.csv').squeeze('columns')
subs

# Reading a CSV file with two columns and setting 'match_no' as the index
vk = pd.read_csv('/content/kohli_ipl.csv', index_col='match_no').squeeze('columns')
vk

# Reading a CSV file with 'movie' as the index
movies = pd.read_csv('/content/bollywood.csv', index_col='movie').squeeze('columns')
movies

Explanation: This section reads data from CSV files with the read_csv function, which returns a DataFrame, and converts each single-column result into a Pandas Series.
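For readers without the CSV files, here is a minimal self-contained sketch of the same idea. The CSV text is a made-up stand-in for a file such as kohli_ipl.csv, read from memory through io.StringIO. It uses DataFrame.squeeze('columns'), since recent pandas releases (2.0 and later) no longer accept a squeeze argument in read_csv.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as kohli_ipl.csv
csv_text = "match_no,runs\n1,1\n2,23\n3,13\n"

# read_csv returns a DataFrame; with index_col set, one data column remains
df = pd.read_csv(io.StringIO(csv_text), index_col='match_no')

# squeeze('columns') turns a single-column DataFrame into a Series
runs = df.squeeze('columns')
print(type(runs).__name__)
print(runs)
```

The column name becomes the Series name and the index column becomes the Series index.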


Series Methods

# Displaying the first few rows
subs.head()

# Displaying the first three rows
vk.head(3)

# Displaying the last ten rows
vk.tail(10)

# Displaying a random sample of five rows
movies.sample(5)

# Counting occurrences of each value (movies)
movies.value_counts()

# Sorting values in descending order and getting the highest value
vk.sort_values(ascending=False).head(1).values[0]

# Sorting values in descending order
vk.sort_values(ascending=False)

# Sorting index in descending order (movies)
movies.sort_index(ascending=False, inplace=True)
movies

# Sorting values in ascending order (vk)
vk.sort_values(inplace=True)
vk

Explanation: This part covers various methods applied to Pandas Series, including displaying rows, counting occurrences, sorting, and sampling.


Series Math Methods

# Counting non-null values (vk)
vk.count()

# Calculating the sum (subs)
subs.sum()

# Calculating mean, median, mode, standard deviation, and variance (subs, vk, movies)
print(subs.mean())
print(vk.median())
print(movies.mode())
print(subs.std())
print(vk.var())

# Finding the maximum value (subs)
subs.max()

# Displaying descriptive statistics (subs)
subs.describe()

Explanation: This section illustrates mathematical operations that can be performed on Pandas Series, such as counting, summing, calculating mean, median, mode, standard deviation, variance, maximum value, and descriptive statistics.


Series Indexing

# Integer indexing (x)
x = pd.Series([12, 13, 14, 35, 46, 57, 58, 79, 9])
x

# Negative indexing (last element); plain x[-1] raises KeyError on an integer index
x.iloc[-1]

# Negative indexing with series (movies, vk, marks_series)
movies.iloc[-1]
vk.iloc[-1]
marks_series.iloc[-1]

# Slicing (vk, movies)
vk[5:16]
vk[-5:]
movies[::2]

# Fancy indexing (vk)
vk[[1, 3, 4, 5]]

# Indexing with labels (movies)
movies['2 States (2014 film)']

Explanation: This part covers different indexing techniques on Pandas Series, including integer indexing, negative indexing, slicing, fancy indexing, and indexing with labels.
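The difference between positional and label-based access is easy to get wrong, so here is a small sketch with made-up values. It assumes only the standard .iloc and .loc accessors: .iloc always works by position, while plain square brackets and .loc look up labels when the index holds integers.

```python
import pandas as pd

# Hypothetical values; the default index is 0, 1, 2, ...
x = pd.Series([12, 13, 14, 35, 46])

# .iloc is always positional, so negative positions work
last = x.iloc[-1]

# Plain [] on an integer index looks up the label -1, which does not exist
try:
    x[-1]
    raised = False
except KeyError:
    raised = True

# With a non-default integer index, .loc uses labels and .iloc uses positions
s = pd.Series([10, 20, 30], index=[2, 0, 1])
by_label = s.loc[1]
by_position = s.iloc[1]
print(last, raised, by_label, by_position)
```

Using .iloc for positions and .loc for labels keeps the intent explicit regardless of the index type.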


Editing Series

# Modifying a value by position (marks_series); .iloc avoids the deprecated positional fallback of []
marks_series.iloc[1] = 100
marks_series

# Adding a new index and value (marks_series)
marks_series['evs'] = 100
marks_series

# Slicing and updating values (runs_ser)
runs_ser[2:4] = [100, 100]
runs_ser

# Fancy indexing and updating values (runs_ser)
runs_ser[[0, 3, 4]] = [0, 0, 0]
runs_ser

# Updating values using index label (movies)
movies['2 States (2014 film)'] = 'Alia Bhatt'
movies

Explanation: This section demonstrates methods to modify values in Pandas Series using indexing, slicing, and fancy indexing.


Series with Python Functionalities

# Various functions and methods (subs)
print(len(subs))
print(type(subs))
print(dir(subs))
print(sorted(subs))
print(min(subs))
print(max(subs))

# Type conversion (marks_series)
list(marks_series)
dict(marks_series)

# Membership operators (movies): `in` checks the index labels; use .values to check the data
'2 States (2014 film)' in movies
'Alia Bhatt' in movies.values
movies

# Looping through index (movies)
for i in movies.index:
    print(i)

# Arithmetic Operators (Broadcasting) - Adding 100 to each value in marks_series
100 + marks_series

# Relational Operators (vk)
vk >= 50

Explanation: This part covers various Python functionalities applied to Pandas Series, including built-in functions, type conversion, membership operators, looping, arithmetic operators (broadcasting), and relational operators.
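A short sketch of two points that are easy to miss here: the in operator tests index labels rather than values, and a scalar operand is broadcast to every element. The marks below are hypothetical.

```python
import pandas as pd

# Hypothetical marks for illustration
marks = pd.Series({'maths': 67, 'english': 57, 'science': 89})

# `in` tests the index labels, not the stored values
label_found = 'maths' in marks
value_found_via_in = 67 in marks
value_found = 67 in marks.values

# A scalar is broadcast to every element; comparisons return a boolean Series
bonus = marks + 5
passed = marks >= 60
print(label_found, value_found_via_in, value_found)
print(bonus.tolist(), passed.tolist())
```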


Boolean Indexing on Series

# Counting the number of 50's and 100's scored by Kohli (vk)
vk[vk >= 50].size

# Counting the number of ducks scored by Kohli (vk)
vk[vk == 0].size

# Counting the number of days with more than 200 subs (subs)
subs[subs > 200].size

# Finding actors who have done more than 20 movies (movies)
num_movies = movies.value_counts()
num_movies[num_movies > 20]

Explanation: This section covers boolean indexing on Pandas Series, including counting occurrences based on specific conditions and filtering data.
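The same pattern can be tried without the CSV files. The sketch below uses made-up scores; the variable names are illustrative only. A comparison produces a boolean mask, the mask selects the matching rows, and summing the mask counts them directly.

```python
import pandas as pd

# Hypothetical scores per match
runs = pd.Series([0, 62, 49, 100, 7, 0, 55])

# A comparison returns a boolean Series (the mask)
mask = runs >= 50

# Indexing with the mask keeps only the rows where it is True
fifty_plus = runs[mask]

# True counts as 1, so summing a mask counts the matches
count_fifty_plus = int(mask.sum())
ducks = int((runs == 0).sum())

# Combine conditions with & and |, wrapping each condition in parentheses
fifties_only = runs[(runs >= 50) & (runs < 100)]
print(fifty_plus.tolist(), count_fifty_plus, ducks, fifties_only.tolist())
```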


Plotting Graphs on Series

# Plotting a line graph (subs)
subs.plot()

# Plotting a pie chart of top 20 movie counts (movies)
movies.value_counts().head(20).plot(kind='pie')

Explanation: This part demonstrates plotting graphs directly from Pandas Series, including a line graph and a pie chart.


Some Important Series Methods

# astype - Converting vk to int16 and comparing memory usage before and after
import sys
sys.getsizeof(vk)
sys.getsizeof(vk.astype('int16'))

# between - Counting values between 51 and 99 (vk)
vk[vk.between(51, 99)].size

# clip - Clipping values in subs between 100 and 200
subs.clip(100, 200)

# drop_duplicates - Dropping duplicates, keeping the last occurrence (temp)
temp = pd.Series([1, 1, 2, 2, 3, 3, 4, 4])
temp.drop_duplicates(keep='last')

# Checking duplicated values (temp, vk)
temp.duplicated().sum()
vk.duplicated().sum()

# Dropping duplicates in movies
movies.drop_duplicates()

# Handling missing values in temp (temp)
temp = pd.Series([1, 2, 3, np.nan, 5, 6, np.nan, 8, np.nan, 10])
temp

# Counting non-null values and total size (temp)
temp.size
temp.count()

# Checking for null values in temp
temp.isnull().sum()

# Dropping null values in temp
temp.dropna()

# Filling null values with mean in temp
temp.fillna(temp.mean())

# Filtering values in vk equal to 49 or 99 (combining conditions with |)
vk[(vk == 49) | (vk == 99)]

# The same filter written with isin
vk[vk.isin([49, 99])]

# Applying a function that extracts the first word of each value (the lead actor's first name) in upper case (movies)
movies.apply(lambda x: x.split()[0].upper())

# Applying a function to categorize days as 'good day' or 'bad day' based on subs mean (subs)
subs.apply(lambda x: 'good day' if x > subs.mean() else 'bad day')

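# Taking a subset of vk (new) without copying; head() may return a view, so
# modifying new can raise a warning or affect vk, depending on the pandas version
new = vk.head()
new[1] = 1

# Copying the subset explicitly; modifying the copy leaves the original (vk) unchanged
new = vk.head().copy()
new[1] = 100
new
vk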

Explanation: This section covers important methods for Pandas Series, including data type conversion, value clipping, handling duplicates and missing values, filtering, applying functions, and copying subsets with modifications.
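The missing-value steps above can be verified on a tiny Series. This sketch uses hypothetical readings chosen so the arithmetic is easy to follow by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing entry
temp = pd.Series([1.0, 3.0, np.nan, 5.0])

total_entries = temp.size          # counts every slot, including NaN
non_null = temp.count()            # counts only non-null values
missing = int(temp.isnull().sum())

# mean() skips NaN by default, so the fill value is (1 + 3 + 5) / 3
filled = temp.fillna(temp.mean())
dropped = temp.dropna()
print(total_entries, non_null, missing)
print(filled.tolist(), dropped.tolist())
```

Neither fillna nor dropna changes temp itself; each returns a new Series.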


Series Methods

Overview:

Series methods are functions that can be applied to Pandas Series to perform various operations and obtain useful information about the data.

Examples:

# Displaying the first few rows
subs.head()

# Counting occurrences of each value (movies)
movies.value_counts()

# Sorting values in descending order and getting the highest value
vk.sort_values(ascending=False).head(1).values[0]

Explanation:

  • head(): Displays the first few rows of the Series, providing a quick look at the data.
  • value_counts(): Counts the occurrences of each unique value in the Series, useful for understanding the distribution of values.
  • sort_values(): Sorts the values in the Series, and in the example, it is used to find the highest value when combined with head().
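As a rough illustration of the bullets above, the sketch below runs value_counts() and sort_values() on made-up data. It also compares the sort-then-head approach with max() and idxmax(), which are shorter alternatives and not what the original example uses.

```python
import pandas as pd

# Hypothetical lead-actor labels
leads = pd.Series(['A', 'B', 'A', 'C', 'A', 'B'])

# value_counts returns counts sorted from most to least frequent
counts = leads.value_counts()

# Hypothetical scores indexed by match label
scores = pd.Series([13, 78, 56], index=['m1', 'm2', 'm3'])

# Highest value via sorting, as in the example above
top_by_sort = scores.sort_values(ascending=False).head(1).values[0]

# Alternative: max() gives the value, idxmax() gives its label
top_direct = scores.max()
top_label = scores.idxmax()
print(counts.to_dict(), top_by_sort, top_direct, top_label)
```

Without inplace=True, sort_values() returns a new Series and leaves scores in its original order.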

Series Math Methods

Overview:

Series math methods involve mathematical operations that can be performed on a Series to calculate statistical measures or transform the data.

Examples:

# Counting non-null values (vk)
vk.count()

# Calculating sum (subs)
subs.sum()

# Calculating mean, median, mode, standard deviation, and variance (subs, vk, movies)
subs.mean()
print(vk.median())
print(movies.mode())
print(subs.std())
print(vk.var())

Explanation:

  • count(): Returns the number of non-null values in the Series.
  • sum(): Calculates the sum of all values in the Series.
  • mean(), median(), mode(), std(), var(): These functions calculate the mean, median, mode, standard deviation, and variance, respectively.
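A worked example with five made-up numbers makes these measures concrete. Two details are worth noting: mode() returns a Series because several values can tie, and var() and std() use the sample formula (ddof=1) unless told otherwise.

```python
import pandas as pd

# Hypothetical values: sum is 25, so the mean is 5
values = pd.Series([2, 4, 4, 6, 9])

total = values.sum()
mean = values.mean()
median = values.median()

# mode() returns a Series because several values can tie
modes = values.mode()

# Squared deviations from the mean add up to 28
sample_var = values.var()            # 28 / (5 - 1), since ddof=1 by default
population_var = values.var(ddof=0)  # 28 / 5
sample_std = values.std()            # square root of the sample variance
print(total, mean, median, modes.tolist(), sample_var, population_var)
```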

Series with Python Functionalities

Overview:

Pandas Series can leverage various Python functionalities, such as conversion, membership checks, looping, and arithmetic operations.

Examples:

# Various functions and methods (subs)
print(len(subs))
print(type(subs))
print(dir(subs))
print(sorted(subs))
print(min(subs))
print(max(subs))

# Type conversion (marks_series)
list(marks_series)
dict(marks_series)

# Membership operators (movies)
'2 States (2014 film)' in movies
'Alia Bhatt' in movies.values

Explanation:

  • len(): Returns the length of the Series.
  • type(): Returns the type of the Series.
  • dir(): Lists all attributes and methods available for the Series.
  • sorted(): Returns a sorted list of values in the Series.
  • min() and max(): Find the minimum and maximum values in the Series, respectively.

Boolean Indexing on Series

Overview:

Boolean indexing involves using logical conditions to filter values in a Series based on specified criteria.

Examples:

# Counting the number of 50's and 100's scored by Kohli (vk)
vk[vk >= 50].size

# Counting the number of ducks scored by Kohli (vk)
vk[vk == 0].size

# Counting the number of days with more than 200 subs (subs)
subs[subs > 200].size

Explanation:

  • Boolean conditions (vk >= 50, vk == 0, subs > 200) create masks that filter values satisfying the conditions.
  • The resulting Series contains only the values that meet the specified criteria.

Plotting Graphs on Series

Overview:

Pandas provides built-in functions for plotting graphs directly from Series data, offering a convenient way to visualize the data.

Examples:

# Plotting a line graph (subs)
subs.plot()

# Plotting a pie chart of top 20 movie counts (movies)
movies.value_counts().head(20).plot(kind='pie')

Explanation:

  • plot(): Plots a line graph for numerical data (e.g., time series).
  • value_counts().plot(kind='pie'): Plots a pie chart for categorical data, visualizing the distribution of values.

These topics cover various aspects of working with Pandas Series, from basic methods to more advanced operations like boolean indexing and plotting.
