In Pandas, a MultiIndex is a hierarchical index for pandas objects such as Series and DataFrames. It lets you use several levels of labels on the rows and/or columns, so data with nested categories (for example, branch and year) can be organized and selected in a structured way.

1. Creating MultiIndex Objects:

  • From Tuples:
    index_val = [('cse', 2019), ('cse', 2020), ('cse', 2021), ('cse', 2022), ('ece', 2019), ('ece', 2020), ('ece', 2021), ('ece', 2022)]
    multiindex = pd.MultiIndex.from_tuples(index_val)
  • From Product:
    multiindex = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])

2. Understanding MultiIndex Levels:

  • After creating a MultiIndex, you can access its levels using the levels attribute.
    multiindex.levels[1]  # Unique labels of the inner level: 2019, 2020, 2021, 2022

3. Creating MultiIndex Series:

  • You can use a MultiIndex when creating a Series.
    s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=multiindex)

4. Accessing MultiIndex Elements:

  • You can access elements using the hierarchical index.
    s['cse']  # Accessing all rows with the outer index 'cse'

5. Unstack and Stack:

  • Unstack:
    temp = s.unstack()  # Moves the inner index level into the columns, giving a DataFrame
  • Stack:
    temp.stack()  # Moves the columns back into the inner index level, giving a MultiIndex Series

6. Creating MultiIndex DataFrames:

  • MultiIndex can also be applied to DataFrames.
    branch_df1 = pd.DataFrame([[1, 2], [3, 4], ...], index=multiindex, columns=['avg_package', 'students'])

7. Using MultiIndex in DataFrames:

  • You can use MultiIndex for both rows and columns in a DataFrame.
    branch_df2 = pd.DataFrame([[1, 2, 0, 0], ...], index=[2019, 2020, 2021, 2022], columns=pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']]))

8. Stacking and Unstacking DataFrames:

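  • You can use stack and unstack to reshape MultiIndex DataFrames. Here branch_df3 is a DataFrame with a MultiIndex on both its rows and its columns.
    branch_df3.stack().stack()  # Moves both column levels into the row index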

9. Working with MultiIndex DataFrames:

  • Various operations such as head, shape, info, duplicated, and isnull can be performed.
    branch_df3.head()
    branch_df3.shape
    branch_df3.info()
    branch_df3.duplicated()
    branch_df3.isnull()

10. Extracting Data from MultiIndex DataFrames:

  • You can extract rows, columns, or both using loc and iloc.
    branch_df3.loc[('cse', 2022)]  # Extracting a single row
    branch_df3.iloc[:, 1:3]  # Extracting columns using iloc

11. Sorting MultiIndex DataFrames:

  • Sorting can be done based on index levels.
    branch_df3.sort_index(ascending=False)

12. Transposing MultiIndex DataFrames:

  • You can transpose a MultiIndex DataFrame using transpose.
    branch_df3.transpose()

13. Swaplevel in MultiIndex DataFrames:

  • You can swap levels of indices in MultiIndex DataFrames.
    branch_df3.swaplevel(axis=1)  # Swaps the two column levels
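Items 12 and 13 can be checked with a small self-contained sketch. It rebuilds a frame shaped like branch_df3 (the values follow the pattern used later in this post) and shows what transpose and swaplevel do to the axes. The final sort_index(axis=1) call is an extra step, not from the original, that regroups the swapped columns by their new outer level.

```python
import pandas as pd

# Row and column MultiIndexes shaped like the ones used for branch_df3
rows = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])
cols = pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']])
branch_df3 = pd.DataFrame(
    [[2 * i + 1, 2 * i + 2, 0, 0] for i in range(8)],  # [1, 2, 0, 0], [3, 4, 0, 0], ...
    index=rows,
    columns=cols,
)

# transpose: the row MultiIndex becomes the column MultiIndex and vice versa
flipped = branch_df3.transpose()
print(flipped.shape)  # (4, 8)

# swaplevel(axis=1): swap the two column levels; the column order is unchanged
swapped = branch_df3.swaplevel(axis=1)
print(list(swapped.columns[:2]))  # [('avg_package', 'delhi'), ('students', 'delhi')]

# sort_index(axis=1) groups the swapped columns by their new outer level
regrouped = swapped.sort_index(axis=1)
print(list(regrouped.columns[:2]))  # [('avg_package', 'delhi'), ('avg_package', 'mumbai')]
```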

In summary, MultiIndex objects in Pandas provide a powerful way to work with hierarchical indices, allowing for more advanced and structured data representation. They are particularly useful when dealing with multi-dimensional data or datasets with nested categories.

Datasets: multi-index object pandas (kaggle.com)

1. MultiIndex Series and DataFrames

# Series is 1D and DataFrames are 2D objects
# But why?
# And what exactly is an index?

import pandas as pd

# Can we have multiple indices? Let's try using tuples as index labels
index_val = [('cse', 2019), ('cse', 2020), ('cse', 2021), ('cse', 2022), ('ece', 2019), ('ece', 2020), ('ece', 2021), ('ece', 2022)]
a = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=index_val)
a

# The problem? The tuples are stored as plain labels, so selecting by the first element alone fails
a['cse']  # This raises a KeyError

# The solution -> MultiIndex Series (also known as Hierarchical Indexing)
# Multiple index levels within a single index

# How to create a MultiIndex object
# 1. Using pd.MultiIndex.from_tuples()
index_val = [('cse', 2019), ('cse', 2020), ('cse', 2021), ('cse', 2022), ('ece', 2019), ('ece', 2020), ('ece', 2021), ('ece', 2022)]
multiindex = pd.MultiIndex.from_tuples(index_val)
multiindex.levels[1]

# 2. Using pd.MultiIndex.from_product()
pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])

# Levels inside a MultiIndex object: levels[0] holds the outer labels, levels[1] the inner ones

# Creating a series with a MultiIndex object
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=multiindex)
s

# How to fetch items from such a series
s['cse']

# Unstack
temp = s.unstack()
temp

# Stack
temp.stack()

Theory:

  • This section introduces the concept of MultiIndex in Pandas, allowing for multiple levels of indices in a single index object.
  • It demonstrates how to create MultiIndex objects from a list of tuples or from the Cartesian product of several iterables.
  • The code shows the creation of a MultiIndex Series and how to access elements using multiple levels of indices.
  • The unstack and stack operations reshape a MultiIndex Series into a DataFrame and back: unstack moves the inner index level into the columns, and stack moves the columns back into the index.
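A short sketch of what a MultiIndex stores internally: each level keeps its unique labels (levels), and every row points into them through integer codes. It also checks that the two construction routes above give the same index and that unstack and stack undo each other for this complete data. The codes attribute and get_level_values are standard MultiIndex members but were not used in the original notebook.

```python
import pandas as pd

index_val = [('cse', 2019), ('cse', 2020), ('cse', 2021), ('cse', 2022),
             ('ece', 2019), ('ece', 2020), ('ece', 2021), ('ece', 2022)]
multiindex = pd.MultiIndex.from_tuples(index_val)
product = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])

# Both constructors describe the same eight (branch, year) pairs
print(multiindex.equals(product))  # True

# levels: the unique labels of each level
print(list(multiindex.levels[0]))  # ['cse', 'ece']
# codes: for every row, the position of its label inside the matching level
print(multiindex.codes[1].tolist())  # [0, 1, 2, 3, 0, 1, 2, 3]
# get_level_values: the full (repeated) labels of one level
print(list(multiindex.get_level_values(0)))

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index=multiindex)
print(s.loc[('cse', 2021)])  # 3: a full tuple selects a single value

# unstack moves the inner level into the columns; stack moves it back
temp = s.unstack()
print(temp.shape)  # (2, 4)
print(temp.stack().equals(s))  # True
```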

2. MultiIndex DataFrames

# MultiIndex DataFrame

branch_df1 = pd.DataFrame(
    [
        [1, 2],
        [3, 4],
        [5, 6],
        [7, 8],
        [9, 10],
        [11, 12],
        [13, 14],
        [15, 16],
    ],
    index=multiindex,
    columns=['avg_package', 'students']
)

branch_df1

branch_df1['students']

# Are columns really different from the index?

# MultiIndex DataFrame from the columns' perspective
branch_df2 = pd.DataFrame(
    [
        [1, 2, 0, 0],
        [3, 4, 0, 0],
        [5, 6, 0, 0],
        [7, 8, 0, 0],
    ],
    index=[2019, 2020, 2021, 2022],
    columns=pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']])
)

branch_df2

branch_df2.loc[2019]

# MultiIndex DataFrame in terms of both columns and index

branch_df3 = pd.DataFrame(
    [
        [1, 2, 0, 0],
        [3, 4, 0, 0],
        [5, 6, 0, 0],
        [7, 8, 0, 0],
        [9, 10, 0, 0],
        [11, 12, 0, 0],
        [13, 14, 0, 0],
        [15, 16, 0, 0],
    ],
    index=multiindex,
    columns=pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']])
)

branch_df3

Theory:

  • This section explains the creation of MultiIndex DataFrames using different approaches.
  • A MultiIndex on the rows, on the columns, and on both axes at once is demonstrated.
  • Operations like accessing columns, rows, and elements in MultiIndex DataFrames are shown.
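One way to answer the question "Are columns really different from the index?" is to build branch_df3 with named levels on both axes and select by outer label on each axis. The level names (branch, year, city, metric) are illustrative assumptions and are not part of the original data; the labels and values match branch_df3.

```python
import pandas as pd

# Hypothetical level names; labels and values match branch_df3 above
rows = pd.MultiIndex.from_product(
    [['cse', 'ece'], [2019, 2020, 2021, 2022]], names=['branch', 'year'])
cols = pd.MultiIndex.from_product(
    [['delhi', 'mumbai'], ['avg_package', 'students']], names=['city', 'metric'])
branch_df3 = pd.DataFrame(
    [[2 * i + 1, 2 * i + 2, 0, 0] for i in range(8)], index=rows, columns=cols)

# An outer row label drops that level from the row index...
cse_rows = branch_df3.loc['cse']
print(cse_rows.shape, cse_rows.index.name)  # (4, 4) year

# ...and an outer column label drops that level from the columns
delhi_cols = branch_df3['delhi']
print(delhi_cols.shape, delhi_cols.columns.name)  # (8, 2) metric

# Full tuples on both axes select a single cell
print(branch_df3.loc[('cse', 2019), ('delhi', 'students')])  # 2
```

The symmetry is the point: rows and columns are both just labeled axes, and a MultiIndex behaves the same way on either one.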

3. Stacking and Unstacking MultiIndex DataFrames

# Stacking and Unstacking
branch_df3.stack().stack()

Theory:

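  • The code demonstrates the use of stack, which moves the innermost column level into the row index, where it becomes the innermost row level. The result is a taller DataFrame with one fewer column level, not a higher-dimensional object.
  • Stacking again moves the next column level, so branch_df3.stack().stack() returns a Series with a four-level row index. unstack performs the reverse moves.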
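To make the reshaping concrete, here is a sketch that traces the shapes: each stack() moves one column level into the row index, so two calls on branch_df3 end in a Series with a four-level index, and two unstack() calls bring the columns back. The level=0 argument at the end, which stacks the outer (city) level instead of the inner one, goes beyond the original example.

```python
import pandas as pd

rows = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])
cols = pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']])
branch_df3 = pd.DataFrame(
    [[2 * i + 1, 2 * i + 2, 0, 0] for i in range(8)], index=rows, columns=cols)

once = branch_df3.stack()  # inner column level (avg_package/students) -> rows
print(once.shape, once.index.nlevels)  # (16, 2) 3

twice = once.stack()  # remaining column level (delhi/mumbai) -> rows
print(type(twice).__name__, len(twice), twice.index.nlevels)  # Series 32 4
print(twice.loc[('cse', 2019, 'students', 'delhi')])  # 2

# unstack() reverses the moves, innermost row level first
restored = twice.unstack().unstack()
print(restored.equals(branch_df3))  # True

# level=0 stacks the outer column level (the city) instead
by_city = branch_df3.stack(level=0)
print(list(by_city.columns))  # ['avg_package', 'students']
```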

4. Working with MultiIndex DataFrames

# Working with MultiIndex DataFrames

# Head (tail works the same way)
branch_df3.head()

# Shape
branch_df3.shape

# Info
branch_df3.info()

# Duplicated rows and missing values
branch_df3.duplicated()
branch_df3.isnull()

# Extracting a single row
branch_df3.loc[('cse', 2022)]

# Multiple rows: every second row from ('cse', 2019) to ('ece', 2020)
branch_df3.loc[('cse', 2019):('ece', 2020):2]

# Using iloc
branch_df3.iloc[0:5:2]

# Extracting columns
branch_df3['delhi']['students']

branch_df3.iloc[:, 1:3]

# Extracting both
branch_df3.iloc[[0, 4], [1, 2]]

# Sort index
# Both levels descending
branch_df3.sort_index(ascending=False)
# Different order per level: outer level descending, inner level ascending
branch_df3.sort_index(ascending=[False, True])
# Based on one level only (the outer level, descending)
branch_df3.sort_index(level=0, ascending=[False])

Theory:

  • This section covers various operations and methods on MultiIndex DataFrames, such as head, shape, info, duplicated, and isnull.
  • Different methods of extracting rows, columns, or both are explained using loc and iloc.
  • Sorting the MultiIndex DataFrame based on different levels is demonstrated.
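loc and iloc select from the outer level down. To select by an inner level, for example every branch in 2022, pandas also offers DataFrame.xs and pd.IndexSlice; neither appears in the original notebook, so treat this as a supplementary sketch. Label slices on a MultiIndex generally expect a sorted index (pandas can raise an UnsortedIndexError otherwise), which is one practical reason to call sort_index first.

```python
import pandas as pd

rows = pd.MultiIndex.from_product([['cse', 'ece'], [2019, 2020, 2021, 2022]])
cols = pd.MultiIndex.from_product([['delhi', 'mumbai'], ['avg_package', 'students']])
branch_df3 = pd.DataFrame(
    [[2 * i + 1, 2 * i + 2, 0, 0] for i in range(8)], index=rows, columns=cols)

# xs: cross-section on the inner row level (year == 2022), for every branch
year_2022 = branch_df3.xs(2022, level=1)
print(year_2022.index.tolist())  # ['cse', 'ece']

# xs also works on columns: the 'students' metric for both cities
students = branch_df3.xs('students', level=1, axis=1)
print(students.columns.tolist())  # ['delhi', 'mumbai']

# IndexSlice: any branch, years 2020-2021, only Delhi's columns
idx = pd.IndexSlice
subset = branch_df3.sort_index().loc[idx[:, 2020:2021], idx['delhi', :]]
print(subset.shape)  # (4, 2)
```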

5. Wide and Long Format, Melt

# Wide format is where we have a single row for every data point with multiple columns to hold the values of various attributes.
# Long format is where, for each data point, we have as many rows as the number of attributes, and each row contains the value of a particular attribute for a given data point.

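# Melt -> simple example with one branch (wide to long)
pd.DataFrame({'cse': [120]}).melt()

# Melt -> several branches, with custom names for the variable and value columns
pd.DataFrame({'cse': [120], 'ece': [100], 'mech': [50]}).melt(var_name='branch', value_name='num_students')

# Melt -> branch with year: id_vars keeps 'branch' as an identifier column
pd.DataFrame(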
    {
        'branch': ['cse', 'ece', 'mech'],
        '2020': [100, 150, 60],
        '2021': [120, 130, 80],
        '2022': [150, 140, 70]
    }
).melt(id_vars=['branch'], var_name='year', value_name='students')

# Melt -> real-world example
death = pd.read_csv('/content/time_series_covid19_deaths_global.csv')
confirm = pd.read_csv('/content/time_series_covid19_confirmed_global.csv')

death.head()

confirm.head()

death = death.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], var_name='date', value_name='num_deaths')
confirm = confirm.melt(id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'], var_name='date', value_name='num_cases')

death.head()

confirm.merge(death, on=['Province/State', 'Country/Region', 'Lat', 'Long', 'date'])[['Country/Region', 'date', 'num_cases', 'num_deaths']]

Theory:

  • The concept of wide and long formats is explained: wide format stores one row per data point with one column per attribute, while long format stores one row per (data point, attribute) pair.
  • The melt function is introduced as a method to convert wide format to long format.
  • Examples of melting a simple DataFrame and real-world COVID-19 datasets are provided.
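melt has a counterpart: DataFrame.pivot turns long data back into wide. The sketch below round-trips the branch/year table from this section, then merges two melted tables on their shared key columns, the same pattern as the COVID-19 example. The "placed" table and its numbers are hypothetical, made up only for illustration.

```python
import pandas as pd

wide = pd.DataFrame({
    'branch': ['cse', 'ece', 'mech'],
    '2020': [100, 150, 60],
    '2021': [120, 130, 80],
    '2022': [150, 140, 70],
})

# Wide -> long: one row per (branch, year) pair
long = wide.melt(id_vars=['branch'], var_name='year', value_name='students')
print(long.shape)  # (9, 3)

# Long -> wide again: pivot undoes the melt here
back = long.pivot(index='branch', columns='year', values='students')
print(back.loc['ece', '2021'])  # 130

# Merging two long tables on their shared key columns (hypothetical numbers)
placed = pd.DataFrame({'branch': ['cse', 'ece', 'mech'], '2020': [90, 120, 40]})
placed_long = placed.melt(id_vars=['branch'], var_name='year', value_name='placed')
merged = long.merge(placed_long, on=['branch', 'year'])
print(merged.shape)  # (3, 4)
```

merge defaults to an inner join, so only the (branch, year) pairs present in both tables survive, which is why merged has three rows here.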

6. Pivot Table

# Pivot Table
# The pivot table takes simple column-wise data as input, and groups the entries into a two-dimensional table that provides a multidimensional summarization of the data.

import pandas as pd
import seaborn as sns

df = sns.load_dataset('tips')
df.head()

df.groupby('sex')[['total_bill']].mean()

df.groupby(['sex', 'smoker'])[['total_bill']].mean().unstack()

df.pivot_table(index='sex', columns='smoker', values='total_bill')

# aggfunc
df.pivot_table(index='sex', columns='smoker', values='total_bill', aggfunc='std')

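# All numeric columns together (listing them explicitly avoids a TypeError in pandas 2.x, which no longer silently drops the non-numeric 'day' and 'time' columns)
df.pivot_table(index='sex', columns='smoker', values=['total_bill', 'tip', 'size'])['size']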

# multidimensional
df.pivot_table(index=['sex', 'smoker'], columns=['day', 'time'],
               aggfunc={'size': 'mean', 'tip': 'max', 'total_bill': 'sum'}, margins=True)

# margins
df.pivot_table(index='sex', columns='smoker', values='total_bill', aggfunc='sum', margins=True)

Theory:

  • This section introduces the concept of a pivot table, which takes simple column-wise data and groups entries into a two-dimensional table providing multidimensional summarization.
  • The pivot_table method is demonstrated with various options like index, columns, values, and aggregation functions.
  • The use of aggfunc, handling all columns together, and creating a multidimensional pivot table are explained.
  • Setting margins=True adds an 'All' row and column that aggregate over every row or column with the same aggfunc (totals when aggfunc='sum').
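The equivalence between pivot_table and groupby plus unstack, and what margins adds, can be checked on a tiny hand-made table. The rows below are hypothetical stand-ins for the tips dataset, used so the sketch runs without downloading anything.

```python
import pandas as pd

# Hypothetical mini version of the tips data
tips = pd.DataFrame({
    'sex': ['Male', 'Male', 'Female', 'Female', 'Male'],
    'smoker': ['Yes', 'No', 'Yes', 'No', 'No'],
    'total_bill': [10.0, 20.0, 30.0, 40.0, 50.0],
})

# pivot_table defaults to aggfunc='mean'
pt = tips.pivot_table(index='sex', columns='smoker', values='total_bill')
gb = tips.groupby(['sex', 'smoker'])['total_bill'].mean().unstack()
print(pt.equals(gb))  # True
print(pt.loc['Male', 'No'])  # 35.0

# margins=True appends an 'All' row and column computed with the same aggfunc
totals = tips.pivot_table(index='sex', columns='smoker', values='total_bill',
                          aggfunc='sum', margins=True)
print(totals.loc['All', 'All'])  # 150.0
print(totals.loc['Male', 'All'])  # 80.0
```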

7. Plotting Graphs with Pivot Table

# Plotting graphs
df = pd.read_csv('/content/expense_data.csv')

df.head()

df['Category'].value_counts()

df.info()

df['Date'] = pd.to_datetime(df['Date'])

df.info()

df['month'] = df['Date'].dt.month_name()

df.head()

df.pivot_table(index='month', columns='Category', values='INR', aggfunc='sum', fill_value=0).plot()

df.pivot_table(index='month', columns='Income/Expense', values='INR', aggfunc='sum', fill_value=0).plot()

df.pivot_table(index='month', columns='Account', values='INR', aggfunc='sum', fill_value=0).plot()

Theory:

  • This section uses a pivot table for analyzing expense data.
  • The data is loaded, and information about the dataset is explored.
  • The to_datetime function is used to convert the ‘Date’ column to datetime format, and a new ‘month’ column is created.
  • Pivot tables are then used to analyze and plot the data, providing insights into expenses based on different categories, income/expense types, and accounts.
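One detail worth checking before plotting: pivot_table sorts the month names alphabetically (April, August, ...), so the x-axis is not in calendar order. Below is a sketch of one possible fix, reindexing with calendar.month_name. The expense rows are hypothetical stand-ins for expense_data.csv, and the .plot() call is left out so the snippet runs without matplotlib.

```python
import calendar

import pandas as pd

# Hypothetical rows in the same shape as expense_data.csv
expenses = pd.DataFrame({
    'Date': ['2022-03-05', '2022-01-10', '2022-02-14', '2022-01-20'],
    'Category': ['Food', 'Food', 'Transport', 'Transport'],
    'INR': [200.0, 150.0, 80.0, 120.0],
})
expenses['Date'] = pd.to_datetime(expenses['Date'])
expenses['month'] = expenses['Date'].dt.month_name()

pt = expenses.pivot_table(index='month', columns='Category', values='INR',
                          aggfunc='sum', fill_value=0)
print(pt.index.tolist())  # ['February', 'January', 'March'] (alphabetical)

# calendar.month_name[1:] lists January..December (English in the default locale)
month_order = [m for m in calendar.month_name[1:] if m in pt.index]
pt = pt.reindex(month_order)
print(pt.index.tolist())  # ['January', 'February', 'March']
print(pt.loc['January', 'Food'])
```

After the reindex, calling pt.plot() would draw the months in calendar order.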
