Univariate analysis is a statistical method for analyzing and summarizing data by examining the distribution, characteristics, and properties of a single variable at a time. In simpler terms, it focuses on understanding and describing the patterns and features of individual variables without considering their relationships with other variables.

Here’s a more detailed explanation for beginners:

1. **Single Variable Focus:**

- Univariate analysis deals with one variable at a time. It allows you to explore and understand the characteristics of a specific variable in isolation.

2. **Types of Variables:**

- Variables can be broadly categorized into two types: categorical and numerical.
- **Categorical variables** are those that represent categories or labels (e.g., gender, color).
- **Numerical variables** are those that represent measurable quantities (e.g., age, height).
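In pandas, one quick way to split a DataFrame's columns into these two groups is by dtype with `select_dtypes`. The small DataFrame below is made up for illustration. Treat the dtype as a hint only: a numeric code such as a passenger class (1, 2, 3) is stored as a number but is really categorical.

```python
import pandas as pd

# Hypothetical toy data: two categorical and two numerical columns
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "S", "Q"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# Columns stored with a numeric dtype are treated as numerical
numerical_cols = df.select_dtypes(include="number").columns.tolist()

# Everything else (text/object, category, ...) is treated as categorical
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```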

3. **Categorical Univariate Analysis:**

- For categorical variables, univariate analysis often involves:
  - Counting the frequency of each category using count plots or bar charts.
  - Visualizing the proportions of different categories using pie charts.
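A count plot and a pie chart are both drawings of a frequency table. A minimal sketch of that table with pandas `value_counts`, on a made-up `Embarked` column: the raw counts are what a bar chart shows, and `normalize=True` gives the shares a pie chart shows.

```python
import pandas as pd

# Hypothetical port-of-embarkation labels for ten passengers
embarked = pd.Series(["S", "C", "S", "S", "Q", "S", "C", "S", "Q", "S"], name="Embarked")

# Frequency of each category (what a count plot or bar chart draws)
counts = embarked.value_counts()

# Share of each category as a percentage (what a pie chart draws)
proportions = embarked.value_counts(normalize=True) * 100

print(counts)
print(proportions.round(1))
```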

4. **Numerical Univariate Analysis:**

- For numerical variables, univariate analysis often involves:
  - Creating histograms to visualize the distribution of values.
  - Using summary statistics such as mean, median, minimum, maximum, and standard deviation to describe the central tendency and spread of the data.
  - Drawing boxplots to identify outliers and understand the spread of the data.
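The summary statistics listed above map directly onto pandas methods. A minimal sketch on a handful of made-up ages; note that pandas' `std()` returns the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical ages (in years) of eight passengers
age = pd.Series([22, 38, 26, 35, 35, 54, 2, 27], name="Age")

# Central tendency
mean_age = age.mean()
median_age = age.median()

# Spread
min_age, max_age = age.min(), age.max()
std_age = age.std()  # sample standard deviation (ddof=1) by default

print(f"mean={mean_age:.2f}, median={median_age}, min={min_age}, max={max_age}, std={std_age:.2f}")

# describe() reports count, mean, std, min, quartiles and max in one call
print(age.describe())
```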

5. **Purpose of Univariate Analysis:**

- Univariate analysis is useful for gaining insights into the characteristics and patterns of individual variables.
- It helps in identifying outliers, understanding the range of values, and detecting potential issues with the data, such as missing or implausible values.
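Two of the most common data issues a univariate pass catches are missing values and values outside a plausible range. A minimal sketch on made-up ages; the 0 to 100 year "plausible" range is an assumption chosen for illustration, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries (NaN) and one suspicious value
age = pd.Series([22, np.nan, 26, 35, np.nan, 54, -1, 27], name="Age")

# How many values are missing?
n_missing = age.isna().sum()

# Range of the observed values (min/max skip NaN by default)
value_range = (age.min(), age.max())

# Flag values outside an assumed plausible range of 0 to 100 years
implausible = age[(age < 0) | (age > 100)]

print("missing:", n_missing)
print("range:", value_range)
print("implausible values:\n", implausible)
```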

6. **Example:**

- Suppose you have a dataset with information about the ages of individuals. Univariate analysis of the ‘Age’ variable would involve creating a histogram to visualize the age distribution, calculating the average age (mean), and exploring any extreme values using a boxplot.
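A boxplot flags as outliers the points beyond its whiskers; with the common default used by matplotlib and seaborn (`whis=1.5`), the whiskers reach at most 1.5 × IQR past the first and third quartiles. A minimal sketch of that rule on made-up ages, which also shows how a single extreme value pulls the mean above the median:

```python
import pandas as pd

# Hypothetical ages; 80 is included as a likely extreme value
age = pd.Series([22, 25, 26, 27, 28, 30, 31, 33, 35, 80], name="Age")

# Quartiles and interquartile range (IQR)
q1 = age.quantile(0.25)
q3 = age.quantile(0.75)
iqr = q3 - q1

# Conventional boxplot fences: 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = age[(age < lower_fence) | (age > upper_fence)]

print(f"mean={age.mean():.1f}, median={age.median():.1f}")
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("outliers:", outliers.tolist())
```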

7. **Limitations:**

- While univariate analysis provides valuable insights into individual variables, it may not capture complex relationships between variables. For a more comprehensive understanding of the data, multivariate analysis, which involves the simultaneous analysis of multiple variables, is often necessary.
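To see this limitation concretely, here is a small made-up example: two columns hold exactly the same values, so every univariate summary of them is identical, yet one rises with `x` and the other falls. Only a view of two variables together (here, a correlation) reveals the difference.

```python
import pandas as pd

# Hypothetical data: y_up and y_down contain exactly the same values,
# so their univariate summaries are identical
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y_up": [2, 4, 6, 8, 10],
    "y_down": [10, 8, 6, 4, 2],
})

# Same count, mean, std, min, quartiles and max for both columns
print(df[["y_up", "y_down"]].describe())

# Their relationship with x is opposite, which univariate analysis cannot show
print("corr(x, y_up):  ", df["x"].corr(df["y_up"]))
print("corr(x, y_down):", df["x"].corr(df["y_down"]))
```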

In summary, univariate analysis is a fundamental step in the exploratory data analysis (EDA) process. It helps beginners grasp the characteristics and patterns of individual variables, laying the groundwork for more advanced analyses.

```python
# Import libraries
import pandas as pd
import seaborn as sns

# Read the data
df = pd.read_csv('../input/data-science-day1-titanic/DSB_Day1_Titanic_train.csv')

# Show the first 5 rows of the data
df.head()
```

**1. Categorical Data**
If the data is categorical, you mostly use a count plot or a pie chart.

a. Countplot

```python
sns.countplot(data=df, x='Embarked')
```

or

```python
df['Survived'].value_counts().plot(kind='bar')
```

b. PieChart

```python
df['Sex'].value_counts().plot(kind='pie', autopct='%.2f')
```

---

**2. Numerical Data**

For numerical data, you can use the following plots and functions.

a. Histogram

```python
# matplotlib is used to make graphs (visualization)
import matplotlib.pyplot as plt

# Histogram of Age with 5 bins; missing ages are dropped first
plt.hist(df['Age'].dropna(), bins=5)
```

b. Histogram with KDE (formerly Distplot)

```python
# sns.distplot is deprecated since seaborn 0.11; histplot with kde=True
# draws a histogram with a KDE (Kernel Density Estimation) curve
sns.histplot(df['Age'], kde=True)
```

c. Boxplot

Boxplots are especially useful for spotting outliers.

```python
sns.boxplot(x=df['Age'])
```

d. Summary statistics

```python
# Minimum age present in the data
df['Age'].min()

# Maximum age present in the data
df['Age'].max()

# Average (mean) age present in the data
df['Age'].mean()

# Standard deviation: how much the ages deviate from the mean
df['Age'].std()

# Skewness: how asymmetric the distribution is (positive means a longer right tail)
df['Age'].skew()
```
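Standard deviation and skewness answer different questions: `std()` measures how far values typically fall from the mean, while `skew()` measures the asymmetry of the distribution. A minimal sketch on two made-up samples: the symmetric one has a skew of zero, and the one with a long right tail has a positive skew and a mean pulled above its median.

```python
import pandas as pd

# Hypothetical samples: one symmetric, one with a long right tail
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 2, 2, 3, 3, 3, 10])

for name, s in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    # std() measures spread around the mean; skew() measures asymmetry
    print(f"{name}: mean={s.mean():.2f}, median={s.median()}, "
          f"std={s.std():.2f}, skew={s.skew():.2f}")
```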
