Statistics
This page provides an introduction to Statistics.
Overview
Statistics involves summarizing and describing the main features of a dataset, as well as drawing conclusions and making decisions based on data.
There are three types of data as below:
Ungrouped Data
Data in which each observation is separate and distinct, with no grouping or classification.
For example, marks of students:
.
Discrete Frequency Data
Data in which observations are grouped into distinct categories, with frequencies representing the number of times each category appears.
For example, student scores in a math test:
Continuous Frequency Data
Data in which observations are grouped into continuous intervals or ranges, with frequencies representing the number of observations in each interval.
For example, heights of people:
Measure of Central Tendency
Mean, median and mode are measures of central tendency. Each is a single number that represents the whole dataset.
Mean
The mean is the average value of the observations.
For ungrouped data, the mean formula is:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Where,
$x_i$ is the $i$-th observation,
$n$ is the number of observations.
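For discrete frequency data, the mean formula is:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the $i$-th observation group,
$f_i$ is the frequency of the $i$-th observation group,
$n$ is the number of observation groups.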
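For continuous frequency data, the mean formula is:
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the midpoint of the $i$-th observation group (class interval),
$f_i$ is the frequency of the $i$-th observation group,
$n$ is the number of observation groups.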
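As a minimal sketch, the mean for each of the three data types can be computed as follows. The function names and the sample data are hypothetical and chosen only for illustration. For continuous frequency data, each class interval is assumed to be represented by its midpoint.

```python
from typing import Sequence, Tuple


def mean_ungrouped(xs: Sequence[float]) -> float:
    """Mean of raw observations: sum(x_i) / n."""
    return sum(xs) / len(xs)


def mean_grouped(values: Sequence[float], freqs: Sequence[float]) -> float:
    """Frequency-weighted mean: sum(f_i * x_i) / sum(f_i).

    For discrete frequency data, `values` are the observation values.
    For continuous frequency data, `values` are the class midpoints.
    """
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)


def class_midpoints(intervals: Sequence[Tuple[float, float]]) -> list:
    """Midpoint of each (lower, upper) class interval."""
    return [(lo + hi) / 2 for lo, hi in intervals]


# Hypothetical data, made up for illustration only.
marks = [4, 8, 6, 10, 2]
scores, score_freqs = [10, 20, 30], [2, 5, 3]
height_classes = [(150, 160), (160, 170), (170, 180)]
height_freqs = [4, 10, 6]

print(mean_ungrouped(marks))                                        # 6.0
print(mean_grouped(scores, score_freqs))                            # 21.0
print(mean_grouped(class_midpoints(height_classes), height_freqs))  # 166.0
```

The same `mean_grouped` function serves both grouped cases because the only difference is what stands in for $x_i$.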
Median
The median is the central value of the data.
For ungrouped data, the median is calculated as below:
First arrange the given data in ascending or descending order. Let $n$ be the total number of observations.
- If $n$ is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ term.
- If $n$ is even, the median is the arithmetic mean of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ terms.
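For discrete frequency data, the median is calculated as below:
- First arrange all observations in increasing order.
- Now calculate the cumulative frequency ($cf$).
- The median is the observation ($x_i$) whose cumulative frequency ($cf$) is equal to or just greater than $\frac{N}{2}$, where $N = \sum f_i$ is the total number of observations.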
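For continuous frequency data, the median is calculated as below:
- First arrange all class intervals in increasing order.
- Now calculate the cumulative frequency ($cf$).
- The median lies in the class interval ($x_i$) whose cumulative frequency ($cf$) is equal to or just greater than $\frac{N}{2}$, where $N = \sum f_i$ is the total number of observations. This interval is called the median class.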
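The median procedures above can be sketched in Python as follows. The function names and data are hypothetical. The discrete version follows the cumulative-frequency rule exactly as stated. The continuous version goes one step beyond the text: it locates the median class and then applies the common textbook interpolation $l + \frac{N/2 - cf_{prev}}{f} \times h$. That interpolation step is an assumption and is not stated in this page.

```python
from itertools import accumulate


def median_ungrouped(xs):
    """Middle term for odd n, mean of the two middle terms for even n."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                   # ((n + 1) / 2)-th term, 1-based
    return (s[mid - 1] + s[mid]) / 2    # (n/2)-th and (n/2 + 1)-th terms


def median_discrete(values, freqs):
    """First observation whose cumulative frequency is >= N / 2."""
    pairs = sorted(zip(values, freqs))
    half = sum(f for _, f in pairs) / 2
    running = accumulate(f for _, f in pairs)
    for (x, _), cf in zip(pairs, running):
        if cf >= half:
            return x


def median_continuous(intervals, freqs):
    """Locate the median class, then interpolate inside it (assumed step).

    `intervals` must be sorted, contiguous (lower, upper) pairs.
    """
    half = sum(freqs) / 2
    cf_prev = 0                         # cumulative frequency before the class
    for (lo, hi), f in zip(intervals, freqs):
        if cf_prev + f >= half:
            return lo + (half - cf_prev) * (hi - lo) / f
        cf_prev += f


# Hypothetical data, made up for illustration only.
print(median_ungrouped([7, 1, 5]))                   # 5
print(median_ungrouped([7, 1, 5, 3]))                # 4.0
print(median_discrete([10, 20, 30], [2, 5, 3]))      # 20
print(median_continuous([(150, 160), (160, 170), (170, 180)], [4, 10, 6]))  # 166.0
```

In the discrete example $N/2 = 5$ and the cumulative frequencies are 2, 7, 10, so the first value with $cf \ge 5$ is 20.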
Mode
The mode is the most frequent value.
For ungrouped data, the mode is:
The observation occurring the maximum number of times.
For discrete frequency data, the mode is:
The observation $x_i$ that has the highest value of $f_i$.
Where,
$x_i$ is the value of the $i$-th observation group,
$f_i$ is the frequency of the $i$-th observation group.
For continuous frequency data, the mode formula is:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
Where,
$l$ is the lower limit of the modal class,
$f_0$ is the frequency of the class above (preceding) the modal class,
$f_1$ is the frequency of the modal class,
$f_2$ is the frequency of the class below (following) the modal class,
$h$ is the width of the class interval.
The modal class is the class interval whose frequency is greatest.
If the modal class is the last class interval, then the value of $f_2$ will be $0$.
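A minimal Python sketch of the three mode calculations follows. The function names and data are hypothetical. The formula used for continuous data is the standard grouped-mode formula, lower limit plus $(f_1 - f_0)/(2f_1 - f_0 - f_2)$ times the class width. Treating the missing neighbour of the first class as frequency 0 is an assumption that mirrors the rule given for the last class.

```python
from collections import Counter


def mode_ungrouped(xs):
    """All observations that occur the maximum number of times."""
    counts = Counter(xs)
    top = max(counts.values())
    return [x for x, c in counts.items() if c == top]


def mode_discrete(values, freqs):
    """Observation with the highest frequency."""
    return max(zip(values, freqs), key=lambda pair: pair[1])[0]


def mode_continuous(intervals, freqs):
    """Grouped-data mode: l + (f1 - f0) / (2*f1 - f0 - f2) * h.

    The denominator is zero when f0 == f1 == f2; that case is not handled.
    """
    k = max(range(len(freqs)), key=lambda i: freqs[i])  # index of modal class
    lo, hi = intervals[k]
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                   # assumed 0 for first class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0      # 0 for last class
    return lo + (f1 - f0) * (hi - lo) / (2 * f1 - f0 - f2)


# Hypothetical data, made up for illustration only.
print(mode_ungrouped([2, 3, 3, 5, 5, 1]))            # [3, 5]
print(mode_discrete([10, 20, 30], [2, 5, 3]))        # 20
print(mode_continuous([(150, 160), (160, 170), (170, 180)], [4, 10, 6]))  # 166.0
```

`mode_ungrouped` returns a list because a dataset can have more than one mode.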
Measure of Dispersion
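A measure of dispersion describes how spread out the data is, which tells us whether the measure of central tendency is a reliable summary of the data.
Measures of dispersion include:
- Mean deviation about $a$, where $a$ can be the mean, median or mode
- Variance ($\sigma^2$)
- Standard Deviation ($\sigma$)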
Range
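Range is the difference between the largest and smallest values in a dataset.
For all types of data, the range formula is:
$$\text{Range} = x_{\max} - x_{\min}$$
Where $x_{\max}$ is the largest value and $x_{\min}$ is the smallest value.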
Mean Deviation
Mean deviation is the average absolute distance between each value in a dataset and the mean value. It can also be calculated about the median or the mode.
For ungrouped data, the mean deviation formula is:
$$MD(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} \left| x_i - \bar{x} \right|$$
Where,
$x_i$ is the $i$-th observation,
$\bar{x}$ is the mean of all the observations,
$n$ is the total number of observations.
By replacing $\bar{x}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median and the mean deviation about the mode respectively.
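For discrete frequency data, the mean deviation formula is:
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} f_i \left| x_i - \bar{x} \right|}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the $i$-th observation group,
$\bar{x}$ is the mean of all the observations for discrete frequency data,
$f_i$ is the frequency of the $i$-th observation group,
$n$ is the number of observation groups.
By replacing $\bar{x}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median and the mean deviation about the mode respectively.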
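For continuous frequency data, the mean deviation formula is:
$$MD(\bar{x}) = \frac{\sum_{i=1}^{n} f_i \left| x_i - \bar{x} \right|}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the midpoint of the $i$-th observation class interval,
$\bar{x}$ is the mean of all the observations for continuous frequency data,
$f_i$ is the frequency of the $i$-th observation class interval,
$n$ is the number of observation class intervals.
By replacing $\bar{x}$ in the above formula with the median or the mode, we can calculate the mean deviation about the median and the mean deviation about the mode respectively.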
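As an illustrative sketch, mean deviation about any chosen centre can be written as one function that takes the centre as a parameter, so the same code covers the mean, median and mode cases. The function names and data are hypothetical. For continuous frequency data, pass class midpoints as `values`.

```python
from statistics import mean, median


def mean_deviation(xs, center):
    """Average absolute distance of raw observations from `center`."""
    return sum(abs(x - center) for x in xs) / len(xs)


def mean_deviation_grouped(values, freqs, center):
    """Frequency-weighted form: sum(f_i * |x_i - center|) / sum(f_i)."""
    weighted = sum(f * abs(x - center) for x, f in zip(values, freqs))
    return weighted / sum(freqs)


# Hypothetical data, made up for illustration only.
data = [1, 2, 3, 4, 10]
md_about_mean = mean_deviation(data, mean(data))      # centre is 4
md_about_median = mean_deviation(data, median(data))  # centre is 3

scores, score_freqs = [10, 20, 30], [2, 5, 3]
grouped_mean = sum(f * x for x, f in zip(scores, score_freqs)) / sum(score_freqs)
md_grouped = mean_deviation_grouped(scores, score_freqs, grouped_mean)

print(md_about_mean)    # 2.4
print(md_about_median)  # 2.2
print(md_grouped)       # 5.4
```

In this example the mean deviation about the median (2.2) is smaller than the one about the mean (2.4).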
Variance
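Variance is the average of the squared differences between each value in a dataset and the mean value. It is denoted by $\sigma^2$.
For ungrouped data, the variance formula is:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2$$
Where,
$x_i$ is the $i$-th observation,
$\bar{x}$ is the mean of all the observations,
$n$ is the total number of observations.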
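For discrete frequency data, the variance formula is:
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i \left( x_i - \bar{x} \right)^2}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the $i$-th observation group,
$f_i$ is the frequency of the $i$-th observation group,
$n$ is the number of observation groups,
$\bar{x}$ is the mean of the discrete frequency data.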
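For continuous frequency data, the variance formula is:
$$\sigma^2 = \frac{\sum_{i=1}^{n} f_i \left( x_i - \bar{x} \right)^2}{\sum_{i=1}^{n} f_i}$$
Where,
$x_i$ is the midpoint of the $i$-th observation class interval,
$f_i$ is the frequency of the $i$-th observation class interval,
$n$ is the number of observation class intervals,
$\bar{x}$ is the mean of the continuous frequency data.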
Standard Deviation
Standard deviation is the square root of the variance and represents the spread or dispersion of a dataset. It is denoted by $\sigma$.
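The variance and standard deviation definitions can be sketched in Python as below. This assumes the population form (dividing by $n$ or by the total frequency), which matches the "average of squared differences" wording. The function names and data are hypothetical. For continuous frequency data, pass class midpoints as `values`.

```python
import math


def variance(xs):
    """Population variance of raw observations: mean of squared deviations."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def variance_grouped(values, freqs):
    """Frequency-weighted variance: sum(f_i * (x_i - mean)^2) / sum(f_i)."""
    total = sum(freqs)
    m = sum(f * x for x, f in zip(values, freqs)) / total
    return sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / total


def std_dev(var):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(var)


# Hypothetical data, made up for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]
scores, score_freqs = [10, 20, 30], [2, 5, 3]

print(variance(data), std_dev(variance(data)))   # 4.0 2.0
print(variance_grouped(scores, score_freqs))     # 49.0
print(std_dev(variance_grouped(scores, score_freqs)))  # 7.0
```

Python's standard library offers `statistics.pvariance` and `statistics.pstdev` for the ungrouped population case, which can serve as a cross-check.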
Coefficient of Variation
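The coefficient of variation tells you how much variation your data has relative to its mean. It is the standard deviation expressed as a percentage of the mean:
$$CV = \frac{\sigma}{\bar{x}} \times 100$$
A higher coefficient of variation means the data is more variable, and a lower coefficient of variation means the data is more consistent and therefore more reliable.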
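As a small sketch, the coefficient of variation can be computed as the standard deviation divided by the mean, times 100. This assumes the population standard deviation and a non-zero mean. The function name and the two datasets are hypothetical: both have the same mean, but one is far more spread out.

```python
import math


def coefficient_of_variation(xs):
    """Population standard deviation as a percentage of the mean."""
    n = len(xs)
    m = sum(xs) / n                      # assumed non-zero
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return sd / m * 100


# Hypothetical data: same mean (50), different spread.
consistent = [48, 50, 52]
variable = [20, 50, 80]

cv_consistent = coefficient_of_variation(consistent)
cv_variable = coefficient_of_variation(variable)

# The dataset with the lower coefficient is the more consistent one.
print(cv_consistent < cv_variable)  # True
```

Because the value is a ratio, it allows datasets with different units or scales to be compared.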
Important Points
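- If every observation in a dataset is increased or decreased by the same constant value $\alpha$, then the mean changes by the same amount while the variance and standard deviation are unchanged: new mean $= \bar{x} \pm \alpha$, new variance $= \sigma^2$, new standard deviation $= \sigma$.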
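- If all observations are multiplied by the same non-zero number $k$, then: new mean $= k\bar{x}$, new variance $= k^2\sigma^2$, new standard deviation $= |k|\sigma$.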
- The sum of squares of the deviations is minimum when the deviations are taken from the mean.
- The sum of the deviations from the mean is zero.
- Extreme values do not affect the median as strongly as they affect the mean value. For example for dataset median will be , and mean will be .
- The sum of the absolute differences between each observation and the median is smallest.
- For data with a given range, the maximum value of the variance will be: $\sigma^2_{\max} = \frac{(\text{Range})^2}{4}$
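The properties listed above can be checked numerically. The sketch below uses a small hypothetical dataset and Python's `statistics` module. The constants `alpha` and `k`, the outlier value and the integer grid of candidate centres are all arbitrary choices for illustration. The last check uses the general bound that population variance cannot exceed a quarter of the squared range.

```python
from statistics import mean, median, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
alpha, k = 3, -2                  # arbitrary shift and scale factors

shifted = [x + alpha for x in data]
scaled = [k * x for x in data]

# Adding a constant shifts the mean and leaves the variance unchanged.
assert mean(shifted) == mean(data) + alpha
assert pvariance(shifted) == pvariance(data)

# Multiplying by k scales the mean by k and the variance by k squared.
assert mean(scaled) == k * mean(data)
assert pvariance(scaled) == k ** 2 * pvariance(data)

# Deviations from the mean sum to zero.
assert sum(x - mean(data) for x in data) == 0


def sum_sq(center):
    return sum((x - center) ** 2 for x in data)


def sum_abs(center):
    return sum(abs(x - center) for x in data)


# Squared deviations are smallest about the mean, absolute ones about the median.
candidates = range(0, 11)
assert all(sum_sq(mean(data)) <= sum_sq(c) for c in candidates)
assert all(sum_abs(median(data)) <= sum_abs(c) for c in candidates)

# One extreme value moves the mean far more than the median.
with_outlier = data + [1000]
median_shift = abs(median(with_outlier) - median(data))
mean_shift = abs(mean(with_outlier) - mean(data))
assert median_shift < mean_shift

# Variance never exceeds (range ** 2) / 4.
value_range = max(data) - min(data)
assert pvariance(data) <= value_range ** 2 / 4
```

Every assertion passes for this dataset. The checks illustrate the properties on one example and do not prove them in general.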