Introduction
Measures of central tendency such as the mean, median, and mode do not give you the complete picture of the data, because they say nothing about its variability or dispersion. Understanding how data are distributed is a crucial aspect of statistics, especially in medical research, where making sense of data is often the key to discovering new insights. One of the most informative and robust measures of dispersion is the Interquartile Range (IQR).
In this post, we’ll dive deep into what the IQR is, why it’s important, and how to calculate it using various tools like Google Sheets, Excel, and even manually. Whether you’re just starting out or brushing up on your statistical knowledge, this guide will help you master the IQR.
What is the Interquartile Range?
The Interquartile Range (IQR) is a measure of dispersion or variability that describes the range within which the middle 50% of your data lies.
- Unlike the range, which depends only on the two most extreme values (the minimum and the maximum), the IQR focuses on the central portion of the data. This makes it less sensitive to outliers and more representative of the typical spread.
- Just as the median is a robust measure of central tendency that is often used for skewed data, the IQR is especially useful for measuring the dispersion or variability of skewed data.
- Figure: example graphical representation of the data set 10, 15, 20, 35, 40, 50, 55, 70.
Understanding Quartiles
To grasp the IQR, you need to understand quartiles. Arrange the values in your data set in ascending order and imagine dividing it into four equal parts. The three values that mark the boundaries between these parts are the quartiles:
- Q1 (First Quartile): This is the median of the lower half of the data set, marking the 25th percentile.
- Q2 (Second Quartile): This is simply the median of the data set, marking the 50th percentile.
- Q3 (Third Quartile): This is the median of the upper half of the data set, marking the 75th percentile.
The IQR is calculated as:
$$\mathrm{IQR} = Q_3 - Q_1$$
This formula subtracts the first quartile from the third quartile, giving you the range of the middle 50% of your data.
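As a minimal sketch of these definitions in Python, the code below sorts the data, splits it into a lower and an upper half, takes medians with the standard library's `statistics.median`, and then applies IQR = Q3 - Q1. The helper name `quartiles_from_halves` is illustrative, not from any library. It assumes an even number of values so that the two halves are unambiguous; odd-sized data sets need an extra decision about the median, which is what the inclusive and exclusive methods below are about.

```python
from statistics import median


def quartiles_from_halves(values):
    """Return (Q1, Q2, Q3) as medians of the lower half, whole set, and upper half.

    Illustrative helper: assumes an even number of values so the halves are unambiguous.
    """
    data = sorted(values)
    if len(data) % 2 != 0:
        raise ValueError("this sketch expects an even number of values")
    half = len(data) // 2
    q1 = median(data[:half])  # median of the lower half
    q2 = median(data)         # median of the whole data set
    q3 = median(data[half:])  # median of the upper half
    return q1, q2, q3


data = [10, 15, 20, 35, 40, 50, 55, 70]
q1, q2, q3 = quartiles_from_halves(data)
iqr = q3 - q1  # IQR = Q3 - Q1
print(q1, q2, q3, iqr)
```

For the eight values from the example data set above, this gives Q1 = 17.5, Q2 = 37.5, Q3 = 52.5, and an IQR of 35.0.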
Why is the Interquartile Range Important?
- The IQR is a robust measure of variability and is particularly useful when dealing with skewed distributions or data with outliers.
- Since it focuses on the central portion of the data, it provides a better sense of the typical spread than the full range, which might be distorted by extreme values.
- Example: Imagine you’re analyzing the blood pressure readings of a group of patients. If a few patients have abnormally high or low blood pressure, the IQR will give you a better understanding of the “normal” range for most patients, rather than being skewed by the extremes.
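To make this concrete, the following Python sketch compares the range and the IQR before and after a single extreme value appears. It uses the nine values from the R and Python examples later in this post and replaces the largest value, 70, with a hypothetical outlier of 700. The quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`, which is only one of several percentile definitions; `spread` is an illustrative helper name.

```python
from statistics import quantiles


def spread(values):
    """Return (range, IQR) for a list of numbers, using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return max(values) - min(values), q3 - q1


typical = [10, 15, 20, 35, 40, 45, 50, 55, 70]
with_outlier = [10, 15, 20, 35, 40, 45, 50, 55, 700]  # hypothetical extreme value

print(spread(typical))
print(spread(with_outlier))
```

The range jumps from 60 to 690, while the IQR stays at 30.0, because Q1 and Q3 do not depend on the single most extreme value.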
A Note Before Calculating the Interquartile Range
As you have learnt by now, the interquartile range depends on the quartiles Q1 and Q3, which are simply the 25th and 75th percentiles of the data set. However, there is no consensus among statisticians about the exact formula or definition for calculating percentiles. Three common calculation methods define the kth percentile in the following slightly different ways:
- The smallest value that is greater than k percent of the values.
- The smallest value that is greater than or equal to k percent of the values.
- An interpolated value between the two closest ranks.
Because these methods can give slightly different values for the same percentile, different methods and statistical software programs will find slightly different Q1 and Q3 values, which in turn changes the interquartile range.
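A quick way to see this is to compute the quartiles of the same data under different definitions. The Python sketch below uses the eight-value data set from earlier and compares three approaches: the standard library's `statistics.quantiles` with `method="exclusive"` (quartile positions at (n + 1)p, the same convention as Excel's QUARTILE.EXC), the same function with `method="inclusive"` (positions at (n - 1)p + 1, as in QUARTILE.INC), and the simple median-of-halves rule. The helper `median_of_halves` is an illustrative name that assumes an even number of values.

```python
from statistics import median, quantiles

data = [10, 15, 20, 35, 40, 50, 55, 70]


def median_of_halves(values):
    """Illustrative helper: Q1 and Q3 as medians of the lower and upper halves (even n)."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


results = {
    # quantiles(..., n=4) returns [Q1, Q2, Q3]; [::2] keeps Q1 and Q3
    "exclusive": tuple(quantiles(data, n=4, method="exclusive")[::2]),
    "inclusive": tuple(quantiles(data, n=4, method="inclusive")[::2]),
    "median of halves": median_of_halves(data),
}

for name, (q1, q3) in results.items():
    print(f"{name:>16}: Q1={q1}, Q3={q3}, IQR={q3 - q1}")
```

For these eight values the three rules give IQRs of 37.5, 32.5, and 35.0, which shows how much the choice of definition can matter for a small sample.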
How do we calculate the percentiles?
When you calculate quartiles in Excel, Google Sheets, or other statistical software, you will come across the terms inclusive and exclusive quartiles. Let me try to make this simpler.
Consider the following data set of nine values: 10, 15, 20, 35, 40, 45, 50, 55, 70.
- The median of this data set is 40, the middle (5th) of the nine sorted values.
- In the exclusive method, the median is excluded from the calculation of Q1 and Q3. This method divides the data set into two halves, leaving out the median (40 in the example above), and then calculates the quartiles from these halves.
- Q1 (25th percentile): the median of the lower half (10, 15, 20, 35), excluding the median of the entire data set, is 17.5.
- Q3 (75th percentile): the median of the upper half (45, 50, 55, 70), excluding the median of the entire data set, is 52.5.
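- In Excel or Google Sheets, the formula for exclusive quartiles is =QUARTILE.EXC(data_range, quart).
- For example, =QUARTILE.EXC(A1:A9, 1) returns Q1 for nine values stored in cells A1 to A9. For an odd number of values, as here, it matches the median-of-halves result; with an even number of values it interpolates between ranks, so its answer can differ slightly.
- Because the exclusive method does not treat the smallest and largest values as the 0th and 100th percentiles, you simply cannot calculate Q0 or Q4 with this formula. Both result in an error (#NUM! in Excel).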
The exclusive method is more common in inferential statistics and is preferred when working with larger data sets. It provides a more focused view of the data’s distribution by excluding the central value and emphasizing the spread of the data.
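The exclusive procedure can be sketched in Python as follows, assuming the nine values 10, 15, 20, 35, 40, 45, 50, 55, 70: sort the data, skip the middle value when the count is odd, and take the median of each remaining half. `exclusive_quartiles` is a hypothetical helper, not a reimplementation of QUARTILE.EXC; the last line cross-checks it against the standard library's `statistics.quantiles(..., method="exclusive")`, which places quartiles at position (n + 1)p, the same convention as QUARTILE.EXC.

```python
from statistics import median, quantiles


def exclusive_quartiles(values):
    """Q1 and Q3 as medians of the halves, leaving out the overall median when n is odd."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = len(s) // 2  # size of each half; for odd n the middle value is skipped
    lower, upper = s[:half], s[-half:]
    return median(lower), median(upper)


data = [10, 15, 20, 35, 40, 45, 50, 55, 70]
q1, q3 = exclusive_quartiles(data)
print(q1, q3, q3 - q1)                           # median-of-halves result
print(quantiles(data, n=4, method="exclusive"))  # standard-library cross-check
```

Both give Q1 = 17.5 and Q3 = 52.5 here. The same agreement holds for any odd number of values; with an even count, `quantiles` interpolates between ranks and can differ slightly from the median-of-halves answer.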
In the inclusive method, the quartiles are calculated by including the median in the calculation of both the lower and upper quartiles. This method treats the data set as a whole and ensures that all values, including the median, contribute to the calculation of Q1 (first quartile) and Q3 (third quartile).
- Q1 (25th percentile): Using the inclusive method, Q1 is the median of the lower half of the data set, including the median of the entire data set. In the example above, Q1 is 20.
- Q3 (75th percentile): Q3 is the median of the upper half, again including the median of the entire data set. In the example above, Q3 is 50.
- The Excel or Google Sheets formula for inclusive quartiles is =QUARTILE.INC(data_range, quart), for example =QUARTILE.INC(A1:A9, 1).
The inclusive method is often used in descriptive statistics and when dealing with smaller data sets, as it provides a more comprehensive view of the data distribution by including the central value in the calculation of quartiles.
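A matching sketch for the inclusive procedure, again assuming the nine values 10, 15, 20, 35, 40, 45, 50, 55, 70, counts the middle value in both halves when the count is odd. `inclusive_quartiles` is a hypothetical helper; the cross-check uses `statistics.quantiles(..., method="inclusive")`, which places quartiles at position (n - 1)p + 1, the same convention as QUARTILE.INC.

```python
from statistics import median, quantiles


def inclusive_quartiles(values):
    """Q1 and Q3 as medians of the halves, counting the overall median in both halves when n is odd."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = (len(s) + 1) // 2  # for odd n, each half includes the middle value
    lower, upper = s[:half], s[-half:]
    return median(lower), median(upper)


data = [10, 15, 20, 35, 40, 45, 50, 55, 70]
q1, q3 = inclusive_quartiles(data)
print(q1, q3, q3 - q1)                           # median-of-halves result
print(quantiles(data, n=4, method="inclusive"))  # standard-library cross-check
```

Both give Q1 = 20 and Q3 = 50, so the inclusive IQR is 30, narrower than the exclusive IQR of 35 for the same data.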
Using Statistical Software (R, Python)
For those who work with large datasets or prefer programming, statistical software like R or Python offers powerful tools for calculating the IQR.
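In R:
```r
# IQR() is part of R's stats package, which is loaded by default (default type = 7)
data <- c(10, 15, 20, 35, 40, 45, 50, 55, 70)
IQR(data)
```
In Python (using pandas):
```python
import pandas as pd

data = pd.Series([10, 15, 20, 35, 40, 45, 50, 55, 70])
# Series.quantile() uses linear interpolation by default
iqr = data.quantile(0.75) - data.quantile(0.25)
print(iqr)
```
Both snippets give you the IQR quickly, which is especially useful for large data sets where manual calculation is impractical. For this data set both return 30, the same as the inclusive quartiles (Q1 = 20, Q3 = 50), because R's default quantile type 7 and pandas' default linear interpolation follow the same convention as QUARTILE.INC.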
Conclusion
The Interquartile Range is a vital tool in any statistician’s toolkit. It helps you understand the spread of your data, especially when outliers are present. Whether you’re calculating it manually, using spreadsheets, or employing software like R or Python, mastering the IQR will enhance your ability to analyze data effectively.
This comprehensive guide should arm you with everything you need to know about the IQR, making you well-prepared to tackle statistical challenges with confidence.
https://geekysteth.com/master-statistics-101-interquartile-range-iqr/



