Missing data is a common challenge for students and researchers conducting surveys or statistical analyses, whether you are working on your first statistics dissertation or analysing complex research datasets.
How you handle missing values can affect the validity of your findings. Pairwise deletion is one way to deal with them. It aims to maximise data utilisation while maintaining analytical integrity when some variables in your analysis contain missing values.
What is Pairwise Deletion?
Pairwise deletion, also known as available-case analysis, is a statistical technique for handling missing data. A case is excluded only from the analyses that involve a variable it is missing. Unlike listwise deletion, it doesn't remove an entire observation just because one of its variables has a missing value.
How Does Pairwise Deletion Work?
If you are calculating the correlation between Variables A and B, pairwise deletion removes only the cases where A or B (or both) is missing. When you move on to analyse Variables A and C, it uses all cases where both A and C have complete data, even if some of those cases were excluded from the analysis of A and B.
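The sketch below illustrates this rule on a small, made-up data frame with variables A, B and C (the data and the name `toy` are hypothetical). It applies pairwise deletion by hand with `complete.cases()` for each pair, then checks that base R's `cor()` with `use = "pairwise.complete.obs"` gives the same values.

```r
# Hypothetical toy data: A, B and C have missing values in different rows
toy <- data.frame(
  A = c(1, 2, 3, 4, 5, 6),
  B = c(2, NA, 6, 8, 9, 12),
  C = c(5, 3, NA, 2, 1, 0)
)

# Pair A and B: keep rows where both A and B are observed (row 2 is dropped)
keep_ab <- complete.cases(toy[, c("A", "B")])
r_ab <- cor(toy$A[keep_ab], toy$B[keep_ab])

# Pair A and C: row 3 is dropped instead, and row 2 is used again
keep_ac <- complete.cases(toy[, c("A", "C")])
r_ac <- cor(toy$A[keep_ac], toy$C[keep_ac])

cat("A and B: n =", sum(keep_ab), " r =", round(r_ab, 3), "\n")
cat("A and C: n =", sum(keep_ac), " r =", round(r_ac, 3), "\n")

# cor() applies the same rule to every pair at once
pw <- cor(toy, use = "pairwise.complete.obs")
all.equal(pw["A", "B"], r_ab)
all.equal(pw["A", "C"], r_ac)
```

Each correlation is based on five rows, but not the same five rows, which is exactly why sample sizes vary across analyses under pairwise deletion.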
Pairwise deletion works best with large datasets in which missing values are scattered across different variables. For example, a student exploring correlations in survey data can use it to keep as many responses as possible for each pair of variables, rather than discarding every partially completed response.
What is Listwise Deletion?
Listwise deletion removes an entire observation (row) from the dataset if it contains even a single missing value in any variable. The result is a “complete cases only” dataset in which every observation has data for all variables under consideration.
How Does Listwise Deletion Work?
Suppose you are analysing survey data from 200 Australian university students for a research project. You have collected their study hours, exam scores, attendance rates, and stress levels. If just one student didn’t report their stress level, listwise deletion excludes that student’s entire record.
Their study hours, exam scores, and attendance data are discarded as well, so that every analysis uses the same set of cases. The cost is a smaller sample and lower statistical power, and the loss grows quickly when missing values are scattered across different variables.
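A minimal sketch of this scenario in R, using a hypothetical five-student version of the survey (the variable names and values are made up for illustration). `na.omit()` performs listwise deletion on a data frame, so the student with a missing stress level loses all of their other answers too.

```r
# Hypothetical records for five students; student 3 did not report a stress level
survey <- data.frame(
  study_hours = c(12, 18, 9, 15, 20),
  exam_score  = c(68, 75, 61, 72, 80),
  attendance  = c(85, 92, 78, 88, 95),
  stress      = c(6, 4, NA, 5, 3)
)

# Listwise deletion: keep only rows with no missing value in any column
complete_only <- na.omit(survey)

nrow(survey)         # 5 students collected
nrow(complete_only)  # 4 students analysed
rownames(complete_only)  # student 3 is gone, including their other answers
```

`survey[complete.cases(survey), ]` gives the same rows; `na.omit()` additionally records which rows were removed.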
What are the Key Differences Between Pairwise and Listwise Deletion?
| Feature | Pairwise Deletion | Listwise Deletion |
|---|---|---|
| Data Usage | Maximises data usage by analysing each variable pair separately. | Uses only complete cases, potentially discarding much of the observed data. |
| Sample Size Consistency | Sample size varies across analyses. | Consistent sample size for all analyses. |
| Statistical Power | Higher power due to larger effective sample sizes. | Reduced power with smaller sample sizes. |
| Bias risk under MCAR | Unbiased if MCAR and other assumptions hold. | Unbiased if MCAR and other assumptions hold. |
| Risk under MAR/MNAR | Can be biased; generally requires MCAR. | Can be biased; generally requires MCAR. |
| Ease of Implementation | More complex (especially for modelling beyond simple correlations). | Very easy. |
| Use in advanced modelling | Risks like non-positive definite covariance matrices. | Preferred for consistency of sample size. |
| Best use cases | Exploratory analysis, correlation matrices, descriptive statistics. | Multivariate modelling, regression analysis, and structural equation modelling. |
| Matrix issues | Can produce non-positive-definite covariance matrices. | Produces positive semi-definite matrices (positive definite unless variables are collinear or there are too few complete cases). |
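To see how pairwise deletion can produce a non-positive-definite matrix, the sketch below uses a deliberately extreme, hypothetical dataset in which each pair of variables is observed in a different block of rows. The resulting pairwise correlations (x with y = 1, y with z = 1, x with z = -1) cannot all hold in a single real dataset, and a negative eigenvalue confirms the matrix is not a valid correlation matrix.

```r
# Deliberately extreme hypothetical data: each pair of variables is observed
# in a different block of three rows, and no row is complete
dat <- data.frame(
  x = c(1, 2, 3, NA, NA, NA, 1, 2, 3),
  y = c(1, 2, 3, 1, 2, 3, NA, NA, NA),
  z = c(NA, NA, NA, 1, 2, 3, -1, -2, -3)
)

# Pairwise deletion: r(x, y) = 1, r(y, z) = 1, r(x, z) = -1
pw <- cor(dat, use = "pairwise.complete.obs")
print(round(pw, 2))

# A valid correlation matrix has no negative eigenvalues; this one does
ev <- eigen(pw, symmetric = TRUE)$values
print(round(ev, 2))  # 2 2 -1

# Listwise deletion would leave nothing to analyse: no row has all three values
sum(complete.cases(dat))  # 0
```

Real datasets rarely fail this badly, but milder versions of the same inconsistency are what make pairwise matrices problematic as input to regression, factor analysis, or structural equation models.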
Common Assumptions About Missing Data Mechanisms
Both pairwise and listwise deletion rely on a critical assumption about why data is missing: in general, they give unbiased results only when data is missing completely at random (MCAR). That’s why understanding missing data mechanisms is essential for choosing appropriate analytical methods. Let us walk you through the three main mechanisms.
Missing Completely at Random (MCAR)
Data is considered MCAR when the probability that a value is missing is unrelated to any observed or unobserved variables, for example when participants skip survey questions at random, with no systematic pattern.
For example
Students at the University of Sydney completing an online survey about a futuristic classroom may lose their internet connection at random times, leaving some responses blank. Because these dropouts are unrelated to the students or their answers, the missing responses are MCAR.
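A small simulation can make MCAR concrete. The sketch below, using entirely simulated scores (the 15% dropout rate and the variable names are assumptions), deletes responses with the same probability for everyone and compares the complete-case mean with the full-data mean.

```r
set.seed(1)
n <- 5000

# Simulated survey scores (hypothetical)
score <- rnorm(n, mean = 70, sd = 10)

# MCAR: every response has the same 15% chance of being lost,
# regardless of the student or the score itself
observed <- ifelse(runif(n) < 0.15, NA, score)

means <- c(full = mean(score), complete_cases = mean(observed, na.rm = TRUE))
print(round(means, 2))
```

The two means should differ only by sampling noise, which is why deletion methods stay unbiased under MCAR (they simply lose precision).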
Missing at Random (MAR)
Data is considered MAR when missingness is related to observed variables but not to the missing values themselves once those observed variables are taken into account.
For example
In a study of work-life balance, professionals in higher-paying occupations are less likely to report their income. This tendency is captured by their occupation type (which is observed), and once you account for occupation, the missingness is random.
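The sketch below simulates this MAR pattern with made-up numbers (the 40% share of professionals, the income levels in $1000s, and the 60% vs 10% skip rates are all assumptions). Skipping depends only on the observed occupation indicator, so the overall complete-case mean is biased, but within each occupation group it is close to the truth.

```r
set.seed(2)
n <- 5000

# Observed occupation indicator (1 = professional), always recorded
professional <- rbinom(n, 1, 0.4)

# Hypothetical income in $1000s, higher for professionals
income <- 60 + 40 * professional + rnorm(n, sd = 10)

# MAR: professionals skip the question more often (60% vs 10%),
# but within each group the skipping is random
p_skip <- ifelse(professional == 1, 0.6, 0.1)
reported <- ifelse(runif(n) < p_skip, NA, income)

# Overall complete-case mean is pulled down (professionals are under-represented)
overall <- c(true = mean(income), complete_cases = mean(reported, na.rm = TRUE))
print(round(overall, 1))

# Within each occupation group, the complete-case mean is close to the truth
by_group_true <- tapply(income, professional, mean)
by_group_cc   <- tapply(reported, professional, mean, na.rm = TRUE)
print(round(rbind(true = by_group_true, complete_cases = by_group_cc), 1))
```

This is the sense in which MAR data can be handled by accounting for the observed variable; simple deletion alone does not do that.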
Missing Not at Random (MNAR)
Data is MNAR when the probability of missingness is related to the unobserved (missing) values themselves, even after accounting for the observed data.
For example
In a survey about the best graduate schools in Australia, students with lower GPAs might be more likely to skip reporting their grades. Here, missingness depends directly on the value that is missing.
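A simulated version of this example shows why MNAR is the hardest case. In the sketch below, the GPA-like scores on a 1 to 7 scale and the skip probabilities are hypothetical; students with lower values are more likely to leave the field blank, so the complete-case mean overstates the true mean, and no observed variable in the data can correct it.

```r
set.seed(3)
n <- 5000

# Hypothetical GPA-like scores, clipped to a 1 to 7 scale
gpa <- pmin(pmax(rnorm(n, mean = 5, sd = 1), 1), 7)

# MNAR: the lower the GPA, the more likely it is to be left blank
p_skip <- plogis(-2 * (gpa - 4))
reported <- ifelse(runif(n) < p_skip, NA, gpa)

gpa_means <- c(true = mean(gpa), complete_cases = mean(reported, na.rm = TRUE))
print(round(gpa_means, 2))
```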
How to Do Pairwise Deletion in RStudio?
RStudio runs R, which has built-in support for pairwise deletion: functions such as cor() and cov() accept the argument use = "pairwise.complete.obs". You can follow the steps below to apply it in your statistical analysis.
```r
# Create a sample dataset with missing values (NA)
student_data <- data.frame(
  study_hours = c(15, 20, NA, 18, 22, 25, NA, 19),
  assignment  = c(78, NA, 85, 82, 90, 88, 84, NA),
  attendance  = c(92, 88, 95, NA, 90, 93, 87, 91),
  final_exam  = c(82, 85, 88, 80, 92, 89, 83, 87)
)

# Calculate the correlation matrix using pairwise deletion:
# each correlation uses every row where both variables in that pair are observed
cor_matrix <- cor(student_data, use = "pairwise.complete.obs")
print(cor_matrix)
```
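Because each entry of the pairwise matrix can rest on a different number of rows, it is worth reporting those counts. A minimal sketch, reusing the same hypothetical student_data: multiplying the observed-value indicators with `crossprod()` gives the number of rows behind every pair, which can be compared with the rows that listwise deletion would keep.

```r
# Same hypothetical student dataset as above
student_data <- data.frame(
  study_hours = c(15, 20, NA, 18, 22, 25, NA, 19),
  assignment  = c(78, NA, 85, 82, 90, 88, 84, NA),
  attendance  = c(92, 88, 95, NA, 90, 93, 87, 91),
  final_exam  = c(82, 85, 88, 80, 92, 89, 83, 87)
)

# 1 = value observed, 0 = missing
observed <- (!is.na(student_data)) * 1

# Number of rows behind each pairwise correlation (diagonal = n per variable)
n_pairs <- crossprod(observed)
print(n_pairs)

# Listwise deletion keeps only rows with no missing value at all
sum(complete.cases(student_data))  # 3
cor_listwise <- cor(student_data, use = "complete.obs")
```

Here the pairwise correlations draw on 4 to 8 rows, while the listwise matrix is based on only 3 of the 8 students.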
Advantages and Disadvantages of Pairwise Deletion
| Advantages of Pairwise Deletion | Disadvantages of Pairwise Deletion |
|---|---|
| Retains more data per analysis. | Varying sample size across analyses. |
| Higher statistical power. | Potentially biased if not MCAR. |
| Simple to apply for pairwise statistics. | Incorrect or inconsistent standard errors. |
| Useful in exploratory analysis. | Can produce non-positive-definite covariance matrices. |
| Preserves information from partially complete cases. | Not ideal for multivariate modelling. |
| Computationally cheap. | Harder to document and reproduce. |
When to Use Pairwise Deletion vs Listwise Deletion?
It is ideal to use pairwise deletion when you have a large sample, missing data are minimal, missingness is likely MCAR, and your primary analyses are pairwise correlations or covariances (e.g., a state-wide student survey in Australia where only a small proportion of respondents skipped a question).
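One quick way to check whether missing data are minimal is to tabulate the share of missing values per variable. The sketch below does this for the hypothetical student_data from the RStudio example; what counts as "minimal" is a judgement call, so the code only reports the numbers and applies no threshold.

```r
# Same hypothetical student dataset as in the RStudio example
student_data <- data.frame(
  study_hours = c(15, 20, NA, 18, 22, 25, NA, 19),
  assignment  = c(78, NA, 85, 82, 90, 88, 84, NA),
  attendance  = c(92, 88, 95, NA, 90, 93, 87, 91),
  final_exam  = c(82, 85, 88, 80, 92, 89, 83, 87)
)

# Proportion of missing values in each variable
missing_share <- colMeans(is.na(student_data))
print(round(missing_share, 3))

# Proportion of rows that would survive listwise deletion
mean(complete.cases(student_data))  # 0.375
```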
However, pairwise deletion is best avoided when you are fitting models with multiple covariates (multiple regression, logistic regression, factor analysis), or when missingness might be related to key covariates or outcomes (for example, low-income respondents being less likely to answer a work-hours question).
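For regression models in R, the usual default is listwise deletion rather than pairwise deletion. In the sketch below, which reuses the hypothetical student_data, `lm()` runs under R's default `na.action` setting (`na.omit`), so it keeps only rows that are complete for the variables in the model formula. `nobs()` and the fitted object's `na.action` component show what was used and dropped.

```r
# Same hypothetical student dataset as in the RStudio example
student_data <- data.frame(
  study_hours = c(15, 20, NA, 18, 22, 25, NA, 19),
  assignment  = c(78, NA, 85, 82, 90, 88, 84, NA),
  attendance  = c(92, 88, 95, NA, 90, 93, 87, 91),
  final_exam  = c(82, 85, 88, 80, 92, 89, 83, 87)
)

# With the default na.action (na.omit), lm() keeps only rows complete for
# final_exam, study_hours and attendance; assignment is not in the model
fit <- lm(final_exam ~ study_hours + attendance, data = student_data)

nobs(fit)                  # 5 rows used out of 8
as.integer(fit$na.action)  # rows dropped: 3 4 7
```

Note that the deletion is listwise only over the variables in the formula: rows 2 and 8, which are missing an assignment mark, are still used.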
Frequently Asked Questions
What is the difference between pairwise and complete deletion?
Pairwise deletion analyses all available data for each variable pair, excluding only cases missing those specific variables. Complete (listwise) deletion removes any case with missing values in any variable. Pairwise retains more data but produces inconsistent sample sizes, whereas listwise offers consistency but reduces usable data.
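What do the pairwise and listwise options do in statistical software?
Pairwise options (e.g., “Exclude cases pairwise” in SPSS) include all valid observations for each variable pair, optimising data usage. Listwise options (e.g., “Exclude cases listwise” in SPSS) drop any case with missing data on any of the selected variables. The default differs between procedures, so check the missing-values setting before running an analysis.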
What is the Bonferroni correction?
The Bonferroni correction adjusts significance levels to control false positives (Type I errors) across multiple comparisons. It divides the overall alpha (e.g., 0.05) by the number of tests, producing a stricter threshold for each test.
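A minimal sketch of the correction in R, using four hypothetical p-values. Comparing each p-value with alpha divided by the number of tests gives the same decisions as base R's `p.adjust(..., method = "bonferroni")`, which instead multiplies each p-value by the number of tests (capped at 1) and compares it with the original alpha.

```r
alpha <- 0.05
p_values <- c(0.004, 0.020, 0.030, 0.250)  # hypothetical p-values from four tests
m <- length(p_values)

# Stricter per-test threshold
alpha / m  # 0.0125
p_values < alpha / m

# Equivalent approach: adjust the p-values instead of the threshold
p_adj <- p.adjust(p_values, method = "bonferroni")
p_adj < alpha
```

Only the first test remains significant after the correction, even though three of the four unadjusted p-values are below 0.05.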

