Target Audience: New Graduates/Analysts
Introduction
Comparable sales analysis plays a crucial role in determining a property's value in real estate valuation. However, the traditional approach of making subjective adjustments to comparable
sales data tends to raise questions about the reliability of the final value
conclusions. To address this challenge, I will use this three-part blog post to
delve into a comprehensive methodology that combines statistical rigor with
traditional valuation principles to enhance the accuracy and explainability of
property valuations.
This series outlines a structured three-step process that redefines comparable sales analysis. The first step uses a correlation matrix to examine the relationships between the sale price of properties and six key independent variables. This initial analysis scrutinizes potential collinearity and multicollinearity among these variables, setting the foundation for a more robust regression model.
The second step of the process
employs multiple regression analysis to generate consistent coefficients that
form the basis of an adjustment matrix. This matrix facilitates the systematic
adjustment of comparable sales data to align with the subject property's
characteristics accurately. By leveraging statistical methods, this approach
aims to minimize the subjective nature of adjustments and provide a more
objective and reliable valuation model.
Finally, moving beyond the
statistical realm, the third step incorporates the art of traditional
comparable sales analysis. This aspect involves selecting comparable sales
based on criteria such as least adjustments and sales recency. It emphasizes
the importance of applying logic and expertise in identifying genuinely
comparable properties, thereby enhancing the accuracy and credibility of the
valuation process.
By combining the precision of
regression modeling with the artistry of traditional valuation principles, this
three-step approach promises to deliver value conclusions that are accurate,
transparent, and logical. Through this blog post series, I aim to showcase a
more informed and systematic method of conducting comparable sales analysis,
ultimately elevating the standards of property valuation practices.
[Figure: Correlation matrix]
Dataset and Variables
The dataset behind the correlation matrix above comprises eighteen months of home sales from a single town, from January 2023 to June 2024, used to value the subject properties as of July 1, 2024.
Sale
Price will be the dependent variable in the regression model. One of the six
independent variables, "Months Since," represents the number of
months since the sale. For instance, a sale in January 2023 will receive a
value of 18 (July 2024 minus January 2023), while a sale in June 2024 will be
assigned a value of 1. The "Exterior Wall" variable has been effect-coded based on the deviation of each category from the town's median sale price. Bldg Age is a synthetic variable calculated by subtracting
the year the property was built from the prediction year 2024. The other
variables are quantitative data obtained from public records. No location variable will be used since all subjects and comps will come from specific neighborhoods within this town.
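To make these transformations concrete, here is a minimal sketch in Python with pandas. The column names and toy records are illustrative assumptions, not the article's dataset, and the effect coding follows one plausible reading of the description above (each category's median sale price minus the town's median).

```python
import pandas as pd

# Toy records; column names and values are illustrative assumptions.
sales = pd.DataFrame({
    "sale_year":     [2023, 2024, 2023, 2024],
    "sale_month":    [1,    6,    7,    3],
    "year_built":    [1990, 2005, 1978, 2005],
    "exterior_wall": ["vinyl", "brick", "vinyl", "brick"],
    "sale_price":    [310_000, 455_000, 335_000, 470_000],
})

PRED_YEAR, PRED_MONTH = 2024, 7  # valuation date: July 1, 2024

# "Months Since": months between the sale and the valuation date,
# so January 2023 -> 18 and June 2024 -> 1, as in the text.
sales["months_since"] = (
    (PRED_YEAR - sales["sale_year"]) * 12 + (PRED_MONTH - sales["sale_month"])
)

# "Bldg Age": prediction year minus year built.
sales["bldg_age"] = PRED_YEAR - sales["year_built"]

# Effect-code "Exterior Wall" as each category's deviation
# (here, of its median sale price) from the town's median sale price.
town_median = sales["sale_price"].median()
category_dev = sales.groupby("exterior_wall")["sale_price"].median() - town_median
sales["exterior_wall_coded"] = sales["exterior_wall"].map(category_dev)
```

The same pattern scales to a full sales file: compute the time and age variables arithmetically, then map each categorical level to its deviation from the town median.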
Analysis
Looking at
the correlation matrix, we observe some moderate to moderately high
correlations between Sale Price and the independent variables.
- A moderate positive correlation (0.4117)
between Land SF and Sale Price is expected, as larger lots tend to be
associated with higher-priced homes.
- As expected, a moderate negative
correlation (-0.2033) exists between building age and Sale Price, which
aligns with the general understanding that older buildings tend to be less
expensive than newer ones in the real estate market.
- A strong positive correlation (0.7780)
exists between Heated SF and Sale Price, which is also expected, as larger
buildings tend to fetch higher prices.
- There is a moderate positive correlation
(0.5123) between Bathrooms and Sale Price, which is also to be expected.
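A correlation matrix like the one discussed above can be produced in one line with pandas. The numbers below are made up for illustration, not the article's eighteen months of town data.

```python
import pandas as pd

# Made-up sales records for illustration only.
df = pd.DataFrame({
    "sale_price": [300_000, 420_000, 510_000, 350_000, 610_000],
    "land_sf":    [6_000,   9_500,   12_000,  7_000,   15_000],
    "heated_sf":  [1_400,   2_000,   2_600,   1_600,   3_000],
    "bathrooms":  [1,       2,       3,       2,       3],
})

# Pearson correlations; the matrix is symmetric with 1.0 on the diagonal.
corr = df.corr()

# The "sale_price" row shows each predictor's correlation with the
# dependent variable; the remaining cells are what matter for
# multicollinearity checks among the independent variables.
```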
Multicollinearity
Multicollinearity occurs when two
or more independent variables in a regression model are highly correlated,
leading to unstable and unreliable estimates of the coefficients.
Therefore, when looking for
multicollinearity, we are concerned with correlations between the
independent variables, not their correlation with the dependent
variable (Sale Price in this case).
Here's how to
assess multicollinearity among independent variables:
1. Look for correlations exceeding 0.8 in absolute value, a general guideline for strong correlation.
2. Pay attention to the overall pattern in the correlation matrix. The
presence of multiple high correlations between independent variables is a
strong indicator of multicollinearity.
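The two checks above can be automated. The helper below is a sketch: it scans a predictor-only correlation matrix for pairs whose absolute correlation exceeds a threshold (0.8 by default). The toy matrix reuses the 0.4659 and 0.6174 values reported in this post; the third value is invented.

```python
import itertools
import pandas as pd

def flag_high_correlations(corr: pd.DataFrame, threshold: float = 0.8):
    """Return (var_a, var_b, r) for independent-variable pairs with |r| > threshold."""
    flagged = []
    for a, b in itertools.combinations(corr.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 4)))
    return flagged

# Predictor-only correlation matrix (Sale Price excluded on purpose).
names = ["land_sf", "heated_sf", "bathrooms"]
corr = pd.DataFrame(
    [[1.0,    0.4659, 0.30],
     [0.4659, 1.0,    0.6174],
     [0.30,   0.6174, 1.0]],
    index=names, columns=names,
)

flag_high_correlations(corr)       # -> [] : nothing exceeds the 0.8 guideline
flag_high_correlations(corr, 0.6)  # -> [('heated_sf', 'bathrooms', 0.6174)]
```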
Examining the
correlation matrix, we can see that there are some moderate correlations among
the independent variables:
- Land SF and Heated SF (0.4659)
- Heated SF and Bathroom (0.6174)
The strongest correlation among the independent variables is between Heated SF and Bathrooms (0.6174), which is moderate and not a cause for concern.
It's important to note that there is no one-size-fits-all answer to dealing with multicollinearity; the best approach depends on your specific data and research question.
Important to Know
Here are some ways to address
multicollinearity:
- Drop one of the highly correlated variables: This is a simple solution but can also remove valuable information from the model. Before dropping a variable, carefully consider which one is less critical to your analysis.
- Combine the correlated variables into a single variable: If the correlated variables represent the same underlying concept, you can create a new variable that combines them. For example, you could create a total-square-footage variable (Heated SF + Basement SF).
- Use
Ridge Regression:
This regression technique can reduce the impact of multicollinearity on
model coefficients.
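As an illustration of the third option, here is a minimal ridge regression sketch using scikit-learn (an assumed dependency; the post does not prescribe a library). The data are simulated, with bathroom count deliberately built to correlate with building size.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Simulate two correlated predictors: bathroom count tracks building size.
heated_sf = rng.uniform(1_200, 3_000, 200)
bathrooms = heated_sf / 900 + rng.normal(0, 0.3, 200)
X = np.column_stack([heated_sf, bathrooms])
y = 150 * heated_sf + 20_000 * bathrooms + rng.normal(0, 10_000, 200)

# The L2 penalty (alpha) shrinks the coefficients toward zero, which
# stabilizes them when predictors are correlated; a larger alpha means
# stronger shrinkage, while alpha near zero approaches ordinary least squares.
model = Ridge(alpha=1.0).fit(X, y)
```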
It is important to note that
multicollinearity may still be a concern even if correlation coefficients are
not extremely high, especially in small sample sizes. In such a scenario, it
would be advisable to proceed with fitting the regression model and checking
for other diagnostics, such as variance inflation factors (VIF), to further
assess multicollinearity and ensure the stability of the regression estimates.
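A common way to run that check in Python is statsmodels' `variance_inflation_factor` (an assumed dependency). VIF measures how much a coefficient's variance is inflated by correlation with the other predictors; values above roughly 5–10 are the usual warning signs. The data below are simulated, with one predictor built to correlate strongly with another.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)

# Toy predictors; heated_sf is constructed to correlate strongly with land_sf.
X = pd.DataFrame({"const": 1.0, "land_sf": rng.uniform(5_000, 20_000, 100)})
X["heated_sf"] = X["land_sf"] * 0.15 + rng.normal(0, 200, 100)

# VIF for each column; the constant term is included so the auxiliary
# regressions behind each VIF are properly specified.
vifs = {col: variance_inflation_factor(X.values, i) for i, col in enumerate(X.columns)}
```

Here both predictors should show VIFs well above the usual warning threshold, flagging the collinearity that the pairwise correlation alone might understate in a larger model.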
Conclusion
Examining the
correlation matrix before running a regression model is a common and beneficial
practice. This preliminary step offers several advantages:
- Understanding variable
relationships:
The correlation matrix reveals the strength and direction of the
relationships between the dependent variable (e.g., Sale Price) and each
independent variable, and the relationships between the independent
variables. This information helps analysts understand which independent
variables might be the most significant predictors of the target variable.
- Identifying potential
multicollinearity:
Multicollinearity arises when independent variables are highly correlated,
potentially causing problems with interpreting the regression coefficients
and leading to inaccurate results. The correlation matrix aids in
identifying potential cases of multicollinearity, which can be further
investigated using tests like the Variance Inflation Factor (VIF).
- Variable selection: While
correlation alone shouldn't be the sole criterion for choosing variables,
it is a helpful starting point. Strong correlations between independent
variables and the dependent variable suggest they might be significant
predictors for inclusion in the model.
- Guiding further analysis: The
correlation matrix can highlight unexpected relationships or outliers that
warrant further investigation, leading to a more nuanced understanding of
the data and potentially improving the final regression model.
In
conclusion, examining the correlation matrix is a simple but powerful technique
to gain valuable insights into the data before running a regression model. This
preliminary analysis helps build a stronger foundation for the analysis and
potentially avoids issues with multicollinearity or misleading results.
Coming Soon: Part 2 of 3 – Regression modeling to help develop the adjustment matrix.