Sunday, June 30, 2024

The Art and Science of Comparable Sales Analysis (Part 1 of 3)


Comparable sales analysis plays a crucial role in real estate valuation. However, the traditional practice of making subjective adjustments to comparable sales data raises questions about the reliability of the final value conclusions. To address this challenge, I will use this three-part blog post to present a methodology that combines statistical rigor with traditional valuation principles to make property valuations more accurate and easier to explain.

The post will outline a structured three-step process that redefines comparable sales analysis. The first step involves using a correlation matrix to examine the relationships between property sale prices and six key independent variables. This initial analysis scrutinizes potential collinearity and multicollinearity among these variables, setting the foundation for a more robust regression model.

The second step of the process employs multiple regression analysis to derive consistent coefficients that serve as the basis for an adjustment matrix. This matrix facilitates the systematic adjustment of comparable sales data to accurately align with the subject property's characteristics. By leveraging statistical methods, this approach aims to minimize the subjective nature of adjustments and provide a more objective and reliable valuation model.

Finally, moving beyond the statistical realm, the third step incorporates the art of traditional comparable sales analysis. This step involves selecting comparable sales based on criteria such as the fewest adjustments and the most recent sale dates. It emphasizes the importance of applying logic and expertise in identifying genuinely comparable properties, thereby enhancing the accuracy and credibility of the valuation process.

By combining the precision of regression modeling with the artistry of traditional valuation principles, this three-step approach promises to deliver value conclusions that are accurate, transparent, and logical. Through this blog post series, I aim to showcase a more informed and systematic method of conducting comparable sales analysis, ultimately elevating the standards of property valuation practices.

[Image: correlation matrix of Sale Price and the six independent variables]

Dataset and Variables

The dataset behind the correlation matrix above comprises 18 months of home sales from a single town, from January 2023 through June 2024. It is used to value the subject properties as of July 1, 2024.

Sale Price will be the dependent variable in the regression model. One of the six independent variables, "Months Since," represents the number of months between the sale and the valuation date. For instance, a sale in January 2023 will receive a value of 18 (July 2024 minus January 2023), while a sale in June 2024 will be assigned a value of 1. The "Exterior Wall" variable has been effect-coded by centering each category's deviation from the town's median sale price. Bldg Age is a derived variable calculated by subtracting the property's year built from the prediction year, 2024. The other variables are quantitative data obtained from public records. No location variable will be used since all subjects and comps will come from specific neighborhoods within this town.
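The two derived variables can be sketched in a few lines of Python. The function names and the sample sale dates are my own illustrations, not part of the dataset; only the valuation date and the prediction year come from the description above.

```python
from datetime import date

VALUATION_DATE = date(2024, 7, 1)  # valuation date stated in the post
PREDICTION_YEAR = 2024             # prediction year stated in the post


def months_since(sale_date, valuation_date=VALUATION_DATE):
    """Calendar months between the sale month and the valuation month."""
    return (valuation_date.year - sale_date.year) * 12 + (
        valuation_date.month - sale_date.month
    )


def bldg_age(year_built, prediction_year=PREDICTION_YEAR):
    """Building age in years as of the prediction year."""
    return prediction_year - year_built


# Hypothetical sales, used only to illustrate the arithmetic.
print(months_since(date(2023, 1, 15)))  # January 2023 sale -> 18
print(months_since(date(2024, 6, 20)))  # June 2024 sale -> 1
print(bldg_age(1998))                   # built in 1998 -> 26
```

Counting by calendar month rather than by exact days keeps every sale in the same month on the same value, which matches the January 2023 and June 2024 examples above.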

Analysis

Looking at the correlation matrix, we observe correlations between Sale Price and the independent variables that range from weak to strong.

  • A moderate positive correlation (0.4117) between Land SF and Sale Price is expected, as larger lots tend to be associated with higher-priced homes.
  • As expected, a weak negative correlation (-0.2033) exists between Bldg Age and Sale Price, which aligns with the general understanding that older buildings tend to be less expensive than newer ones in the real estate market.
  • A strong positive correlation (0.7780) exists between Heated SF and Sale Price, as expected, since larger buildings tend to fetch higher prices.
  • There is a moderate positive correlation (0.5123) between Bathrooms and Sale Price, as expected.
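For readers who want to reproduce a matrix like this one, here is a minimal sketch. It assumes the matrix reports Pearson coefficients (the post does not say which coefficient was used), and the five sales in it are made-up numbers chosen only to show the mechanics; they are not the town's data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations for a dict of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy data (NOT the dataset discussed in this post).
sales = {
    "Sale Price": [310000, 455000, 289000, 520000, 398000],
    "Heated SF": [1450, 2200, 1500, 2600, 1900],
    "Bldg Age": [42, 12, 20, 8, 35],
}

matrix = correlation_matrix(sales)
for name, row in matrix.items():
    print(name, {other: round(r, 4) for other, r in row.items()})
```

The first row of the output corresponds to the Sale Price row discussed above; the remaining rows hold the correlations among the independent variables.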

Multicollinearity

Multicollinearity occurs when two or more independent variables in a regression model are highly correlated, leading to unstable and unreliable coefficient estimates.

Therefore, when assessing multicollinearity, we are concerned with correlations among the independent variables, not with their correlations with the dependent variable (Sale Price in this case).

Here's how to assess multicollinearity among independent variables:

1. Look for correlations whose absolute value exceeds 0.8, a general guideline for strong correlation.

2. Pay attention to the overall pattern in the correlation matrix. The presence of multiple highly correlated independent variables is a strong indicator of multicollinearity.

Examining the correlation matrix, we can see that there are some moderate correlations among the independent variables:

  • Land SF and Heated SF (0.4659)
  • Heated SF and Bathrooms (0.6174)

The strongest correlation among independent variables is between Heated SF and Bathrooms (0.6174). It is moderate, sits well below the 0.8 guideline, and is not a cause for concern.
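The screening rule can be sketched as a simple filter. The two correlations are the ones reported above; the function names are hypothetical, and a full check would include every pair of independent variables in the matrix, not just these two.

```python
THRESHOLD = 0.8  # guideline for strong correlation mentioned in the post

# Pairwise correlations among independent variables reported in the post.
reported = {
    ("Land SF", "Heated SF"): 0.4659,
    ("Heated SF", "Bathrooms"): 0.6174,
}


def flag_high_pairs(pairs, threshold=THRESHOLD):
    """Return the variable pairs whose absolute correlation exceeds the threshold."""
    return [pair for pair, r in pairs.items() if abs(r) > threshold]


def strongest_pair(pairs):
    """Return the (pair, correlation) entry with the largest absolute correlation."""
    return max(pairs.items(), key=lambda item: abs(item[1]))


print(flag_high_pairs(reported))  # -> []
print(strongest_pair(reported))   # -> (('Heated SF', 'Bathrooms'), 0.6174)
```

Using the absolute value matters because a strong negative correlation between two independent variables is just as problematic as a strong positive one.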

It's important to note that there is no one-size-fits-all answer to dealing with multicollinearity. The best approach depends on your specific data and research question.

Important to Know

Here are some ways to address multicollinearity:

  • Drop one of the highly correlated variables: This is a simple solution, but it can also remove valuable information from the model. Before dropping a variable, carefully consider which variable is less critical to your analysis.
  • Combine the correlated variables into a single variable: If the correlated variables represent the same underlying concept, you can create a new variable that combines them. For example, you could create a new variable for the house's total square footage (Heated SF + basement SF).
  • Use Ridge Regression: This regression technique can reduce the impact of multicollinearity on model coefficients.
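To make the last option more concrete, here is a rough sketch of ridge regression for just two standardized predictors. It assumes a model with only Heated SF and Bathrooms, both standardized along with Sale Price, and a penalty scaled so that the coefficients solve (R + lambda * I) * beta = r_y, where R is the predictor correlation matrix and r_y holds each predictor's correlation with Sale Price. The correlations come from the matrix above; the penalty value of 0.1 is an arbitrary illustration, not a recommendation, and this is not the model built in Part 2.

```python
def ridge_two_predictors(r12, r1y, r2y, lam):
    """Standardized ridge coefficients for a two-predictor model.

    Solves (R + lam * I) beta = r_y in closed form, where
    R = [[1, r12], [r12, 1]] and r_y = [r1y, r2y].
    With lam = 0 this is ordinary least squares on standardized data.
    """
    a = 1.0 + lam
    det = a * a - r12 * r12
    beta1 = (a * r1y - r12 * r2y) / det
    beta2 = (a * r2y - r12 * r1y) / det
    return beta1, beta2


# Correlations reported in the post.
R_HEATED_BATH = 0.6174   # Heated SF vs. Bathrooms
R_HEATED_PRICE = 0.7780  # Heated SF vs. Sale Price
R_BATH_PRICE = 0.5123    # Bathrooms vs. Sale Price

ols = ridge_two_predictors(R_HEATED_BATH, R_HEATED_PRICE, R_BATH_PRICE, lam=0.0)
ridge = ridge_two_predictors(R_HEATED_BATH, R_HEATED_PRICE, R_BATH_PRICE, lam=0.1)

print("OLS   (lam=0.0):", [round(b, 4) for b in ols])
print("Ridge (lam=0.1):", [round(b, 4) for b in ridge])
```

The penalty shrinks the overall size of the coefficient vector, which is what stabilizes the estimates when predictors overlap. In practice the penalty would be chosen by cross-validation on the full six-variable model.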

It is important to note that multicollinearity may still be a concern even if correlation coefficients are not extremely high, especially in small sample sizes. In such a scenario, it would be advisable to proceed with fitting the regression model and checking additional diagnostics, such as variance inflation factors (VIFs), to further assess multicollinearity and ensure the stability of the regression estimates.
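As a preview of that diagnostic, the VIF for each independent variable equals the corresponding diagonal entry of the inverse of the correlation matrix of the independent variables. The sketch below applies that identity to Heated SF and Bathrooms only, using the 0.6174 correlation reported above. The helper names are my own, and the VIFs from the full six-variable model would differ because they account for all the other variables at once.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Build the augmented matrix [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(corr):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(corr))]


# Two-variable illustration: Heated SF and Bathrooms (r = 0.6174 from the post).
corr = [
    [1.0, 0.6174],
    [0.6174, 1.0],
]
print([round(v, 2) for v in vifs(corr)])  # both equal 1 / (1 - 0.6174**2), about 1.62
```

With only two variables the result reduces to 1 / (1 - r^2), so a correlation of 0.6174 inflates the coefficient variance by roughly 62 percent relative to uncorrelated predictors.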

Conclusion

Examining the correlation matrix before running a regression model is a common and beneficial practice. This preliminary step offers several advantages:

  • Understanding variable relationships: The correlation matrix reveals the strength and direction of relationships between the dependent variable (e.g., Sale Price) and each independent variable, as well as among the independent variables. This information helps analysts identify which independent variables are most strongly associated with the target variable.
  • Identifying potential multicollinearity: Multicollinearity arises when independent variables are highly correlated, which can complicate the interpretation of regression coefficients and lead to unstable estimates. The correlation matrix helps identify potential multicollinearity, which can be further investigated with diagnostics such as the Variance Inflation Factor (VIF).
  • Variable selection: While correlation alone shouldn't be the sole criterion for choosing variables, it is a helpful starting point. Strong correlations between independent variables and the dependent variable suggest they might be significant predictors for inclusion in the model.
  • Guiding further analysis: The correlation matrix can highlight unexpected relationships or outliers that warrant further investigation, leading to a more nuanced understanding of the data and potentially improving the final regression model.

In conclusion, examining the correlation matrix is a simple yet powerful technique for gaining valuable insights into the data before running a regression model. This preliminary analysis helps build a stronger foundation for the analysis and potentially avoids issues with multicollinearity or misleading results.

Coming Soon: Part 2 of 3 – Regression modeling to help develop the adjustment matrix.
