Wednesday, June 5, 2024

Representative Sales Sampling: The Not-So-Secret Ingredient of AVM Success

Target Audience: New Graduates/Analysts

When constructing an automated valuation model (AVM), the sales sample used in modeling must accurately mirror the population from which it is drawn. A representative sample is a key factor in the AVM's reliability. To achieve this, analysts can employ several methods:

1. Random Sampling: Random sampling gives each property in the population an equal chance of being included in the sample. This reduces selection bias and makes a representative sample more likely, which gives analysts more confidence in the data they are working with.

2. Stratified Sampling: If the population can be divided into different strata (such as different property types or locations), stratified sampling can be used to ensure that each stratum is represented proportionally in the sample.

3. Cluster Sampling: In cases where the population is divided into clusters (such as neighborhoods), analysts can use cluster sampling to select clusters randomly and include all properties within those clusters in the sample.

4. Weighting: Analysts can apply weighting techniques to adjust the influence of different properties in the sample so that it better reflects the overall population. This is particularly useful when specific population segments are underrepresented in the sample.

5. Validation and Calibration: Analysts can validate the AVM by comparing the model's estimates to actual sale prices for a separate set of properties. When significant discrepancies arise, the model can be recalibrated to improve its accuracy.

Employing these techniques and ensuring that the sales sample represents the population can help analysts create a more robust and reliable model.
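As a rough illustration of the first two methods, the sketch below draws a simple random sample and a proportional stratified sample from a made-up population. The record layout, the `property_type` field, and the function names are assumptions chosen for the example, not part of any particular AVM toolkit.

```python
import random
from collections import defaultdict

def simple_random_sample(records, fraction, seed=0):
    """Give every record the same chance of selection."""
    rng = random.Random(seed)
    k = round(len(records) * fraction)
    return rng.sample(records, k)

def stratified_sample(records, stratum_key, fraction, seed=0):
    """Draw the same fraction from every stratum so proportions are preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[stratum_key]].append(rec)
    sample = []
    for name in sorted(strata):  # sorted for a reproducible draw order
        members = strata[name]
        k = round(len(members) * fraction)
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 single-family homes, 300 condos, 100 townhouses.
population = (
    [{"id": i, "property_type": "single_family"} for i in range(600)]
    + [{"id": 600 + i, "property_type": "condo"} for i in range(300)]
    + [{"id": 900 + i, "property_type": "townhouse"} for i in range(100)]
)

sample = stratified_sample(population, "property_type", fraction=0.10, seed=42)
counts = defaultdict(int)
for rec in sample:
    counts[rec["property_type"]] += 1
print(dict(counts))  # each type keeps its 60/30/10 share of the sample
```

A simple random sample of the same size would only approximate these shares. The stratified draw fixes them by construction, which is the trade-off to weigh when strata are small.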

Data Selection and Stratification

Geographic location: The sample should reflect the population's geographic distribution, which is especially important when property values vary significantly across areas.

Property characteristics: Analysts should choose a sample whose properties resemble the population of interest in land and building size, age, property type, and amenities. One way to do this is to categorize the data by these characteristics and ensure that each category is proportionally represented in the final sample.

Time: Analysts must ensure that the sales sample reflects current market conditions. They can either restrict the sample to a timeframe in which the market was stable or apply time adjustments to the sale prices.

Sample Size: The sample should be large enough to provide reliable estimates of property values. How large that is depends on the model's complexity and the variability of the data.

By following these steps, analysts can increase the likelihood that the sales sample used in an AVM accurately represents the population from which it is derived, resulting in more accurate property valuations.
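One way to check proportional representation is to compare each category's share of the sales sample with its share of the population. The following is a minimal sketch under assumed inputs: the `area` field and the counts are invented for illustration.

```python
from collections import Counter

def stratum_shares(records, key):
    """Fraction of records falling in each category of `key`."""
    counts = Counter(rec[key] for rec in records)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

def representation_gaps(population, sample, key):
    """Sample share minus population share for each category."""
    pop = stratum_shares(population, key)
    smp = stratum_shares(sample, key)
    return {name: smp.get(name, 0.0) - share for name, share in pop.items()}

# Hypothetical county: 50% north, 30% south, 20% east.
population = [{"area": "north"}] * 500 + [{"area": "south"}] * 300 + [{"area": "east"}] * 200
# Hypothetical sales sample that over-represents the north.
sales = [{"area": "north"}] * 70 + [{"area": "south"}] * 20 + [{"area": "east"}] * 10

gaps = representation_gaps(population, sales, "area")
for area, gap in sorted(gaps.items()):
    print(f"{area}: {gap:+.2f}")
```

A positive gap marks a category that is over-represented in the sales sample and a negative gap marks one that is under-represented. What size of gap is acceptable is a judgment call for the analyst.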

Example

Splitting a Sales Sample between Modeling (80%) and Holdout (20%) Samples

Dataset

An AVM will be developed using a sample of 11,767 home sales from a specific county. So that the model can be validated on sales it has not seen, the sample is divided into modeling and holdout sets with an 80%/20% split. The table above displays the median sale prices and other key independent variables for both samples and the original (pre-split) sample.

Analysis

Analysts need to consider whether there are discernible patterns or biases in the data distribution between the modeling and holdout samples to assess the randomness of the split. In this case, the split has been implemented to maintain the overall characteristics of the entire dataset.

1. Count: The count of observations in the modeling and holdout samples reflects the intended 80/20 split, which confirms that the split was executed as planned.

2. Sale Price: The median sale prices in the modeling and holdout samples, $395,000 and $394,500, respectively, are remarkably close to each other and to the pre-split dataset's median. This negligible difference, likely due to random chance, indicates that the split has not significantly altered the sale price distribution between the two samples.

3. Land SF, Bldg Age, Living SF, Other SF, and Bath Count: The median values of independent variables such as Land SF, Bldg Age, Living SF, Other SF, and Bath Count in the modeling and holdout samples closely mirror those of the pre-split dataset. This similarity, particularly the identical square footage and building age, underscores the split's success in maintaining similar distributions of these variables in both samples.

As for the effectiveness of the split, an 80/20 split is a commonly used ratio in the AVM world. In this case, it provides a good balance between having enough data for developing the model (modeling sample) and having enough data to evaluate the model's performance (holdout sample).

In conclusion, the split appears random and effective. It maintains the dataset's overall distribution and characteristics in both the modeling and holdout samples, enabling the development of a robust, accurate AVM that can be validated on the holdout sample.
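The same kind of check can be scripted. The sketch below performs a random 80/20 split and prints the medians of each subset next to the pre-split medians. It runs on synthetic data generated for the example, not on the county dataset discussed here, and the field names are assumptions.

```python
import random
from statistics import median

def split_sample(records, holdout_fraction=0.20, seed=0):
    """Shuffle a copy of the records and cut it into (modeling, holdout)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def median_profile(records, fields):
    """Median of each listed field across the records."""
    return {f: median(rec[f] for rec in records) for f in fields}

# Synthetic sales, for illustration only.
rng = random.Random(1)
sales = [
    {
        "sale_price": rng.randint(250_000, 550_000),
        "living_sf": rng.randint(900, 3_200),
        "bldg_age": rng.randint(0, 80),
    }
    for _ in range(1_000)
]

modeling, holdout = split_sample(sales, holdout_fraction=0.20, seed=7)
fields = ["sale_price", "living_sf", "bldg_age"]
for label, subset in [("pre-split", sales), ("modeling", modeling), ("holdout", holdout)]:
    print(label, len(subset), median_profile(subset, fields))
```

If the medians of the two subsets drift far from the pre-split medians, trying a different seed or a stratified split is a reasonable next step.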

Sales Sampling in Mass Appraisal

Analysts should carefully consider the implications of using stratified or cluster sampling when developing an AVM to generate a countywide assessment roll to ensure fairness and equity. While these sampling techniques can help ensure a more representative sample and improve the model's accuracy, they can also introduce breaks in values along the stratification or cluster lines, such as towns or neighborhoods within the county.

When creating a countywide assessment roll, breaks in values along stratification or cluster lines may not be desirable, potentially leading to inconsistencies or disparities in property assessments within the same jurisdiction. This, in turn, can potentially result in unfair treatment or unequal tax burdens for property owners in different areas, a situation we all strive to avoid.

Therefore, analysts may need to balance the benefits of using stratified or cluster sampling to achieve a more representative sample with the need to maintain fairness and equity in the assessment roll. They may need to carefully evaluate the potential impact of these sampling techniques on the overall assessment results and consider alternative sampling approaches or weighting methods to address breaks in values along stratification or cluster lines, while still ensuring accuracy and fairness in the assessment roll.

Ultimately, the goal should be to develop an AVM that provides accurate property values while maintaining fairness and equity in the assessment process, taking into account the unique characteristics and challenges of the local jurisdiction.
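As one example of a weighting method, post-stratification weights scale each sale so that every stratum counts in proportion to its share of the population, without fitting separate models per stratum. This is a minimal sketch with invented town names and counts, and the function name is hypothetical.

```python
def poststratification_weights(population_counts, sample_counts):
    """Weight per stratum = population share / sample share."""
    pop_total = sum(population_counts.values())
    smp_total = sum(sample_counts.values())
    weights = {}
    for stratum, pop_n in population_counts.items():
        smp_n = sample_counts.get(stratum, 0)
        if smp_n == 0:
            # A stratum with no sales cannot be fixed by reweighting.
            raise ValueError(f"No sales in stratum {stratum!r}; cannot weight it.")
        weights[stratum] = (pop_n / pop_total) / (smp_n / smp_total)
    return weights

# Hypothetical parcel counts by town and the sales observed in each.
population_counts = {"town_a": 5_000, "town_b": 3_000, "town_c": 2_000}
sample_counts = {"town_a": 700, "town_b": 200, "town_c": 100}

weights = poststratification_weights(population_counts, sample_counts)
for town, w in weights.items():
    print(f"{town}: weight {w:.3f}")
```

Sales from the over-represented town receive a weight below 1 and sales from the under-represented towns receive a weight above 1. The weights can then be passed to any estimation routine that accepts per-observation weights.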

Sales Sampling in the Private Sector

In the private sector, particularly for AVM vendors or companies focused on developing automated valuation models for commercial purposes, the primary goal is often to produce the most accurate property values possible. In this context, the vendor may prioritize accuracy and modeling performance over concerns about value breaks along a county's stratification or cluster lines.

Therefore, private-sector AVM vendors may use stratified or cluster sampling techniques to develop their models. These methods often lead to more precise and reliable valuations by ensuring a more representative population sample. By using these sampling techniques, the model can capture nuances and variations in property values across market segments, leading to more accurate assessments overall.

While the focus in the private sector may lean toward accuracy and model performance, AVM vendors need to be transparent about their sampling methods and the potential implications of breaks in values across stratification or cluster boundaries. Additionally, vendors should strive to continuously validate and refine their models to ensure they are providing reliable and fair property valuations that meet the needs of their clients and stakeholders.

Ultimately, the decision to use stratified or cluster sampling for AVM development in the private sector may depend on the vendor's specific objectives and the desired balance among accuracy, fairness, and market representation in the valuation process.

Crossover – When a Private Vendor Sells to the Public Sector

Suppose a private AVM vendor contracts to sell values to a County Assessor to assist in producing an assessment roll. In that case, it may be beneficial for the vendor to recalibrate its model using a top-line representative random sample. Recalibrating the model ensures that the AVM's estimates align more closely with the specific market conditions and property characteristics in the county for which the assessment roll is being generated.

Recalibrating on a top-line, representative random sample lets the AVM vendor address bias or inaccuracies in the original model that may have arisen from different data sources or methodologies. Because the sample is specific to the county where the assessment roll will be used, the recalibration can significantly enhance the accuracy and reliability of the property valuations provided to the County Assessor.

Additionally, recalibrating the model allows the vendor to tailor the AVM's estimates to reflect the local real estate market's unique characteristics and trends, improving the relevance and usefulness of the valuation results for assessment purposes.
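To make the idea concrete, the sketch below runs a small sales ratio check on a county sample and derives a single adjustment factor that would bring the median ratio to 1.0. This is only the simplest possible form of recalibration, assumed here for illustration. A real recalibration would more likely refit model coefficients. All figures and function names are invented.

```python
from statistics import mean, median

def sales_ratios(estimates, sale_prices):
    """Ratio of each AVM estimate to the matching sale price."""
    return [est / price for est, price in zip(estimates, sale_prices)]

def coefficient_of_dispersion(ratios):
    """Average absolute deviation from the median ratio, as a percent of the median."""
    med = median(ratios)
    return 100 * mean(abs(r - med) for r in ratios) / med

def median_recalibration_factor(ratios, target=1.0):
    """Multiplier that moves the median ratio to the target level."""
    return target / median(ratios)

# Hypothetical vendor estimates and county sale prices.
estimates = [180_000, 285_000, 380_000, 420_000, 550_000]
sale_prices = [200_000, 300_000, 400_000, 400_000, 500_000]

ratios = sales_ratios(estimates, sale_prices)
factor = median_recalibration_factor(ratios)
print(f"median ratio: {median(ratios):.3f}")
print(f"COD: {coefficient_of_dispersion(ratios):.2f}")
print(f"adjustment factor: {factor:.4f}")
```

A uniform factor corrects the overall level but leaves the dispersion unchanged, so a high coefficient of dispersion would still call for a deeper recalibration.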

Collaboration is Key

The AVM vendor and County Assessor, both key players in this process, can collaborate to ensure the recalibration process is effective. This would benefit not only the AVM but also the accuracy of property assessments.

Sharing Data: The County Assessor may have access to additional data on property characteristics and sales that could improve the randomness and representativeness of the sample.

Transparency: The vendor should be transparent about their sampling methods and the potential for bias in their original model, allowing the assessment staff to assess the need for recalibration and its effectiveness.

Overall, recalibrating the AVM using a top-line representative random sample can help ensure that the property values generated by the AVM align with the County Assessor's specific needs and requirements and contribute to producing a fair and accurate assessment roll.

Using Artificial Intelligence (AI) in Sales Sampling

Artificial Intelligence (AI) can play a crucial role in ensuring that the modeling sales sample used in an AVM represents the population it derives from in both the private and public sectors. Here are some ways in which AI can help achieve a more representative sales sample:

1. Advanced Sampling Techniques: AI algorithms can optimize sampling techniques, such as random sampling, stratified sampling, or cluster sampling, to ensure that the sales sample is representative of the population. AI can help identify patterns in the data and select sample points that reflect the diversity of the overall population.

2. Feature Selection: AI can assist in identifying the most relevant features or variables to include in the sales sample, ensuring that the selected properties capture the key characteristics of the population. Feature selection algorithms can help prioritize variables that have the most significant impact on property values.

3. Data Quality Assessment: AI can analyze the quality of the data used in the sales sample, identifying and addressing any biases, errors, or missing values that could impact the sample's representativeness. AI algorithms can flag data inconsistencies and suggest corrective actions to improve data quality.

4. Model Validation and Calibration: AI can be used to validate AVM models against actual sales data and adjust model parameters through calibration to improve accuracy and ensure that the model reflects the population's actual characteristics. This iterative process helps refine the AVM and enhance its representativeness.

5. Dynamic Sampling Strategies: AI can enable dynamic sampling strategies that adapt to changes in the real estate market or population characteristics over time. By continuously monitoring data trends and adjusting the sampling approach accordingly, AI can help maintain the relevance and representativeness of the sales sample.

6. Fairness and Bias Mitigation: AI can be leveraged to detect and mitigate biases in the modeling sales sample, ensuring that the sample is fair and equitable. Fairness-aware machine learning algorithms can help address bias issues and promote inclusivity in the sample selection.

By incorporating AI-driven approaches into sampling, data processing, model optimization, and validation, AVMs in the private and public sectors can enhance the representativeness of the sales sample used in modeling, leading to more accurate and reliable property valuations that better reflect the target population's characteristics.
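The data quality step does not have to start with a learned model. As a simple rule-based baseline that an AI-driven screen could later replace or extend, the sketch below flags sales with missing fields or with prices outside an interquartile-range fence. The field names, the 1.5 multiplier, and the function name are assumptions made for the example.

```python
from statistics import quantiles

def flag_sales(records, required_fields, price_field="sale_price", k=1.5):
    """Return {record index: [reasons]} for sales that need review."""
    prices = [r[price_field] for r in records if r.get(price_field) is not None]
    q1, _, q3 = quantiles(prices, n=4)  # quartile cut points
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    flags = {}
    for idx, rec in enumerate(records):
        reasons = [f"missing {f}" for f in required_fields if rec.get(f) is None]
        price = rec.get(price_field)
        if price is not None and not (low <= price <= high):
            reasons.append("price outside IQR fence")
        if reasons:
            flags[idx] = reasons
    return flags

# Hypothetical sales: one record lacks living area, one has an extreme price.
records = [
    {"sale_price": 300_000, "living_sf": 1_500},
    {"sale_price": 310_000, "living_sf": 1_600},
    {"sale_price": 320_000, "living_sf": None},
    {"sale_price": 330_000, "living_sf": 1_700},
    {"sale_price": 340_000, "living_sf": 1_800},
    {"sale_price": 350_000, "living_sf": 1_900},
    {"sale_price": 360_000, "living_sf": 2_000},
    {"sale_price": 5_000_000, "living_sf": 2_100},
]

flags = flag_sales(records, required_fields=["sale_price", "living_sf"])
for idx, reasons in flags.items():
    print(idx, reasons)
```

Flagged sales are candidates for review rather than automatic removal, since an extreme price can be a valid sale of an unusual property.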
