Target Audience: New Graduates/Analysts
When constructing an AVM, the sales sample used in modeling must accurately mirror the population it is drawn from, because the reliability of the AVM depends on it. To achieve this, analysts can employ several methods:
1. Random Sampling: Analysts can use random sampling
techniques, ensuring that each property in the population has an equal chance
of being included in the sample. This reduces bias and ensures a more
representative sample, giving them confidence in the data they are working
with.
2. Stratified Sampling: If the population can be divided into different
strata (such as different property types or locations), they can use stratified
sampling to ensure that each stratum is represented proportionally in the
sample.
3. Cluster Sampling: In cases where the population is divided into
clusters (such as neighborhoods), they can use cluster sampling to select
clusters randomly and include all properties within those clusters in the
sample.
4. Weighting: They can apply weighting techniques to adjust the influence
of different properties in the sample to better reflect the overall population,
which is particularly useful when specific population segments are
underrepresented in the sample.
5. Validation and Calibration: They can validate the AVM by comparing the
model's estimates to actual sales prices for a separate set of properties. If
there are significant discrepancies, the model can then be calibrated to
improve its accuracy.
Employing these techniques and ensuring that the sales sample represents the population can help analysts create a more robust and
reliable model.
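The first three methods can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production routine. The record layout, the field names (property_type, neighborhood), and the 1,000-parcel population are all hypothetical and were invented for the example.

```python
import random
from collections import defaultdict

def simple_random_sample(records, n, seed=0):
    """Random sampling: every record has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(records, n)

def stratified_sample(records, key, fraction, seed=0):
    """Stratified sampling: draw the same fraction from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[rec[key]].append(rec)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

def cluster_sample(records, key, n_clusters, seed=0):
    """Cluster sampling: pick whole clusters at random, keep every record in them."""
    rng = random.Random(seed)
    clusters = sorted({rec[key] for rec in records})
    chosen = set(rng.sample(clusters, n_clusters))
    return [rec for rec in records if rec[key] in chosen]

# Hypothetical population: 1,000 parcels, three property types, ten neighborhoods.
population = [
    {
        "id": i,
        "property_type": ("single_family", "condo", "townhouse")[i % 3],
        "neighborhood": f"N{i % 10}",
    }
    for i in range(1000)
]

srs = simple_random_sample(population, 100)
strat = stratified_sample(population, "property_type", 0.10)
clus = cluster_sample(population, "neighborhood", 2)
print(len(srs), len(strat), len(clus))
```

Fixing the seed makes each draw reproducible, which helps when a sample has to be documented or audited later.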
Data Selection and Stratification
Geographic location: The sample should represent the
geographic distribution of the population, which is especially important if
there are significant variations in property values across different areas.
Property characteristics: Analysts should choose a sample that
includes properties with characteristics similar to those of the population of
interest, including factors such as land and building sizes, age, property
type, and amenities, which can be achieved by categorizing the data based on
these characteristics and ensuring that each category is proportionally
represented in the final sample.
Time: Analysts must ensure that the data used
for creating the sales sample reflects current market conditions. To do so,
they can choose a specific timeframe representing a stable market or consider
making time-series adjustments.
Sample Size: The size of the modeling sample should
be large enough to provide reliable estimates of property values, and this will
depend on the model's complexity and the variability of the data.
By following these steps, analysts can
improve the likelihood that the sales sample used in an AVM accurately
represents the population it is derived from, resulting in more accurate
property valuations.
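One simple way to check proportional representation is to compare each category's share of the sales sample with its share of the population, and to derive a weight where the two differ. The sketch below assumes a hypothetical property_type field and made-up counts. The weight formula (population share divided by sample share) is one common post-stratification approach, offered here as an illustration rather than as a prescribed method.

```python
from collections import Counter

def shares(records, key):
    """Return each category's share of the records."""
    counts = Counter(rec[key] for rec in records)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}

def post_stratification_weights(population, sample, key):
    """Weight per category = population share / sample share."""
    pop = shares(population, key)
    smp = shares(sample, key)
    return {cat: pop[cat] / smp[cat] for cat in smp}

# Hypothetical data: condos are 30% of all parcels but only 20% of the sales.
population = [{"property_type": "single_family"}] * 700 + [{"property_type": "condo"}] * 300
sales = [{"property_type": "single_family"}] * 80 + [{"property_type": "condo"}] * 20

weights = post_stratification_weights(population, sales, "property_type")
for cat, w in sorted(weights.items()):
    print(f"{cat}: weight {w:.3f}")
```

A weight above 1 marks a category that is underrepresented in the sales sample. A category that is missing from the sample entirely cannot be repaired by weighting and would need additional sales.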
Example
Splitting a Sales Sample between Modeling (80%) and Holdout (20%) Samples
Dataset
An AVM will be developed using a sample of 11,767 home sales
from a specific county. So that the model can be tested on sales it has not
seen, the sample is divided into modeling and holdout samples, with an 80% and
20% split, respectively. The above table displays the median sale prices and
other key independent variables for both samples and the original (pre-split)
sample.
Analysis
Analysts need to consider whether there are any discernible
patterns or biases in the data distribution between the modeling and holdout
samples to assess the split's randomness. In this case, the split has been
implemented to maintain the overall characteristics of the entire dataset.
1. Count: The count of observations in the modeling and
holdout samples reflects the intended 80-20 split, which confirms that the
split was executed as planned.
2. Sale Price: The median sale prices in the modeling and holdout
samples, $395,000 and $394,500, respectively, are remarkably close to each
other and to the pre-split dataset's median. This negligible difference, likely due to random chance, indicates that the split has not significantly
altered the sale price distribution between the two samples.
3. Land SF, Bldg Age, Living SF, Other SF, and Bath Count:
The median values of independent variables such as Land SF, Bldg Age,
Living SF, Other SF, and Bath Count in the modeling and holdout samples closely
mirror those of the pre-split dataset. This similarity, particularly the
identical land square feet and building age, underscores the split's success in
maintaining similar distributions of these variables in both samples.
As for the effectiveness of the split, an 80/20 split is a
commonly used ratio in the AVM world. In this case, it provides a good balance
between having enough data for developing the model (modeling sample) and
having enough data to evaluate the model's performance (holdout sample).
In conclusion, the split is random and effective. It maintains the
overall distribution and characteristics of the dataset in both the modeling
and holdout samples, allowing for the development of a robust and accurate AVM
that can be validated using the holdout sample.
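A random 80/20 split and the median comparison described in this example could be sketched as follows. The sales records here are synthetic and generated only so the code runs; they are not the county data discussed above, and the field names (sale_price, living_sf, bldg_age) are assumptions. Only the sample size of 11,767 is taken from the example.

```python
import random
from statistics import median

def split_sample(records, holdout_fraction=0.20, seed=42):
    """Shuffle a copy of the records and split it into (modeling, holdout)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def median_by_field(records, fields):
    """Median of each listed field across the records."""
    return {f: median(rec[f] for rec in records) for f in fields}

# Synthetic stand-in for the sales file (NOT the actual county data).
gen = random.Random(1)
sales = [
    {
        "sale_price": round(gen.lognormvariate(12.9, 0.4)),
        "living_sf": gen.randint(900, 4000),
        "bldg_age": gen.randint(0, 90),
    }
    for _ in range(11767)
]

modeling, holdout = split_sample(sales)
fields = ["sale_price", "living_sf", "bldg_age"]
for name, part in [("pre-split", sales), ("modeling", modeling), ("holdout", holdout)]:
    print(name, len(part), median_by_field(part, fields))
```

If the medians of the two samples sit close to each other and to the pre-split medians, the split has preserved the distributions in the way the analysis above describes. A large gap would be a reason to inspect the data or re-draw the split.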
Sales Sampling in Mass Appraisal
Analysts should carefully consider the implications of using
stratified or cluster sampling when developing an AVM to generate a countywide
assessment roll to ensure fairness and equity. While these sampling techniques
can help ensure a more representative sample and improve the model's accuracy,
they can also introduce breaks in values along the stratification or cluster
lines, such as towns or neighborhoods within the county.
When creating a countywide assessment roll, breaks in values along
stratification or cluster lines are generally undesirable because they could
lead to inconsistencies or disparities in property assessments within the same
jurisdiction. This, in turn, can potentially result in unfair treatment or
unequal tax burdens for property owners in different areas, a situation we all
strive to avoid.
Therefore, analysts may need to balance the benefits of using
stratified or cluster sampling for a more representative sample and the need to
maintain fairness and equity in the assessment roll. They may need to carefully
evaluate the potential impact of these sampling techniques on the overall
assessment results and consider alternative sampling approaches or weighting
methods that can address the issue of breaks in values along stratification or
cluster lines while still ensuring accuracy and fairness in the assessment
roll.
Ultimately, the goal should be to develop an AVM that provides
accurate property values and maintains fairness and equity in the assessment
process, considering the unique characteristics and challenges of the local
jurisdiction.
Sales Sampling in the Private Sector
In the private sector, particularly for AVM vendors or companies
focused on developing automated valuation models for commercial purposes, the
primary goal is often to produce the most accurate property values possible. In
this context, the vendor may prioritize accuracy and modeling performance over
concerns about value breaks along a county's stratification or cluster lines.
Therefore, private-sector AVM vendors may use
stratified or cluster sampling techniques to develop their models. These
methods often lead to more precise and reliable valuations by ensuring a
more representative population sample. By utilizing these sampling
techniques, the model can potentially capture the nuances and variations in
property values across different segments of the market, leading to more
accurate assessments overall.
While the focus in the private sector may lean toward accuracy and
model performance, AVM vendors need to be transparent about their sampling
methods and any potential implications of breaks in values along stratification
or cluster lines. Additionally, vendors should strive to continuously validate
and refine their models to ensure they are providing reliable and fair property
valuations that meet the needs of their clients and stakeholders.
Ultimately, the decision to use stratified or cluster sampling in AVM
development in the private sector may depend on the specific objectives of the
vendor and the desired balance between accuracy, fairness, and market
representation in the valuation process.
Crossover – When a Private Vendor Sells to the Public Sector
If a private AVM vendor contracts to sell values to a County Assessor to
assist in producing an assessment roll, it may be beneficial for the vendor to
recalibrate its model using a top-line representative random sample.
Recalibrating the model ensures that the AVM's estimates align more closely
with the specific market conditions and property characteristics in the county
for which the assessment roll is being generated.
By using a top-line representative random sample, the AVM vendor can
address potential bias or inaccuracies in the original model that may have
arisen from using different data sources or methodologies. Because the
recalibration sample is specific to the county where the assessment roll will
be used, the vendor can significantly enhance the accuracy and reliability of
the property valuations provided to the County Assessor.
Additionally, recalibrating the model allows the vendor to tailor the
AVM's estimates to reflect the local real estate market's unique
characteristics and trends, improving the valuation results' relevance and
usefulness for assessment purposes.
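As a rough illustration of validation followed by recalibration, the sketch below compares estimates with sale prices in a county sample and then applies a single scaling factor. This is the crudest possible form of recalibration and is an assumption made for the example; a real recalibration would more likely re-estimate the model's coefficients on the county sample. The prices and estimates are hypothetical.

```python
from statistics import median

def sale_ratios(estimates, sale_prices):
    """Estimate-to-sale-price ratio for each property."""
    return [est / price for est, price in zip(estimates, sale_prices)]

def median_ape(estimates, sale_prices):
    """Median absolute percentage error of the estimates."""
    return median(abs(est - price) / price for est, price in zip(estimates, sale_prices))

def recalibration_factor(estimates, sale_prices):
    """Scaling factor that brings the median estimate-to-price ratio to 1.0."""
    return 1.0 / median(sale_ratios(estimates, sale_prices))

# Hypothetical county sample in which the vendor's estimates run 10% low.
sale_prices = [300_000, 350_000, 400_000, 450_000, 500_000]
estimates = [270_000, 315_000, 360_000, 405_000, 450_000]

factor = recalibration_factor(estimates, sale_prices)
recalibrated = [est * factor for est in estimates]

print("factor:", round(factor, 3))
print("median APE before:", round(median_ape(estimates, sale_prices), 3))
print("median APE after:", round(median_ape(recalibrated, sale_prices), 3))
```

In this toy case every estimate is low by the same proportion, so one factor removes the error completely. Real discrepancies usually vary by location and property type, which is why the choice of a representative county sample matters.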
Collaboration is Key
The AVM vendor and County Assessor, both key players
in this process, can collaborate to ensure the recalibration process is
effective. This would benefit not only the AVM but also the accuracy of property
assessments.
Sharing Data: The County Assessor might have access to additional data on property
characteristics and sales that can help improve the randomness and
representativeness of the sample.
Transparency: The vendor should be transparent about their sampling methods
and the potential for bias in their original model, allowing the assessment
staff to assess the need for recalibration and its effectiveness.
Overall, recalibrating the AVM using a
top-line representative random sample can help ensure that the property values
generated by the AVM align with the County Assessor's specific needs and
requirements and contribute to producing a fair and accurate assessment roll.
Using Artificial Intelligence (AI) in Sales Sampling
Artificial Intelligence (AI) can play a crucial role in ensuring
that the modeling sales sample used in an AVM represents the population it
derives from in both the private and public sectors. Here are some ways in
which AI can help achieve a more representative sales sample:
1. Advanced Sampling Techniques: AI algorithms can optimize sampling techniques, such as random sampling, stratified sampling, or
cluster sampling, to ensure that the sales sample is representative of the
population. AI can help identify patterns in the data and select sample points
that reflect the diversity of the overall population.
2. Feature Selection: AI can assist in identifying the most relevant features or
variables to include in the sales sample, ensuring that the selected properties
capture the key characteristics of the population. Feature selection algorithms
can help prioritize variables that have the most significant impact on property
values.
3. Data Quality Assessment: AI can analyze the quality of
the data used in the sales sample, identifying and addressing any biases,
errors, or missing values that could impact the sample's representativeness. AI algorithms can flag data inconsistencies and suggest corrective
actions to improve data quality.
4. Model Validation and Calibration: AI can be used to validate AVM
models against actual sales data and adjust model parameters through
calibration to improve accuracy and ensure that the model reflects the population's actual characteristics. This iterative process helps refine the AVM
and enhance its representativeness.
5. Dynamic Sampling Strategies: AI can enable dynamic sampling
strategies that adapt to changes in the real estate market or population
characteristics over time. By continuously monitoring data trends and adjusting
the sampling approach accordingly, AI can help maintain the relevance and
representativeness of the sales sample.
6. Fairness and Bias Mitigation: AI can be leveraged to detect
and mitigate biases in the modeling sales sample, ensuring that the sample is
fair and equitable. Fairness-aware machine learning algorithms can help address
bias issues and promote inclusivity in the sample selection.
By incorporating AI-driven approaches in sampling, data
processing, model optimization, and validation, AVMs in the private and
public sectors can enhance the representativeness of the modeling sales sample,
leading to more accurate and reliable property valuations that better reflect
the target population's characteristics.
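To make the data quality step concrete, here is a deliberately simple, rule-based stand-in for the kind of flagging described in item 3. It is not an AI method. It only shows the shape of the output such a tool might produce: a list of suspect records with reasons. The field names, the sample records, and the 1.5 x IQR outlier rule are assumptions chosen for illustration.

```python
from statistics import quantiles

def flag_quality_issues(records, required, price_field="sale_price"):
    """Return {record index: [reasons]} for records that look suspect."""
    prices = [r[price_field] for r in records if r.get(price_field) is not None]
    q1, _, q3 = quantiles(prices, n=4)  # quartiles of the observed prices
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flags = {}
    for i, rec in enumerate(records):
        reasons = [f"missing {f}" for f in required if rec.get(f) is None]
        price = rec.get(price_field)
        if price is not None and not (low <= price <= high):
            reasons.append("price outside IQR fences")
        if reasons:
            flags[i] = reasons
    return flags

# Hypothetical sales records: one has no living area, one has an implausible price.
records = [
    {"sale_price": 300_000, "living_sf": 1500},
    {"sale_price": 310_000, "living_sf": 1600},
    {"sale_price": 305_000, "living_sf": None},
    {"sale_price": 320_000, "living_sf": 1700},
    {"sale_price": 5_000, "living_sf": 1550},
    {"sale_price": 315_000, "living_sf": 1650},
    {"sale_price": 295_000, "living_sf": 1450},
    {"sale_price": 330_000, "living_sf": 1800},
]

flags = flag_quality_issues(records, required=["sale_price", "living_sf"])
for idx, reasons in sorted(flags.items()):
    print(idx, reasons)
```

Flagged records are candidates for review rather than automatic removal, since an unusual sale can still be a valid one.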