To effectively allocate marketing efforts, it is essential to identify guests with low, medium, or high purchase propensity. Therefore, a classification model can be developed to assess guests' likelihood of making a purchase.
However, guest behavior changes as guests move through the stages of their journey: prospect, interested, qualified, and purchaser. A separate classification model is therefore built for each stage, so that predictions reflect the behavior typical of that stage.
Data Segmentation:
Data can be segmented by analysis period (pre-COVID vs. post-COVID), guest type (first-time vs. repeat guests), and journey stage (prospect, interested, qualified, purchaser).
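As a minimal sketch of fitting one model per journey stage, the Python below splits a guest table by stage and trains a logistic regression on each part. The column names (stage, visits_30d, repeat_guest, purchased), the helper fit_stage_models, and the synthetic data are assumptions made for illustration; the source does not specify a schema.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stage labels and column names; not taken from a real schema.
STAGES = ["prospect", "interested", "qualified", "purchaser"]

def fit_stage_models(df, features, target="purchased", stage_col="stage"):
    """Fit one logistic regression per journey stage and return them keyed by stage."""
    models = {}
    for stage, part in df.groupby(stage_col):
        if part[target].nunique() < 2:
            continue  # a classifier needs both purchase and no-purchase rows
        # scikit-learn's default penalty here is L2 (ridge).
        models[stage] = LogisticRegression(max_iter=1000).fit(part[features], part[target])
    return models

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 400
guests = pd.DataFrame({
    "stage": np.tile(STAGES, n // len(STAGES)),
    "visits_30d": rng.poisson(3, size=n),
    "repeat_guest": rng.integers(0, 2, size=n),
})
signal = 0.6 * guests["visits_30d"] + guests["repeat_guest"] - 2.5
guests["purchased"] = (signal + rng.normal(size=n) > 0).astype(int)

stage_models = fit_stage_models(guests, ["visits_30d", "repeat_guest"])
```

The same pattern extends to the other segmentation axes (period, guest type) by grouping on more than one column.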
Feature Engineering:
Categorical features with a large number of levels can be consolidated into meaningful categories that correlate with purchasing activity. Clustering algorithms can further reduce feature complexity by grouping similar levels together.
Guest behavior relates to purchase behavior non-linearly. Guest activity features, which are captured from web behavior and from interactions with marketing and sales tactics, can therefore be transformed into frequency and recency buckets using decision trees or DFS. These buckets measure guest engagement and add non-linearity to the predictors.
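One possible way to consolidate levels, sketched here under assumptions, is to compute the purchase rate of each level and cluster the levels on that rate with k-means. The helper consolidate_levels, the column names, and the toy data are hypothetical; the source does not say which clustering algorithm or distance is used.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def consolidate_levels(df, col, target, n_groups=3, seed=0):
    """Group the levels of `col` into `n_groups` clusters with similar purchase rates."""
    rates = df.groupby(col)[target].mean()
    km = KMeans(n_clusters=n_groups, n_init=10, random_state=seed)
    labels = km.fit_predict(rates.to_numpy().reshape(-1, 1))
    # Relabel clusters so that group 0 has the lowest purchase rate.
    order = np.argsort(km.cluster_centers_.ravel())
    rank = {int(old): new for new, old in enumerate(order)}
    mapping = {level: rank[int(lab)] for level, lab in zip(rates.index, labels)}
    return df[col].map(mapping), mapping

# Toy data for illustration only: six channels with three distinct purchase rates.
toy = pd.DataFrame({
    "channel": list("AABBCCDDEEFF"),
    "purchased": [0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1],
})
toy["channel_group"], level_map = consolidate_levels(toy, "channel", "purchased")
```

In practice, levels with very few rows have noisy purchase rates, so they would usually be pooled into an "other" level before clustering.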
E.g.
Guest website visits: how many visits in the 30 days before a purchase are associated with the purchase.
Guest marketing engagements: how many engagements of a particular type occur in the 60 days before a purchase, and their impact on conversions.
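The decision-tree bucketing described above can be sketched as follows: a shallow tree is fit on a single activity feature, and its split thresholds become the bucket edges. The function names, the feature visits_30d, and the toy data are assumptions for illustration, not the author's implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_bucket_edges(x, y, max_buckets=4, min_frac=0.05, seed=0):
    """Learn cut points for one numeric feature with a shallow decision tree."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    tree = DecisionTreeClassifier(
        max_leaf_nodes=max_buckets,
        min_samples_leaf=max(1, int(min_frac * len(x))),
        random_state=seed,
    )
    tree.fit(x, y)
    # Internal nodes carry a feature index >= 0; leaves are marked with a negative value.
    is_split = tree.tree_.feature >= 0
    return np.sort(tree.tree_.threshold[is_split])

def to_buckets(x, edges):
    """Assign each value to a bucket; values <= edges[0] fall in bucket 0."""
    return np.digitize(x, edges, right=True)

# Toy data for illustration only: purchases happen at higher visit counts.
visits_30d = np.array([0, 0, 1, 1, 2, 2, 5, 6, 7, 8, 9, 10])
purchased = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

edges = tree_bucket_edges(visits_30d, purchased)
buckets = to_buckets(visits_30d, edges)
```

The resulting bucket index can be one-hot encoded, which lets a linear model such as logistic regression pick up a non-linear relationship between activity and purchase.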
Additional engineered features include:
Interest Indicators: "High/Low Interest" and "Future Interest" based on guest responses to marketing tactics.
Demographic indices: Household age and income to analyze segmented guest behavior.
Cart behavior: Frequency of visits before and after adding to cart and their correlation with purchase.
Search behavior: Number of searches leading to a purchase.
Feature Selection:
Features are selected if they occur frequently in purchase journeys, correlate strongly with purchase activity, and show high predictive importance. Variability plot analysis prioritizes features whose variance across decile buckets differs between purchase and no-purchase guests.
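The source does not define the variability analysis precisely. One possible reading, sketched below as an assumption, bins each feature into deciles and scores it by how far the purchase rate moves across those deciles, alongside its plain correlation with purchase. The helper decile_purchase_spread, the feature names, and the synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd

def decile_purchase_spread(df, features, target, n_bins=10):
    """Score each feature by how much the purchase rate varies across its deciles."""
    rows = []
    for name in features:
        # Ranking first breaks ties, so the deciles hold roughly equal numbers of rows.
        deciles = pd.qcut(df[name].rank(method="first"), q=n_bins, labels=False)
        rate = df[target].groupby(deciles).mean()
        rows.append({
            "feature": name,
            "corr_with_purchase": df[name].corr(df[target]),
            "rate_spread": rate.max() - rate.min(),
        })
    table = pd.DataFrame(rows).set_index("feature")
    return table.sort_values("rate_spread", ascending=False)

# Synthetic data for illustration only: one informative feature, one pure noise.
rng = np.random.default_rng(0)
n = 1000
sample = pd.DataFrame({
    "web_visits": rng.normal(size=n),
    "random_noise": rng.normal(size=n),
})
sample["purchased"] = (sample["web_visits"] + 0.5 * rng.normal(size=n) > 0).astype(int)

ranking = decile_purchase_spread(sample, ["web_visits", "random_noise"], "purchased")
```

A feature whose purchase rate barely changes from the lowest to the highest decile carries little signal and is a candidate for removal.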
Model Building & Predictions:
The selected features, checked for adequate distribution with respect to purchase behavior, are passed to a regularized logistic regression model (ridge/lasso). The model then undergoes automated backward selection, in which:
The p-values of the feature coefficients are assessed.
Features with low significance and weak correlation with purchase are removed.
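The backward selection loop can be sketched as below. Standard Wald p-values apply to an unpenalized fit, so this sketch drops the ridge/lasso penalty and fits plain logistic regression by Newton-Raphson; how the source combines p-values with regularization is not stated. The function names, the 0.05 cut-off, the feature names, and the synthetic data are assumptions.

```python
import math
import numpy as np

def fit_logit(X, y, n_iter=50, tol=1e-8):
    """Unpenalized logistic regression; returns coefficients and Wald p-values."""
    X1 = np.column_stack([np.ones(len(X)), X])  # intercept in column 0
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
        hessian = X1.T @ (X1 * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X1.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X1 @ beta)))
    cov = np.linalg.inv(X1.T @ (X1 * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided p-value under a standard normal: erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return beta, pvals

def backward_select(X, y, names, alpha=0.05):
    """Drop the least significant feature until all remaining p-values are <= alpha."""
    keep = list(range(X.shape[1]))
    while keep:
        _, pvals = fit_logit(X[:, keep], y)
        feature_p = pvals[1:]  # skip the intercept
        worst = int(np.argmax(feature_p))
        if feature_p[worst] <= alpha:
            break
        keep.pop(worst)
    return [names[i] for i in keep]

# Synthetic data for illustration only: two real effects and one noise column.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))
true_logit = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

names = ["visits_30d", "emails_60d", "random_noise"]
selected = backward_select(X, y, names)
```

The strongly predictive features survive the loop, while a noise feature is removed unless it passes the significance cut-off by chance.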
Model Outputs:
Predicted purchase probability for guests
Weighted contribution of features influencing purchase behavior
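As an illustration of the two outputs, the sketch below fits a logistic regression on standardized features, reads the predicted purchase probability per guest, and splits each guest's log-odds into per-feature contributions (coefficient times standardized value). Treating "weighted contribution" this way is an assumption, and the data is synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=500) > 0).astype(int)

# Standardizing puts coefficients on a comparable scale across features.
scaler = StandardScaler().fit(X)
Z = scaler.transform(X)
model = LogisticRegression(max_iter=1000).fit(Z, y)

# Output 1: predicted purchase probability for each guest.
proba = model.predict_proba(Z)[:, 1]

# Output 2: per-guest, per-feature contribution to the log-odds of purchase.
contrib = Z * model.coef_[0]
log_odds = contrib.sum(axis=1) + model.intercept_[0]
```

Summing a guest's contributions and adding the intercept recovers the model's log-odds for that guest, so the contributions fully account for the predicted probability.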
Model Validation:
Testing on a balanced hold-out sample whose purchase/no-purchase mix mirrors the training distribution.
Precision and recall optimization, with threshold selection to balance false negatives (FN) and false positives (FP) based on business needs.
Evaluation and Interpretation: gains charts and lift curves to assess model performance, and SHAP plots to interpret how features drive predictions.
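Two of the validation steps above can be sketched in a few lines: choosing a score threshold by weighting false negatives against false positives, and building a decile gains/lift table. The cost weights, function names, and the 20-guest toy sample are assumptions for illustration; SHAP plots are not covered here.

```python
import numpy as np
import pandas as pd

def pick_threshold(y_true, scores, fn_cost=1.0, fp_cost=1.0):
    """Return the score cut-off with the lowest weighted FN/FP cost."""
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    best = None
    for t in np.unique(scores):  # candidate cut-offs in ascending order
        pred = scores >= t
        tp = int(np.sum(pred & y_true))
        fp = int(np.sum(pred & ~y_true))
        fn = int(np.sum(~pred & y_true))
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best["cost"]:
            best = {
                "threshold": float(t),
                "cost": cost,
                "precision": tp / (tp + fp) if tp + fp else 0.0,
                "recall": tp / (tp + fn) if tp + fn else 0.0,
            }
    return best

def gains_table(y_true, scores, n_bins=10):
    """Cumulative gain and lift per score decile (decile 1 = highest scores)."""
    df = pd.DataFrame({"y": y_true, "score": scores})
    df = df.sort_values("score", ascending=False, kind="mergesort").reset_index(drop=True)
    df["decile"] = (np.arange(len(df)) * n_bins) // len(df) + 1
    out = df.groupby("decile")["y"].agg(guests="count", purchasers="sum")
    out["cum_gain"] = out["purchasers"].cumsum() / out["purchasers"].sum()
    out["cum_lift"] = out["cum_gain"] / (out["guests"].cumsum() / len(df))
    return out

# Toy sample for illustration only: 20 guests, already ordered by descending score.
scores = np.arange(20, 0, -1) / 20
actual = np.array([1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])

balanced = pick_threshold(actual, scores)              # FN and FP weighted equally
strict = pick_threshold(actual, scores, fp_cost=3.0)   # false positives cost more
table = gains_table(actual, scores)
```

Raising fp_cost pushes the threshold up and favors precision, while raising fn_cost pushes it down and favors recall; the right weights depend on the business cost of a missed purchaser versus a wasted contact.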