In part 1 of this series, we discussed the challenges of audience segmentation using Google Analytics 4 (GA4) data. In part 2, we will discuss the solutions to these challenges. In doing so, we detail the novel approach used in our submission to Google’s open-source Marketing Analytics Jumpstart.

Solutions

We don’t know how many segments we need or how many will be useful.

We can think of this as a model optimization problem where we’d want to tune hyper-parameters. In such a case, we can run a k-means model within a hyper-parameter optimization framework (Optuna, Hyperopt, Vizier).

Hyper-parameters for k-means could be:
- Number of clusters
- Number of iterations
- Tolerance

The evaluation metrics could be:
- Silhouette score
- Mean Squared Distance
- Davies–Bouldin Index (DBI)

However, optimizing for only one of those metrics might have some side effects. Usually, the more clusters there are, the better the metric score. As clusters get smaller, they are usually more cohesive and better separated, so our optimization would likely return a very high number of clusters as our best choice. Most of the time, we prefer a small number of clusters because they are easier to explain, and there is typically less business value in having […]
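To make the side effect concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the approach from our submission: the k-means routine is a small hand-rolled stand-in for whatever model and optimization framework you actually use, the toy data is made up, and the per-cluster penalty is just one hypothetical way to express a preference for fewer segments. The names `kmeans`, `tune`, and `penalty_per_cluster` are invented for this example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=50, tol=1e-6, seed=0):
    """Plain Lloyd's k-means. Returns (centroids, mean squared distance).

    k, max_iter and tol are the three hyper-parameters listed above.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:  # converged within tolerance
            break
    msd = sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)
    return centroids, msd


def tune(points, k_values, penalty_per_cluster):
    """Compare the raw metric with a (hypothetical) penalized objective."""
    msd_by_k = {k: kmeans(points, k)[1] for k in k_values}
    best_raw = min(msd_by_k, key=msd_by_k.get)
    best_penalized = min(
        msd_by_k, key=lambda k: msd_by_k[k] + penalty_per_cluster * k
    )
    return msd_by_k, best_raw, best_penalized


# Made-up toy data: three small, well-separated groups of four points each.
POINTS = [
    (0, 0), (1, 0), (0, 1), (1, 1),
    (10, 0), (11, 0), (10, 1), (11, 1),
    (0, 10), (1, 10), (0, 11), (1, 11),
]

MSD_BY_K, BEST_RAW, BEST_PENALIZED = tune(
    POINTS, range(1, len(POINTS) + 1), penalty_per_cluster=5.0
)
print("best k by raw mean squared distance:", BEST_RAW)
print("best k with per-cluster penalty:", BEST_PENALIZED)
```

Left to the raw metric alone, the search picks one cluster per point, because the mean squared distance then drops to zero. Adding a cost per cluster pulls the choice back toward a smaller number of segments. The penalty value itself is a business judgment, and it would need to be chosen for your own data and metric.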
The post Using BigQuery and GA4: Thoughts on Audience Segmentation (Part 2) Solutions appeared first on Adswerve.