# Measuring Site Link Performance: Volume I – Click-through rate

Figure 1: Percentage of total ad group clicks attributable to site links for a range of advertisers

Next we consider click-through rate. Our aim is to establish whether there has been a significant change in click-through rate after the introduction of site links. For illustrative purposes, we focus on a high-traffic brand ad group for one of the above advertisers (Advertiser A).

Visual inspection

In any statistical analysis a visual inspection of the data is a good place to start, before embarking on more formal inference. In figure 2 we present our data graphically. The two plots at the top are two different ways to show the distribution of daily click-through rate before and after the introduction of site links. There is a clear upward shift in the distribution of click-through rate after the introduction of site links. The density plots show an estimate of the actual densities and are simply a different view of what the boxplots present; they also clearly show that click-through rates have shifted upward since the introduction of site links. The bottom left figure shows the mean CTR before and after site links with a 95% confidence interval in each case. This plot suggests the difference in CTR is significant given the variability in the data. However, it is also apparent that there has been a slight shift in average position after the introduction of site links, which should be taken into account in any inference. It is important to separate any effect on CTR caused by position from that caused by the introduction of site links.

Figure 2: Visualization of click-through rate distributions before and after the introduction of site links on a branded ad group.

The box-and-whisker plots (or boxplots) in the top left plot are much underrated graphical tools and warrant a brief explanation. In figure 3 we show that the black line that divides the “box” represents the median of the data. The lower and upper edges of the box mark the lower and upper quartiles of the data. This means that 50% of the observations fall within the range of the box, while 25% fall below the lower quartile (Q1) and 25% fall above the upper quartile (Q3). The whiskers mark the values that lie 1.5 * IQR beyond the lower and upper quartiles. The IQR is the interquartile range: the distance between Q1 and Q3. Observations that lie more than 1.5 * IQR beyond the quartiles are considered mild outliers, and those more than 3 * IQR beyond are considered extreme outliers.
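The quantities behind a boxplot can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: the function name `boxplot_summary` and the daily CTR values are made up for the example, and the whiskers are drawn to the most extreme observation inside the 1.5 * IQR fences, which is a common plotting convention rather than something stated above.

```python
import statistics


def boxplot_summary(values):
    """Return the quantities a box-and-whisker plot is built from."""
    # Quartiles via linear interpolation between observations.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences at 1.5 * IQR (mild) and 3 * IQR (extreme) beyond the quartiles.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    inside = [v for v in values if inner_low <= v <= inner_high]
    mild = [
        v for v in values
        if (outer_low <= v < inner_low) or (inner_high < v <= outer_high)
    ]
    extreme = [v for v in values if v < outer_low or v > outer_high]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Assumed convention: whiskers end at the last point inside the fences.
        "whiskers": (min(inside), max(inside)),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }


# Hypothetical daily click-through rates, for illustration only.
daily_ctr = [0.10, 0.11, 0.12, 0.12, 0.13, 0.13, 0.14, 0.15, 0.20, 0.30]
summary = boxplot_summary(daily_ctr)
print(summary["mild_outliers"], summary["extreme_outliers"])
```

With these made-up values, 0.20 falls between the two fences and is flagged as a mild outlier, while 0.30 lies beyond the 3 * IQR fence and is flagged as extreme.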

Figure 3: Explanation of boxplots

Statistical Inference

The visual inspection of the data in figure 2 suggests that there has been an increase in click-through rate since the introduction of site links. We will now consider how we can perform more formal statistical inference on the data, in order to establish whether the observed differences are statistically significant given the variability in the data.

A simple and reasonable approach would be to perform a t-test. This test can easily be performed in any statistical software or in Excel. It compares the observed CTR after the introduction of site links to some value you believe the CTR to have centered on before the introduction of site links. The test produces a p-value: the probability of observing a difference at least as large as the one seen if there had in fact been no change in CTR. A small p-value (typically less than 0.05) implies a significant difference. When we apply this test to the data represented in figure 2 we obtain a p-value of 0.00001, suggesting that there has been a significant increase in CTR after the introduction of site links.
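As a rough sketch of what the one-sample t-test computes, the snippet below derives the t statistic and its degrees of freedom using only the Python standard library. The function name `one_sample_t`, the daily CTR values and the reference value `mu0` are all hypothetical. The standard library has no t distribution, so turning the statistic into a p-value is left to statistical software.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical daily CTRs after site links were introduced.
after_ctr = [0.14, 0.15, 0.16, 0.15, 0.17, 0.16, 0.15, 0.14]

# Hypothetical value the CTR is believed to have centered on beforehand.
t_stat, df = one_sample_t(after_ctr, mu0=0.12)
print(f"t = {t_stat:.2f} on {df} degrees of freedom")
```

A large positive t statistic relative to the t distribution with `df` degrees of freedom corresponds to a small p-value.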

There are two shortcomings to this approach:

1. Binomial response: strictly speaking one should use a Chi-square test to conduct inference about the mean of a binomial variable, as the standard t-test assumes a normal population; this is particularly important when the sample size is relatively small;
1. Position: If there has been any shift in average position before and after the introduction of site links, this needs to be accounted for in the inference. As we know, there is a very clear correlation between position and CTR, with a higher position resulting in a higher CTR. In order to get the most accurate measure of the direct impact of site links on CTR, we need to account for any position effects. This can be achieved through some more sophisticated statistical modeling.
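To make the first point concrete, here is a minimal sketch of a Chi-square test on a 2x2 table of clicks and non-clicks before and after site links. The function name `chi_square_2x2` and the click and impression counts are invented for illustration, and no continuity correction is applied. With one degree of freedom the p-value can be written with the complementary error function, so only the standard library is needed. Note that this test does nothing about the second point, position.

```python
import math


def chi_square_2x2(clicks_before, impressions_before,
                   clicks_after, impressions_after):
    """Pearson Chi-square test (1 df, no continuity correction)."""
    table = [
        [clicks_before, impressions_before - clicks_before],
        [clicks_after, impressions_after - clicks_after],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 12% CTR before, 15% CTR after.
chi2, p_value = chi_square_2x2(1200, 10000, 1500, 10000)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.2g}")
```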

If you are satisfied by basic statistics you should skip to the discussion. However, if you want to take your campaign measurement to a new level, or you simply enjoy some more sophisticated statistics, this is for you. In an earlier blog post we introduced logistic regression as a very useful statistical tool for modeling binomial responses as a function of other variables. Once again it will prove to be useful here.

Logistic regression is part of a wider group of statistical models called generalized linear models or GLMs. By using logistic regression, we can model the CTR as a function of any number of other explanatory variables. In our case the explanatory variables are average position and an indicator variable that shows whether the observation was taken before or after the introduction of site links:

$\log\left(\frac{CTR}{1-CTR}\right) \sim \mbox{Position} + \mbox{Sitelink}$

where $\frac{CTR}{1-CTR}$ is defined as the odds of a click-through, so we are actually modeling the log-odds of the CTR (for theoretical reasons we will not go into here). In our model both the position and site link variables are deemed to be significant. The output of a logistic regression model is typically interpreted in terms of odds ratios. The model parameters can be used to compute the odds of a click-through for different values of an explanatory variable, all else being equal. The odds ratios, with their corresponding confidence intervals, can then be used to conduct inference about the modeled data. For example, we can compute the odds ratio of a click-through after versus before the introduction of site links, at any given position, as $\Omega_{\mbox{before},\mbox{after}} = \exp \left(\beta_{\mbox{sitelink}} \right) = \exp \left( 0.283\right) = 1.33$

where

$\beta_{\mbox{sitelink}}$

is the regression parameter for the site link indicator variable. The odds ratio of 1.33 implies that the odds of a click-through have increased by 33% after the introduction of site links in any given position. It is always important to compute a confidence interval for the odds ratio in order to assess the significance of the change in the context of the data. A 95% confidence interval for the above odds ratio is [1.319, 1.335]. The relatively narrow confidence interval suggests that we have high confidence that there is a real increase in the odds of a click of about 33%. We can also visualize our fitted model by re-expressing the model above in terms of the CTR and plotting it over a range of positions. This visualization is presented in figure 4.
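The odds ratio and the re-expression of the model in terms of CTR can be sketched as follows. Only the site link coefficient of 0.283 comes from the fitted model above; the intercept and position coefficients (`BETA_INTERCEPT`, `BETA_POSITION`) are hypothetical placeholders chosen for illustration, as are the function names.

```python
import math

BETA_SITELINK = 0.283    # site link coefficient reported in the text
BETA_INTERCEPT = -1.5    # hypothetical, not from the fitted model
BETA_POSITION = -0.8     # hypothetical: CTR falls as position number grows


def predicted_ctr(position, sitelink):
    """Invert the log-odds model to get CTR; sitelink is 0 (before) or 1 (after)."""
    log_odds = (BETA_INTERCEPT
                + BETA_POSITION * position
                + BETA_SITELINK * sitelink)
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(probability):
    """Convert a probability into odds."""
    return probability / (1.0 - probability)


odds_ratio = math.exp(BETA_SITELINK)
print(f"odds ratio after vs. before: {odds_ratio:.2f}")

# The odds ratio is the same at every position; the change in CTR is not.
for position in (1.0, 1.5, 2.0, 3.0):
    before = predicted_ctr(position, sitelink=0)
    after = predicted_ctr(position, sitelink=1)
    ratio = odds(after) / odds(before)
    print(f"position {position}: CTR {before:.3f} -> {after:.3f}, "
          f"odds ratio {ratio:.2f}")
```

Because the model is linear on the log-odds scale, the ratio of odds is constant across positions, while the absolute change in CTR depends on where on the curve a given position sits.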

Discussion

Beyond an increase in CTR, it would also be of interest to see whether site links can be used to affect the conversion behavior of advertisements.  In volume II of this post we will present some initial data on this aspect of site link behavior, as well as data on testing different types of site links.

All the graphics and inference in this post were produced using R (http://www.r-project.org), a leading free software environment for statistical computing and graphics that we use internally for statistical modeling.