Measuring Site Link Performance: Volume I – Click-through rate
In a recent post, Tom Van den Berckt discussed site links as a new feature on Google AdWords. Google's new ad formats, which include site links, product ads, local ads and comparison ads, generated quite a bit of discussion at SMX West 2010. Site links have been around the longest and the first performance data is starting to surface. At the conference, Google reported an average 30% to 40% increase in click-through rate (CTR) for ad copy displaying site links, and Clicks2Customers reported a similar increase in CTR based on our own experience across multiple clients. The general consensus was that site links have a positive impact on click-through rates. Google hinted at the expansion of site links to a wider set of keywords and also at a potential increase in the maximum of 4 site links that are currently displayed. Currently, site links only appear on the ads with the highest AdRank, which are typically those linked to brand terms.
At present, it is not possible to track site link performance directly in AdWords. We track site link performance through our own campaign tracking system, by assigning a unique identifier to each site link within an ad group. This enables us to distinguish clicks that occur via the main ad copy link from those that occur via each of the site links, for each ad group. We have been running site links on several client campaigns since November 2009 and outline our initial statistical findings on site link performance below.
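As a rough illustration of the tagging idea, the Python sketch below appends an identifier to each destination URL so that a click on the main link can be told apart from a click on a site link. The parameter names `adgroup` and `link`, the `tag_url` helper and the example.com URLs are assumptions made for this example; they are not the tracking system described above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tag_url(url, ad_group_id, link_id):
    """Append (hypothetical) tracking parameters to a destination URL."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("adgroup", ad_group_id), ("link", link_id)]
    return urlunsplit(parts._replace(query=urlencode(query)))


# The main ad copy link and each site link get their own identifier.
main = tag_url("https://www.example.com/", "brand-01", "main")
sitelinks = [
    tag_url("https://www.example.com/" + path, "brand-01", f"sl{i}")
    for i, path in enumerate(["sale", "stores", "contact", "new"], start=1)
]

print(main)
for url in sitelinks:
    print(url)
```

With one identifier per link, clicks can later be grouped by ad group and by link.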
We identify the earliest day on which site link clicks started appearing on an ad group and then compare the performance of that ad group before and after that day. Below we focus on the highest volume brand terms that have been displaying site links since November 2009. Before considering changes in CTR, we measure what proportion of ad copy clicks occurred via site links rather than via the main ad copy link. Figure 1 shows that a relatively small proportion of overall clicks occur on the actual site links, across all the ad groups that displayed site links. It is important to note that site links do not always trigger, and at this stage we are not able to tell how often they did. This means that figure 1 understates the true percentage of clicks that site links attract when they are displayed.
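The two steps described above (finding the first day with site link clicks, then measuring the share of clicks that came via site links from that day on) could be sketched in Python as follows. The row layout, the `"main"` label and all numbers are hypothetical and serve only to show the calculation.

```python
from datetime import date

# Hypothetical click log rows: (day, ad_group, link_id, clicks).
clicks = [
    (date(2009, 11, 1), "brand-01", "main", 120),
    (date(2009, 11, 2), "brand-01", "main", 130),
    (date(2009, 11, 3), "brand-01", "main", 140),
    (date(2009, 11, 3), "brand-01", "sl1", 6),
    (date(2009, 11, 4), "brand-01", "main", 150),
    (date(2009, 11, 4), "brand-01", "sl2", 4),
]


def site_link_start(rows, ad_group):
    """Earliest day with at least one site link click, or None."""
    days = [d for d, g, link, n in rows
            if g == ad_group and link != "main" and n > 0]
    return min(days) if days else None


def site_link_share(rows, ad_group):
    """Share of clicks via site links, counted from the start day onward."""
    start = site_link_start(rows, ad_group)
    if start is None:
        return 0.0
    after = [(link, n) for d, g, link, n in rows
             if g == ad_group and d >= start]
    total = sum(n for _, n in after)
    via_site_links = sum(n for link, n in after if link != "main")
    return via_site_links / total if total else 0.0


start = site_link_start(clicks, "brand-01")
share = site_link_share(clicks, "brand-01")
print(start, f"{share:.1%}")
```

Because site links do not trigger on every impression, a share computed this way is a lower bound on the share among impressions where they were actually shown.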
Figure 1: Percentage of total ad group clicks attributable to site links for a range of advertisers
Next we consider click-through rate. Our aim is to establish if there has been a significant change in click-through rate after the introduction of site links. For illustrative purposes, we focus on a high traffic brand ad group for one of the above advertisers (Advertiser A).
In any statistical analysis a visual inspection of the data is a good place to start, before embarking on more formal inference. In figure 2 we present our data graphically. The two plots at the top are two different ways to show the distribution of daily click-through rate before and after the introduction of site links. The boxplots show a clear upward shift in the distribution of click-through rate after the introduction of site links. The density plots show an estimate of the actual densities; they are simply a different view of the same data and also clearly show that click-through rates have shifted upward. The bottom left plot shows the mean CTR before and after site links, with a 95% confidence interval in each case. This plot suggests that the difference in CTR is significant given the variability in the data. However, it is also apparent that there has been a slight shift in average position after the introduction of site links, which should be taken into account in any inference. It is important to separate any effect on CTR caused by position from that caused by the introduction of site links.
Figure 2: Visualization of click-through rate distributions before and after the introduction of site links on a branded ad group.
The box-and-whisker plots (or boxplots) in the top left plot are much underrated graphical tools and warrant a brief explanation. As figure 3 shows, the black line that divides the “box” represents the median of the data. The lower and upper edges of the box are the lower (Q1) and upper (Q3) quartiles of the data. This means that 50% of the observations fall within the range of the box, while 25% fall below Q1 and 25% fall above Q3. The IQR is the interquartile range: the distance between Q1 and Q3. The whiskers extend to the most extreme observations that lie within 1.5 * IQR of the lower and upper quartiles. Observations that lie more than 1.5 * IQR beyond the quartiles are considered mild outliers, and those more than 3 * IQR beyond them are considered extreme outliers.
Figure 3: Explanation of boxplots
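To make the boxplot quantities concrete, here is a minimal Python sketch that computes them for a short list of daily CTR values. The data is made up, and the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption; statistical packages differ slightly in how they define quartiles, so values may not match a given plotting tool exactly.

```python
import statistics


def boxplot_stats(values):
    """Median, quartiles, whisker ends and outliers for a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Fences at 1.5 * IQR (mild) and 3 * IQR (extreme) beyond the quartiles.
    lo_mild, hi_mild = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_ext, hi_ext = q1 - 3 * iqr, q3 + 3 * iqr
    inside = [v for v in values if lo_mild <= v <= hi_mild]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers reach the most extreme points inside the 1.5 * IQR fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "mild_outliers": sorted(
            v for v in values
            if (lo_ext <= v < lo_mild) or (hi_mild < v <= hi_ext)
        ),
        "extreme_outliers": sorted(
            v for v in values if v < lo_ext or v > hi_ext
        ),
    }


# Hypothetical daily click-through rates.
daily_ctr = [0.10, 0.11, 0.12, 0.12, 0.13, 0.14, 0.15, 0.16, 0.30]
stats = boxplot_stats(daily_ctr)
for name, value in stats.items():
    print(name, value)
```

In this made-up sample the value 0.30 lies more than 3 * IQR above Q3, so it is reported as an extreme outlier and the upper whisker stops at 0.16.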
The visual inspection of the data in figure 2 suggests that there has been an increase in click-through rate since the introduction of site links. We will now consider how to perform more formal statistical inference on the data, in order to establish whether the observed differences are statistically significant given the variability in the data.
A simple and reasonable approach would be to perform a t-test, which can easily be done in any statistical software or in Excel. The test compares the observed CTR after the introduction of site links to the value you believe the CTR was centered on before the introduction of site links. It produces a p-value: the probability of observing a difference at least as large as the one seen if there were in fact no real change in CTR. A small p-value (typically less than 0.05) implies a significant difference. When we apply this test to the data represented in figure 2 we obtain a p-value of 0.00001, suggesting that there has been a significant increase in CTR after the introduction of site links.
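For readers who want to see the mechanics, the Python sketch below computes a one-sample t statistic by hand. The daily CTR values and the reference value `mu0` are invented for illustration. Python's standard library has no t distribution, so the p-value shown uses a normal approximation, which is only reasonable for larger samples; a statistics package would give the exact t-based value.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """t statistic, degrees of freedom and an approximate two-sided p-value."""
    n = len(values)
    mean = statistics.mean(values)
    std_err = statistics.stdev(values) / math.sqrt(n)
    t = (mean - mu0) / std_err
    # Assumption: normal approximation to the t distribution (large n).
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, n - 1, p_approx


# Hypothetical daily CTRs after site links: 40 days scattered around 0.20.
ctr_after = [0.20 + 0.01 * ((i * 7) % 5 - 2) for i in range(40)]

# mu0 is the (assumed) level the CTR centered on before site links.
t_stat, df, p_value = one_sample_t(ctr_after, mu0=0.15)
print(f"t = {t_stat:.2f}, df = {df}, approximate p = {p_value:.3g}")
```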
There are two shortcomings to this approach:
- Binomial response: strictly speaking, one should use a Chi-square test to conduct inference about the mean of a binomial variable, as the standard t-test assumes a normal population; this is particularly important when the sample size is relatively small;
- Position: if there has been any shift in average position before and after the introduction of site links, this needs to be accounted for in the inference. As we know, there is a very clear correlation between position and CTR, with a higher position resulting in a higher CTR. In order to get the most accurate measure of the direct impact of site links on CTR, we need to account for any position effects. This can be achieved through some more sophisticated statistical modeling.
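The first shortcoming can be addressed with a Chi-square test on clicks and impressions. The following Python sketch implements the Pearson Chi-square test for a 2x2 table (clicked or not, before or after) without a continuity correction. The counts are hypothetical, and the test does nothing about the second shortcoming: it ignores position entirely.

```python
import math


def chi_square_2x2(clicks_a, impressions_a, clicks_b, impressions_b):
    """Pearson Chi-square test (1 degree of freedom) for two click rates."""
    table = [
        [clicks_a, impressions_a - clicks_a],
        [clicks_b, impressions_b - clicks_b],
    ]
    row_totals = [impressions_a, impressions_b]
    col_totals = [clicks_a + clicks_b,
                  (impressions_a - clicks_a) + (impressions_b - clicks_b)]
    total = impressions_a + impressions_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical totals: 10% CTR before site links, 13% after.
chi2, p_value = chi_square_2x2(1000, 10000, 1300, 10000)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3g}")
```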
If you are satisfied with basic statistics you should skip to the discussion. However, if you want to take your campaign measurement to a new level, or if you simply enjoy some more sophisticated statistics, this is for you. In an earlier blog post we introduced logistic regression as a very useful statistical tool for modeling binomial responses as a function of other variables. Once again it will prove to be useful here.
Logistic regression is part of a wider group of statistical models called generalized linear models (GLMs). Using logistic regression, we can model the CTR as a function of any number of explanatory variables. In our case these are the average position and an indicator variable that shows whether the observation was taken before or after the introduction of site links:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot \text{position} + \beta_2 \cdot \text{sitelink}$$

Here $p$ is the click-through rate and $p/(1-p)$ is defined as the odds of a click-through, so we are actually modeling the log-odds of the CTR (for theoretical reasons we will not go into here). In our model both the position and site link variables are deemed to be significant. The output of a logistic regression model is typically interpreted in terms of odds ratios. The model parameters can be used to compute the odds of a click-through for different values of an explanatory variable, all else being equal. The odds ratios, with their corresponding confidence intervals, can then be used to conduct inference about the modeled data. For example, we can compute the odds ratio before and after the introduction of site links, at any given position, as

$$\text{OR} = \frac{\text{odds}(\text{sitelink} = 1)}{\text{odds}(\text{sitelink} = 0)} = e^{\beta_2} = 1.33$$

where $\beta_2$ is the regression parameter for the site link indicator variable. The odds ratio of 1.33 implies that the odds of a click-through have increased by 33% after the introduction of site links, in any given position. It is always important to compute a confidence interval for the odds ratio in order to assess the significance of the change in the context of the data. A 95% confidence interval for the above odds ratio is [1.319, 1.335]. The relatively narrow confidence interval suggests that we can be highly confident that there is a real increase in the odds of a click of about 33%. We can also visualize our fitted model by re-expressing the model above in terms of the CTR and plotting it over a range of positions. This visualization is presented in figure 4.
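The step from a fitted coefficient to an odds ratio, its confidence interval and a predicted CTR curve can be sketched in a few lines of Python. All coefficient values and the standard error below are hypothetical, chosen only so that the odds ratio lands near 1.33; they are not the fitted values from our model, and the interval uses the usual normal approximation (coefficient plus or minus 1.96 standard errors).

```python
import math


def inv_logit(x):
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def odds_ratio_ci(beta, std_err, z=1.96):
    """Odds ratio and approximate 95% interval for a logistic coefficient."""
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))


def predicted_ctr(position, site_links, b0, b_pos, b_sl):
    """CTR implied by the log-odds model at a position, with or without site links."""
    return inv_logit(b0 + b_pos * position + b_sl * site_links)


# Hypothetical coefficients: intercept, position effect, site link effect.
b0, b_pos, b_sl, se_sl = -0.5, -0.8, 0.285, 0.003

odds_ratio, ci_low, ci_high = odds_ratio_ci(b_sl, se_sl)
print(f"odds ratio = {odds_ratio:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")

# Re-express the model as CTR over a range of average positions.
for position in (1.0, 1.5, 2.0, 2.5, 3.0):
    before = predicted_ctr(position, 0, b0, b_pos, b_sl)
    after = predicted_ctr(position, 1, b0, b_pos, b_sl)
    print(f"position {position}: CTR {before:.3f} -> {after:.3f}")
```

Note that the odds ratio is constant across positions, while the absolute change in CTR is not: the same multiplicative change in odds produces a different change in probability depending on the baseline.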
In this post we showed how more sophisticated statistical tools can be used to evaluate the performance of site links. We conclude that site links do increase the click-through rate, with the odds of a click being roughly 33% higher for an advertisement with site links than for one without, at a given position. In figure 1 we show that the overall percentage of clicks that happen through site links is relatively small and not enough in itself to drive the observed increase in click-through rate. Our conclusion is that the inclusion of site links increases the actual size and visibility of the ad copy, while pushing organic results further down the page. This seems to drive more clicks to the main ad copy link, thereby increasing the overall click-through rate. This agrees well with some of our initial observations about the introduction of site links, and it is something we will continue to research. Our CTR results agree quite closely with the preliminary data presented by Google at the recent SMX West conference in Santa Clara. It seems to be a good idea to start experimenting with site links on your top keywords.
Beyond an increase in CTR, it would also be of interest to see whether site links can be used to affect the conversion behavior of advertisements. In volume II of this post we will present some initial data on this aspect of site link behavior, as well as data on testing different types of site links.
All the graphics and inference in this post were produced using R (http://www.r-project.org), a leading free software environment for statistical computing and graphics that we use internally for statistical modeling.