Become a better keyword vintner

A key objective in PPC is to grow revenue within efficiency targets. This can be achieved through good bid management on the existing keyword set, as well as through the addition of new keywords. It is easy to add thousands of keywords using product feeds or a machine generation method, but these keyword sets often add very little long-term value to a PPC campaign. Quantity is no substitute for quality. It is important to track the quality of the keywords you add to a PPC campaign over time, and possibly by the source of the keyword (e.g. product feed, keyword research or search query mining).

Some questions you may want to answer include:

  • What is the average additional revenue we generate per keyword added (and is it within efficiency targets)?
  • Is our initial evaluation period for new keywords sufficient, or are we potentially missing opportunities?
  • What is the best source for new keywords?
  • Is the quality of new keywords improving over time?

The answers to the above questions are not always apparent from traditional keyword performance reports. Keyword portfolios have already been compared to investment portfolios when it comes to keyword bidding. We use a similar analogy, which compares keyword portfolios to loan portfolios. This allows us to apply the idea of a vintage curve for a loan portfolio to keyword portfolios. A loan vintage is a group of accounts that originated in a given time period. For example, all new customers from 2009 make up the 2009 vintage. Vintage groupings can be monthly, quarterly, or annual, depending on the application or data available. Different groups of customers from the same vintage can also be compared to assess their relative quality (e.g. walk-in customers versus directly solicited pre-selected customers). A performance metric such as the proportion of accounts more than 90 days in arrears is often chosen to compare different loan vintages by time on book. If we make the analogy that new keywords represent new customers, we can extend the idea of loan vintages to that of keyword vintages. Similar to loans, keywords are added at different times and can originate from different sources (e.g. product feeds or a break-down of search query reports). We can track performance metrics with “keyword age”, which may include percentages of new keywords with a click per impression or revenue per new keyword after a certain time period.
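The bookkeeping behind a keyword vintage curve can be sketched in a few lines of Python. This is only a minimal illustration: the record layouts (keyword to month added, and keyword/month/revenue rows), the function name `vintage_curve` and all the figures are assumptions invented for the example, not the reporting setup behind the charts in this post.

```python
from collections import defaultdict

def vintage_curve(keywords_added, revenue_rows):
    """Return {vintage: {age_in_months: revenue per keyword originally added}}.

    keywords_added: dict of keyword -> month index in which it was added (its vintage).
    revenue_rows: iterable of (keyword, month index, revenue) tuples.
    Both layouts are hypothetical; adapt them to your own reporting export.
    """
    # Size of each vintage: the number of keywords originally added in that month.
    vintage_size = defaultdict(int)
    for vintage in keywords_added.values():
        vintage_size[vintage] += 1

    # Total revenue by vintage and keyword age (months since the keyword was added).
    revenue = defaultdict(lambda: defaultdict(float))
    for keyword, month, amount in revenue_rows:
        vintage = keywords_added[keyword]
        revenue[vintage][month - vintage] += amount

    # Divide by the original vintage size, so keywords that never earn still count.
    return {
        v: {age: total / vintage_size[v] for age, total in sorted(by_age.items())}
        for v, by_age in revenue.items()
    }

# Invented example: two keywords added in month 0, one added in month 1.
added = {"red shoes": 0, "blue shoes": 0, "green shoes": 1}
rows = [
    ("red shoes", 0, 10.0), ("red shoes", 1, 6.0),
    ("blue shoes", 1, 2.0),
    ("green shoes", 1, 3.0), ("green shoes", 2, 5.0),
]
curve = vintage_curve(added, rows)
print(curve)
```

Dividing by the number of keywords originally added, rather than by the number still earning, is a deliberate choice here: it keeps the metric comparable to "revenue per keyword added" and penalizes sources that produce many keywords with no return.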

Below we will illustrate one application of a vintage analysis on real campaign data to gain some insight into new keywords. Many more applications are also possible. In figure 1 we consider 8,467 new keywords added to a PPC campaign from April to August 2009. Suppose we start a large set of new keywords on a fairly aggressive initial bid in order to gather performance data with which to optimize keyword bids. We would expect that a large proportion of them would be bid down over time in order to achieve an acceptable or agreed efficiency target, until we reach a stable overall bid level for these keywords as a group. When we study a larger set of campaign data across various advertisers, this seems to happen on average after about 3 months. A similar trend is apparent in figure 1 for our example campaign. It suggests that we achieved an average monthly long-term revenue of about $4 per original keyword added. It is clear how new keywords were bid down over time as performance was optimized.

Figure 1: Revenue per new keyword added versus Average Bid

There is an important variable we did not consider in the above view, and that is the efficiency metric for the new keywords over time. In the case of the above campaign the efficiency metric is cost of sales (i.e. PPC cost/sales). A target of around 8% (allowing for statistical noise) on non-brand terms is an acceptable benchmark for this campaign. In figure 2 below we show the average revenue per keyword (as shown in figure 1) against the cost of sales by keyword age. Despite consistently decreasing bids on new keywords over time, the overall efficiency varied around the 8% cost of sales benchmark throughout. This suggests that we potentially did not give new keywords enough time to prove themselves before decreasing bids on them. If we had kept the average bid higher for a longer period, we could potentially have achieved a higher average long-term revenue than $4 per keyword added, within the efficiency target. The keyword bidding in month 8 seems to try to address this by raising the average bid. Hopefully we have not priced too many of the original keywords out of contention by this stage.
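As a rough sketch of the efficiency check described above, the snippet below computes cost of sales (PPC cost divided by sales) by keyword age and compares it to a target. The 8% target comes from this campaign; the row layout, the function name `cost_of_sales_by_age` and the figures are assumptions made up for the example.

```python
def cost_of_sales_by_age(rows, target=0.08):
    """Return a list of (age, cost_of_sales, within_target), ordered by keyword age.

    rows: iterable of (keyword_age_in_months, ppc_cost, sales) tuples (hypothetical layout).
    """
    # Accumulate cost and sales per keyword age.
    totals = {}
    for age, cost, sales in rows:
        c, s = totals.get(age, (0.0, 0.0))
        totals[age] = (c + cost, s + sales)

    report = []
    for age in sorted(totals):
        cost, sales = totals[age]
        # Cost of sales is undefined when there are no sales at that age.
        cos = cost / sales if sales > 0 else None
        report.append((age, cos, cos is not None and cos <= target))
    return report

# Invented figures: efficiency improves as bids come down.
rows = [(1, 120.0, 1000.0), (2, 70.0, 1000.0), (3, 60.0, 800.0)]
report = cost_of_sales_by_age(rows)
for age, cos, ok in report:
    print(age, cos, ok)
```

Read alongside the revenue per keyword at each age, a cost of sales that stays at or below target while bids keep falling is the pattern that hints at bids having been reduced too early.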

Figure 2: Revenue per new keyword versus Efficiency

Similar to the above curves, you can track the average revenue that you gain from each new keyword added for different cohorts of keywords. This often decreases over time if your inventory remains fairly stable, as the most relevant keywords are often selected upfront if your initial campaign setup and research processes are well developed. Treating your keywords as ‘vintages’ can provide useful insight into the quality and performance of your new keywords over time. It also gives you an approximation of the average revenue you can expect per keyword added, and at what efficiency. I hope 2010 will be a good vintage for your PPC keywords!

The paid search keyword tail

There is no doubt that the keyword tail is important, provided it is effectively managed in a resource-efficient manner, for example by using a bid management system. The difficulty in assessing just how much the tail contributes lies in the exact definition of the tail itself. The contribution of the tail to total campaign revenue is quite sensitive to this definition. There are numerous definitions of the keyword tail and head out there. A good practical view of the keyword head is to consider it as the number of keywords an analyst can manage manually with reasonable ease. Numbers between 250 and 500 seem to be popular choices. Other definitions specify fairly arbitrary click thresholds. In this post we will consider a visual way of assessing the size of your campaign’s tail. It is also important to assess whether your keyword tail meets your campaign’s overall efficiency targets. The tail can add great value to your campaign, but it needs to be managed effectively, just like the rest of your campaign.

A useful way to view the tail is to sort all keywords by their traffic contribution over a specified period. This enables us to divide keywords into different segments based on their contribution to total traffic, and then to compute the contribution of each of these keyword segments to total revenue and their relative efficiency. The choice of segments needs some care due to the discrete nature of keyword traffic for lower volume keywords. Figure 1 below shows how this data can be visually represented to give you an overview of an entire campaign. The data represents the performance of all non-brand keywords for an online retailer over the last 3 months of 2009. The blue bar represents about 3,818 keywords (23% of the 16,601 keywords with an impression over the period). If we consider this group of keywords as the head keywords, it will provide a fairly conservative estimate of the tail. In this definition the tail keywords still contribute about 24% of total revenue at an acceptable efficiency given the campaign’s efficiency targets.
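The segmentation described above can be sketched as follows. This is a simplified two-segment version (head and tail only), and the 80% traffic cut-off, the tuple layout, the function name `head_tail_summary` and the numbers are all assumptions chosen for illustration; in practice you would likely use several segments and your own thresholds.

```python
def head_tail_summary(keywords, head_traffic_share=0.8):
    """Split keywords into head and tail by cumulative clicks, then summarize each.

    keywords: list of (name, clicks, cost, revenue) tuples (hypothetical layout).
    head_traffic_share: share of total clicks assigned to the head (an assumption).
    """
    # Sort by traffic contribution, highest volume first.
    ranked = sorted(keywords, key=lambda k: k[1], reverse=True)
    total_clicks = sum(k[1] for k in ranked)
    total_revenue = sum(k[3] for k in ranked)

    segments = {"head": [], "tail": []}
    running = 0
    for kw in ranked:
        # A keyword joins the head while the clicks accumulated before it are below the cut-off.
        name = "head" if running < head_traffic_share * total_clicks else "tail"
        segments[name].append(kw)
        running += kw[1]

    summary = {}
    for name, kws in segments.items():
        cost = sum(k[2] for k in kws)
        revenue = sum(k[3] for k in kws)
        summary[name] = {
            "keywords": len(kws),
            "revenue_share": revenue / total_revenue if total_revenue else 0.0,
            "cost_of_sales": cost / revenue if revenue else None,
        }
    return summary

# Invented campaign data: (keyword, clicks, cost, revenue).
data = [("a", 500, 40, 600), ("b", 300, 30, 300), ("c", 100, 8, 60),
        ("d", 60, 4, 30), ("e", 40, 2, 10)]
summary = head_tail_summary(data)
print(summary)
```

Reporting cost of sales next to revenue share for each segment makes it easy to see whether the tail is contributing revenue within the campaign's efficiency target, not just how much it contributes.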

Measuring Site Link Performance: Volume I – Click-through rate

In a recent post Tom Van den Berckt discussed site links as a new feature on Google Adwords.  Google’s new ad formats generated quite a bit of discussion at SMX West 2010.  They include site links, product ads, local ads and comparison ads.  Site links have been around the longest and the first performance data is starting to surface.  At the recent SMX West conference, Google reported an average 30% to 40% increase in the click-through rate (CTR) for ad copy displaying site links.  At the same conference, Clicks2Customers reported a similar increase in CTR based on our own experience across multiple clients.  The general consensus was that site links have a positive impact on click-through rates.  Google hinted at the expansion of site links to a wider set of keywords and also a potential increase in the maximum of 4 site links that are currently being displayed.  Currently, site links only appear on the ads with the highest AdRank, which are typically those linked to brand terms.

At present, it is not possible to track site link performance directly using AdWords. We track site link performance through our own campaign tracking system, by assigning a unique identifier to each site link within an ad group. This enables us to distinguish clicks that occur via the main ad copy link from those that occur via each of the site links for each ad group. We have been running site links on several client campaigns since November 2009 and outline our initial statistical findings on site link performance below.

We identify the earliest day that site link clicks started appearing on an ad group and then compare the performance of that ad group before and after the start of site links.   Below we focus on the highest volume brand terms that have been displaying site links since November 2009.  Before we consider changes in CTR, we measured what proportion of ad copy clicks occurred via site links rather than via the main ad copy link.  In figure 1 it is shown that a relatively small proportion of overall clicks occur on the actual site links, across all the ad groups that displayed site links.  It is important to note that site links do not always trigger and we are not able to tell how often they did trigger at this stage.  This means that the true percentage of clicks that site links attract will be understated in figure 1.

Figure 1: Percentage of total ad group clicks attributable to site links for a range of advertisers

Next we consider click-through rate.  Our aim is to establish if there has been a significant change in click-through rate after the introduction of site links.  For illustrative purposes, we focus on a high traffic brand ad group for one of the above advertisers (Advertiser A).

Visual inspection

In any statistical analysis a visual inspection of the data is a good place to start, before embarking on more formal inference. In figure 2 we present our data graphically. The two plots at the top are two different ways to show the distribution of daily click-through rate before and after the introduction of site links. There is a clear upward shift in the distribution of click-through rate after the introduction of site links. The density plots show an estimate of the actual densities and are just a different view of what the boxplots present; they also clearly show that click-through rates have shifted upward since the introduction of site links. The bottom left figure shows the mean CTR before and after site links with a 95% confidence interval in each case. This plot suggests the difference in CTR is significant given the variability in the data. However, it is also apparent that there has been a slight shift in average position after the introduction of site links, which should be taken into account in any inference. It is important to separate any effect on CTR caused by position from that caused by the introduction of site links.

Figure 2: Visualization of click-through rate distributions before and after the introduction of site links on a branded ad group.

The box-and-whisker plots (or boxplots) in the top left plot are much underrated graphical tools and warrant a brief explanation. In figure 3 we show that the black line that divides the “box” represents the median of the data. The lower and upper edges of the box represent the lower and upper quartiles of the data. This means that 50% of the observations fall within the range of the box, while 25% fall below the lower quartile (Q1) and 25% fall above the upper quartile (Q3). The whiskers mark the values that lie 1.5 * IQR beyond the lower and upper quartiles. The IQR is the interquartile range: the distance between Q1 and Q3. Observations that lie more than 1.5 * IQR beyond the quartiles are considered mild outliers, and those more than 3 * IQR beyond the quartiles are considered extreme outliers.
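The quantities behind a boxplot are easy to compute by hand. The sketch below uses Python's standard `statistics.quantiles`; the function name `boxplot_summary` and the sample values are assumptions for illustration. Note that statistical packages differ slightly in how they define quartiles and draw whiskers, so the numbers may not match a given plotting tool exactly.

```python
import statistics

def boxplot_summary(values):
    """Compute quartiles, IQR fences and outliers for a list of numbers."""
    data = sorted(values)
    # Quartiles with linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # beyond these: at least a mild outlier
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)  # beyond these: an extreme outlier

    # Most extreme observations that still lie within the 1.5 * IQR fences.
    inside = [v for v in data if inner[0] <= v <= inner[1]]
    mild = [v for v in data
            if (outer[0] <= v < inner[0]) or (inner[1] < v <= outer[1])]
    extreme = [v for v in data if v < outer[0] or v > outer[1]]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "fences": inner,
        "whiskers": (inside[0], inside[-1]),
        "mild_outliers": mild,
        "extreme_outliers": extreme,
    }

# Invented daily CTR values (in percent) with two unusually high days.
stats = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 16, 30])
print(stats)
```

Here `fences` holds the 1.5 * IQR limits themselves, while `whiskers` holds the most extreme observations inside those limits, which is where many plotting tools end the whisker lines.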

Figure 3: Explanation of boxplots

Statistical Inference

The visual inspection of the data in figure 2 suggests that there has been an increase in click-through rate since the introduction of site links. We will now consider how we can perform more formal statistical inference on the data, in order to establish whether the observed differences are statistically significant given the variability in the data.

A simple and reasonable approach would be to perform a t-test. This test can easily be performed in any statistical software or in Excel. It compares the observed CTR after the introduction of site links to some value you believe the CTR to have centered on before the introduction of site links. This produces a p-value: the probability of observing a difference at least as large as the one measured if there were no real change in CTR. A small p-value (typically less than 0.05) implies a significant difference. When we apply this test to the data represented in figure 2 we obtain a p-value of 0.00001, suggesting that there has been a significant increase in CTR after the introduction of site links.

There are two shortcomings to this approach:

  1. Binomial response: strictly speaking, one should use a Chi-square test to conduct inference about the mean of a binomial variable, as the standard t-test assumes a normal population; this is particularly important when the sample size is relatively small;
  2. Position: if there has been any shift in average position before and after the introduction of site links, this needs to be accounted for in the inference. As we know, there is a very clear correlation between position and CTR, with a higher position resulting in a higher CTR. In order to get the most accurate measure of the direct impact of site links on CTR, we need to account for any position effects. This can be achieved through some more sophisticated statistical modeling.
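To make the first point concrete, here is a minimal sketch of a Pearson Chi-square test on a 2x2 table of clicks and non-clicks before and after site links. It is not the analysis behind the figures in this post: the function name `chi_square_2x2` and the click and impression counts are invented, no continuity correction is applied, and it does nothing about the second shortcoming (position).

```python
import math

def chi_square_2x2(clicks_before, impressions_before, clicks_after, impressions_after):
    """Pearson Chi-square test (1 degree of freedom) for a change in CTR.

    Returns (chi-square statistic, p-value). No continuity correction is applied.
    """
    # Observed counts: rows are before/after, columns are clicks/non-clicks.
    observed = [
        [clicks_before, impressions_before - clicks_before],
        [clicks_after, impressions_after - clicks_after],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][c] + observed[1][c] for c in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            # Expected count if CTR were the same before and after.
            expected = row_totals[r] * col_totals[c] / total
            chi2 += (observed[r][c] - expected) ** 2 / expected

    # Upper tail of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Invented counts: CTR of 10% before and 13% after, 1,000 impressions each.
chi2, p = chi_square_2x2(100, 1000, 130, 1000)
print(round(chi2, 3), round(p, 4))
```

Because the test works on raw click and impression counts, it respects the binomial nature of CTR, but any shift in average position between the two periods is still mixed into the result.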

If you are satisfied by basic statistics you should skip to the discussion. However, if you want to take your campaign measurement to a new level and/or you enjoy some more sophisticated statistics, this is for you. In an earlier blog we introduced logistic regression as a very useful statistical modeling tool for modeling binomial responses as a function of other variables. Once again it will prove to be useful here.

Logistic regression is part of a wider group of statistical models called generalized linear models or GLMs. By using logistic regression, we can model the CTR as a function of any number of other explanatory variables. In our case these are the average position and an indicator variable that shows whether the observation was taken before or after the introduction of site links:

 \log\left(\frac{CTR}{1-CTR}\right) \sim \mbox{Position} + \mbox{Sitelink}

where \frac{CTR}{1-CTR} is defined as the odds of a click-through, so we are actually modeling the log-odds of the CTR (for theoretical reasons we will not go into here). In our model both the position and site link variables are deemed to be significant. The output of a logistic regression model is typically interpreted in terms of odds ratios. The model parameters can be used to compute the odds of a click-through for different values of an explanatory variable, all else being equal. The ratios of these odds, with their corresponding confidence intervals, can then be used to conduct inference about the modeled data. For example, we can compute the ratio of the odds after the introduction of site links to the odds before, at any given position, as

 \Omega_{\mbox{before},\mbox{after}} = \exp \left(\beta_{\mbox{sitelink}} \right) = \exp \left( 0.283\right) = 1.33

where \beta_{\mbox{sitelink}} is the regression parameter for the site link indicator variable. The odds ratio of 1.33 implies that the odds of a click-through have increased by 33% after the introduction of site links in any given position. It is always important to compute a confidence interval for the odds ratio in order to assess the significance of the change in the context of the data. A 95% confidence interval for the above odds ratio is [1.319, 1.335]. The relatively narrow confidence interval suggests that we have high confidence that there is a real increase in the odds of a click of about 33%. We can also visualize our fitted model by re-expressing the model above in terms of the CTR and plotting it over a range of positions. This visualization is presented in figure 4.
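An odds ratio is not the same thing as a relative change in CTR, and a short calculation helps to show the difference. The sketch below takes the site link coefficient of 0.283 from the model above and applies it to a few baseline CTRs; the baseline values and the function names are assumptions chosen purely for illustration.

```python
import math

def odds_ratio(beta):
    """Odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)

def ctr_with_site_links(baseline_ctr, beta_sitelink):
    """CTR after site links, holding position fixed, given a baseline CTR."""
    odds_before = baseline_ctr / (1.0 - baseline_ctr)
    odds_after = odds_before * math.exp(beta_sitelink)
    # Convert the odds back to a probability (the CTR).
    return odds_after / (1.0 + odds_after)

BETA_SITELINK = 0.283  # site link coefficient reported in this post

print(round(odds_ratio(BETA_SITELINK), 2))  # 1.33

# Hypothetical baseline CTRs, for illustration only.
for baseline in (0.05, 0.10, 0.20):
    after = ctr_with_site_links(baseline, BETA_SITELINK)
    print(baseline, round(after, 4), round(after / baseline - 1.0, 3))
```

The relative lift in CTR is always somewhat smaller than the lift in odds, and the gap widens as the baseline CTR rises. For low CTRs the two are close, which is why an odds ratio of 1.33 is often read loosely as "about a third more clicks".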

Discussion

In this post we showed how we can make use of more sophisticated statistical tools to evaluate the performance of site links. We conclude that site links do increase the click-through rate, with the odds of a user clicking on an advertisement with site links generally being more than 30% higher than without site links. In figure 1 we show that the overall percentage of clicks that happen through site links is relatively small and not enough in itself to drive the observed increase in click-through rate. Our conclusion is that the inclusion of site links increases the actual size and visibility of the ad copy, while pushing organic results further down the page. This seems to drive more clicks to the main ad copy link, thereby increasing the overall click-through rate. This agrees well with some of our initial observations about the introduction of site links, and is something we will continue to research. Our CTR results agree quite closely with the preliminary data presented by Google at the recent SMX West conference in Santa Clara. It seems like a good idea to start experimenting with site links on your top keywords.

Beyond an increase in CTR, it would also be of interest to see whether site links can be used to affect the conversion behavior of advertisements.  In volume II of this post we will present some initial data on this aspect of site link behavior, as well as data on testing different types of site links.

Figure 4: GLM model

All the graphics and inference in this post were produced using R (http://www.r-project.org), a leading free software environment for statistical computing and graphics that we use internally for statistical modeling.

Effects of Site Links on Adwords Campaigns

One of the recent new features on Google Adwords has been the ability to add site links to certain campaigns. These campaigns are the ones that Google deems eligible to display additional links in the Adwords advert, similar to what they’ve been doing for quite a while on the organic search results (see fig. 1). On the SERPs, these additional organic links provide the user with a quick way to navigate to subsections of a website without having to go to the homepage first, effectively saving the user the effort of an extra click.

Fig. 1: Site links on organic results

So at first sight, it seems like a good idea for Google to implement the same feature on Sponsored Links because it adds value and relevance to the user. But could there be other motives behind Google’s action?

The first thing to point out is that in reality, Google (currently) only seems to display site links on ads with a very high Adrank, which basically means ads linked to brand (or trademark) keywords. Now there has been a lively debate in the search marketing world over whether or not one should bid on one’s own trademark. After all, why would you spend money on clicks from users who would find you in the (free) organic results anyway? That is a valid argument but unfortunately it is also a bit naïve. As Fig. 2 shows, if you don’t bid on your brand, someone else eventually will, and no matter how good your organic rankings, you will lose clients to your competitors. Because the truth is, Google does not make any money from organic results and if possible they will try to put a sponsored link above the organic results. Your competitor doesn’t even have to bid on your brand; Adwords’ Expanded Broad Matching might automatically match your competitor’s brand to yours. (Example: Google will match the keyword ‘Avis’, on broad match, to a user searching for ‘Hertz’, i.e. without Avis actively bidding on the Hertz keyword.) Have a proper look at your search query reports and you will be surprised at how frequently this occurs.

Fig. 2: Competitors reducing the effect of organic site links

With the site links feature, Google provides a trademark owner the ability to appear more prominently on the SERPs but only if they choose to bid on their trademarked keywords. If we look at fig. 3 it is obvious that site links make a sponsored link appear more prominent for a trademark owner, even as the organic results are being pushed lower down the page.

Fig. 3

This post is based on qualitative observations of a popular brand that does not belong (and is not related) to our client portfolio, but we have quantitative data to back up our theories (more on that in a next post). Unlike in the organic results, site links are not only intended to increase user relevance, but also to increase Google’s revenue by making the competition for prime SERP advertising space more fierce. Organic (free) listings are being pushed lower down the page, forcing lower ranking websites to start paying for more prominent sponsored listings. For trademark owners site links provide a competitive advantage, but at the bottom of the ladder we suspect that competition and click prices will increase.

In a next post, we’ll take a more detailed look at the impact of site links on a campaign’s performance based upon our experience of the last few months.
