Digital Strategy: Keep The Wheel Turning

What is Digital Strategy?

If you go to Wikipedia the answer you will get is this:

“Digital strategy is the process of specifying an organization’s vision, goals, opportunities and initiatives in order to maximize the business benefits, digital investments and efforts provide to the organization.”

While this definition may seem very abstract, it contains a few key points which allow us to greatly simplify it into something much more digestible:

At the core of this definition is the fundamental assumption that every organisation and digital investment has a goal. Moreover, it assumes that the goal of a digital investment is inextricably linked to the goals of the organisation.


Despite focusing on the digital assets of an organisation, at the end of the day digital strategy is still just a strategy. As such the process thereof can be broken up into four steps of strategy development:

One mustn’t however forget the fundamental building block of measurement, which is critical for each step of the way. This is especially relevant due to the measurable nature of the digital world, which is also one of its key strengths.

You might think this a very broad theoretical outline, one that is hard to stay focused on in day-to-day activities. That may be true, especially since each phase will have its own sub-processes which, more often than not, incorporate elements from other phases. That said, I still think it is very important to have an ongoing and structured strategic outlook which incorporates these different phases and, most importantly, the relationships between them.

This means, for instance, making sure that a framework is properly outlined in the planning phase and effectively developed in the implementation phase, so that when the organisation has to react (possibly months or even years from now), it has a solid framework which allows it to be agile. A classic example is how Amazon was able to completely change its ‘store front’ only hours after Steve Jobs passed away, to accommodate the expected surge in interest in Jobs and in Apple in general.

At the end of the day my point is: Agility is arguably one of the most common elements which distinguish industry leaders. Moreover, it is only with a holistic and ongoing view of the strategic process that this agility can be achieved, especially in the digital space where things are moving so quickly.

A request to Google: enable us to pay a fair price for search partner traffic

We would like to add our voice to those who have been pleading with Google for some time to give us the ability to differentiate bids by search partner on Google’s syndicate network.  Industry leaders such as George Mitchie have in the past presented data illustrating that the quality of traffic coming from the Google search engine itself is significantly higher than that of traffic coming from search partners.  Here we will present some of our own data to reinforce the point that the traffic we get via search partners is (with some exceptions) of a lower quality than the traffic we get directly from Google’s search engine.  Additionally, we will show that there are geographical differences in the value of traffic and that the value of brand and non-brand traffic can potentially be quite different for search partners.  We do not dispute the additional value of traffic from the syndication partners; we just want the ability to pay the right price for that traffic based on its inherent value.

The proportion of Google search traffic that comes from syndicate partners varies quite a bit from client to client.  On average, we see about 20-25% of traffic coming from search partners for US clients, while it tends to be closer to 10% for Australasian clients.  Data for collections of US and Australasian clients show that search partner traffic is generally of a considerably lower quality.  In figures 1 and 2 we sort referring domains by click volume across a set of US and Australasian clients, respectively.  We then calculate the ratio of each domain’s average conversion rate to the overall average conversion rate for the client.  This comparative conversion rate data reveals the differences between Google and other referring domains.  The color of the bar labels distinguishes between 3 groups of domains: red (conversion rate at least 10% below the overall conversion rate), green (conversion rate at least 10% above the overall conversion rate) and black (conversion rate within 10% of the overall conversion rate).
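As a rough illustration of the calculation described above (not the code used to produce the figures), the sketch below computes each domain's conversion rate relative to the overall rate and assigns the red/green/black grouping. The domain names and click counts are made up, and the overall rate is assumed to be total conversions divided by total clicks.

```python
def relative_conversion_rates(stats):
    """stats maps referring domain -> (clicks, conversions).

    Returns (domain, ratio, colour) tuples sorted by click volume,
    highest first. The ratio is the domain's conversion rate divided
    by the overall conversion rate (assumed here to be total
    conversions / total clicks).
    """
    total_clicks = sum(clicks for clicks, _ in stats.values())
    total_conversions = sum(conv for _, conv in stats.values())
    overall_rate = total_conversions / total_clicks

    rows = []
    by_volume = sorted(stats.items(), key=lambda kv: kv[1][0], reverse=True)
    for domain, (clicks, conversions) in by_volume:
        ratio = (conversions / clicks) / overall_rate
        if ratio <= 0.9:
            colour = "red"      # at least 10% below the overall rate
        elif ratio >= 1.1:
            colour = "green"    # at least 10% above the overall rate
        else:
            colour = "black"    # within 10% of the overall rate
        rows.append((domain, ratio, colour))
    return rows


# Hypothetical numbers, for illustration only.
example = {
    "google.com": (8000, 400),
    "partner-a.example": (1500, 45),
    "partner-b.example": (500, 27),
}
for domain, ratio, colour in relative_conversion_rates(example):
    print(f"{domain:20s} {ratio:5.2f} {colour}")
```

A ratio of 1.0 means the domain converts at exactly the overall rate, so values well below 1.0 correspond to the red-labelled partners in the figures.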

Figure 1: Relative traffic value by search partner for a selection of US clients

Figure 2: Relative traffic value by search partner for a selection of AUS clients

In both the US and Australasia, most search partners bring in traffic that is well below that from Google in quality.  There are some differences in relative traffic value between the two regions, notably for Amazon traffic.  Out of interest, we repeated the above analysis after first excluding all brand traffic.  The results show that the relative traffic value of the search partners changes considerably when we do this.  In figure 3 we show data for the set of Australasian clients above after excluding all brand traffic.  eBay is an example where the exclusion of brand traffic has increased the relative value of its traffic considerably.  Volumes do become quite thin, so the inherent statistical variability should be kept in mind when drawing inferences.

It is clear that the quality of traffic from most search partners is quite a bit lower than that of direct search traffic from Google.  The ability to bid differently by search partner would therefore add quite a bit of value.  AdWords advertisers can already bid differently for the Google domain and the rest of the network by separating all campaigns into Google-only versions and exact copies that target Google + search partners.  The bids for the Google-only campaigns are set higher because the conversion rates are higher, so on Google itself only the Google-only campaigns are in play.  That means that the Google + search partner campaigns effectively serve ads only on the syndication network, and the bids can be depressed for that traffic.  This is a rough workaround and not ideal.  The ability to bid differently by domain would be much better.

Figure 3: Relative non-brand traffic value by search partner for a selection of AUS clients

Can Google queries help predict economic activity?

In Bill Tancer’s book Click, he gives some examples of how near real-time Internet data provides a time advantage over traditional leading economic indicators.  These indicators are typically only available with a time lag: the data for a particular month is generally released about halfway through the next month.  I found this concept quite interesting when I read the book a year ago.  I never really pursued it analytically myself until I recently discovered a nice interface for querying Google Trends data from within R, the leading freely available open-source statistical software package.  The R package RGoogleTrends (developed under the Omegahat Project) provides a very useful tool to extract and analyze Google query data in an efficient manner.  The documentation for this package states that its development was inspired by a blog post by Google’s chief economist, Hal Varian, which was published on Google’s Research Blog.  The post illustrates some simple forecasting methods and encourages readers to undertake their own analyses.  By the authors’ own admission, it is possible to build more sophisticated forecasting methods.  We decided to take up the challenge, because at Clicks2Customers we are always keen for an analytical challenge, especially if it comes from the mighty Google.

R has a wide range of sophisticated time series packages, which we decided to put to the test to see if the incorporation of query data can indeed improve the estimation and forecasting of leading economic indicators.  In this post we will focus on the monthly home sales data released by the US Census Bureau and the US Department of Housing and Urban Development at the end of each month, which were also used in Google’s study.  In order to make our results comparable to those of Google, we use the same January 2004 to July 2008 time window.

Our aims are two-fold:

  1. Verify that a more sophisticated time series modeling approach improves accuracy compared to Google’s relatively simple models
  2. Verify that the inclusion of query data in models improves the accuracy of estimates

In figures 1 and 2 we show the raw and seasonally adjusted home sales data downloaded from the US Census Bureau.  Similar to the Google study, we will start our modeling process on the seasonally adjusted sales figures.  This is to aid a comparison of our results with those of Google, although the seasonal component can easily be modeled directly.

Google Trends provides an index of the volume of Google queries by geographic location and category.  The query index of a search term reflects the query volume for that term in a given geographical region divided by the total number of queries in that region at a point in time.  This index is then normalized relative to January 1, 2004.  The index at a later date therefore reflects a percentage deviation from January 1, 2004.  Google Trends data is also available at a category and sub-category level.  Figure 3 reflects the search index data for the ‘Real Estate’ category and 5 of its sub-categories: Real Estate Agencies, Home Financing, Home Inspections & Appraisal, Property Management, and Rental Listings & Referrals.
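To make the index definition concrete, here is a minimal sketch of the normalization as we have described it: query share per period, expressed as a percentage deviation from the first period. Google does not publish raw query counts, so the counts below are entirely hypothetical and the function name is ours.

```python
def query_index(term_counts, total_counts):
    """Percentage deviation of a term's query share from the base period.

    term_counts[i]  : queries for the term in period i (hypothetical)
    total_counts[i] : all queries in the region in period i (hypothetical)
    The first period plays the role of the January 1, 2004 baseline.
    """
    shares = [term / total for term, total in zip(term_counts, total_counts)]
    base_share = shares[0]
    return [100.0 * (share / base_share - 1.0) for share in shares]


# Made-up counts for three periods.
index = query_index([200, 330, 180], [10000, 15000, 10000])
print([round(value, 1) for value in index])  # [0.0, 10.0, -10.0]
```

Note that in the second period the raw count rises by 65% while the index rises by only 10%, because total query volume also grew. The index tracks share of queries, not absolute volume.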

Figure 1: Raw Home Sales Data

Figure 2: Seasonally adjusted home sales data

Figure 3: Google query volumes for the Real Estate category and 5 of its sub-categories

The Google study fits simple auto-regressive models using standard linear model fitting functions.  A closer investigation of these models shows that they do not adequately model the correlation structure in the data.  We will follow a more classical time series approach based on autoregressive integrated moving average (ARIMA) models.  In our study we will first model the home sales data on its own, in order to establish a performance benchmark.  Thereafter, we will incorporate query data in the models to test whether its inclusion can improve the prediction of home sales.  We will evaluate the different models by making a series of one-month-ahead predictions and computing the prediction error, measured as the mean absolute error (MAE) as defined in the Google study.  Each forecast uses only the information available up to the time the forecast is made, which is one week into the month in question.
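The evaluation scheme can be sketched as a rolling one-step-ahead loop. This is an illustration only: the forecaster here is a naive "last value" placeholder rather than any of the models in this post, the series is made up, and we assume the error is measured as an absolute percentage deviation (the exact definition in the Google study may differ in detail).

```python
def one_step_ahead_mae(series, forecaster, start):
    """Rolling one-step-ahead evaluation.

    For every index t >= start, the forecaster sees only series[:t]
    and predicts series[t]. Returns the mean absolute error as a
    percentage of the actual value (an assumed definition).
    """
    errors = []
    for t in range(start, len(series)):
        history = series[:t]          # nothing from time t onwards leaks in
        prediction = forecaster(history)
        errors.append(abs(prediction - series[t]) / series[t])
    return 100.0 * sum(errors) / len(errors)


def last_value(history):
    """Placeholder forecaster: next month equals the latest observation."""
    return history[-1]


# Hypothetical monthly figures, for illustration only.
sales = [620.0, 600.0, 610.0, 580.0, 560.0, 565.0, 540.0]
print(f"MAE: {one_step_ahead_mae(sales, last_value, start=3):.2f}%")
```

Any model can be plugged in as the forecaster, as long as it is refitted on the history available at each step.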

The simplest time series model that is closest to the null model (Model 0) presented by Google is an ARIMA(1,1,0) model.  The difference is that our model takes a lag-1 difference of the log-transformed data to reduce it to a stationary data series, which is a necessary prerequisite.  This model provides a reasonable fit to the data and gives a prediction error of 6.03%, which is lower than the 6.91% of Google’s null model.  There is some suggestion in the data that a higher order auto-regressive model may provide a better fit.  We found that an ARIMA(7,2,0) model does result in an improved fit and a significantly reduced prediction error of 4.04%.  This model already outperforms Google’s more advanced model (Model 1), which has a prediction error of 6.08% even though it incorporates query data and house prices.  Next we take it up a notch by incorporating the above Google query data and fitting a multivariate time series model.  We use the query data for the first week of each month.  We experimented with different combinations of the above query indices and found that the Property Management query index gives the lowest prediction error of 3.7%.  The model we fit is a vector auto-regressive model with a lag of 3, using the R package dse.  The monthly 1-step ahead prediction errors for the above models are plotted in figure 4.
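For readers unfamiliar with the notation, an ARIMA(1,1,0) model on log-transformed data amounts to an AR(1) model on the month-to-month log-differences. The sketch below fits that by ordinary least squares in plain Python. It is a simplification under stated assumptions: it includes an intercept (drift) term, it uses least squares rather than the maximum likelihood fitting a time series package would typically use, and the input series is hypothetical. It is not the code behind the results above.

```python
import math


def fit_ar1_on_log_diff(series):
    """Fit d_t = c + phi * d_(t-1) by least squares, where d_t is the
    lag-1 difference of the log-transformed series.

    Needs at least four observations with varying log-differences.
    Returns (c, phi).
    """
    logs = [math.log(value) for value in series]
    diffs = [b - a for a, b in zip(logs, logs[1:])]
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series):
    """One-step-ahead forecast on the original (untransformed) scale."""
    c, phi = fit_ar1_on_log_diff(series)
    last_diff = math.log(series[-1]) - math.log(series[-2])
    return series[-1] * math.exp(c + phi * last_diff)


# Hypothetical monthly figures, for illustration only.
sales = [620.0, 600.0, 610.0, 580.0, 560.0, 565.0, 540.0]
print(f"Next-month forecast: {forecast_next(sales):.1f}")
```

Higher-order models such as ARIMA(7,2,0) follow the same idea with a second round of differencing and more lagged terms, at which point a dedicated time series library is the sensible choice.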

Figure 4: US home sales data 1 step ahead prediction errors

Let us return to our stated aims.  It seems like we have verified both aims, namely that a more formal time series approach improves considerably on the models presented by Google and that the inclusion of query data has the potential to further improve the 1-step ahead prediction in the case of the house sales data.  Our best performing model improves about 39% on the best model presented in the Google study in terms of 1 step ahead prediction accuracy (without incorporating the house price data used by Google yet).  There seems to be potential in using Google query data in forecasting economic data.
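The improvement figure quoted above is simply the relative reduction in prediction error. A quick check using the error values reported in this post (the function name is ours):

```python
def relative_improvement(baseline_error, new_error):
    """Percentage reduction in error relative to the baseline."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Google's Model 1 (6.08%) versus our best model with query data (3.7%).
print(round(relative_improvement(6.08, 3.7), 1))  # 39.1
```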

This is a single example, and a proper study would have to apply a more sophisticated modeling approach to a much wider range of data sets.  The Google study also illustrates the use of Google Trends data in predicting travel visits.  In their example they use data from the Hong Kong Tourism Board.  We intend to perform a similar study using monthly tourism data released by Statistics South Africa in conjunction with Google Trends data for the period building up to the 2010 FIFA World Cup.  This should make for an interesting case study for the use of Google Trends data.  Keep an eye on our blog for the results sometime in the future!

getstats – promoting the understanding of statistics

Data is becoming more and more important in every sphere of society.  This is underlined by companies like Google that have it as their mission to organize the world’s information and make it universally accessible and useful.  Major consulting firms are acknowledging data-driven decision making as an emerging global trend.  This trend is not limited to the business world.  We are increasingly being exposed to statistics and data in our everyday lives.

The Royal Statistical Society is launching its 10-year campaign for statistical literacy on World Statistics Day: 20/10/2010.  The vision for the campaign, known to its friends as getstats, is “a society in which our lives and choices are enriched by an understanding of statistics”.  Please visit the campaign website for more information and to show your support.  As a company operating in a data-driven industry, we are proud to support this global initiative.

Quality score dynamics can vary between different advertisers and PPC markets

In an earlier post, it was shown that the relationship between CTR and position can vary for different advertisers, especially in the higher positions.  These differences can potentially be explained by differences in brand traffic, or they may be real differences in click-through related to the dynamics of geographical PPC markets or the nature of the advertiser’s business.  It is interesting to note that the CTR in the higher positions seems to be considerably better for advertisers in less mature online markets, namely Australia and South Africa.  We investigated this further by focusing only on non-brand keywords.  To achieve this, we included a factor indicating the brand or non-brand status of keywords in our model.  Figure 1 compares the four advertisers on all non-brand keywords with a QS of 7.  Even after separating out brand traffic, we continue to observe substantially higher CTR in the top positions for Australian and South African advertisers compared to the UK and US advertisers in the same model.  This could reflect a lower level of competition relative to more developed markets, resulting in a smaller number of competing ads on a search results page, which in turn drives higher click-through in top positions.  From this it is also clear that there are no fixed thresholds for determining quality score; rather, it is determined by relative CTR, which varies across advertisers and potentially also across geographical regions.

Figure 1: CTR by position across four advertisers for all non-brand keywords with a quality score of 7.