A Reasonable Solution for "(not provided)" Web Analytics Data
Posted at Feb 20, 2012 3:38:25 AM by Taylor De Luca
If you're one of the few who actually pay close attention to search metrics, you may have recently noticed that about 15-20% of your organic search traffic is now reported as "(not provided)". This percentage has steadily increased ever since Google began using SSL search for all logged-in users.
For many of us, this is a really big bummer. In effect, Google enabling SSL search has created an environment where search marketers can't see the keywords driving roughly a fifth of all organic search traffic. This makes it a bit harder to determine the value and impact of various organic search marketing efforts.
A Reasonable Solution for 'Not Provided'
First, let's define reasonable. Reasonable means that we use a scalable logic that paints a high-level picture of what the data actually means. We feel that this solution is good enough for the vast majority of websites, which is why we use it for the most part.
As an agency dealing with a wide variety of organic search campaigns, we've come up with a very simple and logical attribution methodology. This methodology relies on one assumption: the proportions of each metric (visits, revenue, conversions, events, etc.) are the same or similar for 'not provided' traffic as they are for the remaining 'known' organic search traffic. For example, we have to assume that if 20% of all known traffic is related to a certain keyword phrase, the same would be true for the unknown 'not provided' data. Likewise, if 40% of all sales came from a certain known organic keyword phrase, it would be safe to assume that a similar portion of the 'not provided' revenue would follow suit. We have no reason to believe that, at scale, logged-in Google users behave significantly differently from those who are not logged in.
By applying this logic and 'smoothing' to the 'not provided' data, we can continue to make a logical approximation of the value of all 'unknown' traffic. To improve accuracy, we do this at the metric level. So a different ratio will apply to 'visits' than it does to 'revenue' or 'goal conversions', as we know that certain types of keywords convert at proportions different from their share of traffic volume. Let's look at an example.
Data for Joe's Chicken Shack
(Joe's Chicken Shack is a made-up business)
30% of all 'known' search traffic is related to the brand/business name (Joe's Chicken Shack) - Let's say the total is 300 visits.
70% of all 'known' search traffic is related to prospecting terms (BBQ restaurants, Milwaukee BBQ restaurants, etc.) - The total of these is 700 visits.
There are an additional 100 organic search visits that are reported by our web analytics provider (e.g. Google Analytics) as being 'not provided'.
Based on all known searches, we know that 3/10 searches are related to the business (or brand) name and the other 7/10 searches are prospecting terms. If we wanted to 'smooth' the total organic traffic into one metric for visits, we'd apply 30 of the 100 'not provided' visits to the brand/business name total and 70 to the prospecting total. In the end, we'd attribute 330 visits as being 'brand/business name' searches and 770 as being 'prospecting' searches.
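For readers who would rather script this than do it by hand, here is a minimal sketch of the visit smoothing above in Python. The function name `smooth_not_provided` and the segment labels are our own illustrative choices, not part of any analytics tool; the numbers are the Joe's Chicken Shack figures.

```python
def smooth_not_provided(known, not_provided):
    """Spread 'not provided' volume across segments in proportion
    to each segment's share of the known volume."""
    total_known = sum(known.values())
    if total_known == 0:
        raise ValueError("No known traffic to base the proportions on")
    return {
        segment: value + not_provided * value / total_known
        for segment, value in known.items()
    }


# Known organic visits per segment (from the example above).
known_visits = {"brand": 300, "prospecting": 700}

# 100 visits were reported as 'not provided'.
smoothed_visits = smooth_not_provided(known_visits, 100)
print(smoothed_visits)  # {'brand': 330.0, 'prospecting': 770.0}
```

The same function works for any number of keyword groups, as long as every known visit is assigned to exactly one group.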
As mentioned, we'd prefer applying this ratio at the metric-level. If we wanted to attribute the breakdown of revenue, we'd first calculate a new ratio based on the known revenue data (as opposed to just using the 30/70 split above).
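To illustrate the metric-level idea, the sketch below computes a separate ratio for each metric. The revenue figures (4,000 and 6,000 known, 1,000 'not provided') are purely hypothetical numbers we made up for illustration, as are the helper name `smooth_metric` and the dictionary layout; only the visit figures come from the example above.

```python
def smooth_metric(known, not_provided):
    """Distribute one metric's 'not provided' total in proportion
    to that same metric's known per-segment values."""
    total_known = sum(known.values())
    if total_known == 0:
        raise ValueError("No known data to base the proportions on")
    return {
        segment: value + not_provided * value / total_known
        for segment, value in known.items()
    }


# Known values per metric and segment.
known = {
    "visits": {"brand": 300, "prospecting": 700},
    # Hypothetical revenue figures: brand converts better than its
    # 30% share of visits would suggest (40% of known revenue).
    "revenue": {"brand": 4000.0, "prospecting": 6000.0},
}

# 'Not provided' totals per metric (the revenue total is hypothetical).
not_provided = {"visits": 100, "revenue": 1000.0}

# Each metric is smoothed with its own ratio, not the 30/70 visit split.
smoothed = {
    metric: smooth_metric(known[metric], not_provided[metric])
    for metric in known
}

for metric, segments in smoothed.items():
    print(metric, segments)
```

With these made-up numbers, brand receives 30% of the unknown visits but 40% of the unknown revenue, which is the difference the metric-level approach is meant to capture.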
As web analysts, we'd always prefer to have 100% of the data, but this is the best workaround we can find at this time. While applying this smoothing to your existing data isn't a lot of fun (lots of Excel work), we feel it is needed: we anticipate the percentage of unknown or 'not provided' searches will only increase over time, so applying smoothing logic will become important to making good marketing decisions. Feedback on this logic?