I love analytics, but I am often unable to take advantage of some of the interesting and complex tracking Google Analytics can do because I do not have the time to set up the code properly. We are a small shop — two people to do all things web-related from simple updates to managing full-scale web projects to design to building custom functionality to giving our team and clients analytics on their sites. At the time this was written, the two of us manage about a hundred areas of content, 25 WordPress instances, and over 5,000 pages of content and growing. We simply do not have the bandwidth to plan that far ahead. We add the default Google Analytics code and hope for the best. That said we often need analytics beyond the basics Google Analytics provides and this is when my curiosity and love of data kicks in. I thrive when I get to find ways to get the information we need even if we didn’t plan ahead to capture it.
We have a site that we call our “content repository.” It is a WordPress site with Posts for every piece of article or story content we produce or is produced for us (we have a highly decentralized environment). We call it our “ribbon content” – the content that weaves through all of our websites like a ribbon but is managed in one place. The articles or stories live natively in the repository site or we put a Post in the repository that has the title, excerpt, tags and categories then we redirect to the source content. We do this so we can generate feeds for all of our content and keep all our normally static sites “fresh” with ribbon content via feeds. That said, we have not coded the content in our repository to use Content Groupings in Google Analytics so I have to do a little creative and sometimes time consuming analysis, but we get the results we need.
Here’s the scenario. We want to know if the content we are producing in a particular area is performing as we would like and getting the results we want. We want to know:
- How well is the content performing? (primarily looking at Pageviews)
- Are our calls to action driving traffic to our website? (we are using Page Tracking so we can use Campaign URLs to see if particular links and buttons are being clicked.)
- Are visitors actually reading the content? Is our content engaging? (this is when things get complicated – and fun!)
Measuring Engagement
Here is how I find out how engaging our content is. It seems like a long and complicated process, but, in reality, it is pretty straight forward and gets the job done.
- From our WordPress site, I noted the IDs of all the categories (or tags – in my case it was categories) of the area of content in which I was interested.
- Next, I used WP-CLI: Command line interface for WordPress to query the database of the repository WordPress site for all the stories in these categories and ported that output to a text file. The data I selected were ID, title, slug, categories, date posted and word count for the post. Word count is not a field in the database so I calculate it. See the end of this article for the query I used.
- Open that text file in Excel and clean the data a bit (removing duplicates and the inevitable posts you know are miscategorized or otherwise in the wrong place or in draft). When the data are clean, select all the slugs.
- Paste the slugs into a new text file and search and replace the line breaks with pipes (e.g., slug-1|slug-2|slug-3).
- Copy the slugs with pipes into your clipboard.
- Next, I went into my Google Analytics dashboard for the site and drilled down to All Pages (Behavior -> Site Content -> All Pages) and added a new segment.
- Give the segment a name
- Select Condition
- Choose Page
- Choose “matches regex”
- Paste in the slugs with the pipes between them
Now I went back to my All Pages report and selected the date range for which I was interested (I did a 3 year period) and looked at the segment I just created. Here I was interested in Pageviews and Average Time on Page, but I wasn’t satisfied with just that. I wanted to dig a little deeper.
I exported the top 100 results into an Excel spreadsheet and began the real work for this analysis.
Next, I merged and matched up data from the Google Analytics spreadsheet with the top 100 results with the original spreadsheet that was generated from the SQL query. I used the slugs to match up the data since the slug is the URL and the GA data has the URL. Essentially, I wanted a spreadsheet that had average time on page from Google Analytics and the word count from the database query.
Understanding that the average person reads about 250 words per minute, I created a new column in the spreadsheet for how long it should take people to read the article or story. I compared how long, on average, people viewed the page with how long it should take the average person to read the page. Essentially, the difference between those values resulted in a new variable I called Engagement. Negative values indicated low engagement, zero indicates perfect engagement, and positive values mean more time was spent on the page than it takes the average person to read it. This works well for me because our audience is highly educated and looking to get a professional degree so likely read at or above the average reading level. One day I will try to calculate the standard deviation and look at values that fall between one standard deviation of the mean, but, for now, this works well enough.
Even knowing that all kinds of factors affect “average time on page,” it is not unreasonable to assume if someone spent, on average, about the same amount of time (or more) on the page that an average person would need to read the number of words on the page that they actually read it. If they read it, then we could conclude the content was engaging.
Creating More Segments Based on Categories
In addition to looking at all of the content in the area of interest, I am also interested in how subsets of that content perform. To do that, I need to create some more segments that I will base on the categories I extracted initially from the database and that are in the spreadsheet. I created new columns breaking out the categories. Since some articles and stories could be in more than one category, if I wanted to analyze each category separately, I would need a way to select all articles in a particular category. I could either run another SQL query with just the category ID in which I am interested, or I can do it in Excel with this formula:
=ISNUMBER(SEARCH(substring,text))
So, for example, if my categories are in column C and I want to add a new column of 1 or 0 if the particular category name appears in the category list, I would use this formula:
=IF(ISNUMBER(SEARCH("Category 1 Name",A1)), "1", "0")
Then fill down the new column. Repeat for each individual category you want to look at. For example, I have three categories, MCM, MAPP and MIDGH. I would create three new columns for each of these swapping out “Category 1 Name” in the formula. From there, I could sort on those columns, select the slugs and create new segments in Google Analytics like I did above for the main segment and then look to see if there is a difference in the performance and engagement data by category.
Doing the Analysis
In the end, I have data in my spreadsheet that is of value and using the newly created segments in Google Analytics, I can go look at other analytics like Demographics and Behavior. I pull all this information together into a simple report with recommendations. In my case, I found a few interesting things based on this analysis.
- Word count did not make a lot of difference. I started with the theory that our content is too long, but I found that word count alone did not predict if someone was reading the full article. In many instances, the time on page was very close to the predicted time to read the article indicating the content was engaging regardless of word count. The recommendation I made based on this is we should do a qualitative analysis and look at the articles with highest levels of engagement versus articles with the lowest levels and see if we can find a pattern in the types of articles our site visitors like.
- We also found that across all categories the articles performed fairly well with a respectable amount of Pageviews. We did not look at Bounce Rate because the article is intended to drive to another site so the Bounce Rate is naturally high. (For reasons I alluded to at the beginning of this article, we have many websites but one content repository for articles from which we can pull feeds on all our sites of specific content we have tagged or categorized on our content repository. Google thinks these are different sites and when someone leaves the content repository for the site to which we are driving them, it thinks it is a bounce. I know we can “fix” this but that gets back to we don’t have time to recode all of our site analytics. We work with what we have.)
Wouldn’t it have been easier to just code for Content Groupings? Sure, but we are a small shop and I am a generalist who dabbles in a little of everything. Analytics happens to be something I am interested in and, if time allows, in the future I hope to code for Content Groupings, but, for now, I cobble together the data. And a little part of me says, where is the fun in having Google do all the work for me?
Sample Database Query
Here is the SQL query I used to selects posts in the categories with IDs 1, 3, and 5 as well as a new column with word count:
SELECT *, SUM( LENGTH(post_content) - LENGTH(REPLACE(wp_posts.post_content, ' ', ''))+1) AS 'wordcount' FROM wp_posts LEFT JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) LEFT JOIN wp_term_taxonomy ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id) WHERE wp_term_taxonomy.taxonomy = 'category' AND wp_term_taxonomy.term_id IN (1,3,4) GROUP BY wp_posts.ID ORDER BY post_date DESC
You can clean out drafts and revisions easily in your spreadsheet or add those parameters to your query string. This just worked for me.