Serverless Twitter Analytics with CosmosDB and Logic Apps
Using the Azure cloud to understand click behavior
Part of the series: Serverless Link Shortener
I recently added a Logic App to my Azure portfolio, and the insights it provides are amazing. I chose Logic Apps because I needed an easy way to query Twitter. In the Logic Apps designer, I simply clicked âTwitterâ, logged in, and specified my search. It does the rest. I didnât have to write a line of code or even worry about the nuances of authentication. It was all handled for me!
Itâs been several months and tens of thousands of clicks since I started using my custom link shortener. I use this tool daily and leverage the analytics to refine the information I send out to social media. The goal is simply to find out how developers react to various topics so that I can eliminate the ones that are not interesting and focus on the ones that generate interest and excitement. In the context of the tool, a âclickâ is a âvoteâ for a particular topic.
I can look at the past 24 hours, for example, and see that although Twitteris by far the most active social medium I use, there is plenty of activity from my Channel 9 videos, this blog, and even GitHub.
I also started tracking data like the target host pages. This gives me an interesting visual of where most of the clicks through my URL tracker end up.
Recently, I realized that I was missing some important information. Although itâs not consistent, some sources are kind enough to provide referral information. Every request also provides a user agent. Combined, this information gives me valuable insights such as the browsers being used, whether my content is read on mobile or desktop, and where the clicks originate.
Iâll get into more of the user agent code in a different blog post, but adding referral information to the tracking data was as simple as:
if (parsed.Referrer != null)
{
doc.referrerUrl = parsed.Referrer.AsPage(HttpUtility.ParseQueryString);
doc.referrerHost = parsed.Referrer.DnsSafeHost;
}
This gives me both a page and a host so I can organize data at different levels. I love that đ CosmosDB is a schema-less database, so itâs trivial to add new properties as an afterthought. I just ignore the old data in my analytics that need referral information. After running with the new information for awhile, I noticed an interesting trend.
A large number of referrals were coming from ât.coâ which is Twitterâs own link shortener. In fact, I realized that every new tweet gets a unique âshort link.â This means that I can search Twitter for the exact short link URL, and find the original tweet that the referral came from! This is useful to me to analyze what tweets generate the most engagement, and to âfact checkâ my counts compared to Twitterâs stats.
Note: it would be great to understand activity at a retweet level, but this is not possible. A retweet simply references the original tweet, so there is no way of determining whether the click came from the original or a retweet. Therefore, I set up the logic to track the original tweet information, and from there I can always query statistics from twitter about likes and retweets.
I knew that Logic Apps would be the easiest way to process the Tweets, so I logged into my Azure portal and added a new Logic App. The designer makes it incredibly easy to connect to various resources and assets. Here is the first part of my workflow. The trigger is a timer I run every two hours and will tweak as needed based on activity. I initialize a variable to represent the current document and the Twitter link, then grab all documents that have a Twitter referral but havenât yet been mapped to the original tweet.
Notice the flexibility of the SQL syntax in being able to find a document that doesnât have a property defined yet (not(is_defined(âŠ))
). CosmosDB automatically indexes all properties, so these queries execute blazing fast.
I add a âforâŠeachâ loop to iterate the resulting documents.
I set the âdocument idâ variable to the current id, and the âtwitter URLâ variable to the Twitter website. This is the default Iâll use in case the search turns up empty.
Note: Logic Apps by default runs for loops in parallel. Although this is great for scale, the process Iâm writing is sequential because I need to map the document directly to the link I find. Therefore, I set the parallelism under âsettingsâ to â1.â I may refactor this in the future if I learn a way to scope the variables to the parallel process.
If youâre curious, here is a sample document that has a Twitter referral but hasnât yet been mapped to the original tweet.
{
"id": "07adda00-6b81-48a2-8ead-7acb52d9f1fc",
"page": "thenewstack.io/less-proving-enterprises",
"shortUrl": "CLM",
"campaign": "link",
"referrerUrl": "t.co/kX8cELXEYm?amp=1",
"referrerHost": "t.co",
"agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13C75 Safari/601.1 Mozilla/5.0,(iPhone; CPU iPhone OS 9_2 like Mac OS X),AppleWebKit/601.1.46,(KHTML, like Gecko),Version/9.0,Mobile/13C75,Safari/601.1",
"browser": "Safari",
"browserVersion": 9,
"browserWithVersion": "Safari 9",
"mobile": 1,
"mobileManufacturer": "Apple",
"mobileModel": "IPhone",
"mobileDevice": "Apple IPhone",
"platform": "Unknown",
"count": 1,
"timestamp": "2018-01-18T15:44:20",
"host": "thenewstack.io",
"twitter": 1,
"_rid": "[REDACTED]",
"_self": "[REDACTED]",
"_etag": "\"08001fba-0000-0000-0000-5a60c0d40000\"",
"_attachments": "attachments/",
"_ts": 1516290260
}
Now I search Twitter for tweets containing the link.
I only need one result. I also modify the referral URL a bit because sometimes there is a querystring attached that I donât need. Here is the backing code for the step.
{
"inputs": {
"host": {
"connection": {
"name": "@parameters('$connections')['twitter']['connectionId']"
}
},
"method": "get",
"path": "/searchtweets",
"queries": {
"searchQuery": "@{concat('https://', replace(item()['referrerUrl'], '?amp=1', ''))}",
"maxResults": 1
},
"authentication": "@parameters('$authentication')"
}
}
I then iterate the results (which at most Iâll get one) and branch based on whether or not an âoriginal tweetâ exists.
The variables are set either using the original tweet or the current tweet. âOriginal tweetâ is a nested object, so the easiest way to check for its existence is to coalesce with a known value. Here is the expression I compare to the text literal ânullâ:
coalesce(items('For_each_2')['OriginalTweet'], 'null')
The twitter URL variable now contains one of three possible values:
- The Twitter URL because the search did not return results
- The Original Tweet because the search returned a retweet
- The Tweet itself because the search returned an original tweet
The next step involves adding the new property to the document. There isnât a direct connector for this in Logic Apps, but Logic Apps play well with Azure Functions and adding the function was trivial:
[FunctionName(name: "UpdateTwitter")]
public static async Task<HttpResponseMessage> Twitter([HttpTrigger(AuthorizationLevel.Function,
"post",
Route = "UpdateTwitter/{id}")]HttpRequestMessage req,
[DocumentDB(Utility.URL_TRACKING, Utility.URL_STATS, CreateIfNotExists = false,
ConnectionStringSetting = "CosmosDb", Id = "{id}")]dynamic doc,
string id,
TraceWriter log)
{
if (doc == null)
{
log.Error($"Doc not found with id: {id}.");
return req.CreateResponse(HttpStatusCode.NotFound);
}
var link = await req.Content.ReadAsStringAsync();
if (!string.IsNullOrWhiteSpace(link))
{
doc.referralTweet = link;
}
return req.CreateResponse(HttpStatusCode.OK);
}
Functions make life so easy! By providing a CosmosDB (DocumentDB interface) binding, I donât have to do anything to retrieve the document. The binding automatically retrieves the id from the header, finds the corresponding document (if it exists) and then passes it into the function method. All I need to do is pull the URL that is passed as a simple text string as the body of the post and add it as a new referralTweet
property to the document. The binding even does the update for me!
There are two ways to call functions from Logic Apps. You can connect to certain functions directly, or you can call the function endpoint as a client. I chose the latter.
Notice I use the variables to add the document id to the route and include the twitter URL in the body. The design allows me to run this any time to capture unprocessed documents, and already is providing me with new insights on my dashboard.
Of course, now that I have the tweet, I can take it a step further and grab the tweet text to show more context or even a preview on the dashboard. But thatâs a task for a different day!
If you havenât already tapped into the power of Logic Apps, I encourage you to check them out today.
Until next time,
Part of the series: Serverless Link Shortener
- Build a Serverless Link Shortener with Analytics Faster than Finishing your Latte
- Real-Time Insights with Real-Low Effort
- Expanding Azure Functions to the Cosmos
- Exploring the CosmosDB with Power BI
- Serverless Twitter Analytics with CosmosDB and Logic Apps