
Identify Gaps in Public Discourse or Product Offering with Network Graphs

15 Feb 2019

Visualising a vast amount of text data in public discourse about your product offering is tough.

Analysing the vast amount of text data is even harder.

While word clouds and network graphs make it a tad easier, it can take a lot of time and technical skill to extract learnings from them.

Recently I was introduced to an advanced, publicly accessible network graphing tool that is extremely useful for text data analysis. While a deeper analysis still calls for a more manual method, this tool offers a quick and easy way to create network graphs and analyse them.

I'll introduce the InfraNodus tool and provide an interesting use case.

Disclaimer: Though it's a public access tool, it's not free. As of Feb 2019, it requires a one-time payment of €5.

Problem Statement

As a use case, I would like to analyse the gaps in public discourse on Twitter for the three biggest enterprise cloud solution providers: Amazon Web Services, Google Cloud and Microsoft Azure.

Enterprise Cloud Market Share - RightScale - AWS Azure Google Cloud

Via this analysis, I aim to discover gaps in consumer understanding of these products (e.g. a particular offering is offered by both Google and Azure, but only mentioned/discussed in #Azure tweets. Is it a lack of consumer education?)

At the same time, it can also uncover gaps in product offering (e.g. X appears in AWS searches but not Azure. Is it a feature that is unique to AWS?)

Why do I choose this topic?

I did a quick landscape analysis of the hygiene of the tweets on Twitter with several hashtag searches. I found that tweets with hashtags #AWS, #GoogleCloud and #Azure are very clean, consisting of tweets related to the topic I'm interested in (i.e. enterprise cloud solutions provider) and not about a vacation hotel named "Azure" or some SME named "AWS".

That will save a lot of time as I can safely skip the data cleaning step.

Besides, the volume of tweets is huge and heavily concentrated among developers and cloud solution providers.

Let's get started.

Extract the Data via InfraNodus

InfraNodus comes packed with several ways to obtain your data. While you can use the Twitter API to get the data on your own and copy and paste it into the tool, it is easier to use the in-built Twitter search app.

Once you've created your account, access the list of apps and select Twitter.

On the next screen, key in "#aws" in the first text box. We'll replace it with #GoogleCloud and #Azure later.

On the second text box, key in a name that's descriptive e.g. "awstwit", "awstweet".

For the third text box, I used 10000. You can choose your own number.

Under settings, choose the first option as we would like to analyse the full text of the tweets.

Click "+settings" and untick "exclude search term from the graph".

Network Graph Twitter Setup with InfraNodus by Noduslabs

Once you're done, click visualise. You'll be presented with your beautiful network graph of tweets.

Unprocessed Network Graph on InfraNodus - Amazon Web Services

Introduction to the Network Graph Analysis

Though I recommend that you watch all the tutorial videos, I'll introduce some basic actions you can take on the tool.

Quick Summary

On the right-hand side, you'll see a very useful summary of the communities detected in your text data. With this, you can easily ascertain the themes surrounding your text data.

It also highlights the "Most Influential Words", which are the words with the most connections, i.e. those most often mentioned alongside a wide range of other words.

Useful Summary using LDA methodology for the network graph
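To make the idea of "most connected" words concrete, here is a minimal Python sketch that links neighbouring words in each tweet and ranks words by how many distinct neighbours they have. It is only an illustration of the description above, not InfraNodus's actual algorithm; the sample tweets, the tiny stopword list and the function names are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"on", "the", "a", "for", "with", "in", "to"}

def cooccurrence_graph(texts, span=1):
    """Link each word to the `span` words that follow it in the same text."""
    neighbours = defaultdict(set)
    for text in texts:
        words = [w for w in re.findall(r"[a-z0-9#]+", text.lower())
                 if w not in STOPWORDS]
        for i, word in enumerate(words):
            for other in words[i + 1 : i + 1 + span]:
                if other != word:
                    neighbours[word].add(other)
                    neighbours[other].add(word)
    return neighbours

def most_connected(neighbours, top=3):
    """Rank words by the number of distinct words they are linked to."""
    ranked = sorted(neighbours.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(word, len(linked)) for word, linked in ranked[:top]]

# Made-up sample data, one string per tweet.
sample_tweets = [
    "serverless functions on aws lambda",
    "aws lambda pricing explained",
    "deploy golang on aws",
]
graph = cooccurrence_graph(sample_tweets)
print(most_connected(graph))
```

In this toy data "aws" comes out on top because it sits next to three different words. A real tool would likely use a more robust measure, but the intuition is the same.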

Highlighting Nodes

You may click on one or several nodes on the network graph or on the summary dialogue at the right-hand side.

Doing so will highlight the node's connections. You'll also see "tags" appearing at the top right-hand side. Clicking those or the back arrow will reset the highlight.

Highlighting the nodes on the network graph for better analysis

Sometimes, there are nodes that cloud other data (i.e. the bigger ones) or nodes that are irrelevant. You can remove them by clicking the trash icon.

Cleaning data on the network graph with the trash can icon

Understanding the Public Conversation

If you're curious about the context in which a word appeared, you can select the node and click on the dialogue bubble icon at the top right-hand corner.

Doing so opens a sidebar on the left.

Contextual analysis for the Network Graph

I find this very useful for contextual analysis. However, note that it is currently a simple "Find" mechanism, i.e. searching for the word "app" will also surface text containing "pineapple".
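If you export your text and want to check whether a hit is genuine, the difference between a simple "find" and a whole-word match can be sketched in Python as follows. The function names and sample statements are hypothetical, and this says nothing about how InfraNodus implements its own search.

```python
import re

def substring_hits(term, texts):
    """Simple 'find': matches the term anywhere, even inside longer words."""
    return [t for t in texts if term.lower() in t.lower()]

def whole_word_hits(term, texts):
    """Stricter match: the term must stand alone as a word."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [t for t in texts if pattern.search(t)]

# Made-up statements for the illustration.
statements = ["I love pineapple pizza", "Deploy your app on AWS"]

print(substring_hits("app", statements))   # matches both statements
print(whole_word_hits("app", statements))  # matches only the second one
```

Note that the `\b` word boundary assumes the term starts and ends with a letter or digit, so hashtags such as "#aws" would need slightly different handling.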

Performing the Discourse Gap Analysis

Before proceeding to the next section, please create two similar network graphs for #GoogleCloud and #Azure. You may name them anything.

Once you're done, select the #AWS one from the hamburger menu on the top left-hand corner. Then, click the weighing scale icon at the right. The weighing scale icon enables the "Compare to Context" function.

Utilise the powerful compare to context to unlock powerful network gap analysis

The button should turn blue once clicked. Then, head to the #GoogleCloud network graph.

You'll now see a few black nodes. Those are the nodes that appear on #AWS but do not appear on #GoogleCloud i.e. the context.

Gaps between the two network graphs are highlighted black

In the summary box at the right, you'll see a portion for "In [xxx] but not in [yyy]".

More summaries available when comparing to context

Those are the gaps in the public discourse of Google Cloud. That is to say, those are the topics discussed or talked about in #AWS tweets but not #GoogleCloud tweets.
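Conceptually, this comparison is close to a set difference between the vocabularies of the two graphs. The sketch below shows that idea in plain Python under strong simplifying assumptions: the tweets are made up, words are split naively, and the function names are my own rather than anything InfraNodus exposes.

```python
import re

def vocabulary(tweets):
    """Collect the distinct lowercase words used across all tweets."""
    words = set()
    for tweet in tweets:
        words.update(re.findall(r"[a-z0-9]+", tweet.lower()))
    return words

def discourse_gap(source_tweets, context_tweets):
    """Words that appear in the source tweets but never in the context tweets."""
    return vocabulary(source_tweets) - vocabulary(context_tweets)

# Made-up sample data standing in for the #AWS and #GoogleCloud searches.
aws_tweets = ["serverless golang on aws", "aws key management"]
gcloud_tweets = ["kubernetes on googlecloud", "googlecloud key rotation"]

print(sorted(discourse_gap(aws_tweets, gcloud_tweets)))
```

Here "key" is not reported as a gap because it appears on both sides, while "serverless" and "golang" are. On real data you would also want to drop the search terms themselves and very rare words before reading anything into the result.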

Though this analysis is dependent on the quality and volume of tweets, it can tell us quickly about:

competitor advantages
gaps in your product offering
consumer needs that are not addressed by your product
offerings that your product possesses but that are not known or discussed
potential future trends in the industry

In my results, I found the following (meaningful) gaps in Google Cloud vs Amazon Web Services (tip: you can discover them by removing the bigger nodes):

nutanix
golang
serverless
key (SSH key management)

Though Google Cloud does offer solutions for Nutanix, serverless environment and integration with Go applications, they do not appear in the tweets for #GoogleCloud.

Note: There is of course still dirty data (e.g. tweets from events, promotional tweets, tweets from enterprises promoting themselves). This is where the dialogue bubble function mentioned earlier can be useful to determine if they are insights or just dirty data.

Do note that some may argue tweets from enterprises should not be considered "public discourse". I would object to that assessment within this context, as enterprises are the target users of these products and are the "influencers" in this space. Hence I believe including their tweets actually makes the analysis more complete.

Using #AWS vs #Azure with #Azure as the context shows these gaps:

nutanix
devops
key (SSH key management)

A more customised approach

While we've covered how to perform a tweet analysis using the in-built Twitter search app, there is also a way to insert your own data for analysis. The data could be the output of your Python/R code, an Excel report or a Word/text document.

To do that, open your sidebar via the hamburger menu at the top-left corner and select "+ new context".

Type the name of your context and click Enter.

Once done, you'll be shown an empty slate. In the dialogue box at the bottom left, paste your text data wholesale. Each line break is taken as the start of a new paragraph of text (e.g. in the context of tweets, a new tweet).

Paste your text data into this dialogue box to visualise in a network graph

After clicking "save", wait a while for your data to be visualised.
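If your data comes out of a Python script, it helps to flatten each record onto a single line before pasting, since line breaks are treated as separators. A minimal sketch, assuming your records are already in a list of strings (the function name and sample records are hypothetical):

```python
def one_statement_per_line(documents):
    """Flatten each document to a single line and join them with newlines."""
    lines = []
    for doc in documents:
        # split() with no arguments collapses spaces, tabs and line breaks.
        flat = " ".join(doc.split())
        if flat:  # skip records that are empty after cleaning
            lines.append(flat)
    return "\n".join(lines)

# Made-up records; the first one contains an internal line break.
records = ["Loving #Azure\nfunctions today", "   ", "New #aws  region announced"]
print(one_statement_per_line(records))
```

The printed text can then be copied into the dialogue box, with each original record arriving as its own statement.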

Closing Notes

I hope you have found this powerful tool really useful. There are many, many use cases for InfraNodus and I'm merely scratching the surface with this post.

Do read my other post on Keyword Network Graphs if you prefer another solution that is more manual but provides more control and freedom.

All the best for your analysis and do spread the word so that this tool continues to be alive and usable.
