Sinfonier Community and beyond!
- ACCESS DATA: Twitter. I created an account on this social network, which delivers a real-time stream of semi-structured data (loosely formatted text inside a field, with little structure within it). The information is delivered in JSON format, which is what I need to process. In this example, the word to search for in the tweets is Verdun.
- PREPARE AND CLEANSE DATA: Filter. I only want to keep tweets written in a specific language, in this case English, so I check the “lang” field of each tweet and keep those whose value is “en”.
- APPLY ADVANCED ANALYTICS: Now I add to my topology the module named “AlyClassApi”, which sends the text (in English and mentioning the word Verdun) to a sentiment analysis cloud service called Aylien. Using it requires creating a free user account, limited to 2000 queries per day. The service classifies the text of the tweet according to a set of predefined categories.
- Because this module returns a JSON array, I need the module called “EmitItemList”, which creates one simple JSON for each element of the array (in this case the array is called categories).
- With simple JSONs in hand, I use a second filter module to keep the tweets that I presume mention the battle of Verdun, so I search for categories containing the words history, war and culture. The results are then analysed by a second Aylien sentiment module that simply classifies the filtered texts as “positive” or “negative” (a third category, “neutral”, is not considered).
- OUTPUT RESULTS: The final tweets are sent to two MongoDB databases (after creating a free user account), where I can finally read the tweets that have gone through all the steps.
The final topology looks like the following diagram:
Thanks to Fran and Alberto.
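For anyone who wants to follow the data flow without opening Sinfonier, the steps above can be roughly sketched in plain Python. This is only an illustration under my own assumptions, not how the Sinfonier modules are implemented: the function names are made up, I assume each tweet is a dict with “text” and “lang” fields, that each category is a plain text label, and that the two Aylien calls are passed in as ordinary functions (`classify` and `sentiment`) rather than calling the real service. Writing to MongoDB is left out.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]
KEYWORDS = {"history", "war", "culture"}


def filter_by_lang(tweets: Iterable[Record], lang: str = "en") -> Iterator[Record]:
    """Keep only tweets whose "lang" field equals the wanted language code."""
    for tweet in tweets:
        if tweet.get("lang") == lang:
            yield tweet


def emit_item_list(record: Record, field: str = "categories") -> Iterator[Record]:
    """Turn one record holding an array into one flat record per array element."""
    base = {k: v for k, v in record.items() if k != field}
    for item in record.get(field) or []:
        yield {**base, "category": item}


def mentions_topic(record: Record, keywords=KEYWORDS) -> bool:
    """True if the category label contains one of the keywords as a whole word.

    Whole-word matching avoids false hits such as "war" inside "software".
    """
    words = set(re.findall(r"[a-z]+", str(record.get("category", "")).lower()))
    return bool(words & set(keywords))


def run_pipeline(
    tweets: Iterable[Record],
    classify: Callable[[str], List[str]],   # hypothetical stand-in for the category service
    sentiment: Callable[[str], str],        # hypothetical stand-in for the sentiment service
) -> List[Record]:
    """Language filter -> classify -> split array -> topic filter -> sentiment."""
    results: List[Record] = []
    for tweet in filter_by_lang(tweets):
        text = str(tweet["text"])
        tagged = {**tweet, "categories": classify(text)}
        for flat in emit_item_list(tagged):
            if not mentions_topic(flat):
                continue
            polarity = sentiment(text)
            if polarity in ("positive", "negative"):  # "neutral" is dropped
                results.append({**flat, "polarity": polarity})
    return results
```

Note that a tweet whose categories match more than one keyword comes out once per matching category, because the array is split before the second filter is applied.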