
Creating Word Clouds From Facebook Page Comments

There is a lot of talk about political polarisation in the USA, and a large part of the reason for it is media splintering. Different media sources are biased towards different political views (liberal vs. conservative), and the public only reads those that correspond to their own views, which strengthens those views. This makes both sides creep closer to the extreme left or the extreme right.

This is also true for Croatia and probably most developed countries in the world, which sparked an idea. How can I determine which way a given Croatian media outlet leans on the political spectrum? If the above thesis is correct, I should be able to profile an outlet by profiling the people who read (and comment on) its content.

To do this right (fast and without hitting the API limit), I had to go through a lot of trial and error. Fetching the comments requires a bunch of individual calls because you have to fetch them post by post. This often hits the Facebook API call limit, which stops you from making any more requests for around 15 minutes. But Facebook offers a lovely feature called batch API calls. Basically, it lets you make 50 API calls at once by sending a single request containing 50 serialized requests, and it returns all the responses at once.

All of this was used to create a small PHP CLI tool which I open-sourced here.
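To make the batching idea concrete, here is a minimal Python sketch. It is not the code from the PHP tool. The function names, the `comments_per_post` parameter and the placeholder post IDs are made up for illustration, and the request and response shapes (a `batch` form field holding a JSON array of `method`/`relative_url` objects, and a response array of `code`/`body` items) follow my understanding of the Graph API batch format, so check them against the current documentation before relying on them.

```python
import json
from urllib.parse import urlencode

BATCH_LIMIT = 50  # the per-request cap described above


def chunk(items, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_batch_payload(post_ids, access_token, comments_per_post=100):
    """Build the form fields for one batch request (at most 50 post IDs).

    The resulting dict is meant to be POSTed as form data to the Graph API
    root; sending it is left out so the sketch stays offline.
    """
    if len(post_ids) > BATCH_LIMIT:
        raise ValueError(f"a batch can hold at most {BATCH_LIMIT} requests")
    batch = [
        {
            "method": "GET",
            "relative_url": f"{post_id}/comments?"
            + urlencode({"limit": comments_per_post}),
        }
        for post_id in post_ids
    ]
    return {"access_token": access_token, "batch": json.dumps(batch)}


def parse_batch_response(raw_items):
    """Collect comment messages from a decoded batch response.

    Assumes each item is either None (no answer for that sub-request) or a
    dict with a numeric `code` and a JSON-encoded `body` string.
    """
    messages = []
    for item in raw_items:
        if item is None or item.get("code") != 200:
            continue  # skip failed or missing sub-requests
        body = json.loads(item["body"])
        messages.extend(
            comment["message"]
            for comment in body.get("data", [])
            if "message" in comment
        )
    return messages
```

With 120 post IDs, `chunk` yields three groups (50, 50 and 20), so the comments can be fetched with three requests instead of 120.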

Building The Word Cloud

Now that I had acquired all the data I needed, it was time to create some awesome-looking word clouds. Using a great Python library called word_cloud, I managed to get pretty nice results.

Once again, this is also open-sourced here so you can check out how to build your own.
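As a rough idea of what this step involves, the sketch below counts words across comments while skipping stop-words, then hands the counts to word_cloud. It is an illustration under assumptions, not the code from the repository. The stop-word set is a tiny hypothetical sample, and the function names, image size and output path are placeholders.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word sample; a real list would be longer.
SAMPLE_STOP_WORDS = {"i", "je", "da", "se", "u", "na", "to", "za", "su", "ne"}


def word_frequencies(comments, stop_words=SAMPLE_STOP_WORDS, min_length=2):
    """Count how often each non-stop-word appears across all comments."""
    counts = Counter()
    for comment in comments:
        # \w matches Unicode letters, so characters like č, š and ž survive.
        for word in re.findall(r"\w+", comment.lower()):
            if len(word) >= min_length and word not in stop_words:
                counts[word] += 1
    return counts


def render_cloud(frequencies, output_path):
    """Draw the word cloud image; requires the third-party wordcloud package."""
    from wordcloud import WordCloud  # imported lazily so counting works without it

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(dict(frequencies))
    cloud.to_file(output_path)
```

Counting the words yourself keeps the stop-word handling in plain Python, so the list can be tuned and checked before any image is rendered.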


I’m not completely happy with how the results came out. Since I didn’t have enough time to play with stop-words and get a larger comment sample, I could not really tell where each media outlet leans on the political spectrum. I am sure the clouds would show a lot more if I played around with the configuration. That being said, I have to admit that the results do look pretty sexy.

Why don’t we check out the results?

The last one is a Croatian satire portal (similar to The Onion), and it is my favorite because the satire is so obvious from the cloud. 🙂
