Facebook Best Friend

My Facebook Best Friend

 I was reading Dataclysm by Christian Rudder and came across a fascinating section discussing network theory and personal relationship detection. He referenced this 2014 paper by Lars Backstrom from Facebook and Jon Kleinberg from Cornell detailing a new method of analyzing social networks to detect who one’s most likely partner is. The research is very interesting and I suggest you read it but of course I set out to investigate my own network (and who my supposed significant other would be).


Facebook scraper hard at work

After all the data privacy scandals facebook went through, their API no longer allows developers to easily get user friend lists. I found a great existing scraper script by xyz that uses selenium and chrome to crawl through facebook’s link structure to gather your friends and all of the mutual friends you share to build your network. Xyz also adapted some code by abc to cluster and plot the graph using plotly, networkx and community by python-louvain. I edited the scraper a bit to be able to handle being re-run if (when) facebook realizes that you are scraping and blocks you.. My version can be found here.


Once I had the graph, which is REALLY fun to play with and I highly recommend, it was time to move on to calculating my significant other… The traditional method of determining closeness in a social network is known as embededness which measures mostly how intertwined two nodes are. The paper, however, suggested a new measure, which rather than simply looking at embededness, added an element they termed 'dispersion’ which describes the degree to which any individual node connects otherwise unconnected groups.

In real life, this corresponds to someone (most likely a significant other) who is friends with multiple unrelated groups, like your high school friends, and your family and the people you met in that cooking class that one time. This person being a connector of several segregated groups is more indicative of a significant relationship than simply knowing a lot of the same people.

I found this bit of code from Adam Gluck which just had to be updated to account for the quirks of a new versioning of networkx and it calculated the recursive dispersion described in the paper. And it worked! I ran it for myself and the node in the first position was the closest thing to what I’d consider a partner.. For whatever that’s worth.