Statistics - measure of overlap between populations?

Actaeus » Sat May 30, 2009 11:38 am UTC

The original context of this problem was the search for a good statistic for how connected you are to someone on a social network such as Facebook. I wanted it to be independent of the total number of friends each person has, because otherwise friend-collectors would throw the whole thing off.

Original formula:
A = set of my friends
B = set of other person's friends
[math]\frac{|A\cap B|}{|B|}[/math]
Modified to be commutative and to take into account the size of my friend set:
[math]\frac{|A\cap B|}{\sqrt{|A|\times|B|}}[/math]
Note that this is the geometric mean of the original formula taken both ways (once with A = me, once with A = the other person).
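Spelled out, assuming both friend sets are non-empty, the two one-sided ratios multiply under the square root and collapse to the symmetric form:
[math]\sqrt{\frac{|A\cap B|}{|B|}\times\frac{|A\cap B|}{|A|}}=\sqrt{\frac{|A\cap B|^2}{|A|\times|B|}}=\frac{|A\cap B|}{\sqrt{|A|\times|B|}}[/math]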
What would be a more useful way to do this? I've been messing with chi-square tests with little success.

Clarification edit: I'm trying to measure how closely our sets of friends overlap. The Jaccard index actually seems like a much simpler and better idea than what I've been doing. It's also similar enough to set off my "someone smarter than me had the same idea, better, a long time ago" dismay reaction. This happens to me far too often.
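
For comparison, here is a rough Python sketch of the three measures mentioned so far: the original one-sided ratio, the geometric-mean version, and the Jaccard index |A ∩ B| / |A ∪ B|. The function names and the toy friend lists are made up for illustration, and returning 0.0 for empty sets is just a convention I picked, not part of any of the definitions.

```python
from math import sqrt


def overlap_fraction(a, b):
    """Original one-sided formula: |A ∩ B| / |B|."""
    # Assumption: return 0.0 when B is empty instead of dividing by zero.
    return len(a & b) / len(b) if b else 0.0


def geometric_mean_overlap(a, b):
    """Symmetric version: |A ∩ B| / sqrt(|A| * |B|)."""
    # Assumption: return 0.0 when either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def jaccard_index(a, b):
    """Jaccard index: |A ∩ B| / |A ∪ B|."""
    union = a | b
    # Assumption: two empty sets get 0.0 rather than an undefined value.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical friend lists: 4 friends vs. 9 friends, 2 in common.
me = {"ann", "bob", "cat", "dan"}
other = {"cat", "dan", "eve", "fay", "gus", "hal", "ian", "jon", "kim"}

print(overlap_fraction(me, other))        # share of the other person's friends I know
print(overlap_fraction(other, me))        # share of my friends the other person knows
print(geometric_mean_overlap(me, other))  # same value whichever way round
print(jaccard_index(me, other))           # same value whichever way round
```

With these toy sets the one-sided ratio changes depending on whose list goes in the denominator (2/9 one way, 2/4 the other), while the other two measures give the same number either way round.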
Re: Statistics - measure of overlap between populations?

GreedyAlgorithm » Sun May 31, 2009 4:09 am UTC


What are you actually trying to find? |A intersect B| is the number of mutual friends the two people have, done. Why are you dividing by anything? If you want to exclude some people ("friend-collectors") from the domain, just exclude them. Chi-squared tests? Clearly you haven't told us what your actual goal is.
Re: Statistics - measure of overlap between populations?

t0rajir0u » Sun May 31, 2009 6:34 am UTC

You should probably tell us what you're actually trying to measure. Right now your thread says "I want to study the number of mutual friends two people have, but I don't want to use the number of mutual friends."

