A few years ago, I was asked to develop an app for the following task: let attendees of a party pick up to five other guests with whom they would like to have the last dance. An hour before the event, the app would compute *matches* (i.e., attendees who picked each other) and notify each guest of the number of matches obtained, without disclosing the identity of the match. (Even if we had wanted to disclose the match, we would not have been able to: the data going through the app was anonymized.) Attendees could also voluntarily opt out of voting.
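The matching rule itself is simple: a match is a pair of attendees who picked each other. The app's actual implementation is not described here, so what follows is only a minimal sketch. It assumes the picks are stored as an edge list with hypothetical columns `voter` and `voted` (the guests below are made up), flags reciprocated picks, and counts them per guest:

```r
# Hypothetical picks, one row per (voter -> voted); not the real data
picks <- data.frame(
  voter = c("A", "A", "B", "C", "C", "D"),
  voted = c("B", "C", "A", "A", "D", "B"),
  stringsAsFactors = FALSE
)

# A pick is part of a match if the reverse pick (voted -> voter) also exists
forward  <- paste(picks$voter, picks$voted, sep = "->")
backward <- paste(picks$voted, picks$voter, sep = "->")
picks$match <- backward %in% forward

# Number of matches per guest (only the count, not the identity of the match)
guests  <- union(picks$voter, picks$voted)
matches <- sapply(guests, function(g) sum(picks$match[picks$voter == g]))
print(matches)
```

In this toy example, A matches with both B and C, while D's pick of B is not returned. The sketch assumes each (voter, voted) pair appears at most once.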
It is worth mentioning three important aspects of the attendees: (a) their significant others were not invited to the party; (b) they could pick anyone from the list of attendees; and (c) they knew that "last dance" meant much more than holding each other for 3 minutes. We can thus reasonably assume that most guests cast their votes based on how attracted they were to their picks. In this blog post, we leave the details of the app aside and, instead, try to understand the value in the data that we collected.

Everything has beauty, but not everyone sees it.

Confucius

In the remainder of this blog post, we analyze several interesting aspects of the data. We also invite readers to explore the data and submit their own observations.

### A Bird's-Eye View

Let's start with a summary of the data:

- Number of attendees: **240**
- Percentage of picks (by position): **55%** (1), **49.1%** (2), **43.7%** (3), **37.5%** (4), **34.1%** (5)

A significant portion of the attendees apparently decided not to vote. This could be for a number of reasons; two plausible ones are that they were in relationships or simply not interested in participating. However, let us focus on those who did participate:

*Observation 1:* They **consistently picked more than one person**.
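One rough way to quantify Observation 1 is to divide each position's percentage by the first-position percentage. This sketch rests on two assumptions that the summary does not confirm: all five percentages share the same denominator, and anyone who filled a later slot also filled the earlier ones. Under those assumptions, each ratio estimates the share of voters who made at least that many picks:

```r
# Percentage of picks by position, from the summary above
pct <- c(P1 = 55, P2 = 49.1, P3 = 43.7, P4 = 37.5, P5 = 34.1)

# Share of voters (those with a first pick) who also filled each slot;
# assumes a common denominator and that slots were filled in order
ratios <- round(pct / pct[["P1"]], 3)
print(ratios)
```

Under these assumptions, roughly 89% of voters made a second pick and about 62% used all five.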

### Distribution of Votes

Our next step is to understand how the votes above are distributed across attendees. For this, we produce a histogram of the number of votes each attendee received. Here is the R code used to process the data and produce the histogram:

```r
library(ggplot2)

# Load data (one row per pick, with voter and voted columns)
dance <- read.csv("[path_to_file]/dance.csv")

# Count votes received per person
counts <- as.data.frame(table(dance$voted), stringsAsFactors = FALSE)

# Add voters who were not voted for, with a count of zero
none <- setdiff(dance$voter, dance$voted)
counts <- rbind(counts, data.frame(Var1 = none, Freq = rep(0L, length(none))))

# Produce plot; geom_bar() counts a discrete x (geom_histogram() needs a continuous one)
ggplot(counts, aes(x = factor(Freq))) +
  geom_bar(fill = "grey", color = "grey50") +
  xlab("Number of votes") + ylab("Frequency") +
  ggtitle("Histogram of attractiveness") +
  theme_bw()
```

Note that we also account for the people who cast votes but did not receive any (they appear with zero votes). The histogram is shown in Figure 1.
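To make the counting step concrete, here is the same logic applied to a tiny, made-up edge list (hypothetical guests A to E, not the real data). Voters who received no picks are added with an explicit zero, and the final table is the distribution that the histogram draws as bars:

```r
# Hypothetical picks, one row per (voter -> voted); not the real data
toy_dance <- data.frame(
  voter = c("A", "A", "B", "C", "D", "E"),
  voted = c("B", "C", "C", "B", "C", "A"),
  stringsAsFactors = FALSE
)

# Votes received per guest
counts <- as.data.frame(table(toy_dance$voted), stringsAsFactors = FALSE)

# Voters who received no votes get an explicit count of zero
zero <- setdiff(toy_dance$voter, toy_dance$voted)
counts <- rbind(counts, data.frame(Var1 = zero, Freq = rep(0L, length(zero))))

# How many guests received 0, 1, 2, ... votes (the bars of the histogram)
vote_dist <- table(counts$Freq)
print(vote_dist)
```

Here D and E voted but were never picked, so without the extra step they would be missing from the zero bar.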

*Observation 2:* Many guests received votes from multiple people (see Number of votes >= 2 on the *X* axis in Figure 1). (One of them was even picked by 36 different people!) We could say that these guests with multiple votes are rather popular and generally attractive. Nonetheless, **the vast majority of guests got one vote**. This carries a hopeful message: somewhere, someone is choosing you!

I could have concluded this post with the line above and left a very inspiring message. However, data is also inspiring and there is much more we can learn from it.

### Imagine Me and You, I do

Our next concern is the distribution of votes by position. In other words, we want to know the number of guests who got *N* votes in position *P*, for all positions. This could strengthen our message above if we saw a large number of guests being the top pick (*P = 1*) of a single person (*N = 1*). For completeness, we show this for all positions (i.e., *P = 1* to *5*). The R code to plot this information is as follows:

```r
# Votes each guest received in each position (assumes dance has a Position column)
by_pos <- as.data.frame(table(voted = dance$voted, Position = dance$Position))
by_pos <- by_pos[by_pos$Freq > 0, ]

# One histogram panel per position, with bins centered on whole numbers
ggplot(by_pos, aes(x = Freq, fill = Position)) +
  geom_histogram(color = "black", position = "identity", binwidth = 1, center = 0) +
  facet_grid(Position ~ ., labeller = label_both) +
  scale_x_continuous(breaks = seq(1, max(by_pos$Freq), by = 1)) +
  theme_bw() + theme(legend.position = "none") +
  xlab("Number of votes") + ylab("Frequency") +
  ggtitle("Distribution of votes")
```

The histogram that we obtain is shown in Figure 2.

Figure 2: Histogram of votes by position.

The most notable result from the plot above is that we do not see 36 votes (or anything close) in any particular position. Furthermore, the largest number of votes any guest received in the first position (i.e., Position: 1) was 8 (see the top panel in Figure 2). This tells us that:

*Observation 3:* **popular, generally attractive people are often not the top pick**.
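The comparison behind Observation 3, the per-position maximum versus the overall maximum, can be sketched on a made-up example. The guests and picks below are hypothetical; guest C is picked by every voter, but never first:

```r
# Hypothetical picks with their position (1 = top pick); not the real data
toy <- data.frame(
  voter    = c("A", "A", "B", "B", "D", "D", "E", "E"),
  voted    = c("B", "C", "D", "C", "B", "C", "A", "C"),
  Position = c(1, 2, 1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Votes per guest per position, dropping empty combinations
by_pos <- as.data.frame(table(voted = toy$voted, Position = toy$Position),
                        stringsAsFactors = FALSE)
by_pos <- by_pos[by_pos$Freq > 0, ]

# Largest number of votes any single guest received in each position
max_by_pos <- aggregate(Freq ~ Position, data = by_pos, FUN = max)

# Largest number of votes any guest received overall
max_overall <- max(table(toy$voted))
print(max_by_pos)
print(max_overall)
```

Here the most-voted guest (4 votes) never appears in position 1, where the maximum is 2: a toy version of the gap between 36 votes overall and 8 first-position votes in the real data.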

### Is There a Match?

Although the observations discussed so far are intriguing and worth discussing further, let's bring the focus back to the original motivation of the app:

*Were there any matches?* We start by looking at the distribution of the number of matches per guest (see Table 1).

Table 1: Number of guests by number of matches.

| Matches | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| Guests | 214 | 22 | 2 | 2 | 0 | 0 |

The results above are very disappointing: only 26 guests (22 + 2 + 2, with 1, 2, and 3 matches, respectively), or about 11% of all 240 guests, got at least one match. We can still think positively and argue that a large fraction of the "zero-match" guests is due to the rather low participation in the app.
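The numbers in Table 1 can be double-checked with a few lines of R. The last computation is our own addition: since every match involves two guests who each count it, the match counts in Table 1 must add up to an even number, and half of that sum is the number of mutually matched pairs:

```r
# Guests by number of matches, from Table 1
guests_by_matches <- c(`0` = 214, `1` = 22, `2` = 2, `3` = 2, `4` = 0, `5` = 0)
n_matches <- as.numeric(names(guests_by_matches))

total      <- sum(guests_by_matches)      # all attendees
with_match <- sum(guests_by_matches[-1])  # guests with at least one match
share      <- with_match / total

# Each match is counted once by each of its two guests,
# so half of the weighted total is the number of matched pairs
pairs <- sum(n_matches * guests_by_matches) / 2

cat(total, with_match, round(100 * share, 1), pairs, "\n")
```

With the values from Table 1, this gives 240 guests, 26 with at least one match (about 10.8%), and 16 matched pairs.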

To validate the intuition above, we plot a directed graph of picks, defined as follows: each *node* represents a guest, and each directed *edge* indicates a pick from the voter to the voted guest. Furthermore, the size of each node is proportional to the number of votes it received or, in other words, its number of incoming edges. The resulting graph is shown in Figure 3; a high-definition version of the plot is available for download here in PDF format.
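The plotting code for Figure 3 is not included in the post. As a minimal sketch, a graph of this kind can be drawn with the igraph package; igraph is our choice here, not necessarily what produced Figure 3, and the edge list below is made up, with column names (`voter`, `voted`) that are assumed rather than taken from dance-edge-list.csv:

```r
library(igraph)

# Hypothetical edge list (voter -> voted); column names are an assumption
edges <- data.frame(
  voter = c("A", "B", "C", "D", "A", "B"),
  voted = c("C", "C", "A", "C", "E", "F"),
  stringsAsFactors = FALSE
)

# Directed graph: one node per guest, one edge per pick
g <- graph_from_data_frame(edges, directed = TRUE)

# Votes received = incoming edges; node size grows linearly with it
# (a small offset keeps zero-vote voters visible)
votes <- degree(g, mode = "in")
V(g)$size <- 3 + 4 * votes

# Force-directed layout; labels hidden since the data is anonymized
plot(g, layout = layout_with_fr(g), vertex.label = NA, edge.arrow.size = 0.3)
```

The small offset in the node size is a design choice so that guests with no votes still show up; a strictly proportional size would make them vanish.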

The size of the nodes makes it easy to spot the guests who are popular and generally attractive. Interestingly, one might expect these (popular) guests to be surrounded by a large number of other guests, so our plot may also hint at how attendees will be distributed across the dance floor.

In any case, these are far from being the main reasons why we plotted the data as a graph in the first place. As we had intuited above:

*Observation 4:* A **large fraction of the voters picked guests who did not participate in the survey** (see the nodes on the perimeter of the graph with no outgoing edges). That being said, we can remain positive: this is not at all personal!
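Picks aimed at non-participants can never become matches, because a match needs the other guest to pick back. As a minimal sketch on a made-up edge list (hypothetical columns `voter` and `voted`), the nodes with no outgoing edges, and the share of picks pointing at them, can be found with base R:

```r
# Hypothetical picks (voter -> voted); not the real data
edges <- data.frame(
  voter = c("A", "A", "B", "C", "C", "D"),
  voted = c("B", "E", "E", "F", "A", "G"),
  stringsAsFactors = FALSE
)

# Guests who were picked but cast no picks themselves:
# the nodes with no outgoing edges
non_voters <- setdiff(edges$voted, edges$voter)

# Share of all picks that went to such guests; none of these can be a match
share_to_non_voters <- mean(edges$voted %in% non_voters)

print(non_voters)
print(share_to_non_voters)
```

In this toy example, E, F, and G never voted, and 4 of the 6 picks point at them.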

The analysis that I have performed above is by no means comprehensive. In fact, I would love to see other observations that you can extract from this valuable data. I have uploaded the anonymized data (dance.csv and dance-edge-list.csv), so feel free to download it and start experimenting with it. Once you find something worth sharing, go ahead and post it in the comment section below!

*Observation 5 (Bonus Track):* I do not want to end this post without pointing out something in Figure 3 that made me smile. If you look at the far left of the graph, there are **two guests who unequivocally picked each other**. I do not know if they planned it like this or if it was pure chance; what I know is that these two guests enjoyed the party very, very much.
