Question: Where is the happiest place in New York City?

Possible answers:

  1. Immediately adjacent to any hot dog stand.
  2. Madison Square Garden during moments of Linsanity.
  3. Tim Tebow’s new apartment building.

No really though, let’s measure some stuff.

Facts: (1) New York City is the most populous city in the US and (2) Manhattan streets are arranged on a rectangular grid. We have already seen how cities, airports, and even streets can be identified using geotagged tweets – here we use more than a half million messages from 2011 to investigate the happiness of NYC streets and avenues (clearly visible in the image below, as is Central Park).

Binning tweets by avenue and street, we use the labMT word list to measure happiness in tweets as a function of avenue and street number:

The results suggest that the west side is slightly happier than the east side, and that happiness actually declines as one moves further uptown. Next we bin by intersection and plot a heat map showing the distribution of happiness over all of the street corners in Manhattan:

The happiest “corner” is actually just inside the western edge of Central Park, where the intersection of 7th and 77th would be (this is just north of the lake and east of the Hayden Planetarium)*. This corner elicits tweets with a relatively high abundance of the positive words “loves” and “sky”, and proportionally fewer negative words like “not”, “fear” and “no”. Many of the happiest locations actually fall within Central Park!

* Please note that the results reported in this post have not been vetted through panels of experts, statistical tests of significance, or scientific peer review.  They are intended to be a fun and lighthearted exploration of our more formal research interests.

Filed under psychology, social phenomena

Does QWERTY Affect Happiness?

Last week, news broke of a paper published in the Psychonomic Bulletin and Review by Kyle Jasmin and Daniel Casasanto claiming to observe a positive relationship between the “right-handedness” of a word and its emotional valence. This is being called the ‘QWERTY effect’. (You may recall that ‘valence’ is psych-speak for ‘happiness’ associated with words. What I called right-handedness they call the right side advantage of a word, \text{RSA} = (\text{\# of right side letters}) - (\text{\# of left side letters}) when typed using the ubiquitous QWERTY keyboard.) You can read the original paper here, and there’s a Wired article that explains their conclusions.
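To make the RSA definition concrete, here is a minimal Python sketch. The split of letters into left and right sides is an assumption (the usual touch-typing hand assignment on a QWERTY layout), and the function name is made up for illustration; it is not taken from the paper or from our scripts.

```python
# Assumed split: letters struck by each hand in standard QWERTY touch typing.
LEFT_LETTERS = set("qwertasdfgzxcvb")
RIGHT_LETTERS = set("yuiophjklnm")


def right_side_advantage(word):
    """Return (# of right side letters) - (# of left side letters)."""
    letters = [c for c in word.lower() if c in LEFT_LETTERS or c in RIGHT_LETTERS]
    right = sum(1 for c in letters if c in RIGHT_LETTERS)
    left = len(letters) - right
    return right - left


print(right_side_advantage("happy"))     # 3  (h, p, p, y right; a left)
print(right_side_advantage("sad"))       # -3 (all three letters left)
print(right_side_advantage("lollipop"))  # 8  (every letter right)
```

Characters outside a to z (digits, apostrophes, accents) are simply ignored in this sketch.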

Particularly interesting for the group here in Vermont was Jasmin and Casasanto’s use of the Affective Norms for English Words (ANEW, from Bradley and Lang (1999)) dataset, along with comparable data for Spanish and Dutch, in their analysis. The hedonometric work we’ve done on blogs, music lyrics, Twitter, etc. was initially based on the happiness scores from the ANEW study. The 1034 ANEW words were handpicked to represent the emotional spectrum, and as such don’t represent a uniform selection of words found in English-language texts. We merged the 5,000 most common words from 4 corpora (Twitter, Google Books, the New York Times, and music lyrics) and had Mechanical Turk users evaluate their valence in the same way as was done for ANEW, producing a list of ~10,000 words and their associated happiness scores. We’re calling this dataset LabMT-1.0, for Language assessment by Mechanical Turk. Since LabMT words were picked by frequency of usage, they provide much better coverage (i.e. the percent of words identified in a text) than ANEW.

When Jasmin and Casasanto’s paper appeared and achieved the impressive press coverage that it did, it also attracted the scrutiny of other language researchers who weren’t so sure of the significance of the QWERTY effect. A public debate has taken place between Mark Liberman of the Language Log blog and the authors of the study. See post1, post2, the response from J&C, and the response back. After being informed by (our) Peter Dodds of the LabMT data, Liberman made the second post, in which he calculated the RSA of our LabMT words but continued to find little or no QWERTY effect.

As we’ve explored in our hedonometrics papers, the hedonometer can be thought of as a tunable instrument when you remove neutral-valence stop words, effectively increasing the sensitivity of happiness measurements for texts. I wanted to see if tuning \Delta h_\text{avg} changed anything. In the process, I also repeated Liberman’s analysis of the LabMT data and am making available the R scripts and data that went into that.

analyze_rsa_labmt.R – script for the analysis and plots
labMT.rsa.txt – Liberman’s computation of RSA for the subset of LabMT-1.0 words containing only alphabetic characters
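For readers who want a feel for the tuning knob without opening the R script, here is a rough Python sketch (not the script above). It assumes happiness scores on a 1 to 9 scale with 5 as neutral, and it drops every word whose score lies within \Delta h_\text{avg} of neutral before taking the frequency-weighted average. The function name and the toy scores are invented for illustration and are not real LabMT values.

```python
from collections import Counter


def average_happiness(words, scores, delta_h=0.0, center=5.0):
    """Frequency-weighted mean happiness of a text.

    Words missing from `scores`, or scored within `delta_h` of `center`,
    are skipped. Returns None if no scored words remain.
    """
    counts = Counter(w.lower() for w in words)
    total = 0.0
    kept = 0
    for word, freq in counts.items():
        h = scores.get(word)
        if h is None or abs(h - center) < delta_h:
            continue
        total += freq * h
        kept += freq
    return total / kept if kept else None


# Toy scores, made up for this example.
toy_scores = {"love": 8.0, "hate": 2.0, "the": 5.0, "of": 5.0}
text = "the love of the hate the love".split()

print(average_happiness(text, toy_scores))               # neutral words dilute the signal
print(average_happiness(text, toy_scores, delta_h=1.0))  # neutral band removed
```

Widening delta_h removes more of the neutral middle, so the remaining average moves further from 5 for texts that lean one way.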

We haven’t seen any more evidence than Liberman did when looking simply at the relationship between RSA and h_\text{avg}. If the QWERTY effect is real, then it is exceedingly small, but the above data point to it being indistinguishable from zero.  It’s useful to look at the raw data, binned in both variables.

Raw data binned (RSA spacing of 1, h_\text{avg} spacing of 0.1) and plotted.

There is no obvious, visually distinguishable correlation.

Now, if you take the average happiness of words for each RSA value, you can do a linear regression on that data, weighting each point by the number of words for that RSA value.

Data binned by RSA with the line indicating the linear regression weighted by the number of words for that RSA. Note that this is the same as a linear fit of the unbinned data, but the resulting plot is less cluttered and easier to read.
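The equivalence noted in the caption can be checked numerically. The Python sketch below fits a line to made-up (RSA, h_avg) pairs twice: once on the raw points, and once on the per-RSA means weighted by the number of words in each bin. The intercept and slope agree; the data and function names are invented for illustration, and the sketch says nothing about standard errors or p-values.

```python
from collections import defaultdict


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def bin_by_x(x, y):
    """Group y by x; return (sorted x values, mean y per x, count per x)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    xs = sorted(groups)
    means = [sum(groups[k]) / len(groups[k]) for k in xs]
    counts = [len(groups[k]) for k in xs]
    return xs, means, counts


# Made-up data: one (RSA, happiness) pair per word.
rsa = [-2, -2, -1, 0, 0, 0, 1, 1, 3]
h = [5.2, 6.0, 4.1, 5.5, 7.3, 3.9, 6.2, 5.0, 5.8]

unbinned = weighted_linear_fit(rsa, h, [1] * len(rsa))
xs, means, counts = bin_by_x(rsa, h)
binned = weighted_linear_fit(xs, means, counts)

print(unbinned)
print(binned)
```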

The trend actually runs in the negative direction, but with a p-value of 0.74 it is statistically indistinguishable from zero. Jasmin and Casasanto controlled for more variables in a different dataset, and independent evaluation of the significance of the correlations they observed, controlling for these other attributes, would be possible if all the original data were released. Sure, the data sources are listed, but it would be a significant effort to recreate the entire set. I’d also be curious to see if similar correlations could be observed in the other affective variables measured in ANEW (arousal and dominance).

Final note: Changing \Delta h_\text{avg}, our tuning knob, does change the magnitude of the correlation. (Imagine removing a horizontal band from the binned plot above; this changes the correlation.) However, it is still impossible to conclude that the effect is significant. Also, analyzing positive and negative words separately shows opposite trends for \Delta h_\text{avg} = 1. The code for this is all included in the script above.

Filed under psychology, social phenomena

Chaos in an Experimental Toy Climate

In the 1960s, MIT meteorologist Edward Lorenz was investigating the effects of nonlinearity on short-term weather prediction in a model of convection. In his ground-breaking paper “Deterministic Nonperiodic Flow,” Lorenz showed that numerical solutions of the model exhibit sensitive dependence on initial conditions, causing virtually indistinguishable states to diverge quickly. This phenomenon, which became known as chaos, is a major contributor to inaccuracies in weather and climate forecasts.
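The divergence is easy to reproduce. The Python sketch below integrates the three-variable Lorenz (1963) equations with the classic parameter values (sigma = 10, rho = 28, beta = 8/3) from two starting points that differ by one part in 10^8, and tracks the distance between them. The fixed-step Runge-Kutta integrator, step size, and starting points are choices made for this illustration, not details from our experiments.

```python
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one fourth-order Runge-Kutta step."""
    def shift(s, k, f):
        return tuple(si + f * ki for si, ki in zip(s, k))

    k1 = lorenz(state)
    k2 = lorenz(shift(state, k1, dt / 2))
    k3 = lorenz(shift(state, k2, dt / 2))
    k4 = lorenz(shift(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(a, b, dt=0.01, steps=4000):
    """Distance between two trajectories after each time step."""
    dists = []
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        dists.append(sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5)
    return dists


dists = separation_history((1.0, 1.0, 1.0), (1.0 + 1e-8, 1.0, 1.0))
print("after first step:", dists[0])
print("largest separation:", max(dists))
```

The two runs are indistinguishable at first, then end up as far apart as two unrelated points on the attractor.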

The thermal convection loop is an experimental analog of Lorenz’s system in the form of a hula-hoop shaped tube, filled with fluid, and oriented vertically like a wheel. The bottom half of the tube is warmed uniformly by a bath of hot water and the top half is cooled. Under certain conditions, a steady state is never reached, and the fluid switches direction in an unpredictable pattern.

In the past few years, we have used Computational Fluid Dynamics (CFD) simulations of the loop as a testbed for data assimilation, ensemble forecasting, and model error experiments in weather and climate prediction. Our team is developing algorithms to improve forecasts and uncertainty quantification using this simple but realistic toy climate. Successful techniques are then implemented on more realistic weather and climate models.

Details:

K. D. Harris, E.-H. Ridouane, D. L. Hitt, C. M. Danforth. 2012. Predicting Flow Reversals in Chaotic Natural Convection using Data Assimilation. Tellus A. 64, 17598. [pdf]

N. Allgaier, K. D. Harris, C. M. Danforth. 2012. Empirical Correction of a Toy Climate Model. Physical Review E. 85, 026201. [pdf]

R. Lieb-Lappen, C. M. Danforth. 2012. Aggressive Shadowing of a Low-Dimensional Model of Atmospheric Dynamics. Physica D. Volume 241, Issue 6, Pages 637–648. [pdf]

E.-H. Ridouane, C. M. Danforth, D. L. Hitt. 2009. A 2-D Numerical Study Of Chaotic Flow In A Natural Convection Loop. International Journal of Heat and Mass Transfer. [pdf]

and a lecture on the topic given by Danforth to the Applied Dynamics graduate course at UNC Chapel Hill:

Funding for the project comes from NASA and NSF through the Mathematics and Climate Research Network.

Filed under physics

Hedonometrics

Our paper “Temporal Patterns of Happiness and Information in a Global Social Network: Hedonometrics and Twitter” appears in PLoS ONE this week. Their blog encourages you to tweet for the sake of science!

Among other findings, in this paper we demonstrate that human ratings of the happiness of an individual word correlate very strongly with the average happiness of the words that co-occur with it. This implies that tweets containing particular keywords can be used as an unsolicited public opinion poll.

For example, tweets containing “Tiger Woods” became decidedly less positive after his Thanksgiving disaster in 2009, as the words ‘accident’, ‘crash’, ‘scandal’, and ‘cheating’ became more abundant while the word ‘love’ appeared less often.

Happiness is measured relative to the ambient background of all tweets.

Sad words are blue, happy words are yellow. Up (down) arrow indicates that the word appeared more (less) frequently in tweets containing "Tiger Woods".

Generally, tweets containing personal pronouns tell a positive prosocial story with ‘our’ and ‘you’ outranking ‘I’ and ‘me’ in happiness. The least happy pronoun on our list is the easily demonized ‘they’.

Emoticons in increasing order of happiness are ‘:(’, ‘:-(’, ‘;-)’, ‘;)’, ‘:-)’, and ‘:)’. In terms of increasing information content (diversity of words co-occurring with each emoticon), the order is ‘:(’, ‘:-(’, ‘:)’, ‘:-)’, ‘;)’, and ‘;-)’. We see that happy emoticons co-occur with words of higher levels of both happiness and information but the ordering changes in a way that appears to reflect a richness associated with cheekiness and mischief: the two emoticons involving semi-colon winks are third and fourth in terms of happiness but first and second for information.

A list of the happiness ratings of tweets containing some interesting keywords can be seen here.

And not surprisingly, the happiness of all tweets appearing on a given day of the week correlates well with the happiness ratings humans give each day.

Happiness of tweets appearing on a given day

Human ratings of the happiness of each day of the week

You can download the language assessment by Mechanical Turk (labMT 1.0) word list here. It is a text file containing the set of 10,222 most frequently occurring words in the New York Times, Google Books, music lyrics, and tweets, as well as their average happiness evaluations according to users on Mechanical Turk.  See the paper for details.

Much more to come regarding sociotechnical phenomena…

Filed under social phenomena

The Happiest Distribution

Do you laugh within your tweets? e.g. hahaha!!! Here we show the number of times these different laugh species appear in tweets as a function of how many ha’s they contain. A few observations:

  1. Longer laughs are less frequent, and on logarithmic axes the frequency falls along a straight line. The black line has a slope of -5 and appears to match the data over at least 5 decades in frequency… Zipf would be proud of the people: Hahaha power law?
  2. ha is less frequent than haha but slightly more frequent than hahaha.
  3. Only a select few humans are able to make it out beyond 100 letters without a typo.  Congratulations!
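For the curious, here is a rough Python sketch of the two steps behind a plot like this: counting how many ha’s each laugh contains, and estimating the slope of frequency against length on log-log axes. The regular expression, the function names, and the synthetic frequencies are assumptions made for illustration; this is not the code used for the figure.

```python
import math
import re

# A laugh is a standalone run of "ha" repeated one or more times.
LAUGH = re.compile(r"\b((?:ha)+)\b", re.IGNORECASE)


def count_has(text):
    """Return the number of ha's in each laugh found in the text."""
    return [len(m.group(1)) // 2 for m in LAUGH.finditer(text)]


def loglog_slope(lengths, freqs):
    """Least-squares slope of log10(frequency) against log10(length)."""
    xs = [math.log10(n) for n in lengths]
    ys = [math.log10(f) for f in freqs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


print(count_has("hahaha!!! that was funny, haha"))  # [3, 2]

# Synthetic frequencies that follow an exact power law with exponent -5.
lengths = list(range(2, 50))
freqs = [1e9 * n ** -5 for n in lengths]
print(loglog_slope(lengths, freqs))  # close to -5
```

Real counts are noisy at the long-laugh end, so a fit to actual data would not recover the exponent this cleanly.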

Thanks to one of our students, Tyler Gray, for sorting this all out.

Filed under social phenomena

Happy and we know it

Science Magazine published a piece today framing Twitter as a laboratory for research, “Social Scientists Wade Into The Tweet Stream,” including the above figure showing our hedonometer’s measure of happiness in 2011 as a function of day. Dodds was also interviewed by Science for their weekly podcast, and by Benedict Carey for a New York Times piece, “Happy and You Know It? So Are Millions on Twitter.”

Filed under social phenomena

Tweet Cartography

Six months of geo-located messages from Twitter’s gardenhose feed, roughly 20 million.  World, US, and NYC twitterific projections.

PDF versions available here. Made possible by data ninjas Kameron Harris and Morgan Frank.

Filed under social phenomena