Showing 2 changed files with 26 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
title: "Spreading Love and Margarine: the link between margarine consumption and the divorce rate" | ||
layout: single | ||
excerpt: "a hilarious demonstration of data dredging" | ||
tags: [til, statistics] | ||
--- | ||
|
||
![](https://tylervigen.com/spurious/correlation/image/5920_per-capita-consumption-of-margarine_correlates-with_the-divorce-rate-in-maine.svg) | ||
|
||
Who knew that margarine consumption is correlated with the divorce rate in Maine? There is even a *very scientific* [paper](https://tylervigen.com/spurious/research-papers/5920_spreading-love-and-margarine-an-examination-of-the-butter-splitter-correlation-in-maine.pdf) on the subject. | ||
|
||
This is just one of thousands of [spurious correlations](https://tylervigen.com/spurious-correlations) from [Tyler Vigen's](https://tylervigen.com/) hilarious demonstration of data dredging (his figures are even included on the associated [Wikipedia page](https://en.wikipedia.org/wiki/Data_dredging)). Data dredging is what happens when you take many variables, say 25,237 like on his website, test every pairing you can, and blindly accept whichever correlations come out statistically significant. With that many comparisons, some will pass the significance test by chance alone. | ||
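
To make the "by chance alone" part concrete, here is a minimal sketch in Python. It is not Vigen's method or data: the function names (`pearson_r`, `dredge`), the 2,000 pure-noise variables, and the 10 data points per variable are all assumptions picked for illustration. The cutoff of 0.632 is the usual two-tailed p < 0.05 critical value of Pearson's r for 10 points (8 degrees of freedom). Real yearly series also tend to share trends, which makes spurious matches even easier to find than this noise-only version suggests.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def dredge(n_variables=2000, n_points=10, critical_r=0.632, seed=0):
    """Correlate one random target series against many random candidates.

    Every series is independent Gaussian noise (an assumption for this
    sketch), so any "significant" hit is a false positive by construction.
    Returns a list of (candidate_index, r) pairs with |r| >= critical_r.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    hits = []
    for i in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson_r(target, candidate)
        if abs(r) >= critical_r:
            hits.append((i, r))
    return hits


if __name__ == "__main__":
    hits = dredge()
    print(f"{len(hits)} of 2000 pure-noise variables look 'significant'")
    if hits:
        index, r = max(hits, key=lambda hit: abs(hit[1]))
        print(f"strongest spurious match: variable {index}, r = {r:.2f}")
```

At a 5% significance level you would expect roughly 5% of the noise variables, so about 100 of the 2,000, to clear the bar, and the strongest of them can look very convincing on a chart.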
|
||
Turns out this is a major problem in statistics-heavy fields, so much so that some journals now offer a [pre-registration format](https://en.wikipedia.org/wiki/Preregistration_(science)#Registered_reports) in which researchers describe what the study will investigate before any data is analysed. | ||
|
||
This project also provides a great example of generating realistic-looking content, in the form of *scientific* papers, with LLMs. Each paper shows the sequence of prompts that were used to create it. | ||
|
||
<a href = "https://tylervigen.com/spurious/research-papers/5920_spreading-love-and-margarine-an-examination-of-the-butter-splitter-correlation-in-maine.pdf"> | ||
<img src="/files/spreading-love-and-margarine-an-examination-of-the-butter-splitter-correlation-in-maine.pdf.png" alt="AI-generated paper for the relationship between margarine consumption and divorce rates in Maine" style="height: 600px; width:auto;"> | ||
</a> | ||
|
||
The author does point out that: | ||
> The silliness of the papers is an artifact of me (1) having fun and (2) acknowledging that realistic-looking AI-generated noise is a real concern for academic research (peer reviews in particular). | ||
> The papers could sound more realistic than they do, but I intentionally prompted the model to write papers that _look_ real but _sound_ silly. | ||
Still, I'm sure you could convince some people that [Anne Hathaway films are responsible for the number of votes for Republican senators](https://tylervigen.com/spurious/correlation/5866_the-number-of-movies-anne-hathaway-appeared-in_correlates-with_republican-votes-for-senators-in-tennessee) in Tennessee... |
Binary file added
BIN
+373 KB
...nd-margarine-an-examination-of-the-butter-splitter-correlation-in-maine.pdf.png