This post was originally published on Benjamin Mako Hill‘s blog Copyrighteous.
In wikis, redirects are special pages that silently take readers from the page they are visiting to another page. Although their presence is noted in tiny gray text (see the image below) most people use them all the time and never know they exist. Redirects exist to make linking between pages easier, they populate Wikipedia’s search autocomplete list, and are generally helpful in organizing information. In the English Wikipedia, redirects make up more than half of all article pages.
Over the years, I’ve spent some time contributing to to Redirects for Discussion (RfD). I think of RfD as like an ultra-low stakes version of Articles for Deletion where Wikipedians decide whether to delete or keep articles. If a redirect is deleted, viewers are taken to a search results page and almost nobody notices. That said, because redirects are almost never viewed directly, almost nobody notices if a redirect is kept either!
I’ve told people that if they want to understand the soul of a Wikipedian, they should spend time participating in RfD. When you understand why arguing about and working hard to come to consensus solutions for how Wikipedia should handle individual redirects is an enjoyable way to spend your spare time — where any outcome is invisible — you understand what it means to be a Wikipedian.
That said, wiki researchers rarely take redirects into account. For years, I’ve suspected that accounting for redirects was important for Wikipedia research and that several classes of findings were noisy or misleading because most people haven’t done so. As a result, I worked with my colleague Aaron Shaw at Northwestern earlier this year to build a longitudinal dataset of redirects that can capture the dynamic nature of redirects. Our work was published as a short paper at OpenSym several months ago.
It turns out, taking redirects into account correctly (especially if you are looking at activity over time) is tricky because redirects are stored as normal pages by MediaWiki except that they happen to start with special redirect text. Like other pages, redirects can be updated and changed over time are frequently are. As a result, taking redirects into account for any study that looks at activity over time requires looking at the text of every revision of every page.
Using our dataset, Aaron and I showed that the distribution of edits across pages in English Wikipedia (a relationships that is used in many research projects) looks pretty close to log normal when we remove redirects and very different when you don’t. After all, half of articles are really just redirects and, and because they are just redirects, these “articles” are almost never edited.
Another puzzling finding that’s been reported in a few places — and that I repeated myself several times — is that edits and views are surprisingly uncorrelated. I’ll write more about this later but the short version is that we found that a big chunk of this can, in fact, be explained by considering redirects.
We’ve published our code and data and the article itself is online because we paid the ACM’s open access fee to ransom the article.
For more details see the paper: Hill, Benjamin Mako, and Aaron Shaw. (2014) “Consider the Redirect: A Missing Dimension of Wikipedia Research.” In Proceedings of the 10th International Symposium on Open Collaboration (OpenSym 2014). ACM Press, 2014.
Discover more from Community Data Science Collective
Subscribe to get the latest posts sent to your email.