ANALYSIS OF WIKIPEDIA IP EDITOR ACTIVITY
Wikipedia matters. In a time of extreme political polarization, algorithmically enforced filter bubbles, and fact patterns dismissed as fake news, Wikipedia has become one of the few places where we can meet to write a shared reality. We treat it like a utility, and the U.S. and U.K. trust it about as much as the news.
But we know very little about who is writing the world’s encyclopedia. We do know that just because anyone can edit, doesn’t mean that everyone does: The site’s editors are disproportionately cis white men from the global North. We also know that, as with most of the internet, a small number of the editors do a large amount of the editing. But that’s basically it: In the interest of improving retention, the Wikimedia Foundation’s own research focuses on the motivations of people who do edit, not on those who don’t. The media, meanwhile, frequently focus on Wikipedia’s personality stories, even when covering the bigger questions. And Wikipedia’s own culture pushes back against granular data harvesting: The Wikimedia Foundation’s strong data-privacy rules guarantee users’ anonymity and limit the modes and duration of their own use of editor data.
But as part of my research in producing Print Wikipedia, I discovered a data set that can offer an entry point into the geography of Wikipedia’s contributors. Every time anyone edits Wikipedia, the software records the text added or removed, the time of the edit, and the username of the editor. (This edit history is part of Wikipedia’s ethos of radical transparency: Everyone is anonymous, and you can see what everyone is doing.) When an editor isn’t logged in with a username, the software records that user’s IP address. I parsed all of the 884 million edits to English Wikipedia to collect and geolocate the 43 million IP addresses that have edited English Wikipedia. I also counted 8.6 million username editors who have made at least one edit to an article.
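The distinction between logged-in and logged-out editors can be sketched in a few lines of Python. This is only an illustration, not the code used for this analysis: it assumes each revision has already been reduced to a (contributor, timestamp) pair, and the function names and sample records are hypothetical.

```python
import ipaddress


def classify_editor(contributor: str) -> str:
    """Return 'ip' if the contributor string parses as an IPv4/IPv6 address,
    otherwise 'username'."""
    try:
        ipaddress.ip_address(contributor)
        return "ip"
    except ValueError:
        return "username"


def summarize(revisions):
    """Count distinct IP editors and distinct username editors.

    `revisions` is an iterable of (contributor, timestamp) pairs; this flat
    shape is an assumption made for the sketch.
    """
    ips, users = set(), set()
    for contributor, _timestamp in revisions:
        if classify_editor(contributor) == "ip":
            ips.add(contributor)
        else:
            users.add(contributor)
    return {"ip_editors": len(ips), "username_editors": len(users)}


# Hypothetical sample records (documentation-range addresses, made-up username).
sample = [
    ("203.0.113.7", "2015-03-01T12:00:00Z"),
    ("ExampleUser", "2015-03-01T12:05:00Z"),
    ("203.0.113.7", "2015-03-02T08:30:00Z"),
    ("2001:db8::1", "2015-03-02T09:00:00Z"),
]
result = summarize(sample)
print(result)
```

Only the set of distinct IP addresses would then be passed to a geolocation step, which is not shown here.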
The result is a set of maps that offer, for the first time, insight into where the millions of volunteer editors who build and maintain English Wikipedia’s 5 million pages are—and, maybe more important, where they aren’t.
Source: Analysis of Wikipedia IP editor activity
This map shows the percentage of households editing by county. It contains a number of distinct patterns, the most striking of which is the span of very low editing activity across the Plains, from the Dakotas through West Texas, and in the South, excluding the Carolinas and Florida, and cities such as Jackson, Mississippi; Birmingham, Alabama; Nashville, Tennessee; and Atlanta.
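For readers who want to see how a per-county percentage like this could be computed, here is a minimal sketch. It assumes that each geolocated editing IP address stands in for one household and that household counts per county come from a census table; both are assumptions made for illustration, and the county codes and numbers below are invented.

```python
def editing_rate_by_county(editing_ips, households):
    """Return {county: percentage of households editing}.

    editing_ips: {county_code: number of distinct editing IP addresses}
    households:  {county_code: number of households}
    Treating one IP address as one household is an assumption of this sketch.
    """
    rates = {}
    for county, n_households in households.items():
        if n_households <= 0:
            # Skip counties with no household data rather than divide by zero.
            continue
        rates[county] = 100.0 * editing_ips.get(county, 0) / n_households
    return rates


# Hypothetical county codes and counts, for illustration only.
households = {"00001": 2000, "00002": 500, "00003": 0}
editing_ips = {"00001": 50, "00002": 5}

rates = editing_rate_by_county(editing_ips, households)
print(rates)
```

Normalizing by households rather than mapping raw counts keeps dense urban counties from dominating the map simply because more people live there.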
Source: The 2010 U.S. Religion Census: Religious Congregations and Membership Study (RCMS)
This pattern appears to closely and inversely resemble religious adherence: Counties with high religious adherence also have a low level of Wikipedia-editing activity, and counties with low religious adherence have high levels of editing. Of course, the modern encyclopedia is a largely secular project: The first large-scale one, Encyclopédie, did emerge from the Enlightenment after all, and took the then-radical approach of organizing its contents according to reason, not theology.
The possibility that areas of high religious adherence might be less active on Wikipedia has some support in Conservapedia, which was founded in 2006 to counter what its founder perceived as Wikipedia’s liberal bias. Its most popular articles include “Homosexual Agenda,” “Counterexamples to Relativity,” “Homosexuality and Anal Cancer,” and “Dinosaur,” which advances the argument that dinosaurs were created on the sixth day of Creation. It’s not all that difficult to imagine that Wikipedia may not appeal to those interested in creating an information ecosystem that confirms religious beliefs not verifiable by the independent, secondary reliable sources that Wikipedia requires. To put it another way: If your belief system is rooted in a book that has hardly changed for 2,000 years, you might be less interested in contributing to an encyclopedia that is continuously being written and rewritten.
Sources: MIT Election Data and Science Lab, 2018, “County Presidential Election Returns 2000-2016”, https://doi.org/
Meanwhile, though many of the low-editing-density areas are Republican-heavy counties in the Plains and the Rockies, the areas of high activity do not follow such clear voting patterns. Some swing states exhibit lots of editing across the state; in others, the activity isn’t distributed evenly. Likewise, states with histories of internal political divisions, such as California and New York, also have high overall editing activity that does not conform to political boundaries. (In California, notice the strong participation of historically Republican Orange County and San Diego.) Households in conservative upstate New York are as likely to contribute as ones in New York City, except for the two upstate counties (Lewis and Hamilton) that are also among the most religious and politically conservative in the state.
If Wikipedia is a place where people come to negotiate a shared understanding of the truth, these patterns of editing activity suggest that it might work in part because people come from regions of differing political beliefs, especially including the bellwether swing states, and that trustworthiness is established through the interaction of contributors across the political spectrum.
Editing patterns also map onto other demographic lines: The pattern of editing activity in Appalachia and the South appears to match population density, income, education, and broadband access. Does proximity to other people make you more inclined toward collective action, or is it simply the fact that editing would be difficult without the income to purchase a computer, access to broadband, and education to feel comfortable with formatting citations? While idealistic Wikipedians might like to think it is the former, the persistent and well-documented poverty of the rural South seems the more likely cause. This area of low editing, from East Texas to Virginia, includes the highest concentration of African Americans in the country, raising the likelihood that income, education, and internet access intersect with racial inequity as factors that prevent participation.
Following this pattern, Native American communities also appear to be prevented from editing by similar factors: low education, high poverty, and lack of internet access. Nearly all counties with majority Native American populations have low editing rates.
The absence of participation from majority Native American counties, and rural, poor, black counties in the South, is troubling. This absence is not a choice—as it may be with the deeply religious—but an inability to contribute due to intersectional inequality. Furthermore, the Wikipedia community’s forms of outreach are ill-equipped to reach these rural regions, because in-person meetups, edit-a-thons, and university programs all require population density to succeed.
English Wikipedia Editors by Country
While the United States accounts for nearly half of the editors, looking at the data from an international perspective reveals the United States as just one part of the colonial legacy of the English language. The five largest contributors were part of what once was the British Empire, and account for nearly 75 percent of all editors.
Global editing patterns also trace specific geographic contours of the British Empire: While editing activity across Africa is orders of magnitude lower than on every other inhabited continent, the more active countries are mostly former British colonies; Francophone West Africa is one of the regions with the lowest activity. India is the third-largest contributor to English Wikipedia. I spoke with the Indian regional organizer for Art+Feminism (the Wikipedia editing nonprofit I co-founded) about the importance of translating our training materials into Hindi, Bengali, and other languages of India; she said that it wasn’t a priority, because her participants are focused on editing English Wikipedia and have little interest in editing the Hindi or Bengali Wikipedias. This is a result of the colonial legacy of English and its contemporary role in social and economic mobility, but also of the gravitational pull of English Wikipedia: 92 percent of all Wikipedia traffic in India is to the English version, and if you want to share your knowledge, for better or worse, you go to where the audience is.
The map of households editing Wikipedia shows other constellations of low editing activity, including in parts of Asia, the Middle East, and Central America, as well as the former Soviet countries (especially in contrast to their surrounding areas in Europe). In some cases, these patterns mirror income, broadband access and affordability, and education. In others, the cause seems more likely to be war (Afghanistan, Iraq, Syria, Yemen) or isolated, repressive regimes (Myanmar, North Korea).
Source: Analysis of Wikipedia IP editor activity
These geographic editing trends have remained consistent over the past 15 years. As many others have reported, editing volume grew from 2002 to 2005, peaked in 2008, and declined slightly to remain fairly stable over the past 10 years.
It is important to note that working with IP data has significant limitations. IP geolocation is not a perfect science, and I have been careful to avoid some of the known pitfalls with mapping IP addresses. Because dynamic IPs are reassigned to new users, they move around, potentially diminishing the accuracy of the old data. But these maps help validate the data: In fact, the trends in these annual maps stay consistent.
Like the Enlightenment itself, the modern encyclopedia has a history entwined with colonialism. Encyclopédie aimed to collect and disseminate all the world’s knowledge—but in the end, it could not escape the biases of its colonial context. Likewise, Napoleon’s Description de l’Égypte augmented an imperial military campaign with a purportedly objective study of the nation, which was itself an additional form of conquest. If Wikipedia wants to break from the past and truly live up to its goal to compile the sum of all human knowledge, it requires the whole world’s participation.
Data, source code, and a more detailed methodology are available on GitHub.
This project was supported by the Eyebeam Center for the Future of Journalism. Danara Sarıoğlu contributed programming assistance. Frank Donnelly and the GIS Lab at Baruch College contributed spatial-analysis assistance.