Politicians by Country from the English-Language Wikipedia

Revision as of 07:59, 19 June 2019 by Skylar (Creating a page: Politicians by Country from the English-Language Wikipedia)

Politicians by Country from the English-Language Wikipedia is a scientific work related to Wikipedia quality, published in 2017 and written by Oliver Keyes.

Overview

This project contains data on most English-language Wikipedia articles within the category "Category:Politicians by nationality" and subcategories, along with the code used to generate that data. Both are released under the CC-BY-SA 4.0 license.

Data

The data was extracted via the Wikimedia API using the associated code. It is formatted as a CSV and saved as page_data.csv in the "data" directory. Columns are:

1. "country", containing the sanitised country name, extracted from the category name;
2. "page", containing the unsanitised page title;
3. "last_edit", containing the edit ID of the last edit to the page.

Country codes are inconsistent. Where possible, they have been modified to match the country names found in http://www.prb.org/DataFinder/Topic/Rankings.aspx?ind=14, but the PRB dataset contains nations not found in Wikipedia, and vice versa.

The actual recursion only went 2 levels deep into the category tree: someone listed as an Antiguan politician, say, is included; someone exclusively listed as an Antiguan politician who was assassinated is not.

Code

The code is written in the programming language R, and heavily commented; it can be found in the "code" directory, and is split into 3 files:

1. utils.R, which contains utilities for operating the code in the other files;
2. retrieve.R, which contains functions for retrieving the category and page data from Wikipedia;
3. main.R, which executes the data retrieval code and performs sanitisation before writing it to file.
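
The two-level recursion limit described above can be illustrated with a minimal sketch. It is written in Python for illustration and is not the project's R code. The category tree, page titles, and the function name collect_pages are all hypothetical, and the sketch assumes that the root category counts as level 1 and the nationality categories as level 2. Real data would come from the Wikimedia API rather than from in-memory dictionaries.

```python
from collections import deque

# Hypothetical toy category tree (assumption: stands in for Wikimedia API results).
SUBCATEGORIES = {
    "Politicians by nationality": ["Antiguan politicians"],
    "Antiguan politicians": ["Assassinated Antiguan politicians"],
    "Assassinated Antiguan politicians": [],
}

# Hypothetical page titles filed directly under each category.
PAGES = {
    "Politicians by nationality": [],
    "Antiguan politicians": ["Example Politician A"],
    "Assassinated Antiguan politicians": ["Example Politician B"],
}


def collect_pages(root, max_depth):
    """Breadth-first walk over categories, visiting at most max_depth levels.

    The root category is treated as level 1 (an assumption of this sketch).
    Returns a list of (category, page) tuples for every visited category.
    """
    rows = []
    seen = {root}
    queue = deque([(root, 1)])
    while queue:
        category, depth = queue.popleft()
        # Record the pages filed directly under this category.
        for page in PAGES.get(category, []):
            rows.append((category, page))
        # Descend only while the depth limit has not been reached.
        if depth < max_depth:
            for child in SUBCATEGORIES.get(category, []):
                if child not in seen:
                    seen.add(child)
                    queue.append((child, depth + 1))
    return rows


rows = collect_pages("Politicians by nationality", max_depth=2)
for category, page in rows:
    print(category, "|", page)
```

With max_depth=2, the page filed under "Antiguan politicians" is collected, while the page filed only under "Assassinated Antiguan politicians" is skipped, which matches the behaviour described in the Data section under the stated depth assumption.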