Difference between revisions of "Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda"

From Wikipedia Quality
(wikilinks)
(+ Infobox work)
 
{{Infobox work
| title = Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda
| date = 2017
| authors = [[Reid Priedhorsky]]<br />[[Dave Osthus]]<br />[[Ashlynn R. Daughton]]<br />[[Kelly Renee Moran]]<br />[[Nicholas Generous]]<br />[[Geoffrey Fairchild]]<br />[[Alina Deshpande]]<br />[[Sara Y. Del Valle]]
| doi = 10.1145/2998181.2998183
| link = http://dl.acm.org/citation.cfm?id=2998183
}}
 
'''Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda''' is a scientific work related to [[Wikipedia quality]], published in 2017 and written by [[Reid Priedhorsky]], [[Dave Osthus]], [[Ashlynn R. Daughton]], [[Kelly Renee Moran]], [[Nicholas Generous]], [[Geoffrey Fairchild]], [[Alina Deshpande]] and [[Sara Y. Del Valle]].
 
  
 
== Overview ==
 
 
Effective disease monitoring provides a foundation for effective public health systems. It has historically been accomplished through patient contact and bureaucratic aggregation, which tend to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of when and how these approaches work remains open. The authors addressed this question using [[Wikipedia]] access logs and category links. Their experiments, replicable and extensible using [[open source]] code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. The authors found that a minimal-configuration, language-agnostic article selection process based on semantic [[relatedness]] is effective for improving predictions, and that the approach is relatively insensitive to the amount and age of training data. They also found, in contrast to prior work, very little forecasting value, and they argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead the authors to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.
 

Latest revision as of 05:46, 23 May 2020


Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda
Authors
Reid Priedhorsky
Dave Osthus
Ashlynn R. Daughton
Kelly Renee Moran
Nicholas Generous
Geoffrey Fairchild
Alina Deshpande
Sara Y. Del Valle
Publication date
2017
DOI
10.1145/2998181.2998183
Links
Original

Measuring Global Disease with Wikipedia: Success, Failure, and a Research Agenda is a scientific work related to Wikipedia quality, published in 2017 and written by Reid Priedhorsky, Dave Osthus, Ashlynn R. Daughton, Kelly Renee Moran, Nicholas Generous, Geoffrey Fairchild, Alina Deshpande and Sara Y. Del Valle.

Overview

Effective disease monitoring provides a foundation for effective public health systems. It has historically been accomplished through patient contact and bureaucratic aggregation, which tend to be slow and expensive. Recent internet-based approaches promise to be real-time and cheap, with few parameters. However, the question of when and how these approaches work remains open. The authors addressed this question using Wikipedia access logs and category links. Their experiments, replicable and extensible using open source code and data, test the effect of semantic article filtering, amount of training data, forecast horizon, and model staleness by comparing across 6 diseases and 4 countries using thousands of individual models. The authors found that a minimal-configuration, language-agnostic article selection process based on semantic relatedness is effective for improving predictions, and that the approach is relatively insensitive to the amount and age of training data. They also found, in contrast to prior work, very little forecasting value, and they argue that this is consistent with theoretical considerations about the nature of forecasting. These mixed results lead the authors to propose that the currently observational field of internet-based disease surveillance must pivot to include theoretical models of information flow as well as controlled experiments based on simulations of disease.