We do research on information dynamics in socio-technical systems, collective intelligence rooted in coincidence, and humans in the information age.

Amongst the information systems we study are online communities, peer-production systems, and, most significantly, citizen science platforms. In our work we combine methods commonly used in Web Science, Data Science, and Computational Social Science. The software developed as part of our research is available on GitHub. For more information, browse the selection of recent publications on this site or get in touch with Markus Luczak-Roesch.

A Universal Socio-Technical Computing Machine

This is an attempt to develop a universal socio-technical computing machine that captures and coordinates human input to let collective problem solving activities emerge on the Web without the need for an a priori composition of a dedicated task or human collective.

Because science is awesome: studying participation in a citizen science game

In this paper, we examine the motivations for participation in Eyewire, a Web-based gamified citizen science platform. Our study is based on a large-scale survey, whose responses we analyzed qualitatively in order to understand what drives individuals to participate. From this analysis, we derive 18 motivations for participation and group them into 4 motivational themes of engagement. We contextualize our findings against the broader literature on online communities, and compare them with other citizen science platforms, in order to understand the implications of gamification within the context of citizen science.

Finding Structure in Wikipedia Edit Activity: An Information Cascade Approach

This paper documents a study of the real-time Wikipedia edit stream, covering over 6 million edits to 1.5 million English Wikipedia articles during 2015. We focus on questions related to the identification and use of information cascades between Wikipedia articles, based on author editing activity. Our findings show that by constructing information cascades between Wikipedia articles from editing activity, we are able to derive an alternative linking structure to the embedded links within a Wikipedia page. This alternative article hyperlink structure was found to be topically relevant, and timely in relation to external global events (e.g., political activity). Based on our analysis, we contextualise the findings against areas of interest such as event detection, vandalism, edit wars, and editing behaviour.
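The core idea, linking articles through shared author activity rather than in-page hyperlinks, can be sketched as follows. This is a minimal illustration under simplified assumptions (an edit stream of timestamp-ordered tuples, links between consecutive edits by the same author), not the paper's actual pipeline.

```python
def author_cascade_links(edit_stream):
    """Connect articles that the same author edits in succession.

    edit_stream: iterable of (timestamp, author, article) tuples,
    assumed to be ordered by timestamp.
    Returns a set of directed (article_a, article_b) links.
    """
    last_article = {}  # author -> article that author last edited
    links = set()
    for _, author, article in edit_stream:
        prev = last_article.get(author)
        if prev is not None and prev != article:
            # the author moved from `prev` to `article`: record a link
            links.add((prev, article))
        last_article[author] = article
    return links

# Illustrative toy stream (not real Wikipedia data)
edits = [
    (1, "alice", "Physics"),
    (2, "bob", "History"),
    (3, "alice", "Quantum mechanics"),
    (4, "bob", "History"),  # same article again: no new link
    (5, "alice", "Physics"),
]
print(author_cascade_links(edits))
```

The resulting link set can then be compared, as in the study, against the hyperlinks embedded in the articles themselves.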

From coincidence to purposeful flow? Properties of transcendental information cascades

In this paper, we investigate a method for constructing cascades of information co-occurrence, suitable for tracing emergent structures in information in scenarios where rich contextual features are unavailable. Our method relies only on the temporal order of content-sharing activities and on intrinsic properties of the shared content itself. We apply this method to analyse information dissemination patterns across the active online citizen science project Planet Hunters, part of the Zooniverse platform. Our results lend insight into both structural and informational properties of the different types of identifiers that can be used and combined to construct cascades. In particular, significant differences are found in the structural properties of information cascades when hashtags are used as cascade identifiers, compared with other content features. We also explain apparent local information losses in cascades in terms of information obsolescence and cascade divergence; e.g., when a cascade branches into multiple, divergent cascades with combined capacity equal to the original.
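The cascade construction just described can be sketched in a few lines: messages are processed in temporal order, and each identifier (e.g., a hashtag) links a message to the most recent earlier message carrying the same identifier. The function names and the data shape are illustrative assumptions, not the published implementation.

```python
def build_cascades(messages, extract_identifiers):
    """Link each message to the latest earlier message sharing an identifier.

    messages: list of (timestamp, message_id, content), ordered by timestamp.
    extract_identifiers: function mapping content to a set of identifiers.
    Returns edges as (earlier_id, later_id, identifier) triples.
    """
    last_seen = {}  # identifier -> id of the message that last carried it
    edges = []
    for _, msg_id, content in messages:
        for ident in extract_identifiers(content):
            if ident in last_seen:
                # the identifier co-occurs: extend its cascade
                edges.append((last_seen[ident], msg_id, ident))
            last_seen[ident] = msg_id
    return edges

# Hashtags as one possible identifier type; toy data for illustration
hashtags = lambda text: {w for w in text.split() if w.startswith("#")}
msgs = [
    (1, "m1", "possible transit #planet"),
    (2, "m2", "noisy light curve #noise"),
    (3, "m3", "confirmed dip #planet #noise"),
]
print(build_cascades(msgs, hashtags))
```

Swapping `extract_identifiers` for a different feature extractor (URLs, named entities) yields cascades over other identifier types, which is how the structural comparison in the paper can be set up.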

Peer-production system or collaborative ontology engineering effort: What is Wikidata?

Wikidata promises to reduce factual inconsistencies across all Wikipedia language versions. It will enable dynamic data reuse and complex fact queries within the world's largest knowledge database. Studies of the participation patterns emerging in Wikidata are only just beginning, and it is not yet known what shapes most of the contributions in the system: are they an inheritance from the Wikipedia peer-production system, or do they reflect the proximity of Wikidata's tasks to those studied in collaborative ontology engineering? As a first step towards answering this question, we performed a cluster analysis of participants' content editing activities. This allowed us to relate our results to the typical roles found in peer-production and collaborative ontology engineering projects. Our results suggest very specialised contributions from a majority of users. Only a minority, the most active group, participates across the whole project; these users are particularly responsible for developing the conceptual knowledge of Wikidata. We show the alignment of existing algorithmic participation patterns with these human patterns of participation. In summary, our results suggest that Wikidata currently supports peer-production activities more than ontology engineering, owing to its present focus on data collection. We hope that our study informs future analyses and developments and, as a result, allows us to build better tools to support contributors in peer-production-based ontology engineering.

When Resources Collide: Towards a Theory of Coincidence in Information Spaces

This paper is an attempt to lay out the foundations for a general theory of coincidence in information spaces such as the World Wide Web, expanding on existing work on bursty structures in document streams and information cascades. We elaborate on the hypothesis that every resource published in an information space enters a temporary interaction with another resource once a unique explicit or implicit reference between the two is found. This thought is motivated by Erwin Schrödinger's notion of entanglement between quantum systems. We present a generic information cascade model that exploits only the temporal order of information sharing activities, combined with inherent properties of the shared information resources. The approach was applied to data from the world's largest online citizen science platform, Zooniverse, and we report the findings of this case study.

Designing for Citizen Data Analysis: A Cross-Sectional Case Study of a Multi-Domain Citizen Science Platform

Designing an effective and sustainable citizen science (CS) project requires consideration of a great number of factors. This makes the overall process unpredictable, even when a sound, user-centred design approach is followed by an experienced team of UX designers. Moreover, when such systems are deployed, the complexity of the resulting interactions challenges any attempt at generalisation from retrospective analysis. In this paper, we present a case study of the largest single platform of citizen-driven data analysis projects to date, the Zooniverse. By eliciting, through structured reflection, the experiences of core members of its design team, our grounded analysis yielded four sets of themes, focusing on Task Specificity, Community Development, Task Design, and Public Relations and Engagement. For each, we propose two to four specific design claims (DCs), drawing comparisons to the literature on crowdsourcing and online communities to contextualise our findings.

Socio-technical Computation

Motivated by the significant amount of successful collaborative problem solving activity on the Web, we ask: Can the accumulated information propagation behavior on the Web be conceived as a giant machine, and reasoned about accordingly? In this paper we elaborate a thesis about the computational capability embodied in information sharing activities that happen on the Web, which we term socio-technical computation, reflecting not only explicitly conditional activities but also the organic potential residing in information on the Web.

The role of ontology engineering in linked data publishing and management: An empirical study

In this article the authors evaluate the adoption and applicability of established ontology engineering results by the Linked Data providers' community. The evaluation relies on a combination of qualitative and quantitative methods. In particular, the authors conducted an analytical survey containing structured interviews with data publishers to give an account of current ontology engineering practice in Linked Data provisioning, and compared and expanded their findings with statistics on ontology development and usage provided by the Billion Triple Challenge datasets from 2012 (using the vocab.cc platform) and from 2014, as well as other related tools. The findings of the evaluation allow data practitioners and ontologists to gain a better understanding of the conceptual part of the LOD Cloud, and form the basis for the definition of purposeful, empirically grounded guidelines and best practices for developing, managing, and using ontologies in the new application scenarios that arise in the context of Linked Data.

A-posteriori provenance-enabled linking of publications and datasets via crowdsourcing

In this paper we present opportunities to leverage crowdsourcing for capturing dataset citation graphs a posteriori. We describe a user study we carried out, which applied a candidate crowdsourcing technique to collect this information from domain experts. We propose to publish the results as Linked Data, using the W3C PROV standard, and we demonstrate how to do this with the Web-based application we built for the study. Based on the results and feedback from this first study, we introduce a two-layered approach that combines information extraction technology and crowdsourcing in order to achieve both scalability (through the use of automatic tools) and accuracy (via human intelligence). In addition, it allows non-experts to become involved in the process.

Why Won't Aliens Talk to Us? Content and Community Dynamics in Online Citizen Science

We conducted a quantitative analysis of ten citizen science projects hosted on the Zooniverse platform, using a data set of over 50 million activity records from more than 250,000 users, collected between December 2010 and July 2013. We examined the level of participation of users in Zooniverse discussion forums in relation to their contributions toward the completion of scientific (micro-)tasks. As Zooniverse is home to a multitude of projects, we were also interested in the emergence of cross-project effects, and identified the project characteristics that drive them, most importantly the subject domain and the duration of a project. We also looked into the adoption of expert terminology, showing that this phenomenon depends on the scientific domain that a project addresses but is also affected by how the communication features are actually used by a community. This is the first study of its kind in this increasingly important class of online community, and its insights will inform the design and further development of the Zooniverse platform, and of citizen science systems as a whole.