Gijs for Reciprocal ecosystem for citizen science

Posted on Aug 25, 2021

Reciprocal ecosystem for citizen science — Grant Report #2 (Final Report)

#grantreports

Hi Community!
This is our final report. We're proud of what we have learned and achieved, we've worked hard to make this accessible. We would love you to take a look, read and share feedback or questions!

First, a short description of our project: Our goal was prototyping a survey platform that rewards respondents automatically upon filling in a survey using the Interledger specification. Survey-data will be stored in linked data format and users can copy the data on their own personal data storage: a SolidPod. In that way researchers can use citizen science as a way to collect data and contribute to environmental health research.

Project update

Since our interim report we worked on the survey generator, wrote a Data Privacy Impact Assessment, learned from scientists like the European Citizen Science project Cos4Cloud, and wrote blogs for developers, researchers and about privacy.

A summary: We succeeded in building the POC of a survey management system which includes a survey generator and coupons for rewarding respondents, generate survey responses in linked data format and a seperate payments service for rewarding respondents we called cash-link (watch the video below for a full flow of these components!). We miss a last piece of the puzzle for completing the payment service and making the actual payments work because current wallets do not yet support sending payments.

We also did a lot of research on types of payments and reusable surveys because those are more valuable and more likely to be involved in payments, also privacy is an important part of our project we dug into, and we interviewed a lot of scientists who work with citizen science. We'll dive deeper into this below.

Highlight: the full flow survey management system, cash-link and linked data

A short reading guide:

Under Progress on objectives we share our research on high-over learnings on different types of payments in citizen science and reusable surveys including SolidPODS, data allocation and privacy and a how data in an European citizen science project standardizes data
Under Key activities we explain how the reward service cash-link works, share a Data Privacy Impact Assessment for the Survey Rewards and Payment Service, our updates on the POC and describe some changes we've made along the way.
Under Communications and Marketing we share our blogs and summarize all of the organisations and scientists we talked to about our project, Interledger, and learned from and gave us a deeper understanding of citizen science which we describe in this final report.

Progress on objectives

In the section below we'll share some high-over learnings on our goals in payments, surveys, and privacy.

Payments serving multiple goals: incentivizing, reimbursing costs, diversifying research population and gamification

Since payments in citizen science are a main part of our project we researched and interviewed a lot of scientists active in citizen science. We learned that reimbursing and paying citizens to participate in citizen science could apply to different topics:

Inclusion: 1.7 billion people still lack access to digital payments, despite most owning a mobile phone, according to the World Bank's Global Findex. Accessible open financial transactions can help boost citizen science in many ways. With citizen science using an open payment ecosystem every human, regardless of identity, geography, or income can participate and contribute to science with the financial options mentioned below. This is important because science should reflect and involve as many groups as possible.
Sign up bonus: a lot of survey platforms use this as a way to attract potential respondents
Thanking citizens for participating with a small coupon. Sometimes when also personal health data is involved legal obligations don't allow you or restrict you to incentivize people to share this data.
Reimbursing for travelling costs and other practical issues like electricity costs for devices like devices that measure air-pollution.
Payments as a form of diversifying the research population because some groups of citizens are more likely to participate, and using payments could result in better representativity of population and data.
Dynamic and variable rewards. In the field of social science for example gamification is used for social experiments. These games often rely on a reward that is dependent on the performance in the game.
Participation: it can be hard to get enough respondents because people are overwhelmed by information-society. Payments could help get respondents to participate. An argument against this is only a certain type of respondent would be incentivized making the respondents pool and data not representative.
Donation: participants that are not incentivized by payments (for example because they earn enough money) could still be willing to participate if the amount is donated to a certain goal (e.g. charity), thereby broadening the potential target audience.
Also IOT devices can become autonomous and self-sustaining gathering data and receiving/sending financial transactions.

We learned that in health and nature related topics most of the time there are enough participants because of an intrinsic motivation. In social sciences it was harder to find participants. Pools of students are sometimes created by making participating in a research a mandatory part of their education.
We found out asking small questions regularly instead of a lot of questions once is also an upcoming phenomenon.

Cultural aspects influence how to perform citizen science too: Although international online platforms exist to recruit human participants (e.g., MTurk, Prolific, Cint, Figure8), these are unusable for research bound to the Dutch linguistic and/or cultural context, have GDPR issues (data is possibly stored on servers outside Europe) and data quality issues.

Platforms we found interesting are for example https://www.iedereenwetenschapper.be/ (translated: everybody is a scientist) where citizen science is used for research in climate change, historical research etc.

Also https://www.otree.org/ a software platform for behavioral experiments got our attention because they have an example with payments: https://otree-more-demos.herokuapp.com/demo/randomize_stimuli

We also found some very interesting experimental use cases.

Experimental use case: publishing scientific papers

Publication of scientific papers that are peer reviewed cost money to access. Open science is the practice of science in such a way that others can collaborate and contribute, where research data are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods. Publishing scientific papers combined with Web Monetization could be an interesting use case to investigate.

Experimental use case: paid devices and sensors as a sustainable form of environmental research

Deskresearch put us on the path of a project that installs 20.000 weather stations in Africa.
Experience learns that if the project funding stops the infrastructure also stops. The solution is to make the project self-sustaining. This can be done by selling the data. With the proceeds you can maintain the network, let it grow and expand all over Africa.
A customer could be insurance companies that can use the data to verify a claim of loss of crops by using the data from the weather station without having to travel huge distances to each field to check and assess the claim. When you start to think about the possibilities with SolidPods connected to weather stations then even more innovation is possible.

Towards reusable surveys & survey responses

We included Linked Data and Solid in our project as a way to enable citizens to store their survey data on their own SolidPods. It aligns with the principles of standardized FAIR data: Findable, Accessible, Interoperable, and Reusable.
Boths surveys and survey responses are not standardized a lot, at least not technically. This means that the questions themselves have vastly different representations, and the same goes for the answers.

Reusable data and surveys are more likely to be involved in payments because they become more valuable. In the sections belwo, we'll discuss our learnings, why it makes sense to standardize both responses and questions, and how this could be realized.

Reusable responses

Since many surveys describe personal information, it makes sense, as a respondent, to have a way of storing the information you filled in in a place that you control. Making this possible enables a few nice use cases.

Previously entered response data could be usable while filling in new surveys. This could result in a UX similar to auto-filling forms, but far more powerful and rich than browsers currently support.
Standardized survey responses could also be used to gather insights into your own personal information. For example, filling in a survey about how your shortness of breath linked to air pollution has been today could be used in a different app to make a graph that visualises how your shortness of breath has progressed over the months for personal insight.

Achieving something like this requires a high degree of standardization in both the surveys and the responses. The survey and its questions should provide information about:

The question itself. This is required in all survey questions, of course.
The required datatype of the response, such as 'string', or 'datetime' or some 'enumeration'.
A (link to a) semantic definition of the property being described. This is a bit more obscure: all pieces of linked data use links, instead of keys, to describe the relation between some resource and its property. For example, a normal resource might have a 'birthdate', while in linked data, we'd use 'https://schema.org/birthDate'. This semantic definition makes things easier to share, because it prevents misinterpretation. Links remove ambiguity.
A query description. This is even more obscure, but perhaps the most interesting. A query description means describing how a piece of information can be retrieved. Perhaps a question in a survey will want to know what your payment pointer is. If a piece of software wants to auto-fill this field, it needs to know where it can find your payment pointer.

How to achieve these technical goals? Well, the inventor of the World Wide Web (sir Tim Berners-Lee) started an initiative called Solid, which is a set of specifications that aim to help users get control over their data, and improve interoperability between web applications. We wanted to work in this direction.

However, achieving the high degree of survey response interoperability seems a bit too ambitious, at this moment.

Firstly, the problems regarding data discoverability and data types are not yet solved for Solid. There are (at least) two competing standards for describing datatype shapes (SHACL and SHEX).

The Shape Trees specification aims to solve some of these problems, but implementing this (still in draft) spec in our infrastructure would be too time consuming at the moment. The specification is very interesting, but is also not stable yet and quite complex. It requires having a SHEX parser, working with dynamic filenames, supporting file systems and more.

Some of us are working on a specification that has some similarities to Solid, called Atomic Data. It can be converted to RDF, but it is a bit more constrained. These constraints allow for some solutions that could really help realizing a higher degree of standardization in surveys. The Atomic Schema spec could standardize the surveys and responses, and Atomic Paths enable the query descriptions. If you'd like to contribute to a potential solution, please check out the issue for Atomic Data Surveys.

Reusable surveys: context matters

If the answers based on a survey are stored in a personal data storage, these become reusable from a machine point of view. However also the context of the survey matters: if I am a respondent and rate my shortness of breath linked to air pollution (an environmental health topic) then this answer can't be re-used one on one on a Covid-related questionnaire on shortness of breath for example: people may asses their shortness of breath differently in those different contexts. To become a valuable questionnaire the context needs to become insightful too.

During the interviews with researchers it became clear that there are some projects running that try to standardize the surveys and to describe their context in a standardized way, but none of those initiatives formalize their work into ontologies that are structured enough to translate them into SHACL, SHEX or shape trees. Right now, survey tools use different kinds of representations and models, and these can often not be shared between them. To be reusable in practice, this gap needs to be bridged too. One possible way forward is to extend an existing project like Data Documentation Initiative (DDI, https://ddialliance.org/) with formalized descriptions of the data.

Privacy, surveys, data allocation and Solid pods

One of the core principles in privacy is that no more personal data is collected, stored, accessed, used and exchanged then is needed for the goal it serves. In the context of surveys the goals and with it the need for processing data hugely differs from use case to use case. To name some:

In some cases the researcher only needs the responses to one survey, in other cases the researcher needs to link the results from the survey with different data sources. To do so, the researcher needs some kind of identifier to link the data.
In some use cases, changing the answers after the data is once submitted will break the research, in other use cases updating the answers is part of the research.
Often, but not always, the researcher will 'normalize' the data: create a copy of the data that is cleared from errors and can be used for statistical analysis. This copy diverts from the original responses.

Such differences result in different levels of control of the respondent over the submitted data. The basic assumption of Solid pods: 'it is my data, so I need control over it', is not that straightforward in the context of surveys. Still there are some use cases in the context of surveys where Solid pods may have additional value:

To create a personal copy of the responses that can be used for own insight, possibly over time and own analysis.
To pre-fill the answers on similar questions in subsequent surveys.
To share the answers with different researchers for different research.

All these use cases demand full control of the respondent over the data stored, so in these use cases, Solid pods are a powerful tool. The more reusable the data is stored on the Solid pod, the better these use cases can be covered.

Standardizing Citizen Science data in Europe

After the interim report we expanded our horizon from national to European initiatives.

The European Citizen Science Association is a great portal leading you to different aspects. They have a lot of working groups to create a sustainable ecosystem of citizen science. Although no particular working groups are focussing on payments, via the working group Projects, data, tools and technology we found a current project, Cos4cloud, that is about standardizing data for the European Open Science Cloud (an environment for hosting and processing research data to support EU science) and also involves interoperability, a topic that is related to reusable surveys.

We talked with the coordinators of the Cos4Cloud project who are very experienced in the field of citizen science.

They have experienced there is a lot of diversity in projects and making data reusable is hard. The challenge is to make data trustworthy. Data is not trusted because citizen science is based on subjectivity and prone to errors, it's often too much about quantity instead of quality, lacks a procedure and the methods behind results are sometimes not shared because of intellectual property.
To get trusted data they named 3 approaches:

a typical approach is data redundancy
another approach is to chop up questions so people can't relate their part to the end result and data is more likely to be trustworthy
An interesting approach is an indicator of trust for example by verifying the characteristics of a participant. To use this there is a need for cryptographically signed characteristics which can be found in this W3 spec of Verifiable Credentials.

In today's world of citizen science valuable contributions exist in various projects hosted at different portals. The data collected by citizens is mostly inaccessible for experts review or research.

To enable the cooperation between citizen, scientists, experts and researchers the Cos4Cloud project uses an authenticity broker: https://www.authenix.eu/

We learned about the OGC Standards to standardize data. In the Cos4Cloud project they are working on an api called OGC SensorThings API.
They are building an api based common data model, with a grouping concept and relationship concept. This helps to get from databases to relationships.

To have reusable data also requires licenses which specify how to share the data.

What could really help to standardize data is when citizen science becomes an annex of the INSPIRE knowledge base, which aims to create a European Union spatial data infrastructure for the purposes of EU environmental policies and policies or activities which may have an impact on the environment.

Payments

They also mentioned sometimes citizen science projects reserve funding to attract participants to mobilise participation in local communities. Our input was that when participating in surveys involves a payment (donation) to local projects that serve the community, this could help attract participants and still incentivize an altruistic motivation.

Key activities

Cash-link: Rewarding and Privacy by design

Cash-link is a reward collection service. We separated the server of the payments from the survey server so after filling in a survey a respondent is redirected to a screen where a respondent can fill in a payment pointer and get a reward. This will act as a reward collection service that's reusable.

We think the idea of a 'reward collection service' where you can fill in your payment pointer and get rewarded is a great abstraction of the concept of value transfer with payment pointers and will provide bigger value for the interledger community. Also this contributes to privacy by design of the POC.

In the context of surveys, the researcher can choose to use their own identifiers for identifying individuals responding to the survey. The survey system itself uses an unique token for each respondent, which might include an internal identifier used for linking responses to individuals by the researcher. After submitting the survey, the respondent can enter a payment pointer to receive a reward. To avoid linking payment pointers to individual survey responses, potentially creating a privacy breach, a strong separation of the identifiers and the payment pointer is needed. At the same time the system must be resilient to fraud and create an audit trail.

That's why we're building a separate service for handling the payout process called Cash-link. Both the survey tool and the payout service run as separate services and when communicating with each other, they encrypt and authenticate their communication mutually with an asymmetric keypair.
On response submission, the survey service generates an encrypted token that represents a certain value, and encodes it into a URL. This URL sends the user to the payment service, which decodes the encrypted token and prompts the user for their payment pointer. This architecture prevents the Survey service from knowing which payment pointer is linked to a certain survey response. Though for the respondent the separation is barely noticeable; the respondent only reveals the payment pointer to the payment service.
Both the survey service and the payment service keep their own logfiles and they log a unique payment identifier per transaction, making an audit possible, but only after explicitly combining the log-files from two different services. The researcher only has access to the single-use identifier in the link and may combine it with their own identifiers without any risk of correlation to identifiers like the payment pointer. The survey results themselves are of course accessible to the researchers, but the participant may store and reuse the data when they want to.

This Cash-link service is a standalone typescript node server designed to be re-used in other projects and it is fully open source. The tokens themselves can be generated by any library that supports JWE (JSON Web Encryption), which means that many existing frameworks and languages have tools available to create these tokens. A reference implementation in typescript is also available here.

Data Privacy Impact Assessment

Our privacy expert wrote an interesting Data Privacy Impact Assessment for the Survey Rewards and Payment Service.

Lessons learned with Uphold and Interledger

Our project is a bit different from most of the other projects in this community, as our concept involves sending one-off payments to payment pointers from a server, instead of sending payment streams to payment pointers from a browser.

We've come a long way building the payment server and enabling setting payments with coupons, but found out that there is a missing last puzzle piece since Uphold (and all of the other wallets) currently does not support Interledger Payments in it's API for sending payments. We have finished the reward collections service based on the current integration we have built with Uphold and connected this with the survey server.

After our Interim report we contacted Ben Shafarian, CTO of Coil, to have a more in depth view in the roadmap of Rafiki and the development of the API.The milestones are nice way to keep track of the development

Although this is the formal end of our project we keep an eye out on Rafiki's development to see if we can implement the last piece of the puzzle later.

POC survey management tool

During our project we are researching potential uses, which mainly involves interviewing researchers. These researchers are generally quite familiar with survey tools, which helped us to identify potential areas for improvement.

We designed a user interface for both the researchers and for the respondents. While building the payments server we also focussed on the UI and UX while iterating in Figma in new designs and user flows constantly.

We've built the software for creating a survey, setting payments and generating and validating the tokens for survey invites. Setting up the maximum wallet amount is done via the console. The token part is basically an implementation of the designs shown above.

The next step for us was building the Cash-Link service mentioned earlier, which is a standalone web service that turns links into cash. Open a link, enter your payment pointer, and receive funds. We've built this as a typescript application in the Next.js framework. The source code is available on Gitlab.

We've first used Typeform for creating the questions in the survey.
After that we built our own survey management tool which enables users to create their own questions and monitor reponses. We've also built the export feature for both surveys and responses, which means these can be re-used as RDF data in Solid pods.
The source code for the surveys is available on Gitlab.

Changes

In our project we have made some changes:

Export to SolidPODs

In our initial plan we wanted the Survey tool to support logging in with SolidPods using the W3C WebID spec, and copying the filled in survey data to the pod. Solid is an initiative by founder of the web Tim Berners Lee to give people more control over their data with Personal Online Data storages (SolidPODs).

Signing in with a Solidpod requires that the Survey service attains knowledge of the users identity, which is not privacy friendly. The privacy officer in our project objected to using this path: it will decrease the adoption of the system when there is an obligation to comply with the GDPR. Instead of leaving the entire feature out, we've opted for replacing it with an RDF export feature at the end of the survey so respondents can still store their own data on their Solid pod, yet it does not require the user to identify themselves. Also this allows the user to store their responses to other personal online data storages apart from Solidpods.

Changes: accounts

We have decided we will not use accounts/login for respondents. We want to collect as little data as possible from respondents from a privacy by design view. Having an account is not necessary for respondents. We prevent sleeping accounts with passwords and private data.
Also this enables respondents to start right away with the survey, instead of having to create an account first, which enhances the user friendliness and user flow and prevents road blocks. This is also feedback from the researchers we interviewed.

Changes: invitations

Our initial thought was we would send emails with invite-urls to respondents via the survey platform. We have changed this in enabling the researcher to upload respondent's id's (like mail addresses or other id's) and generate a list of tokens linked to those id's.
The researcher can export this list, and use it to send the invite-urls themselves. After the invite-urls have been sent, the researcher can see on the platform which tokens weren't used yet, export a list of these and send a reminder themselves. We don't send mass-mail invites via the platform. It offers the researcher the opportunity to use other communication tools apart from email (like platforms or apps). Also a researcher can upload other id's (numbers related to their own database) or anonymized id's which contributes to privacy. Mass emailing from a platform can lead to being blocked by spam filters. Also setting up a mass invite tool which provides good user experience like custom messages etc. is complex to add to other activities in the restricted project time and work out properly.

Communications and marketing

Blogs

For this project we wrote 3 blogs to cover different viewpoints on the subject: a blog for developers, researchers and people interested in privacy.

An astronauts view on payments and data- a blog for researchers in citizen science link
Paying and not knowing, a blog about privacy link
On linked data surveys, rewards and data ownership - a blog for developers link

Researchers

In this project we have contacted several scientists from organisations and projects that work with citizen science and environmental health to inform them about Interledger and learn how they work, and to ask what they think of this service and look at possible use cases it can be applied to and also dug deeper into the topic by doing deskresearch. We integrated those learning into our project. A list of some of the organisations / scientists we contacted:

Cities Health project, an citizen science project in Europe focussing on environmental science and epidemiology by tackling health issues that concern them.
Alterra (Wageningen Environmental Research) an organisation that contributes by qualified and independent research to the realisation of a high quality and sustainable green living environment.
ODISSEI (Open Data Infrastructure for Social Science and Economic Innovations) is the national research infrastructure for the social sciences in the Netherlands.
Mass online Experiments in Social Sciences (Utrecht University) which also include payment and survey related topics.
Developers of a Dutch participant recruitment platform from the Vrije Universiteit Amsterdam who are developing an affordable, sustainable, and secure online participant/annotator recruitment platform for Netherlands-based academic researchers. The reason they develop this platform, although there are already international online platforms for this, are obstacles in cultural aspects, GDPR and data quality.
We expanded our horizon from national to European initiatives. The European Citizen Science Association led us to the Cos4cloud project. Our learnings on this project we shared above under Progress on objectives in the section standardizing Citizen Science in Europe.
We talked to a professor at the department of Methodology and Statistics at Utrecht University, specialized in survey methodology, which includes inferences using a mix of survey data and big data and the modelling of survey errors, and the use of sensor technology in smartphones. He shared a list of awesome Citizen Science projects from the Netherlands with additional information such as duration, organizations, and links to resources.

What's next?

Inform the contacted researchers and organisations about our report
We keep an eye out on Rafiki's development to see if we can implement the last piece of the puzzle of the cash-link payment service
Clean up the repository and look how we can make the export function more user-friendly

Additional comments

We would like to thank the Grant for the Web team and project founders for the opportunity to work on this project. We'll keep track of the community to explore if we can do a follow-up project in the future to further expand on what we have learned and built so far.

The Interledger Community 🌱