Predicting adherence to ecological momentary assessments

Ecological momentary assessments (EMAs) prompt users with a short questionnaire. One of the biggest challenges in such studies is lack of adherence, i.e., users stop filling out the questionnaires. Being able to predict whether a user will fill out a questionnaire could allow for specifically addressing those users, or for over-sampling populations at higher risk of dropping out of a study.

Based on an observational study of the general population, we analyzed data from almost 1,000 users. The data include a large variety of sensor data from the users’ smartphones. With machine learning, we predict adherence on a day-to-day level, as well as adherence based on participant data available after onboarding.
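
To make the day-to-day task concrete, here is a minimal sketch (not the paper’s actual pipeline) of how adherence prediction can be framed as binary classification over user-days; the column names and the toy data are assumptions for illustration:

```python
# Minimal sketch: one row per user and study day, labeled 1 if the user
# filled out a questionnaire that day. Column names are hypothetical.
import pandas as pd

# Hypothetical log of filled-out questionnaires: one row per submission.
submissions = pd.DataFrame({
    "user_id": [1, 1, 2],
    "timestamp": pd.to_datetime(["2021-03-01 20:15", "2021-03-03 19:50", "2021-03-01 21:05"]),
})

# Hypothetical installation dates per user.
installs = pd.Series(
    pd.to_datetime(["2021-03-01", "2021-03-01"]), index=[1, 2], name="install_date"
)

rows = []
for user_id, install_date in installs.items():
    # Days (at midnight) on which this user submitted a questionnaire.
    user_days = set(
        submissions.loc[submissions["user_id"] == user_id, "timestamp"].dt.normalize()
    )
    for day_offset in range(5):  # study horizon shortened for the example
        day = install_date + pd.Timedelta(days=day_offset)
        rows.append({
            "user_id": user_id,
            "days_since_install": day_offset,
            "filled_out": int(day in user_days),
        })

labels = pd.DataFrame(rows)
print(labels)
```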

Results:

  • For day-to-day prediction, the best-performing model used only metadata features (days since the first questionnaire was filled out, days since the last questionnaire was filled out, number of filled-out questionnaires, and days since app installation), yielding an area under the precision-recall curve (AUPRC) of 0.89; see the sketch after this list.
  • Including sensor data did not improve the model’s performance, indicating that the high cost of collecting and processing sensor data is not worth it for predicting fill-out behavior.
  • Predicting at sign-up whether a user will respond to a questionnaire prompt at least once was better than chance, but further studies are needed.
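
The sketch below illustrates a classifier on the four metadata features, evaluated with the area under the precision-recall curve. The random-forest choice and the synthetic data are assumptions for illustration; the exact model from the paper may differ:

```python
# Minimal sketch, not the exact model from the paper: a classifier on the four
# metadata features, scored with the area under the precision-recall curve.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000  # synthetic user-day rows, only to make the sketch runnable

X = np.column_stack([
    rng.integers(0, 200, n),   # days since first filled-out questionnaire
    rng.integers(0, 30, n),    # days since last filled-out questionnaire
    rng.integers(0, 150, n),   # number of filled-out questionnaires so far
    rng.integers(0, 200, n),   # days since app installation
])
# Synthetic label loosely tied to recency, only so the example produces output.
y = (X[:, 1] < 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

scores = clf.predict_proba(X_test)[:, 1]
print("AUPRC:", average_precision_score(y_test, scores))
```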

You can find our full article published in Expert Systems with Applications (IF: 7.5) here (full PDF here).

The x-axis shows how many days have passed since app installation; the y-axis shows the number of user-day combinations. At day 0 (the app installation day), we have data from all users, almost 1,000. By day 100 after app installation, only about 100 users remain. This drop-off is a typical pattern in ecological momentary assessment (EMA) apps.
Using both sensor data and metadata features, predicting whether a user will fill out a questionnaire in the evening works very well. Removing the sensor data did not reduce prediction performance, though. Note that there are five models because we used nested cross-validation.
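
For readers unfamiliar with nested cross-validation, the sketch below shows the general pattern with scikit-learn: an inner loop tunes hyperparameters, and an outer loop (here five folds, hence five models) estimates performance. The estimator, hyperparameter grid, and placeholder data are assumptions, not the paper’s exact setup:

```python
# Minimal sketch of nested cross-validation with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                          # placeholder metadata features
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)   # placeholder labels

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Inner loop: hyperparameter tuning; outer loop: unbiased performance estimate.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
tuned = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                     cv=inner_cv, scoring="average_precision")
outer_scores = cross_val_score(tuned, X, y, cv=outer_cv, scoring="average_precision")
print("AUPRC per outer fold:", outer_scores)
```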
Dr. Felix Beierle