Danielle McCool statistician & research engineer · utrecht university

Statistics at the point of measurement.

I develop statistical methodology (and build the software to run and support it) for environments where data acquisition, computation, and privacy can't be neatly separated. Increasingly, that's where social science actually happens.

Co-maintainer of Port · R · Python · Rust · Go · methods & infrastructure · Based in Utrecht 🇳🇱

observe — compute — infer

Research interests

What kinds of things do I do?

Inference under measurement constraint is a more general problem.

I started my statistical career by looking at sensor data from Smart Surveys, which use mobile devices to augment data collection. The general consensus was that the data were incredibly useful, but researchers were struggling to do anything with them because measurement error and missingness got in the way of meaningful inference. This was my first introduction to the fundamental tension we encounter as we expand into new areas: traditional statistics tends to work from the perspective that data collection and analysis are independent events, and that one precedes the other, but that isn't always true.

Working in compute-restricted environments means that you end up offloading some portion of your analysis before you ever get your data, whether that is establishing stay-point detection rules on a travel survey, performing in-browser classification of video comments in data donation, or pre-specifying models when working in secure enclaves. These often look like engineering problems, but they fundamentally resist engineering solutions. They become statistical problems with consequences for inference.
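To make the stay-point case concrete, here is a minimal sketch of a distance-and-duration rule applied to a short track. It is an illustration only, not the rule used in any particular survey: the `Fix` type, the `stay_points` function, the threshold values, and the toy track are all hypothetical, and coordinates are assumed to be already projected to metres.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Fix:
    t: float  # seconds since the start of the track
    x: float  # metres east (hypothetical projected coordinates)
    y: float  # metres north


def stay_points(fixes, max_dist_m=100.0, min_dur_s=300.0):
    """Return (start_t, end_t, mean_x, mean_y) for each detected stay."""
    stays = []
    i = 0
    n = len(fixes)
    while i < n:
        j = i + 1
        # Extend the window while fixes remain within max_dist_m of the anchor fix.
        while j < n and hypot(fixes[j].x - fixes[i].x, fixes[j].y - fixes[i].y) <= max_dist_m:
            j += 1
        window = fixes[i:j]
        if window[-1].t - window[0].t >= min_dur_s:
            mean_x = sum(f.x for f in window) / len(window)
            mean_y = sum(f.y for f in window) / len(window)
            stays.append((window[0].t, window[-1].t, mean_x, mean_y))
            i = j  # continue after the stay
        else:
            i += 1  # no stay here; move the anchor forward by one fix
    return stays


# Toy track: nine minutes in one place, then three fixes while moving.
track = (
    [Fix(t=60.0 * k, x=0.0, y=0.0) for k in range(10)]
    + [Fix(t=600.0 + 60.0 * k, x=500.0 * (k + 1), y=0.0) for k in range(3)]
)

short_rule = stay_points(track, min_dur_s=300.0)  # one stay detected
long_rule = stay_points(track, min_dur_s=900.0)   # no stay detected
```

The same track yields one stay under the five-minute rule and none under the fifteen-minute rule. If only detected stays are retained on the device, that choice is made before any data arrive and shapes everything estimated afterwards.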

Measuring something changes it, and that applies to data, too.

Pillar 01

Valid inference under decentralized, heterogeneous observation

Estimation and uncertainty that hold up when streams are partial, uneven, and never pooled in one place.

Pillar 02

Statistical learning under local computation

Procedures that run at the point of data generation (on the device, inside the enclave), not after centralization.

Pillar 03

Efficiency under resource & communication constraints

Designs that decide what to compute, retain, and transmit when bandwidth, battery, and privacy are all finite.

// what it isn't: privacy-preserving ML · edge AI · a privacy toolbox · benchmarking under constraints → inference when measurement itself is constrained

Current work

What projects am I working on lately?

what it is

Port has been successfully running in production since 2020, and has provided the software for dozens of researchers to undertake privacy-preserving data donation studies. In Port, the data extraction happens in a respondent's own browser: we run Python on the page with Pyodide to pull out only the parts that are relevant for a given study, so the original file never leaves their device. Respondents can then review the extracted rows and remove any that are sensitive.

The in-the-browser constraint is critical: it's local computation at the moment of collection, and it's a pattern we should expect to see more of in statistics as we seek more privacy and bigger data. As a consequence, we need better ways to scale down our methods and more insight into how doing so affects inference.

Reference: Boeschoten, L., de Schipper, N. C., Mendrik, A. M., van der Veen, E., Struminskaya, B., Janssen, H., & Araujo, T. Port: A software tool for digital data donation. Journal of Open Source Software, 8(90), 5596 (2023). doi:10.21105/joss.05596 ↗

how a donation works

Request

A participant makes a GDPR-backed request for their data.

Extract, locally

Extraction runs in their own browser, with the full export parsed entirely on their own machine.

Review & delete

They see their actual data, row by row, and delete what they'd rather not share before agreeing to the donation.
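The extract and review steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Port's actual API: the export layout in `EXPORT_JSON` and the helper names `extract_comments` and `apply_review` are hypothetical, and real platform exports differ and change over time.

```python
import json

# Hypothetical export layout, for illustration only.
EXPORT_JSON = json.dumps({
    "profile": {"username": "someone", "email": "someone@example.org"},
    "comments": [
        {"date": "2024-01-03 10:15:00", "comment": "Great video!"},
        {"date": "2024-02-11 21:40:12", "comment": "My address is ..."},
        {"date": "2024-03-01 08:05:30", "comment": "Thanks for sharing"},
    ],
})


def extract_comments(raw_export: str) -> list[dict]:
    """Keep only the two columns the study needs; everything else is dropped."""
    export = json.loads(raw_export)
    return [
        {"Comment": c["comment"], "Date": c["date"]}
        for c in export.get("comments", [])
    ]


def apply_review(rows: list[dict], deleted: set[int]) -> list[dict]:
    """Remove the rows (by index) the participant chose not to share."""
    return [row for i, row in enumerate(rows) if i not in deleted]


rows = extract_comments(EXPORT_JSON)        # extract, locally
donation = apply_review(rows, deleted={1})  # review & delete
```

Only `donation` would be sent on. The profile block and the deleted row stay on the participant's machine.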

Port · in-browser extraction · GDPR right of access · research infrastructure · open source

example · try it

Your comments

Comments you've left on TikTok videos.

2 columns (Comment, Date) · 0 rows · delete the rows you'd rather not share

datadonation.eu ↗ github.com/d3i-infra ↗

Currently in development for a Utrecht University sociologist whose underlying question was, "How do we measure exposure to crime-related videos on TikTok?" Here too we're working in constrained compute environments in service of privacy: we need modern methods to answer these questions at scale, but we have a responsibility to be careful with respondents' donated data.

For this, I developed a Rust pipeline that takes donated TikTok exports, fetches the videos the respondents watched, and transcribes and classifies them with local models, at a scale of roughly 1 million videos. Viewing history never leaves the secure environment to be processed by a third party. The engineering serves to get everything working, but the remaining questions are statistical: how do you propagate transcription or classifier uncertainty into the resulting estimates so that we can provide an honest inference?
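As one small illustration of why classifier error matters for the estimate, here is a sketch of the textbook Rogan-Gladen correction, which adjusts an observed share of positive classifications for known sensitivity and specificity. It is not necessarily what this project uses; the function name and all numbers are hypothetical.

```python
def corrected_prevalence(observed_share, sensitivity, specificity):
    """Rogan-Gladen correction of an observed positive share for classifier error."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier must perform better than chance")
    estimate = (observed_share + specificity - 1.0) / denom
    # The raw estimate can fall outside [0, 1] in small samples, so clip it.
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers, for illustration only.
naive = 0.20  # share of watched videos the classifier flags as crime-related
adjusted = corrected_prevalence(naive, sensitivity=0.80, specificity=0.90)
```

With these made-up values the adjusted share is about 0.143 rather than 0.20. This corrects only the point estimate: sensitivity and specificity are themselves estimated, and transcription errors feed into the classifier, so the interval around the result still needs its own treatment.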

Rust · whisper.cpp · local models · uncertainty propagation · ≈1M videos

github.com/daniellemccool/uu-tiktok ↗

A comparative study of API-based data access versus DSAR-export access for data-donation research across major platforms. Moving from one-off exports to APIs reshapes the measurement system itself: it relocates computation, opens up longitudinal rather than one-shot observation, and changes the privacy guarantee of the pipeline. The part I care about is the statistical one: what that does to validity, representation, and the kinds of inference these systems can actually support.

data portability · API vs. DSAR access · measurement design · statistical validity · in progress

CV & publications

What have I done before?

PhD thesis

Defended January 2026

Field notes from the travel app frontier: Build Me, Break Me, Impute Me

Supervisors: Prof. dr. Peter Lugtig & Prof. dr. Barry Schouten

Missing data in sensors measuring human mobility, and new imputation methodology for time-series data, drawing on statistics, machine learning, and transportation research to give applied researchers usable tools and guidelines.

Experience

2025–now

Research Engineer

Utrecht University · D3I (Digital Data Donation Infrastructure)

2024–25

Postdoctoral Researcher

Utrecht University · SHARE, NL Country Team Organizer

2023–25

Postdoctoral Researcher

Utrecht University · ESSnet Smart Surveys (Eurostat)

2023

Researcher

Statistics Netherlands (CBS) · mobility data

2016–18

Consultant

ScanmarQED · multilevel aggregation, R & data-viz training

Education

2018–26

PhD, Statistics & Methodology

Utrecht University

2012–14

MSc, Statistics & Methodology

Utrecht University · 8.3, cum laude

2005–08

BSc, Psychology / Philosophy

Texas Woman's University, Denton TX

↓ Download full CV (PDF)

Selected publications

McCool, D., Lugtig, P., & Schouten, B. Dynamic Time Warping-based imputation of long gaps in human mobility trajectories.

arXiv preprint · 2024 · arXiv:2410.16096

McCool, D., Lugtig, P., & Schouten, B. Maximum interpolable gap length in missing smartphone-based GPS mobility data.

Transportation · 2022

McCool, D., Lugtig, P., Mussmann, O., & Schouten, B. An app-assisted travel survey in official statistics: possibilities and challenges.

Journal of Official Statistics · 2021 · 37(1), 149–170

Software

topdowntimeratio: top-down time-ratio segmentation for coordinate trajectories.

R package · CRAN · 2022

stopdetection: stop detection in timestamped trajectory data via spatiotemporal clustering.

R package · CRAN · 2022

Selected talks

2025

The role of participant understanding in data donation studies

ESRA · Utrecht

2024

The future is smart (surveys)

EMOS Guest Lecture · Utrecht

2023

Dynamic-time-warping imputation of long gaps in human trajectories

MASS · Manchester

◆ Full list on Google Scholar ↗

Building the methods and the software they run on.

Email d.m.mccool@uu.nl ↗ GitHub daniellemccool ↗ ORCID 0000-0002-7055-7539 ↗ Google Scholar Publications ↗ Mastodon @DanielleMcCool ↗ Bluesky dmccool ↗ LinkedIn daniellemccool ↗

Lines & stations lifted from my thesis cover (thanks @Sharkblood). Set in Poppins, Newsreader & Space Mono. Obviously there are no trackers, cookies, scraping, data transfer, or anything like that... it'd be a bit on the nose. © 2026 Danielle McCool