Danielle McCool statistician & research engineer · utrecht university

Statistics at the point of measurement.

I develop statistical methodology (and build the software to run and support it) for environments where data acquisition, computation, and privacy can't be neatly separated. Increasingly, that's where social science actually happens.

Co-maintainer of Port · R · Python · Rust · Go · methods & infrastructure · Based in Utrecht 🇳🇱
observe compute infer
01

Research interests

What kinds of things do I do?

Inference under measurement constraint is a more general problem.

I started my statistical career by looking at sensor data from Smart Surveys, which use mobile devices to augment data collection. The general consensus was that the data were incredibly useful, but researchers were struggling to do anything with them because measurement error and missingness got in the way of meaningful inference. This was my first introduction to the fundamental tension we encounter as we expand into new areas: traditional statistics tends to assume that data collection and analysis are independent events, and that one precedes the other, but that isn't always true.

Working in compute-restricted environments means that you end up offloading some portion of your analysis before you ever get your data, whether that is establishing stay-point detection rules on a travel survey, performing in-browser classification of video comments in data donation, or pre-specifying models when working in secure enclaves. These often look like engineering problems, but they fundamentally resist engineering solutions. They become statistical problems with consequences for inference.

Measuring something changes it, and that applies to data, too.

Pillar 01

Valid inference under decentralized, heterogeneous observation

Estimation and uncertainty that hold up when streams are partial, uneven, and never pooled in one place.

Pillar 02

Statistical learning under local computation

Procedures that run at the point of data generation (on the device, inside the enclave), not after centralization.

Pillar 03

Efficiency under resource & communication constraints

Designs that decide what to compute, retain, and transmit when bandwidth, battery, and privacy are all finite.

// what it isn't: privacy-preserving ML · edge AI · a privacy toolbox · benchmarking under constraints → inference when measurement itself is constrained
02

Current work

What projects am I working on lately?
what it is

Port has been running in production since 2020, and has provided the software for dozens of researchers to undertake privacy-preserving data donation studies. In Port, the data extraction happens in a respondent's own browser: we run Python on the page with Pyodide to pull out only the parts that are relevant for a given study, so the original file never leaves their device. Respondents can look at the rows and choose to remove ones that are sensitive.

The in-the-browser constraint is critical: it's local computation at the moment of collection, and it's a pattern we should expect to see more of in statistics as we seek more privacy and bigger data. As a consequence, we need better ways to scale down our methods and more insight into the impact.

Reference: Boeschoten, L., de Schipper, N. C., Mendrik, A. M., van der Veen, E., Struminskaya, B., Janssen, H., & Araujo, T. Port: A software tool for digital data donation. Journal of Open Source Software, 8(90), 5596 (2023). doi:10.21105/joss.05596

how a donation works
01
Request

A participant makes a GDPR-backed request for their data.

02
Extract, locally

Extraction runs in their own browser, with the full export parsed entirely on their own machine.

03
Review & delete

They see their actual data, row by row, and delete what they'd rather not share before agreeing to the donation.
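
The extract and review steps above can be sketched in a few lines of Python. This is a minimal illustration, not Port's actual code: the export layout (a JSON object with a `comments` list holding `text` and `date` fields) and the function names `extract_comments` and `apply_deletions` are assumptions made up for the example.

```python
import json


def extract_comments(raw_export: str) -> list[dict]:
    """Keep only the fields the study needs; everything else is dropped.

    The export layout used here is hypothetical.
    """
    export = json.loads(raw_export)
    rows = []
    for item in export.get("comments", []):
        rows.append({
            "comment": item.get("text", ""),
            "date": item.get("date", ""),
        })
    return rows


def apply_deletions(rows: list[dict], deleted_indices: list[int]) -> list[dict]:
    """Remove the rows the participant chose not to share."""
    deleted = set(deleted_indices)
    return [row for i, row in enumerate(rows) if i not in deleted]


# A toy export containing more than the study asks for.
raw_export = json.dumps({
    "profile": {"email": "someone@example.org"},
    "comments": [
        {"text": "Nice video", "date": "2024-01-05", "device_id": "abc"},
        {"text": "Something private", "date": "2024-02-11", "device_id": "abc"},
    ],
})

rows = extract_comments(raw_export)      # step 02: extract, locally
donation = apply_deletions(rows, [1])    # step 03: participant deletes row 1
print(donation)
```

Only `donation` would be submitted in this sketch; the profile block, the device identifiers, and the deleted row never leave the function scope in which they were parsed.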

Port · in-browser extraction · GDPR right of access · research infrastructure · open source

This project is currently in development for a Utrecht University sociologist whose underlying question was, "How do we measure exposure to crime-related videos on TikTok?" Here too, we're working in constrained compute environments in service of privacy: we need modern methods to answer these questions at scale, but we have a responsibility to be careful with respondents' donated data.

For this, I developed a Rust pipeline that takes donated TikTok exports, fetches the videos the respondents watched, and then transcribes and classifies them with local models, at the scale of roughly 1 million videos. Viewing history never leaves the secure environment to be processed by a third party. The engineering serves to get everything working, but the remaining questions are statistical: how do you propagate transcription and classifier uncertainty into the resulting estimates so that we can provide an honest inference?
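
One textbook way to carry classifier error into an estimate, shown here purely as an illustration and not as the method used in this project, is to correct the observed share of positive labels with the Rogan-Gladen estimator and bootstrap both the classified videos and a hand-labelled validation sample. The data below are invented, and the helper names (`sens_spec`, `rogan_gladen`, `bootstrap_interval`) are assumptions for the sketch.

```python
import random


def sens_spec(validation):
    """Sensitivity and specificity from (true_label, predicted_label) pairs."""
    pos = [pred for true, pred in validation if true == 1]
    neg = [pred for true, pred in validation if true == 0]
    if not pos or not neg:
        raise ValueError("validation sample needs both classes")
    return sum(pos) / len(pos), 1.0 - sum(neg) / len(neg)


def rogan_gladen(observed_rate, sensitivity, specificity):
    """Misclassification-corrected prevalence, clipped to [0, 1]."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("classifier is no better than chance")
    return min(1.0, max(0.0, (observed_rate + specificity - 1.0) / denom))


def bootstrap_interval(predicted, validation, n_boot=1000, seed=0):
    """Percentile interval that resamples the videos and the validation set."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        pred_s = rng.choices(predicted, k=len(predicted))
        val_s = rng.choices(validation, k=len(validation))
        try:
            sens, spec = sens_spec(val_s)
            estimates.append(
                rogan_gladen(sum(pred_s) / len(pred_s), sens, spec)
            )
        except ValueError:
            continue  # skip degenerate resamples
    if not estimates:
        raise ValueError("no usable bootstrap resamples")
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[min(len(estimates) - 1, int(0.975 * len(estimates)))]
    return lower, upper


# Invented data: 100 classified videos, 30 flagged as crime-related.
predicted = [1] * 30 + [0] * 70
# Invented validation sample: 40 true positives (36 caught), 60 true negatives (6 false alarms).
validation = [(1, 1)] * 36 + [(1, 0)] * 4 + [(0, 1)] * 6 + [(0, 0)] * 54

sens, spec = sens_spec(validation)
point = rogan_gladen(sum(predicted) / len(predicted), sens, spec)
lower, upper = bootstrap_interval(predicted, validation)
print(round(point, 3), round(lower, 3), round(upper, 3))
```

The interval is wider than one that treats the labels as error-free, which is the point: the validation sample's own sampling noise shows up in the final estimate. Transcription error is not modelled in this sketch.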

Rust · whisper.cpp · local models · uncertainty propagation · ≈1M videos

A comparative study of API-based data access versus DSAR-export access for data-donation research across major platforms. Moving from one-off exports to APIs reshapes the measurement system itself: it relocates computation, opens up longitudinal rather than one-shot observation, and changes the privacy guarantee of the pipeline. The part I care about is the statistical one: what that does to validity, representation, and the kinds of inference these systems can actually support.

data portability · API vs. DSAR access · measurement design · statistical validity · in progress
03

CV & publications

What have I done before?

PhD thesis

Defended January 2026
Supervisors: Prof. dr. Peter Lugtig & Prof. dr. Barry Schouten

Missing data in sensors measuring human mobility, and new imputation methodology for time-series data, drawing on statistics, machine learning, and transportation research to give applied researchers usable tools and guidelines.

Experience

Research Engineer
Utrecht University · D3I (Digital Data Donation Infrastructure)
Postdoctoral Researcher
Utrecht University · SHARE, NL Country Team Organizer
Postdoctoral Researcher
Utrecht University · ESSnet Smart Surveys (Eurostat)
Researcher
Statistics Netherlands (CBS) · mobility data
Consultant
ScanmarQED · multilevel aggregation, R & data-viz training

Education

PhD, Statistics & Methodology
Utrecht University
MSc, Statistics & Methodology
Utrecht University · 8.3, cum laude
BSc, Psychology / Philosophy
Texas Woman's University, Denton TX

Selected publications

McCool, D., Lugtig, P., & Schouten, B. Dynamic Time Warping-based imputation of long gaps in human mobility trajectories.
arXiv preprint · 2024 · arXiv:2410.16096
McCool, D., Lugtig, P., & Schouten, B. Maximum interpolable gap length in missing smartphone-based GPS mobility data.
McCool, D., Lugtig, P., Mussmann, O., & Schouten, B. An app-assisted travel survey in official statistics: possibilities and challenges.
Journal of Official Statistics · 2021 · 37(1), 149–170

Software

topdowntimeratio: top-down time-ratio segmentation for coordinate trajectories.
R package · CRAN · 2022
stopdetection: stop detection in timestamped trajectory data via spatiotemporal clustering.
R package · CRAN · 2022

Selected talks

The role of participant understanding in data donation studies
ESRA · Utrecht
The future is smart (surveys)
EMOS Guest Lecture · Utrecht
Dynamic-time-warping imputation of long gaps in human trajectories
MASS · Manchester

Building the methods and the software they run on.

Lines & stations lifted from my thesis cover (thanks @Sharkblood). Set in Poppins, Newsreader & Space Mono. Obviously there are no trackers, cookies, scraping, data transfer, anything like that ... it'd be a bit on the nose. © 2026 Danielle McCool