Imputing Smartphone Mobility Data

Danielle McCool

Solution Evolution

Future Technologies

Windows and manual navigation controls were added to account for some situations

Dirty lines demanded development of packet-switching and error-checking mechanisms

Needed entirely new methodology for Next-Gen Sequencing

First Sequencer · Science Museum Group · CC

Arpanet 1972 Map · UCLA and BBN · CC

Travel Diary Studies were once future tech

(Brög, Fallast, Katteler, Sammer, & Schwertner, 1985, p. 187)

That had unexpected problems

Enormous burden¹
Poor data quality²
Wouldn’t validate against existing external sources³

The evolution of the solutions took decades.

Current state of Travel Diary Studies

Widespread usage globally

Robust methodology

One even forms the backbone of the Dutch national transportation model

Travel Diary App

Using the data like a travel survey

It’s hard.

There’s a lot of missing data.

Data from the app

(McCool, Lugtig, Mussmann, & Schouten, 2021)

Missing data

Most people have missing data
Only 2 of 274 respondents had 7 contiguous days of complete data

(McCool et al., 2021)

Gaps in the data

(McCool, Lugtig, & Schouten, 2022)

Short gap sensitivity: median absolute difference (% difference in parentheses)
Minutes removed per hour	Travel distance (km)	Stops
1	0 (0%)	0 (0%)
2	-0.1 (0%)	0 (0%)
3	-0.2 (0%)	0 (0%)
4	-0.3 (0%)	0 (0%)
5	-0.4 (0%)	0 (0%)
10	-1 (-4%)	0 (0%)
15	-1.8 (-7.4%)	0 (0%)
20	-2.9 (-10.8%)	0 (0%)

(McCool et al., 2022)
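
The sensitivity check above can be sketched as follows. This is a minimal illustration, not the procedure from McCool et al. (2022): it assumes one planar position sample per minute, and it assumes the removed minutes are the first ones in each hour. The names `path_length`, `drop_minutes_per_hour`, and `distance_change` are hypothetical.

```python
import math

def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def drop_minutes_per_hour(points, minutes):
    """Drop the first `minutes` samples of every 60-sample block.

    Assumes one sample per minute, so each block of 60 samples is one hour.
    """
    return [p for i, p in enumerate(points) if i % 60 >= minutes]

def distance_change(points, minutes):
    """Travel distance after thinning minus travel distance of the full track."""
    full = path_length(points)
    thinned = path_length(drop_minutes_per_hour(points, minutes))
    return thinned - full

# A zigzag track: every removed sample cuts a corner, so distance shrinks.
zigzag = [(i, i % 2) for i in range(120)]
for m in (1, 5, 10, 20):
    print(m, round(distance_change(zigzag, m), 2))
```

Because dropping points can only shorten a polyline, the change is never positive, which matches the sign of the travel-distance differences in the table.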

Long gaps

Short gaps are mostly fine
Long gaps not so much

Many long gaps

Long gaps at night

Imputing the data

The imputation procedure

A query is the trajectory with a gap
We need data from other (complete) trajectories to fill the gap
We calculate how similar trajectories are before and after the gap using Dynamic Time Warping
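
The three steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' implementation: trajectories are one-dimensional series (for example, distance from home per time step) on a shared time index, the query has a single contiguous gap marked with `None`, and only the single best-matching donor is used, whereas a multiple-imputation method would draw from several. The names `dtw_distance`, `impute_gap`, and `window` are hypothetical.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping cost between two 1-D series (absolute difference)."""
    n, m = len(a), len(b)
    prev = [0.0] + [math.inf] * m
    for i in range(1, n + 1):
        cur = [math.inf] * (m + 1)
        for j in range(1, m + 1):
            step = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = abs(a[i - 1] - b[j - 1]) + step
        prev = cur
    return prev[m]

def impute_gap(query, donors, window=3):
    """Fill the gap in `query` with values from the most similar donor.

    Similarity is the DTW cost over `window` samples before the gap plus
    the DTW cost over `window` samples after it.
    """
    gap = [i for i, v in enumerate(query) if v is None]
    start, end = gap[0], gap[-1] + 1
    lo, hi = max(0, start - window), min(len(query), end + window)

    def score(donor):
        before = dtw_distance(query[lo:start], donor[lo:start])
        after = dtw_distance(query[end:hi], donor[end:hi])
        return before + after

    best = min(donors, key=score)
    return query[:start] + best[start:end] + query[end:]

query = [0, 1, 2, None, None, 5, 6]
donors = [[0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5, 6]]
print(impute_gap(query, donors))
```

The `window` argument controls how much context around the gap enters the match; a larger value compares longer stretches of data, a smaller one only what happened immediately before and after the gap.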

Dynamic Time Warping

What is Dynamic Time Warping?

Dynamic Time Warping finds the path of best alignment between two series
There are lots of ways to specify its parameters
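
As a concrete illustration, here is a minimal textbook version of Dynamic Time Warping for one-dimensional series that returns both the alignment cost and the alignment path. It is a sketch, not the configuration used in the study. The optional `window` argument (a Sakoe-Chiba band limiting how far the alignment may stray from the diagonal) is included only as one example of a parameter that can be specified; the function name `dtw` is hypothetical.

```python
import math

def dtw(a, b, window=None):
    """Return (cost, path) for the best alignment of 1-D series a and b.

    window: optional band half-width (max |i - j|); None means unconstrained.
    path: list of (index_in_a, index_in_b) pairs from start to end.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))  # the band must reach the final cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Walk back from the final cell along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

# The second series lingers at 0 for one extra step; DTW absorbs the delay.
print(dtw([0, 1, 2, 3], [0, 0, 1, 2, 3]))
# Forcing a strict one-to-one alignment raises the cost.
print(dtw([1, 2, 3], [2, 3, 4])[0], dtw([1, 2, 3], [2, 3, 4], window=0)[0])
```

Tightening the band trades flexibility for stricter timing: an unconstrained alignment can stretch one series to match the other, while `window=0` reduces to a point-by-point comparison.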

Two variants

We selected a high-information and a low-information variant to test on simulated data.

High-information: parameters favor close matches over longer periods of data, which is optimal when there is a lot of overlapping data from individuals.
Low-information: parameters are more lax and match trajectories based on what occurred immediately before and after the gap.

Results

Performance over all conditions favors DTW methods

Especially the low-information variant

Long gaps - One hour

Method	Absolute bias	Median bias	TP Acc
LI	0.8 km	0 km	93.00%
MI	0.9 km	1.9 km	93.00%
TWI	1.4 km	0.2 km	89.30%
DTWBI	0.5 km	0 km	95.00%
DTWBMI-HI	1.4 km	0 km	94.10%
DTWBMI-LO	0.7 km	0 km	95.70%

Long gaps - Six hours

Method	Absolute bias	Median bias	TP Acc
LI	5.4 km	−0.2 km	92.90%
MI	1.4 km	11.5 km	94.50%
TWI	0.2 km	3.3 km	93.00%
DTWBI	3.4 km	0 km	96.50%
DTWBMI-HI	3.4 km	0.1 km	94.80%
DTWBMI-LO	1.9 km	0.1 km	95.60%

Long gaps - Twelve hours

Method	Absolute bias	Median bias	TP Acc
LI	9.4 km	−1.9 km	94.40%
MI	10.9 km	21.2 km	95.20%
TWI	9.3 km	13 km	93.80%
DTWBI	0.1 km	−0.4 km	95.90%
DTWBMI-HI	4.5 km	2.4 km	94.30%
DTWBMI-LO	0.2 km	1.7 km	96.00%

Comparison with interpolation

Gap length	Method	Absolute bias	Median bias
1 hr	LI	0.8 km	0 km
1 hr	DTWBMI-LO	0.7 km	0 km
6 hrs	LI	5.4 km	−0.2 km
6 hrs	DTWBMI-LO	1.9 km	0.1 km
12 hrs	LI	9.4 km	−1.9 km
12 hrs	DTWBMI-LO	0.2 km	1.7 km
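
For reference, the interpolation baseline can be sketched as below, assuming LI denotes linear interpolation between the last observation before the gap and the first observation after it. The function name `fill_linear` and the one-dimensional series are illustrative assumptions, not the study's code.

```python
from bisect import bisect_left

def fill_linear(times, values):
    """Fill None entries in `values` by linear interpolation over `times`.

    Entries with no observation on one side (leading or trailing gaps)
    are left as None.
    """
    known_t = [t for t, v in zip(times, values) if v is not None]
    known_v = [v for v in values if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        k = bisect_left(known_t, t)  # index of first observation after t
        if k == 0 or k == len(known_t):
            filled.append(None)
            continue
        t0, t1 = known_t[k - 1], known_t[k]
        v0, v1 = known_v[k - 1], known_v[k]
        filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled

print(fill_linear([0, 1, 2, 3, 4], [0.0, None, None, None, 8.0]))
```

A straight line between the two endpoints cannot represent travel that starts and ends inside the gap, such as a round trip, so it is plausible for this baseline to degrade as gaps get longer.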

Recap

Missing data is a serious problem for data collected via smartphone

(To be expected with future tech)

There’s no fantastic existing methodology to correct for it

Dynamic Time Warping-Based Multiple Imputation has some promise

Disappointingly, the high-information variant performs worse

Things that might help
- More data per person
- Including personal/trip variables in the imputation

References

Ampt, E. S., Richardson, A. J., & Brög, W. (1985). New survey methods in transport: Proceedings of 2nd international conference, Hungerford Hill, Australia, 12-16 September 1983. VSP.

Brög, W., Fallast, K., Katteler, H., Sammer, G., & Schwertner, B. (1985). Selected results of a standardised survey instrument for large-scale travel surveys in several European countries. In E. S. Ampt, A. J. Richardson, & W. Brög, New survey methods in transport: Proceedings of 2nd international conference, Hungerford Hill, Australia, 12-16 September 1983 (pp. 173–191). VSP.

Brög, W., Meyburg, A. H., Stopher, P. R., & Wermuth, M. J. (1985). Collection of household travel and activity data: Development of a survey instrument. In E. S. Ampt, A. J. Richardson, & W. Brög, New survey methods in transport: Proceedings of 2nd international conference, Hungerford Hill, Australia, 12-16 September 1983 (pp. 151–172). VSP.

Brög, W., Meyburg, A. H., & Wermuth, M. J. (1983). Development of survey instruments suitable for determining non-home activity patterns. Transportation Research Record, 944, 1–12.

McCool, D., Lugtig, P., Mussmann, O., & Schouten, B. (2021). An app-assisted travel survey in official statistics: Possibilities and challenges. Journal of Official Statistics, 37(1), 149–170.

McCool, D., Lugtig, P., & Schouten, B. (2022). Maximum interpolable gap length in missing smartphone-based GPS mobility data. Transportation, 1–31.
