Imputing Smartphone Mobility Data

Danielle McCool

Solution Evolution

Future Technologies

Windows and manual navigation controls were added to account for some situations

Dirty lines demanded development of packet-switching and error-checking mechanisms

Needed entirely new methodology for Next-Gen Sequencing

First Sequencer · Science Museum Group · CC

Arpanet 1972 Map · UCLA and BBN · CC

Travel Diary Studies were once future tech

That had unexpected problems

  • Enormous burden¹
  • Poor data quality²
  • Wouldn’t validate against existing external sources³

The evolution of the solutions took decades.

Current state of Travel Diary Studies

  • Widespread usage globally
  • Robust methodology
  • One even forms the backbone of the Dutch national transportation model

Data from the app

Missing data

  • Most people have missing data
  • In the 2018 data, only 2 of 274 respondents had 7 contiguous days of complete data
  • The 2022 data show similar issues
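A check like "7 contiguous days of complete data" can be sketched as a search for the longest run of consecutive complete days per respondent. This is a minimal illustration only: the respondent IDs, the dates, and the idea that each day has already been classified as complete or not are assumptions, not the procedure used in the studies.

```python
from datetime import date, timedelta

def longest_complete_run(complete_days):
    """Length of the longest run of consecutive calendar days in a collection of dates."""
    days = set(complete_days)
    best = 0
    for d in days:
        # Only start counting from the first day of a run.
        if d - timedelta(days=1) in days:
            continue
        length = 1
        while d + timedelta(days=length) in days:
            length += 1
        best = max(best, length)
    return best

def has_full_week(complete_days, required=7):
    """True if the respondent has at least `required` contiguous complete days."""
    return longest_complete_run(complete_days) >= required

# Hypothetical respondents and the days on which their data were complete.
start = date(2018, 11, 5)
respondents = {
    "r1": [start + timedelta(days=i) for i in range(7)],                # unbroken week
    "r2": [start + timedelta(days=i) for i in (0, 1, 2, 4, 5, 6, 7)],   # day 3 missing
}
full_week = [r for r, days in respondents.items() if has_full_week(days)]
print(full_week)
```

Respondent "r2" has seven complete days in total but no seven in a row, which is the situation the slide describes for most respondents.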

Gaps in the data

Short gap sensitivity: median (%) absolute difference

Min removed/hr   Travel distance (km)   Stops
1                0 (0%)                 0 (0%)
2                -0.1 (0%)              0 (0%)
3                -0.2 (0%)              0 (0%)
4                -0.3 (0%)              0 (0%)
5                -0.4 (0%)              0 (0%)
10               -1 (-4%)               0 (0%)
15               -1.8 (-7.4%)           0 (0%)
20               -2.9 (-10.8%)          0 (0%)
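One way to see why removing minutes lowers the measured travel distance: when points are dropped, the path is bridged by a straight line across the gap, which can only be as long as or shorter than the original route. The sketch below assumes a hypothetical zig-zag trajectory in planar kilometre coordinates and assumes the removed minutes fall at the start of each hour; neither is taken from the study, and the numbers it prints are not meant to reproduce the table.

```python
import math

def path_length_km(points):
    """Sum of straight-line distances between consecutive (minute, x_km, y_km) points."""
    return sum(
        math.hypot(x2 - x1, y2 - y1)
        for (_, x1, y1), (_, x2, y2) in zip(points, points[1:])
    )

def remove_minutes_per_hour(points, n_removed):
    """Drop every point that falls in the first n_removed minutes of an hour."""
    return [p for p in points if p[0] % 60 >= n_removed]

# Hypothetical zig-zag trajectory: one point per minute from minute 30 to 150,
# so the removed minutes always fall inside the observed period.
trajectory = [(t, 0.5 * t, 0.3 * (t % 2)) for t in range(30, 151)]
full = path_length_km(trajectory)

for n in (1, 5, 10, 20):
    thinned = remove_minutes_per_hour(trajectory, n)
    diff = path_length_km(thinned) - full
    print(f"{n:>2} min/hr removed: {diff:+.2f} km ({100 * diff / full:+.1f}%)")
```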

Long gaps

  • Short gaps are mostly fine
  • Long gaps not so much

Many long gaps

Long gaps at night

Imputing the data

The imputation procedure

A query is the trajectory with a gap

We need data from other (complete) trajectories to fill the gap

We calculate how similar trajectories are before and after the gap using Dynamic Time Warping
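The three steps (a query with a gap, complete donor trajectories, similarity before and after the gap) can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: trajectories are reduced to a hypothetical one-dimensional series (say, distance from home in km per time step), donors are assumed to be time-aligned with the query, the gap is a single contiguous run of `None`, and only the single best donor is used rather than several draws as multiple imputation would require. The names `impute_gap` and `context` are invented for the example.

```python
def dtw_distance(a, b):
    """Plain dynamic time warping cost between two numeric sequences."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def impute_gap(query, donors, context=3):
    """Fill the None-gap in `query` with the matching segment of the most similar donor."""
    gap = [i for i, v in enumerate(query) if v is None]
    start, end = gap[0], gap[-1] + 1           # assumes one contiguous gap
    lo = max(0, start - context)

    def score(donor):
        # Similarity is judged on the observed context before and after the gap.
        before = dtw_distance(query[lo:start], donor[lo:start])
        after = dtw_distance(query[end:end + context], donor[end:end + context])
        return before + after

    best = min(donors, key=score)
    return query[:start] + best[start:end] + query[end:]

# Hypothetical data: a query with a three-step gap and two complete donors.
query = [0, 0, 1, 5, None, None, None, 5, 1, 0]
donors = [
    [0] * 10,                          # stayed at home
    [0, 0, 2, 5, 6, 6, 6, 5, 2, 0],    # left home and came back
]
imputed = impute_gap(query, donors)
print(imputed)
```

The second donor matches the query's context far better, so its values fill the gap.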

Dynamic Time Warping

What is Dynamic Time Warping?

Dynamic Time Warping finds the path of best alignment between two series
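As a minimal sketch of what "path of best alignment" means, the function below fills the standard cumulative cost matrix and backtracks to recover which elements of one series are matched to which elements of the other. It uses one-dimensional series and absolute difference as the local cost; both are assumptions chosen for brevity, not the settings used in the study.

```python
def dtw(a, b):
    """Return (total cost, alignment path) for two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three allowed predecessors.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end to recover the alignment as (index in a, index in b).
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path

# The second series is the first one delayed by one step.
a = [0, 1, 2, 3, 2, 1, 0]
b = [0, 0, 1, 2, 3, 2, 1, 0]
total, path = dtw(a, b)
print(total)
print(path)
```

Because the delay is absorbed by matching the first element of `a` to two elements of `b`, the total cost is zero even though the series have different lengths.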

There are lots of ways to specify its parameters
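One commonly used parameter is a window constraint, which limits how far the alignment may stray from the diagonal. The sketch below adds a band of half-width `window` (a Sakoe-Chiba style band) to a basic DTW cost computation; it illustrates one parameter only and is not a statement of which parameters the two variants in this talk use. The example series are hypothetical.

```python
def dtw_distance(a, b, window=None):
    """DTW cost where index i may only be matched to j with |i - j| <= window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # The band must be at least |n - m| wide, or no complete alignment exists.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Same peak, shifted by three time steps.
a = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 0, 1, 2, 1, 0]

for w in (None, 3, 1, 0):
    print(w, dtw_distance(a, b, window=w))
```

With no window, or a window at least as wide as the shift, the peaks align and the cost is zero. With `window=0` the comparison collapses to a point-by-point difference, so a stricter window demands closer agreement in timing.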

Two variants

We selected a high-information and a low-information variant to test on simulated data.

  • High-information specifies parameters that favor closer matching over longer periods of data; this is optimal when there’s lots of overlapping data from individuals.

  • Low-information specifies more lax parameters and matches trajectories based on what occurred immediately before and after the gap.

Results

  • Performance over all conditions favors DTW methods
  • Especially the low-information variant

Long gaps - One hour

Method      Abs Bias (km)   Med Bias (km)   TP Acc
LI          0.8             0               93.00%
MI          0.9             1.9             93.00%
TWI         1.4             0.2             89.30%
DTWBI       0.5             0               95.00%
DTWBMI-HI   1.4             0               94.10%
DTWBMI-LO   0.7             0               95.70%

Long gaps - Six hours

Method      Abs Bias (km)   Med Bias (km)   TP Acc
LI          5.4             −0.2            92.90%
MI          1.4             11.5            94.50%
TWI         0.2             3.3             93.00%
DTWBI       3.4             0               96.50%
DTWBMI-HI   3.4             0.1             94.80%
DTWBMI-LO   1.9             0.1             95.60%

Long gaps - Twelve hours

Method      Abs Bias (km)   Med Bias (km)   TP Acc
LI          9.4             −1.9            94.40%
MI          10.9            21.2            95.20%
TWI         9.3             13              93.80%
DTWBI       0.1             −0.4            95.90%
DTWBMI-HI   4.5             2.4             94.30%
DTWBMI-LO   0.2             1.7             96.00%

Comparison with interpolation

Gap Length   Method      Abs Bias (km)   Med Bias (km)
1 hr         LI          0.8             0
1 hr         DTWBMI-LO   0.7             0
6 hrs        LI          5.4             −0.2
6 hrs        DTWBMI-LO   1.9             0.1
12 hrs       LI          9.4             −1.9
12 hrs       DTWBMI-LO   0.2             1.7

Recap

Missing data is a serious problem for data collected via smartphone

(To be expected with future tech)

No existing methodology corrects for it well

Dynamic Time Warping-Based Multiple Imputation has some promise

Disappointingly, the high-information variant performs worse

  • Things that might help

    • More data per person

    • Including personal/trip variables in the imputation

References

Ampt, E. S., Richardson, A. J., & Brög, W. (Eds.). (1985). New survey methods in transport: Proceedings of 2nd International Conference, Hungerford Hill, Australia, 12-16 September 1983. VSP.
Brög, W., Fallast, K., Katteler, H., Sammer, G., & Schwertner, B. (1985). Selected results of a standardised survey instrument for large-scale travel surveys in several European countries. In E. S. Ampt, A. J. Richardson, & W. Brög (Eds.), New survey methods in transport: Proceedings of 2nd International Conference, Hungerford Hill, Australia, 12-16 September 1983 (pp. 173–191). VSP.
Brög, W., Meyburg, A. H., Stopher, P. R., & Wermuth, M. J. (1985). Collection of household travel and activity data: Development of a survey instrument. In E. S. Ampt, A. J. Richardson, & W. Brög (Eds.), New survey methods in transport: Proceedings of 2nd International Conference, Hungerford Hill, Australia, 12-16 September 1983 (pp. 151–172). VSP.
Brög, W., Meyburg, A. H., & Wermuth, M. J. (1983). Development of survey instruments suitable for determining non-home activity patterns. Transportation Research Record, 944, 1–12.
McCool, D., Lugtig, P., Mussmann, O., & Schouten, B. (2021). An app-assisted travel survey in official statistics: Possibilities and challenges. Journal of Official Statistics, 37(1), 149–170.
McCool, D., Lugtig, P., & Schouten, B. (2022). Maximum interpolable gap length in missing smartphone-based GPS mobility data. Transportation, 1–31.
McCool, D., Lugtig, P., & Schouten, B. (forthcoming). Dynamic time warping-based imputation of long gaps in human mobility trajectories. Forthcoming.