Linear interpolation is hypothesized to work well for bridging short time gaps. Such gaps reduce the density of the recorded points, but as long as individual gaps do not exceed 5-10 minutes, the overall level of sparsity may rise considerably before it affects the travel metrics of interest.
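As an illustration of this kind of gap bridging, the following is a minimal sketch rather than the study's actual code. It uses the thresholds given in the analysis steps (gaps longer than 3 minutes and larger than 80 meters, filled at one-second intervals). The function names `haversine_m` and `fill_gaps`, the tuple layout `(t_seconds, lat, lon)`, and the choice to interpolate latitude and longitude directly are assumptions.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def fill_gaps(points, min_gap_s=180, min_gap_m=80, step_s=1):
    """Insert linearly interpolated points into qualifying gaps.

    points: list of (t_seconds, lat, lon), sorted by time (assumed layout).
    A gap qualifies when it is both longer than min_gap_s and larger than min_gap_m.
    """
    if not points:
        return []
    out = [points[0]]
    for (t0, la0, lo0), (t1, la1, lo1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > min_gap_s and haversine_m(la0, lo0, la1, lo1) > min_gap_m:
            t = t0 + step_s
            while t < t1:
                f = (t - t0) / dt  # fraction of the gap elapsed at time t
                out.append((t, la0 + f * (la1 - la0), lo0 + f * (lo1 - lo0)))
                t += step_s
        out.append((t1, la1, lo1))
    return out
```

Interpolating in latitude and longitude is a simplification that is adequate over the short distances involved; a projected coordinate system could be substituted.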
In the first study, missingness was introduced to the data at random, representing a situation in which the missingness was not functionally related to the content of the data or to the user, and in which the individual gaps were short. Each 24-hour period was divided into 288 five-minute time intervals. Sparsity was introduced in ten-percent steps, ranging from q = 0, where no data were removed, to q = 0.9, in which 90% of the five-minute intervals were excluded. For each period, this process was repeated 20 times at each sparsity level so that different portions of the data were removed.
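The random removal described here can be sketched as follows. This is an illustrative sketch, not the study's implementation: the function name `remove_random_intervals`, the tuple layout `(t_seconds_since_midnight, lat, lon)`, the rounding of q × 288 to a whole number of intervals, and the synthetic day of data are all assumptions.

```python
import random

INTERVALS_PER_DAY = 288  # 24 hours divided into five-minute intervals
INTERVAL_S = 300         # length of one interval in seconds

def remove_random_intervals(points, q, rng):
    """Drop all points that fall in a random share q of the five-minute intervals.

    points: list of (t_seconds_since_midnight, lat, lon) (assumed layout).
    Returns the retained points and the set of removed interval indices.
    """
    n_remove = round(q * INTERVALS_PER_DAY)
    # random.sample draws without replacement
    removed = set(rng.sample(range(INTERVALS_PER_DAY), n_remove))
    kept = [p for p in points if int(p[0] // INTERVAL_S) not in removed]
    return kept, removed

# Synthetic example day: one fix every 10 seconds at a fixed location.
rng = random.Random(42)
day = [(t, 52.0, 5.0) for t in range(0, 86400, 10)]
for q in [i / 10 for i in range(10)]:   # q = 0, 0.1, ..., 0.9
    for rep in range(20):               # 20 repetitions per sparsity level
        kept, removed = remove_random_intervals(day, q, rng)
        # interpolation, stop detection, and metric calculation would follow here
```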
Analysis steps proceeded in this order:
1. For each 24-hour period:
2. For each sparsity level q:
   A. Sample without replacement the required number of five-minute intervals to be removed.
   B. Fill gaps in the data that are longer than 3 minutes and larger than 80 meters by linear interpolation between the preceding and subsequent points at one-second intervals.
   C. Apply the stop detection algorithm:
      - Mark as a candidate stop every point for which all points in the subsequent 180 seconds are within 80 meters.
      - Mark as being within a stop all points that fall within the previous requirement.
      - Determine state switches (move -> stop, stop -> move, stop -> stop).
      - Merge stops.
      - Repeat until no further changes are made.
   D. Calculate aggregate measures on both states.
   E. If there is at least one move state, apply the top-down time ratio algorithm to divide the data set into segments and calculate aggregate measures.
3. Repeat step 2 20 times.
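The first two steps of the stop detection algorithm (marking candidate stops and the points that belong to them) might be sketched as below. This is a hedged illustration only: the function names, the tuple layout `(t_seconds, lat, lon)`, and the rule that a point can only be a candidate when the track covers its full 180-second window are assumptions, and the state-switch and merge steps are not shown.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def mark_stops(points, window_s=180, radius_m=80):
    """Return a list of booleans, True where a point lies within a stop.

    points: list of (t_seconds, lat, lon), sorted by time (assumed layout).
    """
    n = len(points)
    in_stop = [False] * n
    for i, (t0, la0, lo0) in enumerate(points):
        # Assumption: skip points whose window extends past the end of the track.
        if points[-1][0] - t0 < window_s:
            break
        j = i + 1
        is_candidate = True
        while j < n and points[j][0] - t0 <= window_s:
            if haversine_m(la0, lo0, points[j][1], points[j][2]) > radius_m:
                is_candidate = False
                break
            j += 1
        if is_candidate:
            # The candidate and every point inside its window belong to a stop.
            for k in range(i, j):
                in_stop[k] = True
    return in_stop
```

Consecutive runs of `True` and `False` in the result correspond to stop and move states, from which the state switches can then be derived.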
The data with induced missingness were compared to the complete data set in order to evaluate which metrics were affected. Table 1 shows the decrease in moved distance and number of stops with increasing sparsity. At 30% sparsity, the mean moved distance retained is almost 90%, and the median moved distance retained is 93%. Only as sparsity exceeds 60% does the median distance lost reach 20%. Similarly, the median number of stops is not reduced until sparsity reaches 50%. In fact, these short gaps become problematic only when they become long gaps, as two or more adjacent short gaps merge into one.
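To make the comparison concrete, the sketch below computes absolute and percent changes of a metric relative to the complete data and then summarizes them across periods. The numbers are hypothetical and chosen only for illustration, and the function name `delta` and the per-period calculation of percentages before averaging are assumptions about how such summaries could be formed.

```python
from statistics import mean, median

def delta(full, sparse):
    """Absolute and percent change of a metric relative to the complete data."""
    abs_change = sparse - full
    perc_change = 100.0 * abs_change / full if full else float("nan")
    return abs_change, perc_change

# Hypothetical moved distances (m) for three periods: complete vs. sparsified data.
full = [30000.0, 12000.0, 45000.0]
sparse = [27000.0, 11400.0, 42750.0]

changes = [delta(f, s) for f, s in zip(full, sparse)]
mean_perc = mean(p for _, p in changes)     # mean of the per-period percent changes
median_abs = median(a for a, _ in changes)  # median of the per-period absolute changes
retained = 100.0 + mean_perc                # share of the metric retained, in percent
```

With this convention, a percent change of about −10% corresponds to about 90% of the distance being retained.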
A table of full results can be found in the appendix. Results from this simulation study suggest a fairly robust response from the data to increasing levels of sparsity. See ??.
q | Delta Moved Distance (km) | Delta Radius of Gyration (km) | Delta Number of Stops |
---|---|---|---|
0.1 | -0.4 (-1.2%) | 1.2 (5%) | 0 (0%) |
0.2 | -1.1 (-3.8%) | 2.8 (11.3%) | 0 (0%) |
0.3 | -1.8 (-6.9%) | 4.7 (18.9%) | 0 (0%) |
0.4 | -2.6 (-10.2%) | 7.1 (28.1%) | 0 (0%) |
0.5 | -3.8 (-14.2%) | 10 (40.2%) | -1 (-9.1%) |
0.6 | -4.9 (-18.4%) | 13.7 (55.3%) | -1 (-16.7%) |
0.7 | -6.5 (-24%) | 19.1 (77.5%) | -1 (-20%) |
0.8 | -8.5 (-31.7%) | 26.9 (111.4%) | -2 (-30%) |
0.9 | -11.9 (-47.7%) | 39.8 (173.1%) | -2 (-43.7%) |
We see relatively large increases in the calculated radius of gyration, both of the stops and of the movements. This metric indicates the mobility tendency of a person and is presented in kilometers. The increase is due to … ?
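The text does not state the formula used for the radius of gyration. A common definition is the root mean square distance of the recorded points from their centroid, and the sketch below assumes that definition, unweighted, with the centroid taken as the arithmetic mean of the coordinates (reasonable only for small areas away from the poles and the antimeridian). The function names are hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def radius_of_gyration_km(points):
    """Root mean square distance (km) of (lat, lon) points from their centroid."""
    n = len(points)
    c_lat = sum(p[0] for p in points) / n
    c_lon = sum(p[1] for p in points) / n
    squared = [haversine_m(lat, lon, c_lat, c_lon) ** 2 for lat, lon in points]
    return math.sqrt(sum(squared) / n) / 1000.0
```

Because every point carries equal weight under this definition, the value depends on how many points are recorded at each location, including any points added by interpolation.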
A second simulation study was designed to investigate whether or not the same method of linear interpolation worked for gaps of increasing length. It was hypothesized that, in comparison to short gaps, longer gaps would exhibit a more linear relationship between sparsity and the travel metrics of interest. Rather than decreasing the location density uniformly, gaps of increasing length have a greater potential to eliminate entire trips, greatly distorting metrics such as travel distance and number of stops.
Analysis steps proceeded in this order:
1. For each 24-hour period:
2. For each of 20 iterations:
3. Select a random time point from the 24 hours.
   A. Remove 10 percent of the remaining data following this time point.
   B. Fill gaps in the data that are longer than 3 minutes and larger than 80 meters by linear interpolation between the preceding and subsequent points at one-second intervals.
   C. Apply the stop detection algorithm:
      - Mark as a candidate stop every point for which all points in the subsequent 180 seconds are within 80 meters.
      - Mark as being within a stop all points that fall within the previous requirement.
      - Determine state switches (move -> stop, stop -> move, stop -> stop).
      - Merge stops.
      - Repeat until no further changes are made.
   D. Calculate aggregate measures on both states.
   E. If there is at least one move state, apply the top-down time ratio algorithm to divide the data set into segments and calculate aggregate measures.
   F. Return to Step 3A.
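One possible reading of the growing-gap procedure above is sketched here. It is an interpretation, not the study's code: it assumes that each pass removes a further tenth of the day's original points, that the removed points form one contiguous run starting at the random time point, and that the run wraps past the end of the day to its beginning. The function name `grow_gap` and the synthetic data are hypothetical.

```python
import random

def grow_gap(points, start_idx, q):
    """Remove a contiguous run of round(q * n) points beginning at start_idx.

    Assumption: the run wraps around from the end of the day to the beginning.
    """
    n = len(points)
    n_remove = round(q * n)
    removed = {(start_idx + k) % n for k in range(n_remove)}
    return [p for i, p in enumerate(points) if i not in removed]

# Synthetic example day: one fix every 10 seconds at a fixed location.
rng = random.Random(1)
day = [(t, 52.0, 5.0) for t in range(0, 86400, 10)]
start = rng.randrange(len(day))   # random time point within the 24 hours
for step in range(1, 10):         # q = 0.1, 0.2, ..., 0.9
    sparse = grow_gap(day, start, step / 10)
    # interpolation, stop detection, and metric calculation would follow here
```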
q | Delta Moved Distance (km) | Delta Radius of Gyration (km) | Delta Number of Stops |
---|---|---|---|
0.1 | 0 (0%) | 1.2 (2.5%) | 0 (0%) |
0.2 | 0 (0%) | 2.4 (5.3%) | 0 (0%) |
0.3 | -2.3 (-8.7%) | 3.4 (8.8%) | 0 (0%) |
0.4 | -6.5 (-25.3%) | 4.2 (14.4%) | -1 (-25%) |
0.5 | -11 (-46%) | 3.1 (14.6%) | -2 (-33.3%) |
0.6 | -17 (-52.3%) | 1.3 (9.2%) | -2 (-50%) |
0.7 | -24.3 (-68.3%) | -0.2 (-12.7%) | -3 (-54.5%) |
0.8 | -31.9 (-97.3%) | -7.8 (-65%) | -4 (-66.7%) |
0.9 | -38 (-100%) | -25.9 (-96.2%) | -4 (-75%) |
Radius of gyration and urbanicity
Short gaps that react favorably to linear interpolation must be distinguished from long gaps, which do not. A more in-depth look at gap lengths ranging from one minute to fifteen minutes was therefore conducted. For each complete data set, 15 iterations were run, in which gaps of varying length were created within each hour, after which the metrics were compared against those of the full data.
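A sketch of how such hourly gaps could be generated is given below. The details are assumptions, since the text does not specify them: one gap per hour, placed at a random offset that keeps it inside the hour, with iteration k using a gap of k minutes. The function name `hourly_gaps` and the tuple layout `(t_seconds_since_midnight, lat, lon)` are hypothetical.

```python
import random

def hourly_gaps(points, gap_min, rng):
    """Remove one gap of gap_min minutes at a random position within each hour.

    points: list of (t_seconds_since_midnight, lat, lon) (assumed layout).
    """
    gap_s = gap_min * 60
    # One random gap start per hour, chosen so the gap ends within the same hour.
    starts = [h * 3600 + rng.randrange(0, 3600 - gap_s + 1) for h in range(24)]

    def in_gap(t):
        s = starts[int(t // 3600)]
        return s <= t < s + gap_s

    return [p for p in points if not in_gap(p[0])]

# Synthetic example day: one fix every 10 seconds at a fixed location.
rng = random.Random(7)
day = [(t, 52.0, 5.0) for t in range(0, 86400, 10)]
results = {gap_min: hourly_gaps(day, gap_min, rng) for gap_min in range(1, 16)}
```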
Finally, we sought to establish a set of covariates that could impact this relationship. This could allow for extending the boundaries of what we consider the maximum acceptable gap time.
Of central importance was the investigation of time as a metric. Both the Android and iOS operating systems employ mechanisms that reduce device activity during periods of low activity, contributing to long gaps during the night that are unlikely to contain travel behavior.
The urbanicity of a person’s home might also impact the cutoff point decision. Persons living in more rural areas are more likely to make longer trips, which may allow for a longer gap time before significant data losses are incurred.
The age of a participant may impact the tendency towards certain travel behaviors.
Full table MCAR
Short gap simulation study

q | Mean (SD), abs | Mean (SD), perc | Median, abs | Median, perc |
---|---|---|---|---|
Delta Total Distance (m) | ||||
0.1 | −1.2K (2.4K) | −2.8% (4.0%) | −575.0 | −1.7% |
0.2 | −2.6K (3.9K) | −6.1% (6.4%) | −1.4K | −4.4% |
0.3 | −4.1K (5.6K) | −9.5% (8.3%) | −2.4K | −7.5% |
0.4 | −5.8K (7.6K) | −13.2% (10.1%) | −3.4K | −10.8% |
0.5 | −7.7K (9.3K) | −17.7% (12.3%) | −4.8K | −14.7% |
0.6 | −9.9K (11.7K) | −22.3% (14.2%) | −6.2K | −18.9% |
0.7 | −12.5K (14.4K) | −27.9% (16.4%) | −7.8K | −23.9% |
0.8 | −16.2K (18.8K) | −35.5% (19.4%) | −10.0K | −30.7% |
0.9 | −22.1K (26.4K) | −46.5% (22.9%) | −13.1K | −41.8% |
Delta Moved Distance (m) | ||||
0.1 | −1.0K (2.0K) | −2.7% (7.3%) | −368.8 | −1.2% |
0.2 | −2.2K (3.6K) | −6.3% (10.9%) | −1.1K | −3.8% |
0.3 | −3.5K (5.2K) | −10.1% (14.1%) | −1.8K | −6.9% |
0.4 | −5.0K (7.0K) | −13.8% (16.2%) | −2.6K | −10.2% |
0.5 | −6.7K (8.8K) | −19.0% (18.8%) | −3.8K | −14.2% |
0.6 | −8.7K (11.1K) | −24.2% (21.2%) | −4.9K | −18.4% |
0.7 | −11.1K (13.8K) | −31.0% (23.8%) | −6.5K | −24.0% |
0.8 | −14.6K (18.2K) | −40.4% (26.7%) | −8.5K | −31.7% |
0.9 | −20.4K (25.6K) | −54.0% (29.5%) | −11.9K | −47.7% |
Delta Number of Stops | ||||
0.1 | −0.1 (0.6) | −1.6% (8.0%) | 0.0 | 0.0% |
0.2 | −0.3 (0.8) | −3.6% (11.3%) | 0.0 | 0.0% |
0.3 | −0.5 (1.1) | −5.9% (14.4%) | 0.0 | 0.0% |
0.4 | −0.7 (1.3) | −8.4% (16.3%) | 0.0 | 0.0% |
0.5 | −1.0 (1.6) | −11.6% (18.5%) | −1.0 | −9.1% |
0.6 | −1.3 (1.9) | −15.7% (20.8%) | −1.0 | −16.7% |
0.7 | −1.7 (2.2) | −21.0% (22.5%) | −1.0 | −20.0% |
0.8 | −2.2 (2.7) | −28.4% (24.6%) | −2.0 | −30.0% |
0.9 | −3.0 (3.2) | −40.4% (25.7%) | −2.0 | −43.7% |
Delta Number of Moves | ||||
0.1 | −0.1 (0.5) | −1.8% (10.5%) | 0.0 | 0.0% |
0.2 | −0.3 (0.8) | −4.5% (14.7%) | 0.0 | 0.0% |
0.3 | −0.5 (1.1) | −7.6% (18.7%) | 0.0 | 0.0% |
0.4 | −0.8 (1.3) | −11.4% (20.5%) | 0.0 | 0.0% |
0.5 | −1.1 (1.6) | −16.7% (23.0%) | −1.0 | −14.3% |
0.6 | −1.5 (1.9) | −22.9% (25.1%) | −1.0 | −22.2% |
0.7 | −1.9 (2.3) | −31.2% (27.0%) | −1.0 | −33.3% |
0.8 | −2.6 (2.7) | −42.2% (28.7%) | −2.0 | −42.9% |
0.9 | −3.4 (3.2) | −58.3% (29.1%) | −3.0 | −60.0% |
Delta Mean SD latitude | ||||
0.1 | 0.0 (0.0) | 4.0% (30.6%) | 0.0 | 1.1% |
0.2 | 0.0 (0.0) | 10.5% (52.8%) | 0.0 | 2.9% |
0.3 | 0.0 (0.0) | 18.4% (68.0%) | 0.0 | 5.6% |
0.4 | 0.0 (0.0) | 30.7% (96.6%) | 0.0 | 9.4% |
0.5 | 0.0 (0.0) | 52.2% (139.2%) | 0.0 | 16.5% |
0.6 | 0.0 (0.0) | 80.9% (178.7%) | 0.0 | 28.4% |
0.7 | 0.0 (0.0) | 135.1% (268.4%) | 0.0 | 49.8% |
0.8 | 0.0 (0.0) | 239.6% (425.0%) | 0.0 | 95.6% |
0.9 | 0.0 (0.0) | 536.8% (938.5%) | 0.0 | 186.6% |
Delta Mean SD longitude | ||||
0.1 | −0.0 (0.0) | 4.2% (30.7%) | 0.0 | 1.2% |
0.2 | −0.0 (0.0) | 9.4% (44.1%) | 0.0 | 3.0% |
0.3 | −0.0 (0.0) | 17.1% (57.7%) | 0.0 | 5.8% |
0.4 | −0.0 (0.0) | 29.7% (92.8%) | 0.0 | 10.1% |
0.5 | 0.0 (0.0) | 49.1% (118.0%) | 0.0 | 18.5% |
0.6 | 0.0 (0.0) | 79.9% (171.8%) | 0.0 | 31.1% |
0.7 | 0.0 (0.0) | 138.0% (273.2%) | 0.0 | 54.2% |
0.8 | 0.0 (0.0) | 238.1% (420.9%) | 0.0 | 96.1% |
0.9 | 0.0 (0.0) | 537.4% (970.8%) | 0.0 | 180.2% |
Delta Number of locations | ||||
0.1 | −775.4 (610.8) | −9.6% (4.0%) | −654.0 | −9.2% |
0.2 | −1.6K (1.2K) | −19.5% (5.5%) | −1.3K | −19.3% |
0.3 | −2.4K (1.7K) | −29.3% (6.2%) | −1.9K | −29.3% |
0.4 | −3.2K (2.3K) | −39.1% (6.6%) | −2.6K | −39.2% |
0.5 | −4.0K (2.9K) | −49.0% (6.8%) | −3.3K | −49.3% |
0.6 | −4.7K (3.4K) | −58.2% (6.7%) | −3.9K | −58.7% |
0.7 | −5.5K (4.0K) | −67.7% (6.5%) | −4.5K | −68.4% |
0.8 | −6.3K (4.5K) | −77.0% (6.2%) | −5.1K | −78.0% |
0.9 | −7.0K (5.1K) | −85.5% (6.2%) | −5.7K | −87.1% |
Delta Radius of Gyration | ||||
0.1 | 4.9 (12.6) | 5.6% (15.1%) | 1.2 | 5.0% |
0.2 | 10.7 (25.4) | 12.2% (24.5%) | 2.8 | 11.3% |
0.3 | 17.8 (43.2) | 22.1% (87.7%) | 4.7 | 18.9% |
0.4 | 26.4 (62.7) | 32.2% (97.6%) | 7.1 | 28.1% |
0.5 | 36.7 (87.2) | 46.8% (176.8%) | 10.0 | 40.2% |
0.6 | 50.3 (120.2) | 66.3% (208.8%) | 13.7 | 55.3% |
0.7 | 69.5 (165.2) | 104.3% (427.3%) | 19.1 | 77.5% |
0.8 | 97.0 (242.9) | 154.3% (549.5%) | 26.9 | 111.4% |
0.9 | 146.2 (366.0) | 267.3% (1,038.3%) | 39.8 | 173.1% |
Full table MCAR
Long gap simulation study

q | Mean (SD), abs | Mean (SD), perc | Median, abs | Median, perc |
---|---|---|---|---|
Delta Total Distance (m) | ||||
0.1 | −3.5K (11.5K) | −4.7% (12.1%) | −9.6 | −0.0% |
0.2 | −8.4K (20.0K) | −11.8% (21.7%) | −412.6 | −0.8% |
0.3 | −15.1K (30.2K) | −20.2% (28.4%) | −2.1K | −7.1% |
0.4 | −23.4K (41.5K) | −29.9% (32.8%) | −5.2K | −15.8% |
0.5 | −32.5K (53.2K) | −40.1% (35.8%) | −10.8K | −32.8% |
0.6 | −41.5K (62.7K) | −50.9% (37.0%) | −14.1K | −49.6% |
0.7 | −52.3K (73.5K) | −62.5% (35.4%) | −24.5K | −65.5% |
0.8 | −64.2K (84.4K) | −75.2% (30.9%) | −31.4K | −93.0% |
0.9 | −76.0K (94.7K) | −87.7% (22.3%) | −43.0K | −99.7% |
Delta Moved Distance (m) | ||||
0.1 | −3.4K (11.3K) | −5.7% (14.5%) | 0.0 | 0.0% |
0.2 | −8.3K (19.6K) | −14.2% (25.1%) | 0.0 | 0.0% |
0.3 | −15.2K (29.6K) | −24.4% (31.4%) | −2.3K | −8.7% |
0.4 | −23.5K (40.8K) | −34.4% (34.3%) | −6.5K | −25.3% |
0.5 | −32.3K (52.3K) | −44.8% (36.1%) | −11.0K | −46.0% |
0.6 | −40.5K (61.7K) | −55.1% (36.3%) | −17.0K | −52.3% |
0.7 | −50.5K (72.5K) | −65.8% (34.2%) | −24.3K | −68.3% |
0.8 | −61.6K (83.5K) | −77.8% (29.2%) | −31.9K | −97.3% |
0.9 | −72.3K (94.0K) | −89.0% (21.3%) | −38.0K | −100.0% |
Delta Number of Stops | ||||
0.1 | −0.3 (0.8) | −4.9% (12.0%) | 0.0 | 0.0% |
0.2 | −0.8 (1.3) | −11.2% (18.7%) | 0.0 | 0.0% |
0.3 | −1.2 (1.7) | −18.5% (22.7%) | 0.0 | 0.0% |
0.4 | −1.7 (2.0) | −26.3% (25.8%) | −1.0 | −25.0% |
0.5 | −2.2 (2.2) | −34.3% (27.6%) | −2.0 | −33.3% |
0.6 | −2.8 (2.5) | −42.9% (28.2%) | −2.0 | −50.0% |
0.7 | −3.4 (2.6) | −52.2% (26.5%) | −3.0 | −54.5% |
0.8 | −4.0 (2.7) | −61.6% (23.1%) | −4.0 | −66.7% |
0.9 | −4.6 (2.8) | −70.8% (18.1%) | −4.0 | −75.0% |
Delta Number of Moves | ||||
0.1 | −0.4 (0.8) | −7.0% (15.6%) | 0.0 | 0.0% |
0.2 | −0.8 (1.3) | −15.9% (24.1%) | 0.0 | 0.0% |
0.3 | −1.3 (1.7) | −25.6% (29.2%) | −1.0 | −20.0% |
0.4 | −1.8 (2.0) | −35.3% (31.9%) | −1.0 | −33.3% |
0.5 | −2.4 (2.2) | −45.2% (33.3%) | −2.0 | −44.4% |
0.6 | −2.9 (2.4) | −55.6% (33.2%) | −2.0 | −50.0% |
0.7 | −3.4 (2.5) | −65.9% (31.0%) | −3.0 | −66.7% |
0.8 | −4.0 (2.6) | −77.0% (26.4%) | −3.5 | −82.6% |
0.9 | −4.6 (2.7) | −87.6% (20.2%) | −4.0 | −100.0% |
Delta Total Stop Time (Minutes) | ||||
0.1 | 19.1 (47.1) | −1.4% (3.5%) | 0.0 | 0.0% |
0.2 | 59.9 (113.8) | −4.5% (8.6%) | 0.0 | 0.0% |
0.3 | 122.1 (190.3) | −9.3% (14.3%) | 0.0 | 0.0% |
0.4 | 233.0 (274.5) | −17.6% (20.6%) | 0.0 | 0.0% |
0.5 | 348.6 (347.4) | −26.2% (26.0%) | 583.3 | −46.6% |
0.6 | 492.7 (406.6) | −37.0% (30.4%) | 757.5 | −58.6% |
0.7 | 691.4 (431.6) | −51.6% (32.1%) | 926.0 | −69.3% |
0.8 | 874.9 (437.2) | −65.1% (32.3%) | 1.1K | −79.5% |
0.9 | 1.1K (378.1) | −81.1% (27.7%) | 1.2K | −89.7% |
Delta Total Move Time (Minutes) | ||||
0.1 | −11.4 (39.1) | 14.5% (68.1%) | 0.0 | 0.0% |
0.2 | −30.5 (91.7) | 28.9% (119.0%) | 0.0 | 0.0% |
0.3 | −35.5 (133.7) | 21.7% (142.0%) | 0.0 | 0.0% |
0.4 | −30.2 (165.1) | 11.8% (172.7%) | 8.0 | −22.6% |
0.5 | −26.7 (199.2) | 3.9% (200.5%) | 16.6 | −44.1% |
0.6 | −21.2 (229.1) | −1.9% (235.5%) | 21.5 | −52.5% |
0.7 | 6.4 (221.0) | −23.8% (235.1%) | 33.3 | −70.8% |
0.8 | 36.8 (192.5) | −51.4% (205.9%) | 49.4 | −92.2% |
0.9 | 63.1 (151.7) | −79.7% (109.3%) | 63.0 | −100.0% |
Delta Mean SD latitude | ||||
0.1 | 0.0 (0.0) | 47.0% (226.6%) | 0.0 | 0.5% |
0.2 | 0.0 (0.0) | 103.0% (393.7%) | 0.0 | 1.1% |
0.3 | 0.0 (0.0) | 244.0% (967.0%) | 0.0 | 2.1% |
0.4 | 0.0 (0.0) | 421.7% (1,873.1%) | 0.0 | 2.8% |
0.5 | 0.0 (0.0) | 469.8% (2,466.5%) | 0.0 | 1.9% |
0.6 | 0.0 (0.0) | 312.9% (1,511.7%) | −0.0 | −1.6% |
0.7 | 0.0 (0.0) | 207.1% (1,024.1%) | −0.0 | −17.5% |
0.8 | 0.0 (0.0) | 222.8% (1,653.4%) | −0.0 | −38.9% |
0.9 | 0.0 (0.0) | 123.4% (1,485.1%) | −0.0 | −60.5% |
Delta Mean SD longitude | ||||
0.1 | 0.0 (0.0) | 36.1% (205.8%) | 0.0 | 0.4% |
0.2 | 0.0 (0.0) | 183.1% (1,026.3%) | 0.0 | 0.9% |
0.3 | 0.0 (0.0) | 339.6% (1,531.8%) | 0.0 | 1.6% |
0.4 | 0.0 (0.0) | 464.8% (1,871.2%) | 0.0 | 1.9% |
0.5 | 0.0 (0.0) | 521.0% (2,129.1%) | 0.0 | 0.5% |
0.6 | 0.0 (0.0) | 458.6% (2,078.8%) | −0.0 | −3.2% |
0.7 | 0.0 (0.0) | 403.0% (2,124.3%) | −0.0 | −22.3% |
0.8 | 0.0 (0.0) | 394.0% (2,626.2%) | −0.0 | −42.4% |
0.9 | 0.0 (0.0) | 194.7% (2,075.0%) | −0.0 | −59.2% |
Delta Number of locations | ||||
0.1 | −760.9 (1.1K) | −9.4% (10.2%) | −583.0 | −5.6% |
0.2 | −1.5K (1.8K) | −18.3% (15.2%) | −1.2K | −15.0% |
0.3 | −2.2K (2.3K) | −26.8% (18.3%) | −1.7K | −25.5% |
0.4 | −2.9K (2.8K) | −35.8% (20.2%) | −2.4K | −35.3% |
0.5 | −3.7K (3.3K) | −45.0% (21.5%) | −2.9K | −45.9% |
0.6 | −4.4K (3.7K) | −54.5% (22.4%) | −3.6K | −54.7% |
0.7 | −5.2K (4.2K) | −64.8% (21.4%) | −4.2K | −67.4% |
0.8 | −6.2K (4.8K) | −76.4% (18.5%) | −5.0K | −81.9% |
0.9 | −7.1K (5.3K) | −88.1% (13.1%) | −6.2K | −92.9% |
Delta Radius of Gyration | ||||
0.1 | 9.8 (47.7) | 8.8% (45.4%) | 1.2 | 2.5% |
0.2 | 17.9 (95.2) | 13.8% (48.9%) | 2.4 | 5.3% |
0.3 | 24.6 (137.6) | 27.8% (123.2%) | 3.4 | 8.8% |
0.4 | 28.7 (153.9) | 41.2% (199.3%) | 4.2 | 14.4% |
0.5 | 26.9 (184.1) | 51.0% (203.5%) | 3.1 | 14.6% |
0.6 | 18.4 (226.8) | 52.9% (224.6%) | 1.3 | 9.2% |
0.7 | 4.1 (264.0) | 67.4% (318.1%) | −0.2 | −12.7% |
0.8 | −23.0 (288.7) | 67.4% (456.0%) | −7.8 | −65.0% |
0.9 | −62.8 (312.3) | 52.3% (637.1%) | −25.9 | −96.2% |