DTWBI

Dynamic Time Warping Based Imputation has been proposed in the literature independently at least two different authors.(1, 2) Its primary mechanism is imputation of missing data that occurs within a time series where there are other time series available that can function as candidate donors for the time period which is missing.

Anything that is periodic in nature can make use of the periodicity by aligning the non-missing portion of the data to a suitable portion of a reference data set and then using the contents of the reference data set during the missing point to fill in the gap. Although there are different ways to accomplish this goal, the one illustrated here is meant to align with the general concept of multiple imputation through chained equations, using Predictive Mean Matching as a guiding tool.

We will start with the time series with the least amount of missing data – this becomes our query. Some portion of the non-missing data in the query is selected to be matched against all other time series containing no missing data. The best-fitting subsequence alignment is found between the query and each of these reference sequences. The Euclidean distance between the aligned query and reference is captured for each reference set. This is used to create a probability of selection as an imputation candidate, where the closer matches are more likely to be chosen. Next, a number of new data sets are created in which the missing data in the query has been filled by the data from the chosen reference candidate. We then move to the next time series with the least amount of missing data and start again, now with one additional reference sequence – the initial query.

We proceed through the data until all missing data has been filled, leaving us with a number of complete data sets (depending on how many imputations were chosen).

  1. https://doi.org/10.1016/j.patrec.2017.08.019
  2. https://www.sciencedirect.com/science/article/pii/S0957417421017243

The parameters

The simulation is parameterized by a number of different features that may impact the way in which candidates time windows are chosen for imputation.

N imputations

The first is the number of imputations. Because we select candidates on the basis of probability within each imputation window, the impact of the number of imputations is in increasing the variation. For the moment, we will delay on this point because the number of imputations can be controlled after the fact. The impact on the imputation procedure itself is to increase it in length. Currently, the imputations don’t run in parallel, although they could, so running 10 imputations takes twice as long as running 5 imputations.

Time window

The time window is what controls how specific to a given time point we must be in order to impute it. If we select a time window of 1 hour, and we have a gap between 14.00 and 16.00, then we can select candidates for imputing the gap for any two-hour time periods starting between 13.00 and 15.00 (and thus ending between 15.00 and 17.00.) If we had a two-hour time window, we could select gap imputation candidate data that started between 12.00 and 16.00.

The tighter the time window, the more we restrict our matches to conform to time-based routines. The looser the time window, the more we allow for matches that favor similarity in behavior without requiring the time element. A time window of 12 hours would allow for unrestricted pattern matching in the time series (12 hours before, and 12 hours afterward).

Time window is the only parameter that also has an impact on TWI – the non-pattern-matching based approach, and one of our comparisons. Instead of selecting probability-based matches from among those with matching time slots, each imputation selects them completely at random.

This simulation applied time windows of 1, 2 and 3 hours.

Matching buffer

In addition to the time window, we also vary the number of elements against which we match. This is the buffer following the gap in the case of missingness in the beginning of a time series, preceding the gap in the case of missingness at the end of a time series, and on both sides in the case of missingness that occurs in the middle of a time series.

A matching buffer is specified in terms of the number of elements. If we specify a matching buffer of 4, and our time is discretized into units of 15 minutes, then we look at a buffer equivalent to an hour. In theory, a larger buffer prefers longer patterns (like a person’s repeated work commute), and a shorter buffer prefers shorter patterns (e.g. someone is moving a long distance before, but not afterward).

Matching buffers are a preference – we select the larger of the matching buffer and the available elements. Consider missingness occurring in the middle of a time series of 10 elements. If elements 7 and 8 are missing, we may have at most a matching buffer of 2 following the gap, and at most a matching buffer of 6 preceding the gap.

The matching buffer has no impact on TWI because there is no matching.

This simulation applied matching buffers of 4, 8, 16, 24 and 32 elements, which is the equivalent of 1, 2, 4, 6 and 8 hours.

Candidate specificity

Candidate specificity describes the way in which the probability of candidate selection is generated. Because candidates are chosen independently in each imputation chain, we rely on a probabilistic mechanism for their selection in order to allow them to vary. If we have a low candidate specificity, then we have only a mild preference for better pattern matches. If we have a high candidate specificity, then we will be much less likely to select poor matches, but we also will increase the selection potential of a few best matches.

If we have zero candidate specificity, we are in the TWI case. We select at random from among the candidates with matching time periods and we lose the potential benefit of the subsequence matching.

On the other hand, if our candidate specificity is too high, all imputations may select the single best donor period and we will lose any variation.

Here’s an example of the mechanism of candidate selection:

query <- 1:10
ref1 <- 2:11
ref2 <- 0:9
ref3 <- 3:12
ref4 <- 11:20 

library(dtw)

r1 <- dtw(query, ref1)$normalizedDistance
r2 <- dtw(query, ref2)$normalizedDistance
r3 <- dtw(query, ref3)$normalizedDistance
r4 <- dtw(query, ref4)$normalizedDistance

dists <- c(r1, r2, r3, r4)
dists
## [1] 0.10 0.10 0.30 5.45
prob.powers <- c(0, 1, 2, 3, 4)
probFromDist_pretty <- function(prob.power, dists){
  probs <- (1/(dists^prob.power))/sum(1/(dists^prob.power))
  paste0("Prob.power ", prob.power, ": ", 
         paste0(round(probs, 2), collapse = ", "))
}
sapply(prob.powers, probFromDist_pretty, dists = dists)
## [1] "Prob.power 0: 0.25, 0.25, 0.25, 0.25"
## [2] "Prob.power 1: 0.43, 0.43, 0.14, 0.01"
## [3] "Prob.power 2: 0.47, 0.47, 0.05, 0"   
## [4] "Prob.power 3: 0.49, 0.49, 0.02, 0"   
## [5] "Prob.power 4: 0.5, 0.5, 0.01, 0"

This simulation applies candidate specificity powers of 2, 3, and 4 (with 0 for comparison).

DTW Bias

We can also judge the extent to which finding Dynamic Time Warping matches improves our ability to impute the missingness. We subtract the absolute Dynamic Time Warping bias from the absolute time-window-only bias. A positive number means that DTWBI outperforms TWBI.

We can use this to help guide parameter selection. Here’s an overview, where we can see the impact of the time windowing, the selection buffer and candidate specificity. Notably, these relationships differ based on where the missingness is located.

DTW Bias Middle

DTWBI outperforms TWI when the missingness occurs in the middle. If we look only at the difference in biases, the largest impact comes from a relatively short buffer window. We would expect higher variability for lower buffering. In data sets in which individuals have recorded multiple similar days against which to match, we would expect to find a tradeoff in which longer windows offered higher precision but more bias.

Here we are matching primarily generic trends where the immediacy of what precedes and follows the missingness is the strongest predictor of what occurs during it. Shorter matching buffers produce better results in this situation.