Redesigning the model
TimesFM is a patched decoder: it tokenizes every 32 consecutive time points (a patch) into an input token, applies a transformer stack over the sequence of input tokens, and then uses a shared multilayer perceptron (MLP) to map each output token back to a time series of 128 time points.
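The patching idea can be sketched in a few lines of PyTorch. This is an illustrative reimplementation, not the official TimesFM code: the patch length of 32 and the output length of 128 come from the description above, while the model width, head count, and layer count are placeholder values.

```python
# A minimal sketch of a patched decoder (assumptions: hypothetical names;
# only the 32-point input patch and 128-point output patch come from the text).
import torch
import torch.nn as nn

class PatchedDecoderSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_layers=2,
                 input_patch_len=32, output_patch_len=128):
        super().__init__()
        self.input_patch_len = input_patch_len
        # Each patch of 32 consecutive points becomes one input token.
        self.patch_embed = nn.Linear(input_patch_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        # A shared MLP maps each output token to 128 forecast points.
        self.output_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.ReLU(),
            nn.Linear(d_model, output_patch_len))

    def forward(self, series):  # series: (batch, time), time divisible by 32
        b, t = series.shape
        patches = series.reshape(b, t // self.input_patch_len,
                                 self.input_patch_len)
        tokens = self.patch_embed(patches)       # (batch, n_patches, d_model)
        # Causal mask: each token attends only to earlier patches (decoder-style).
        n = tokens.size(1)
        mask = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        hidden = self.transformer(tokens, mask=mask)
        return self.output_mlp(hidden)           # (batch, n_patches, 128)
```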
To create TimesFM-ICF (ICF for in-context fine-tuning), we start from the base TimesFM model and continue its pre-training on a new kind of context: the forecast history plus all in-context examples. The first step is to make sure the model does not confuse the forecast history with the in-context examples. Imagine feeding the model a list of numbers that represents several different things, say the sunglasses sales from one store and the umbrella sales from another store. Simply merging all these numbers together can confuse the model, which would treat them as one continuous stream of data. For example, if sales in the first store were rising and sales in the second store were falling, the model might misread the sequence as a single up-then-down pattern rather than two separate, simpler trends.
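To make the failure mode concrete, here is a toy illustration with invented numbers: once the two series are naively merged, the stream reads as a single rise-then-fall pattern.

```python
# Toy illustration (invented numbers): two unrelated series, naively merged.
rising_sunglasses = [10, 12, 14, 16, 18]   # store A: clear upward trend
falling_umbrellas = [20, 17, 14, 11, 8]    # store B: clear downward trend

merged = rising_sunglasses + falling_umbrellas
print(merged)  # [10, 12, 14, 16, 18, 20, 17, 14, 11, 8]
# Read as one stream, this looks like a single rise-then-fall pattern,
# not two separate monotone trends.
```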
To fix this, we insert a special learnable "common separator token" after each set of numbers, like a digital "stop sign" or "new paragraph" symbol. With these separators in place, when the model sees a separator token it learns that the example before it is distinct from the data it is currently trying to forecast, so the two are not mixed up. The model can then learn patterns from these past examples and apply that knowledge to the current forecast. For example, it can learn something like: "all of the example stores' sales have trended in a consistent direction recently, so predict an upward trend for the new store's sunglasses sales as well."
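A minimal sketch of how such a context could be assembled at the token level, assuming tokens have already been produced by the patch embedding. The separator is a single learnable embedding shared across all boundaries; the names and shapes here are hypothetical, not taken from the official code.

```python
# Hypothetical context assembly with a learnable separator token.
import torch
import torch.nn as nn

d_model = 256
# One learnable separator embedding, shared across all boundaries.
separator = nn.Parameter(torch.randn(1, d_model))

def build_context(example_tokens, history_tokens):
    """Concatenate in-context examples and the forecast history,
    inserting the separator token after each example so the model
    can tell where one series ends and the next begins."""
    pieces = []
    for tokens in example_tokens:      # each: (n_patches, d_model)
        pieces.append(tokens)
        pieces.append(separator)       # the "stop sign" between series
    pieces.append(history_tokens)      # the series we actually forecast
    return torch.cat(pieces, dim=0)    # (total_tokens, d_model)

# Usage with random stand-in tokens:
examples = [torch.randn(4, d_model), torch.randn(6, d_model)]
history = torch.randn(5, d_model)
context = build_context(examples, history)
print(context.shape)  # torch.Size([17, 256]) -> 4+1+6+1+5 tokens
```

Because the separator embedding is learned during continued pre-training rather than fixed, the model itself discovers how to use it as a boundary marker between series.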
