Methodological Aspects on Model Validation: Out-of-Time Validation

Error in predictive models

What's this about?

Once we've built a predictive model, how sure are we that it captured general patterns and not just the data it has seen (overfitting)?

Will it perform well when it is in production, running live? What is the expected error?

What sort of data?

If the data is generated over time and, let's say, every day brings new cases like "page visits on a website" or "new patients arriving at a medical center", one strong validation is the Out-of-Time approach.

Out-Of-Time Validation Example

How to?

Imagine you are building the model on Jan-01, and to build it you use all the data up to Oct-31. Between these two dates there are two months.
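The date-based split can be sketched as follows. This is a minimal illustration, not the book's own code: the event log, the years attached to the dates, and the helper name `split_out_of_time` are all assumptions made up for the example.

```python
from datetime import date

# Assumed years: the text only gives month and day.
CUTOFF = date(2023, 10, 31)     # last day of data used to build the model
EVALUATION = date(2024, 1, 1)   # day on which the outcome is checked

# Hypothetical event log: (case_id, event_date, page_views)
events = [
    ("u1", date(2023, 9, 15), 12),
    ("u1", date(2023, 11, 20), 30),
    ("u2", date(2023, 10, 31), 5),
    ("u2", date(2023, 12, 5), 4),
    ("u3", date(2023, 8, 1), 7),
]

def split_out_of_time(rows, cutoff, evaluation):
    """Return (rows up to the cutoff, rows after the cutoff up to the evaluation day)."""
    train = [r for r in rows if r[1] <= cutoff]
    out_of_time = [r for r in rows if cutoff < r[1] <= evaluation]
    return train, out_of_time

train_rows, oot_rows = split_out_of_time(events, CUTOFF, EVALUATION)
print(len(train_rows), "rows to build the model;", len(oot_rows), "rows to validate it")
```

The key point is that the split is made by date, not by random sampling, so no row after the cutoff can leak into the data used to build the model.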

When predicting a binary/two-class (or multi-class) variable, it's quite straightforward: with the model we've built (with data <= Oct-31) we score the data as of that exact day, and then we measure how the users/patients/persons/cases evolved during those two months.

Since the output of a binary model should be a number indicating the likelihood of each case belonging to a particular class (see the Scoring Data chapter), you test what the model "said" on Oct-31 against what actually happened by Jan-01.
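As a rough sketch of that comparison, the snippet below matches each case's Oct-31 score with its Jan-01 outcome and summarizes the agreement with the AUC (the probability that a positive case received a higher score than a negative one). The AUC is just one common choice of metric, not something this chapter prescribes, and the scores, outcomes, and case ids are invented for illustration.

```python
# Hypothetical scores produced on Oct-31 by a model built with data <= Oct-31.
scores_oct31 = {"a": 0.9, "b": 0.7, "c": 0.4, "d": 0.2}
# Hypothetical outcomes observed by Jan-01 (1 = positive class, 0 = negative class).
outcome_jan01 = {"a": 1, "b": 0, "c": 1, "d": 0}

def auc(scores, outcomes):
    """Share of (positive, negative) pairs where the positive case scored higher.

    Ties count as half a win.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Align scores and outcomes by case id before comparing them.
case_ids = sorted(scores_oct31)
oot_auc = auc(
    [scores_oct31[c] for c in case_ids],
    [outcome_jan01[c] for c in case_ids],
)
print(f"Out-of-time AUC: {oot_auc:.2f}")
```

A value near 0.5 would mean the Oct-31 scores ordered the cases no better than chance; a value near 1 would mean the cases that turned positive were the ones scored highest.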

So the validation workflow looks something like...

Model performance workflow


Using Gain and Lift Analysis

Gain and lift analysis

This analysis, explained in another chapter of the book, can be applied after the out-of-time validation.

Keeping only those cases that were negative on Oct-31, we take the score returned by the model on that date, and the target variable is the value those cases had on Jan-01.
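The filtering and the gain/lift figures can be sketched like this. The case list, the 25% cut, and the function name `gain_and_lift` are assumptions for illustration; gain is taken here as the share of all Jan-01 positives found in the top-scored fraction, and lift as that gain divided by the fraction of cases selected.

```python
# Hypothetical cases: (case_id, status on Oct-31, score on Oct-31, status on Jan-01)
cases = [
    ("c1", 0, 0.95, 1), ("c2", 0, 0.80, 1), ("c3", 0, 0.60, 0), ("c4", 0, 0.55, 1),
    ("c5", 0, 0.30, 0), ("c6", 0, 0.25, 0), ("c7", 0, 0.10, 0), ("c8", 0, 0.05, 0),
    ("c9", 1, 0.99, 1), ("c10", 1, 0.97, 1),  # already positive on Oct-31
]

# Keep only the cases that were still negative on Oct-31.
eligible = [c for c in cases if c[1] == 0]

def gain_and_lift(scores, outcomes, top_fraction):
    """Gain and lift for the top `top_fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    total_positives = sum(outcomes)
    if total_positives == 0:
        raise ValueError("no positive cases in the out-of-time window")
    captured = sum(y for _, y in ranked[:n_top])
    gain = captured / total_positives
    lift = gain / (n_top / len(ranked))
    return gain, lift

gain, lift = gain_and_lift(
    [c[2] for c in eligible],   # score given on Oct-31
    [c[3] for c in eligible],   # what actually happened by Jan-01
    top_fraction=0.25,
)
print(f"Top 25% of scores: gain = {gain:.2f}, lift = {lift:.2f}")
```

Excluding the cases that were already positive matters: they would otherwise inflate the results, since the model would get credit for "predicting" something that had already happened.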

How about a numerical target variable?

Here common sense and business needs play a bigger role. A numerical outcome can take any value and can increase or decrease over time, so we may have to consider the two cases below to help us decide what counts as success.

Example scenario: you measure the usage of an app, and the typical pattern is that users use it more as the days pass.

Case A: Convert the numerical target into categorical?

An app user can become more active over time, measured in page views. To do an out-of-time validation we would predict whether the user visits more than the average, ends up in the top 10% of users, visits twice as much as she/he did up to the model's creation day, etc.

Examples of this case can be:

  • Binary: "yes/no" above average.
  • Multi-class: "low increase"/"mid increase"/"high increase".
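Both conversions can be sketched in a few lines. The page-view figures and the cut-off ratios (1.1 and 2.0) are made-up assumptions; in practice the thresholds would come from the business need.

```python
from statistics import mean

# Hypothetical page views per user, before and after the model's creation day.
views_before = {"u1": 100, "u2": 38, "u3": 50, "u4": 15, "u5": 100}
views_after = {"u1": 120, "u2": 40, "u3": 75, "u4": 15, "u5": 250}

# Binary target: "yes"/"no" above the average of the later period.
average_after = mean(views_after.values())
above_average = {
    user: ("yes" if views > average_after else "no")
    for user, views in views_after.items()
}

def increase_class(before, after, mid_from=1.1, high_from=2.0):
    """Three-class target based on the ratio after/before (thresholds are assumed)."""
    if before <= 0:
        raise ValueError("need a positive baseline to compute the ratio")
    ratio = after / before
    if ratio >= high_from:
        return "high increase"
    if ratio >= mid_from:
        return "mid increase"
    return "low increase"

increase = {u: increase_class(views_before[u], views_after[u]) for u in views_after}
print(above_average)
print(increase)
```

Once the target is categorical, the out-of-time validation proceeds exactly as in the binary case described earlier.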

Case B: Leave it numerical (linear regression)?

Examples of this case can be:

  • Predicting the concentration of a certain substance in the blood.
  • Predicting page visits.
  • Time series analysis.

In these cases we also have the difference between "what was expected" and "what actually happened".

This difference can take any value. It is the error, also called the residual.

If the model is good, this error should be white noise [1]. It follows a normal curve when, mainly, some logical properties hold:

  • The error should be centered around 0 (the model's error must tend to 0).
  • The standard deviation of this error must be finite (to avoid unpredictable outliers).
  • There must be no correlation between the errors.
  • Normal distribution: expect the majority of errors around 0, with the biggest ones in a smaller proportion as the error increases (the likelihood of finding bigger errors decreases exponentially).
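The properties above can be checked roughly on the out-of-time residuals. The sketch below is only illustrative: the actual and predicted values are invented, the function name `residual_summary` is hypothetical, and with so few points none of these figures is a formal statistical test.

```python
from statistics import mean, pstdev

# Hypothetical out-of-time values: what the model predicted vs. what happened.
predicted = [10, 12, 9, 11, 10, 13, 8, 11]
actual = [11, 12, 7, 12, 12, 12, 8, 10]

def residual_summary(actual, predicted):
    """Rough checks on the residuals: center, spread, lag-1 correlation, shape."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    mu = mean(residuals)
    sigma = pstdev(residuals)
    centered = [r - mu for r in residuals]
    denom = sum(c * c for c in centered)
    # Lag-1 autocorrelation: correlation of each error with the next one.
    lag1 = (
        sum(centered[i] * centered[i + 1] for i in range(len(centered) - 1)) / denom
        if denom else 0.0
    )
    # Share of errors within one standard deviation of the mean
    # (about 68% would be expected under a normal distribution).
    within_one_sd = sum(abs(c) <= sigma for c in centered) / len(residuals)
    return {"mean": mu, "sd": sigma, "lag1_autocorr": lag1, "within_one_sd": within_one_sd}

summary = residual_summary(actual, predicted)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A mean far from 0 suggests a biased model, a lag-1 autocorrelation far from 0 suggests the errors still contain a pattern the model missed, and a share within one standard deviation far from 68% hints that the errors are not normally distributed.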

Error curve following a normal distribution

Final thoughts

  • Out-of-Time Validation is a powerful tool to simulate running the model in production, with data that may not need to depend on sampling.

  • Error analysis is a big topic in data science. Time to go to the next chapter, which tries to cover the key concepts on this: Knowing the error.

