Methodological Aspects on Model Validation

Out-of-Time Validation

Error in predictive models

What's this about?

Once you've built a predictive model, how sure are you that it captured general patterns and not just the data it has seen (overfitting)?

Will it perform well when it is in production, running live? What is the expected error?

What sort of data?

If the data is generated over time and, let's say, every day you have new cases such as "page visits on a website" or "new patients arriving at a medical center", one strong validation is the Out-Of-Time approach.

Out-Of-Time Validation Example

How to?

Imagine you are building the model on Jan-01, and to build it you use all the data up to Oct-31. Between these two dates there are 2 months.
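The date-based split can be sketched in a few lines of Python. This is only an illustration: the `events` records, the field layout and the years are made up (the text gives no year), and only the Oct-31 cutoff and the Jan-01 evaluation day come from the example above.

```python
from datetime import date

# Hypothetical event log: (case_id, event_date, page_views)
events = [
    ("u1", date(2023, 9, 15), 12),
    ("u2", date(2023, 10, 31), 7),
    ("u1", date(2023, 11, 20), 30),
    ("u3", date(2023, 12, 5), 4),
    ("u2", date(2024, 1, 1), 9),
]

cutoff = date(2023, 10, 31)     # last day of data the model may see (assumed year)
evaluation = date(2024, 1, 1)   # day on which the outcomes are checked (assumed year)

# Training data: everything up to and including the cutoff.
train = [e for e in events if e[1] <= cutoff]

# Out-of-time window: strictly after the cutoff, up to the evaluation day.
out_of_time = [e for e in events if cutoff < e[1] <= evaluation]

gap_days = (evaluation - cutoff).days
print(len(train), "training rows,", len(out_of_time), "out-of-time rows,", gap_days, "days apart")
```

The key point is that nothing dated after the cutoff is allowed into `train`; the out-of-time rows are used only to check what happened afterwards.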

When predicting a binary (two-class) or multi-class variable, it's quite straightforward: with the model we've built (with data <= Oct-31) we score the data as of that exact day, and then we measure how the users/patients/persons/cases evolved during those two months.

Since the output of a binary model should be a number indicating the likelihood of each case belonging to a certain class (see the Scoring Data chapter), you test what the model "said" on Oct-31 against what really happened on Jan-01.
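One way to put a number on "what the model said" versus "what really happened" is to check how often a case that ended up positive received a higher score than one that stayed negative. The sketch below does this with the area under the ROC curve (AUC); the metric is a choice made for this illustration, not something the chapter prescribes, and the scores and outcomes are made-up values.

```python
# Hypothetical scores produced on Oct-31 (likelihood of the positive class).
scores_oct31 = {"a": 0.91, "b": 0.15, "c": 0.80, "d": 0.08, "e": 0.77}

# Hypothetical outcomes observed on Jan-01 (1 = positive, 0 = negative).
actual_jan01 = {"a": 1, "b": 0, "c": 0, "d": 0, "e": 1}

def auc(scores, actual):
    """Share of (positive, negative) pairs where the positive case scored higher.

    Ties count as half a win.
    """
    pos = [scores[k] for k in scores if actual[k] == 1]
    neg = [scores[k] for k in scores if actual[k] == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(round(auc(scores_oct31, actual_jan01), 3))
```

A value close to 1 means the Oct-31 scores ranked the future positives ahead of the future negatives; a value near 0.5 means the ranking was no better than random.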

So the validation workflow looks something like...

Model performance workflow

Using Gain and Lift Analysis

Gain and lift analysis

This analysis, explained in another chapter of the book, can be applied on top of the out-of-time validation.

Keeping only those cases that were negative on Oct-31, we take the score returned by the model on that date, and the target variable is the value those cases actually had on Jan-01.
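As a rough sketch of that procedure, the snippet below filters out the cases that were already positive on Oct-31 and then builds a small gain and lift table. It uses the usual definitions (cumulative gain is the percentage of all positives captured in the top-scored share of cases; lift is that gain divided by the share of cases). The `cases` data and the `gain_lift` helper are hypothetical and written only for this illustration.

```python
# Hypothetical cases: (positive_on_oct31, score_on_oct31, positive_on_jan01)
cases = [
    (0, 0.95, 1), (0, 0.90, 1), (0, 0.80, 0), (0, 0.70, 1), (0, 0.60, 0),
    (0, 0.50, 0), (0, 0.40, 1), (0, 0.30, 0), (0, 0.20, 0), (0, 0.10, 0),
    (1, 0.99, 1), (1, 0.85, 1),  # already positive on Oct-31, so excluded
]

# Keep only the cases that were negative on Oct-31.
eligible = [(score, outcome) for was_positive, score, outcome in cases if not was_positive]

def gain_lift(pairs, n_buckets=5):
    """Return (population %, cumulative gain %, lift) per bucket, highest scores first."""
    ranked = sorted(pairs, key=lambda p: p[0], reverse=True)
    n = len(ranked)
    total_positives = sum(outcome for _, outcome in ranked)
    table = []
    for b in range(1, n_buckets + 1):
        top = ranked[: n * b // n_buckets]
        population_pct = 100 * len(top) / n
        gain_pct = 100 * sum(outcome for _, outcome in top) / total_positives
        table.append((population_pct, gain_pct, gain_pct / population_pct))
    return table

for population_pct, gain_pct, lift in gain_lift(eligible):
    print(f"top {population_pct:.0f}%: gain {gain_pct:.0f}%, lift {lift:.2f}")
```

In this toy data the top 20% of scored cases capture 50% of the cases that turned positive by Jan-01, which is a lift of 2.5 over picking cases at random.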

How about a numerical target variable?

Now common sense and/or the business need play a bigger role. A numerical outcome can take any value and can increase or decrease over time, so we may have to consider these 2 scenarios to help us decide what we consider success.

Example scenario: You are measuring the usage of a certain app; normally, as the days pass, users use it more.

Case A: Convert the numerical target into categorical?

An app user can become more active over time, measured in page views, so to do an out-of-time validation we would predict whether the user visits more than the average, or enough to be in the top 10%, or twice as much as he/she did up to the model's creation day, etc.

Examples of this case can be:

  • Binary: "yes/no" above average.
  • Multi-class: "low increase"/"mid increase"/"high increase"
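Both conversions can be sketched as follows. The user ids, page-view counts and the 1.25 / 2.0 ratio thresholds are invented for the example; in practice the cut points come from the business need.

```python
from statistics import mean

# Hypothetical page views per user up to the model's creation day...
views_before = {"u1": 20, "u2": 5, "u3": 20, "u4": 10, "u5": 23}
# ...and during the out-of-time window.
views_after = {"u1": 40, "u2": 6, "u3": 22, "u4": 13, "u5": 69}

# Binary target: did the user visit more than the average user?
average = mean(views_after.values())
above_average = {u: "yes" if v > average else "no" for u, v in views_after.items()}

def increase_label(before, after, mid_from=1.25, high_from=2.0):
    """Bucket the growth ratio into three classes (thresholds are assumptions)."""
    ratio = after / before
    if ratio < mid_from:
        return "low increase"
    if ratio < high_from:
        return "mid increase"
    return "high increase"

# Multi-class target: how much did each user's activity grow?
increase = {u: increase_label(views_before[u], views_after[u]) for u in views_after}

print(above_average)
print(increase)
```

Once the target is categorical, the out-of-time check works exactly as in the binary case: score on the cutoff day, then compare with the label observed later.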

Case B: Leave it numerical (regression)?

Examples of this case can be:

  • Predicting the concentration of certain substance in blood.
  • Predicting page visits.
  • Time series analysis.

In these cases we also have the difference between "what was expected" and "what it is".

This difference can take any value. This is the error, or residual.

If the model is good, this error should be white noise [1]. It follows a normal curve when, mainly, these logical properties hold:

  • The error should be centered around 0 (the model's error must tend to 0).
  • The standard deviation of this error must be finite (to avoid unpredictable outliers).
  • There must be no correlation between the errors.
  • Normal distribution: expect the majority of errors around 0, with bigger errors appearing in a smaller proportion as the error grows (the likelihood of finding bigger errors decreases exponentially).
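The properties listed above can be checked roughly with a few summary numbers. The sketch below is not a formal statistical test: the residuals are simulated Gaussian noise standing in for real `actual - predicted` values, and the helper functions are hypothetical names written for this example.

```python
import random
from statistics import mean, stdev

def lag1_autocorrelation(values):
    """Correlation between each error and the one that follows it."""
    m = mean(values)
    num = sum((a - m) * (b - m) for a, b in zip(values, values[1:]))
    den = sum((v - m) ** 2 for v in values)
    return num / den

def share_within(values, k):
    """Fraction of values within k standard deviations of their mean."""
    m, s = mean(values), stdev(values)
    return sum(abs(v - m) <= k * s for v in values) / len(values)

# Hypothetical residuals (actual - predicted), simulated here as Gaussian noise.
rng = random.Random(42)
residuals = [rng.gauss(0, 1) for _ in range(1000)]

print("mean (should be near 0):", round(mean(residuals), 3))
print("standard deviation:", round(stdev(residuals), 3))
print("lag-1 autocorrelation (should be near 0):", round(lag1_autocorrelation(residuals), 3))
for k in (1, 2, 3):
    print(f"share within {k} sd:", round(share_within(residuals, k), 3))
```

For normally distributed errors, roughly 68%, 95% and 99.7% of the values fall within 1, 2 and 3 standard deviations. Shares far from those figures, a mean far from 0, or a clearly non-zero autocorrelation suggest the model is leaving some pattern unexplained.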

Error curve following a normal distribution

Final thoughts

  • Out-of-Time Validation is a strong validation tool that simulates running the model in production, with data that may not need to depend on sampling.

  • Error analysis is a big chapter in data science. Time to go to the next chapter, which will try to cover key concepts on this: Knowing the error.

