Long-Ji Lin delivered a very interesting presentation on how he has tackled the key issues facing predictive models for online ad placement. These challenges are wide-ranging and significant:
- latency
- scalability under cost control
- curse of dimensionality – very large number of input variables, very few positive cases
- actual effectiveness – with very few positive cases to learn from
- environment challenges – drifts in taste, impact of unpredictable events
Interestingly, these are very similar to the challenges faced by fraud detection systems that I am more familiar with.
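The "drift" challenge is worth dwelling on: a model trained on last month's behavior quietly degrades as tastes shift or an unpredictable event changes the traffic mix. One common way to catch this (my sketch, not something from the talk; the window size and tolerance are illustrative) is to watch the live positive rate against the rate the model was trained on and flag when they diverge:

```python
from collections import deque

class DriftMonitor:
    """Flag when the live positive (e.g. conversion) rate drifts away from
    the baseline rate the model was trained on -- a cue to rebuild.
    Window size and tolerance are illustrative knobs, not from the talk."""

    def __init__(self, baseline_rate, window=1000, tolerance=2.0):
        self.baseline = baseline_rate
        self.window = deque(maxlen=window)   # rolling window of recent labels
        self.tolerance = tolerance

    def observe(self, label):
        """Record one outcome (1 = positive); return True if drift is detected."""
        self.window.append(label)
        if len(self.window) < self.window.maxlen:
            return False                     # not enough data yet
        live_rate = sum(self.window) / len(self.window)
        return abs(live_rate - self.baseline) > self.tolerance * self.baseline
```

For example, a monitor built with `baseline_rate=0.01` stays quiet while the live stream converts at about 1%, but fires once the observed rate moves past the tolerance band.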
Long-Ji noted that the actual choice of algorithm for building these models is significantly less important than handling the challenges above.
The solutions mentioned are also similar to those in the fraud industry:
- Downsample the negative data (heuristic: 1 positive case for every 5 to 100 negative cases)
- Use near-conversions as positive data – such as putting an item in the shopping cart
- Use pooled models
- Inject domain knowledge you may have into the models
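The first item deserves a concrete illustration. Keeping every positive case and only a fixed ratio of negatives makes training tractable, but it biases the model's predicted probabilities upward, so the predictions have to be corrected by the sampling rate afterwards. A minimal sketch (the function names and the ratio knob are mine, mapping to the 1:5–1:100 heuristic above):

```python
import random

def downsample(events, neg_per_pos=10, seed=0):
    """Keep every positive event; sample negatives down to a fixed ratio.

    `events` is a list of (features, label) pairs with label 1 = conversion.
    `neg_per_pos` is an illustrative knob for the 1:5 .. 1:100 range.
    """
    rng = random.Random(seed)
    positives = [e for e in events if e[1] == 1]
    negatives = [e for e in events if e[1] == 0]
    k = min(len(negatives), neg_per_pos * len(positives))
    return positives + rng.sample(negatives, k)

def correct_probability(p_biased, sampling_rate):
    """Undo the bias: a model trained on downsampled negatives overstates
    the positive rate.  `sampling_rate` = kept negatives / total negatives."""
    odds = p_biased / (1.0 - p_biased)
    odds *= sampling_rate            # deflate the odds by the sampling rate
    return odds / (1.0 + odds)
```

So a biased prediction of 0.5 from a model trained with only 10% of the negatives corrects back down to roughly 0.09 – the kind of adjustment needed before the score can drive a real bidding or placement decision.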
To address the curse of dimensionality (50K features per campaign/model), the solutions are also similar:
- pruning (tree pruning, connection pruning, L1/L2 norm regularization)
- but even better, spend time/energy finding the good features!
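The reason L1 regularization shows up alongside pruning is that its proximal operator – soft-thresholding – drives small weights to exactly zero, effectively deleting features from the model. A minimal sketch of that operator (my illustration, not code from the talk):

```python
def soft_threshold(w, lam):
    """Proximal operator of the L1 norm: shrink a weight toward zero by
    lam, and set it to exactly zero if its magnitude is below lam.
    Applied at each optimization step, this prunes weak features."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

weights = [0.8, -0.05, 0.02, -1.3, 0.0001]
pruned = [soft_threshold(w, 0.1) for w in weights]
# weights with magnitude below 0.1 become exactly zero
```

With 50K candidate features per model, collapsing most weights to exact zeros is what keeps the model small enough to score within the latency budget.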
The scale is Big Data scale:
- 20 billion ad requests
- 100 million ad views
- 1 TB of data
- 1,000 advertisers
- 20 trillion decisions
The models are rebuilt frequently. One key point is that these models often need to be tested on live traffic rather than on historical data in order to really assess their quality.
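Testing on live traffic usually means routing a small, deterministic slice of requests to the candidate model while the incumbent handles the rest. A minimal sketch of such a traffic split (the names "champion"/"challenger" and the 5% share are my illustration, not details from the talk):

```python
import hashlib

def route_request(request_id: str, challenger_share: float = 0.05) -> str:
    """Deterministic traffic split: a stable hash of the request id maps
    a fixed slice of live traffic to the challenger model, so the same
    request id always gets the same model.  Share is illustrative."""
    digest = hashlib.md5(request_id.encode()).digest()
    bucket = digest[0] / 255.0           # stable pseudo-uniform value in [0, 1]
    return "challenger" if bucket < challenger_share else "champion"
```

Hashing the id (rather than flipping a coin per request) keeps the assignment stable across retries and servers, which matters when you are comparing conversion rates between the two models.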
The systems Long-Ji works with build and test the models using the usual Big Data suspects:
- Hadoop
- Hive
- HBase