Machine learning is everywhere, but the “how” is not well understood by the masses. Andrew Ng, professor at Stanford University, visits the conference to give an introduction.
When we look at a picture, our brain automatically interprets the information and we recognize it. For a computer, the work is not that simple.
Looking at pixels individually would be tedious, to say the least. The size of the problem and the variability of subjects make it almost impossible to find interesting correlations at that level. That being said, if we break the problem down by applying algorithms that identify smaller components like wheels or handlebars, then the correlation becomes much simpler. The presence of both wheels and handlebars is a decent predictor of a motorcycle, although it could also be a wheelbarrow or dumbbells I suppose. But confusion with trees and pasta would be limited I guess.
This technique is called feature extraction. You look for features you can estimate and use those features to teach the system to detect the target subject.
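To make the idea concrete, here is a minimal sketch in Python. It is my own illustration, not something shown in the talk: the part detectors are assumed to exist already and are represented by plain lists of part names, the `extract_features` and `train_perceptron` helpers are hypothetical, and a simple perceptron stands in for whatever model you would really train.

```python
def extract_features(parts):
    """Map a list of detected part names to a small numeric feature vector."""
    return [parts.count("wheel"), 1 if "handlebar" in parts else 0]


def predict(weights, bias, features):
    """Return 1 (motorcycle) when the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


def train_perceptron(samples, max_epochs=10000):
    """Classic perceptron rule: adjust the weights after every mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in samples:
            error = label - predict(weights, bias, features)
            if error != 0:
                mistakes += 1
                weights = [w + error * x for w, x in zip(weights, features)]
                bias += error
        if mistakes == 0:  # every training example is classified correctly
            break
    return weights, bias


# Made-up training data: (parts found in the picture, is it a motorcycle?)
TRAINING = [
    (["wheel", "wheel", "handlebar"], 1),  # motorcycle
    (["wheel", "handlebar"], 0),           # wheelbarrow
    (["wheel"] * 4, 0),                    # car
    ([], 0),                               # tree or pasta
]
samples = [(extract_features(parts), label) for parts, label in TRAINING]
weights, bias = train_perceptron(samples)
```

The model never sees a pixel; it only sees two numbers per picture, which is exactly why the correlation becomes easy to find.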
Hierarchical sparse coding lets you layer those levels of abstraction: the system learns to detect basic patterns that, assembled together, allow the detection of “bigger” pieces like an eye or an ear, which in turn can be aggregated to allow face detection, for example.
This is not specific to image detection. Dr. Ng explained how it could be used for audio or video.
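For readers who want a feel for the “sparse” part, here is a rough sketch of a single layer, again my own and not the algorithm from the talk. It uses matching pursuit, a simple greedy method, to describe a tiny 2x2 image patch with as few basis patterns as possible. The four patterns in `ATOMS` are hand-picked assumptions; in real sparse coding the dictionary is learned from data, and a hierarchy would feed the resulting coefficients into the next layer.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, max_atoms=2, tol=1e-9):
    """Greedily approximate `signal` as a sparse weighted sum of unit-norm atoms."""
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(max_atoms):
        # Pick the atom most correlated with what is still unexplained.
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(scores[i]))
        if abs(scores[best]) < tol:
            break  # nothing left to explain
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs, residual


# Hand-picked dictionary for flattened 2x2 patches (an assumption for this sketch).
ATOMS = [normalize(v) for v in (
    [1, 1, 0, 0],    # bright top row
    [0, 0, 1, 1],    # bright bottom row
    [1, -1, 0, 0],   # left/right contrast in the top row
    [0, 0, 1, -1],   # left/right contrast in the bottom row
)]

# A patch built from two of the patterns; the other two should stay unused.
patch = [2 * a + 0.5 * b for a, b in zip(ATOMS[0], ATOMS[3])]
coeffs, residual = matching_pursuit(patch, ATOMS, max_atoms=2)
```

Only two of the four coefficients end up non-zero, and that short list of active patterns is the kind of compact description the next layer can build on.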
Analytics for business transactions use a similar technique to facilitate the creation of predictive models, although the underlying algorithms are different: the algorithm presented by Andrew is better suited to perceptual data. I talked about some of those principles at Rules Fest 2008 in my introduction to Predictive Analytics, but we did not blog the show back then; I do not have a link to offer…
Let me elaborate a bit on the lingo the modelers use in our space. In Risk Management, features are also called Variables, Calculations or Characteristics. Looking at tons of raw transactions, you may have a hard time detecting patterns of fraud, for example. But once you aggregate the data to look at the “feature” that is equal to the number of transactions in the past 3 months for travel expenses, or the “feature” that is equal to the time on books in months, you have the opportunity to detect interesting correlations. Those features are then used to train the models (neural nets, linear models, etc.).
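Here is a small Python sketch of what computing those two features could look like. The record layout (`category`, `date`), the function names and the choice of 90 days as “the past 3 months” are all assumptions I made for illustration; real modeling teams define these windows in their own ways.

```python
from datetime import date, timedelta


def months_between(start, end):
    """Whole calendar months elapsed from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the current month is not complete yet
    return months


def account_features(transactions, opened_on, as_of, window_days=90):
    """Aggregate raw transactions into the two features discussed above."""
    cutoff = as_of - timedelta(days=window_days)
    recent_travel = [
        t for t in transactions
        if t["category"] == "travel" and cutoff < t["date"] <= as_of
    ]
    return {
        "travel_txn_count_3m": len(recent_travel),
        "time_on_books_months": months_between(opened_on, as_of),
    }


# Made-up account history, for illustration only.
transactions = [
    {"category": "travel", "date": date(2024, 3, 10)},
    {"category": "travel", "date": date(2024, 2, 1)},
    {"category": "grocery", "date": date(2024, 3, 20)},
    {"category": "travel", "date": date(2023, 11, 5)},  # outside the window
]
features = account_features(
    transactions, opened_on=date(2023, 1, 15), as_of=date(2024, 3, 31)
)
print(features)
```

Each account collapses into one row of features like this, and those rows are what the neural net or linear model is trained on.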
Andrew debunks typical criticisms:
- Isn’t it better to encode prior knowledge about the structure of images (audio, etc.)? Linguists argued similarly a couple of decades ago, but Google’s success with automated translation speaks for itself (no pun intended)
- Unsupervised feature learning cannot currently do X… The list is long, but those barriers keep falling one by one as technology advances
The talk leans heavily toward the AI-as-data side rather than the AI-as-knowledge side. It is a good counterbalance to the opposite view offered by Paul Haley on Monday. As you know, my vision is actually hybrid: I believe that data and experts should share the spotlight, each with its own characteristics, and that mixing both can deliver superior value in some cases. Carlos’s 101 session on Analytics for Rules Writers is a great resource to get started too. I will share the link once the materials are posted.