Tom Pinckney has written a great post on GigaOm about the new kinds of processing companies have to do to create a highly personalized experience for users.
My comment is still in moderation, so in the spirit of the real-time Web, here it is:
Tom, great post. The problem is spot on. The solution makes one key assumption — that you can’t approach the problem with pre-computation.
That’s true for a service like Hunch which sees a tiny amount of data about me, especially when I first show up. Given how little you know about me, you have two choices: either you pre-compute info in an impossibly large space, which is impractical, just as you describe, or you do the type of real-time processing which is much more effective. So far so good.
But that’s not necessarily the best way to approach the problem from the standpoint of someone who has a lot of data about me, e.g., Google or Facebook or even Amazon. The set of Internet-connected humans is small from a computational standpoint, and the metadata trail we leave is growing at a much slower rate than compute and storage capacity. Pre-computing starting with 100 dimensions doesn’t work. Pre-computing starting with a few billion humans works really well, if you have a lot of data on the humans.
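To put rough numbers on that contrast, here is a back-of-the-envelope sketch. The figures are my own assumptions for illustration, not anything from Tom's post: each of the 100 dimensions is treated as a simple yes/no answer, there are about 3 billion people online, and a pre-computed profile takes about 1 KB per person.

```python
# Back-of-the-envelope comparison of the two pre-computation spaces.
# Every number below is an illustrative assumption, not a measured value.

# Assumption: each of the 100 dimensions is a yes/no answer.
DIMENSIONS = 100
answer_combinations = 2 ** DIMENSIONS

# Assumption: roughly 3 billion Internet-connected people,
# with about 1 KB of pre-computed results stored per person.
USERS = 3_000_000_000
BYTES_PER_PROFILE = 1_000

per_user_storage_tb = USERS * BYTES_PER_PROFILE / 1e12

print(f"answer combinations to pre-compute: {answer_combinations:.2e}")
print(f"combinations per online human:      {answer_combinations / USERS:.2e}")
print(f"storage for one profile per human:  {per_user_storage_tb:.0f} TB")
```

Under these assumptions the space of answer combinations is astronomically larger than the number of people, while one stored profile per person fits in a few terabytes. Change the assumptions and the exact figures move, but the gap stays enormous.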
This is one of the fundamental advantages Amazon, FB, GOOG and others have compared to point services such as Hunch. It’s not a fair fight. So you have to innovate like crazy to compensate. Rock on!
People who do data analysis and machine learning have learned one thing through experience (can someone claim it as their Law?): to solve a complex problem with little data, you need fancy technology, but with a lot of data you may be able to solve the same problem with much simpler technology.
So, the big guys have a fundamental advantage. However, there are ways to level the playing field. It starts with users owning their data and giving it to a trusted third party that can do the same type of pre-computation the big guys can do. Then you put the right access control mechanisms in place so users can share this info with third-party services like Hunch.
Facebook already does something like this through F8. Facebook apps get access to a lot of valuable information not available through other means. But Facebook doesn’t share any really interesting pre-computed analytics, at least not right now. They are the smartest of the big guy bunch so far. With the exception of the string of privacy faux pas, I’ve been consistently impressed with their strategy.