Over Memorial Day weekend I wanted to play with CrunchBase data. I wrote a quick bash script that pulled data from CrunchBase and put it in MongoDB, one of the new databases from the NoSQL movement. In the process, I noticed I was programming file operations. It was a strange feeling. The last time I wrote code that manipulated files was a decade ago. For other projects, the data has been either in a database or a web service somewhere. Why would I put anything in a file? For that matter, why would I want to deal with hardware constructs such as network ports? For an application developer, as opposed to an infrastructure developer, all these vestiges of decades-old operating system architecture add little value. In fact, they cause deployment and operational headaches, and lots of them. If I had taken almost any other approach to the problem using the tools I’m familiar with, I would have performed HTTP operations against the REST-based web services interface for CrunchBase and then used HTTP to send the data to MongoDB. My code would never have operated against a file or any other OS-level construct directly.
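For the curious, here is a minimal sketch of what that file-free approach might look like in Python. It is not the script I wrote. Both URLs are hypothetical placeholders, and the sketch assumes the data store sits behind some HTTP interface that accepts a JSON POST, which is an assumption rather than a description of any particular MongoDB setup. The `copy_record` function and its `get`/`post` parameters are names made up for illustration.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical endpoints, for illustration only; neither URL comes from a real service.
SOURCE_URL = "https://api.example.com/companies/acme.json"
SINK_URL = "http://localhost:8080/companies"


def http_get(url: str) -> bytes:
    """Fetch the raw response body with an HTTP GET."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def http_post(url: str, body: bytes) -> None:
    """Send a JSON body with an HTTP POST."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request):
        pass  # The response body is not needed here.


def copy_record(
    source_url: str,
    sink_url: str,
    get: Callable[[str], bytes] = http_get,
    post: Callable[[str, bytes], None] = http_post,
) -> dict:
    """Move one JSON record from a web service to a data store over HTTP.

    The record lives only in memory; no file is opened at any point.
    """
    record = json.loads(get(source_url))
    post(sink_url, json.dumps(record).encode("utf-8"))
    return record
```

Calling `copy_record(SOURCE_URL, SINK_URL)` would do the whole round trip. The `get` and `post` parameters exist only so the transport can be swapped out, for example with in-memory stand-ins while testing.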
This experience got me thinking about the evolution of application development and that led to a guest post on Om Malik’s GigaOm on the migration of cloud computing from infrastructure-as-a-service (IaaS) to platform-as-a-service (PaaS).
Most assume that server virtualization as we know it today is a fundamental enabler of the cloud, but it is only a crutch we need until cloud-based application platforms mature to the point where applications are built and deployed without any reference to current notions of servers and operating systems.
As I mention in the post, I’m quite impressed with VMware’s willingness to push forward in this direction. Listening to Paul Maritz, CEO of VMware, speak at Structure, it’s clear he’s aiming very far and has the leadership potential to get there. More than a decade ago, I used to listen very carefully to what he said because he owned several large groups inside Microsoft, some of which loved my first startup (Allaire) because we pulled through many thousands of Windows & SQL servers and some of which hated us because we had the best Web development environment on Windows. He’s back on the list of execs I’ll follow carefully.
This is a big opportunity for Amazon to go up the stack at the right time. It’s good from an economics standpoint as it can increase margins in two ways: (a) by improving efficiency and (b) by switching pricing to more value-based, application-related metrics. AWS has already gone up the stack into data storage, management and analytics. I doubt they’d miss the opportunity to become a meaningful PaaS provider at the right time. Breadth of platform support and platform expertise will be interesting challenges to resolve.
The other interesting trend to watch for here is that a reduction in the capabilities of the server virtualization tier increases the value of intelligent networks, one reason why Cisco smartly grabbed @lewtucker as CTO of their emerging cloud group and has security gurus like @Beaker on board.
The comments have raised several questions:
- Is security harder with PaaS? In the short run, yes, but only because we have less experience with shared hosting on locked-down PaaS platforms. Google App Engine, Heroku and others are leading the way here. Werner Vogels said that he trusts hypervisors to provide isolation. It will take a while for big cloud providers such as AWS to trust PaaS implementations equally. In fact, it’s likely they’ll build their own as Google has. Cisco badly wants to help, too.
- How does IT rebill in enterprises? Having a simpler hypervisor or no hypervisor at all doesn’t mean you can’t collect HW usage metrics and decide how to apportion them to simultaneous users of the hardware. Even better, you can measure and rebill based on other, more business-value-oriented metrics which could give the IT organization some budgetary slack. It would certainly give them more deployment flexibility both inside and outside of their data centers.
Soon we will be able to throw away the server virtualization crutch and, like in that memorable moment from Forrest Gump, we will be able to run leaner and more scalable applications in the cloud on next-generation platforms-as-a-service. For the time being, my call to action is for application developers to stop writing code that directly touches any hardware or operating system objects and try the current generation of platforms-as-a-service.
Developers out there building applications, give me a shout about the last time you programmed against a file.
Let me know what you think in the comments or on Twitter @simeons.
Programmed against a file? Just under 36 hours ago at Yahoo’s Open Hack Day in Bangalore, importing data from a CSV file into MongoDB. The CSV itself was a “fake”, cut-and-pasted from a MySQL “SELECT *” statement and edited in TextMate where I did a regular expression find-and-replace to compress spaces down to one, so that I was left with ‘|’ separated values that I could treat as CSV.
I have to deal with files all the time.
Interesting. And you are dealing with applications as opposed to infrastructure?
Who writes the data into the files? Is there an easier way to get to the data using an API?
Dealing with files is almost always a one-off job. I see what you mean — I can’t think of many modern scenarios where physical files have to be passed around with data.
Off the top of my head:
1. Export to Excel (a recurring request from corporate users; so far haven’t had to *import* from Excel. That would be a nightmare)
2. Print shops where files to be printed are placed in shared folders, accessed over the local network. These guys stand in for anyone dealing with resource files too large to be moved around much.
3. A job I’m currently speccing out, where a subset of many small resource files has to be assembled into a customised package for each recipient.