Angel investing strategies

My friend Jerry Neumann wrote a great post on angel investing strategies, dissecting truth and myth about different betting strategies and sharing his own approach.

The question of luck came up and a commenter linked to my work on data-driven patterns of successful angel investing with the subtext that being data-driven implies index investing. That’s certainly not what I believe or recommend.

The goal of my Monte Carlo analysis was to shine a light on the main flaw I’ve seen in casual angel investing, which is the angel death spiral:

  1. Make a few relatively random investments
  2. Lose money
  3. Become disillusioned
  4. Give up angel investing
  5. Tell all your friends angel investing is terrible

Well, you can’t expect a quick win out of a highly skewed distribution, and startup exits are just that. That’s just math, and math is rather unemotional about these things.

You can get out of the angel death spiral in one of two ways. You can take the exit distribution for what it is. In that case, you need many more shots on goal (dozens of investments) to have a much better chance of a good outcome. Alternatively, you can try to pick your investment opportunities from a different, better distribution. That’s what I like to do and this is what Jerry is advocating.
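The point about shots on goal can be made concrete with a quick simulation. The sketch below is not my original Monte Carlo analysis; the exit distribution in it is invented purely for illustration (60% total losses, yet a 2x expected multiple per investment):

```ruby
# Hypothetical exit distribution: 60% chance of 0x, 30% of 2x, 9% of 10x,
# 1% of 50x. Expected multiple per investment: 2x.
OUTCOMES = [[0.60, 0.0], [0.90, 2.0], [0.99, 10.0], [1.00, 50.0]]

# Average return multiple of an equal-weighted portfolio.
def simulate_multiple(portfolio_size, rng)
  total = portfolio_size.times.sum do
    r = rng.rand
    OUTCOMES.find { |cum, _| r <= cum }.last
  end
  total / portfolio_size
end

# Fraction of simulated portfolios that lose money overall.
def p_losing_money(portfolio_size, trials: 10_000, seed: 42)
  rng = Random.new(seed)
  trials.times.count { simulate_multiple(portfolio_size, rng) < 1.0 } / trials.to_f
end

p_losing_money(3)   # more than half of small portfolios lose money
p_losing_money(30)  # far fewer large portfolios do
```

With these made-up numbers, a 3-investment portfolio loses money more than half the time, while a 30-investment portfolio does so far less often, even though every investment is drawn from the same distribution.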

The main driver of returns for angel investors is the quality of the deal flow you can win. Why? Because it changes the shape of your personal exit distribution and, in most cases not involving unicorn hunting, improves your outcomes at any portfolio size.

As an investor, you sell cash + you and buy equity. To see better deals and win them you need to increase the value of “you.” After all, anyone’s cash is just as good as everyone else’s. The easiest way to do this is via deep, real, current expertise and relationships that are critical to the success of the companies you want to invest in, backed by a reputation that you are a helpful, easy-to-work-with angel. One way to maximize the chance of this being true is to follow some of Jerry’s advice:

  • Invest in markets that you know
  • Make multiple investments in such markets
  • Help your companies

There is a bootstrap problem, however, when new markets are concerned. How do you get to know them? Well, one way to do it is to make a number of investments in a new space. In this case, your investments have dual value: in addition to the financial return expectations (which should be reduced) you have the benefit of learning. Yes, it can be an expensive way to learn but it may be well worth it when you consider the forward benefits that affect the quality of your deal flow and your ability to win deals.

As an aside, I’ve always advised angels to not invest just for financial return. Do angel investing to increase your overall utility (in the multi-faceted economic theory sense) and do it so that it generates a return you are happy with.

In summary:

  1. Don’t attempt to pick unicorns as an angel.
  2. Where you can get high-quality deal flow that you can win, do a smaller number of deals.
  3. Where needed, and if you can afford it, use higher-volume investing as a way to signal interest in a market and to learn about it so that you can get higher-quality deal flow.

JSON and JSONlines from the command line

At Swoop we have many terabytes of JSON-like data in MongoDB, Redis, ElasticSearch, HDFS/Hadoop and even Amazon Redshift. While the internal representations are typically not JSON but BSON, MsgPack or native encodings, when it comes time to move large amounts of data for easy ad hoc processing I often end up using JSON and its bulk cousin, JSONlines. This post is about what you can quickly do with this type of data from the command line.

The best JSON(lines) command line tools

There has been a marked increase in the number of powerful & robust tools for validating and manipulating JSON and JSONlines from the command line. My favorites are:

  • jq: a blazingly fast, C-based stream processor for JSON documents with an easy yet powerful language. Think of it as sed and awk for JSON but without the 1970s syntax. Simple tasks are trivial. Powerful tasks are possible. The syntax is intuitive. Check out the tutorial and manual. Because of its stream orientation and speed, jq is the most natural fit when processing large amounts of JSONlines data. If you want to push the boundaries of what is sane to do on the command line there are conditionals, variables and UDFs.
  • underscore-cli: this is the Swiss Army knife for manipulating JSON on the command line. Based on Node.js, it supports JavaScript and CoffeeScript expressions with built-in functional programming primitives from the underscore.js library, relatively easy JSON traversal via json:select and more. This also is the best tool for debugging JSON data because of the multitude of output formats. A special plus in my book is that underscore-cli supports MsgPack, which we use in real-time flows and inside memory-constrained caches.
  • jsonpath: Ruby-based implementation of JSONPath with a corresponding command line tool. Speedy it is not but it’s great when you want JSONPath compatibility or can reuse existing expressions. There are some neat features such as pattern-based tree replace operations.
  • json (a.k.a., jsontool): another tool based on Node.js. Not as rich as underscore-cli but has a couple of occasionally useful features having to do with merging and grouping of documents. This tool also has a simple validation-only mode, which is convenient.

Keep in mind that you can modify/extend JSON data with these tools, not just transform it. jsontool can edit documents in place from the command line, something that can be useful, for example, for quickly updating properties in JSON config files.

JSON and 64-bit (BIGINT) numbers

JSON has undefined (as in implementation-specific) semantics when it comes to dealing with 64-bit integers. The problem stems from the fact that JavaScript does not have this data type. There are Python, Ruby and Java JSON libraries that have no problem with 8-byte integers but I’d be suspicious of any Node.js implementation. If you have this type of data, test the edge cases with your tool of choice.
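A quick way to see the edge case (Ruby here, since its JSON library keeps arbitrary-precision integers; 2**53 + 1 is the smallest integer a double cannot represent exactly):

```ruby
require 'json'

# 2**53 + 1: the smallest integer that an IEEE 754 double, which is
# JavaScript's only number type, cannot represent exactly.
big = 9_007_199_254_740_993

# Ruby's JSON library round-trips the value losslessly...
round_tripped = JSON.parse(JSON.generate('id' => big))['id']
round_tripped == big  # => true

# ...while any double-based parser would hand back the nearest double:
big.to_f.to_i         # => 9007199254740992
```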

JSONlines validation & cleanup

There are times when JSONlines data does not come clean. It may include error messages or a mix of STDOUT and STDERR output (something Heroku is notorious for). At those times, it’s good to know how to quickly validate and clean up a large JSONlines file.

To clean up the input, we can use a simple sed incantation that removes all lines that do not begin with [ or {, the start of a JSON array or object. It is hard to think of a bulk export command or script that outputs primitive JSON types. To validate the remaining lines, we can filter through jq and output the type of the root object.

cat data.jsonlines | sed '/^[^[{]/d' > clean_data.jsonlines
cat clean_data.jsonlines | jq 'type' > /dev/null

This will generate output on STDERR with the line & column of any bad JSON.

Pretty printing JSON

Everyone has their favorite way to pretty print JSON. Mine uses the default jq output because it comes in color and because it makes it easy to drill down into the data structure. Let’s use the GitHub API as an example here.

# List of Swoop repos on GitHub
API='https://api.github.com/users/swoop-inc/repos'
alias swoop_repos="curl -s $API"

# Pretty print the list of Swoop repos on GitHub in color
swoop_repos | jq '.'

JSON arrays to JSONlines

GitHub gives us an array of repo objects but let’s say we want JSONlines instead in order to prepare the API output for input into MongoDB via mongoimport. The -c (--compact-output) option of jq is perfect for JSONlines output.

# Swoop repos as JSONlines
swoop_repos | jq -c '.[]'

The .[] filter breaks up an array of inputs into individual inputs.

Filtering and selection

Say we want to pull out the full names of Swoop’s own repos as a JSON array. “Own” in this case means not forked.

swoop_repos | jq '[.[] | select(.fork == false) | .full_name]'

Let’s parse this one piece at a time:

  • The wrapping [...] merges any output into an array.
  • You’ve seen .[] already. It breaks up the single array input into many separate inputs, one per repo.
  • The select only outputs those repos that are not forked.
  • The .full_name filter plucks the value of that field from the repo data.

Here is the equivalent using underscore-cli and a json:select expression:

swoop_repos | underscore select \
    'object:has(.fork:expr(x=false)) > .full_name'

In both cases we are not saving that much code, but not having to create files keeps things simpler. For comparison, here is the code to output the names of Swoop’s own GitHub repos in Ruby.
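That Ruby gist is not embedded in this copy, so here is a sketch of what it might look like (the helper name and sample data are mine; the field names match what the GitHub API returns):

```ruby
require 'json'

# Keep only repos that are not forks and pluck their full names,
# the same filtering the jq and underscore expressions perform.
def own_repo_names(repos)
  repos.reject { |repo| repo['fork'] }.map { |repo| repo['full_name'] }
end

# Stand-in for the parsed API response:
repos = JSON.parse(<<~DATA)
  [{"full_name": "swoop-inc/example", "fork": false},
   {"full_name": "swoop-inc/forked-lib", "fork": true}]
DATA

own_repo_names(repos)  # => ["swoop-inc/example"]
```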


My most favorite math proof ever

Math is beautiful and, sometimes, math becomes even more beautiful with the help of a bit of computer science. My favorite proof of all time combines the two in just such a way.

Goal: prove that the cardinality of the set of positive rational numbers is the same as that of the set of natural numbers.

This is an old problem dating back to Cantor with many proofs:

  • The traditional proof uses a diagonal argument: a geometric insight that lays out the numerator and the denominator of a rational number along the x and y axes of a plane. The proof is intuitive but cumbersome to formalize.
  • There is a short but dense proof that uses a Cartesian product mapping and another theorem. Personally, I don’t find simplicity and beauty in referring to complex things.
  • There is a generative proof using a breadth-first traversal of a Calkin-Wilf tree (a.k.a., the H tree because of its shape). Now we are getting some help from computer science but not in a way that aids simplicity.

We can do much better.

Proof:

Given a positive rational number p/q in lowest terms, write it as the hexadecimal number pAq. The map is injective because the decimal digits of p and q can never contain an A, so distinct fractions yield distinct numbers. QED

Examples:

  • 0/1 → 0A1 (161 in decimal)
  • 3/4 → 3A4 (932 in decimal)
  • 12/5 → 12A5 (4773 in decimal)

Code (because we can):

def to_natural(p, q)
  "#{p}A#{q}".to_i(16)  # read pAq as a hexadecimal number
end

It is trivial to extend the generation to all rationals, not just the positive ones, as long as we require p/q to be in canonical form:

def to_natural(p, q)
  "#{p < 0 ? 'A' : ''}#{p.abs}A#{q}".to_i(16)  # a leading A encodes the sign
end

To me, this CS-y proof feels much simpler and more accessible than any of the standard math-y proofs. It is generative, reducible to a line of code and does not require knowledge of any advanced concepts beyond number systems in bases other than 10, which are a straight, intuitive extension of base-10 positional arithmetic.
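One more point in favor of the CS-y version: injectivity, the heart of the proof, is easy to demonstrate by writing the inverse map. The helper below is mine, not from the post (note that Ruby prints hex digits in lowercase):

```ruby
# Recover p and q from the natural number by splitting the hex digits on A.
# The existence of this inverse is exactly what makes the forward map injective.
def from_natural(n)
  p, q = n.to_s(16).split('a')
  [p.to_i, q.to_i]
end

from_natural(932)   # => [3, 4]
from_natural(4773)  # => [12, 5]
```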

Note: we don’t need to use hexadecimal. The first time I heard this proof it was done in base 11 but I feel that using an unusual base system does not make the proof better.


Monitoring Redis with MONITOR and Wireshark

At Swoop we use Redis extensively for caching, message processing and analytics. The Redis documentation can be pithy at times and recently I found myself wanting to look in more depth at the Redis wire protocol. Getting everything set up the right way took some time and, hopefully, this blog post can save you that hassle.

Redis MONITOR

The Redis logs do not include the commands that the database is executing but you can see them via the MONITOR command. As a habit, during development I run redis-cli MONITOR in a terminal window to see what’s going on.

Getting set up with Wireshark

While normally we’d use a debugging proxy such as Charles to look at traffic in a Web application, here we need a real network protocol analyzer because Redis uses a TCP-based binary protocol. My go-to tool is Wireshark because it is free, powerful and highly customizable (including being scriptable in Lua). The price for all this is dealing with an X11 interface from the last century and the expectation that you passed your Certified Network Engineer exams with flying colors.

To get going:

  1. Wireshark needs X11. Since even Mac OS X stopped shipping X11 by default with Mountain Lion, you’ll most likely want to grab a copy, e.g., XQuartz for OS X or Xming for Windows.
  2. Download and install Wireshark.
  3. Start Wireshark. If you see nothing, it may be because the app shows as a window associated with the X11 server process. Look for that and you’ll find the main application window.

Redis protocol monitoring

Wireshark’s plugin architecture allows it to understand dozens of network protocols. Luckily for us, jzwinck has written a Redis protocol plugin. It doesn’t ship with Wireshark by default, so you’ll need to install it yourself.

If Wireshark is running, restart it to pick up the Redis plugin.

Now let’s monitor the traffic to a default Redis installation (port 6379) on your machine. In Wireshark, you’ll have to select the loopback interface.

To reduce the noise, filter the capture to TCP packets on port 6379 (capture filter: tcp port 6379). If you need more sophisticated filtering, consult the docs.


Once you start capture, it’s time to send some Redis commands. I’ll use the Ruby console for that.

This will generate the following output from the MONITOR command:

1999[~]$ redis-cli MONITOR
OK
1369526925.306016 [0 127.0.0.1:55023] "set" "key:5" "\xad\xad"
1369526927.497785 [0 127.0.0.1:55023] "get" "key:5"

In Wireshark you’ll be able to see the binary data moving between the client and Redis, with the benefit of the command and its parameters being clearly visible.


Check out the time between request and response. Redis is fast!


Google and the ecosystem test

I am roaming the halls of Google I/O 2013 and wondering whether Google’s platform passes the ecosystem test.

… no platform has become hugely successful without a corresponding ecosystem of vendors building significant businesses on top of the platform. Typically, the combined revenues of the ecosystem are a multiple of the revenues of the platform.

So much activity but what’s the combined revenue of the businesses building on top of Android, Chrome & Apps?


Anatomy of an online ad

I’ve been asked many times to explain how online ads are delivered, and every time I’m surprised by how complex it is to cover even the most basic elements of how ads appear on Web pages. Since Wikipedia’s article on ad serving is not much help, I’ll try to explain one common way ads are delivered using a concrete example.

Side note: this is not how Swoop works. At Swoop we use a much simpler and more efficient model because we’ve built an end-to-end system. This eliminates the need for lots of different systems to touch (+ cookie + track) users. It also allows us to create deeper and more relevant matches by placing Swoop content not in arbitrary fixed slots but in dynamic slots right next to the part of a page it relates to. If you are looking for an analogy, think about Google AdWords on SERPs. It’s an end-to-end system where Google has complete control over ad placement and no ads are shown if there are no relevant ads to show.

Tools of the trade

If you want to know how adtech works, there is no better tool than Ghostery. Ghostery was created by my friend David Cancel; later, when he was starting Performable (now part of HubSpot), my previous startup, Evidon, became the custodian of the Ghostery community. Ghostery will show you, page-by-page, all the different adtech players operating on a site. For example, on Boston.com’s sports page, there are 35 (!) separate adtech scripts running today.

Ghostery will show you what is happening but not how it happened. If you are technical and want to understand the details of how ad delivery works, there is no better tool than a debugging proxy such as Charles or Fiddler. Just be prepared to use code deobfuscators. If you don’t have time to wade through obfuscated code and you really want to know what’s going on with your site(s) or campaign(s), it is worth taking a look at Evidon Encompass, an advanced analytics suite built on top of the Ghostery data.

The example

The example we’ll use is the arthritis page on Yahoo!’s health network. We will focus on the leaderboard ad at the top, which is a Celebrex (arthritis drug) ad from Pfizer.


What Yahoo! sent the browser

The initial response my browser got from Yahoo!’s server included the following chunk of HTML about the leaderboard ad unit, which I’ve formatted and added comments to.

This content was most likely emitted by the Yahoo publishing system without direct coordination with the Yahoo ad server but instead using conventions about categories, page types, etc. and hence parameters like rs=cnp:healthline that you see on the URLs.

Display advertising units use standard IAB formats. In this case, we are dealing with a 728×90 leaderboard unit. The DIV with id yahoohealth-n-leaderboard-ad sets up the location where the ad unit will be displayed. The DIV under it serves the dubious function of controlling some styling related to the ad content.

Beyond this there are two things going on here. The first is the delivery of the ad script and the second is the delivery of a tracking pixel via a tracking pixel script.

Tracking pixels

Tracking pixels are 1×1 invisible images served from highly-parameterized URLs. They are not used for their content but for the request they generate to a server. The request parameters are used for record-keeping and the response could be used to cookie the user, though this did not happen in this case.
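As a sketch, here is the kind of HTML such a script writes into the page. The helper and its parameter names are hypothetical; Yahoo!’s real pixel carries the parameters shown in the long URL below:

```ruby
require 'uri'

# Hypothetical helper: build the invisible <img> tag a tracking script
# writes into the page. Parameter names are made up for illustration.
def tracking_pixel(base_url, params)
  query = URI.encode_www_form(params)
  %(<img src="#{base_url}?#{query}" style="display: none" height="0" width="0">)
end

tracking_pixel('http://us.bc.yahoo.com/b', 'P' => 'abc123', 'S' => '1')
# => "<img src=\"http://us.bc.yahoo.com/b?P=abc123&S=1\" style=\"display: none\" height=\"0\" width=\"0\">"
```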

The tracking pixel is delivered via the script inside the <center> tag. Its contents are shown below.

The script uses the JavaScript document.write function to write some HTML into the page. In this case the HTML is for an invisible image (display: none, height: 0, width: 0) whose source is the tracking pixel; its unencoded value is this long URL:

http://us.bc.yahoo.com/b?P=s4i.Bjc2LjFQGJ6UTYvtUx1LMTczLlFt_1n__8MU&T=18f3d55it/X=1366163289/E=96843138/R=he/K=5/V=8.1/W=0/Y=YAHOO/F=4264990850/H=YWRjdmVyPSI2LjQuNCIgc2VydmVJZD0iczRpLkJqYzJMakZRR0o2VVRZdnRVeDFMTVRjekxsRnRfMW5fXzhNVSIgc2l0ZUlkPSI0NDUzMDUxIiB0U3RtcD0iMTM2NjE2MzI4OTI1ODA1NyIg/Q=-1/S=1/J=69060D4C&U=12d690f6l/N=lv51DmKImng-/C=-1/D=FSRVY/B=-1/V=0

As you can see, lots of data getting sent, most likely to record the impression opportunity parameters.

Yahoo! ad delivery script

There are two ways to deliver an ad unit. The preferred way is via a script. If scripting is disabled in the browser, however, Yahoo doesn’t want to lose the ad impression opportunity and so there is the <noscript> option to show the ad in an iframe, probably as an image.

The code for the Yahoo! ad delivery script, which comes from the Yahoo! ad server, is shown below with reformatting and comments from me.

There are several things going on here:

  • AdChoice notice
  • Cache busting
  • Google ad delivery script activation
  • Yahoo impression tracking
  • No script handling

Let’s consider them one at a time.

AdChoice notice

AdChoice came about in 2010 as the online advertising industry’s response to FTC pressure to rein in some poor privacy practices and provide consumers with more transparency and choice when it comes to interest-based advertising, a.k.a., behavioral targeting (BT in adtech parlance).

The AdChoice icon is a triangle with an i in it. Its color can vary; Yahoo!’s is gray. Next time you see an ad with it, click on the AdChoice notice. You should see information about who targeted the ad at you and get some options to opt out of interest-based advertising. We started Evidon back in 2009 to bring more transparency to adtech and we helped create AdChoice. Evidon is now the leading independent player in this space.

In the case of the Celebrex ad from our example, the AdChoice icon is tied to a very long URL:

http://clicks.beap.bc.yahoo.com/yc/YnY9MS4wLjAmYnM9KDE1aGtrdjRldShnaWQkVHdWQTBqYzJMakZRR0o2VVRZdnRVd1VTTVRjekxsRnRfMGJfXzcwayxzdCQxMzY2MTYzMjcwNTgzNjQ0LHNpJDQ0NTMwNTEsc3AkOTY4NDMxMzgsY3IkMzM1MTkyOTA1MSx2JDIuMCxhaWQkMENPUU9rd05QZlEtLGN0JDI1LHlieCR2ZFJfX19Ed2xLd2liQ0k4SEdHUVVBLGJpJDE3Mzk2MTcwNTEsdyQwKSk/1/*http://info.yahoo.com/relevantads/

If you click on the AdChoice icon, Yahoo! will record information about which ad you are selecting to learn more about and then redirect you to the page at the end of the URL, which is the Yahoo learn more about this ad page. The long URL is just for bookkeeping.

BTW, the reason why you don’t see AdChoice notice with Swoop is because Swoop does not do any behavioral targeting at this time. Still, because we want to make it clear that Swoop is serving content, you’ll see our logo on our units.

Cache busting

After the AdChoice notice setup comes a line of script that creates a random number. This is used for cache busting.

A cache-buster is a unique piece of code that prevents a browser from reusing an ad it has already seen and cached, or saved, to a temporary memory file.

Adding a random number to a URL does that nicely.
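In code, a cache-buster can be as simple as this hypothetical helper (the rnd parameter name and URL are illustrative; ad servers each have their own conventions):

```ruby
# Append a random number to the ad URL so the browser treats every
# request as new and never serves the ad from its cache.
def bust_cache(url)
  separator = url.include?('?') ? '&' : '?'
  "#{url}#{separator}rnd=#{rand(2**32)}"
end

bust_cache('http://ads.example.com/deliver')
# e.g. "http://ads.example.com/deliver?rnd=2889534234"
```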

Google ad delivery script activation

The following script tag loads Google’s ad delivery script from ad.doubleclick.net. We will look at this later on.

Yahoo impression tracking

Remember how Yahoo! already fired one tracking pixel to record the impression opportunity? Here, at the end of the script, it fires another one, but this time the purpose is to record the impression of the Google ad. As before, you can see lots of data being passed.

http://csc.beap.bc.yahoo.com/yi?bv=1.0.0&bs=(134i64f2m(gid$TwVA0jc2LjFQGJ6UTYvtUwUSMTczLlFt_0b__70k,st$1366163270583644,si$4453051,sp$96843138,pv$0,v$2.0))&t=D_3&al=(as$12rveflk1,aid$0COQOkwNPfQ-,bi$1739617051,cr$3351929051,ct$25,at$0,eob$-1)

Noscript processing & click tracking

As before, in the case that the browser does not have JavaScript enabled, Yahoo doesn’t want to miss the opportunity to deliver an ad, which is why they have the option to display the Google ad as an image.

In that case, Yahoo is also positioned to capture the click and then redirect to Google. This is achieved by wrapping the image (<img>) in a link (<a>). Getting click feedback data is valuable for Yahoo as it allows it to optimize better. If the unit is sold on a cost-per-click (CPC) basis, then getting click data is a requirement for good record-keeping.

Google/DoubleClick ad delivery script

It’s time for us to take a look at what Google’s ad delivery script does. Alas, the guys at Google don’t want to waste bandwidth so they’ve packed everything into a single unreadable document.write call. You can scroll to the right for a very long time…

Here is what Google is actually trying to write into the HTML page (with my comments added):

Don’t worry about the volume of code. There are basically two things going on here: delivering a Flash ad and lots of third party ad verification.

Delivering a Flash ad

Flash ads are richer and more interactive but there are browsers where Flash ads don’t do so well. The first part of the Google/DoubleClick ad delivery script is about carefully determining whether a Flash ad unit can be used and falling back to images otherwise. As before, all clicks are tracked via redirects.

Third party verification

We saw Yahoo! attempting to fire three types of tracking pixels: (a) for impression opportunities, (b) for actual impressions and, in the case of no scripting, (c) for clicks. This is to help optimize the performance of Yahoo!’s ad network. This is first party verification. Google/DoubleClick does the same with its own systems.

Third party verification happens when the advertiser asks the delivery network, in this case Google/DoubleClick, to include additional verification tags (scripts) to prove that the campaign is delivered based on its configured parameters.

In the case of this Celebrex campaign, Pfizer is using four separate verification vendors. At the top level we see only Nielsen NetRatings and DoubleVerify; however, NetRatings’s script loads AdSafe as well as Facebook in the pattern we are familiar with: a script that writes out <script> tags to load more scripts.

Putting it all together

Let’s try to piece together the requests that allow this one single ad unit for Celebrex to appear:

  • Yahoo ad unit delivery script
    • Google ad delivery script
      • Flash movie
        • ??? (not easy to track Flash traffic)
      • Nielsen NetRatings tracking script
        • AdSafe pixel
        • Facebook iframe
          • Facebook tracking
            • ??? (did not analyze)
      • DoubleVerify script
        • Tracking script (like a pixel)
    • Yahoo impression tracker
      • Tracking pixel
  • Yahoo impression opportunity tracker
    • Tracking pixel

All in all, 13 separate HTTP requests to 6 separate companies, not counting redirects and cacheable scripts. With this much complexity and independent parties doing their own accounting, it’s no surprise the display advertising value chain is in a bit of a mess right now.


Startup anti-pattern: platform risk

One of the fastest ways for a startup to grow has always been to ride on the shoulders of a successful platform: from Microsoft/OSS in software to AWS in cloud computing to iOS/Android in mobile to Facebook/Twitter/Pinterest in social to IAB/Google in advertising and the many SaaS players. Betting on a platform focuses product development both because of technology/API choices and because of the automatic reduction in the customer/user pool. Also, platforms that satisfy the ecosystem test help the startups that bet on them make money. That is, until they don’t.

I’ve been involved with three startups that have been significantly helped by platforms initially and then hurt by them. Two cases involved Microsoft. One case involved Twitter. The first time it happened, our eyes were closed and it hurt. It prompted me to learn more about how platform companies operate and how they use and abuse partners—companies small and large—to help them compete with other platforms. The basic reality is that platform companies will do whatever it takes to win and they typically don’t care much about the collateral damage they cause.

Just like hacking fast & loose, which accumulates technical debt, accelerating the growth of a startup by leveraging a platform may come with substantial platform risk.

Note: links to undocumented anti-patterns will take you to the main list.

Startup Anti-Pattern: Platform Risk

What it is

Platform risk is the debt associated with adopting a platform. Platform risk becomes an anti-pattern when three conditions are met:

  1. The platform dependency becomes critical to company operations.
  2. The company is unaware of the extent of the risk it has assumed.
  3. There is increased likelihood of adverse platform change.

Platform risk tends to appear with other situational awareness anti-patterns such as ignorance and unrealistic expectations.

Why it matters

Here are the top 10 sub-patterns of platform risk hurting startups that I’ve seen:

  • Lock-in. Startups that adopt a closed platform can be locked into their choice typically for the duration of the company’s life. This is not a problem until the need arises to support another platform. At that point the time & cost associated with the work could be substantial, especially if the core architecture was not designed with this in mind. In many cases, it is cheaper to start from scratch.
  • Forced upgrades. When software came in boxes, if you didn’t like the new version or if it was incompatible with your own software, you and your customers did not need to upgrade. You could take the time to make things work and upgrade on your own schedule. In the platform-as-a-service world, you do not have this option. Instead, forced upgrades are the norm. You have to deal with them on the platform vendor’s schedule, which may be quite inconvenient and costly. You do not have the option to ignore the update. Vendors vary widely in how they manage their partner ecosystems with respect to forced upgrades. Google has been pretty good when it comes to its APIs and has acted like a not-so-benevolent dictator when it comes to non-API-related behaviors of services such as search and advertising. Facebook and Google have both been accused of manipulating the behavior of their systems to force businesses to spend more money in their advertising platforms. In the case of Facebook, the issue has been pay-to-play for likes. Google has come repeatedly under fire for manipulating the search user experience to (a) shape traffic away from large publishers it competes with and (b) reduce advertiser choices and drive more ad dollars to AdWords. If your business depends on SEO or SEM, these changes can be very significant. The former CEO of a large advertising agency once summarized this as “Google giveth and Google taketh away.”
  • Forced platform switch. A forced platform switch usually comes as a side effect of platform vendors playing turf wars. For example, Apple severely hurt Adobe’s Flash platform as a way to limit write once, run anywhere options in mobile, thus also slowing Android’s adoption a bit. Thousands of small game & other types of content developers in the Web & Flash ecosystem were affected and had to either abandon iOS development or find new costly talent.
  • The partner dance. The partner dance is most commonly seen in enterprise software. It was popularized by Microsoft. As one former MS exec described it to me: “first you design your partners in and then you design your partners out.” During the design-in phase, a platform vendor partners with and, in some cases, spends meaningful resources helping an innovative startup company with solutions that compete with the solutions of another platform vendor. As the platform company’s own product roadmap matures, it designs its partners out and starts directly competing with them.
  • Swinging. Swinging is a variation of the partner dance where rather than competing directly with a startup, the platform vendor partners with one of the startup’s competitors. Some years ago I was on the board of a European company that was Microsoft’s preferred partner in a fast-growing market. After winning against much bigger players such as EMC and IBM, the startup convinced MS that there was a big business to be built in this market. At that point MS promptly terminated the startup’s preferred status and partnered with a much bigger competitor. We were expecting the move: Microsoft now wanted to move hundreds of millions of dollars of its platform products in this space and the startup, despite closing significant business, could not operate at this scale. The Facebook/Zynga saga is an example from the online world.
  • Hundred flowers. The name of this sub-pattern comes from the famous Chairman Mao quote “Let a hundred flowers blossom.” Mao fostered “innovation” in Chinese socialist culture—open dissent—and then promptly executed many of the innovators. It seems that Twitter, Facebook and other social platforms have studied the Chairman quite well, judging by how efficiently they have moved from relying on the adopters of their APIs for growth and traffic to restricting their access and hurting their businesses. The prototypical example is Twitter driving much of its traffic from third party clients and then moving against them.
  • Failure to deliver. Startups pick platforms not just because of their current capabilities and distribution but also because of their expected future capabilities and distribution power. If the platform does not deliver, the startup’s ability to execute can be significantly hampered. One of the most common use cases of this sub-pattern relates to open-source platforms where the frequent lack of a single driving force behind a product or service could lead to substantial delays. At various points, teams I’ve been involved with have had to dedicate significant resources to accelerate development of OSS, e.g., Apache Axis, which turned out to be the most popular Web services engine, and Merb, whose adoption turned out to be a bad platform decision for my startup. It’s rewarding work but it also usually is plumbing work that generates little business value.
  • Divergence. Divergence is a form of failure to deliver rooted in a change of strategic direction of the platform. Divergence can be very costly over time and difficult to diagnose correctly because it happens very slowly. The analogy that comes to mind is of a frog in a pot of water on the stove. I knew a startup with a neat idea on how to provide significant value on top of the Salesforce platform APIs. They just needed one improvement that was “on the roadmap.” The improvement remained on the Salesforce roadmap for more than two years as the startup ran out of money. The hidden reason was that Salesforce had grown less interested in the use case. Another Salesforce-related example is the recent hoopla about the unannounced changes in Heroku’s routing mechanism, which cost RapGenius a lot of money. In this case, the reason was Heroku moving from being a great place to host Ruby apps to being a great place to host any apps and in the process becoming a less great place to host Ruby apps.
  • Poison pill. A platform choice made years ago could turn out to be a poison pill when it comes to selling your company to another larger platform vendor. As an example, consider the case of Google buying a company whose products are built on Microsoft’s .NET platform or Microsoft buying a SaaS collaboration solution that runs on Google Apps. Alas, most startups do not think about the exit implications as they make platform decisions early on.
  • Exit pressure. Platform companies may sometimes exert substantial pressure on partners when they want to acquire them. When Photobucket did not want to sell to MySpace, it somehow experienced “integration problems” with MySpace, which affected its traffic. The sale was completed soon after. This goes to show that talking softly while controlling the source of traffic tends to deliver results. This week we learned that Twitter’s acquisition of social measurement service Bluefin Labs involved some threats, which must have been perceived as credible since 90% of Bluefin’s data came from Twitter.

Diagnosis

Good diagnosis of the platform risk anti-pattern is exceptionally difficult because it requires predicting the future path of a platform as well as those of the platforms it competes with. The basic strategy for diagnosing this anti-pattern involves three parts:

  1. Investment in ongoing deep learning about the platform and its key competitors. This should cover the gamut from history to technology to business model to the personalities involved.
  2. Developing relationships with industry experts with a deep perspective on the platform, whose businesses, like telltales on a sailboat, in some way provide leading indicators of platform change. You don’t want just smart people. You want people with proprietary access and data. For enterprise software, try preferred channel partners. For open-source software, try high-end OSS consultants. For advertising, find the right type of agency.
  3. Network into the group(s) responsible for the platform, including both people currently on the job and senior people who’ve recently left. The latter group has been the most helpful in my experience.

Ignorance is the most common anti-pattern that makes the diagnosis of platform risk difficult.

Misdiagnosis

A common misdiagnosis stems from failure to consider the effects of competitive platforms on the platform a startup has adopted. Sometimes it is these competitors’ actions that trigger the negative consequences, as was the case with Apple’s decisions hurting the Adobe Flash developer ecosystem.

Refactored solutions

Once diagnosed, the key question regarding the platform risk anti-pattern is whether anything at all should be done about it. Most companies choose to live with the risk, though very few fully use the diagnosis strategies to get an accurate handle on the net present value of the risk.

The refactoring of platform risk is typically very, very expensive as well as very distracting. For example, some would argue that fighting on two fronts, (a) refactoring its Facebook-related platform risk and (b) shipping new games, is what hurt Zynga’s ability to execute.

In the case of platform risk, prevention is far better than any cure. In the words of Fred Wilson (an investor in Twitter): “Don’t be a Google Bitch, don’t be a Facebook Bitch, and don’t be a Twitter Bitch. Be your own Bitch.” Being your own bitch doesn’t mean not leveraging platforms. It means getting in the habit of doing the following three things in a lightweight, continuous process:

  1. Explicitly evaluate platform adoption decisions once you have sufficient information. Having sufficient information usually involves more than reading a few blogs. For example, at Swoop we recently had to make a search platform choice. We decided to go with Elasticsearch, but not before I had talked to the company, not before Benchmark invested significantly in ES, and not before I’d talked to friends who ran some of the largest ES deployments to get the lowdown on what it was like to operate ES at scale.
  2. Invest the time to learn about the platform and develop the relationships that would help you have special access to information about the platform. Here is my simple rule of thumb with respect to any platform critical to your business: someone on your team should be able to contact one of the platform’s leaders and get a response relatively quickly. This is especially important if you are dealing with new or not super-popular open-source projects. The best way to achieve this is to think about how you and your business can help the platform.
  3. Every now and then spend a few minutes to honestly evaluate your company’s level of platform risk and think about how you’d mitigate it and when you’d have to put mitigation in action.

Remember, the goal is not to eliminate platform risk. You cannot do this while at the same time taking advantage of a platform. The goal is to efficiently reduce the likelihood of Black Swan-like events related to the platform hurting your business. If you understand the mechanics of how platforms operate and how platform risk accrues, you will be able to predict and prepare for events that take others by surprise. These are sometimes the best times to scale fast and leapfrog competitors.

When it could help

Betting on a platform can be hugely helpful to a startup, despite some level of platform risk. There is never a benefit from platform risk increasing to the anti-pattern level.

***

The startup anti-pattern repository is a work in progress, helped immensely by the stories and experiences you share on this blog and in person. Subscribe to get updates and share this anti-pattern with others so that we can avoid more preventable startup failures.

Posted in Anti-pattern, startups

Startup anti-pattern: ignorance

Listening to a startup pitch a few weeks ago, I had to make an effort not to shake my head in disbelief. The presenters were describing the products and business model of an established company I was very familiar with, and doing it with what seemed to be blatant disregard for reality. They forgot to mention an entire product line that directly competed with their would-be product. They claimed the company had a business model different from the one it actually had. They misrepresented its scale, number of customers, etc. The obvious question was whether they were knowingly misrepresenting their competitor or whether they were confused or simply unaware. After the Q&A it became clear that they were just ignorant.

The event took place during a startup competition. What struck me was that, without exception, following each startup’s presentation, at least one reviewer who was an expert in the startup’s space would point out some way in which the entrepreneur(s) had either forgotten to investigate something of critical importance to their business or deeply misunderstood it after a typically brief investigation. The startups were gearing up to spend time and money based on flawed conclusions. They had all taken on significant risk because of their ignorance.

Note: links to undocumented anti-patterns will take you to the main list.

Startup Anti-Pattern: Ignorance

What it is

Ignorance is not knowing what you don’t know. It may be the most common startup anti-pattern. It should not be confused with making an explicit, considered decision to ignore something (knowing what you don’t know).

Ignorance is often found alongside arrogance, escapism, not knowing your investors, and unrealistic expectations. It is usually deadly when combined with a big dose of arrogance, perhaps as a result of the Dunning-Kruger effect (a cognitive bias).

Why it matters

Ignorance hurts startups in countless ways, of which the following are pretty common:

  • Ignorance creates an invisible bias in decision-making, which often results in significant time and resources being wasted.
  • Ignorance sends a very early negative signal that turns experienced talent away. Would-be co-founders, executives, employees, investors, board members and advisors may determine it is not worth their time to engage, especially if they see ignorance co-occurring with anti-patterns such as arrogance and unrealistic expectations.
  • Ignorance is self-perpetuating: it attracts equally or more ignorant talent to the company.
  • When equally or more ignorant investors join an ignorant company, they can experience a very rapid and destructive “falling out of faith” when they have to come to grips with reality. I have seen several companies hit the wall at high speed for this exact reason: expected financings did not come together as planned.

Diagnosis

Ignorance is easy to diagnose but it takes work. There are two main strategies.

The first strategy is introspective in nature. It involves comparing observed outcomes with clearly recorded expected outcomes and then analyzing the root cause of the difference. Ignorance creates an omitted variable bias (OVB) in decision making and will likely cause reality to not match expectations. If there is no obvious known root cause for the observed difference, the likelihood of OVB increases and ignorance should be the suspect. The benefit of this approach is that anyone can practice it. The main problem with it is that it is a lagging indicator: it detects problems after they have already occurred, which is inefficient. The biggest reasons this diagnostic fails are: (a) expected outcomes were never clearly recorded, or (b) an incorrect root cause is identified (see the scapegoat anti-pattern).

The second strategy is a leading indicator. The idea is simple: seek external feedback to diagnose and eliminate ignorance early, before it costs you. External feedback can come through materials or through people. In the latter case, the trick is to be humble and genuinely interested in people’s opinions or you may not get them or, worse, you may get an artificially positive opinion whose goal is to get you off their back. Another thing to watch out for when talking to experts is the mentor whiplash anti-pattern.

Arrogance and confirmation bias are the most common anti-patterns that make the diagnosis of ignorance difficult.

Misdiagnosis

When exogenous forces such as luck and timing affect the success of a startup, ignorance is often confused with genius, vision, and perseverance. As the saying goes, sometimes people can do something just because they don’t know it couldn’t be done. Statistically speaking, this is not a good strategy for startup success.

Refactored solutions

Once diagnosed, the refactoring of the anti-pattern very much depends on the nature of ignorance involved and its root cause.

Some methodologies can fix the symptoms by making it very difficult to remain ignorant. For example:

  • Agile development methodologies make it difficult to hide issues related to engineering execution.
  • Customer development makes it difficult to remain ignorant about issues related to product/market fit.

Fixing the symptoms is not the same as identifying and fixing the root cause. If the root cause is simply a lack of knowledge or a misunderstanding, it is relatively easy to fix. If the root cause is deeply character-related, e.g., a fundamental lack of curiosity or excessive narcissism, then the fix typically has to involve finding more suitable roles for the people involved or transitioning them out of the company.

Effective startup execution requires agile handling of uncertainty. Eliminating ignorance has a cost, which needs to be considered relative to its potential benefit. A particular form of the analysis paralysis anti-pattern involves spending far more time than it is worth attempting to diagnose and eliminate ignorance.

When it could help

As with most things, ignorance happens on a scale and there are many cases in a startup’s life when measured amounts of ignorance can have positive effects, at least temporarily:

  • Ignorance can facilitate focus by artificially simplifying planning & execution.
  • Ignorance can be motivational by creating artificial certainty and hiding potentially bad, demotivating news.
  • Ignorance can bring resources to the company, e.g., investment capital from investors who might be put off by reality.

***

The startup anti-pattern repository is a work in progress, helped immensely by the stories and experiences you share on this blog and in person. Subscribe to get updates and share this anti-pattern with others so that we can avoid more preventable startup failures.

Posted in Anti-pattern, startups

One more vote for functional languages

I’ve been doing a lot of work on startup anti-patterns recently so it seems fitting that I look at other things through that lens.

Every few months I fall into the same debugging anti-pattern:

  1. I start using a debugger to track down a difficult-to-pin-down issue.
  2. I set up a watch expression to look at some complex data structure.
  3. The debugger evaluates the watch expression outside its intended scope, which (sometimes) creates a side effect that cannot be observed directly at the time it happens.
one_tricky_problem + one_semi_random_problem == frustration

Yesterday, it happened when I was debugging a VBA script that’s part of my automation of tax preparation. Don’t ask: it seemed like a good idea a long time ago and is probably the main reason I occasionally use my old Windows machine.

The VBA Dictionary class will auto-create elements for any key it doesn’t have a value for. Therefore, a watch expression such as:

myClass.myDictionary(myKey)

can easily create a Nothing value with an Empty key when myClass is in scope and when myKey is a Variant not in scope.
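The same class of bug exists well beyond VBA. Here is a minimal Python sketch of the analogous pitfall using `collections.defaultdict`, which, like VBA’s Dictionary, auto-creates an entry the moment a missing key is read (the `is_cached` function and the `"user:42"` key are illustrative, not from any real codebase):

```python
from collections import defaultdict

# Like VBA's Dictionary, defaultdict auto-creates an entry when a
# missing key is read -- even from a seemingly harmless inspection.
cache = defaultdict(list)

def is_cached(key):
    # Looks like a read-only check, but indexing a defaultdict
    # with a missing key mutates it: cache[key] becomes [].
    return len(cache[key]) > 0

before = len(cache)    # no entries yet
is_cached("user:42")   # "just inspecting" the cache...
after = len(cache)     # ...but a phantom entry now exists
```

Evaluate `is_cached(...)` in a watch expression and you get exactly the VBA scenario: the act of observing the data structure changes it.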

The previous time I experienced a similar problem was in Ruby when the watch expression had a side-effect on a cache whose logic I was trying to debug.

Eventually, I hope to develop a sixth sense for this debugging anti-pattern. In the meantime, I don’t like my options: (a) not use a debugger or (b) live with paranoia about watch expressions.

Alternatively, I could embrace functional programming languages without side effects. I wonder what their debuggers are like…
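Even without switching languages, writing lookups in a pure style sidesteps the hazard: an expression with no side effects is safe for a debugger to evaluate in any scope, any number of times. A small sketch of the idea in Python (the `lookup` helper is illustrative):

```python
def lookup(cache, key):
    # Pure by construction: dict.get never inserts a key,
    # so evaluating this is always safe, even in a watch expression.
    return cache.get(key)

cache = {"user:1": "Alice"}
for _ in range(3):
    lookup(cache, "user:42")  # evaluate as often as a debugger would
unchanged = len(cache) == 1   # the cache was never mutated
```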

Posted in Code

Top startup anti-patterns

Following my post on the value of startup anti-patterns, with the help of readers of this blog and friends from the startup ecosystem, I put together a list of the more common ones that I encounter. The list used to be here but I moved it to a page on startup anti-patterns for easier linking and sharing.

Posted in startups