Startup anti-pattern: if you build it, they will come

As part of the continued series on startup anti-patterns, we look at the battle between conviction and validation.

First, a story. In 2000, Intech technology, a fledgling startup out of Israel, was building a new type of billing software for property managers. Intech had one potential customer—the Israeli government—that shared the founders’ vision of software which could split bills across multiple tenants in a customizable fashion. For example, using this “killer” feature, the property manager could decide that one tenant pays 70% of the gardening bill while another pays the rest.

The excitement at Intech was at its peak. The founders assumed that if they had the vision and one customer wanted it, many others would. Eighteen months and several layoffs later, the truth came out: end-users didn’t really care about the “killer” feature. Other prospective customers showed no interest in the product’s advanced bill-splitting capabilities. They opted for simpler and cheaper systems that generated invoices and connected to building meters.

After building a product that turned out to be overkill, the company shut down. The founders (Itamar was one of them) learned a hard lesson.

What it is

“If you build it, they will come” is the anti-pattern where startups make decisions based on their vision of how a solution should look, ignoring or underemphasizing customer needs and neglecting to collect sufficient product validation from prospective customers.

The origin of this anti-pattern is the allure of “a great idea”. Entrepreneurs, driven by their passion and conviction, tend to assume that their product’s brilliance alone will captivate customers and guarantee success.

Unfortunately, the mere existence of a product doesn’t automatically translate into customers flocking to buy it. The “if you build it, they will come” mentality often leads to a lack of product-market fit, a leading cause of early-stage startup failure.

When combined with confirmation bias, another anti-pattern, this problem becomes even more acute. As with ignorance, it’s usually deadly when combined with a big dose of arrogance.

Why it matters

The “if you build it, they will come” mentality can kill your company. It results in redundant product development and misalignment, a significant waste of resources, increased technical debt, and challenges in go-to-market. Hoping that a product will resonate with customers is often a recipe for disaster.

Building a product based on conviction as opposed to market validation can harm your startup in multiple ways:

  • Increased adoption friction. Instead of iterating and improving the product based on customer feedback, startups that fall into this anti-pattern often ship products that lack the features customers want. They find themselves trapped in a vicious cycle of slow growth, small capital raises, financial strain and, ultimately, the demise of the startup.
  • Slower product development. Development teams should aim to build what’s most valuable for the business as quickly as possible. Building on conviction without validation is risky because unnecessary features slow down development without creating sufficient business value. They also increase solution complexity and the likelihood of incurring more technical debt, which slows down future development and shortens a startup’s runway.
  • Low morale. Discovering post-launch that a product isn’t well-received can demoralize a team that worked hard on its development. Before then, team members who know that development is happening with insufficient validation may be demoralized by the company’s approach.

Building “in a vacuum” increases the risk of failing to achieve product-market fit. This misalignment can manifest in various ways. The product might solve a problem that customers don’t care enough about or may fail to meet customers’ expectations or needs. Without product-market fit, it’s harder for a startup to build the right brand, launch effective marketing campaigns, and build the right sales playbook.

Diagnosis

Diagnosis requires honest self-reflection. Look at how the company makes product decisions that commit it to significant expenditures of time and money:

  • Are you aware of all important decision points? Lack of awareness leads to implicit decision making. System 1 thinking, skewed by cognitive biases, dominates implicit decisions. Make decisions that commit the company to significant resource use explicitly.
  • When making important decisions, how much weight do you give to conviction (vision, gut feeling) vs. anecdotal evidence (hearsay, one or few data points collected by an ad hoc process) vs. sufficient evidence collected by a thoughtfully designed validation process? Making big decisions without a responsible amount of evidence is risky.
  • Does the evidence supporting decisions come from a sufficiently diverse range of stakeholders, both internal and external ones? Making decisions based on limited/skewed information is risky, especially when decision-makers aren’t aware of the bias and/or variability of the data.

When attempting to diagnose this anti-pattern, make an honest assessment of the extent to which conviction stems from fear. Sim knew a brilliant technical founder who’d rather spend 100 hours writing code than have a validation conversation with a stranger. He insisted his product was going to be awesome; that conviction gave him a seemingly rational reason to avoid talking to people who might give him negative feedback.

Fear often deters teams from engaging in validation processes due to a variety of psychological, organizational, and market factors:

  • Fear of being wrong. People often intertwine their ideas with their personal identity. They may perceive being wrong as a personal failure. Cognitive dissonance pushes individuals to avoid situations that might challenge their pre-existing beliefs. Confirmation bias pushes them to unconsciously ignore unfavorable feedback.
  • Fear of the unknown. If validation feedback suggests that significant changes are necessary, this can lead to an overwhelming feeling of uncertainty. The path forward might not be clear, which can be daunting. Even founders, who typically are comfortable with massive amounts of uncertainty, can fall prey to this.
  • Fear of authority. In some hierarchical organizations, when a person of authority has conviction, people lower down in the organization may avoid validation. They fear repercussions if it contradicts the authority figure’s conviction.
  • Fear of disclosure. Some entrepreneurs feel their intellectual property (IP) is so valuable that they fear validation processes might leak some of that IP. In his VC days, Sim met with several founders unwilling to talk about the details of their technology before a term sheet. You can imagine how these pitches went.
  • Fear of being late. Some teams may skip validation to hasten delivery. They may fear that competitors will beat them to market or feel pressure from stakeholders to deliver by a specific deadline. Discussing time pressure trade-offs honestly and explicitly is good. Replacing validation with conviction implicitly, for fear of being late, is a problem.
  • Fear of wasting an investment. Once a team has invested time and money in a particular direction, they might feel that continuing forward is the only option. This is known as the sunk cost fallacy. For fear of creating waste, they will ignore negative evidence. Humans often exhibit loss aversion, where the pain of losing is psychologically about twice as powerful as the pleasure of gaining.

Arrogance and confirmation bias are the most common anti-patterns that make the diagnosis of “if you build it, they will come” difficult.

Misdiagnosis

Misdiagnosis occurs when companies set an unreasonably high bar for the validation required to make product decisions. It may lead to analysis paralysis in organizations, another anti-pattern, reducing the company’s competitiveness in the market and its ability to launch new products in a timely manner.

What is a reasonable, let alone optimal, split between conviction and validation when making decisions? There is no right answer. Context matters. Marissa Mayer famously asked a team at Google to test 41 different shades of blue for the toolbar on Google pages. Was that too much? It’s hardly excessive when considering Google’s scale and Google’s resources. A 41-way test may not have been that much more difficult to execute than a 2-way test. However, the request to test 41 options applied to a product with limited usage would be ridiculous. It’d take too long for the test to produce a valid result.
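To make the traffic argument concrete, here is a back-of-the-envelope sketch (not Google's method) of how long a test must run before it can produce a valid result. It assumes a conversion-rate metric, traffic split evenly across arms, and the textbook per-arm sample size approximation n = (z_α + z_β)² · 2p(1 − p) / δ². It applies no multiple-comparison correction, which would only make the many-arm case slower. The object name and all numbers are hypothetical.

```scala
// Back-of-the-envelope test duration for a conversion-rate A/B/n test.
// Assumption: textbook two-proportion approximation, even traffic split,
// no multiple-comparison correction (a correction would make 41 arms even slower).
object TestDuration {
  // z-scores for a two-sided 5% significance level and 80% power.
  val ZAlpha = 1.96
  val ZBeta = 0.84

  // Visitors needed in each arm to detect an absolute lift `delta` over a baseline rate `p`.
  def samplePerArm(p: Double, delta: Double): Double =
    math.pow(ZAlpha + ZBeta, 2) * 2 * p * (1 - p) / (delta * delta)

  // Days until every arm has enough traffic, with visitors split evenly across arms.
  def daysToResult(arms: Int, dailyVisitors: Double, p: Double, delta: Double): Double =
    arms * samplePerArm(p, delta) / dailyVisitors

  def main(args: Array[String]): Unit =
    // Hypothetical numbers: 5% baseline conversion, detecting a 0.5-point lift.
    for (dailyVisitors <- Seq(1000.0, 10000000.0); arms <- Seq(2, 41)) {
      val days = daysToResult(arms, dailyVisitors, p = 0.05, delta = 0.005)
      println(f"$dailyVisitors%.0f visitors/day, $arms%2d arms: $days%.2f days")
    }
}
```

With these hypothetical inputs each arm needs roughly 30,000 visitors. At 1,000 visitors a day, a 2-arm test takes about 60 days and a 41-arm test more than three years; at 10 million visitors a day, even the 41-arm test finishes in about three hours.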

Refactored solutions

Once diagnosed, the refactoring of this anti-pattern requires changing the organization’s mindset and approach to product development:

  • Empower people to make data-driven decisions. Instrument products for data collection with good security and privacy controls. It should be easy to implement A/B and multivariate tests. Clean, machine-readable metadata should be available for data enhancement. Manage data consistently in a unified platform. Give key stakeholders self-service access to the analytics that matter. Operational dashboards that answer known questions aren’t enough: optimize for ad hoc analytics aimed at answering new questions quickly and precisely. Distribute organizational authority, responsibility, and accountability for making decisions based on data.
  • Embrace market research and customer feedback. Starting at the top, foster a culture of listening to markets by implementing methodologies such as customer development. Engage with potential customers through surveys, interviews, or beta testing to gather valuable feedback that shapes the product roadmap and the entire company. Pay special attention to statistical validity.
  • Share the voice of customers. Broadly distribute customer and market feedback within the organization. Spend time in all-hands and other company-wide communication channels to highlight customers. Empower your customer support/success team to work more closely with product teams, and rotate engineers and product managers through support duty.
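Paying attention to statistical validity can start with something as simple as checking whether an observed difference is larger than noise. Below is a minimal sketch of the pooled two-proportion z-test, a standard (but not the only) choice for comparing two conversion or response rates. It assumes independent samples large enough for the normal approximation; the object and method names are illustrative.

```scala
// Pooled two-proportion z-test: is the gap between two rates bigger than noise?
// Assumes independent samples large enough for the normal approximation.
object TwoProportionZTest {
  // z-statistic for the difference between successesB/nB and successesA/nA.
  def zStatistic(successesA: Int, nA: Int, successesB: Int, nB: Int): Double = {
    val pA = successesA.toDouble / nA
    val pB = successesB.toDouble / nB
    // Under the null hypothesis both groups share one rate, estimated from the pooled data.
    val pooled = (successesA + successesB).toDouble / (nA + nB)
    val standardError = math.sqrt(pooled * (1 - pooled) * (1.0 / nA + 1.0 / nB))
    (pB - pA) / standardError
  }

  // Two-sided test at the 5% level (critical value 1.96).
  def significantAt5Percent(successesA: Int, nA: Int, successesB: Int, nB: Int): Boolean =
    math.abs(zStatistic(successesA, nA, successesB, nB)) > 1.96

  def main(args: Array[String]): Unit = {
    // Hypothetical data: the same 10% vs. 12% difference at two sample sizes.
    println(significantAt5Percent(10, 100, 12, 100))         // false
    println(significantAt5Percent(1000, 10000, 1200, 10000)) // true
  }
}
```

The same 10% vs. 12% gap is indistinguishable from noise with 100 respondents per group (z ≈ 0.45) but clearly significant with 10,000 per group (z ≈ 4.5).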

Your ability to make well-validated product decisions is like a muscle: the more you exercise it, the stronger it gets. Getting good at validation isn’t easy. It requires significant investments in culture, systems, and processes. It also requires overcoming fears.

To overcome fear and foster an environment that encourages validation, organizations and teams can foster a culture of learning and experimentation; encourage collaboration and open communication; and incorporate iterative processes with smart feedback cycles. By addressing fear, organizations can improve the likelihood of developing products that meet market and customer needs, ultimately enhancing their chances of success.

When it could help

This anti-pattern can help in two cases: when an excess of conviction can be useful and when the expected value of validation is low.

As with ignorance, an excess of conviction can be useful in very special circumstances:

  • Entrepreneurs vary wildly in their ability to predict the future. On average, they’re very wrong, but there are outliers. If you have solid evidence, without ego-boosting revisionist history, that you are such an outlier, it may be smart to put relatively more weight on your convictions.
  • If resources and timeframes are very tight, there truly may be no room for doubt or validation, and it may be worth taking on significant validation risk. It’s time for a Hail Mary pass. Startups often live or die by these decisions.
  • There’s a saying in venture capital that a little bit of data is a dangerous thing. Sometimes the presence of data that isn’t great is worse than having no data at all. This is especially true in tough fundraising climates, when investors who are slowing down their investment pace are looking for even more reasons to reject deals. Since hiding bad data is unethical, entrepreneurs sometimes make the decision to avoid or reduce validation instead of risking having to disclose unfavorable data. However, the absence of validation data may lead to fundraising failure.

All these strategies follow the strategy paradox: while they can be extremely successful, they can also lead to extreme failure. Even Steve Jobs, the quintessential product visionary, came up with the Macintosh Portable and the Newton.

There are some cases where market validation has lower expected value because it produces fuzzy and/or biased results:

  • Highly disruptive products. One example is Uber/Lyft in the early days. When surveyed, early prospective customers were concerned about getting a ride with an unknown, unlicensed driver. However, after consumers got used to the convenience and cost efficiencies with ride-hailing, they became comfortable with it. Strong network effects compound this early on and it is difficult to imagine the value at scale.
  • Groundbreaking technology. It’s sometimes hard to explain technology that works like magic to customers. When Steve Jobs introduced the iPhone, many didn’t understand why touchscreens would matter so much. Previous smartphones had keyboards and resistive touchscreens, and it wasn’t immediately apparent that capacitive touchscreens would change the world.
  • Category creation. In blue ocean scenarios, there are no (or almost no) prospective customers to talk with. The market or category doesn’t exist yet and will only unfold in the (hopefully near-term) future. For example, when Life360 first launched, investors, advisors, and even parents consistently said they didn’t believe kids would have smartphones. Smartphones back then were business tools, not replacements for cell phones, and the general audience didn’t think kids would need them. They were clearly wrong (easily said in hindsight).

Some ideas are much harder to validate than others. Smart startups focus a lot of effort on validation to reduce the risk of failing to achieve product-market fit.

Co-authored with Itamar Novick. More startup anti-patterns here.

Posted in Anti-pattern

Startup anti-pattern: elephant hunting

As part of the continued series on startup anti-patterns, we look at elephant hunting: chasing big customers/deals.

First, two stories that highlight two different sides of elephant hunting.

In 2005, Meridio was guaranteed to win a deal worth $15m+. Meridio was a small electronic documents and records management (EDRM) startup whose software ran inside some of the world’s most secure organizations: from banks to oil & gas companies to branches of government and the military. One of its happy customers, the UK Ministry of Defence (MoD), was looking to modernize its infrastructure in a massive IT procurement worth billions. Each of the two integrator consortia shortlisted for the deal had designed Meridio into the solution. It was the largest secure SharePoint deployment in the world at the time: a great proof point of the quality and scalability of Meridio’s software. The future looked bright.

Meridio did win the deal and get the money in the end, but the process nearly killed the company:

  • The product roadmap and development prioritization became more complicated.
  • Supporting the two fiercely competitive integrator consortia required staffing up teams with semi-duplicated responsibilities: a significant distraction and increase in burn far ahead of revenue.
  • Once the MoD deal was awarded to one of the consortia, Meridio had many employees it couldn’t put to productive use quickly. The resulting layoffs impacted culture.

The UK MoD deal was important for Meridio — it influenced the 2007 sale of the company to Autonomy, now part of OpenText — but it was less impactful from a valuation standpoint than the company imagined it’d be. Winning the deal came at the expense of distraction and operational inefficiency, both of which affected growth in other areas of the business. Also, there never was another deal like it.

And now for story #2. In 2014, Life360 hit gold. After 18 months of negotiations, Life360 landed a $50m investment deal from ADT, the global leader in home security, coupled with a strategic joint product development opportunity that could net the company tens of millions of dollars in revenue. The team was dancing on rooftops!

In 2019, long after the commercial deal was dead in the water, Life360 decided to go public early (compared to its peers), and one of the considerations was ADT’s significant position as an investor in the company. Further, after years of development that at times consumed half of Life360’s engineering bandwidth, the product Life360 launched was discontinued and made no contribution to the business. When the company struck the deal, employees were initially very excited. They believed that the organization they were working with would be as devoted to the strategic deal’s success as their small startup was. Three management team changes later, it became clear that the deal, which was one of the highest-priority items on Life360’s plate, was a pretty low priority for ADT. New execs at ADT didn’t feel a real commitment to it, and a private equity acquisition coupled with organizational changes didn’t help much either.

Everything is easier in hindsight, but Life360 could have avoided this. Luckily, the deal didn’t end up being a company killer and the other parts of the business helped Life360 cement a great spot as a public company. It’s probably fair to say Life360’s success happened despite the ADT deal, not because of it.

What it is

“Elephant Hunting” is a buzz term describing the practice of targeting deals with very large customers. For example, hunting an elephant in the context of a startup could be a seed-stage company targeting the likes of Google or AT&T as a customer in a million-dollar deal. These customers can provide large contracts, but they can be hard to catch and require large teams to tackle. With business-to-business (B2B) startups, there’s almost nothing more exciting (or seductive) than hunting and bagging an elephant-sized deal. It can produce huge revenue growth, provide you with highly leverageable customer references, and it’ll excite investors. Once you hunt down an elephant, it can feed many mouths (and egos) at the company for a long time. What could be better?

Be warned: the pursuit of elephants can be a dangerous game. If you fail to “kill the elephant” it might well be the one killing you. Unlike young and dynamic startups, elephants are organizational dinosaurs and striking a deal with an elephant will require your entire team — from sales to engineering — to engage with the elephant at different levels of the organization. This engagement happens over months, sometimes years. Even if you succeed in getting an elephant, you may get less benefit than you expected, as the cases of both Meridio and Life360 demonstrate.

Why does it matter?

Elephant hunting can bring your company to its knees. Here are some perils to be aware of:

  • No repeatability. Elephants are hard to catch and often there aren’t enough of them. Meridio never found another UK MoD. Life360 never found another ADT.
  • Heavy operational burden. When you pursue and, later, land an elephant, it’s tempting to put all your resources into serving them. But this can lead to neglecting other clients and missing out on potential opportunities. Both Meridio and Life360 suffered operationally while selling and, later, servicing their respective elephants. Elephants may demand extended payment terms or lower prices, which can put a strain on a startup’s finances. It’s important to carefully consider the financial implications of taking on an elephant client.
  • Missed learning opportunities. When you and your team are laser-focused on one client, you might be missing the forest for the trees. As a startup, you seek scalable solutions that matter to most potential customers you want to serve. More feedback is better, and getting feedback from just one elephant makes it harder to identify the scalable, repeatable products that your target audience needs.
  • Overpromising and underdelivering. In the rush to impress an elephant, startups may make unrealistic promises they can’t keep. This can damage their reputation and lead to the loss of the elephant and future clients. Elephants have high expectations for products and services delivered, as well as a web of requirements across legal, compliance, cybersecurity, etc. that smaller companies may be incapable of servicing well.
  • Compromising your identity. When a startup lands an elephant, it’s easy to become absorbed in their world and lose sight of your own identity and values. This can lead to compromises that go against your startup’s mission and culture. Note, for example, how many big tech companies have had to compromise to do business in China.
  • Losing control. Elephants may have their own demands and expectations that clash with a startup’s way of doing things. This can lead to a loss of control and autonomy, as the startup becomes beholden to the elephant’s whims. On the partner/channel side, this relates to the platform risk anti-pattern.

In conclusion, while landing an elephant can be a huge boost for a startup, it’s important to be aware of the perils that come with it. By maintaining a balance, staying true to your values, and carefully considering the operating implications, startups can avoid the dangers of elephant hunting and build sustainable growth.

Diagnosis

Diagnosis is relatively straightforward. Here are a few signals that you might be spending too much time elephant hunting or are getting sucked into the Savannah:

  • Are you and your sales team spending most of your time focused on one deal with a big enterprise client? Has this been going on for an extended period?
  • Are you increasing spend ahead of revenue more than what you’d normally do for just one or two deals?
  • Is a significant chunk of your engineering team’s bandwidth focused on building custom features for one big customer? Does it feel like this customer is essentially dictating your roadmap for the foreseeable future? Do you find yourself having to promise steep SLAs and help desk hours that you know your existing team can’t support now or in the near future? Startups often do need to stretch to deliver, but if your team feels that servicing the elephants will consume the entire company, they’re probably right.

Misdiagnosis

A common misdiagnosis stems from not fully understanding the scope and bandwidth consumption of elephants. It’s easy for the team to get excited about big deals and look the other way. Developing and delivering products to elephants comes with significant overhead, longer sales cycles, lower win rates, and, often, requirements and standards that don’t improve the joint outcome but suck a lot of time and energy from everybody in the room.

Put together KPIs and tools to help you measure the impact elephant hunting has on your Sales and Engineering teams and make data-based decisions.
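As one illustration of such a KPI, here is a sketch that computes the share of engineering hours spent on non-productizable work for each customer. It assumes a hypothetical work log in which each item is tagged with an optional customer and a productizable flag; the WorkItem type and the tagging scheme are assumptions, not a standard.

```scala
// Hypothetical KPI: share of engineering hours spent on custom
// (non-productizable) work, broken down by customer.
object ElephantKpi {
  // Hypothetical work-log entry; customer = None means general product work.
  final case class WorkItem(customer: Option[String], hours: Double, productizable: Boolean)

  // Fraction of all logged hours that went to non-productizable work for each customer.
  def customWorkShare(items: Seq[WorkItem]): Map[String, Double] = {
    val totalHours = items.map(_.hours).sum
    if (totalHours == 0) Map.empty
    else
      items
        .collect { case WorkItem(Some(customer), hours, false) => customer -> hours }
        .groupBy { case (customer, _) => customer }
        .map { case (customer, entries) => customer -> (entries.map(_._2).sum / totalHours) }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical sprint log.
    val sprint = Seq(
      WorkItem(Some("BigCo"), 120, productizable = false),
      WorkItem(Some("BigCo"), 40, productizable = true),
      WorkItem(None, 200, productizable = true),
      WorkItem(Some("SmallCo"), 40, productizable = false)
    )
    customWorkShare(sprint).foreach { case (customer, share) =>
      println(f"$customer: ${share * 100}%.0f%% of engineering hours on custom work")
    }
  }
}
```

In the hypothetical sprint, 30% of all engineering hours go to one customer's custom work: the kind of number worth tracking sprint over sprint.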

If your startup is investor-backed, remember that your job is to grow equity value. Revenue, profits and growth are pieces of how equity value is determined. Ask yourself whether the pursuit or even the winning of an elephant will have a meaningful positive impact on equity value given all the positive and negative externalities.

Refactored solutions

Once diagnosed, the refactoring of this anti-pattern very much depends on the set of challenges and opportunities your company has at hand. A few ideas on how to make the most out of Enterprise customers without consuming your entire (small) organization in the process:

  • Try to strike a smaller, multi-phase deal with the Elephant. That would help both sides build confidence and capabilities to better serve each other.
  • (Artificially) Limit the resources devoted to elephant hunting. Be ruthless about this with your sales and bizdev folks. They’re likely to gravitate towards elephant hunting — these deals tend to be very exciting.
  • Continuously measure and analyze how much time your team spends on custom work (especially non-repeatable deals and non-productizable work). It might put a strain on your relationship with the Elephant customer, but good Sales and Customer Success teams can help strike a balance and set expectations.
  • Do you have enough slack to sign a deal with an Elephant? One good rule of thumb is to assume that the deal will require twice as many resources and twice as much time as you originally expected. If that’s the case, would you still execute on the deal?
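The twice-the-resources-and-time rule of thumb reduces to simple runway arithmetic. The sketch below, with hypothetical parameter names and numbers, doubles both the deal's extra cost and the time until it pays, then checks whether cash on hand covers the stretched plan. It deliberately ignores other revenue and new financing, so treat it as a first filter rather than a financial model.

```scala
// Runway check for the "twice the resources and time" rule of thumb.
// Simplification: ignores other revenue and new financing.
object ElephantStressTest {
  // Hypothetical inputs:
  //   cash                 cash on hand today
  //   netMonthlyBurn       monthly net burn without the deal
  //   dealCost             total extra spend planned to win and serve the deal
  //   monthsToDealRevenue  planned months until the deal starts paying
  //   stretch              multiplier applied to both dealCost and monthsToDealRevenue
  def survives(
      cash: Double,
      netMonthlyBurn: Double,
      dealCost: Double,
      monthsToDealRevenue: Double,
      stretch: Double = 2.0
  ): Boolean = {
    val months = monthsToDealRevenue * stretch
    val cashNeeded = netMonthlyBurn * months + dealCost * stretch
    cash >= cashNeeded
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical: $6m in the bank, $300k/month net burn, $900k of deal cost, revenue in 9 months.
    println(survives(6e6, 3e5, 9e5, 9, stretch = 1.0)) // true: the plan as pitched needs $3.6m
    println(survives(6e6, 3e5, 9e5, 9))                // false: the stretched plan needs $7.2m
  }
}
```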

When it could help

Does this mean you should never try to hunt elephants? No, but it does mean you should think very carefully about it, and be prepared to answer a few questions: 

  1. Where does elephant hunting fit in your sales and growth strategy; near vs. longer term; lower-hanging fruit vs. higher up your sales tree?
  2. How many elephants are there for you to hunt? Is that a real market niche for your business?
  3. Do you have the human resources to hunt and satisfy elephant-sized customers?
  4. Do your sales, engineering and customer success people have the skillsets and experience to satisfy this species of customer? 
  5. Does your CEO have the bandwidth and skill to take down the elephant? This strategy often demands an inordinate amount of the CEO’s time. Which of the CEO’s other responsibilities might suffer?
  6. Does your company have the financial resources to survive and thrive in the face of typically slow decision and purchase cycles? Will investors give you (relatively) cheap cash so that you can wait for the revenue?

For many startups, the transition to spending more time on elephant hunting is part of the startup journey from childhood to adolescence. If you have good answers to the above questions and a more mature product that is ready to scale, you and your team might be ready to make the move, but tread carefully so you don’t end up being yet another victim on the plains of the Serengeti.

Co-authored with Itamar Novick. More startup anti-patterns here.

Posted in Anti-pattern, startups, Venture Capital

More startup anti-patterns

It’s been a decade since I first assembled the list of startup anti-patterns, the repeatable ways startups waste time and money. The project has always been near and dear to my heart, as I often find myself directing entrepreneurs to the list and coaching about specific examples.

I’m partnering with my friend Itamar Novick from Recursive Ventures to add more anti-pattern content similar to what exists about ignorance and platform risk.

Avoiding startup anti-patterns is important. By wasting time and money, they radically increase the chance of failure. The impact of falling for an anti-pattern is especially heavy in an environment where investor cash is again quite expensive.

Posted in Anti-pattern, startups, Venture Capital

Apache Spark native functions

There are many ways to extend Apache Spark and one of the easiest is with functions that manipulate one or more columns in a DataFrame. When considering different Spark function types, it is important not to ignore the full set of options available to developers.

Beyond the two types of functions–simple Spark user-defined functions (UDFs) and functions that operate on Column–described in the previous link, there are two more types of UDFs: user-defined aggregate functions (UDAFs) and user-defined table-generating functions (UDTFs). sum() is an example of an aggregate function and explode() is an example of a table-generating function. The former processes many rows to create a single value. The latter uses value(s) from a single row to “generate” many rows. Spark supports UDAFs directly and UDTFs indirectly, by converting them to Generator expressions.
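The difference in shape between the two kinds of functions can be shown with plain Scala collections rather than Spark. This is an analogy, not Spark's UDAF or Generator API: the hypothetical `Order` rows are folded into one value the way sum() aggregates, and flat-mapped into many rows the way explode() generates.

```scala
// Plain Scala analogy (not Spark APIs) for the two function shapes.
object FunctionShapes {
  // Hypothetical row type with an array-valued column.
  final case class Order(id: Int, tags: Seq[String], amount: Long)

  // Aggregate shape (like sum()): many rows in, a single value out.
  def totalAmount(rows: Seq[Order]): Long = rows.map(_.amount).sum

  // Generator shape (like explode()): each row yields zero or more rows.
  // As with explode(), a row whose array is empty produces no output.
  def explodeTags(rows: Seq[Order]): Seq[(Int, String)] =
    rows.flatMap(r => r.tags.map(tag => (r.id, tag)))

  def main(args: Array[String]): Unit = {
    val rows = Seq(Order(1, Seq("a", "b"), 10L), Order(2, Seq.empty, 5L), Order(3, Seq("c"), 7L))
    println(totalAmount(rows)) // 22
    println(explodeTags(rows)) // List((1,a), (1,b), (3,c))
  }
}
```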

Beyond all types of UDFs, Spark’s most exciting functions are Spark’s native functions, which is how the logic of most of Spark’s Column and SparkSQL functions is implemented. Internally, Spark native functions are nodes in the Expression trees that determine column values. Very loosely speaking, an Expression is the internal Spark representation for a Column, just like a LogicalPlan is the internal representation of a data transformation (Dataset/DataFrame).

Native functions, while a bit more involved to create, have three fundamental advantages: better user experience, flexibility and performance.

Better user experience & flexibility come from native functions’ lifecycle having two distinct phases:

  1. Analysis, which happens on the driver, while the transformation DAG is created (before an action is run).
  2. Execution, which happens on executors/workers, while an action is running.

The analysis phase allows Spark native functions to dynamically validate the type of their inputs to produce better error messages and, if necessary, change the type of their result. For example, the return type of sort_array() depends on the input type. If you pass in an array of strings, you’ll get an array of strings. If you pass in an array of ints, you’ll get an array of ints.
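The two-phase lifecycle can be illustrated with a deliberately tiny model. The types below (DataType, Expr, ColumnRef, SortArray) are made-up stand-ins, not Spark's Catalyst classes, and the real API differs. The point is only that input validation and the result type are computed once during analysis, before any row is evaluated.

```scala
// Toy stand-ins for Catalyst concepts; these are NOT Spark's classes or API.
object ToyNativeFunctions {
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  sealed trait Expr {
    // Analysis phase (driver): validate inputs and derive the result type.
    def dataType: Either[String, DataType]
    // Execution phase (executors): compute the value for one row.
    def eval(row: Map[String, Any]): Any
  }

  final case class ColumnRef(name: String, tpe: DataType) extends Expr {
    def dataType: Either[String, DataType] = Right(tpe)
    def eval(row: Map[String, Any]): Any = row(name)
  }

  final case class SortArray(child: Expr) extends Expr {
    // The result type mirrors the input type; non-array input is rejected
    // with a readable message before any data is touched.
    def dataType: Either[String, DataType] = child.dataType.flatMap {
      case a: ArrayType => Right(a)
      case other        => Left(s"sort_array expects an array, got $other")
    }

    // The element ordering is picked once from the analyzed type, then reused for every row.
    private lazy val ordering: Ordering[Any] = dataType match {
      case Right(ArrayType(IntType))    => Ordering.Int.asInstanceOf[Ordering[Any]]
      case Right(ArrayType(StringType)) => Ordering.String.asInstanceOf[Ordering[Any]]
      case other                        => throw new IllegalArgumentException(s"cannot sort: $other")
    }

    def eval(row: Map[String, Any]): Any =
      child.eval(row).asInstanceOf[Seq[Any]].sorted(ordering)
  }

  def main(args: Array[String]): Unit = {
    val ints = SortArray(ColumnRef("xs", ArrayType(IntType)))
    println(ints.dataType)                                  // Right(ArrayType(IntType))
    println(ints.eval(Map("xs" -> Seq(3, 1, 2))))           // List(1, 2, 3)
    println(SortArray(ColumnRef("s", StringType)).dataType) // Left(sort_array expects an array, got StringType)
  }
}
```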

A user-defined function, which internally maps to a strongly-typed Scala/JVM function, cannot do this. We can parameterize an implementation by the type of its input, e.g.,

// Spark hands array columns to Scala UDFs as Seq, not Array
def mySortArray[A: Ordering](arr: Seq[A]): Seq[A] = arr.sorted

but we cannot create type-parameterized UDFs in Spark, requiring hacks such as

// one registration per element type
spark.udf.register("my_sort_array_int", mySortArray[Int] _)
spark.udf.register("my_sort_array_long", mySortArray[Long] _)

Think of native functions like macros in a traditional programming language. The power of macros also comes from having a lifecycle with two execution phases: compile-time and runtime.

Performance comes from the fact that Spark native functions operate on the internal Spark representation of rows, which, in many cases, avoids serialization/deserialization to “normal” Scala/Java/Python/R datatypes. For example, internally Spark strings are UTF8String. Further, you can choose to implement the runtime behavior of a native function by code-generating Java and participating in whole-stage code generation (reinforcing the macro analogy) or as a simple method.

Working with Spark’s internal (a.k.a., unsafe) datatypes does require careful coding but Spark’s codebase includes many dozens of examples of native functions: essentially, the entire SparkSQL function library. I encourage you to experiment with native Spark function development. As an example, take a look at array_contains().

For user experience, flexibility and performance reasons, at Swoop we have created a number of native Spark functions. We plan on open-sourcing many of them, as well as other tools we have created for improving Spark productivity and performance, via the spark-alchemy library.

Posted in Big data

Unicorn pressures and startup failures

The startup anti-patterns section of my blog summarizes the repeatable ways startups waste time & money and, often, fail. Learning from startup failure is valuable because there are many more examples of failures than successes. (Anti-)Patterns become more noticeable and easier to verify.

For the same reason, it’s useful to read the failure post-mortems founders write. It takes meaningful commitment to discover the posts and to distill the key insights from the sometimes lengthy prose (an exercise in therapy at least as much as reporting of the facts). Luckily, there is a shortcut: the CB Insights summary of startup failures. It’s part table of contents and part CliffsNotes. It can help you pick the ones that are worth reading in full.

Some of the insights from post-mortems come from understanding the emotional biases of founders, CXOs and investors. In the uncertain startup execution environment these biases have the ability to affect behavior much more than in situations where reality is inescapable and readily quantifiable.

Speaking of emotional biases, Bill Gurley’s post on the Unicorn pressure cooker now that the magic has worn off is a must-read.

Posted in angel investing, Anti-pattern, startups, VC, Venture Capital

Advertising marketplace design

In the past decade, several Nobel prizes in Economics have been awarded in the broader area of market (mechanism/auction/game) design. This is not surprising, as the combination of Internet connectivity and ample computing resources is causing automated markets to pop up all over. One of the biggest and fastest-growing in recent years has been the programmatic advertising market. For example, variations of the Vickrey–Clarke–Groves auction power the Facebook and Google ad exchanges.
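With a single ad slot, VCG reduces to the classic Vickrey (sealed-bid, second-price) auction, in which bidding your true value is a dominant strategy. Here is a minimal Ruby sketch of that single-slot case only; the bidders, amounts and the reserve parameter are hypothetical, and real exchanges add considerably more machinery on top.

```ruby
# Hypothetical bid record for one ad impression.
Bid = Struct.new(:bidder, :amount)

# Single-item Vickrey (second-price) auction: the highest bid wins and the
# winner pays the larger of the second-highest bid and the reserve price.
# With a single slot, VCG payments reduce to exactly this rule.
def vickrey(bids, reserve: 0.0)
  eligible = bids.select { |b| b.amount >= reserve }.sort_by { |b| -b.amount }
  return nil if eligible.empty?

  winner = eligible.first
  # Every eligible bid clears the reserve, so the runner-up's bid (if any) is the price.
  price = eligible.length > 1 ? eligible[1].amount : reserve
  [winner.bidder, price]
end

bids = [
  Bid.new('advertiser_a', 2.50),
  Bid.new('advertiser_b', 1.75),
  Bid.new('advertiser_c', 0.90)
]
p vickrey(bids)               # => ["advertiser_a", 1.75]
p vickrey(bids, reserve: 2.0) # => ["advertiser_a", 2.0]
```

The winner’s payment does not depend on its own bid, which is what removes the incentive to shade bids below true value.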

When lots of players are lining up to feed at the advertising money trough, it sometimes becomes difficult to separate reality from marketing hype. The programmatic hype is that it brings efficiency to advertising (and does your laundry to boot). The reality is very different. While there are many benefits to programmatic advertising, it also causes and exacerbates many problems in the advertising ecosystem that hurt publishers, advertisers and consumers in the long run. The root cause is that the leading open programmatic protocol, OpenRTB, fails to align marketplace interests. This is what happens when adtech optimizes for volume as opposed to quality.

Posted in Advertising, Digital Media

Angel investing strategies

My friend Jerry Neumann wrote a great post on angel investing strategies, dissecting truth and myth about different betting strategies and sharing his own approach.

The question of luck came up, and a commenter linked to my work on data-driven patterns of successful angel investing with the subtext that being data-driven implies index investing. That’s certainly not what I believe or recommend.

The goal of my Monte Carlo analysis was to shine a light on the main flaw I’ve seen in casual angel investing, which is the angel death spiral:

  1. Make a few relatively random investments
  2. Lose money
  3. Become disillusioned
  4. Give up angel investing
  5. Tell all your friends angel investing is terrible

Well, you can’t expect a quick win from a highly skewed distribution, and startup exits are very skewed. That’s just math, and math is rather unemotional about these things.

You can get out of the angel death spiral in one of two ways. You can take the exit distribution for what it is. In that case, you need many more shots on goal (dozens of investments) to ensure a much better outcome. Alternatively, you can try to pick your investment opportunities from a different, better distribution. That’s what I like to do and this is what Jerry is advocating.
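A toy Monte Carlo simulation makes the shots-on-goal argument concrete. This is a sketch under a loudly hypothetical exit distribution, not data from any study: half of investments return nothing and 2% return 50x, for an expected multiple of 2.5x. It estimates how often an equal-weight portfolio of n investments returns less than the cash put in.

```ruby
# Hypothetical exit distribution as [multiple of invested capital, probability].
# Illustrative assumptions only: most of the expected return comes from rare big exits.
EXIT_DISTRIBUTION = [[0.0, 0.50], [1.0, 0.25], [3.0, 0.15], [10.0, 0.08], [50.0, 0.02]].freeze

# Draw one exit multiple by walking the cumulative distribution.
def sample_multiple(rng)
  r = rng.rand
  cumulative = 0.0
  EXIT_DISTRIBUTION.each do |multiple, probability|
    cumulative += probability
    return multiple if r < cumulative
  end
  EXIT_DISTRIBUTION.last.first # guards against floating-point round-off
end

# Fraction of simulated equal-weight portfolios that return less than the cash invested.
def probability_of_loss(n_investments, trials: 10_000, seed: 42)
  rng = Random.new(seed)
  losses = trials.times.count {
    total = Array.new(n_investments) { sample_multiple(rng) }.sum
    total / n_investments < 1.0
  }
  losses.to_f / trials
end

[1, 5, 10, 30, 60].each do |n|
  puts format('%2d investments: P(portfolio loses money) ~ %.2f', n, probability_of_loss(n))
end
```

With a single investment the loss probability is simply the chance of a zero, 50% under these assumptions; it falls as the portfolio grows because more draws make it likelier to catch one of the rare large exits. Shifting probability toward the better outcomes in EXIT_DISTRIBUTION is a crude way to explore the other route: a better distribution to pick from.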

The main driver of returns for angel investors is the quality of the deal flow you can win. Why? Because it changes the shape of your personal exit distribution and, in most cases not involving unicorn hunting, improves your outcomes at any portfolio size.

As an investor, you sell cash + you and buy equity. To see better deals and win them, you need to increase the value of “you.” After all, anyone’s cash is just as good as everyone else’s. The easiest way to do this is via deep, real, current expertise and relationships that are critical to the success of the companies you want to invest in, backed by a reputation as a helpful, easy-to-work-with angel. One way to maximize the chance of this being true is to follow some of Jerry’s advice:

  • Invest in markets that you know
  • Make multiple investments in such markets
  • Help your companies

There is a bootstrap problem, however, when new markets are concerned. How do you get to know them? Well, one way to do it is to make a number of investments in a new space. In this case, your investments have dual value: in addition to the financial return expectations (which should be reduced) you have the benefit of learning. Yes, it can be an expensive way to learn but it may be well worth it when you consider the forward benefits that affect the quality of your deal flow and your ability to win deals.

As an aside, I’ve always advised angels to not invest just for financial return. Do angel investing to increase your overall utility (in the multi-faceted economic theory sense) and do it so that it generates a return you are happy with.

In summary:

  1. Don’t attempt to pick unicorns as an angel.
  2. Where you can get high-quality deal flow you can win, do a smaller number of deals.
  3. Where needed, and if you can afford it, use higher-volume investing as a way to signal interest in a market and to learn about it so that you can get higher-quality deal flow.

Posted in angel investing, VC, Venture Capital

JSON and JSONlines from the command line

At Swoop we have many terabytes of JSON-like data in MongoDB, Redis, ElasticSearch, HDFS/Hadoop and even Amazon Redshift. While the internal representations are typically not JSON but BSON, MsgPack or native encodings, when it comes time to move large amounts of data for easy ad hoc processing I often end up using JSON and its bulk cousin, JSONlines. This post is about what you can quickly do with this type of data from the command line.

The best JSON(lines) command line tools

There has been a marked increase in the number of powerful & robust tools for validating and manipulating JSON and JSONlines from the command line. My favorites are:

  • jq: a blazingly fast, C-based stream processor for JSON documents with an easy yet powerful language. Think of it as sed and awk for JSON but without the 1970s syntax. Simple tasks are trivial. Powerful tasks are possible. The syntax is intuitive. Check out the tutorial and manual. Because of its stream orientation and speed, jq is the most natural fit when processing large amounts of JSONlines data. If you want to push the boundaries of what is sane to do on the command line there are conditionals, variables and UDFs.
  • underscore-cli: this is the Swiss Army knife for manipulating JSON on the command line. Based on Node.js, it supports JavaScript and CoffeeScript expressions with built-in functional programming primitives from the underscore.js library, relatively easy JSON traversal via json:select and more. This also is the best tool for debugging JSON data because of the multitude of output formats. A special plus in my book is that underscore-cli supports MsgPack, which we use in real-time flows and inside memory-constrained caches.
  • jsonpath: Ruby-based implementation of JSONPath with a corresponding command line tool. Speedy it is not but it’s great when you want JSONPath compatibility or can reuse existing expressions. There are some neat features such as pattern-based tree replace operations.
  • json (a.k.a., jsontool): another tool based on Node.js. Not as rich as underscore-cli but has a couple of occasionally useful features having to do with merging and grouping of documents. This tool also has a simple validation-only mode, which is convenient.

Keep in mind that you can modify or extend JSON data with these tools, not just transform it. jsontool can edit documents in place from the command line, which is useful, for example, for quickly updating properties in JSON config files.
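For comparison, here is a minimal Ruby sketch of the same in-place idea (parse, set a nested property, write the file back). The update_json_file helper and the sample config are hypothetical, and unlike a careful editor it rewrites the whole file with pretty-printed formatting.

```ruby
require 'json'
require 'tempfile'

# Set a (possibly nested) property in a JSON config file and rewrite the file.
# key_path is an array of keys, e.g. %w[db port]; missing parents are created as objects.
def update_json_file(path, key_path, value)
  config = JSON.parse(File.read(path))
  *parents, last = key_path
  target = parents.reduce(config) { |node, key| node[key] ||= {} }
  target[last] = value
  File.write(path, JSON.pretty_generate(config) + "\n")
  config
end

# Usage with a throwaway file standing in for a real config file.
Tempfile.create(['config', '.json']) do |file|
  file.write('{"db": {"host": "localhost", "port": 5432}}')
  file.flush
  update_json_file(file.path, %w[db port], 6543)
  puts File.read(file.path)
end
```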

JSON and 64-bit (BIGINT) numbers

JSON has undefined (as in implementation-specific) semantics when it comes to dealing with 64-bit integers. The problem stems from the fact that JavaScript does not have this data type. There are Python, Ruby and Java JSON libraries that have no problem with 8-byte integers, but I’d be suspicious of any Node.js implementation. If you have this type of data, test the edge cases with your tool of choice.
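To see the edge case, compare an exact parse with what happens when the same number is read as an IEEE 754 double, which is how JavaScript represents every number. Only integers up to 2**53 in magnitude are guaranteed to survive that conversion exactly. This sketch uses Ruby’s standard json library, which parses integer literals into arbitrary-precision Integers, and simulates a JavaScript-style parser with Float().

```ruby
require 'json'

# 2**63 - 1, the largest signed 64-bit integer (a Java long or SQL BIGINT)
raw = '9223372036854775807'
doc = %({"id": #{raw}})

# Ruby's json library parses integer literals into arbitrary-precision Integers.
exact = JSON.parse(doc)['id']

# Simulate a JavaScript-style parser, which reads every number as an IEEE 754 double.
as_double = Float(raw)

puts exact                   # 9223372036854775807
puts as_double.to_i          # 9223372036854775808: rounded to the nearest double
puts exact == as_double.to_i # false
```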

JSONlines validation & cleanup

There are times when JSONlines data does not come clean. It may include error messages or a mix of STDOUT and STDERR output (something Heroku is notorious for). At those times, it’s good to know how to quickly validate and clean up a large JSONlines file.

To clean up the input, we can use a simple sed incantation that removes all lines that do not begin with [ or {, the start of a JSON array or object. It is hard to think of a bulk export command or script that outputs primitive JSON types. To validate the remaining lines, we can filter them through jq and output the type of each root value.

sed '/^[^[{]/d' data.jsonlines > clean_data.jsonlines
jq 'type' clean_data.jsonlines > /dev/null

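If a line is not valid JSON, jq prints an error with its line & column on STDERR and stops processing at that point, so fix the offending line and rerun until it completes cleanly.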
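jq stops at the first parse error, which gets tedious on a large file with several bad lines. A minimal Ruby sketch (clean_jsonlines is a hypothetical helper) applies the same [ or { rule as the sed filter, then tries to parse each remaining line and collects every failure with its line number instead of stopping.

```ruby
require 'json'

# Split JSONlines input into valid documents and [line_number, error_message] pairs.
# Like the sed filter, lines that don't start with [ or { are treated as noise and dropped.
def clean_jsonlines(lines)
  good = []
  errors = []
  lines.each_with_index do |line, index|
    next unless line.start_with?('[', '{')
    begin
      JSON.parse(line)
      good << line
    rescue JSON::ParserError => e
      errors << [index + 1, e.message]
    end
  end
  [good, errors]
end

# Hypothetical sample: valid docs mixed with a log message and a truncated array.
sample = [
  '{"a": 1}',
  'WARN: worker restarted',
  '[1, 2',
  '{"b": [true, null]}'
]
good, errors = clean_jsonlines(sample)
puts good
errors.each { |line_no, message| warn "line #{line_no}: #{message}" }
```

To run it over a real file, pass File.foreach('data.jsonlines') instead of the sample array; each yielded line keeps its trailing newline, which JSON.parse tolerates.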

Pretty printing JSON

Everyone has their favorite way to pretty print JSON. Mine uses the default jq output because it comes in color and because it makes it easy to drill down into the data structure. Let’s use the GitHub API as an example here.

# List of Swoop repos on GitHub ($API holds the GitHub API URL for Swoop's repos)
alias swoop_repos='curl -s "$API"'

# Pretty print the list of Swoop repos on GitHub in color
swoop_repos | jq '.'

JSON arrays to JSONlines

GitHub gives us an array of repo objects, but let’s say we want JSONlines instead in order to prepare the API output for input into MongoDB via mongoimport. jq’s --compact-output (-c) option is perfect for JSONlines output.

# Swoop repos as JSONlines
swoop_repos | jq -c '.[]'

The .[] filter breaks up an array of inputs into individual inputs.
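The same array-to-JSONlines conversion is short in Ruby as well. In this sketch, array_to_jsonlines and the sample repo objects are hypothetical; JSON.generate emits compact output, so each element lands on its own line, much like jq -c '.[]'.

```ruby
require 'json'

# Turn a JSON array into JSONlines: one compact document per element, one per line.
def array_to_jsonlines(json_array)
  JSON.parse(json_array).map { |element| JSON.generate(element) }.join("\n")
end

repos = '[{"full_name": "example/repo-a", "fork": false}, {"full_name": "example/repo-b", "fork": true}]'
puts array_to_jsonlines(repos)
```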

Filtering and selection

Say we want to pull out the full names of Swoop’s own repos as a JSON array. “Own” in this case means not forked.

swoop_repos | jq '[.[] | select(.fork == false) | .full_name]'

Let’s parse this one piece at a time:

  • The wrapping [...] merges any output into an array.
  • You’ve seen .[] already. It breaks up the single array input into many separate inputs, one per repo.
  • The select only outputs those repos that are not forked.
  • The .full_name filter plucks the value of that field from the repo data.

Here is the equivalent using underscore-cli and a json:select expression:

swoop_repos | underscore select \
    'object:has(.fork:expr(x=false)) > .full_name'

In both cases we are not saving that much code but not having to create files just keeps things simpler. For comparison, here is the code to output the names of Swoop’s own GitHub repos in Ruby.

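require 'open-uri'
require 'json'

API = '' # URL of the GitHub API endpoint listing Swoop's repos

URI.open(API) do |io|
  # Parse the repo array, drop forks and keep only the full names, as a JSON array
  puts JSON.parse(io.read).
    reject { |repo| repo['fork'] }.
    map { |repo| repo['full_name'] }.
    to_json
end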


Posted in Code

My most favorite math proof ever

Math is beautiful and, sometimes, math becomes even more beautiful with the help of a bit of computer science. My favorite proof of all time combines the two in just such a way.

Goal: prove that the cardinality of the set of positive rational numbers is the same as that of the set of natural numbers.

This is an old problem dating back to Cantor with many proofs:

  • The traditional proof uses a diagonal argument: geometric insight that lays out the numerator and the denominator of a rational number along the x and y axes of a plane. The proof is intuitive but cumbersome to formalize.
  • There is a short but dense proof that uses a Cartesian product mapping and another theorem. Personally, I don’t find simplicity and beauty in referring to complex things.
  • There is a generative proof using a breadth-first traversal of a Calkin-Wilf tree (a.k.a., H tree because of its shape). Now we are getting some help from computer science but not in a way that aids simplicity.

We can do much better.


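Given a positive rational number p/q in lowest terms, write it as the hexadecimal number pAq. Because p and q are written with the digits 0-9 only, the lone A separates them unambiguously, so different rationals map to different natural numbers. Every natural number n is also the rational n/1, so there are injections in both directions, which is all equal cardinality requires. QED

Examples:
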
  • 0/1 → 0A1 (161 in decimal)
  • ¾ → 3A4 (932 in decimal)
  • 12/5 → 12A5 (4773 in decimal)

Code (because we can):

def to_natural(p, q)
  "#{p}A#{q}".to_i(16)
end

It is trivial to extend the mapping to all rationals, not just the positive ones, by prefixing negative numbers with an extra A, as long as we require p/q to be in canonical form (q > 0 and no common factors between p and q):

def to_natural(p, q)
  "#{p < 0 ? 'A' : ''}#{p.abs}A#{q}".to_i(16)
end

To me, this CS-y proof feels much simpler and more accessible than any of the standard math-y proofs. It is generative, reducible to a line of code and requires no advanced concepts beyond non-decimal number systems, a straightforward extension of base-10 positional arithmetic.
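A brute-force check is no substitute for the proof, but it is a quick way to watch the mapping work. The sketch below enumerates canonical rationals in a small range and confirms that no two land on the same natural number; the helper injective_up_to? and the bound of 200 are arbitrary choices for illustration.

```ruby
# Rational p/q in canonical form (q > 0, gcd(p, q) == 1) mapped to a natural number
# by reading "pAq" (or "ApAq" when p < 0) as a hexadecimal numeral.
def to_natural(p, q)
  "#{p < 0 ? 'A' : ''}#{p.abs}A#{q}".to_i(16)
end

# Check that no two canonical rationals with |p| <= limit and 1 <= q <= limit collide.
def injective_up_to?(limit)
  seen = {}
  (-limit..limit).each do |p|
    (1..limit).each do |q|
      next unless p.gcd(q) == 1 # keep only canonical forms
      n = to_natural(p, q)
      return false if seen.key?(n)
      seen[n] = Rational(p, q)
    end
  end
  true
end

puts to_natural(3, 4)      # 932
puts to_natural(12, 5)     # 4773
puts injective_up_to?(200) # true
```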

Note: we don’t need to use hexadecimal. The first time I heard this proof it was done in base 11 but I feel that using an unusual base system does not make the proof better.

Posted in Uncategorized

Monitoring Redis with MONITOR and WireShark

At Swoop we use Redis extensively for caching, message processing and analytics. The Redis documentation can be pithy at times and recently I found myself wanting to look in more depth at the Redis wire protocol. Getting everything set up the right way took some time and, hopefully, this blog post can save you that hassle.


The Redis logs do not include the commands that the database is executing but you can see them via the MONITOR command. As a habit, during development I run redis-cli MONITOR in a terminal window to see what’s going on.
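If you save MONITOR output for later analysis, it is easy to pick apart: each line is a Unix timestamp, the database number in brackets (some Redis versions also include the client address there), then the quoted command and its arguments. A minimal Ruby sketch, where MonitorEntry and parse_monitor_line are hypothetical names and escape sequences are left undecoded:

```ruby
# One parsed line of redis-cli MONITOR output.
MonitorEntry = Struct.new(:timestamp, :db, :command, :args)

# timestamp, then [db] or [db client-address], then the quoted command and arguments
MONITOR_LINE = /\A(\d+\.\d+) \[(\d+)[^\]]*\] (.*)\z/

def parse_monitor_line(line)
  match = MONITOR_LINE.match(line.chomp)
  return nil unless match # e.g. the initial "OK" that MONITOR prints

  # Quoted tokens; escape sequences such as \xad are kept verbatim, not decoded.
  tokens = match[3].scan(/"((?:[^"\\]|\\.)*)"/).flatten
  MonitorEntry.new(match[1].to_f, match[2].to_i, tokens.first&.downcase, tokens.drop(1))
end

[
  'OK',
  '1369526925.306016 [0] "set" "key:5" "\xad\xad"',
  '1369526927.497785 [0] "get" "key:5"'
].each do |line|
  entry = parse_monitor_line(line)
  puts entry ? "db #{entry.db}: #{entry.command} #{entry.args.join(' ')}" : "(skipped) #{line}"
end
```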

Getting set up with WireShark

While normally we’d use a debugging proxy such as Charles to look at traffic in a Web application, here we need a real network protocol analyzer because Redis speaks its own TCP-based protocol rather than HTTP. My go-to tool is WireShark because it is free, powerful and highly customizable (including Lua scriptable). The price for all this is dealing with an X11 interface from the last century and the expectation that you passed your Certified Network Engineer exams with flying colors.

To get going:

  1. WireShark needs X11. Since even Mac OS X stopped shipping X11 by default with Mountain Lion, you’ll most likely want to grab a copy, e.g., XQuartz for OS X or Xming for Windows.
  2. Download and install WireShark.
  3. Start WireShark. If you see nothing, it may be because the app shows as a window associated with the X11 server process. Look for that and you’ll find the main application window.

Redis protocol monitoring

WireShark’s plugin architecture allows it to understand dozens of network protocols. Luckily for us, jzwinck has written a Redis protocol plugin. It doesn’t come with WireShark by default so you’ll need to install it. Run the following:

mkdir -p ~/.wireshark/plugins && cd ~/.wireshark/plugins && curl -O


If WireShark is running, restart it to pick up the Redis plugin.

Now let’s monitor the traffic to a default Redis installation (port 6379) on your machine. In WireShark, you’ll have to select the loopback interface.

To reduce the noise, filter the capture to TCP packets on port 6379 with the capture filter tcp port 6379. If you need more sophisticated filtering, consult the docs.


Once you start capture, it’s time to send some Redis commands. I’ll use the Ruby console for that.

1.9.3p392 :001 > r =
=> #<Redis client v3.0.4 for redis://>
1.9.3p392 :002 > r.set("key:5", "\xad\xad")
=> "OK"
1.9.3p392 :003 > r.get("key:5")
=> "\xAD\xAD"


This will generate the following output from the MONITOR command:

1999[~]$ redis-cli MONITOR
1369526925.306016 [0] "set" "key:5" "\xad\xad"
1369526927.497785 [0] "get" "key:5"

In WireShark you’ll be able to see the binary data moving between the client and Redis with the benefit of the command and its parameters clearly visible.
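What the plugin decodes is RESP, the Redis serialization protocol. A client sends each command as an array of bulk strings, each prefixed with its length in bytes, which is why a binary value like \xad\xad travels without any escaping. The sketch below frames the set command from the console session by hand; resp_command is a hypothetical helper, not part of the redis-rb client.

```ruby
# Frame a command the way Redis clients send it: a RESP array of bulk strings.
# Each argument is prefixed with its length in bytes, which is what makes the
# protocol binary-safe: values such as "\xad\xad" need no escaping.
def resp_command(*args)
  bulk_strings = args.map do |arg|
    bytes = arg.to_s.b # raw bytes, ASCII-8BIT encoding
    "$#{bytes.bytesize}\r\n".b + bytes + "\r\n".b
  end
  "*#{args.size}\r\n".b + bulk_strings.join.b
end

frame = resp_command('set', 'key:5', "\xad\xad")
p frame
```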


Check out the time between request and response. Redis is fast!

Posted in Software Development