The Unbundling of AWS

(I’m using AWS as an example, but this post applies just as much to other cloud providers)

Over the years AWS has grown to dozens of different services, providing virtual machines, databases, monitoring and deployment tools on demand. Today, it would be considered foolish to manage your own Postgres/MySQL server when you can set up an RDS instance with excellent scalability and availability characteristics in a matter of minutes.

But that’s changing.

Container infrastructure is starting to provide similar abstractions and benefits: One-click deployments, load balancing, auto-scaling, rolling deploys, recovery from failures, data migration, resource usage monitoring, and more. Increasingly, I see companies moving away from cloud provider services in favor of containers and container orchestration platforms. Core services like EC2 and S3 aren’t easily replaced, but others are, and there are good reasons to do so:

  • Costs. AWS prides itself on pay-per-use pricing, but many services aren’t fulfilling that promise. For example, an Elastic Load Balancer costs a fixed $25/month, even if it receives only a few requests. A database that runs only a few queries per day (like this blog) also comes with a fixed price tag. Containers are almost free – instead of paying with dollars you pay with the CPU/network resources actually used by the service. Often, that turns out to be cheaper.
  • Features. Many AWS services are based either directly or indirectly on open source projects. But these open source projects are typically more feature-complete than their AWS counterparts. An Elastic Load Balancer has a limited set of features compared to a HAProxy instance; so does AWS Kinesis compared to Apache Kafka. Even services that are running open source software under the hood (such as EMR with Hadoop/Spark) don’t typically support the latest versions.
  • No cloud lock-in: Most container orchestration solutions work across clouds out of the box. This means you can host parts of your infrastructure on AWS, Google Cloud, Azure, DigitalOcean, your private cloud, or whatever else is the best fit.
  • Full control: When things don’t work as expected you’re relying on AWS support for help. That can be convenient, but usually it’s faster to debug a problem yourself. That’s only possible with access to the internals of a service, something you don’t have with hosted solutions. What if there’s a simple feature that’d take a small configuration change or a few lines of code to implement? AWS support can’t do that for you. With containers and open source software you can.
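
To make the cost argument concrete, here is a rough back-of-the-envelope sketch. Only the $25/month load balancer figure comes from the list above; the host price and the CPU share are made-up numbers, and the function names are hypothetical. It attributes a slice of an existing host’s price to a container and compares that with the fixed fee.

```python
# Rough monthly cost comparison: a fixed-price managed service vs. a container
# that shares a host you already pay for. Container-side numbers are assumptions.

FIXED_LB_PRICE = 25.0  # USD/month, the load balancer figure quoted above


def container_monthly_cost(host_price: float, cpu_share: float) -> float:
    """Cost attributed to a container using `cpu_share` (0..1) of a host."""
    if not 0.0 <= cpu_share <= 1.0:
        raise ValueError("cpu_share must be between 0 and 1")
    return host_price * cpu_share


def monthly_savings(host_price: float, cpu_share: float) -> float:
    """Positive result means the container is cheaper than the fixed price."""
    return FIXED_LB_PRICE - container_monthly_cost(host_price, cpu_share)


if __name__ == "__main__":
    # Assumed: a $40/month host, with a proxy container using 5% of its CPU.
    print(round(monthly_savings(host_price=40.0, cpu_share=0.05), 2))
```

The comparison flips once a single service needs most of a host to itself, which is why "often" rather than "always" is the right word.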

Just like Craigslist has been unbundled by purpose-built websites, it seems natural (and not only to me) that cloud providers like AWS will be unbundled by purpose-built open source software. In a way that’s ironic because the value proposition of AWS is the exact opposite – bundling open source software in a centralized place and giving it a consistent look and feel. Up until now we didn’t have the right technology to enable unbundling of PaaS solutions. It’s only recently that container infrastructure and orchestration are becoming mature enough to make this possible.

 

A Brief Guide to the Docker Ecosystem

Few other technologies have penetrated technology companies as rapidly as Docker (or more generally, containers). It seems like the majority of developers and companies are using containers in one way or another. Many use containers to simplify the setup of local development environments, but more and more companies are starting to completely re-architect their infrastructure and deployment processes around containers. In this post I’m hoping to provide a brief overview of the current state of the ecosystem.

Engines / Runtimes

Container engines are the core piece of container technology. The engine builds and runs containers, typically based on some declarative description, such as a Dockerfile. When people talk about Docker, they typically refer to the Docker Engine, and not necessarily the rest of the ecosystem.

  • Docker Engine is the current industry standard and by far the most popular engine.
  • rkt is an open-source initiative to take on the Docker Engine, led by the CoreOS team.

Cloud Services with built-in Docker support

Cloud providers have been quick to offer solutions to run containers on top of their platforms. Some built solutions in-house, and others rely on open source software. Of course, one could manually install Docker and run containers on any server, but most cloud providers go a step further and provide user interfaces that make managing containers easier.

  • Amazon EC2 Container Service allows running containerized applications on existing EC2 instances. ECS itself is free; you only pay for the EC2 usage.
  • Google Container Engine is built on top of Kubernetes, an open-source container orchestration project started by Google.
  • Azure has announced support for Docker containers on top of Mesos.
  • Stackdock provides hosting for Docker containers.
  • Tutum provides hosting for Docker containers.
  • GiantSwarm is a cloud platform for defining and deploying microservice architectures running inside containers.
  • Joyent Triton provides hosting and monitoring for Docker containers.
  • Jelastic Docker provides cloud hosted orchestration for container deployments.

Container Orchestration

Container Orchestration is one of the most contended areas right now. Working with a few containers is easy, but scheduling, managing and monitoring containers at scale is extremely challenging. Container Orchestration software handles a variety of tasks, such as finding the best place/server to run a container, handling failures, sharing storage volumes, and creating load balancers and overlay networks to allow communication between containers.

  • Kubernetes is an open source effort started by Google. Kubernetes is based on Google’s internal container infrastructure, and in terms of features it is the most advanced orchestration platform currently available.
  • Docker Swarm allows scheduling containers on a cluster of Docker hosts. It is tightly integrated with the rest of the Docker ecosystem.
  • Rancher manages application stacks (linked containers) on a cluster of machines. Rancher features an intuitive user interface, excellent documentation, and runs inside a container itself.
  • Mesosphere is a general purpose datacenter operating system. It was not specifically built for Docker, but it includes primitives that make it easy to run containers, or other orchestration systems like Kubernetes, next to traditional services like Hadoop.
  • CoreOS fleet is part of the CoreOS operating system and manages the scheduling of arbitrary commands (such as running Docker/rkt containers) within a CoreOS cluster.
  • Nomad is a general-purpose application scheduler with built-in support for Docker.
  • Centurion is a deployment tool developed and used internally by New Relic.
  • Flocker assists with data/volume migration among containers running on different hosts.
  • Weave Run provides service discovery, routing, load balancing, and address management for microservice architectures.
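
To give a feel for the "finding the best place/server to run a container" part, here is a toy placement function. It is a minimal best-fit sketch, not how Kubernetes or any of the tools above actually schedule; the Host class, the place function and the resource numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_cpu: float       # free CPU, in cores
    free_mem_mb: int      # free memory, in MB
    containers: List[str] = field(default_factory=list)


def place(hosts: List[Host], container: str, cpu: float, mem_mb: int) -> Optional[Host]:
    """Best-fit placement: among hosts with enough room, pick the one that
    would have the least free memory left, then reserve the resources."""
    candidates = [h for h in hosts if h.free_cpu >= cpu and h.free_mem_mb >= mem_mb]
    if not candidates:
        return None  # nothing fits; a real orchestrator would queue or scale out
    best = min(candidates, key=lambda h: h.free_mem_mb - mem_mb)
    best.free_cpu -= cpu
    best.free_mem_mb -= mem_mb
    best.containers.append(container)
    return best


if __name__ == "__main__":
    cluster = [Host("a", free_cpu=4.0, free_mem_mb=8192),
               Host("b", free_cpu=2.0, free_mem_mb=2048)]
    chosen = place(cluster, "web", cpu=1.0, mem_mb=1024)
    print(chosen.name if chosen else "no host has room")
```

Real schedulers weigh many more things (failures, volumes, networking, affinity), which is exactly why this area is so contended.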

Operating Systems

You can run containers on any operating system, but companies are increasingly moving towards containerizing their whole infrastructure. As such, it makes sense to run a minimal operating system optimized for Docker and related services.

  • CoreOS is designed for automatic updates and focuses on running containers across clusters of machines. It ships with fleet, a scheduler inspired by systemd, but also supports other orchestration systems.
  • Project Atomic is a lightweight operating system that runs Docker, Kubernetes, rpm and systemd.
  • Rancher OS is a 20MB Linux distribution that runs the entire operating system within containers. It differentiates between “system containers” and “user containers”, each running in a separate Docker daemon.
  • Project Photon is an open source effort from VMware.

Container Image Registries

Image Registries are the “GitHub for container images” and allow you to share container images with your team, or the world.

  • Docker Registry is the most popular open source registry. You can run it on your own infrastructure or use Docker Hub.
  • Docker Hub provides an intuitive UI, automated builds, private repositories, and a large number of official images maintained directly by the authors of the software.
  • Quay.io is a container registry developed by the CoreOS team.
  • CoreOS Enterprise Registry focuses on providing fine-grained permission and audit trails.

Monitoring

Containers write log files that can be ingested into any existing log collection tool. Container monitoring software typically focuses on resource usage (CPU, memory) broken down by container.

  • cAdvisor is an open source project by Google. It analyzes resource usage and performance characteristics of running containers and optionally uses InfluxDB as a storage backend for analytics.
  • Datadog Docker is an agent that collects statistics of running Docker containers and sends them to Datadog for further analysis.
  • New Relic Docker sends container statistics to New Relic’s cloud service.
  • Sysdig can also monitor container resource usage.
  • Weave Scope automatically generates a map of your containers, helping you understand, monitor, and control your applications.
  • AppFormix provides real-time infrastructure monitoring that works with Docker containers.
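
"Resource usage broken down by container" boils down to grouping samples by container and summarizing them. The sketch below shows the idea on hand-written sample tuples; the sample format and the summarize function are assumptions for illustration and do not reflect the data model of any tool listed above.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# One sample: (container name, CPU usage in percent, memory usage in MB)
Sample = Tuple[str, float, float]


def summarize(samples: Iterable[Sample]) -> Dict[str, Dict[str, float]]:
    """Group raw samples by container and report average CPU and peak memory."""
    by_container = defaultdict(list)
    for container, cpu_percent, mem_mb in samples:
        by_container[container].append((cpu_percent, mem_mb))
    return {
        container: {
            "avg_cpu_percent": mean(cpu for cpu, _ in values),
            "peak_mem_mb": max(mem for _, mem in values),
        }
        for container, values in by_container.items()
    }


if __name__ == "__main__":
    # Made-up samples, as an agent on the host might collect them.
    samples = [("web", 10.0, 200.0), ("web", 30.0, 260.0), ("db", 55.0, 900.0)]
    for name, stats in summarize(samples).items():
        print(name, stats)
```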

 

Why SaaS is in trouble

One of my favorite (fictitious) stories is The Dentist Office Software Story by Fred Wilson. The takeaway is that software is a commodity. If a product is not defensible, people will leave when someone builds a better mousetrap. A recent example of this is companies moving from HipChat to Slack. Slack is the better product, and the cost associated with leaving HipChat was low. Many people have written about how to create defensibility (e.g. marketplaces, brands and data network effects) so I won’t talk about that. If you’re interested in the topic I recommend reading what Formation 8 writes about platforms as well as things written by USV and Fred Wilson in general.

But there is something about The Dentist Office Software Story that doesn’t quite ring true. In the last part of the story an open source movement replaces the sexy YC SaaS startup. How often have we actually seen this happen? Not very often. Most industries are still dominated by SaaS products, many of them non-defensible. That’s about to change, and the reason is containers.

The main selling points SaaS had over open source software were ease of deployment and automated software updates. Most end-users are not technical. They can sign up, perhaps enter their credit card, and they’re good to go. My mom can do that. Well, deploying software using pre-built containers has the same benefits. Lots is happening in the space and soon running a container will be easier than signing up for a SaaS product. Just imagine that with the click of a button you can deploy an application within your internal company infrastructure (which can still be in the cloud). You don’t need a credit card or new credentials. But that’s not all. There are other things that containerized open source software has going for it.

You own the data

When you enter data into your favorite SaaS product you essentially give it away. Most SaaS products don’t give data back to you and some may use it to lock you in. Even companies that are not evil and want to provide access to your data may not have the resources, legal ability, or infrastructure to do so. APIs are a good step towards this, but you have no control over what exactly an API provides or when it changes (Hi LinkedIn, I’m looking at you). Giving your technical team access to raw data opens up a whole range of possibilities. Migrating to another product, integrating with other services (both internal and external) and running custom analytics and reporting are some of the things that become much easier when you own your data.

Then there’s the issue of data privacy. You may not feel comfortable giving away confidential information to a third party. At least that’s how I feel when someone asks for access to my email account. Sometimes compliance requirements prevent you from giving data to someone else.

Trust and Transparency

Signing up for a SaaS product means you’re taking a leap of faith. How long will the company be around? What does the product development roadmap look like? What happens to your data? How good will the customer support be? Getting honest answers to these questions is close to impossible. No SaaS company will mention in their welcome email that they will be out of cash in 2 months. But that’s what I’d like to know. Branding is a way to convince potential customers that answers to these questions are positive. It acts as a proxy between the truth and what companies want potential customers to believe.

With open source there is full transparency. You know how popular a project is, what issues have been filed, and who’s working on what. If the project has one main contributor, no updates within 2 months, and no commercial backers you can be pretty confident that you’re entering unstable territory. With open source you can make decisions based on facts, not assumptions. Communities around open source software also tend to be much stronger than those around commercial products.
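
The rule of thumb above (one main contributor, no updates within 2 months, no commercial backers) is simple enough to write down as a check. This is only a sketch of that heuristic: the ProjectFacts fields are hypothetical, you would fill them in by hand from the project’s public history, and I’m treating "2 months" as 60 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProjectFacts:
    main_contributors: int       # people doing most of the work
    last_update: date            # date of the most recent change
    has_commercial_backer: bool  # any company funding the project?


def looks_unstable(project: ProjectFacts, today: date,
                   stale_after: timedelta = timedelta(days=60)) -> bool:
    """True only when all three warning signs from the text are present."""
    is_stale = (today - project.last_update) > stale_after
    return (project.main_contributors <= 1
            and is_stale
            and not project.has_commercial_backer)


if __name__ == "__main__":
    # Made-up facts about a made-up project.
    facts = ProjectFacts(main_contributors=1,
                         last_update=date(2015, 9, 1),
                         has_commercial_backer=False)
    print(looks_unstable(facts, today=date(2016, 1, 1)))
```

The point is not the code but that every input is publicly visible, which is rarely true for a SaaS vendor.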

End of the SaaS management nightmare

Have you worked with a company that uses and manages dozens of different SaaS services? These days that seems to be the norm rather than the exception. Connecting these services, integrating data, managing who has access to what and keeping track of all of it is a nightmare. Yes, companies like IFTTT allow you to connect products, but that’s just adding to the overall complexity.

Containers have the ability to solve this problem from the ground up. All products hosted within your infrastructure can go through the same authentication layer and have the same login. They all write to and read from the same central data repository. Think of your own internal Google Apps, but with all apps you can imagine.

The road ahead

In the future many more businesses will be deploying their software internally. As container management becomes simpler it will become accessible to non-technical users. This is almost ironic considering that the “new” model looks a lot like the old model in the Oracle days before SaaS came around.

That doesn’t mean cloud services will go away, just that their main offering won’t be software. They may become hubs of data offering value-added services that rely on scale. For example, by connecting your own data to a cloud service you may receive additional features (like recommendations) that require access to the combined data of many users. Many SaaS companies will need to rethink their business models, or the last part of the Dentist Office Software story will come true in many industries.

For entrepreneurs this is a great time to start an open source movement that challenges one of the big SaaS companies out there. Open source adoption has never been easier.