Deep Learning Startups, Applications and Acquisitions – A Summary

Most major tech companies use Deep Learning techniques in one way or another, and many have new initiatives on the way. Self-driving cars use Deep Learning to model their environment. Siri, Cortana and Google Now use it for speech recognition, Facebook for facial recognition, and Skype for real-time translation.

Naturally, there are a lot of startups doing cool things in the space. I did my best to categorize the companies below based on where their main focus seems to be. If you’re a Deep Learning company and I forgot you, please do let me know!

General / Infrastructure

Because Deep Learning is such a generic approach, some companies are focusing on creating infrastructure, algorithms, and tools that can be applied across a variety of domains.

DeepMind, which was acquired by Google for more than $500M in 2014, is working on general-purpose AI algorithms using a combination of Deep Learning and Reinforcement Learning. DeepMind is the company behind an algorithm that learns to play Atari games better than humans. It is largely a research company and does not provide products for use by businesses or consumers.

MetaMind focuses on providing cutting-edge performance for image and natural language classification tasks. Richard Socher, the founder of MetaMind, is very active in the academic community and teaches Stanford’s Deep Learning for Natural Language Processing class. The company offers a cloud service to train Deep Learning classifiers.

Nervana is the company behind the open source Python-based neon framework, a GPU-optimized library to build Deep Learning architectures. Nervana also provides a cloud service where it runs algorithms on proprietary hardware specifically designed for Deep Learning. Nervana raised $20.5M in a June 2015 round led by Data Collective.

Skymind is the company behind the Deeplearning4j framework. Deeplearning4j makes efficient use of GPUs and integrates with distributed systems such as Hadoop and Spark to scale to large data sets. Skymind sells an enterprise edition of its software together with training and support.

Ersatz Labs offers a cloud service to manage data and train Deep Learning models through a web interface (video) or an API. Pricing is based on minutes of GPU time used.

Computer Vision

It would be fair to say that Deep Learning gained most of its popularity through excellent performance on a variety of computer vision tasks: recognizing objects in images, understanding scenes, and finding semantically similar images. Convolutional Neural Networks (CNNs), a popular type of Deep Learning architecture, are now considered the standard for most of the above. The rapid success of Deep Learning in Computer Vision has spurred a lot of startup activity.

Madbits was acquired by Twitter in 2014 before it got a chance to launch publicly. In its own words, it “built visual intelligence technology that automatically understands, organizes and extracts relevant information from raw media (images)”.

Perceptio was acquired by Apple in October 2015 while still in stealth mode. The website was shut down after the acquisition, but Perceptio seems to have been developing technology to run image-classification algorithms on smartphones.

Lookflow was acquired by Yahoo/Flickr in October 2013. It’s unclear what exactly Lookflow was offering, but it was using Deep Learning algorithms for image classification to help organize photos.

HyperVerge builds technology for a range of visual recognition tasks, including facial recognition, scene recognition, and image similarity search. HyperVerge is also working on a smart photo organization app called Silver. The company came out of IIT and raised a $1M seed round from NEA in August 2015.

Deepomatic builds object recognition technology to identify products (e.g. shoes) in images, which can then be monetized through e-commerce links. It focuses on the fashion vertical and raised $1.4M from Alven Capital (a French VC) and angels in September 2015.

Descartes Labs focuses on understanding large datasets of images, such as satellite images. An example use case is tracking agriculture development across the country. Descartes Labs came out of the Los Alamos National Laboratory and has raised $3.3M of funding to date.

Clarifai uses CNNs to provide an API for image and video tagging. In April 2015, Clarifai raised a $10M Series A led by USV.

Tractable trains image classifiers to automate inspection tasks currently done by humans, for example detecting cracks on industrial pipes or inspecting cars.

Affectiva classifies emotional reactions based on facial images. It raised $12 million in Series C funding from Horizon Ventures, Mary Meeker, and Kleiner Perkins in 2012.

Alpaca is the company behind Labellio, a cloud service to build your own deep learning image classifier using a graphical interface.

Orbital Insight uses Deep Learning to analyze satellite imagery and understand global and national trends.

Natural Language

After the rapid success in Computer Vision, researchers were quick to adopt Deep Learning techniques for Natural Language Processing (NLP) tasks. In fact, the exact same algorithm that categorizes images can be used to analyze text. Since then, new Deep Learning techniques specifically for NLP have been developed, and are being applied to tasks such as categorizing text, finding content themes, analyzing sentiment, recognizing entities, or answering free-form questions.

AlchemyAPI was acquired by IBM (Watson group) in March 2015. It provides a range of Natural Language Processing APIs, including Sentiment Analysis, Entity Extraction and Concept Tagging. (AlchemyAPI also provides computer vision APIs, but its primary product seems to be language-related, so I decided to put it in this category.)

VocalIQ was working on a conversational voice-dialog system before being acquired by Apple in October 2015.

Idibon develops general-purpose NLP algorithms that can be applied to any language. Idibon’s public API does Sentiment Analysis for English, but more languages and support for Named Entity Recognition are coming soon. Idibon raised a $5.5M Series A led by Altpoint, Khosla, and Morningside Ventures in October 2014.

Indico provides a variety of Natural Language APIs based on Deep Learning models. APIs include Text Tagging, Sentiment Analysis, Language Prediction, and Political Alignment Prediction.

Semantria provides APIs and Excel plugins to perform various NLP tasks in 10+ languages. Pricing starts at $1,000/month for both Excel plugins and API access. Lexalytics, an on-premise NLP platform, acquired Semantria in 2014.

ParallelDots provides APIs for Semantic Proximity, Entity Extraction, Taxonomy Classification and Sentiment Analysis, as well as tools for social media analytics and automated timeline construction. 

Xyggy is a search engine for all data types (text and non-text) represented by deep-learning vectors. With text, for example, a search can use keywords, snippets, or entire documents to find documents with similar meaning.
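
To make the idea of searching by vectors more concrete, here is a minimal Python sketch. It is not Xyggy's implementation: the three-dimensional vectors and document names below are made up for illustration, whereas a real system would obtain much larger vectors from a trained model. The sketch ranks documents by cosine similarity to a query vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def search(query_vector, index, top_k=2):
    """Return the top_k (name, score) pairs most similar to the query."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical hand-written vectors standing in for model output.
INDEX = {
    "doc_about_cats": [0.9, 0.1, 0.0],
    "doc_about_dogs": [0.8, 0.3, 0.1],
    "doc_about_stocks": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    for name, score in search([1.0, 0.2, 0.0], INDEX):
        print(name, round(score, 3))
```

The query itself can be the vector of a keyword, a snippet, or a whole document; the ranking step stays the same.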

Vertical-Specific

Instead of focusing on general-purpose vision or language applications, some companies are applying Deep Learning techniques to specific verticals. My research surfaced mostly Healthcare companies, but it’s likely that many others are using Deep Learning without explicitly mentioning it on their website.

Enlitic applies deep learning techniques to medical diagnostics. By classifying x-rays, MRIs and CT scans, Enlitic can recognize early signs of cancer more accurately than humans. The company raised $3M from undisclosed investors in February 2015.

Quantified Skin uses selfies to track and analyze a person’s skin and recommends beneficial products and activities. The company raised a total of $280k in 3 rounds.

Deep Genomics uses Deep Learning to classify and interpret genetic variants. Its first product is SPIDEX, a dataset of genetic variants and their predicted effects.

StocksNeural uses Recurrent Neural Networks to predict stock prices based on historical time-series data.

Analytical Flavor Systems uses Deep Learning to understand what people taste and optimize food and beverage production.

Artelnics builds open source libraries and graphical user interfaces to train Deep Learning models for a variety of industries.

 

Are there any Deep Learning startups I missed? I’d love to hear about them in the comments.

 

The Unbundling of AWS

(I’m using AWS as an example, but this post applies just as much to other cloud providers)

Over the years AWS has grown to dozens of different services, providing virtual machines, databases, monitoring and deployment tools on-demand. Today, it would be considered foolish to manage your own Postgres/MySQL server when you can set up an RDS instance with excellent scalability and availability characteristics in a matter of minutes.

But that’s changing.

Container infrastructure is starting to provide similar abstractions and benefits: one-click deployments, load balancing, auto-scaling, rolling deploys, recovery from failures, data migration, resource usage monitoring, and more. Increasingly, I see companies moving away from cloud provider services in favor of containers and container orchestration platforms. Core services like EC2 and S3 aren’t easily replaced, but others are, and there are good reasons to do so:

  • Costs. AWS prides itself on pay-per-use pricing, but many services aren’t fulfilling that promise. For example, an Elastic Load Balancer costs a fixed $25/month, even if it receives only a few requests. A database that runs only a few queries per day (like this blog) also comes with a fixed price tag. Containers are almost free – instead of paying with dollars you pay with the CPU/network resources actually used by the service. Often, that turns out to be cheaper.
  • Features. Many AWS services are based either directly or indirectly on open source projects. But these open source projects are typically more feature-complete than their AWS counterparts. An Elastic Load Balancer has a limited set of features compared to a HAProxy instance; so does AWS Kinesis compared to Apache Kafka. Even services that are running open source software under the hood (such as EMR with Hadoop/Spark) don’t typically support the latest versions.
  • No cloud lock-in. Most container orchestration solutions work across clouds out of the box. This means you can host parts of your infrastructure on AWS, Google Cloud, Azure, DigitalOcean, your private cloud, or whatever else is the best fit.
  • Full control. When things don’t work as expected you’re relying on AWS support for help. That can be convenient, but usually it’s faster to debug a problem yourself. That’s only possible with access to the internals of a service, something you don’t have with hosted solutions. What if there’s a simple feature that’d take a small configuration change or a few lines of code to implement? AWS support can’t do that for you. With containers and open source software you can.
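
To put the cost argument in numbers, here is a small Python sketch that spreads a fixed monthly fee over the requests a service actually receives. The $25 figure is the Elastic Load Balancer price quoted above; the traffic levels are hypothetical and chosen only to show how quickly the effective per-request cost grows for low-traffic services.

```python
def effective_cost_per_request(fixed_monthly_fee, requests_per_month):
    """Spread a fixed monthly fee evenly over the requests served."""
    if requests_per_month <= 0:
        raise ValueError("requests_per_month must be positive")
    return fixed_monthly_fee / requests_per_month


ELB_MONTHLY_FEE = 25.0  # dollars per month, the figure quoted in this post

if __name__ == "__main__":
    # Hypothetical traffic levels, from a tiny blog to a busy service.
    for requests in (100, 10_000, 1_000_000):
        cost = effective_cost_per_request(ELB_MONTHLY_FEE, requests)
        print(f"{requests:>9} requests/month -> ${cost:.6f} per request")
```

With true pay-per-use pricing the per-request cost would stay roughly constant across these rows; with a fixed fee, the smallest service pays the most per request.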

Just like Craigslist has been unbundled by purpose-built websites, it seems natural, and not only to me, that cloud providers like AWS will be unbundled by purpose-built open source software. In a way that’s ironic because the value proposition of AWS is the exact opposite – bundling open source software in a centralized place and giving it a consistent look and feel. Up until now we didn’t have the right technology to enable unbundling of PaaS solutions. It’s only recently that container infrastructure and orchestration are becoming mature enough to make this possible.

 

A Brief Guide to the Docker Ecosystem

Few other technologies have penetrated technology companies as rapidly as Docker (or more generally, containers). It seems like the majority of developers and companies are using containers in one way or another. Many use containers to simplify the setup of local development environments, but more and more companies are starting to completely re-architect their infrastructure and deployment processes around containers. In this post I’m hoping to provide a brief overview of the current state of the ecosystem.

Engines / Runtimes

Container Engines are the core piece of Container technology. The engine builds and runs containers, typically based on some declarative description, such as a Dockerfile. When people talk about Docker, they typically refer to the Docker Engine, and not necessarily the rest of the ecosystem.

  • Docker Engine is the current industry standard and by far the most popular engine.
  • rkt is an open-source initiative to take on the Docker Engine, led by the CoreOS team.

Cloud Services with built-in Docker support

Cloud providers have been quick to offer solutions to run containers on top of their platforms. Some built solutions in-house, and others rely on open source software. Of course, one could manually install Docker and run containers on any server, but most cloud providers go a step further and provide user interfaces that make managing containers easier.

  • Amazon EC2 Container Service allows running containerized applications on existing EC2 instances. ECS itself is free; you only pay for the EC2 usage.
  • Google Container Engine is built on top of Kubernetes, an open-source container orchestration project started by Google.
  • Azure has announced support for Docker containers on top of Mesos.
  • Stackdock provides hosting for Docker containers.
  • Tutum provides hosting for Docker containers.
  • GiantSwarm is a cloud platform for defining and deploying microservice architectures running inside containers.
  • Joyent Triton provides hosting and monitoring for Docker containers.
  • Jelastic Docker provides cloud hosted orchestration for container deployments.

Container Orchestration

Container Orchestration is one of the most contended areas right now. Working with a few containers is easy, but scheduling, managing and monitoring containers at scale is extremely challenging. Container Orchestration software handles a variety of tasks, such as finding the best place/server to run a container, handling failures, sharing storage volumes, and creating load balancers and overlay networks to allow communication between containers.

  • Kubernetes is an open source effort started by Google. Kubernetes is based on Google’s internal container infrastructure, and in terms of features it is the most advanced orchestration platform currently available.
  • Docker Swarm allows scheduling containers on a cluster of Docker hosts. It is tightly integrated with the rest of the Docker ecosystem.
  • Rancher manages application stacks (linked containers) on a cluster of machines. Rancher features an intuitive user interface, excellent documentation, and runs inside a container itself.
  • Mesosphere is a general-purpose datacenter operating system. It was not specifically built for Docker, but it includes primitives that make it easy to run containers, or other orchestration systems like Kubernetes, next to traditional services like Hadoop.
  • CoreOS fleet is part of the CoreOS operating system and manages the scheduling of arbitrary commands (such as running Docker/rkt containers) within a CoreOS cluster.
  • Nomad is a general-purpose application scheduler with built-in support for Docker.
  • Centurion is a deployment tool internally used and developed by Newrelic.
  • Flocker assists with data/volume migration among containers running on different hosts.
  • Weave Run provides service discovery, routing, load balancing, and address management for microservice architectures.
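
The placement task mentioned above, finding the best server to run a container, can be sketched in a few lines of Python. This is a deliberately naive illustration and not how Kubernetes, Swarm, or any other tool listed here works internally: the `Host` class, the `place` function, and the "pick the fitting host with the most free memory" rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_cpu: float       # CPU cores still available
    free_memory_mb: int   # memory still available, in MB
    containers: list = field(default_factory=list)


def place(container_name, cpu, memory_mb, hosts):
    """Place a container on the fitting host with the most free memory.

    Returns the chosen Host, or None if no host has enough capacity.
    """
    candidates = [
        h for h in hosts
        if h.free_cpu >= cpu and h.free_memory_mb >= memory_mb
    ]
    if not candidates:
        return None
    chosen = max(candidates, key=lambda h: h.free_memory_mb)
    # Reserve the resources so later placements see the reduced capacity.
    chosen.free_cpu -= cpu
    chosen.free_memory_mb -= memory_mb
    chosen.containers.append(container_name)
    return chosen


if __name__ == "__main__":
    # Hypothetical two-host cluster and three container requests.
    cluster = [Host("host-a", 2.0, 2048), Host("host-b", 4.0, 8192)]
    requests = [("web", 1.0, 512), ("db", 2.0, 4096), ("cache", 4.0, 1024)]
    for name, cpu, memory_mb in requests:
        host = place(name, cpu, memory_mb, cluster)
        print(name, "->", host.name if host else "unschedulable")
```

Real orchestrators layer much more on top of this step, such as rescheduling after failures, shared volumes, and networking, which is where most of the difficulty at scale comes from.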

Operating Systems

You can run containers on any operating system, but companies are increasingly moving towards containerizing their whole infrastructure. As such, it makes sense to run a minimal operating system optimized for Docker and related services.

  • CoreOS is designed for automatic updates and focuses on running containers across clusters of machines. It ships with fleet, a scheduler inspired by systemd, but also supports other orchestration systems.
  • Project Atomic is a lightweight operating system that runs Docker, Kubernetes, rpm and systemd.
  • Rancher OS is a 20MB Linux distribution that runs the entire operating system within containers. It differentiates between “system containers” and “user containers”, each running in a separate Docker daemon.
  • Project Photon is an open source effort from VMWare.

Container Image Registries

Image Registries are the “Github for container images” and allow you to share container images with your team, or the world.

  • Docker Registry is the most popular open source registry. You can run it on your own infrastructure or use Dockerhub.
  • Dockerhub provides an intuitive UI, automated builds, private repositories, and a large number of official images maintained directly by the authors of the software.
  • Quay.io is a container registry developed by the CoreOS team.
  • CoreOS Enterprise Registry focuses on providing fine-grained permissions and audit trails.

Monitoring

Containers write log files that can be ingested into any existing log collection tool. Container monitoring software typically focuses on resource usage (CPU, memory) broken down by container.

  • cAdvisor is an open source project by Google. It analyzes resource usage and performance characteristics of running containers and optionally uses InfluxDB as a storage backend for analytics.
  • Datadog Docker is an agent that collects statistics of running Docker containers and sends them to Datadog for further analysis.
  • NewRelic Docker sends container statistics to NewRelic’s cloud service.
  • Sysdig can also monitor container resource usage.
  • Weave Scope automatically generates a map of your containers, helping you understand, monitor, and control your applications.
  • AppFormix provides real-time infrastructure monitoring that works with Docker containers.