Engineering is the bottleneck in (Deep Learning) Research

Warning: This a rant post containing a bunch of unorganized thoughts.

When I was in graduate school working on NLP and information extraction I spent most of my time coding up research ideas. That’s what grad students with advisors who don’t like to touch code, which are probably 95% of all advisors, tend to do. When I raised concerns about problems I would often hear the phrase “that’s just an engineering problem; let’s move on”. I later realized that’s code speech for “I don’t think a paper mentioning this would get through the peer review process”. This mindset seems pervasive among people in academia. But as an engineer I can’t help but notice how the lack of engineering practices is holding us back.

I will use the Deep Learning community as an example, because that’s what I’m familiar with, but this probably applies to other communities as well. As a community of researchers we all share a common goal: Move the field forward. Push the state of the art. There are various ways to do this, but the most common one is to publish research papers. The vast majority of published papers are incremental, and I don’t mean this in a degrading fashion. I believe that research is incremental by definition, which is just another way of saying that new work builds upon what other’s have done in the past. And that’s how it should be. To make this concrete, the majority of the papers I come across consist of more than 90% existing work, which includes datasets, preprocessing techniques, evaluation metrics, baseline model architectures, and so on. The authors then typically add a bit novelty and show improvement over well-established baselines.

So far nothing is wrong with this. The problem is not the process itself, but how it is implemented. There are two issues that stand out to me, both of which can be solved with “just engineering.” 1. Waste of research time and 2. Lack of rigor and reproducibility. Let’s look at each of them.

Waste of research time (difficulty of building on other’s work)

Researchers are highly trained professionals. Many have spent years or decades getting PhDs and becoming experts in their respective fields. It only makes sense that those people should spend the majority of their time doing what they’re good at – innovating by coming up with novel techniques. Just like you wouldn’t want a highly trained surgeon spending several hours a day inputting patient data from paper forms. But that’s pretty much what’s happening.

In an ideal world, a researcher with an idea could easily build on top of what has already been done (the 90% I mention above) and have 10% of work left to do in order to test his or her hypothesis. (I realize there are exceptions to this if you’re doing something truly novel, but the majority of published research falls into this category). In practice, quite the opposite is happening. Researchers spend weeks re-doing data pre- and post-processing and re-implementing and debugging baseline models. This often includes tracking down authors of related papers to figure out what tricks were used to make it work at all. Papers tend to not mention the fine print because that would make the results look less impressive. In the process of doing this, researchers introduce dozens of confounding variables, which essentially make the comparisons meaningless. But more on that later.

What I realized is that the difficulty of building upon other’s work is a major factor in determining what research is being done. The majority of researchers build on top of their own research, over and over again. Of course, one may argue that this is a result becoming an expert in some very specific subfield, so it only makes sense to continue focusing on similar problems. While no completely untrue, I don’t think that’s what’s happening (especially in Deep Learning, where many subfields are so closely related that knowledge transfers over pretty nicely). I believe the main reason for this is that it’s easiest, from an experimental perspective, to build upon one’s own work. It leads to more publications and faster turnaround time. Baselines are already implemented in familiar code, evaluation is setup, related work is written up, and so on. It also leads to less competition – nobody else has access to your experimental setup and can easily compete with you. If it were just as easy to build upon somebody else’s work we would probably see more diversity in published research.

It’s not all bad news though. There certainly are a few trends going into the right direction. Publishing code is becoming more common. Software packages like OpenAI’s gym (and Universe), ensure that at least evaluation and datasets are streamlined. Tensorflow and other Deep Learning frameworks remove a lot of potential confounding variables by implementing low-level primitives. With that being said, we’re still a far cry from where we could be. Just imagine how efficient research could be if we had standardized frameworks, standard repositories of data, well-documented standard code bases and coding styles to build upon, and strict automated evaluation frameworks and entities operating on exactly the same datasets. From an engineering perspective all of these are simple things – but they could have a huge impact.

I think we’re under-appreciating the fact that we’re dealing with pure software. That sounds obvious, but it’s actually a big deal. Setting up tightly controlled experiments in fields like medicine or psychology is almost impossible and involves an extraordinary amount of work. With software it’s essentially free. It’s more unique than most of us realize. But we’re just not doing it. I believe one reason for why these changes (and many others) are not happening is a misalignment of incentives. Truth the told, most researchers care more about their publications, citations, and tenure tracks than about actually driving the field forward. They are happy with a status quo that favors them.

Lack of rigor

The second problem is closely related to the first. I hinted at it above. It’s a lack of rigor and reproducibility. In an ideal world, a researcher could hold constant all irrelevant variables, implement a new technique, and then show some improvement over a range of baselines within some margin of significance. Sounds obvious? Well, if you happen to read a lot Deep Learning papers this sounds like it’s coming straight from a sci-fi movie.

In practice, as everyone re-implements techniques using different frameworks and pipelines, comparisons become meaningless. In almost every Deep Learning model implementation there exist a huge number “hidden variables” that can affect results. These include non-obvious model hyperparameters baked into the code, data shuffle seeds, variable initializers, and other things that are typically not mentioned in papers, but clearly affect final measurements. As you re-implement your LSTM, use a different framework, pre-process your own data, and write thousands of lines of code, how many confounding variables will you have created? My guess is that it’s in the hundreds to thousands. If you then show a 0.5% marginal improvement over some baseline models (with numbers usually taken from past papers and not even averaged across multiple runs) how can you ever prove causality? How do you know it’s not a result of some combination of confounding variables?

Personally, I do not trust paper results at all. I tend to read papers for inspiration – I look at the ideas, not at the results. This isn’t how it should be. What if all researchers published code? Wouldn’t that solve the problem? Actually, no. Putting your 10,000 lines of undocumented code on Github and saying “here, run this command to reproduce my number” is not the same as producing code that people will read, understand, verify, and build upon. It’s like Shinichi Mochizuki’s proof of the ABC Conjecture, producing something that nobody except you understands.

Again, “just engineering” has the potential to solve this. The solution is pretty much equivalent to problem #1 (standard code, datasets, evaluation entities, etc), but so are the problems. In fact, it may not even be in the best interest of researchers to publish readable code. What if people found bugs in it and you need to retract your paper? Publishing code is risky, without a clear upside other than PR for whatever entity you work for.

/ END OF RANT

Talking to Machines – The Rise of Conversational Interfaces and NLP

Depending on where you’ve lived you’re probably using at least one of  WhatsApp, WeChat, LINE, or Kakaotalk. At first glance, these apps look like communication tools. They allow you talk to your friends, family, coworkers and business partners. Sometimes all at once. But messenger apps are just the beginning of a much larger trend. Pretty soon, you’ll use such apps not to talk to your friends, but to talk to machines. Chat applications will become a new interface used to consume information and services. In fact, many companies have started to move into this direction. We’ll get to that later, but bear with me for a little while.

Let me clarify what I mean by interface first. For the purpose of this post, an interface is a layer of technology that facilitates the exchange of information and services between humans and machines. The perhaps most ubiquitous interface today is the web browser. We’ve invented several layers of complex technologies: HTTP, HTML, CSS, Javascript, and many others, to make the web browser interface work well.  The same is true for mobile apps, which share a lot of the same principles. Some people argue that a browsers and mobile apps are fundamentally different, but I disagree. There are subtle differences, but you’re essentially looking at a screen and clicking or touching stuff. They’re quite similar. In both cases, we use them to communicate with other people (chat apps), find information (Google), and consume services (Amazon).

By design, all interfaces impose constraints on their users. Some constraints are good – they force us to focus on the essential, making us more efficient. But Some are bottlenecks to our productivity, preventing us from doing what feels natural to us. Take search as an example. Humans are quite good at asking questions. We’ve been doing that for thousands of years. But can you have a conversation with Google to get the information you need? No, you can’t. (Okay, except for a tiny fraction of very simple and commonly asked questions). Instead, we’ve adjusted to the interface that Google is providing us. We’re entering keywords that we think Google will be able to efficiently utilize to give us what we want. Most of us are doing this unconsciously already because we’ve been conditioned to do so. But if watch your mom, who didn’t grow up with Google, you’ll see that she has a hard time wrapping her head around what “good” keywords are, and why she can’t find what she needs. She also can’t tell spam for non-spam. Keyword search doesn’t come naturally to her.

Now, there’s an alternative interface that we use to communicate, get information and consume services almost every day: Natural Language. Natural Language is nothing more than the transmission of information, or meaning, through a medium such as sound or script. That’s also why I won’t make a distinction between speech and text-based solution in this post. They both have their place, but they embody the same thing. Language is an interface that, right now, mostly facilitates communication between humans. But you could imagine using this same interface to facilitate communication between humans and machines. Instead of putting keywords into Google you’d ask a question, potentially have a quick conversation, and get a response. Instead of clicking through a dozen app screens you’d just say what food you want to order, or tell your car where you want to go.

Just like other interfaces, the natural language interface has constraints. It isn’t suited for everything. Obviously it’s bad for visual tasks – shopping clothes for example. It also doesn’t seem so great for discovery, like browsing items on Amazon without a clear goal. But you can certainly imagine use cases where natural language excels. Asking questions to get information or conveying tasks with a clear intention – ordering a specific food item, cab, or giving directions for example. Natural Language places a low cognitive load on our brains because we’re so used to it. Hence, it feels more natural and effortless than using an app or browser.

Of course, Facebook (M), Google (Now), Apple (Siri), Microsoft (Cortana) and Amazon (Echo) have been working on natural languages interface for a while now. They’re calling it personal assistants (PAs). However, there’s another group of companies who are attacking the same problem from a different angle, but we’ve been ignoring that part of their business.  Messenger apps. These platforms are uniquely positioned to enable communication with machines. The leader of the pack is probably WeChat, which allows you to order everything from food to taxis from within the app, using a messaging as the interface. Slack is moving into the same direction by encouraging the creation of various botsLINE also has a range of robotic accounts, ranging from translation services to erotic entertainment.

Why do I think these two groups of companies are working on the same problem? PAs follow what I would call a top-down approach. They’re trying to solve a challenging problem with a long history in AI: Understanding your intentions and acting on them. They are trying to build general purpose software that is smart enough to also fulfill specific tasks. It’s a research area with lots of stuff left to figure out. Messenger apps follow a bottom-up approach. They are starting with simple bots (“Hi, translate this sentence for me”), solving very specific problems, and are gradually moving towards more sophisticated AI. Over time, these two approaches will converge.

When it comes to the adoption of natural language interfaces, messengers apps may actually have a leg up. Here’s why. Talking to Siri on a crowded train still feels awkward, even in SF. It’s also annoying because you have to open yet another app. However, many of us are spending our time in chat apps anyway, so it feels completely natural to just add conversation with a bot. Ditto for Slack. The transition for consumers seems so much smoother. As conversational interfaces penetrate our world, voice interfaces will start feeling just as natural as taking selfies (and hopefully more than selfie sticks). But another reason for why messenger apps are well positioned is data. These companies have collected huge amounts of conversational data that can be used to train better Natural Language models. Facebook has that too, both with their messenger and with WhatsApp. Microsoft doesn’t. Amazon doesn’t. Apple may or may not, depending on how secure iMessage really is.

If you’ve actually used any of the PAs you may be skeptical. Siri still barely understands what you want, and Facebook has put hordes of human workers behind M to get it to do anything useful. How will these things ever replace all the complex tasks we’re doing apps and browsers? That’s another reason for why the bottom-up approach seems promising. It yields immediate benefits and practical applications.

But I also believe that over the coming years we will see rapid improvements in conversational interfaces. There are clear enabling technologies for this trend: Natural Language Processing (NLP) and Deep Learning. Deep Learning techniques have led to breakthroughs in Computer Vision, and they are now penetrating natural language research. Whether or not Deep Learning itself will lead to breakthroughs in NLP is hard to say, but one thing is clearly happening: Many smart people who didn’t previously focus on NLP are now seeing the potential and are starting working on NLP problems, and we can surely expect something to come out of this.

Deep Learning Startups, Applications and Acquisitions – A Summary

Most major tech companies are use Deep Learning techniques in one way or another, and many have new initiatives on the way. Self-driving cars use Deep Learning to model their environment. Siri, Cortana and Google Now use it for speech recognition, Facebook for facial recognition, and Skype for real-time translation.

Naturally there are a lot of startups doing cool things in the space. I tried to do my best to categorize the companies below based on where their main focus seems to be. If you’re a Deep Learning company and I forgot you, please do let me know!

General / Infrastructure

Because Deep Learning is such a generic approach, some companies are focusing on creating infrastructure, algorithms, and tools that can be applied across a variety of domains.

DeepMind, which was acquired by Google for more than $500M in 2014, is working on general-purpose AI algorithms using a combination of Deep Learning and Reinforcement Learning. DeepMind is the company behind an algorithm that learns to play Atari games better than humans. It is a largely a research company and does not provide products for use by businesses or consumers. MetaMind focuses on providing cutting-edge performance for image and natural language classification tasks. Richard Socher, the founder of Metamind, is very active in the academic community and teaches Stanford’s Deep Learning for Natural Language Processing class. The company offers a cloud service to train Deep Learning classifiers. Nervana is the company behind the open source Python-based neon framework, a GPU-optimized library to build Deep Learning architectures. Nervana also provides a cloud services where it runs algorithms on proprietary hardware specifically designed for Deep Learning. Nervana raised$20.5M in a June 2015 round led by Data Collective.

Skymind is the company behind the Deeplearning4j framework. Deeplearning4j makes efficient use of GPUs and integrates with distributed systems such as Hadoop and Spark to scale to large data sets. Skymind sells an enterprise editions of its software together with training and support.

Ersatz Labs offers a cloud service to manage data and train Deep Learning models through a web interface (video) or an API. Pricing is based on minutes of GPU time used.

Computer Vision

It would be fair to say that Deep Learning gained most of its popularity through excellent performance on a variety of computer visions tasks: Recognizing objects in images, understanding scenes, and finding semantically similar images. Convolutional Neural Networks (CNNs), a popular type of Deep Learning architecture, are now considered the standard for most of the above. The rapid success of Deep Learning in Computer Vision has spurred a lot of startup activity.

Madbits was acquired by Twitter in 2014  before it got a chance to launch publicly. In its own words, it “built visual intelligence technology that automatically understands, organizes and extracts relevant information from raw media (images)”.

Perceptio was acquired by Apple In October 2015 while still in stealth mode. The website was shut down after the acquisition, but Perceptio seems to have been developing technology to run image-classifications algorithms on smartphones.

Lookflow was acquired by Yahoo/Flickr in October 2013. It’s unclear what exactly Lookflow was offering, but it was using Deep Learning algorithms for image classifications to help organize photos.

HyperVerge builds technology for a range of visual recognition tasks, including facial recognition, scene recognition, and image similarity search. HyperVerge is also working on a smart photo organization app called Silver. The company came out of  IIT and raised a $1M seed round from NEA in August 2015. Deepomatic builds object recognition technology to identify products (e.g. shoes) in images, which can then be monetized through e-commerce links. It focuses on the fashion vertical and has raised$1.4M from Alven Capital (a French VC) and Angels in September 2015.

Descartes Labs focuses on understanding large datasets of images, such as satellite images. An example use case is tracking agriculture development across the country. Descartes Labs came out of the Los Alamos National Laboratory and has raised $3.3M of funding to date. Clarifai uses CNNs to provide an API for image and video tagging. In April 2015, Clarifai raised a$10M Series A led by USV.

Tractable trains image classifiers to automate inspection tasks currently done by humans, for example detecting cracks on industrial pipes or inspecting cars.

Affectiva classifies emotional reactions based on facial images. It raised $12 million in Series C funding Horizon Ventures and Mary Meeker and Kleiner Perkins in 2012. Alpaca is the company behind Labellio, a cloud service to build your own deep learning image classifier using a graphical interface. Orbital Insight uses Deep Learning to analyze satellite imagery and understand global and national trends. Natural Language After the rapid success in Computer Vision, researchers were quick in adopting Deep Learning techniques for Natural Language Processing (NLP) tasks. In fact, the exact same algorithm that categorizes images can be used to analyze text. Since then, new Deep Learning techniques specifically for NLP have been developed, and are being applied to tasks such as categorizing text, finding content themes, analyzing sentiment, recognizing entities, or answering free-form questions. AlchemyAPI was acquired by IBM (Watson group) in March 2015. It provides a range of Natural Language Processing APIs, including Sentiment Analysis, Entity Extraction and Concept Tagging. (AlchemyAPI also provides computer vision APIs, but their primary product seems to be language-related so I decided to put them in this category). VocalIQ was working on a conversational voice-dialog system before being acquired by Apple in October 2015. Idibon develops general-purpose NLP algorithms that can be applied to any language. Idibon’s public API does Sentiment Analysis for English, but more languages, and support for Named Entity Recognition are coming soon. Idibon raised a$5.5M Series A led by Altpoin, Khosla, and Morningside Ventures in October 2014.

Indico provides a variety of Natural Language APIs based on Deep Learning models. APIs include Text Tagging, Sentiment Analysis, Language Prediction, and Political Alignment Prediction.

Quantified Skin uses selfies to track and analyze a person’s skin and recommends beneficial products and activities. The company raised a total of $280k in 3 rounds. Deep Genomics uses Deep Learning to classify and interpret genetic variants. Its first product is SPIDEX, a dataset of genetic variants and their predicted effects. StocksNeural uses Recurrent Neural Networks to predict stock prices based on historical time-series data. Analytical Flavor Systems use Deep Learning to understand what people taste and optimize food and beverage production. Artelnics builds open source libraries and graphical users interfaces to train Deep Learning models for a variety of industries. Are there any Deep Learning startups I missed? I’d love to hear about them in the comments. The Unbundling of AWS (I’m using AWS as an example, but this post applies just as much to other cloud providers) Over the years AWS has grown to dozens of different services, providing virtual machines, databases, monitoring and deployment tools on-demand. Today, it would be considered foolish to manage your own Postgres/MySQL server when you can set up an RDS instance with excellent scalability and availability characteristics in a matter of minutes. But that’s changing. Container infrastructure is starting to provide similar abstractions and benefits: One-click deployments, load balancing, auto-scaling, rolling deploys, recovery from failures, data migration, resource usage monitoring, and more. Increasingly, I see companies moving away from cloud provider services in favor of containers and container orchestration platforms. Core services like EC2 and S3 aren’t easily replaced, but others are, and there are good reasons to do so: • Costs. AWS prides itself on pay-per-use pricing, but many services aren’t fulfilling that promise. For example, an Elastic Load Balancer costs a fixed$25/month, even if it receives only a few requests. A database that runs only few queries per day (like this blog) also comes with a fixed price tag. Containers are almost free – instead of paying with dollars you pay with the CPU/network resources actually used by the service. Often, that turns out to be cheaper.
• Features. Many AWS services are based either directly or indirectly on open source projects. But these open source projects are typically more feature-complete than their AWS counterparts. An Elastic Load Balancer has a limited set of features compared to a HAProxy instance; so does AWS Kinesis compared to Apache Kafka. Even services that are running open source software under the hood (such as EMR with Hadoop/Spark) don’t typically support the latest versions.
• No cloud lock-in: Most container orchestration solutions work across clouds out of the box. This means you can host parts of your infrastructure on AWS, Google Cloud, Azure, DigitalOcean, you’re private cloud, or whatever else is the best fit.
• Full control: When things don’t work as expected you’re relying on AWS support for help. That can be convenient, but usually it’s faster to debug a problem yourself. That’s only possible with access to the internals of a service, something you don’t have with hosted solutions. What if there’s a simple feature that’d take a small configuration change or a few lines of code to implement? AWS support can’t do that for you. With containers and open source software you can.

Just like Craigslist has been unbundled by purpose-built websites, it seems natural to (not only) me that cloud providers like AWS will be unbundled by purpose-built open source software. In a way that’s ironic because the value proposition of AWS is the exact opposite – bundling  open source software in a centralized place and giving them a consistent look and feel. Up until now we didn’t have the right technology to enable unbundling of PaaS solutions. It’s only recently that container infrastructure and orchestration are becoming mature enough to make this possible.

A Brief Guide to the Docker Ecosystem

Few other technologies have penetrated technology companies as rapidly as Docker (or more generally, containers). It seems like the majority of developers and companies are using containers in one way or another. Many use containers to simplify the setup of local development environments, but more and more companies are starting to completely re-architect their infrastructure and deployment processes around containers. In this post I’m hoping to provide a brief overview of the current state of the ecosystem.

Engines / Runtimes

Container Engines are the core piece of Container technology. The engine builds and runs containers, typically based on some declarative description, such as a Dockerfile.  When people talk about Docker, they typically refer to the Docker Engine, and not necessarily the rest of the ecosystem.

• Docker Engine is the current industry standard and the by far most popular engine.
• rkt is an open-source initiative to take on the Docker Engine, lead by the CoreOS team.

Cloud Services with built-in Docker support

Cloud providers have been quick in offering solutions to run containers on top of their platforms. Some built solutions in-house, and others rely on open source software. Of course, one could manually install Docker an run containers on any server, but most cloud providers go a step further and provide user interfaces that make managing containers easier.

• Amazon EC2 Container Service allows running containerized applications on existing EC2 instances. ECS itself is free, you only pay for the EC2 usage.
• Google Container Engine is built on top of Kubernetes, an open-source container orchestration project started by Google.
• Azure has announced support for Docker containers on top of Mesos
• Stackdock provides hosting for Docker containers.
• Tutum provides hosting for Docker containers.
• GiantSwarm is a cloud platform for defining and deploying microservice architectures running inside containers.
• Joyent Triton provides hosting and monitoring for Docker containers.
• Jelastic Docker provides cloud hosted orchestration for container deployments.

Container Orchestration

Container Orchestration is one of the most contended areas right now. Working with a few containers is easy, but scheduling, managing and monitoring containers at scale is extremely challenging. Container Orchestration software handles a variety of tasks, such as finding the best place/server to run a container, handling failures, sharing storage volumes, and creating load balancers and overlay networks to allow communication between containers.

• Kubernetes is an open source effort started by Google. Kubernetes is based on Google’s internal container infrastructure, and in terms of features it is the most advanced orchestration platform currently available.
• Docker Swarm allows scheduling containers on a cluster of Docker hosts. It is tightly integrated with the rest of the Docker ecosystem.
• Rancher manages application stacks (linked containers) on a cluster of machines. Rancher features an intuitive user interface, excellent documentation, and runs inside a container itself.
• Mesosphere  is a general purpose datacenter operating system. It was not specifically built for Docker, but it includes primitives that make it easy to run containers, or other orchestration systems like Kubernetes, next to traditional services like Hadoop .
• CoreOS fleet  is part of the CoreOS operating system and manages the scheduling  of arbitrary commands (such as running Docker/rkt containers) within a CoreOS cluster.
• Nomad is a general-purpose application scheduler with built-in support for Docker.
• Centurion is a deployment tool internally used and developed by Newrelic.
• Flocker assists with data/volume migration among containers running on different hosts.
• Weave Run provides service discovery, routing, load balancing, and address management for microservice architectures.

Operating Systems

You can run containers on any operating system, but companies are increasingly moving towards containerizing their whole infrastructure. As such, it makes sense to to run a minimal operating systems optimized for Docker and related services.

• CoreOS is designed for automatic updates and focuses on running containers across cluster of machines. It ships with fleet, a scheduler inspired by systemd, but also supports other orchestration systems.
• Project Atomic is a lightweight operating system that runs Docker, Kubernetes, rpm and systemd.
• Rancher OS  is a 20MB Linux distribution that runs the entire operating system within containers. It differentiates between “system containers” and “user containers”, each running in a separate Docker daemon.
• Project Photon is an open source effort from VMWare.

Container Image Registries

Image Registries are the “Github for container images” and allow you to share container images with your team, or the world.

• Docker Registry is the most popular open source registry. You can run it on your own infrastructure or use Dockerhub.
• Dockerhub provides an intuitive UI, automated builds, private repositories, and a large number of official images maintained directly by the authors of the software.
• Quay.io is a container registry developed by the CoreOS team.
• CoreOS Enterprise Registry focuses on providing fine-grained permission and audit trails.

Monitoring

Containers write log files that can be ingested into any existing log collection tool. Container monitoring software typically focus on resource usage (CPU, memory) broken down by container.

• cAdvisor is an open source project by Google. It analyzes resource usage and performance characteristics of running containers and optionally uses InfluxDB as a storage backend for analytics.
• Datadog Docker is an agent that collects statistics of running Docker containers and sends them to Datadog for further analysis.
• NewRelic Docker send container statistics to NewRelic’s cloud service.
• Sysdig can also monitor container resource usage.
• Weave Scope automatically generates a map of your containers, helping you understand, monitor, and control your applications.
• AppFormix provides real-time infrastructure monitoring that works with Docker containers.

A Few Tips To Make Distributed Teams Work Well

An increasing number of startups seem to be building distributed teams, particularly in engineering. But making a distributed team work well isn’t easy. From what I’ve seen most distributed teams are less productive than their centralized counterparts. Here are a few things from my own experience that I think are crucial to the effectiveness such teams. If I had to summarize all of the below I would do it as: Remove synchronous communication, and when in doubt, over-communicate.

Hire people who thrive in remote environments

An otherwise excellent hire may underperform in a distributed environment. This has nothing to do with skill, it’s mostly a result of past experience and personality. Some of us enjoy the social atmosphere and tighter supervision of an office environment. Others thrive with little or no supervision away from all distractions. I found that the latter type of people tend to have experience with starting or managing their own projects, e.g. an open source project, side project, or startup. Explain the company’s goals and they will figure out the little details themselves, without the need for a lot of meetings. Look for these people.

Be proactive, not reactive, in giving access to resources

Communication in a remote environment is asynchronous, so your goal should be to minimize the need for synchronous communication. For example, a developer may be blocked by not having access to a code repository or SaaS service the company uses. In an office environment this isn’t a big deal, she can just walk over to a manager or teammate and get access. In a remote environment she may have to wait several hours to get it. If this happens frequently it adds up to a lot of lost time. Be proactive in giving everyone the resources they need to make their own decisions.

Don’t be a hybrid

It’s not uncommon to simultaneously run a distributed team and a “core” team in a centralized location. This is appealing, but it’s also difficult to get right. It only works if everyone follows the same procedures. It’s tempting for members of the core team to make quick decisions through in-person meetings and neglect discussing them with people who aren’t there. If everyone talks in Slack then the core team should do the same. Not doing so will result in an imbalance of information, or worse, frustration of people who are not “in the loop”.

Focus on process, not outcome

As long as a centralized team is small you can run it without a lot of formal procedures. Not so with distributed teams. An example for engineering teams would be creating formal procedures for Pull Requests. What should be in the title/description and acceptance criteria? Who should review/merge, and when? Defining all of this formally seems like overkill, but in a remote environment it ensures that everyone is on the same page and knows what to expect. More generally, your goal shouldn’t be to ship the next version of your product as quickly as possible (though that’s a nice side effect). It should be to build a scalable process , get out of the way, and make your team productive without requiring a lot of supervision. Formal processes help do that.

Have the right technology stack in place

There’s lots of software that makes working in a distributed team easier. It would be foolish not to use it. Use Slack for team communication. Blossom or Pivotal Tracker for task tracking . Screenhero for screen sharing. Google Hangouts for meetings. And so on. Of course the above are just examples, use any software that meets your needs. Again, make sure that you have processes in place that define how to use your software stack (e.g. how to manage and reviews tasks).

Get everyone on the same page

I found that many inefficiencies in distributed teams stem from the fact that team members aren’t on the same page about company priorities, or that they don’t know what everyone else is working on. This is absolutely crucial. Processes that I’ve found effective include using company and team OKRs, regular standups (either in Slack or via video chat), and doing weekly retrospectives of what has been accomplished, what didn’t go so well, and what the goals for next week are.

Consider Transparency

Buffer is probably the best example of a company that’s incredibly transparent. Transparency works well with distributed teams because it removes the need for communication. If something is open to everyone, employees don’t need ask around for access. You don’t need to become as transparent as Buffer is, but it’s worth considering what you could be transparent about both publicly and internally.

Why Startups Really Succeed: Strings of Luck

Luck plays a huge role in everything we do, and where we’re born is the perhaps biggest lottery of our lives. But acknowledging luck makes us feel uncomfortable. Our brain seeks causal stories, and tries to create them from whatever information is currently available. This helps us maintain the illusion that the world is an orderly place we have control over.

Startups are no exception. The stories of those that succeeded, and the post-mortems of those that failed, are always causal stories. In the case of success they are typically stories about visionary founders in a fast-growing market pursuing an idea at just the right time. That’s exactly the kind of story that appeals to our brains (and the press). There’s no mention of luck. Surely, if we could turn back time, and those founders were to start the business again under the same circumstances, it would also succeed, right?

That’s an illusion.

We tend to overestimate the influence that founders, or any element we can control, have on the outcome. I am not discrediting the hard work of startup founders. The intelligence, resilience, resourcefulness, and optimism of the founders certainly play a big role in the success of a startup. But I believe that it’s a required and not a sufficient condition. Let’s take Airbnb as an example. Paul Graham writes:

Airbnb now seems like an unstoppable juggernaut, but early on it was so fragile that about 30 days of going out and engaging in person with users made the difference between success and failure.

There are an infinite number of events, from family problems to legal issues, that did not happen but would have resulted in Airbnb going out of business at some time during its inception. A chance encounter with someone offering an attractive job to the founders would probably have been enough (the founders started renting out mattresses because they couldn’t afford rent in SF). It was lucky that none of this happened.

The combined absence of all events that would’ve resulted in the founders shutting down Airbnb was very unlikely. Similarly, there were a few crucial (lucky) events that had a large impact on Airbnb. What if the initial 2 customers had never seen the website? What if nobody ever recommended that the founders take prettier pictures of the listed places? You can come up with similar examples for most other billion-dollar startups. Google almost sold their company for \$750k in 1999 and just barely escaped death.  All companies are fickle in their early days, and it’s usually a stroke of random events that leads the founders to continue instead of shutting down or prematurely selling the business.

In Thinking Fast and Slow, nobel-winning Daniel Kahneman puts it well:

Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen.

This also gives us the top reason startups fail: Because it’s the default action. In the absence of continuous random events that keep a startup alive there are just too many things that can go wrong, and too many seemingly better opportunities the founders could choose to pursue. Statistically, it is more likely that something leads to the (voluntary or involuntary) shutdown of a startup than it is that everything goes just according to plan. That’s the reason VCs don’t focus on “Will this startup succeed?”, but on “If this startup succeeds, how big could it be?” Some have recognized that there are just too many variables to consider, and that it’s impossible to predict the future of a startup.

The reason so many successful startups come out of Silicon Valley is because it’s a numbers game. SV has the highest concentration of startups anywhere in the world (maybe even more than the rest of the world combined). People move to SV to start risky companies. Statistically it should come as no surprise that most successes start here. To avoid sounding like a hopeless pessimist I want to clarify that  I am not saying that all the other factors (culture, available of talent, etc) are irrelevant. It’s just that we tend to overvalue them because they make for good stories.

Optimism, or blissful ignorance, could be called the secret sauce of startup founders. Being relentlessly optimistic leads the founder to make the (irrational) decision of continuing with their startup when they could be pursuing an opportunity with a higher expected value. And given the large number of samples, this works out just fine in Silicon Valley.

Reimagining Language Learning with NLP and Reinforcement Learning

The way we learn natural languages hasn’t really changed for decades. We now have beautiful apps like Duolingo and Spaced Repetition software like Anki, but I’m talking about our fundamental approach. We still follow pre-defined curricula, and do essentially random exercises. Learning isn’t personalized, and learning isn’t driven by data. And I think there’s a big opportunity to change that. With the unlimited supply of natural language data online, and with the advances in Natural Language Processing (NLP) techniques, shouldn’t we be able to do something smarter? Here’s what I’m thinking.

The foundation: Modeling Knowledge

At the heart of making learning more efficient is the ability to model a learner’s knowledge. Once you understand what a learner knows you can present her with material that’s most beneficial. Modeling knowledge in general is a difficult problem.  How would you quantify your knowledge about ancient Rome, English literature or mechanics? Knowledge in most disciplines is based on connecting disparate facts and then and reasoning about them in one way or another. Language learning is different, and it’s unique in that it’s quite simple. Comprehending a sentence doesn’t require higher level reasoning, and we can actually measure a learner’s knowledge by presenting her with the right challenges, such as sentence comprehension or completion.

We also need to model language itself. In order to present a learner with a sentence she can comprehend we must know which knowledge (vocabulary, grammar, etc) that sentence depends on. In a way that’s what courses do “manually”.  They present you with a predefined sequence of material that builds on top of each other. I believe we can do this automatically. NLP techniques are sufficiently sophisticated that we should be able to figure out the knowledge dependencies of a text. And that would open up a whole new world of possibilities.

A mathematical formulation

To make things concrete, let’s actually try to define the above mathematically. What follows is invariably an oversimplification of language learning, but I think it’s a useful enough model to do something interesting with. Let’s assume a learner’s language knowledge can be quantified by how well she knows vocabulary and grammar items. I’m not saying this is the right, or the only definition, but it’s something has worked quite well in practice. It’s what most courses and textbooks do.

A learner’s knowledge is defined by a state $s$, which captures our belief about what the learner knows. For example, $s$ could be a sparse vector of real numbers where each element (e.g. 0.73) is a score quantifying how well a learner knows a word or grammar rule. The score could be calculated based on the learner’s performance on reading/listening comprehension and writing/speaking production tasks. Note that $s$ models our belief about the learner’s knowledge, not necessary the actual state of the world. Thus it would probably be a good idea to also include uncertainty (in the form of confidence bounds or distributions) in the representation above. But it’s easier to think about $s$ as just a vector of scores.

We can perform actions $a \in A$ to modify a learner’s knowledge. Actions could include vocabulary reviews, sentence comprehension tasks, or grammar exercises. Just think about what textbooks do. All of these actions have an effect on $s$.  They could increase or decrease the scores based on how well the learner did (or could change the uncertainty about our beliefs). In other words, if a learner is in state $s_t$ at time $t$, then an action $a \in A$ will transition her to a new state $s_{t+1}$. The number of possible states is obviously huge, or infinite.

This now starts to look a bit like a Markov Decision Process (MDP), except that we don’t have uncertainty in our state transition, and that we haven’t defined a reward function.

Learning towards a specific goal

Most approaches ignore the fact that students have different motivations for learning a language. That’s clearly a mistake. The knowledge required to understand your favorite TV drama is different from the knowledge required to comprehend scientific journals. Obviously there is a lot of overlap, but taking a class focused on daily conversation probably isn’t the fastest way towards reading academic literature. With the ability to model knowledge on a fine-grained level we can have truly personalized learning.

Let’s assume the learner’s goal is to understand a certain text, an online article or Youtube video for example. Because we know the knowledge dependencies of that text we know which target states $s^t$ would allow the learner to comprehend it (with high probability at least). Our goal is to find a policy $\pi(s_t)$ that tells us which actions to take at any given point in time in order to reach some target state as quickly as possible. The policy tells us the stochastically optimal path towards a learner’s goal. In an MDP, the policy is defined as maximizing the sum of rewards from some reward function $R_a(s, s')$, and by defining that function in the right way we can solve the problem of finding an optimal policy using Reinforcement Learning techniques.

This task is challenging due to several reasons. The state space is infinite and actions have stochastic results. We can never explicitly model the whole space. We may also need to trade immediate rewards for long term rewards. For example, instead of learning a complicated term that frequently appears in the target text it may be better to learn a common word that has a low frequency in the target text but makes more actions available to the learner in the future. Luckily, all these are well-known problems that have been solved in one way or another.

Picking the right actions

A key challenge in language learning is to present the learner with material that is neither too difficult nor too easy, or the learner will become frustrated or bored, respectively. In text comprehension there is research that shows that one unknown word for about 50 known words is a good ratio to encourage learning. Learning vocabulary from context is generally more effective than rode memorization because it forces the brain to make connections to things you already know. If we can accurately model the knowledge of a learner and the knowledge dependencies of text, then this task becomes trivial. We could find articles, social media posts or other content that are just right for the learner’s current level and create actions based on them. And of course, by presenting such material to the learner we would refine our model of what the learner actually knows. In order words, the set of actions available at a state $s_t$ should be limited  to those actions that are appropriate for a learner at that stage.

Data Network Effects

The more actions a learner performs the more accurately we will be able model his knowledge, and the more confident we can be in presenting him with the right actions. But that’s not all. As more learners are performing actions we can become more certain about how actions affect a learner’s state, essentially answering the question: Which material is most effective for a learner with a certain background knowledge and goal? This not only allows for making optimal recommendations about what a learner should do next, but may even provide insights about how people learn in general.

These are just some examples of things we can do with a more analytical approach to language learning, but it’s already pretty exciting.

The human side of technical debt

When we talk about technical debt we typically talk about its business impact. It allows us to gain short-term efficiency at the expense of long-term productivity. Technical debt isn’t always bad. Sometimes it’s the right business decision. When running experiments that may or may not become part of a final product taking on technical debt is often a good idea. If the experiment doesn’t work out we can throw away the piece the incurred the debt and there’s no need to “pay it back”.

But technical debt has side effects that we often forget about. Developers hate working with technical debt that isn’t their own. Nobody likes cleaning up somebody else’s mess. Whenever a developer touches code or infrastructure plagued by technical debt she is likely to feel frustrated and demotivated, and that feeling will spill over to other aspects of her work. This effect on human motivation is hard to quantify, but extremely important to consider.

Then there’s the cascading effect. The presence technical debt increases the probability of developers adding more technical debt in adjacent components. It’s messed up anyway, so adding a bit more won’t hurt, right? Individual developers often make such decisions on the spot, and the result can get out of hand rather quickly.

The price of technical grows with number of people and the number of components that touch it.  We need to think of its impact in terms of people, not just code or infrastructure. As much as possible debt should stay confined to isolated components that are “owned” by individuals or teams who are responsible for managing and paying back the debt over time.

How to make decisions, stick with them, and be productive

The list of new technologies I want to learn about grows every day. My Amazon wish list contains hundreds of books on more than a dozen subjects, ranging from Machine Learning to Public Speaking. I have a spreadsheet with over 50 business ideas and associated research. Some I briefly started working on, only to abandon them a few weeks later and move on to the next shiny thing.

What the hell am I doing?

We live in a world of unlimited opportunity. But deciding how to spend our time is becoming increasingly difficult. There are so many choices. How do we pick the best one? Which metric do we use to define best?  And once we’ve made a decision, how can we follow through and avoid getting distracted? I often feel paralyzed. I am scared that whatever choice I make is suboptimal and may lead to misery down the road.

I’ve always admired people who can set their sight on a goal and follow through without getting distracted by other opportunities. I have a couple of friends who are like that. How are some people so incredibly productive? And why do some people feel empty even though they’ve achieved huge success? I figured that learning about human motivation may help me solve my problem. So that’s what I did and what this post is about.

I have’t found one answer that applies to all situations but I thought it would be good share what I’ve learned nonetheless. So this post is a collection of ideas and techniques that helped me become more motivated and productive. Not all of them may work for you. Everyone faces unique challenges after all.

Before diving into the practical stuff let’s try break down our problem. I think there are two subproblems:

1. How do we decide what to spend our time on?
2. Once we decided, how do we stick with our decision and become more productive?

Of course, these two problems are interrelated. Picking a project we’re passionate about may lead to increased productivity down the road. And knowing why we do something helps us sticking with it and ignore distractions. Still, I think it’s useful to consider these two question separately.

How do we decide what to spend our time on?

Don’t follow your passion

Successful people will tell you that they love what they do. There clearly is a correlation, but nobody has been able to prove a causation. Don’t confuse the two. In other words, these people didn’t go into deep introspection, figure out what they were passionate about, and then became successful because of that. They learned to love what they were doing. There’s a good chance that you are passionate about what you’re already good it. Because that’s the reason you became good at it in the first place.

Ben Horowitz recently gave a speech at Columbia University about this. Two points that stood out to me are that  passions hard to prioritize and that passions change. Are you more passionate about math or engineering? Are you more passionate about history or literature? What you’re passionate about now may not be the same thing you’re passionate about in 10 years.

In his book So Good They Can’t Ignore You Cal Newport, a computer science professor and the author of the popular Study Hacks blog, suggests looking are your strengths to figure out what you can offer to the world. What are you uniquely destined to do? Passion will be the side effect.

Follow Resistance

If you haven’t read Steven Pressfield’s The War of Art you should. Out of all the resources in this post it’s probably the one that has had the biggest impact on me. The basic idea is that a force called Resistance is trying to prevent us from doing meaningful things in our life. It’s not a scientifically rigorous theory, but a very helpful metaphor to think about human motivation.

Any act that rejects immediate gratification in favor of long-term growth, health, or integrity. Or, expressed another way, any act that derives from our higher nature instead of our lower. Any of these will elicit Resistance.

Goals that commonly evoke resistance include dieting, exercising, starting a business, and creative work like writing or drawing. Resistance is stronger the more meaningful the project is. That sucks, right? Nope, actually it’s great. It allows you to use resistance as a compass. The project you’re most scared about, the one that has been on your list forever, is probably the most fulfilling thing you can do.

I’ll talk a bit more about how Resistance manifests itself later in the post.

Be optimistic and have a bias towards action

When I read the biographies of successful people (Getting There: A Book of Mentors is a good resource) it stood out to me that many zig-zagged through their lives and made drastic career changes several times. It’s important to realize that the decisions you make are not set in stone. It’s usually impossible to determine the best path in advance. You’re better off making a well-informed decision quickly than to overanalyze the situation and do nothing. This is also known as Satisficing. Picking a strategy that achieves a minimum threshold of “goodness” often leads to better outcomes than spending a huge amount of time to finding optimal decision.

One thing to be aware of is that we tend to overestimate the potential downsides of our decisions because we’re naturally loss-averse. We’re  scared to lose our comfort and safety. In reality there are few career choices that will completely ruin your life. There’s almost always a way to recover. Look at upside opportunity instead of downside risk and be optimistic about uncertainty. If an opportunity has a lot of uncertainty (an entrepreneurial venture for example) you are certain to learn a lot from it, even if it doesn’t work out.

Inaction is often riskier than we think. A good example are supposedly stable jobs. They are stable until they aren’t. A market crash, legal dispute or shareholder issues can lead to layoffs that nobody expected. Relying on the stability of your job is riskier than continuously reinventing yourself (e.g. by developing skills currently in demand or switching jobs) and diversifying your activities.

Death

Most of us ignore death. It’s not something we typically think about. We act as if we are in this world forever. That just isn’t true. All of us will die eventually. Once we let death enter our life everything changes.

The moment a person learns he’s got terminal cancer, a profound shift takes place in his psyche. At one stroke in the doctor’s office he becomes aware of what really matters to him. Things that sixty seconds earlier had seemed all-important suddenly appear meaningless, while people and concerns that he had till then dismissed at once take on supreme importance. What about that gift he had for music? What became of the passion he once felt to work with the sick and the homeless? Why do these unlived lives return now with such power and poignancy?

Steve jobs shares a similar experience in his graduation speech at Stanford. Remembering that he’ll be dead soon helped him forget about external expectation, pride, and fear of embarrassment, and figure out what’s important to him. One of the biggest regrets of the dying is “I wish I’d had the courage to live a life true to myself, not the life others expected of me”. Regularly thinking about death is a useful technique that may help us get closer to that.

How do we stick with our decisions and become more productive?

Label Your Enemy

Resistance (as described in the War of Art) manifests itself in many forms: Self-doubt, fear of public evaluation, rationalization, or fantasizing about the outcome of your decisions. My personal favorite is rationalization:

Rationalization is Resistance’s right-hand man. Its job is to keep us from feeling the shame we would feel if we truly faced what cowards we are for not doing our work.

Simply understanding what forms of resistance we experience is helpful. Next time you experience any of the above try to identify and label it. This is also known as affect labeling and has long been used in meditation to manage negative emotions. By noticing “I am rationalizing my behavior right now” you may just be able to get rid of your feeling. Actually noticing a specific feeling when it arises is difficult, often you’re just “swept away” by it, but you get better at it with practice.

Time Blocking and Pomodoro

Time blocking is a technique that has had a huge impact on my productivity. The basic idea is that you divide your day into fixed chunks of time and assign tasks to them. During each block you work on nothing else but the task at hand. If you finish your task early you can work on a secondary task. If you don’t finish within the allotted time you can reschedule your existing blocks or continue tomorrow. It’s important that you evaluate your day based on how well you adhere to your blocks, not based on how many tasks you finish. Cal Newport describes this technique in more detail on bis blog. A side effect of time blocking is that it helps you build habits by using the same block structure every day.

Directly related to time blocking is Parkinson’s Law:

Work expands so as to fill the time available for its completion.

By assigning work to discrete time slots you are forcing yourself to be more productive and cut down on unnecessary actions.

If you have trouble staying focused for a full-time block you can break it down into 25-minute intervals with little breaks in between. These are also known as Pomodoros.

Mastery

Autonomy, Mastery and Purpose. According to Daniel Pink’s Drive these are the three pillars of intrinsic motivation. Most of us already have autonomy and I talked about purpose in the first part of this post. But what about Mastery?

Mastery is the feeling of progression, the feeling that your skills are improving. In video games Mastery often takes on the form of levels, ranks or better items. It’s one element that makes games so addictive. But how do we implement this in our own lives?

In order to gain a sense of mastery from your actions you need to set measurable goals. I like to use Google’s OKR system for this. (Okay it wasn’t invented by Google but at least they made it popular). OKR stands for Objectives and Key Results. An objective is something like “Lose 10 pounds” and its associated key results could be “Go to the gym 15 times for at least one hour” and “Run a total of 50 miles”. Key Results are always fully quantifiable.

An important characteristic of OKRs is that they are ambitious. Achieving 60 to 70% of an Objective at the end of a period is what you’re aiming for.

If you get 100%, you’re not crushing it, you’re sandbagging.

I use a spreadsheet to manage my personal OKRs. I review and update their progress once a week and create new OKRs at the end of each month. The OKRs feed directly into my time blocking schedule. Every time block is related to a Key Result.

Building Habits

According to Power of Habit up to 40% of our waking life may be dictated by habits. During these times we’re on autopilot. Habits are hard to change. This is both good and bad. Once you’ve created a habit of going to the gym every day it’s difficult to break it. But when you’re in the middle of doing your work and somehow end up on Facebook without realizing it, that’s a habit too.

Habits are made up of a cue/trigger, behavior and reward. The reward satisfies some kind of craving you have. It could be hunger, the need for socialization, or boredom. To get rid of a bad habit you can use the following procedure:

1. Identify the behavior, e.g. visiting Facebook. This is the easiest part because it’s usually obvious.
2. Experiment with various rewards to figure out your craving. What if you went for a walk instead? What if you listened to music for a few minutes? Does that satisfy you?
3. Try to identify the cue. It could be location, time, emotional state, other people, preceding action or something else.
4. Make a new plan based on your cue and try to replace your habit. E.g. “At 3pm listen to music for 5 minutes.” Write it down and follow your plan when you experience the cue next time.

Mindfulness

Regular meditation practice can help with various aspects of productivity. It can help you stay focused for longer periods of time. It will also help you to identify feelings and thoughts (e.g. rationalization) you may not notice otherwise. If you’re looking for an easy way to get started with meditation I recommend the Headspace app.

If you made it this far, thanks for reading. What are your favorite techniques for staying motivated and productive?