A Few Tips To Make Distributed Teams Work Well

An increasing number of startups seem to be building distributed teams, particularly in engineering. But making a distributed team work well isn’t easy. From what I’ve seen, most distributed teams are less productive than their centralized counterparts. Here are a few things from my own experience that I think are crucial to the effectiveness of such teams. If I had to summarize all of the below in one sentence, it would be: Minimize synchronous communication, and when in doubt, over-communicate.

Hire people who thrive in remote environments

An otherwise excellent hire may underperform in a distributed environment. This has nothing to do with skill, it’s mostly a result of past experience and personality. Some of us enjoy the social atmosphere and tighter supervision of an office environment. Others thrive with little or no supervision away from all distractions. I found that the latter type of people tend to have experience with starting or managing their own projects, e.g. an open source project, side project, or startup. Explain the company’s goals and they will figure out the little details themselves, without the need for a lot of meetings. Look for these people.

Be proactive, not reactive, in giving access to resources

Communication in a remote environment is asynchronous, so your goal should be to minimize the need for synchronous communication. For example, a developer may be blocked by not having access to a code repository or a SaaS service the company uses. In an office environment this isn’t a big deal: she can just walk over to a manager or teammate and get access. In a remote environment she may have to wait several hours to get it. If this happens frequently, it adds up to a lot of lost time. Be proactive in giving everyone the resources they need to make their own decisions.

Don’t be a hybrid

It’s not uncommon to simultaneously run a distributed team and a “core” team in a centralized location. This is appealing, but it’s also difficult to get right. It only works if everyone follows the same procedures. It’s tempting for members of the core team to make quick decisions through in-person meetings and neglect discussing them with people who aren’t there. If everyone talks in Slack then the core team should do the same. Not doing so will result in an imbalance of information, or worse, frustration among people who are not “in the loop”.

Focus on process, not outcome

As long as a centralized team is small you can run it without a lot of formal procedures. Not so with distributed teams. An example for engineering teams would be creating formal procedures for Pull Requests. What should be in the title/description and acceptance criteria? Who should review/merge, and when? Defining all of this formally seems like overkill, but in a remote environment it ensures that everyone is on the same page and knows what to expect. More generally, your goal shouldn’t be to ship the next version of your product as quickly as possible (though that’s a nice side effect). It should be to build a scalable process, get out of the way, and make your team productive without requiring a lot of supervision. Formal processes help do that.

Have the right technology stack in place

There’s lots of software that makes working in a distributed team easier. It would be foolish not to use it. Use Slack for team communication, Blossom or Pivotal Tracker for task tracking, Screenhero for screen sharing, Google Hangouts for meetings, and so on. Of course, the above are just examples; use any software that meets your needs. Again, make sure that you have processes in place that define how to use your software stack (e.g. how to manage and review tasks).

Get everyone on the same page

I found that many inefficiencies in distributed teams stem from the fact that team members aren’t on the same page about company priorities, or that they don’t know what everyone else is working on. This is absolutely crucial. Processes that I’ve found effective include using company and team OKRs, regular standups (either in Slack or via video chat), and doing weekly retrospectives of what has been accomplished, what didn’t go so well, and what the goals for next week are.

Consider Transparency

Buffer is probably the best example of a company that’s incredibly transparent. Transparency works well with distributed teams because it reduces the need for communication. If something is open to everyone, employees don’t need to ask around for access. You don’t need to become as transparent as Buffer is, but it’s worth considering what you could be transparent about, both publicly and internally.


Why Startups Really Succeed: Strings of Luck

Luck plays a huge role in everything we do, and where we’re born is perhaps the biggest lottery of our lives. But acknowledging luck makes us feel uncomfortable. Our brain seeks causal stories and tries to create them from whatever information is currently available. This helps us maintain the illusion that the world is an orderly place we have control over.

Startups are no exception. The stories of those that succeeded, and the post-mortems of those that failed, are always causal stories. In the case of success they are typically stories about visionary founders in a fast-growing market pursuing an idea at just the right time. That’s exactly the kind of story that appeals to our brains (and the press). There’s no mention of luck. Surely, if we could turn back time, and those founders were to start the business again under the same circumstances, it would also succeed, right?

That’s an illusion.

We tend to overestimate the influence that founders, or any element we can control, have on the outcome. I am not discrediting the hard work of startup founders. The intelligence, resilience, resourcefulness, and optimism of the founders certainly play a big role in the success of a startup. But I believe it’s a necessary, not a sufficient, condition. Let’s take Airbnb as an example. Paul Graham writes:

Airbnb now seems like an unstoppable juggernaut, but early on it was so fragile that about 30 days of going out and engaging in person with users made the difference between success and failure.

There are an infinite number of events, from family problems to legal issues, that did not happen but would have resulted in Airbnb going out of business at some time during its inception. A chance encounter with someone offering an attractive job to the founders would probably have been enough (the founders started renting out mattresses because they couldn’t afford rent in SF). It was lucky that none of this happened.

The combined absence of all events that would’ve resulted in the founders shutting down Airbnb was very unlikely. Similarly, there were a few crucial (lucky) events that had a large impact on Airbnb. What if the initial two customers had never seen the website? What if nobody had ever recommended that the founders take prettier pictures of the listed places? You can come up with similar examples for most other billion-dollar startups. Google almost sold the company for $750k in 1999 and just barely escaped death. All companies are fickle in their early days, and it’s usually a stroke of random events that leads the founders to continue instead of shutting down or prematurely selling the business.

In Thinking, Fast and Slow, Nobel laureate Daniel Kahneman puts it well:

 Narrative fallacies arise inevitably from our continuous attempt to make sense of the world. The explanatory stories that people find compelling are simple; are concrete rather than abstract; assign a larger role to talent, stupidity, and intentions than to luck; and focus on a few striking events that happened rather than on the countless events that failed to happen.

This also gives us the top reason startups fail: Because it’s the default action. In the absence of continuous random events that keep a startup alive there are just too many things that can go wrong, and too many seemingly better opportunities the founders could choose to pursue. Statistically, it is more likely that something leads to the (voluntary or involuntary) shutdown of a startup than it is that everything goes just according to plan. That’s the reason VCs don’t focus on “Will this startup succeed?”, but on “If this startup succeeds, how big could it be?” Some have recognized that there are just too many variables to consider, and that it’s impossible to predict the future of a startup.

The reason so many successful startups come out of Silicon Valley is that it’s a numbers game. SV has the highest concentration of startups anywhere in the world (maybe even more than the rest of the world combined). People move to SV to start risky companies. Statistically it should come as no surprise that most successes start here. To avoid sounding like a hopeless pessimist, I want to clarify that I am not saying that all the other factors (culture, availability of talent, etc.) are irrelevant. It’s just that we tend to overvalue them because they make for good stories.

Optimism, or blissful ignorance, could be called the secret sauce of startup founders. Being relentlessly optimistic leads the founder to make the (irrational) decision of continuing with their startup when they could be pursuing an opportunity with a higher expected value. And given the large number of samples, this works out just fine in Silicon Valley.


Reimagining Language Learning with NLP and Reinforcement Learning

The way we learn natural languages hasn’t really changed for decades. We now have beautiful apps like Duolingo and Spaced Repetition software like Anki, but I’m talking about our fundamental approach. We still follow pre-defined curricula, and do essentially random exercises. Learning isn’t personalized, and learning isn’t driven by data. And I think there’s a big opportunity to change that. With the unlimited supply of natural language data online, and with the advances in Natural Language Processing (NLP) techniques, shouldn’t we be able to do something smarter? Here’s what I’m thinking.

The foundation: Modeling Knowledge

At the heart of making learning more efficient is the ability to model a learner’s knowledge. Once you understand what a learner knows, you can present her with the material that’s most beneficial. Modeling knowledge in general is a difficult problem. How would you quantify your knowledge of ancient Rome, English literature, or mechanics? Knowledge in most disciplines is based on connecting disparate facts and then reasoning about them in one way or another. Language learning is different, and it’s unique in that it’s quite simple: comprehending a sentence doesn’t require higher-level reasoning, and we can actually measure a learner’s knowledge by presenting her with the right challenges, such as sentence comprehension or completion.

We also need to model language itself. In order to present a learner with a sentence she can comprehend we must know which knowledge (vocabulary, grammar, etc) that sentence depends on. In a way that’s what courses do “manually”.  They present you with a predefined sequence of material that builds on top of each other. I believe we can do this automatically. NLP techniques are sufficiently sophisticated that we should be able to figure out the knowledge dependencies of a text. And that would open up a whole new world of possibilities.
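As a toy illustration (the function name and the regex tokenizer are my own stand-ins, not an existing system), extracting the vocabulary dependencies of a sentence could start as simply as collecting its word tokens; a real pipeline would add lemmatization and grammar analysis with an NLP library:

```python
import re

def vocab_dependencies(text):
    """Return the set of word tokens a text depends on.

    A stand-in for a real NLP pipeline: no lemmatization or
    grammar analysis, just lowercase word tokens.
    """
    return set(re.findall(r"[a-z']+", text.lower()))

deps = vocab_dependencies("The cat sat on the mat.")
# A learner who knows every token in `deps` should be able to
# comprehend the sentence (ignoring grammar dependencies).
```

Grammar dependencies (tense, word order, and so on) could be extracted similarly, by parsing each sentence and recording which constructions it uses.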

A mathematical formulation

To make things concrete, let’s actually try to define the above mathematically. What follows is necessarily an oversimplification of language learning, but I think it’s a useful enough model to do something interesting with. Let’s assume a learner’s language knowledge can be quantified by how well she knows vocabulary and grammar items. I’m not saying this is the right, or the only, definition, but it’s something that has worked quite well in practice. It’s what most courses and textbooks do.

A learner’s knowledge is defined by a state s, which captures our belief about what the learner knows. For example, s could be a sparse vector of real numbers where each element (e.g. 0.73) is a score quantifying how well a learner knows a word or grammar rule. The score could be calculated based on the learner’s performance on reading/listening comprehension and writing/speaking production tasks. Note that s models our belief about the learner’s knowledge, not necessarily the actual state of the world. Thus it would probably be a good idea to also include uncertainty (in the form of confidence bounds or distributions) in the representation above. But it’s easier to think about s as just a vector of scores.
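A minimal sketch of such a state, with made-up item names and scores, storing an uncertainty next to each score as suggested above:

```python
# Hypothetical sparse knowledge state: item -> (score, uncertainty).
# Scores in [0, 1] say how well the learner knows each word or rule;
# items that are absent are treated as unknown.
knowledge_state = {
    "word:cat": (0.73, 0.10),
    "word:mat": (0.20, 0.40),
    "grammar:past_tense": (0.55, 0.25),
}

def known(state, item, threshold=0.5):
    """Treat an item as known if our score for it clears a threshold."""
    score, _uncertainty = state.get(item, (0.0, 1.0))
    return score > threshold
```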

We can perform actions a \in A to modify a learner’s knowledge. Actions could include vocabulary reviews, sentence comprehension tasks, or grammar exercises. Just think about what textbooks do. All of these actions have an effect on s.  They could increase or decrease the scores based on how well the learner did (or could change the uncertainty about our beliefs). In other words, if a learner is in state s_t at time t, then an action a \in A will transition her to a new state s_{t+1}. The number of possible states is obviously huge, or infinite.
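A sketch of what one such transition might look like, assuming the hypothetical state representation above; the learning rate of 0.3 is an arbitrary choice:

```python
def apply_action(state, item, correct, lr=0.3):
    """Transition s_t -> s_{t+1} after one exercise on `item`.

    A correct answer moves the score toward 1, an incorrect one
    toward 0, and each observation shrinks our uncertainty.
    """
    score, uncertainty = state.get(item, (0.0, 1.0))
    target = 1.0 if correct else 0.0
    new_state = dict(state)  # treat states as immutable snapshots
    new_state[item] = (score + lr * (target - score),
                       uncertainty * (1 - lr))
    return new_state

s0 = {"word:mat": (0.20, 0.40)}
s1 = apply_action(s0, "word:mat", correct=True)
# s1["word:mat"] is now (0.44, 0.28): higher score, lower uncertainty.
```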

This now starts to look a bit like a Markov Decision Process (MDP), except that we don’t have uncertainty in our state transition, and that we haven’t defined a reward function.

Learning towards a specific goal

Most approaches ignore the fact that students have different motivations for learning a language. That’s clearly a mistake. The knowledge required to understand your favorite TV drama is different from the knowledge required to comprehend scientific journals. Obviously there is a lot of overlap, but taking a class focused on daily conversation probably isn’t the fastest way towards reading academic literature. With the ability to model knowledge on a fine-grained level we can have truly personalized learning.

Let’s assume the learner’s goal is to understand a certain text, an online article or Youtube video for example. Because we know the knowledge dependencies of that text we know which target states s^t would allow the learner to comprehend it (with high probability at least). Our goal is to find a policy \pi(s_t) that tells us which actions to take at any given point in time in order to reach some target state as quickly as possible. The policy tells us the stochastically optimal path towards a learner’s goal. In an MDP, the policy is defined as maximizing the sum of rewards from some reward function R_a(s, s'), and by defining that function in the right way we can solve the problem of finding an optimal policy using Reinforcement Learning techniques.
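To make the MDP framing concrete, here is a deliberately tiny, deterministic sketch: states are sets of known items, each action studies one item, and every action yields a reward of -1 until the goal text’s dependencies are covered, so the optimal policy is the shortest path to comprehension. The item names are invented, and a real system would have stochastic transitions and far too many states to enumerate:

```python
from itertools import combinations

items = ("word:cat", "word:mat", "grammar:past_tense")
goal = frozenset({"word:cat", "word:mat"})  # dependencies of the target text

# Enumerate all subsets of items as states (feasible only for this toy).
states = [frozenset(c) for r in range(len(items) + 1)
          for c in combinations(items, r)]

def step(state, action):
    """Deterministic transition: studying an item adds it to the known set."""
    return state | {action}

# Value iteration with a reward of -1 per action until the goal is
# covered; V[s] converges to minus the number of actions still needed.
V = {s: 0.0 for s in states}
for _ in range(10):
    for s in states:
        V[s] = 0.0 if goal <= s else max(-1.0 + V[step(s, a)] for a in items)

def policy(state):
    """Greedy policy: study the item whose resulting state is most valuable."""
    return max(items, key=lambda a: V[step(state, a)])
```

From the empty state this policy studies only the two goal items, ignoring the grammar item the target text doesn’t depend on.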

This task is challenging for several reasons. The state space is infinite and actions have stochastic results, so we can never explicitly model the whole space. We may also need to trade immediate rewards for long-term rewards. For example, instead of learning a complicated term that frequently appears in the target text, it may be better to learn a common word that has a low frequency in the target text but makes more actions available to the learner in the future. Luckily, all of these are well-known problems that have been solved in one way or another.

Picking the right actions

A key challenge in language learning is to present the learner with material that is neither too difficult nor too easy, or the learner will become frustrated or bored, respectively. In text comprehension, research shows that one unknown word for about every 50 known words is a good ratio to encourage learning. Learning vocabulary from context is generally more effective than rote memorization because it forces the brain to make connections to things you already know. If we can accurately model the knowledge of a learner and the knowledge dependencies of a text, then this task becomes trivial: we could find articles, social media posts, or other content that is just right for the learner’s current level and create actions based on them. And of course, by presenting such material to the learner we would refine our model of what the learner actually knows. In other words, the set of actions available at a state s_t should be limited to those actions that are appropriate for a learner at that stage.
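That unknown-word ratio suggests a simple filter over candidate material. A sketch, reusing the idea of a known-word set (the function names and thresholds are illustrative; one unknown in 50 words corresponds to about 98% coverage, and the lower bound here is deliberately a bit looser at 95%):

```python
import re

def coverage(sentence, known_words):
    """Fraction of a sentence's word tokens the learner knows."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known_words) / len(tokens)

def suitable(sentence, known_words, lo=0.95, hi=1.0):
    """Keep material that is neither too hard (< lo) nor fully known."""
    return lo <= coverage(sentence, known_words) < hi
```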

Data Network Effects

The more actions a learner performs, the more accurately we will be able to model his knowledge, and the more confident we can be in presenting him with the right actions. But that’s not all. As more learners perform actions we can become more certain about how actions affect a learner’s state, essentially answering the question: which material is most effective for a learner with a certain background knowledge and goal? This not only allows for making optimal recommendations about what a learner should do next, but may even provide insights into how people learn in general.

These are just some examples of things we can do with a more analytical approach to language learning, but it’s already pretty exciting.

The human side of technical debt

When we talk about technical debt we typically talk about its business impact. It allows us to gain short-term efficiency at the expense of long-term productivity. Technical debt isn’t always bad; sometimes it’s the right business decision. When running experiments that may or may not become part of a final product, taking on technical debt is often a good idea. If the experiment doesn’t work out we can throw away the piece that incurred the debt, and there’s no need to “pay it back”.

But technical debt has side effects that we often forget about. Developers hate working with technical debt that isn’t their own. Nobody likes cleaning up somebody else’s mess. Whenever a developer touches code or infrastructure plagued by technical debt she is likely to feel frustrated and demotivated, and that feeling will spill over to other aspects of her work. This effect on human motivation is hard to quantify, but extremely important to consider.

Then there’s the cascading effect. The presence of technical debt increases the probability that developers will add more technical debt in adjacent components. It’s messed up anyway, so adding a bit more won’t hurt, right? Individual developers often make such decisions on the spot, and the result can get out of hand rather quickly.

The price of technical debt grows with the number of people and the number of components that touch it. We need to think of its impact in terms of people, not just code or infrastructure. As much as possible, debt should stay confined to isolated components that are “owned” by individuals or teams responsible for managing and paying back the debt over time.