Warm Handoffs are a procedure I’ve implemented with several teams and organizations to improve communication and collaboration in Slack. As organizations grow, and ownership shifts and splits as teams are formed and re-formed, Slack channels proliferate, and it gets harder for folks across an org to know the right channel to get help with a particular problem. This often leads to someone asking a question of the team they think is right, then being redirected to another team and channel (sometimes more than once). Whether redirects are driven by a desire to get someone to the right place or a “not my job” mentality, they subtly contribute to a siloed “us vs. them” culture, and over time reduce collaboration and cohesiveness between teams.

Sometimes, to avoid a redirect, the team that was asked will tag members of the right team in their own channel. While pulling in the right folks is an improvement over a redirect, it reinforces asking for help in the wrong channel, and means that other members of the owning team don’t see the questions or issues their users have.

Warm Handoffs address both of these issues by moving questions to the right team and channel while supporting the requester throughout the process. The basic rule is: “If someone asks you a question you can’t answer, take them to someone who can answer it. If you don’t know who that is, help find someone who can.”

The exact guidance will vary based on how your organization implements support channels in Slack, but generally the guidelines are:

  • If someone asks a question in the wrong channel, and you know the right channel, link their question in the right channel, tagging them, and explain why you’re making the handoff, along with any additional context you have.

  • If someone asks a question in the wrong channel, and you’re not sure where it should go, link their question in a shared channel asking who can help, along with any additional context you have.

That’s it!

Implementing Warm Handoffs

Warm Handoffs need to be a policy. You can’t just recommend them; you have to enforce them. They are premised on two values: 1) most people want to help their peers, and 2) most people want to get their own work done. When inter-team support requests create a conflict between these two values, folks tend to resort to the quickest solution that meets both needs: the Cold Redirect. To break this tendency, you need to establish that a Warm Handoff is expected in lieu of a Cold Redirect, not just preferred.

Your first step should be to write up the procedure with whatever organization-specific guidelines are necessary, and get buy-in from other leaders. Then you should share it at whichever level you’re implementing it as a policy (I usually start with my manager’s entire reporting line, then work outward to the full engineering org). This policy should be pretty short and easy to grok. Take the summary and bullet points above and add some org-specific examples, then share it around. If questions come up, provide more context to address them.

Making it Stick

Warm Handoffs are more work than Cold Redirects, so you’ll likely have to overcome individual inertia to make them an organizational habit. After you’ve established the policy, you’ll need to reinforce it. I find the best time to remind folks is when they forget to do a Warm Handoff and instead do a Cold Redirect. To make this easy, I create a Slack workflow that DMs someone a reminder with a link to the doc, triggered by an emoji reaction to their message. This makes the reminder really quick, and provides consistent, non-confrontational language, e.g.:

Reminder to perform a Warm Handoff. Thank you!

Keep this reminder short and to the point. You’ve already established the policy, and most folks will recognize the benefits. You don’t need to re-convince them, you just need to remind them.
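
If you’d rather script the reminder than build it in a Slack workflow (or just want to see how the pieces fit together), a small bot can do the same thing. Below is a minimal sketch, assuming Bolt for Python and Socket Mode; the :warm-handoff: emoji name, the doc URL, and the environment variable names are all placeholders to adapt to your org.

# A minimal sketch of the reminder bot, assuming Bolt for Python and Socket Mode.
# Requires the reactions:read, im:write, and chat:write scopes plus a
# subscription to the reaction_added event, and the bot generally needs to be
# in a channel to see reactions there. Emoji name, URL, and env vars are
# placeholders.
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

HANDOFF_DOC_URL = "https://example.com/warm-handoffs"  # placeholder

app = App(token=os.environ["SLACK_BOT_TOKEN"])


@app.event("reaction_added")
def remind_about_warm_handoff(event, client):
    # Only act on the designated reminder emoji; ignore every other reaction.
    if event["reaction"] != "warm-handoff":
        return

    # item_user is the author of the message that was reacted to, i.e. the
    # person who did the Cold Redirect and should get the gentle reminder.
    author = event.get("item_user")
    if not author:
        return

    # DM them the short, consistent, non-confrontational reminder.
    dm = client.conversations_open(users=author)
    client.chat_postMessage(
        channel=dm["channel"]["id"],
        text=f"Reminder to perform a Warm Handoff: {HANDOFF_DOC_URL}. Thank you!",
    )


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()

However you build it, the point is the same: flagging a missed Warm Handoff should cost the observer a single emoji reaction, and the person being reminded always sees the same gentle, consistent wording.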


Still Not Convinced?

Hopefully the benefits of Warm Handoffs are evident. If not, here’s an example that should make the value clearer.

The Scenario

Imagine this scenario: Andre has a question about an API rate limit, so he asks the API Gateway team for help in their team Slack channel:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. Can you help?

A Cold Redirect

The API Gateway team provides a platform that is configured by each service owner, so they’re not responsible for rate limit configurations. Miles responds accordingly:

Miles: Hey @Andre. Rate limits are set by service owners. You might try #team-accounts, I think they own that endpoint. Cheers!

Andre: Okay, thanks.

Now Andre takes his question to the Accounts team, and gets an answer:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre! Our rate limit is 100 requests/minute. Are you setting a client_id? We’ve seen this a few times before, and setting a client_id seems to fix it. The API itself doesn’t require one, but it has helped in the past.

Andre: Thanks, @Gina. I’m not setting a client_id, but I’ll give that a shot!

Gina: np, let me know if you’re still having problems, and we can see if there is anything in the logs.

Andre: Just wanted to confirm, setting the client_id seems to have solved my issues. I sent a bunch of requests just now, and no 429s. Thanks again!

Gina: \o/

You probably don’t need to imagine this. Just search your company Slack for “try asking,” “that’s owned by,” or “you should actually ask.” You might also say, “What’s the problem? Miles responded, Andre asked the Accounts team, and Gina was able to help.” It’s true, this interaction isn’t dreadful: Andre got a quick, friendly response, Gina unblocked him, and no relationships are going to be broken over this. But consider some other possibilities when Andre takes his question to the Accounts team:

Potential Problems

A Lack of Context

Gina might not guess that a missing client_id is the issue:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre! Our rate limit is 100 requests/minute. It sounds like you’re under that. Did the API Gateway team give you any more information? Why do they think it’s a service configuration issue?

Andre: No, they just said “Rate limits are set by service owners. You might try #team-accounts, I think they own that endpoint.”

Gina: Yeah, we do own the endpoint. I double checked the configuration, and we’re set to 100 requests/minute. I’m also a bit confused, because my understanding is that we report a client_rate_limited metric, and I don’t see any instances of rate limiting for this endpoint in the last week. Can you check with #team-api-gateway again? I’m really sorry.

Andre has now been redirected twice, and each time it’s up to him to carry context over to the new thread:

Andre: Hey, @Miles, me again! I chatted with @Gina, and she confirmed that their rate limit is set to 100 requests/minute. She also said that they don’t see any rate limiting happening for that endpoint. Is it possible that there’s a broader API Gateway issue?

Miles: I doubt it. I think we’d have a lot of customer issues if that were the case, and we haven’t heard any other complaints.

Miles: I’ll do some quick checking though.

Miles: @Gina, what makes you say there is no rate limiting happening? I see it happening pretty consistently for your endpoint over the last week.

Gina: Where do you see that?

Miles: The ip_rate_limited metric. @Andre, does this roughly align with the 429s you’ve seen?

Andre: Yeah, I think it does.

Gina: Hmm, our dashboard only has the client_rate_limited metric, do we need both?

Miles: Yeah, you probably want to track both, just in case. Our new convention is that every client sends a client_id, and all our new clients do this across the board. We still have some customers using older clients that don’t send client_id, so we still allow requests without it. The default behavior for the API Gateway is to rate limit by client only, but you can optionally set an ip_rate_limit if you need one. Otherwise it falls back to a very low default of 5 requests per minute.

Miles: How new is this endpoint? Is there any chance that older clients are using it?

Gina: We deployed it early this year, so none of the older clients know about it.

Miles: In that case, @Andre, you probably just need to set client_id, and you’ll be fine.

Andre: Okay, I’ll do that. I didn’t realize that was required. Thanks both of you.

Phew, that took half the day, and I even omitted the worst case, where Miles tells Andre there is rate limiting and sends him back to Gina.
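
(An aside for anyone following the technical thread: the API Gateway here is fictional, but Miles’ explanation implies a per-endpoint config shaped roughly like the sketch below. Every name and number is drawn from the story, not from any real system.)

# Hypothetical per-endpoint rate limit config for the fictional API Gateway in
# this scenario; the keys and values are invented for illustration only.
BULK_UPDATE_RATE_LIMITS = {
    "endpoint": "/v2/bulk-update",
    # Applies to requests that include a client_id (all of the newer clients).
    "client_rate_limit": 100,  # requests per minute, set by the Accounts team
    # There is no explicit "ip_rate_limit" key, so requests without a client_id
    # (like Andre's local testing) fall back to the gateway's very low default
    # of 5 requests per minute, which is where the 429s come from.
}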

Unclear Expectations

Without context from Miles, Gina may not understand why her team is being asked, and could push back:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre. We’re just users of the API Gateway. We don’t know how to troubleshoot issues with it. Can you please take this back to #team-api-gateway?

Andre (in #team-api-gateway): @Miles, I asked in #team-accounts, but they don’t know how to troubleshoot the API Gateway.

Miles (in #team-accounts): @Gina, the API Gateway is provided as a platform to service teams, but it’s up to your team to configure it as necessary. Here’s the configuration that you set for your endpoint. I can’t determine if this configuration is what’s expected. That’s something you need to decide, and then assist Andre accordingly.

Gina: @Miles, I’ve never worked with this config before. @Tom (OOO til 11/28) set this up with help from your team. I assume it’s working correctly, since it’s been in place for a few months. If Andre is having issues, that seems like it’s probably an API Gateway issue.

Grace: @Miles @Gina, let’s hop on a call to discuss this and figure out how to help Andre. Are you both free at 2pm?

Miles: Sure.

Gina: Yes.

Grace: Thanks. @Andre, sorry to keep you waiting, we’ll have a plan by 2:30.

Andre: Okay, @Grace, thank you, I can work on something else in the meantime.

Gina: @Andre, we looked at what’s happening on the call, and @Miles pointed out that your requests are missing client_id, so they’re hitting a per-IP rate limit. Can you please include client_id? That should solve it.

Andre: Sure, I didn’t realize it was something so simple. I’m really sorry for the trouble.

Because Gina didn’t have any context about the redirect, she sent it back to Miles, who was frustrated. Rather than providing useful context, Miles defended the redirect, and Andre’s simple, urgent request turned into a debate about cross-team responsibility. Gina’s manager, Grace, had to step in to address the conflict and get both engineers focused on helping Andre rather than debating ownership. This delayed getting Andre a solution by hours, and made him feel guilty for asking in the first place. Miles and Gina’s relationship also suffered from an entirely avoidable conflict. And because the experience was emotionally exhausting, all the valuable information about how the rate limiting works was lost: Gina relayed only the bare fix, and Miles left the conversation entirely.

With a Warm Handoff

Now we’ll look at the same situation, but with Miles using a Warm Handoff. When Miles realizes that Andre needs to be redirected to another team, he takes the conversation there himself and provides additional context on why he’s making the handoff:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. Can you help?

Miles (in #team-accounts): Hey folks! @Andre came to us because he’s getting 429s against the /v2/bulk-update endpoint, which I believe your team owns. We don’t see any issues with the API Gateway, can you help him determine if there’s an issue with the way he’s using this endpoint?

Gina: Thanks, @Miles. @Andre Our rate limit is 100 requests/minute. Can you confirm you’re under that?

Andre: @Gina, yeah, I’m just doing some local testing, so it’s less than 10 a minute.

Gina: Hmmm, okay. Let me check our dashboard.

Gina: I don’t see any instances of rate limiting in the last week. You’re 100% sure it’s the /v2/bulk-update endpoint? cc/ @Miles.

Andre: Yeah, definitely that endpoint.

Miles: Oh, @Gina, are you tracking both client_rate_limited and ip_rate_limited for your endpoints?

Gina: Looks like only client_rate_limited. I didn’t even know about ip_rate_limited. What’s the difference?

Miles: All our new clients should provide client_id, but some older clients that only work with a subset of older endpoints don’t send that, so we also have rate limiting by IP as a fallback. By default, the API Gateway only configures the client rate limit, but you can explicitly set an IP rate limit as well via the ip_rate_limit key, if you expect older clients to use the endpoint. @Andre, guessing you aren’t sending client_id in your testing?

Gina: TIL!

Andre: Ah, yeah, I didn’t realize I needed that. I’ll add it, and then I should be good?

Gina: Sounds like it.

Miles: Yeah.

Andre: Thanks, both!

This exchange was much quicker, and Andre didn’t have to relay any information back and forth between teams. Because Miles was engaged in the handoff, he was able to recognize where there might be a gap in understanding, and he helped Gina and Andre understand what was happening.

A Warm Handoff With Context

We can also imagine ways it could be even more streamlined. Miles could have checked the metrics ahead of time, and deduced that it was likely the IP rate limit kicking in:

Miles (in #team-accounts): Hey folks! @Andre came to us because he’s getting 429s against the /v2/bulk-update endpoint, which I believe your team owns. I checked the metrics for that endpoint, and I don’t see any client rate limiting, but I do see IP rate limiting. Do you intend for older clients without client_id to hit this endpoint? If so, you probably want to explicitly set ip_rate_limit in your API Gateway config. If not, then it’s probably as simple as adding client_id to your requests, @Andre.

Gina: This endpoint is quite new, so we don’t expect requests without client_id. Thanks for catching this and letting us know, @Miles!

Andre: Ah, TIL! I’ll set client_id going forward. Thanks!

So much better. There’s almost no back and forth, and Miles and Gina are each able to contribute their team-specific knowledge to bring clarity to the problem quickly.