Warm Handoffs are a procedure I’ve implemented with several teams and organizations to improve communication and collaboration in Slack. As organizations grow, and ownership shifts and splits as teams are formed and re-formed, Slack channels proliferate, and it gets harder for folks across an org to know the right channel to get help with a particular problem. This often leads to someone asking a question of the team they think is right, then being redirected to another team and channel (sometimes more than once). Whether redirects are driven by a desire to get someone to the right place or a “not my job” mentality, they subtly contribute to a siloed “us vs. them” culture, and over time reduce collaboration and cohesiveness between teams.

Sometimes, to avoid a redirect, the team that was asked will tag members of the right team in their own channel. While pulling in the right folks is an improvement over a redirect, it reinforces asking for help in the wrong channel, and means that other members of the owning team don’t see the questions or issues their users have.

Warm Handoffs address both of these issues by moving questions to the right team and channel while supporting the requester throughout the process. The basic rule is: “If someone asks you a question you can’t answer, take them to someone who can answer it. If you don’t know who that is, help find someone who can.”

The exact guidance will vary based on how your organization implements support channels in Slack, but generally the guidelines are:

  • If someone asks a question in the wrong channel, and you know the right channel, link their question in the right channel, tagging them, and explain why you’re making the handoff, along with any additional context you have.

  • If someone asks a question in the wrong channel, and you’re not sure where it should go, link their question in a shared channel asking who can help, along with any additional context you have.

That’s it!

Implementing Warm Handoffs

Warm Handoffs need to be a policy. You can’t just recommend them; you have to enforce them. They are premised on two values: 1) most people want to help their peers, and 2) most people want to get their own work done. When inter-team support requests create a conflict between these two values, folks tend to resort to the quickest solution that meets both needs: the Cold Redirect. To break this tendency, you need to establish that a Warm Handoff is expected in lieu of a Cold Redirect, not just preferred.

Your first step should be to write up the procedure with whatever organization-specific guidelines are necessary, and get buy-in from other leaders. Then you should share it at whichever level you’re implementing it as a policy (I usually start with my manager’s entire reporting line, then work outward to the full engineering org). This policy should be pretty short and easy to grok. Take the summary and bullet points above and add some org-specific examples, then share it around. If questions come up, provide more context to address them.

Making it Stick

Warm Handoffs are more work than Cold Redirects, so you’ll likely have to overcome individual inertia to make them an organizational habit. After you’ve established the policy, you’ll need to reinforce it. I find the best time to remind folks is when they forget to do a Warm Handoff and instead do a Cold Redirect. To make this easy, I create a Slack workflow that DMs someone a reminder with a link to the doc, triggered by an emoji reaction to their message. This makes the reminder really quick, and provides consistent, non-confrontational language, e.g.:

Reminder to perform a Warm Handoff. Thank you!

Keep this reminder short and to the point. You’ve already established the policy, and most folks will recognize the benefits. You don’t need to re-convince them, you just need to remind them.
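
If you’d rather script the reminder than build it in a Slack workflow (or just want to see how the pieces fit together), a small bot can do the same thing. Below is a minimal sketch, assuming Bolt for Python and Socket Mode; the :warm-handoff: emoji name, the doc URL, and the environment variable names are all placeholders to adapt to your org.

# A minimal sketch of the reminder bot, assuming Bolt for Python and Socket Mode.
# Requires the reactions:read, im:write, and chat:write scopes plus a
# subscription to the reaction_added event, and the bot generally needs to be
# in a channel to see reactions there. Emoji name, URL, and env vars are
# placeholders.
import os

from slack_bolt import App
from slack_bolt.adapter.socket_mode import SocketModeHandler

HANDOFF_DOC_URL = "https://example.com/warm-handoffs"  # placeholder

app = App(token=os.environ["SLACK_BOT_TOKEN"])


@app.event("reaction_added")
def remind_about_warm_handoff(event, client):
    # Only act on the designated reminder emoji; ignore every other reaction.
    if event["reaction"] != "warm-handoff":
        return

    # item_user is the author of the message that was reacted to, i.e. the
    # person who did the Cold Redirect and should get the gentle reminder.
    author = event.get("item_user")
    if not author:
        return

    # DM them the short, consistent, non-confrontational reminder.
    dm = client.conversations_open(users=author)
    client.chat_postMessage(
        channel=dm["channel"]["id"],
        text=f"Reminder to perform a Warm Handoff: {HANDOFF_DOC_URL}. Thank you!",
    )


if __name__ == "__main__":
    SocketModeHandler(app, os.environ["SLACK_APP_TOKEN"]).start()

However you build it, the point is the same: flagging a missed Warm Handoff should cost the observer a single emoji reaction, and the person being reminded always sees the same gentle, consistent wording.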


Still Not Convinced?

Hopefully the benefits of Warm Handoffs are evident. If not, here’s an example that should make the value clearer.

The Scenario

Imagine this scenario: Andre has a question about an API rate limit, so he asks the API Gateway team for help in their team Slack channel:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. Can you help?

A Cold Redirect

The API Gateway team provides a platform that is configured by each service owner, so they’re not responsible for rate limit configurations. Miles responds accordingly:

Miles: Hey @Andre. Rate limits are set by service owners. You might try #team-accounts, I think they own that endpoint. Cheers!

Andre: Okay, thanks.

Now Andre takes his question to the Accounts team, and gets an answer:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre! Our rate limit is 100 requests/minute. Are you setting a client_id? We’ve seen this a few times before, and setting a client_id seems to fix it. The API itself doesn’t require one, but it has helped in the past.

Andre: Thanks, @Gina. I’m not setting a client_id, but I’ll give that a shot!

Gina: np, let me know if you’re still having problems, and we can see if there is anything in the logs.

Andre: Just wanted to confirm, setting the client_id seems to have solved my issues. I sent a bunch of requests just now, and no 429s. Thanks again!

Gina: \o/

You probably don’t need to imagine this. Just search your company Slack for “try asking,” “that’s owned by,” or “you should actually ask.” You might also say, “What’s the problem? Miles responded, Andre asked the Accounts team, and Gina was able to help.” It’s true, this interaction isn’t dreadful: Andre got a quick, friendly response, Gina unblocked him, and no relationships are going to be broken over this. But consider some other possibilities when Andre takes his question to the Accounts team:

Potential Problems

A Lack of Context

Gina might not guess that a missing client_id is the issue:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre! Our rate limit is 100 requests/minute. It sounds like you’re under that. Did the API Gateway team give you any more information? Why do they think it’s a service configuration issue?

Andre: No, they just said “Rate limits are set by service owners. You might try #team-accounts, I think they own that endpoint.”

Gina: Yeah, we do own the endpoint. I double checked the configuration, and we’re set to 100 requests/minute. I’m also a bit confused, because my understanding is that we report a client_rate_limited metric, and I don’t see any instances of rate limiting for this endpoint in the last week. Can you check with #team-api-gateway again? I’m really sorry.

Andre has now been redirected twice, and each time it’s up to him to carry context over to the new thread:

Andre: Hey, @Miles, me again! I chatted with @Gina, and she confirmed that their rate limit is set to 100 requests/minute. She also said that they don’t see any rate limiting happening for that endpoint. Is it possible that there’s a broader API Gateway issue?

Miles: I doubt it. I think we’d have a lot of customer issues if that were the case, and we haven’t heard any other complaints.

Miles: I’ll do some quick checking though.

Miles: @Gina, what makes you say there is no rate limiting happening? I see it happening pretty consistently for your endpoint over the last week.

Gina: Where do you see that?

Miles: The ip_rate_limited metric. @Andre, does this roughly align with the 429s you’ve seen?

Andre: Yeah, I think it does.

Gina: Hmm, our dashboard only has the client_rate_limited metric, do we need both?

Miles: Yeah, you probably want to track both, just in case. Our new convention is that every client sends a client_id, and all our new clients do this across the board. We still have some customers using older clients that don’t send client_id, so we still allow requests without it. The default behavior for the API Gateway is to rate limit by client only, but you can optionally set an ip_rate_limit if you need one. Otherwise it falls back to a very low default of 5 requests per minute.

Miles: How new is this endpoint? Is there any chance that older clients are using it?

Gina: We deployed it early this year, so none of the older clients know about it.

Miles: In that case, @Andre, you probably just need to set client_id, and you’ll be fine.

Andre: Okay, I’ll do that. I didn’t realize that was required. Thanks both of you.

Phew, that took half the day, and I even omitted the worst case, where Miles tells Andre there is rate limiting and sends him back to Gina.
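
(An aside for anyone following the technical thread: the API Gateway here is fictional, but Miles’ explanation implies a per-endpoint config shaped roughly like the sketch below. Every name and number is drawn from the story, not from any real system.)

# Hypothetical per-endpoint rate limit config for the fictional API Gateway in
# this scenario; the keys and values are invented for illustration only.
BULK_UPDATE_RATE_LIMITS = {
    "endpoint": "/v2/bulk-update",
    # Applies to requests that include a client_id (all of the newer clients).
    "client_rate_limit": 100,  # requests per minute, set by the Accounts team
    # There is no explicit "ip_rate_limit" key, so requests without a client_id
    # (like Andre's local testing) fall back to the gateway's very low default
    # of 5 requests per minute, which is where the 429s come from.
}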

Unclear Expectations

Without context from Miles, Gina may not understand why her team is being asked, and could push back:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. The API Gateway team said you’ve configured a rate limit that is causing a problem. Can you help?

Gina: Hey, @Andre. We’re just users of the API Gateway. We don’t know how to troubleshoot issues with it. Can you please take this back to #team-api-gateway?

Andre (in #team-api-gateway): @Miles, I asked in #team-accounts, but they don’t know how to troubleshoot the API Gateway.

Miles (in #team-accounts): @Gina, the API Gateway is provided as a platform to service teams, but it’s up to your team to configure it as necessary. Here’s the configuration that you set for your endpoint. I can’t determine if this configuration is what’s expected. That’s something you need to decide, and then assist Andre accordingly.

Gina: @Miles, I’ve never worked with this config before. @Tom (OOO til 11/28) set this up with help from your team. I assume it’s working correctly, since it’s been in place for a few months. If Andre is having issues, that seems like it’s probably an API Gateway issue.

Grace: @Miles @Gina, let’s hop on a call to discuss this and figure out how to help Andre. Are you both free at 2pm?

Miles: Sure.

Gina: Yes.

Grace: Thanks. @Andre, sorry to keep you waiting, we’ll have a plan by 2:30.

Andre: Okay, @Grace, thank you, I can work on something else in the meantime.

Gina: @Andre, we looked at what’s happening on the call, and @Miles pointed out that your requests are missing client_id, so they’re hitting a per-IP rate limit. Can you please include client_id? That should solve it.

Andre: Sure, I didn’t realize it was something so simple. I’m really sorry for the trouble.

Because Gina didn’t have any context about the redirect, she sent it back to Miles, who was frustrated. Rather than providing useful context, Miles defended the redirect, and Andre’s simple, urgent request turned into a debate about cross-team responsibility. Gina’s manager, Grace, had to step in to address the conflict and get both engineers focused on helping Andre rather than debating ownership. This delayed getting Andre a solution by hours, and made him feel guilty for asking in the first place. Miles and Gina’s relationship also suffered from an entirely avoidable conflict. And because the experience was emotionally exhausting, all the valuable information about how the rate limiting works was lost: Gina relayed only the bare fix, and Miles left the conversation entirely.

With a Warm Handoff

Now we’ll look at the same situation, but with Miles using a Warm Handoff. When Miles realizes that Andre needs to be redirected to another team, he takes the conversation there himself and provides additional context on why he’s making the handoff:

Andre: Hey folks! I’m working with a new-to-me endpoint (/v2/bulk-update), and I keep getting 429s. I’m not generating many requests (a few per minute while testing locally), so I’m worried that something is either misconfigured, or I’m making a mistake. Can you help?

Miles (in #team-accounts): Hey folks! @Andre came to us because he’s getting 429s against the /v2/bulk-update endpoint, which I believe your team owns. We don’t see any issues with the API Gateway, can you help him determine if there’s an issue with the way he’s using this endpoint?

Gina: Thanks, @Miles. @Andre Our rate limit is 100 requests/minute. Can you confirm you’re under that?

Andre: @Gina, yeah, I’m just doing some local testing, so it’s less than 10 a minute.

Gina: Hmmm, okay. Let me check our dashboard.

Gina: I don’t see any instances of rate limiting in the last week. You’re 100% sure it’s the /v2/bulk-update endpoint? cc/ @Miles.

Andre: Yeah, definitely that endpoint.

Miles: Oh, @Gina, are you tracking both client_rate_limited and ip_rate_limited for your endpoints?

Gina: Looks like only client_rate_limited. I didn’t even know about ip_rate_limited. What’s the difference?

Miles: All our new clients should provide client_id, but some older clients that only work with a subset of older endpoints don’t send that, so we also have rate limiting by IP as a fallback. By default, the API Gateway only configures the client rate limit, but you can explicitly set an IP rate limit as well via the ip_rate_limit key, if you expect older clients to use the endpoint. @Andre, guessing you aren’t sending client_id in your testing?

Gina: TIL!

Andre: Ah, yeah, I didn’t realize I needed that. I’ll add it, and then I should be good?

Gina: Sounds like it.

Miles: Yeah.

Andre: Thanks, both!

This exchange was much quicker, and Andre didn’t have to relay any information back and forth between teams. Because Miles was engaged in the handoff, he was able to recognize where there might be a gap in understanding, and he helped Gina and Andre understand what was happening.

A Warm Handoff With Context

We can also imagine ways it could be even more streamlined. Miles could have checked the metrics ahead of time, and deduced that it was likely the IP rate limit kicking in:

Miles (in #team-accounts): Hey folks! @Andre came to us because he’s getting 429s against the /v2/bulk-update endpoint, which I believe your team owns. I checked the metrics for that endpoint, and I don’t see any client rate limiting, but I do see IP rate limiting. Do you intend for older clients without client_id to hit this endpoint? If so, you probably want to explicitly set ip_rate_limit in your API Gateway config. If not, then it’s probably as simple as adding client_id to your requests, @Andre.

Gina: This endpoint is quite new, so we don’t expect requests without client_id. Thanks for catching this and letting us know, @Miles!

Andre: Ah, TIL! I’ll set client_id going forward. Thanks!

So much better. There’s almost no back and forth, and Miles and Gina are each able to contribute their team-specific knowledge to bring clarity to the problem quickly.