If your Discord bot keeps going offline at random, you're not alone. Most of the bot crashes our support team sees every week come down to the same handful of causes, and almost all of them are fixable in an afternoon. Here's what actually breaks bots, and the changes that keep them online.
The crash that takes the whole bot down
By far the most common ticket is some version of "my bot just stops responding." When we ask for the logs, the story is nearly always the same. One command threw an error nobody caught, and the whole process died.
In discord.py this shows up as an unhandled exception. In discord.js it's an unhandled promise rejection. Same idea, different language. A single line of code fails, there's nothing to catch it, and Node or Python prints a traceback and exits. Your bot looks fine for hours, then someone runs the one command that hits a bad value and it's gone.
Here's the kind of thing that does it. In Python:
@bot.command()
async def balance(ctx, user: discord.Member):
    data = accounts[user.id]  # KeyError if the user has no account
    await ctx.send(f"{user.name} has {data['coins']} coins")
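If that user isn't in your accounts dict, you get a KeyError. discord.py usually catches errors raised inside a command and logs them, so the user just gets no reply. The same unguarded lookup in a background task or in startup code can stop that task or the whole process. Either way, the fix is to wrap the risky bit and handle the failure on purpose: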
@bot.command()
async def balance(ctx, user: discord.Member):
    try:
        data = accounts[user.id]
    except KeyError:
        await ctx.send("That user doesn't have an account yet.")
        return
    await ctx.send(f"{user.name} has {data['coins']} coins")
In discord.js the equivalent trap is a rejected promise nobody awaited or caught. Wrap your command handlers in try/catch and always handle the async calls:
client.on('interactionCreate', async (interaction) => {
  if (!interaction.isChatInputCommand()) return;
  try {
    await runCommand(interaction);
  } catch (err) {
    console.error('Command failed:', err);
    if (interaction.deferred || interaction.replied) {
      await interaction.followUp({ content: 'Something went wrong.', ephemeral: true });
    } else {
      await interaction.reply({ content: 'Something went wrong.', ephemeral: true });
    }
  }
});
And add a top-level safety net so a stray rejection logs instead of killing the process:
process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
});
A quick warning on that last one. It's a net, not a cure. If you're catching the same rejection every few seconds, fix the actual code. Swallowing errors silently just hides the real problem until it gets worse.
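The same safety-net idea has a rough Python counterpart for background work. The sketch below uses plain asyncio only, not any discord.py API, and the names `spawn` and `_log_task_result` are ours, made up for illustration. It starts a fire-and-forget task and attaches a callback so a failure is logged rather than silently lost.

```python
import asyncio
import logging

log = logging.getLogger("bot")


def _log_task_result(task: asyncio.Task) -> None:
    """Log the exception a background task ended with, if any."""
    if task.cancelled():
        return
    exc = task.exception()  # retrieving it also marks the exception as handled
    if exc is not None:
        log.error("Background task %r failed", task.get_name(), exc_info=exc)


def spawn(coro, *, name=None) -> asyncio.Task:
    """Start a background task whose failure always reaches the log.

    Must be called while an event loop is running.
    """
    task = asyncio.create_task(coro, name=name)
    task.add_done_callback(_log_task_result)
    return task
```

The same warning applies here. This records the failure so you can find it, and the bug still needs fixing.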
Rate limits and 429 errors
The second big category is bots that hammer the Discord API too fast. Discord enforces rate limits, and when you go over, the API replies with a 429 and a retry time. Both libraries usually handle a normal 429 for you by waiting and retrying. The trouble starts when your code sends requests faster than the limits can clear.
The classic example is editing a message inside a tight loop, or sending one message per user in a big server. We've seen a leveling bot try to send 200 individual DMs in a for loop with no delay. It hit the global rate limit, got temporarily blocked, and looked dead for a while.
Some things that help:
- Don't edit a message more than about once a second. If you're building a progress bar or a live counter, batch the updates.
- Send one message instead of five. Use embeds and line breaks to combine information.
- For anything that loops over members, add a small await asyncio.sleep() or queue the work instead of firing it all at once.
- Watch your logs for repeated 429s. If you see them, you're doing too much too fast, not too little.
If you keep getting hit with a Cloudflare ban (a 429 that won't clear for an hour), that's the API telling you something in your code is in a runaway loop. Find it before you restart, or you'll just trip it again.
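Pacing a loop can be sketched in a few lines. This is a minimal example, assuming a hypothetical coroutine function `send` that stands in for whatever actually delivers the message (a DM, a channel send). The one-second default delay is an illustrative value, not a documented Discord limit.

```python
import asyncio


async def send_paced(items, send, delay=1.0):
    """Call `send(item)` for each item, pausing `delay` seconds between calls.

    `send` is a hypothetical coroutine function supplied by the caller.
    Returns the items whose send raised, so one failure doesn't stop the rest.
    """
    failed = []
    for index, item in enumerate(items):
        if index:
            # asyncio.sleep yields to the event loop, so the bot stays responsive
            await asyncio.sleep(delay)
        try:
            await send(item)
        except Exception:
            # Broad on purpose for the sketch; catch specific errors in real code
            failed.append(item)
    return failed
```

Returning the failures instead of raising means one user with closed DMs doesn't abort the whole batch, and you still get a list to log or retry later.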
Gateway intents that aren't turned on
This one trips up nearly every new bot, and the error message isn't always obvious. Your bot connects fine, but it never sees messages, or it can't read member lists, or on_member_join never fires.
Discord splits events into intents. Some are privileged and have to be enabled in two places: in your code, and in the Developer Portal. People usually do one and forget the other.
In discord.py:
intents = discord.Intents.default()
intents.message_content = True
intents.members = True
bot = commands.Bot(command_prefix="!", intents=intents)
In discord.js:
const { Client, GatewayIntentBits } = require('discord.js');
const client = new Client({
  intents: [
    GatewayIntentBits.Guilds,
    GatewayIntentBits.GuildMessages,
    GatewayIntentBits.MessageContent,
    GatewayIntentBits.GuildMembers,
  ],
});
Then go to the Developer Portal, open your application, click Bot, and flip on the Message Content Intent and Server Members Intent toggles. If you ask for a privileged intent in code that you haven't enabled in the portal, the bot will throw on startup and refuse to log in. Honestly, if your bot suddenly stopped reading messages after a library update, this is the first thing to check.
Memory leaks from caches that keep growing
This is the sneaky one. Your bot runs fine for a day or two, gets slower, then crashes with an out of memory error. On a box with a fixed RAM limit, the host kills the process the moment it goes over.
The usual cause is a collection that only ever grows. A dictionary keyed by message ID that you never clean up. A list of every command ever run, kept "for stats." A cache of user objects you add to but never expire. discord.js also caches a lot by default, and on a big bot that adds up.
A few practical fixes:
- Use a bounded structure. In Python, collections.deque(maxlen=1000) drops old items automatically. In Node, libraries like lru-cache do the same.
- Tune discord.js cache options if you don't need every member and message in memory. The sweeper options let you drop stale entries on a schedule.
- Watch memory over time. If it climbs in a straight line and never comes down, you have a leak, not normal usage.
In our experience, a bot that crashes roughly every 36 hours almost always has a slow leak. The clock is your biggest clue.
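Here is a small sketch of what "bounded" looks like in Python, using only the standard library. `BoundedCache` is a hypothetical helper written for this example, and the sizes are placeholders you would tune for your own bot.

```python
from collections import OrderedDict, deque

# Keeps only the most recent 1000 entries; older ones fall off automatically.
recent_commands = deque(maxlen=1000)


class BoundedCache:
    """Tiny LRU-style cache: evicts the least recently used key when full."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return default

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # drop the oldest entry

    def __len__(self):
        return len(self._data)
```

Both structures have a hard ceiling on how many items they hold, so their memory use levels off instead of climbing for as long as the bot runs.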
An expired or reset token
Sometimes the bot won't start at all, and the error mentions an invalid token. There are a couple of reasons this happens.
The first is that someone reset the token in the Developer Portal and forgot to update the bot. The old token stops working instantly, so the bot can't log in until you paste the new one into your .env file. The second is more serious. If you accidentally commit your token to a public GitHub repo, Discord scans for that and resets it for you automatically. Helpful, but it means your bot drops offline with no warning.
Keep your token in a .env file, never in the source, and make sure .env is listed in .gitignore. Load it like this:
# Python, with python-dotenv
import os
from dotenv import load_dotenv
load_dotenv()
bot.run(os.getenv("DISCORD_TOKEN"))
If you ever think a token leaked, reset it right away and update the bot. Don't wait. A leaked token lets anyone control your bot.
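One small addition worth considering: os.getenv returns None when the variable is missing, so a forgotten .env entry tends to surface later as a less obvious login error. The sketch below fails early with a clear message instead. `read_token` is a hypothetical helper name, and it assumes the variable is called DISCORD_TOKEN as in the snippet above.

```python
import os


def read_token(env=os.environ, name="DISCORD_TOKEN"):
    """Return the bot token from the environment, or fail with a clear message."""
    token = env.get(name, "").strip()  # strip stray whitespace from copy-paste
    if not token:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or environment"
        )
    return token
```

You would call this after load_dotenv() and pass the result to bot.run, so a missing token shows up as one readable line at startup.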
Blocking the event loop with sync code
Both discord.py and discord.js run on a single event loop. The bot stays responsive by never sitting and waiting on one thing. The moment you run slow synchronous code, the whole bot freezes until it finishes. It doesn't crash exactly, but it stops responding, misses heartbeats, and Discord disconnects it. To users, that looks identical to a crash.
Common offenders are time.sleep() instead of await asyncio.sleep(), the requests library instead of aiohttp, heavy file reads, and big synchronous loops crunching data. In Node, the same goes for fs.readFileSync on a large file or a tight CPU loop.
The rule of thumb: if something takes more than a few milliseconds, it should be awaited or pushed off the main loop. Use the async version of libraries. For genuinely heavy CPU work, run it in a thread or worker so the event loop stays free. If your bot goes quiet for a few seconds every time someone runs a particular command, this is almost certainly why.
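As a rough illustration of pushing work off the main loop, the sketch below uses asyncio.to_thread, which is available in Python 3.9 and later. `blocking_lookup` is a made-up stand-in for any slow synchronous call, such as a `requests` call or a large file read, and the returned data is invented for the example.

```python
import asyncio
import time


def blocking_lookup(user_id: int) -> dict:
    """Stand-in for slow synchronous work that would freeze the event loop."""
    time.sleep(0.2)  # pretend this is a blocking HTTP call or file read
    return {"user_id": user_id, "coins": 42}


async def lookup(user_id: int) -> dict:
    # Runs the blocking function in a worker thread and awaits the result,
    # so heartbeats and other commands keep running in the meantime.
    return await asyncio.to_thread(blocking_lookup, user_id)
```

Threads suit blocking I/O well. For CPU-heavy pure Python code the interpreter lock limits how much a thread helps, so a separate process is often the better fit there.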
The fixes that keep it online
Catching individual bugs matters, but the setup around your bot is what keeps it running when something slips through. Here's what we recommend to every customer who opens a "my bot keeps crashing" ticket.
Log everything to a file
You can't fix a crash you can't see. Print statements scroll away. Set up proper logging that writes to a file with timestamps, so when the bot dies at 3am you have a traceback waiting in the morning. Python's logging module does this in a few lines, and for Node a logger like pino or even structured console.error piped to a file works.
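For Python, a minimal setup might look like the sketch below. The file name, size limit, and backup count are placeholder values, and `setup_logging` is just a name chosen for this example.

```python
import logging
from logging.handlers import RotatingFileHandler


def setup_logging(path="bot.log", level=logging.INFO):
    """Send all log records to a timestamped, size-limited file."""
    handler = RotatingFileHandler(
        path, maxBytes=5_000_000, backupCount=3, encoding="utf-8"
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    root = logging.getLogger()
    root.setLevel(level)
    root.addHandler(handler)
    return handler
```

Inside an except block, logging.getLogger("bot").exception("Command failed") writes the message together with the full traceback, which is what you want waiting for you in the morning. The rotating handler keeps the log file itself from growing without limit.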
Use try/except and try/catch around risky code
Wrap anything that touches the network, parses user input, or reads a file. Catch the specific error where you can, log it, and reply to the user with a friendly message instead of letting the whole bot fall over. One bad command should never take the rest down.
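If you find yourself writing the same try/except in every command, a decorator can hold the pattern in one place. This is a sketch under assumptions: `safe_command` is a hypothetical name, and it expects the first argument to be a context object with an awaitable `send` method, like the `ctx` in the earlier examples. We haven't verified how it composes with `@bot.command()` in every library version, so treat it as a pattern, not a drop-in.

```python
import functools
import logging

log = logging.getLogger("bot.commands")


def safe_command(func):
    """Wrap an async command so a failure is logged and the user gets a reply."""

    @functools.wraps(func)
    async def wrapper(ctx, *args, **kwargs):
        try:
            return await func(ctx, *args, **kwargs)
        except Exception:
            # Last-resort catch; handle specific errors inside the command itself
            log.exception("Command %s failed", func.__name__)
            await ctx.send("Something went wrong running that command.")

    return wrapper
```

Specific handling, like the KeyError case earlier, still belongs inside the command where you can give a useful answer. The wrapper only covers what you didn't anticipate.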
Let the library handle reconnects, and don't fight it
Both libraries reconnect to the gateway automatically when the connection drops. Don't write your own reconnect loop that calls client.login again on top of that. We've seen custom reconnect code cause more outages than network blips ever did. Trust the built-in logic and just log when it happens so you can spot patterns.
Run it under a process manager
This is the single biggest win. A process manager restarts your bot automatically if it ever does exit. For Node, pm2 is the standard:
pm2 start bot.js --name mybot
pm2 logs mybot
pm2 startup   # prints the command that registers pm2 as a boot service
pm2 save      # saves the process list so mybot comes back after a reboot
For Python on a VPS, a small systemd service with Restart=always does the same job and starts the bot on boot. On a Bytte.cloud bot plan the panel handles restarts for you, so you get this without setting anything up. Either way, the goal is the same: if the bot exits, something brings it straight back.
Add a basic health check
A restart loop is great until your bot is "running" but frozen, which is the blocked event loop problem from earlier. A process manager won't catch that, because the process is technically alive. A simple health check fixes it. Have the bot update a timestamp every minute, then run a tiny external check that restarts it if that timestamp goes stale. Even a small HTTP endpoint your monitor can ping is enough.
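The timestamp approach can be sketched with the standard library alone. The file path, interval, and maximum age below are all placeholder values, and `heartbeat_loop` and `is_stale` are hypothetical names for this example. The first function runs inside the bot, the second in a separate watchdog script.

```python
import asyncio
import time
from pathlib import Path

HEARTBEAT = Path("heartbeat.txt")  # hypothetical location; pick your own


async def heartbeat_loop(path=HEARTBEAT, interval=60.0):
    """Run on the bot's event loop; if the loop is blocked, the file goes stale."""
    while True:
        path.write_text(str(time.time()))
        await asyncio.sleep(interval)


def is_stale(path=HEARTBEAT, max_age=180.0, now=None):
    """Watchdog side: True if the heartbeat is missing, unreadable, or too old."""
    now = time.time() if now is None else now
    try:
        last = float(path.read_text())
    except (FileNotFoundError, ValueError):
        return True
    return now - last > max_age
```

Because the heartbeat runs on the same event loop as the rest of the bot, a frozen loop stops the updates, which is exactly the case a process manager misses. Setting the maximum age to a few missed intervals helps keep one slow moment from triggering a restart.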
Where to start
If your bot is crashing right now, work through it in order. Turn on file logging first so you can actually see what's happening. Check your intents in the portal. Wrap your command handlers in try/except. Then put the whole thing under pm2 or systemd so a single bad moment can't keep it down. Most bots we see go from daily crashes to weeks of uptime with just those four changes, and the rest is tuning as your bot grows.



