Moving CediRates Off MongoDB
Why I migrated CediRates from MongoDB Atlas to Postgres and Redis on a single EC2 instance, and what the project actually looked like on the other side.
CediRates started out as a thin Express app. It tracks cedi exchange rates and fuel prices across Ghanaian banks, forex bureaus, and OMCs, plus reviews, articles, polls, and a developer-facing public API on top of that data. For a long time the data lived in MongoDB Atlas.
For a while that was fine. Atlas on the free tier, documents flowing in, no schema arguments with myself at 1am. For something still figuring out what it wanted to be, Mongo made sense.
Then the data shape settled. Then the same handful of query patterns started showing up everywhere. Then Atlas sent me the email.
I ended up doing the thing I'd been quietly avoiding for months: moving the whole project off Mongo and onto Postgres and Redis, running in Docker on a single EC2 instance. Here's why, and what was actually annoying about it.
Why Mongo had to go
Two things pushed it. The cheap reason, and the real reason.
The cheap reason. The Mongo Atlas free tier gives you 512MB on a shared cluster. CediRates writes a rate row per company, per currency pair, per day, and a fuel price row per company per day, and historical snapshots stack up fast. Eventually I got the email. Storage full. Upgrade or move.
Paying for a managed Mongo cluster sitting in a different cloud from where the API already ran was a non-starter. If I'm paying for a database, I want it on the same box the API lives on, and I want the database to actually fit the data.
The real reason. CediRates isn't a key-value dump. It's companies being tracked, the subcategories they belong to, rates and prices keyed by company and day, accounts and profiles, reviews belonging to both companies and users, articles joined to companies and authors, polls with options and votes, and a developer API layered on top with its own plans, keys, and monthly usage counters. That's a relational problem in a trench coat pretending to be documents.
The specific pain that finally decided it:
- Nearly every aggregate, whether it was the average buying rate across a group of companies today, the latest row per company, or comparing yesterday to today to compute whether a rate went up or down, was a `$lookup` + `$group` + `$sort` chain, roughly the shape sketched after this list. Readable once, a nightmare three months later.
- No foreign keys. A company could get deleted and its rate and price documents would just sit there, pointing at a ghost. Nothing failed loudly; you just got silent holes.
- Duplicated fields to avoid expensive lookups. Then the copies drifted. Then you can't trust anything.
- A user's watchlist of companies was just an array of string IDs with no referential integrity behind it. Stale IDs accumulated silently over time.
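For flavor, here's roughly what one of those chains looked like, reconstructed from memory rather than lifted from the old codebase; the collection and field names are stand-ins:

```typescript
import { MongoClient } from "mongodb";

const db = new MongoClient(process.env.MONGO_URL!).db("cedirates");

// "Average buying rate today across one subcategory", Mongo-style.
// Collection and field names are stand-ins, not the old schema.
const [result] = await db
  .collection("companies")
  .aggregate([
    { $match: { subcategory: "forex-bureau" } },
    // join each company to its rate documents
    { $lookup: { from: "exchangeRates", localField: "_id",
                 foreignField: "company", as: "rates" } },
    { $unwind: "$rates" },
    { $match: { "rates.pair": "USD-GHS", "rates.date": "2024-01-15" } },
    // latest row per company wins...
    { $sort: { "rates.updatedAt": -1 } },
    { $group: { _id: "$_id", buying: { $first: "$rates.buying" } } },
    // ...then average across companies
    { $group: { _id: null, avgBuying: { $avg: "$buying" } } },
  ])
  .toArray();
```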
I didn't move to Postgres because it's fashionable. I moved because the data is genuinely relational, and pretending otherwise was costing me something every week.
Designing the schema
Going from documents to tables is a forcing function. You can't stay vague about shape. You declare the invariants or the database refuses to start.
Most of the shape was obvious once I stopped letting myself avoid it. Companies as a first-class thing. Rates and prices as their own tables, each keyed on the company, the day, and (for rates) the currency pair, so scraping the same company twice in the same day overwrites cleanly instead of piling up duplicates. Accounts split cleanly from profiles, because they had always wanted to be split and I'd only kept them together because Mongo let me get away with it. Reviews with a real uniqueness rule, one review per user per company, enforced by the database instead of by a hopeful if in the app.
A few decisions that mattered more than they looked like they should:
Daily rows keyed on the tuple of company, date, and currency pair. The business model is "today's rate for this company and this pair." Scraping is allowed to run many times a day, but writes target the same row the rest of the day shares. A unique constraint on the tuple is what lets upserts be the default write path (scrapes overwrite, they don't append), and a separate "last updated" timestamp records when the value actually changed, as distinct from when the row was first created.
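A minimal sketch of that write path with Prisma; the model and field names here are illustrations, not the real schema:

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Scrapes overwrite, they don't append: the upsert targets the
// (company, date, pair) tuple. Model and field names are illustrative.
async function recordRate(companyId: string, date: Date, pair: string,
                          buying: number, selling: number) {
  await prisma.exchangeRate.upsert({
    where: {
      // backed by @@unique([companyId, date, pair]) in the schema
      companyId_date_pair: { companyId, date, pair },
    },
    create: { companyId, date, pair, buying, selling },
    update: { buying, selling, valueUpdatedAt: new Date() },
  });
}
```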
Indexes aimed at the real queries, not at what looked clever. Three access patterns do almost all the work: today's rates for one currency pair across all companies, history for one company, and aggregates across a subcategory. Three focused indexes on the rate table cover all three cleanly. I didn't add any more than that.
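In Prisma schema terms, the constraint and indexes come out looking something like this. Again a sketch with assumed names; the unique constraint's backing index does double duty for per-company, per-day lookups:

```prisma
// A sketch of the rate table, not the real schema.
model ExchangeRate {
  id             String    @id @default(uuid())
  companyId      String    // FK to Company in the full schema
  pair           String    // e.g. "USD-GHS"
  date           DateTime  @db.Date
  buying         Float?
  selling        Float?
  valueUpdatedAt DateTime? // when the value last changed, not row creation

  @@unique([companyId, date, pair]) // one row per company, pair, day
  @@index([pair, date])             // today's rates for one pair, all companies
  @@index([companyId, date])        // history for one company
}
```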
Inflation as a stored string, not a recomputed float. Whether a rate went up, down, or stayed flat compared to yesterday is stored as a short string on the row. Computing and storing that once at write time is cheaper than re-deriving it on every read, and it means every client sees the same answer without re-implementing the tiebreaker rules.
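The write-time computation is a few lines; the string values and the null handling below are illustrative, not the real tiebreaker rules:

```typescript
type Inflation = "up" | "down" | "flat";

// Computed once at write time and stored on the row, so every client
// reads the same answer without re-deriving it.
function inflationFor(today: number | null, yesterday: number | null): Inflation {
  if (today == null || yesterday == null || yesterday === 0) return "flat";
  if (today > yesterday) return "up";
  if (today < yesterday) return "down";
  return "flat";
}
```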
A dedicated rollover job instead of a "find the latest row" fallback. Every midnight, Accra time, a cron job copies yesterday's rates into today as the starting point, with carry-over rules that refuse to propagate nulls or zeros as if they were real values. Reads for "today's rate" always hit an actual row, not a fallback path that pretends to be one.
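A sketch of that job, assuming node-cron and the same hypothetical model names as above; the carry-over guard is the part that matters:

```typescript
import cron from "node-cron";
import { startOfDay, subDays } from "date-fns";
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

// Midnight rollover, Accra time: seed today's rows from yesterday's.
cron.schedule("0 0 * * *", async () => {
  const today = startOfDay(new Date());
  const yesterday = startOfDay(subDays(new Date(), 1));

  const rows = await prisma.exchangeRate.findMany({ where: { date: yesterday } });
  for (const row of rows) {
    // Refuse to carry nulls or zeros forward as if they were real values.
    if (!row.buying || !row.selling) continue;
    await prisma.exchangeRate.upsert({
      where: { companyId_date_pair: { companyId: row.companyId, date: today, pair: row.pair } },
      create: { companyId: row.companyId, date: today, pair: row.pair,
                buying: row.buying, selling: row.selling },
      update: {}, // a scrape already wrote today's row; leave it alone
    });
  }
}, { timezone: "Africa/Accra" });
```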
Floats for rates, not decimals. This one is a deliberate compromise, not an oversight. Rates are small numbers displayed to four decimal places, and both scraping and display round explicitly. The day the pricing math starts crossing currencies server-side, fixed-precision decimals will become the right call. It's a known debt, written down, not a blind spot.
None of this is clever. It's just what the data actually wants, written down.
The migration itself
There's no clean export-to-import pipeline between Mongo and Postgres. The data had to come out of Mongo as JSON, get transformed in Node, and go back in as rows.
The shape that ended up working was unsurprising: export each collection to JSON, load it in a script, resolve references, write the rows. The parts that actually mattered were the ones I'd have talked myself out of if I'd been in a hurry.
The first thing was normalizing Mongo's extended-JSON types (the little wrapped $oid and $date shapes) in one place, at the edge. Everywhere else in the code got to work with ordinary strings and dates. This alone killed a whole class of bugs I kept hitting in the first draft, where a string would look right in a log line and then fail silently as a foreign key.
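That edge normalizer is small; something in this shape, handling only the extended-JSON wrappers that actually showed up in the exports:

```typescript
// Normalize Mongo extended JSON at the edge so the rest of the migration
// sees plain strings and Dates. A sketch, not the real script.
function normalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(normalize);
  if (value && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    if (typeof obj.$oid === "string") return obj.$oid; // ObjectId -> string
    if (obj.$date !== undefined) {
      // relaxed form is an ISO string; canonical form wraps millis
      const d = obj.$date as { $numberLong?: string } | string;
      return new Date(typeof d === "string" ? d : Number(d.$numberLong));
    }
    return Object.fromEntries(
      Object.entries(obj).map(([k, v]) => [k, normalize(v)])
    );
  }
  return value;
}
```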
The second was idempotency. Every entity got written with an upsert keyed on the same uniqueness rules the schema enforces. Running the migration twice is a no-op. I ran it many more times than twice. It's the difference between a migration you tiptoe around and one you run casually in the middle of fixing something else.
The third was order. Migrations respect foreign keys whether you want them to or not. Subcategories first, then companies, then the things that belong only to companies (prices, rates), then users, then the things that need both sides to exist, reviews included, since a review points at a company and an author. An orchestrator script ran each entity in order, bailed on the first real failure, and told me which step died. Nothing exotic; just discipline.
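The whole orchestrator fits on a screen. A sketch, with hypothetical per-entity functions:

```typescript
// Hypothetical per-entity migration functions, one per export file.
import {
  migrateSubcategories, migrateCompanies, migratePrices,
  migrateRates, migrateUsers, migrateReviews,
} from "./entities";

// Run in FK order, stop at the first failure, name the step that died.
const steps: Array<[string, () => Promise<void>]> = [
  ["subcategories", migrateSubcategories],
  ["companies", migrateCompanies],
  ["prices", migratePrices],
  ["rates", migrateRates],
  ["users", migrateUsers],
  ["reviews", migrateReviews], // needs both companies and users in place
];

for (const [name, run] of steps) {
  try {
    await run();
    console.log(`ok: ${name}`);
  } catch (err) {
    console.error(`failed at step "${name}":`, err);
    process.exit(1);
  }
}
```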
The fourth was accepting that user data has history. Real user records come with missing fields, duplicate provider IDs, half-filled details rows, display names smeared across three different columns because three different versions of the app had written to three different places. You don't clean that up by being clever. You clean it up by walking through it field by field, writing the fallbacks down, and running the migration again when you find the next edge case. The migration script for users ended up being the longest one in the project for exactly this reason.
The unglamorous truth about migrations is that they're 80% data hygiene and 20% code. This one was no exception. A few evenings of running the scripts, spotting bad rows, fixing the transform, running them again. Nothing dramatic happened. That was the goal.
Rewriting the API
The API had to be rewritten, not refactored. The Mongo query shapes didn't translate, the aggregation pipelines became joins and subqueries, and the document shapes became related rows.
Prisma does most of the work. For the routine stuff like listing rates filtered by company, subcategory, currency, date range, inserting a scrape batch, fetching a company's profile, or paginating reviews, it's excellent. The queries are obvious from the schema, the types are generated, and when something feels slow you can see exactly what SQL got emitted.
The one place where Prisma genuinely isn't the right tool is aggregates across many rows. Computing average buying, selling, and mid rates for a currency pair across every active company, grouped by date, is a plain SQL AVG with a GROUP BY and a couple of CASE expressions to keep nulls and zeros out of the average. Writing that as raw SQL and typing the result row is two small functions. Trying to bend Prisma into it is a day you'll never get back. The mental model of Prisma for CRUD and raw SQL for analytics keeps the codebase honest and each tool doing what it's good at.
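Here's roughly what one of those two small functions looks like through Prisma's raw-query escape hatch; the table and column names are assumed, and the CASE expressions are what keep nulls and zeros out of the average:

```typescript
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

type AvgRow = { date: Date; avg_buying: number; avg_selling: number };

// Analytics as plain SQL. The ::double precision cast is the boundary
// fix discussed below.
async function dailyAverages(pair: string): Promise<AvgRow[]> {
  return prisma.$queryRaw<AvgRow[]>`
    SELECT date,
           AVG(CASE WHEN buying  > 0 THEN buying  END)::double precision AS avg_buying,
           AVG(CASE WHEN selling > 0 THEN selling END)::double precision AS avg_selling
    FROM "ExchangeRate"
    WHERE pair = ${pair}
    GROUP BY date
    ORDER BY date
  `;
}
```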
One small gotcha worth calling out: Postgres AVG returns the arbitrary-precision numeric type, which comes back from raw queries as something you shouldn't just trust is a JS number. Casting to double precision in the SQL and typing the result explicitly keeps the boundary honest. A couple of extra characters, and a lot fewer surprises three weeks later.
What Redis actually does
Redis runs in the same Docker network as the API, and it earns its spot for three specific things, none of which is a general-purpose route cache.
Rate limiting. The API's rate limits are backed by Redis so the counters are shared across processes instead of stuck inside a single Node instance. Auth endpoints use a strict bucket, logged-in endpoints use a looser one, and everything in between sits in the middle.
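The core of a shared counter is tiny. A fixed-window sketch with ioredis, with illustrative bucket names and limits rather than the real middleware:

```typescript
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

// Fixed-window counter shared across processes: every Node instance
// increments the same Redis key for the current window.
async function allow(key: string, limit: number, windowSecs: number): Promise<boolean> {
  const window = Math.floor(Date.now() / 1000 / windowSecs);
  const bucket = `ratelimit:${key}:${window}`;
  const count = await redis.incr(bucket);
  if (count === 1) await redis.expire(bucket, windowSecs); // first hit sets TTL
  return count <= limit;
}

// Strict bucket for auth, looser for logged-in reads (numbers illustrative):
//   await allow(`auth:${ip}`, 10, 60);      // 10/min
//   await allow(`api:${userId}`, 300, 60);  // 300/min
```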
Short-lived OAuth exchange codes. When a user finishes the Google OAuth flow, the frontend gets a short-lived code it trades for tokens. That code needs to live somewhere that isn't a single process's memory, expire on its own, and be usable exactly once. Redis is exactly that.
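A sketch of that flow, assuming ioredis and Redis 6.2+ for the atomic GETDEL; the key shape and TTL are placeholders:

```typescript
import { randomBytes } from "node:crypto";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL!);

// Short-lived, single-use exchange code: expires on its own, and GETDEL
// makes "usable exactly once" atomic across processes.
async function issueExchangeCode(userId: string): Promise<string> {
  const code = randomBytes(32).toString("hex");
  await redis.set(`oauth:code:${code}`, userId, "EX", 300); // 5 min TTL
  return code;
}

async function redeemExchangeCode(code: string): Promise<string | null> {
  // Read and delete in one step: a second redemption sees nothing.
  return redis.getdel(`oauth:code:${code}`);
}
```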
A hard startup check. The API refuses to start if Redis isn't reachable. The rate limiter depends on it, and silently degrading to "no rate limiting" is the wrong default.
What Redis doesn't do yet is cache the hot read paths like today's rates for a currency pair, or today's fuel prices for a station. Those are the obvious next candidates, and the read paths are already shaped to make slotting in a cache cheap. But calling it done before it's built would be a lie. Today, those endpoints hit Postgres, and Postgres is fast enough that I haven't made it the priority.
One EC2 instance, Docker, nothing fancy
The whole stack runs on a single EC2 instance. Everything app-related is in Docker: one image for the Node code, official images for Postgres and Redis, all sharing an internal network. The only thing outside Docker is Nginx, which sits on the host and handles TLS termination and reverse-proxying into the app.
A few things that make the single-box setup honest to run in production, not just convenient to run locally.
Migrations happen in a one-shot container that runs and exits before the app is allowed to start. Every app service waits on that container's successful exit. It's impossible to run an app process against a database that hasn't been migrated yet, not because I remember to run migrations first, but because the orchestration refuses to start the app without them.
Postgres is optional in the same Compose file. A profile flag turns the bundled database on for the EC2 instance and for local development, or off for environments where Postgres lives somewhere else, like a managed service, a separate host, or a tunnel to a shared dev database. Same file, same commands, different substrate.
Every container has an explicit memory limit, tuned by environment variables on the host. On a box where an out-of-memory kill is a real failure mode, having the worst-case process bounded and automatically restarted is the difference between a hiccup and an outage.
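Sketched as Compose config, with the service names, profile name, and variables all illustrative rather than copied from the real file:

```yaml
# A sketch of the shape, not the actual Compose file.
services:
  migrate:
    image: cedirates-api            # hypothetical image name
    command: npx prisma migrate deploy
    depends_on:
      postgres:
        condition: service_healthy
        required: false             # skipped when the bundled DB profile is off

  api:
    image: cedirates-api
    depends_on:
      migrate:
        condition: service_completed_successfully  # app waits on migrations
    mem_limit: ${API_MEM_LIMIT:-512m}
    restart: unless-stopped

  redis:
    image: redis:7
    mem_limit: ${REDIS_MEM_LIMIT:-256m}

  postgres:
    image: postgres:16
    profiles: ["bundled-db"]        # on for EC2/local, off when Postgres is external
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U $${POSTGRES_USER}"]
      interval: 5s
    mem_limit: ${PG_MEM_LIMIT:-1g}
```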
Nginx on the host means TLS certs and public-network config live in one obvious place. The containers never have to know anything about the public internet. They speak HTTP on the internal Docker network; the edge is the one thing that speaks HTTPS.
The single-box constraint is a feature, not a limitation. It forces honesty. You can't handwave "we'll scale horizontally" when there's no horizontal to scale to. Indexes have to be right. Queries have to be efficient. The stack either holds up on this hardware or it doesn't, and so far, it does.
There's something clarifying about running everything on one machine. When something gets slow, you know where to look. No network topology, no service mesh, no distributed tracing fog. A handful of containers, one reverse proxy on the host, logs you can follow with your eyes.
What's still on the list
I'd rather write the things I know still need doing than pretend they're already shipped.
Off-box database backups. Running Postgres on the same box that serves the API means the "what if this disk dies" problem is mine to solve, and the answer needs to be "there's a current dump sitting in object storage somewhere else." The infrastructure for that is mostly in place (CediRates already uses Cloudflare R2 for user uploads), so pointing a scheduled dump at the same bucket with scoped credentials and a rolling retention window is the obvious next move. Until I've actually restored a dump into a fresh Postgres and watched the app come up against it, what I have is wishful storage, not backups.
A real read cache in front of the hot endpoints. Redis is already in the stack. The read paths are already shaped the right way. Today's rates and today's fuel prices are the obvious first targets, and invalidation should hang off the same code paths that write the rows, not a TTL and a prayer.
Tighter analytics isolation. The aggregation queries run against the live tables. For current scale that's fine; at the shape the public API is growing into, a thin read-side projection or materialized view is the right answer before it becomes a performance problem instead of a cleanliness one.
Revisit floats for money-adjacent columns. Fine today, wrong eventually. Fixed-precision decimals with a consistent precision policy, migrated column by column, is the endgame.
What I'd do differently
Stream the migration, don't load it. The first-draft scripts read each export into memory in one go. Fine on a laptop, fragile everywhere else. A streaming pass would have saved a few awkward evenings on the bigger collections.
Stand up backups before celebrating production. The gap between "I'm in production on a single box" and "my data is backed up off-box" should be measured in hours, not weeks. Mine was weeks.
Add richer observability before you need it. Request timing and structured logs are in. Threading a request ID through to the database log would have made a few migration-day investigations much shorter than they were.
The takeaway
MongoDB isn't broken. It was the wrong tool for data that turned out to be genuinely relational, and the free-tier cap was the forcing function I needed to stop pretending otherwise.
Postgres and Prisma for structure, raw SQL where it earns its place, Redis for rate limiting and short-lived state, Nginx out front, everything else in Docker on one EC2 instance. Unglamorous. Easy to reason about. Fast enough that I've stopped thinking about performance and started thinking about the product again.
The migration wasn't painless. Every query had to be rewritten. Every document shape had to be reshaped. Every assumption I'd quietly made about the data had to be declared out loud. But the codebase is dramatically cleaner on the other side, and when something goes wrong now, there's usually one obvious place to look.
That, honestly, is what I wanted.
CediRates tracks cedi exchange rates and fuel prices across Ghanaian banks, forex bureaus, and OMCs. A product built out of Accra by a small team, and the subject of most of my war stories here.