017 —Databases
Reading raw .sql dumps without importing: a CLI playbook
A 3 GB backup lands in Slack and the client wants to know if one row is in there. The CLI workflow for reading raw .sql dumps without importing them.
A client sends you a 3.2 GB backup-prod.sql.gz over WeTransfer with the note: "the missing user account is in there somewhere, we just don't know where." Importing it means provisioning a scratch database, waiting forty minutes for the load to finish, then writing SELECT statements against a server you'll throw away tomorrow. Reading the raw .sql dump on disk takes under a minute, costs no RAM, and answers exactly the same question.
The CLI workflow for reading a raw .sql dump is older than most of the frameworks we ship sites on, and it still beats every GUI for forensic work on big files. What follows is the toolkit a handful of us reach for whenever a dump lands in a Slack thread.
What the file is before you open it
A MySQL .sql dump is a plain-text script of CREATE TABLE, INSERT INTO, and a handful of session settings at the top. mysqldump writes them in a predictable order, which is the reason every trick below works.
Start with the cheapest possible questions. Is the file compressed at all?
file backup-prod.sql.gz
# backup-prod.sql.gz: gzip compressed data, from Unix, original size 3221225472
Good. Now stream it without unpacking to disk:
zcat backup-prod.sql.gz | head -50
The first fifty lines tell you the MySQL version that produced the dump, the character set, and (if your client remembered) the database name. If head hangs forever, the file isn't actually gzipped and you're staring at a binary blob someone renamed.
Listing every table in seconds
The CREATE TABLE statements are line-anchored in a mysqldump output, which means grep finds them without scanning row data:
zcat backup-prod.sql.gz | grep -nE "^CREATE TABLE"
# 142:CREATE TABLE `wp_commentmeta` (
# 168:CREATE TABLE `wp_comments` (
# 234:CREATE TABLE `wp_options` (
# 412:CREATE TABLE `wp_postmeta` (
# 1238941:CREATE TABLE `wp_posts` (
The line numbers matter because they tell you, roughly, how the file is laid out on disk. If wp_options sits at line 234 and wp_posts sits at line 1.2 million, that's not a bug. wp_options is small, and wp_posts is where the weight lives.
Want a row count estimate per table without parsing SQL? Count the INSERT statements:
zcat backup-prod.sql.gz | grep -c "^INSERT INTO \`wp_postmeta\`"
# 8742
That isn't the row count (each INSERT contains many tuples), but it's a usable proxy for "is this table 50 rows or 5 million."
Pulling one table out of a four-gigabyte dump
This is the trick that pays for the whole post. mysqldump writes each table between two comment blocks that look like this:
-- Table structure for table `wp_users`
DROP TABLE IF EXISTS `wp_users`;
CREATE TABLE `wp_users` (...);
INSERT INTO `wp_users` VALUES (...);
-- Table structure for table `wp_usermeta`
sed can excise exactly one table by matching the marker lines:
zcat backup-prod.sql.gz \
| sed -n '/^-- Table structure for table `wp_users`/,/^-- Table structure for table/p' \
> wp_users.sql
You now have a 40 KB file that imports in two seconds into any throwaway database. The same one-liner works for Drupal's users_field_data, Magento's customer_entity, or whatever your legacy site uses. Swap the table name and run.
Hunting a single row
The "missing user" question becomes a one-liner. Find the email anywhere in the dump:
zgrep -n "marieke@example.nl" backup-prod.sql.gz
If the email is buried inside a giant extended INSERT, you'll get a line number and a wall of text. Reformat the matching line so each tuple sits on its own line:
zgrep "marieke@example.nl" backup-prod.sql.gz \
| sed 's/),(/),\n(/g' \
| grep "marieke@example.nl"
That last filter prints only the tuple you care about. You now know the user's ID, hashed password, registration date, and which meta rows you'd need to copy back. Total time, under five seconds on a 3 GB dump.
Diffing two dumps without importing either
Clients send "before" and "after" backups all the time, usually with the question "what changed?" Importing both into temporary databases and running comparison queries is the obvious answer, and it's almost always the wrong one for a first pass. Diff the schemas first:
zcat backup-monday.sql.gz | grep -E "^CREATE TABLE" | sort > schema-mon.txt
zcat backup-friday.sql.gz | grep -E "^CREATE TABLE" | sort > schema-fri.txt
diff schema-mon.txt schema-fri.txt
If the schema is identical, the change is in row data and you can keep digging with grep. If a table appeared or vanished, you've found it without paying the import cost.
For row-level diffing of a single table, extract the same table from both dumps using the sed recipe above, then run diff on the two wp_options.sql files. Most "the site broke on Friday" incidents resolve at this step, because the changed row is something obvious: a stale URL in siteurl, a plugin list that flipped in active_plugins, a cache invalidator that wrote garbage.
Watching a slow operation finish
One last tool that earns its keep on big dumps: pv (pipe viewer). When you do eventually import the dump, pipe it through pv so you can see progress:
pv backup-prod.sql.gz | gunzip | mysql -u root -p target_db
# 1.34GiB 0:04:12 [5.42MiB/s] [=======> ] 41% ETA 0:06:01
No more staring at a frozen terminal wondering whether MySQL is working or dead. The same trick works for any long pipe: pv in the middle, progress bar at the bottom of the screen.
When we built Pier for editing legacy WordPress and Drupal sites over FTP and MySQL, this was one of the workflows we kept reaching for at the terminal even after the in-app MySQL editor shipped. The recipes above belong in every developer's muscle memory whether or not you ever open our app.
The smallest thing you can do today: take a .sql backup already on your laptop and run zcat backup.sql.gz | grep -E "^CREATE TABLE" on it. You'll see your database's shape in three seconds.
— Questions —
Do these tricks work on PostgreSQL dumps?
Mostly. pg_dump uses different comment markers and COPY statements instead of INSERTs, so adjust the patterns. The structural workflow (grep schemas, sed to extract one table) is identical.
What if the dump uses extended INSERTs with millions of tuples per statement?
Reformat the line with sed 's/),(/),\n(/g' so each tuple sits on its own line. After that grep targets individual rows in milliseconds, even on multi-gigabyte dumps.
Will grep miss rows that contain literal newlines in TEXT columns?
It can. For exhaustive searches use grep -z to treat the file as null-delimited, or extract the table with sed and let MySQL parse it properly into a scratch database.