Sqitch - Sane database change management

mrbig4545 · on Aug 31, 2015

I used to use sqitch. It drive me mad, it wants to do too much. I have got for version control, I don't need sqitch to do it as well. And it doesn't play nice with other developers. If you add migration a in one branch, and someone adds migration b in another, then they merge theirs before yours, you're in for a world of pain whne you try to delpoy yours.

In the end I wrote a replacement that did was I needed in under 100 lines of perl. DB management should be simple. No point over complicating it

Not forgetting the guy who writes it is not that friendly, we sent a patch to change the default answer to revert to no, instead of yes. First off he said it was useless, then after some prodding over several months he copied it in and committed it in his name, no mention of where it came from. Real classy

eli · on Aug 31, 2015

Are there other database migration systems that make dealing with migrations in conflicting branches really easy?

jcoby · on Aug 31, 2015

It's been my experience that simpler is better. At least when it comes to teams up to around 12-15 people. After that politics will dictate how you migrate.

Adopting three rules has pretty much made migrations a non issue: 1. migrations should be timestamped, tracked, and applied in time order (rails-style migrations; this allows for the migrator to determine which migrations have not been applied regardless of when they get added to the run list) 2. migrations should be committed separate from logic changes (this way you can bring in migration a if migration b depends on it even when feature a isn't ready to be merged) 3) migrations are always forward. It's great to be able to revert during development but production is always forward. If a migration fails it always requires investigation; there is no automatic recovery.

Using the above rules you can put together a migration system in any language in about an hour by simply storing files that issue DDL/SQL directly. With a few hours more work you can abstract it out and be cross-database but that's rarely worth it.

I've tried sequentially versioned migrations and they are a major bear to work with when branching and multiple developers are involved. You end up having one guy be the "database migration guy" and responsible for keeping everything in order.

The intelligent migrators that do diffs of the schema vs the db always have issues plus the very real potential to lose data.

sandGorgon · on Aug 31, 2015

We actually use rails just to manage our db. Rails migrations are so good and sane (including the rake generator tasks) that I highly recommend it...even if you are using nodejs or something else as your actual stack.

jcoby · on Aug 31, 2015

That's a great idea. I never would have considered that.

Do you create a full rails project to handle migrations or what? How do you integrate that in with your other code?

stevepike · on Aug 31, 2015

I've used https://github.com/thuss/standalone-migrations with success before. It's the rails migration system as a standalone gem.

sandGorgon · on Aug 31, 2015

A full rails project is just a few KB. All you need to do each time is run "bundle exec rails generate migration AddNameToUsers"

And then follow the migration syntax to modify your schema. Rails will take care if the timesramping, rollbacks (in most cases),etc. You can manage your indexes, primary,etc everything here. You don't need to write any other rails code.

Check this whole directory as a part of your main source repo.

Another commenter pointed out a standalone gem for migrations. That might be OK, but I'd rather use mainline rails in this case.

joesb · on Aug 31, 2015

I could not agree more.

Rails's migration is probably the only one that works in any project condition we've been through. No problem working with legacy DB. No problem using it in the DB where other team change unrelated table. You always know what change is in each migration.

zimbatm · on Aug 31, 2015

Something I understood only recently is that a DB is an API. You can add fields/tables to it easily but changes should always be backwards-compatible (if you care about availability). Only when all the clients have been upgraded can the old fields/tables be removed.

Most deploys should be:

1. Adding fields/tables

2. Replace all clients

3. Remove old fields/tables

In the case of a rename triggers should be used to duplicate data between step 1 and 3.

idunno246 · on Aug 31, 2015

We've been using alembic in Python. It makes a dag, and has a concept of merge points so you Don't grow leaves indefinitely. Handles branching well, our history mostly just looks like a line of diamonds.

You're sol if the branches touch the same column or directly conflict, but that's never happened to us. Rarely dev Dbs end up hosed, if someone commits a bad change for instance and you pull it in before fixed, but since the latest alembic bug fixes and solid pr ci t hasn't happened.

subleq · on Aug 31, 2015

Django. Migrations are modeled as a DAG instead of a linear sequence, so you can create migrations in conflicting branches as much as you want, and just create a 'merge' migration (that does nothing but combine the two branches) when you merge the branches.

shkkmo · on Aug 31, 2015

I've found the doctrine migrations library to work nicely, if you are already using doctrine as an ORM. In fact, I find the simplicity of using doctrine migrations one the my favorite features of doctrine. I find myself wishing that the other parts of doctrine were as well designed and implemented.

1) It identifies migrations based on the timestamp when the migration was created. This means that it checks for any missing migrations regardless of migration 'order' while still running all needed migrations in order. It also checks for any migrations that have been run on the DB that are not in your current code base and warns you about them.

2) The migrations are simple PHP scripts that run SQL. This means that you can put whatever you want in them, easily edit them, adjust the timestamp, and manage them in your version control of choice.

3) Merging two branches with different migrations is not a problem, as long as the migrations themselves don't conflict (i.e. one changes a table or column name that the other also wants to modify). You will however have to identify the conflicts by testing the migrations together rather than just relying on your merge tool.

I'd have a hard time moving to any other migration tool that didn't offer this level of simplicity. The auto-generation of migrations based on comparing the XML schema and the DB is nice as well, but I could live without it as long as I have the other features.

dmmalam · on Aug 31, 2015

Currently using pg-migrator [1]. Seems to have the right balance between power and simplicity. Though numbered migrations can be a little annoying to manage cross team.

[1] https://github.com/aphel-bilisim-hizmetleri/pg-migrator

mrbig4545 · on Aug 31, 2015

I don't know about conflicting branches, but sqitch didn't like out of order migrations, even if they have nothing to do with each other. The authors response is for the developer with the issue to revert back to a sane point and then redeploy, which is not suitable if you've spent a hour filling that table with data!

UseStrict · on Aug 31, 2015

So far Sqitch has been an absolute nightmare for production deployments, particularly if you have to change something in history before a tag occurs. Not to mention merging, which invariably always generates a conflict in the plan file.

ahachete · on Aug 31, 2015

In the PostgreSQL world, there are extensions (http://www.postgresql.org/docs/9.4/static/extend-extensions....). Since extensions are self-contained groups of database objects (i.e., code), are versioned and have some (basic) form of dependency management, they can be used for simple deployment mechanisms. They also support "SQL diffs" as "migration scripts" from version to version, and the extension mechanism applies them as needed. With all this, ot is not difficult to build a simple infrastructure for code packaging and deployment. That's what we do in our company.

dmmalam · on Aug 31, 2015

Would love to see a blog post detailing this.

ahachete · on Aug 31, 2015

Hmmmmmmm ok, good idea. Adding to our TODO list, hope to get it out soon ;P

shoq · on Aug 31, 2015

We're currently looking at http://flywaydb.org/. Does anyone care to share some expiriences?

enricopulatzo · on Aug 31, 2015

Especially if your build involves Maven, FlywayDB is a very good choice if you want to use native SQL to manage your schema.

I was originally quite bullish on FlywayDB until I started looking into Liquibase by way of Dropwizard. I'm personally not quite convinced of the need for native SQL scripting for schema management, as I often like to work in H2 for dev and something else for production.

However, if you primarily want to use JDBC + native SQL to manage your schema, Flyway is quite powerful and easy to fold into your build process.

hackerboos · on Aug 31, 2015

Coming from Django and Rails Flyway was missing support for rollbacks. Everything else worked well mind...

ljoshua · on Aug 31, 2015

Nice, it feel similar to how Yii does database migrations but with a lot more packed in.

What I'd really like to see is a open source or free tool accomplishing similar things to what RedGate SQL Compare does. Now that's a powerful tool and certainly worth its cost, but in some situations a limited option due to its support for only SQL Server or where funds are lacking.

zamalek · on Aug 31, 2015

What I'd like to see is a portable Data Dude (a.k.a. Visual Studio Data Projects). You basically write your database as a whole bunch of CREATE scripts, Data Dude then does a full parse over that (with all the static compilation bells and whistles - including some static analysis), diffs it against your actual DB (or a previous version, if you keep the 'libs' around) and then spits out the relevant diff script. Obviously you'll sometimes you need to manually pre-script/post-script to upgrade data.

It basically makes database dev behave exactly like your average imperative language. The current mainstream DB landscape is ridiculously absurd. We don't write migrations for our C++, so why the heck are we writing migrations for our DBs? We the heck are we committing history (migrations) into a stores that is already historical (SCM)?

jacques_chester · on Aug 31, 2015

> We don't write migrations for our C++, so why the heck are we writing migrations for our DBs?

In-memory data structures are ephemeral. You "migrate" by restarting or replacing code entirely. If you have an on-disk format that basically acts as a memory snapshot, someone writes a migration script, which is usually not recognised as such.

Databases seem weird and icky and mismatched, but it's because of the D in ACID. Durable data has momentum, more force is required to change its vector than the mist that lives in memory.

kyllo · on Aug 31, 2015

Persistent (a Haskell database library) does something like this, in a highly declarative way. You just declare your tables and their columns as Haskell data types, and when it compiles your application it automatically migrates your database if it can do so non-destructively. If your change would destroy data or requires a default value for a new non-null column or something, it will give you an error message that tells you what you need to do.

It's by far the best experience I've had working with a database in a programming language. Only problem is that it's Haskell specific, not portable to other languages.

VLM · on Aug 31, 2015

If I read you correctly (quite possibly not) you're proposing something like the output of "mysqldump -d" be compared to some golden schema dump and then implement something like RANCID for routers such that changes in the DB schema get emailed or otherwise loudly announced and tracked vs the golden schema. I do that. A very small shell script can take care of irrelevant diff outputs (timestamps and default minimum autoincrement numbers the like)

On a side issue (kind of), something not covered in the squitch tutorial is if you git-flow then you probably should lay down some discipline that the exact name of your feature branch is the squitch change name, just to keep insanity levels down. Also I've done things like a git-flow feature branch results in a completely new test table name being created and there needs to be discipline that the test table name is "normalTableName-gitflowFeatureBranchName" or some kind of site standard. Its only kind of a side issue, in that your proposed Data Dude app probably should cooperate with git-flow workflows.

> so why the heck are we writing migrations for our DBs

Sometimes dumb indexing decisions can take hours / days to apply and kill the NAS, run the system out of storage, etc. I've done that and its not entertaining, so its less painful to do "real stuff" by hand, at least to production machines. Tend to find some hilarious scaling mistakes. Very few source code diffs scale the source code length itself exponentially or something equally naughty so its always safe to commit/pull source code changes and automerge them, not so much schema changes.

zamalek · on Aug 31, 2015

> Very few source code diffs scale the source code length itself exponentially or something equally naughty so its always safe to commit/pull source code changes and automerge them, not so much schema changes.

Awesome counterargument. However, in the same breath: what's the story with migrations? In RoR they're ordered by date, so what happens if two developers make concurrent and conflicting schema changes in their respective branches? That merge will silently fail, a false positive. At least a merge conflict throws up red flags, causing some human has to look at what is going on.

dpb000 · on Aug 31, 2015

SCM solves the easy problem of maintaining the history of the DDL required to build an empty database from scratch (and you can even have DML under source control to populate the database with test data once it is built).

When version 1.2 of a table's DDL adds a new, not nullable column to the version 1.1 definition and there are a billion rows in production, you are looking at a one-off migration event.

zamalek · on Aug 31, 2015

> you are looking at a one-off migration event.

Right, and that's where you roll up your sleeves and describe how to upgrade to that new column; just like you'd handle an out-of-date wire format with C++. No matter how you look at it, you'll always have to manually deal with the most complex upgrade scenarios no matter what you are targeting (RDBMS/CPU/car/washing machine).

The question remains: why are we doing that all the time?

dragonwriter · on Aug 31, 2015

> We don't write migrations for our C++, so why the heck are we writing migrations for our DBs?

Because your DB is a datastore as well as a store of code and structure, and when you apply changes to your production DB, they generally have to preserve the data that is already in the DB.

That means that the transition is the important thing to describe and test. Inferring it from create scripts is possible, but it doesn't really gain you anything. And doing it by some process that infers some of the changes but also leaves you to directly script others seems to be strictly inferior to describing the transitions explicitly and not leaving something to infer them partially with you filling in the gaps.

It sounds like you are describing a process that leaves you more work to do so that you can be less close to the important part of what the is actually be done, which seems like losing all around -- though it certainly seems like the kind of losing that is inexplicably popular in the enterprise world.

gghh · on Aug 31, 2015

We don't write migrations for our C++, so why the heck are we writing migrations for our DBs

The reason is that your database is much more "stateful" than you binary executables. A db migration (DDL, i.e. when you change the schema of your db) can take days to apply (eg: create an index in a large table), and you don't want to re-do it every time you deploy. That's why database changes are much more "diff oriented" than your code: when you deploy code you just compile the whole thing and replace the old binary with a new one.

glogla · on Aug 31, 2015

Squitch looks interesting, but I can't figure out how to use it to make production deployments.

In some environments, you can play with your dev and test environment, but the production is someone else's playground, and you just have to give them set of sql scripts to deploy.

Can something like that be done with squitch?

0xbadcafebee · on Aug 31, 2015

I might be missing something, but all the code being executed just seems to be running some .sql files on the database (deploy/.sql, revert/.sql, verify/*.sql). It seems to just be an interface to loading these sql files, making it easier to (for example) revert a set of changes in order.

So, you could just use this tool to manage and test your chances, and then manually go through and give instructions to prod engineers on which files to deploy and how.

Git does seem to be complicating things, though.

chrismaeda · on Aug 31, 2015

We're in the midst of adopting Liquibase to support our schema on 3 different db's (Oracle, SQL Server, MySQL). Interested in hearing about any experiences or gotchas with this tool.

fusiongyro · on Sept 1, 2015

I have been using it for the last five or so years and absolutely love it. I would recommend that you:

- Make sure every code feature gets its own Liquibase migration file

- Use the XML format—stronger IDE integration, and it's been around longer

- If possible, use their migrations instead of raw SQL, since you will get the rollback functionality "for free."

- Integrate it with the "deploy" phase of your Maven build, if applicable

- Use the "reverse engineer" feature when you get close to deploying it the first time, to ensure that you have a from-scratch workflow that works and produces identical schemas

You will get merge conflicts on the file that lists the migrations to apply, but they are obvious and trivial to fix.

Pitfalls. The only one I've noticed is that it can be confused about Postgres types, but I have been able to the "rewrite" functionality to "fix them in post." I tried the YAML format and found that I missed the completion in the IDE that the XML format has, thanks to the schema. Otherwise, it works fine, but I like the validity angle.

You can set up some migrations to be run every time. I usually set up the GRANTs that way, so I know the permissions are good right before a code deploy. Also, on those migrations, use an empty <rollback> clause so it doesn't prevent you from rolling back (you can do that on any migration to make Liquibase ignore it during a rollback operation, so all my data-manipulation migrations get one too).

dalacv · on Aug 31, 2015

I tried to use it. But it was too cumbersome to manage my migrations this way. I love the way that Rails handles database migrations.

eddd · on Aug 31, 2015

I tried to use it once. It looked promising, but i remember I had some problems with testing migrations.

frugalmail · on Aug 31, 2015

I really prefer git + Liquibase. I personally like the database abstraction and the refactorings.