Versioning Databases – The Baseline

Continuing from the last post (Three Rules for Database Work), I wanted to drill into some database versioning strategies that have worked well for me.

Caveats and Considerations

As a preface, let me say there are many different strategies that can work. I'm not presenting the one true way. My goal is to roll out database changes in a consistent, testable, reproducible manner. I'm used to working with largish databases that are packaged to install behind a customer's firewall. I don't want to use naïve update scripts that run for 6 hours. Concerns like this bias my thinking and strategies. Every application should manage database change, but you'll have to decide on the simplest strategy that works for your environment.

The Baseline

The first step in versioning a database is to generate a baseline schema. This is the starting point for versioning a database. After you've published the baseline to an authoritative source, any changes to the schema require a schema change script (a topic for the next post, because I have a feeling this is going to be long for a blog post).

You are probably not going to baseline the database on day 1 of a project. I'd suggest letting the early design of the schema settle in a bit so you are not creating a huge number of change scripts. This might sound like I'm suggesting you need to do some big, up front schema design – but that's not true. You can get pretty far into an application these days with in-memory data, fakes, stubs, mocks, and unit tests. Once the model in your code starts to stabilize, you can start thinking of the schema required to persist all the data. If you are ORMing, you can even generate the first schema from your model.

On the other hand, maybe your project and database have already been around a couple years. That's ok – you can baseline today (tomorrow at the latest), and manage change moving forward.

But How?

If you want to do things the hard way, then open a new file in a text editor and write all the SQL commands that will create every table, constraint, function, view, index, and every other object in your database. You'll also want to include commands that populate lookup tables with static data and include any bootstrap data needed by the application. Test the new script against a fresh database server and if successful, commit the file to source control. Consider your schema baselined!

Nobody really does this step the hard way, though. Most of us use tools that we point to a database, and the tools generate one or more scripts for us. Some people like to generate everything into one big script file. Others like to generate one script file for every database object. SQL Server Management Studio provides both option when you elect to script a database. I've seen both approaches work, but the "one file per object" approach feels cumbersome on a day to day basis, and unwieldy if the number of objects grow into the thousands.

Personally, I like to take a hybrid approach. I like to keep all of the SQL needed to create tables, constraints, defaults, and primary indexes in a single file. Any views, stored procedures, and functions are scripted one per file.

If you go the multiple file approach, make sure to write a batch file, shell script, application, or some other form of automation that can automatically locate and run all of the script files required to install the database. Human intervention in this process is a step backwards.

Also, many tools like to include CREATE DATABASE commands and/or include database names in the scripts they generate. You'll want to purge any reference to a hardcoded database name. You want the name to be configurable (a default name is fine), and you probably want to support multiple databases for your application on the same database instance (mostly for testing).

Whichever approach you take (one single file, or multiple files), you now have scripts that can recreate the database schema on any developer machine, test machine, or production machine. Everyone uses the exact same schema. Congratulations! You've just increased the quality of your software because the database can be reliably reproduced in any environment.

I Almost Forgot the Most Important Ingredient

At some point in the future, the schema will have to change. Before you baseline the database you need to add a table to record these schema changes. The following table is the kind of table I'd use to track every change to a database.

CREATE TABLE [dbo].[SchemaChanges](
   [ID] [int]
IDENTITY(1,1) NOT NULL,
   [MajorReleaseNumber] [varchar](2)
NOT NULL,
   [MinorReleaseNumber] [varchar](2)
NOT NULL,
   [PointReleaseNumber] [varchar](4)
NOT NULL,
   [ScriptName] [varchar](50)
NOT NULL,
   [DateApplied] [datetime]
NOT NULL,

    
CONSTRAINT [PK_SchemaChangeLog]
        
PRIMARY KEY CLUSTERED ([SchemaChangeID] ASC)
)

The first baseline schema script should, as the last step, officially install version 1.0 of the database:

INSERT INTO [SchemaChangeLog]
       ([MajorReleaseNumber]
       ,[MinorReleaseNumber]
       ,[PointReleaseNumber]
       ,[ScriptName]
       ,[DateApplied])
VALUES
       ('01'
       ,'00'
       ,'0000'
       ,'initial install'
       ,GETDATE())
posted on Thursday, January 31, 2008 11:46 PM by scott

Comments

Friday, February 01, 2008 1:34 AM by Holger H.

# re: Versioning Databases – The Baseline

Hi Scott,
I didn't find in SQLWB 2005 any option to script a database to individual files. Sorry to be so blind, but: where do I have to look?

Thanks,
Holger

Friday, February 01, 2008 6:32 AM by scott

# re: Versioning Databases – The Baseline

@Holger:

Sure - in the Object Explorer window, right-click a database, go to Tasks in the context menu, and then click on "Generate Scripts". There are a plethora of options to set there.

Sorry for not providing more detail. I'm trying to bang out these posts without writing the equivalent of a book :)
Friday, February 01, 2008 7:11 AM by Sean

# re: Versioning Databases – The Baseline

Thanks for the schema table tip! That's a piece I was missing.

When I started doing this I was using SSMS as well but made the switch to Red Gate SQL Compare a while back. Much smoother process now.
Friday, February 01, 2008 7:27 AM by Shawn

# re: Versioning Databases – The Baseline

In regards to creating a baseline script, if it's a green field project here's what has worked well for me in the past.

Create a fully automated build script for the DB which drops the DB if it exists and then builds it from scratch including any "dev test" test data and static data.

Everytime a dev updates from source control, they rebuild their local DB. This requires us to only keep a create script around and no alter scripts (remember this is green field).

Once release one is out the door, we start writing alter scripts.

I've found this very easy to setup and maintain since we can make drastic changes to the model without the need to write (sometimes) time consuming alter scripts, at least in the beginning of the project. This also has the benefit of being able to use the built in SQL Server Management tools and then export the script when done if you aren't using an OR/M tool to create your script.
Friday, February 01, 2008 8:55 AM by Jason Haley

# Interesting Finds: February 1, 2008

Friday, February 01, 2008 4:38 PM by scott

# re: Versioning Databases – The Baseline

@Sean:

That's a good approach. I wish every project was a green field...
Friday, February 01, 2008 8:07 PM by Alvin

# re: Versioning Databases – The Baseline

Scott, do you know any tool (or script) that can script out data?
Friday, February 01, 2008 9:22 PM by K. Scott Allen

# Versioning Databases – Change Scripts

After considering the three rules and creating a baseline, an entire team can work with a database whose...
Friday, February 01, 2008 9:25 PM by Jesse

# re: Versioning Databases – The Baseline

Here's an open source tool I wrote that uses SMO to generate scripts from the cmd line. It's useful if you want to automate things.

http://www.codeplex.com/scriptdb
Friday, February 01, 2008 9:25 PM by scott

# re: Versioning Databases – The Baseline

Friday, February 01, 2008 9:26 PM by scott

# re: Versioning Databases – The Baseline

Plus there is Jess's tool :)
Friday, February 01, 2008 9:32 PM by BusinessRx Reading List

# Versioning Databases – Change Scripts

After considering the three rules and creating a baseline , an entire team can work with a database whose
Saturday, February 02, 2008 5:38 AM by Christopher Steen

# Link Listing - February 1, 2008

Link Listing - February 1, 2008
Saturday, February 02, 2008 5:39 AM by Christopher Steen

# Link Listing - February 1, 2008

WPF Finding an ancestor of a WPF dependency object [Via: Philipp Sumi ] Code Camps South Florida CodeCamp!...
Sunday, February 03, 2008 3:49 AM by Zubair.NET!

# re: Versioning Databases – The Baseline

@Alvin,

Here's another tool on codeproject.com that makes my life alot easier.

http://www.codeproject.com/KB/database/SQLStatementGenerator.aspx
Sunday, February 03, 2008 3:07 PM by Gonzalo

# re: Versioning Databases – The Baseline

Nice posts about versioning; but how do you deal with scripts that need to be back-ported to maintenance branches?

Say you have your trunk or main code-line on version "02.01.0001", and you add a new DB script to fix a bug. You commit, everything is fine.

Then, you need to fix the same bug on a previous branch, say "01.09.0050", you merge everything in your branch, commit, and everything is OK.

But, what happens when the branched version "01.09.0050" is updated to "02.01.0001" or higher version? How do you keep an 'identity' of the script to avoid executing it again?

The only possible way is to have idempotent scripts?
Sunday, February 03, 2008 7:40 PM by K. Scott Allen

# Versioning Databases – Branching and Merging

Previous entries:
Three rules for Database Work
The Baseline
Change Scripts
Views, Stored Procedures...
Sunday, February 03, 2008 8:32 PM by scott

# re: Versioning Databases – The Baseline

@Gonzalo:

It's tricky as I point out here: http://odetocode.com/Blogs/scott/archive/2008/02/03/11746.aspx, and I can't say I have the perfect answer. Your scenario (merge mainline into branch) isn't one my process will handle smoothly. The scripts in this case will have to be idempotent, which I usually don't worry about. It might mean you have to modify the published script in the trunk to make it idempotent, which is really evil, I know. You can call me bad names now :)

Monday, February 04, 2008 5:21 AM by Christopher Steen

# Link Listing - February 2, 2008

WPF Finding an ancestor of a WPF dependency object [Via: Philipp Sumi ] Code Camps South Florida CodeCamp!...
Monday, February 04, 2008 5:21 AM by Christopher Steen

# Link Listing - February 2, 2008

Link Listing - February 2, 2008
Monday, February 04, 2008 9:14 AM by Will Gage

# re: Versioning Databases – The Baseline

First of all, thanks for the insights! I like the general approach that you're taking, but I'd argue that requiring a separate database to manage your database update scripts might be overkill. I much prefer the idea that you can build your whole app out of just what's in the source control system, without other external dependencies, and I think that's possible here. Rather than a table which maintains your list of update scripts, you can simply have a directory which contains update scripts that adhere to a file naming convention: ${id}.${majorReleaseNumber}.${minorReleaseNumber}.${pointReleaseNumber}.sql . The date information should be maintained by the version control system, but otherwise, the file naming convention captures all of the information you were keeping in a table. Now all your developers need to do to get a list of changes is: ls.
Monday, February 04, 2008 10:12 AM by Community Blogs

# Versioning Databases – Change Scripts

After considering the three rules and creating a baseline , an entire team can work with a database whose
Monday, February 04, 2008 10:50 AM by Bob Archer

# re: Versioning Databases – The Baseline

We use a tool called DbGhost. It is designed for managing DB change and source controlling the schema/data scripts.

We only generate change scripts for each release version... not for each build. For the interium build versions each dev has a copy of Dbghost and can use it to update thier local db from script updated from subversion. Or, they can grab the "template" db's backup to get up to date if they'd like.

This works very well for us. DbGhost also has a command line interface which makes it easy to run from NAnt or whatever build tool you use.
Tuesday, February 05, 2008 5:09 AM by Nitin Badole

# Version Control for Database

We version our .net project assemblies, code files (in vss). Also we come across the various versions
Tuesday, February 05, 2008 7:10 AM by Tod1d's Thought Process

# Article of the Week: Three Rules for Database Work

Article: Three Rules for Database Work Author: K. Scott Allen Description: Short article on why database
Tuesday, February 05, 2008 8:57 AM by Paul

# re: Versioning Databases – The Baseline


<i>This might sound like I'm suggesting you need to do some big, up front schema design – but that's not true.<i>

I disagree, this is something you should be doing! Remember your app will probably not be the only thing accessing the data. Having a well-designed schema will help not just your app, but all other apps that will come along that need access to this data.
Thursday, February 07, 2008 7:56 PM by David Keaveny

# Dependencies

Any thoughts on how to handle SQL object dependencies? I can see that with your hybrid approach, you can script the tables with their indexes and constraints etc and presumably SQL Server will sort out dependencies, especially if you use schema binding.

So what happens if you have your views, UDFs and SPs in one-file-per-object and you have some batch file to execute all the scripts - how do you ensure that stored procedure B, which is depended on by stored procedure A, is created first?
Friday, February 08, 2008 7:04 AM by scott

# re: Versioning Databases – The Baseline

@David:

Actually, there is another text file checked in that lists all the sproc and view files to apply to a database. It's more work to maintain the list, but the tool will then apply the sql files in order, so you can control the dependancies. I'm told Oracle can do this automagically...
Tuesday, April 15, 2008 10:53 AM by .Net World

# Versioning Databases – Branching and Merging

Previous entries: Three rules for Database Work The Baseline Change Scripts Views, Stored Procedures
Tuesday, April 15, 2008 10:56 AM by .Net World

# Versioning Databases – Change Scripts

After considering the three rules and creating a baseline , an entire team can work with a database whose