
At Haught Codeworks the team is always working on many different projects, so it’s very important that developers can become productive quickly on any of them. This matters both for developers who are new to a project and need to come up to speed, and for developers who know a project but have been away from it for a while. We’ve found that being able to generate a reasonable set of data for developer environments, easily and consistently, makes this much easier.

Over time we’ve recognized a related requirement on some of our projects: the need to easily load a consistent set of data for non-development purposes. A demo environment may need to be easy to set up and easy to reset to a consistent state. Clients reviewing and approving new features may need realistic data in the environments where they do that review. Or we may need large data sets to test, analyze, and tune application performance.

The need to easily load this kind of special-purpose data comes up repeatedly, both within a project and across different projects. We follow a few simple principles and use consistent tooling to meet these requirements.

Principles

We use a combination of seed data scripts and rake tasks to meet these needs, and we follow a few guiding principles to keep things easy and consistent.

Our db/seeds.rb script is restricted to the bare minimum data an application needs to be usable. This almost always includes an initial administrative user and possibly user roles. It can also include application-specific data like fixed product categories, financial transaction types, or application settings persisted in the database. The important thing is that the core seed script contains nothing we wouldn’t want to see in a production deployment.

When we identify a need for special-purpose data, we add a specialized seed script and make it simple to execute. These scripts should not duplicate data created by the minimal db/seeds.rb script, and they are intended to run after the minimal seeds are loaded. To keep things simple, every seed script just creates the desired data in a logical order; none of them tries to find and update existing records.

How We Do It

We put special-purpose seed scripts in a db/seeds directory. These are ordinary Ruby scripts and can populate the database however you like, from creating records through ActiveRecord models to loading database exports or importing CSV files. The last piece is a Rake file, lib/tasks/seed.rake, that generates a documented db:seed:<name> task for each script in db/seeds.

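# lib/tasks/seed.rake
# Defines one task per file in db/seeds, e.g. db/seeds/dev.rb becomes db:seed:dev.
namespace :db do
  namespace :seed do
    Dir[Rails.root.join('db', 'seeds', '*.rb')].each do |filename|
      # The desc line makes each generated task show up in `rake -T`.
      desc "Loads the seed data from db/seeds/#{File.basename(filename)}"
      # Depending on :environment boots Rails so the script can use models.
      task File.basename(filename, '.rb').to_sym => :environment do
        load(filename)
      end
    end
  end
end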
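To see what lib/tasks/seed.rake produces without booting a Rails app, here is a standalone sketch of the same glob-and-define pattern using plain Rake. It assumes the rake gem is available (it is bundled with standard Ruby installations). The SeedTaskDemo module, the stub :environment task, and the SEED_LOG constant are hypothetical stand-ins for Rails and the database, and a temporary directory plays the role of db/seeds.

```ruby
require 'rake'
require 'tmpdir'

# Hypothetical stand-in for the database: each seed file appends to this list.
SEED_LOG = []

# Keep desc strings on tasks, as `rake -T` does when it lists them.
Rake::TaskManager.record_task_metadata = true

module SeedTaskDemo
  extend Rake::DSL

  # Mirrors lib/tasks/seed.rake, but globs seeds_dir instead of
  # Rails.root.join('db', 'seeds').
  def self.define_seed_tasks(seeds_dir)
    # Stub: in a Rails app, :environment boots the app before a seed file loads.
    task :environment

    namespace :db do
      namespace :seed do
        Dir[File.join(seeds_dir, '*.rb')].each do |filename|
          desc "Loads the seed data from db/seeds/#{File.basename(filename)}"
          task File.basename(filename, '.rb').to_sym => :environment do
            load(filename)
          end
        end
      end
    end
  end
end

Dir.mktmpdir do |dir|
  # Two hypothetical special-purpose seed scripts.
  File.write(File.join(dir, 'dev.rb'), "SEED_LOG << :dev\n")
  File.write(File.join(dir, 'demo.rb'), "SEED_LOG << :demo\n")

  SeedTaskDemo.define_seed_tasks(dir)
  Rake::Task['db:seed:dev'].invoke # like running `rake db:seed:dev`
end

p SEED_LOG # → [:dev]
```

Only the invoked script runs; db:seed:demo is defined alongside it but left untouched until someone asks for it, which is what lets each environment pick exactly the data it needs.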

A project might have special-purpose seed scripts like db/seeds/dev.rb to set up a development environment and db/seeds/demo.rb to support client demos, available as db:seed:dev and db:seed:demo. We might support feature review instances of the application with db/seeds/review.rb, or add a db/seeds/performance.rb script that generates large amounts of fake data for testing and tuning page load times. There could even be a db/seeds/import_production.rb script that exports a database dump from our production environment and loads it into the development database.
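A performance script mostly needs to produce many rows quickly. One possible shape for db/seeds/performance.rb, sketched under assumptions: a hypothetical Product model with name and price_cents columns, rows built in fixed-size batches, and each batch handed to ActiveRecord's insert_all (Rails 6+), which writes the rows in a single INSERT without instantiating models or running validations and callbacks. The batch builder is plain Ruby so it runs on its own; the Rails call appears only as a comment.

```ruby
# Hypothetical row builder for db/seeds/performance.rb.
# Product, name, and price_cents are assumptions for illustration.
BATCH_SIZE = 1_000

# Yields arrays of attribute hashes, at most batch_size rows each, so neither
# a single INSERT statement nor an in-memory array has to hold all rows.
def each_product_batch(total, batch_size: BATCH_SIZE)
  (1..total).each_slice(batch_size) do |ids|
    now = Time.now
    rows = ids.map do |i|
      {
        name: "Product #{i}",
        price_cents: 100 + (i % 500), # deterministic fake prices
        # Explicit timestamps, in case the Rails version in use does not
        # fill them in for insert_all.
        created_at: now,
        updated_at: now
      }
    end
    yield rows
  end
end

# Inside the real seed script (requires Rails and a Product model):
#   each_product_batch(100_000) { |rows| Product.insert_all(rows) }

batch_sizes = []
each_product_batch(2_500) { |rows| batch_sizes << rows.size }
p batch_sizes # → [1000, 1000, 500]
```

Batching trades a few extra round trips for bounded memory and statement size; because insert_all skips validations, a script like this is best kept to fake data whose shape you control.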

Seeds In Action

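Once these scripts are in place, setting up a new developer environment is as easy as rake db:setup db:seed:dev. Since db:setup creates the database, loads the schema, and runs db/seeds.rb, the dev script always builds on top of the minimal seeds. When switching between branches we can reset the whole database to a consistent and usable state with rake db:reset db:seed:dev, because db:reset drops the database and then runs db:setup.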

We use Heroku review apps with GitHub integration, so each new pull request on GitHub automatically builds a new application instance for that PR. To give the client data to work with when reviewing that environment, we add a scripts.postdeploy setting to the app.json file; Heroku runs it once, after the review app is first created:

"scripts": {
  "postdeploy": "bundle exec rake db:schema:load db:seed db:seed:review"
}

When deploying a new version of the application to the client’s demo environment on Heroku, we can reset everything to a consistent state with heroku pg:reset, which deletes all data in the database, followed by a single rake command:

heroku pg:reset
heroku run rake db:schema:load db:seed db:seed:demo

The Benefits Of Consistency

Make your special-purpose data scripts consistent and easy to use, and you will see several benefits. Developers new to a project get a jump-start on their local development environments, and any developer can quickly restore a usable database after switching branches. Clients get consistent, resettable data for demoing their application, along with a basic set of data for reviewing new features in automatically deployed review apps. Most importantly, db/seeds.rb stays reserved for the bare minimum data required to stand up a new production environment, so the team is always ready for a clean and functional production deploy.
