In a previous post, I described a few ways to use seeds in a Rails app. Here I want to bring to life the real challenges in using seeds and what I learned about seeds through a project’s life cycle.

As the project matured, our seeds.rb file became less used and poorly maintained. It used to be a reliable way to create a starter set of objects that would give a developer an easy way to see the app in action, without having to create those objects herself. Now maintaining it seemed like a waste of time.

We didn’t give up on it; we knew it was still useful, but we had to understand that it had a different purpose at different stages of the project and in different environments.

Before Production Usage

Before the app was meant for production use by real users, seeds contained mock data meant to test edge cases and unlikely scenarios, with fake names and text. We needed feedback from an internal audience for confidence in the data schema and UI design. Seeds also provided a way to set up everything we needed to develop more involved features that required complicated data. We needed to create users with different levels of access, hierarchical object structures with multiple relationships, and reporting tools that sliced data in many ways.

During this period we worked to acquire realistic data. We asked our client for the exact things that would go into the database, to get a clear idea of how our code and UI would interact with reality. Until then we had no idea whether our layout could accommodate long text blobs, how much we needed to truncate, or how many levels of abbreviation we needed for the same heading. Additionally, because the project had a large framework of objects and relationships, it was prudent to incorporate a realistic version of this framework into our seeds. Now we could run the app locally with all the data created up front, allowing us to focus on tweaking the app internals.

Preparing for Production

Before pushing to production, it helps to have at least one environment identical to your anticipated production environment. We always have at least two: development and staging. Development is for the development team to test changes in a production-like environment; staging is for the business owner or client to verify the changes. We also often create a demo environment meant to be identical to production, but with data slightly tailored to a specific customer or certain experimental features enabled.

Having multiple environments that resemble production allows us to test integration with other services upon which the app depends, like email sending, API calls, logging, and file upload services. You can seed these environments to get started, or you can copy another environment’s database.

When we launched staging, our seeds still contained unrealistic data. Since we wanted to provide our client with the most realistic environment, we had him populate the staging database manually through an Admin portal. Whenever we needed a new field on the models in question, the client had to populate that field by hand on every object. This became so painful that we created an importer that accepted a comma-separated text file and created the objects, saving the client hours (maybe days) of effort. If your app requires a large amount of data before you can launch, look into building an importer like this.
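To make the importer idea concrete, here is a minimal sketch rather than the importer we actually built. It uses Ruby’s standard csv library; the Product struct, the import_products method, and the column names are hypothetical stand-ins so the example runs without Rails. In a real app the struct would be an ActiveRecord model and each row would be saved to the database.

```ruby
require "csv"

# Hypothetical stand-in for an ActiveRecord model, so the sketch runs
# without Rails. A real importer would create database records instead.
Product = Struct.new(:name, :description, keyword_init: true)

# Parse comma-separated text with a header row and build one object per row.
# Rows with a blank name are reported as errors instead of aborting the import.
def import_products(csv_text)
  imported = []
  errors = []

  CSV.parse(csv_text, headers: true).each_with_index do |row, index|
    name = row["name"].to_s.strip
    if name.empty?
      # index is 0-based and the header occupies line 1, hence + 2
      errors << "row #{index + 2}: name is blank"
      next
    end
    imported << Product.new(name: name, description: row["description"].to_s.strip)
  end

  { imported: imported, errors: errors }
end

# Assumed sample input; the columns are illustrative only.
sample_text = <<~TEXT
  name,description
  Widget,A small widget
  ,Missing a name
  Gadget,"A gadget, with a comma"
TEXT

result = import_products(sample_text)
puts "Imported #{result[:imported].size} products"
result[:errors].each { |message| puts "Skipped #{message}" }
```

Collecting errors per row, instead of stopping at the first bad one, lets the client fix the whole file in one pass.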

We planned to copy the staging environment’s database to production. We let the client groom the data in staging as much as needed to make it realistic and production ready.

After Production

Finally, production went live and started amassing data created by user interaction. Even if we had chosen to seed the production database rather than copy an existing one, our seeds would have become irrelevant to the production environment.

Repeat after me:

Once I have real data, I will never use seeds in production.
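One way to enforce that promise is a guard at the top of seeds.rb. The sketch below is an assumption about how you might do it, not something we shipped: PROTECTED_ENVIRONMENTS and ensure_seedable! are hypothetical names, and the environment is passed in as a plain string so the example runs without Rails.

```ruby
# Environments that hold real data and must never be seeded.
# The list is an assumption; adjust it to match your own setup.
PROTECTED_ENVIRONMENTS = %w[production].freeze

# Raise if the given environment is protected; otherwise do nothing.
def ensure_seedable!(env_name)
  return unless PROTECTED_ENVIRONMENTS.include?(env_name.to_s)

  raise "Refusing to seed the #{env_name} environment"
end

ensure_seedable!("development") # passes silently

begin
  ensure_seedable!("production")
rescue RuntimeError => error
  puts error.message
end
```

In a Rails app you would call ensure_seedable!(Rails.env) as the first line of db/seeds.rb, so the seed task fails before it touches any data.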

So what happens to seeds?

Seeds had significance up until we launched production. After that, seeds became outdated. As we built more migrations, the seed objects didn’t have the new attributes defined, and running the seed task produced errors. Updating seeds didn’t seem worth it because we had an importer or we could copy an existing database. I questioned the point of seeding ever again. But seeds are still useful for getting a starter environment up and running. We would appreciate having a working seeds.rb for new developers, and for accidental changes to the database that we’d rather not fix manually. I also like to create a bare environment less populated than production to see how the app behaves with less data.

At this stage of the project, seeds matter in only one environment: local development. We realized that the seeds file didn’t even need most of the objects that it created. We thinned it down to the bare minimum needed to make the app work, plus one or two instances of each model to populate parts of the UI with sample data. It turned out to be very little work, and maintaining the seeds file is now cheap. Because seeds.rb is a Ruby code representation of objects that you can build in a modular way, I think seeds are a better fit for a team’s local development environments than a database copy.
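A thin, modular seeds file might be organized roughly like the sketch below. Everything in it is illustrative: STORE is an in-memory stand-in for the database so the example runs without Rails, seed_record is a hypothetical helper that loosely mirrors what ActiveRecord’s find_or_create_by! does, and the users and projects are made-up models.

```ruby
# In-memory stand-in for database tables, so the sketch runs without Rails.
STORE = Hash.new { |hash, table| hash[table] = [] }

# Find a record by its identifying attributes or create it, so that running
# the seeds twice does not duplicate data. A real seeds.rb would call
# something like Model.find_or_create_by! instead.
def seed_record(table, identity, extra = {})
  existing = STORE[table].find do |record|
    identity.all? { |key, value| record[key] == value }
  end
  return existing if existing

  record = identity.merge(extra)
  STORE[table] << record
  record
end

# One small method per model keeps the file modular and easy to update
# when a migration adds an attribute.
def seed_users
  seed_record(:users, { email: "admin@example.com" }, { role: "admin" })
  seed_record(:users, { email: "member@example.com" }, { role: "member" })
end

def seed_projects
  owner = seed_record(:users, { email: "admin@example.com" }, { role: "admin" })
  seed_record(:projects, { name: "Sample project" }, { owner_email: owner[:email] })
end

def run_seeds
  seed_users
  seed_projects
end

run_seeds
run_seeds # running again creates nothing new
puts "users: #{STORE[:users].size}, projects: #{STORE[:projects].size}"
```

Because each record is looked up before it is created, a developer can rerun the seeds after an accidental database change without wiping everything first.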

The shrinking scope of your seeds.rb file shouldn’t leave you with a crufty, dusty, unusable version as your project matures, as long as you understand its purpose at each stage. Keep those seeds alive or kill them, but don’t let them become dead weight.