Technical Blog: How to migrate >1 million journalists
In one of our previous blogs, we described why we decided to build a new CRM, even though we already had one, and why that was such a good decision.
One tiny obstacle in this whole process was that we had changed the underlying data structure for the better. That meant we had to migrate >1 million journalists and investors and >27 million emails, with all their email engagement data points (reads, opens, clicks, et cetera), from the old system to the new CRM, without any data loss and without bothering our customers too much. How did we approach this?
It won’t come as a surprise that we didn’t manually copy over all those contacts and other data points. That would be a bad idea not just because of the time it would take, but also because manual processes inevitably introduce inconsistencies, or even mistakes. So the first thing our developers did was build a system, based on Airflow, to run the sequence of steps needed for a migration in an automated fashion. We also built a system to do a test run and validate the data before every migration actually took place.
Every migration basically contained the following steps, in chronological order:
- First the (not automatable) logistics:
  - Categorize each customer on migration size and duration.
  - Agree on a migration date with the customer via our Customer Success team.
- Secondly, the preparations:
  - Run a test migration with the dataset of the customer on a staging environment.
  - If the test fails: find out what went wrong and fix that, before trying another test run.
- Finally the technical Migration:
  - Disable the customer’s old CRM system and connected modules, so that no new data comes in, which would otherwise get lost.
  - Copy the data from the old databases.
  - Convert and map the data points to make sure the data is properly formatted with the correct values for the new environment.
  - “Paste” all the data to the new databases.
  - If the migration is not successful, start a Roll Back:
    - Purge all data from the new environment.
    - Re-enable the old system and its modules so that everything is functional again, without data loss.
    - Investigate what went wrong.
  - Migration successful?
    - Enable the new system and all its modules.
    - The customer can immediately start using the new CRM with their existing data showing in the new environment.
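To make the control flow of the technical migration concrete, here is a minimal sketch in plain Python. It is not our actual Airflow setup: `run_migration`, `MigrationLog` and the step names are hypothetical, and the real copy, convert and paste work is reduced to a single callable that raises on failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MigrationLog:
    """Records which steps ran, so the order can be inspected afterwards."""
    steps: List[str] = field(default_factory=list)


def run_migration(copy_convert_paste: Callable[[], None], log: MigrationLog) -> bool:
    """Run one customer migration and roll back if a technical step fails.

    `copy_convert_paste` is a hypothetical stand-in for the copy, convert
    and paste steps. It is expected to raise an exception on failure.
    """
    # Freeze the old system first, so no new data arrives mid-migration.
    log.steps.append("disable_old_system")
    try:
        copy_convert_paste()
        log.steps.append("data_migrated")
    except Exception:
        # Roll back: purge the new environment, then bring the old system back.
        log.steps.append("purge_new_environment")
        log.steps.append("enable_old_system")
        return False
    # Only a fully successful migration switches the customer over.
    log.steps.append("enable_new_system")
    return True
```

The point of the sketch is the ordering: the old system is only re-enabled after the new environment has been purged, and the new system is only enabled when every step before it succeeded.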
Automated but supervised
Although the system automated the whole migration process, we kept monitoring every migration closely, and we decided to run them sequentially, never in parallel. That made this quite a ‘heavy’ project for our developers. We tried to schedule most migrations during the developers’ ‘awake time’, in close alignment with the CRM’s users, but this was not possible for all of our customers. So our developers sometimes got out of bed at 3AM to check if everything went well. Even after 150 successful migrations. Because you never know.
Sometimes the pre-run tests would fail, so the process was aborted before the migration even started. The most common reason was a slightly different spelling of similar data, which surfaced because we had changed the underlying structure of our data. We made that change to make it easier for all users from one company, across different teams, to work on the same global dataset: a CRM capability that was highly requested by multinationals that value centralised control and efficiency, as well as empowering their local teams to adapt to local needs.
Previously, team A could have a contact working for cnn.com, while team B had the same contact working for CNN.com. In our new system, they would need to be merged into one contact. All entries of this type had to be analyzed and assessed by a person (are they indeed separate entries, or do they need to be merged?) and then processed the right way.
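Finding such candidates in the first place is easy to automate, even if the final decision stays with a person. The sketch below is an illustration, not our production code: the `(team, name, domain)` tuple layout and the function name `find_merge_candidates` are assumptions, and it only catches differences in letter case and surrounding whitespace.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (team, contact name, outlet domain).
Contact = Tuple[str, str, str]


def find_merge_candidates(contacts: List[Contact]) -> Dict[Tuple[str, str], List[Contact]]:
    """Group contacts whose name and domain match after normalization.

    Only groups with more than one entry are returned; those are the
    ones a person still needs to review before merging.
    """
    groups: Dict[Tuple[str, str], List[Contact]] = defaultdict(list)
    for contact in contacts:
        _team, name, domain = contact
        # casefold() is a more aggressive lower() meant for caseless matching.
        key = (name.strip().casefold(), domain.strip().casefold())
        groups[key].append(contact)
    return {key: group for key, group in groups.items() if len(group) > 1}
```

Called with a "Jane Doe" at cnn.com for team A and a "Jane Doe" at CNN.com for team B, this returns one group containing both records, ready for human review.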
Creative and unexpected customer use cases would also sometimes surprise our tests. For instance, one customer had added 94 tags to one single contact, which was many more than we had anticipated and tested for. This hefty number made the number of characters exceed the maximum size of the database cell. After we increased the size limit, a retry of the test run was successful.
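A pre-flight check for this kind of failure could look roughly like the sketch below. It assumes the tags of a contact end up as one separator-joined string in a single cell; the function name, the separator and the limits used in the example are all hypothetical.

```python
from typing import Dict, List


def find_oversized_tag_cells(
    tags_by_contact: Dict[str, List[str]],
    max_chars: int,
    separator: str = ",",
) -> Dict[str, int]:
    """Return contact id -> serialized length for every contact whose
    joined tags would not fit in a cell of `max_chars` characters."""
    oversized: Dict[str, int] = {}
    for contact_id, tags in tags_by_contact.items():
        length = len(separator.join(tags))
        if length > max_chars:
            oversized[contact_id] = length
    return oversized
```

For example, 94 tags of six characters each serialize to 657 characters with comma separators, so they would be flagged against a hypothetical 255-character limit but pass against a 1000-character one.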
We had one actual case where we had to roll back the whole process, so having that precaution in place (a base condition for us) and fully automated was definitely a life saver. During the migration of one of our biggest customers in terms of CRM usage, a lot of recipients happened to engage with that customer’s emails while the migration was running. There is nothing you can do about that: somebody can even click on an email link 10 years after it was sent to them. We had actually already taken this scenario into account, and had put a validation system in place to discover the differences before and after the migration. But in this case, the volume of actions far exceeded our expectations and made the migration fail. Once we had reworked the validation, we were able to run the migration a second time, this time successfully.
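The core idea of such a before/after validation can be sketched in a few lines. This is a simplified illustration, not our validation system: it assumes we have record counts per data type from the old and the new environment, and the name `count_differences` is hypothetical.

```python
from typing import Dict, Tuple


def count_differences(
    before: Dict[str, int], after: Dict[str, int]
) -> Dict[str, Tuple[int, int]]:
    """Compare record counts per data type and return every mismatch
    as data type -> (count before, count after)."""
    differences: Dict[str, Tuple[int, int]] = {}
    # Union of keys, so a data type missing on either side is reported too.
    for data_type in sorted(set(before) | set(after)):
        old_count = before.get(data_type, 0)
        new_count = after.get(data_type, 0)
        if old_count != new_count:
            differences[data_type] = (old_count, new_count)
    return differences
```

An empty result means the counts line up. Anything else needs a decision: is it real data loss, or engagement that arrived while the migration was running?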
Proof of the pudding: CRM performance
As the migrations proceeded, customers naturally started to use the new CRM. Although we had performed many stress tests before going live, with many different types of mock customers and different volumes and setups, the real proof of the pudding is always in the eating. One big thing we learned very quickly after having migrated a couple of customers is that the number of languages a customer has in its account has a big impact on the performance of the CRM, even bigger than we had anticipated. Certain actions, such as pagination or sorting the overview differently, took too much time: multiple seconds. As soon as we realized this, we pivoted very quickly and paused the migrations temporarily to investigate. The first thing we did was drill down to find out which calls were causing the biggest delay for the customers in question. We also added more caching and came up with a supporting UX that clearly shows what’s going on, to make the waiting less painful. With all these mitigations in place, we reduced the load time to 1 second max and were able to pick up the migrations again after 1.5 weeks.
We started migrating customers in October 2023 and it took us almost 2 months to carefully migrate all >1 million journalists, >27 million emails and all other data. During that time, our monthly AWS invoices grew notably, to keep the servers running that hosted the migration automation software. We improved the performance of the CRM significantly. And most importantly: we unblocked the development of the CRM, which you will notice in our upcoming product releases.