Earlier this week, I had an interesting problem someone asked me to solve. A custom analytics tool had been set up which was gathering data at a rate of about 10 rows/second. The analytics table had 5.3 million rows, was InnoDB, had two fields of nested JSON (stored as TEXT) and NO INDEXES.

The customer wanted to be able to run reports on this table and couldn't, as MariaDB would just lock up (as expected) whenever a query used any kind of WHERE clause: without indexes, every one meant a full scan of 5.3 million rows.

What made the issue more interesting was that we couldn't add any indexes to the table, as it was ’live’ with data coming in all the time, and an ALTER TABLE on a table that size risked blocking the inserts for far too long. Because of this, the table essentially had to be left alone.

The solution I came up with in the end was to create a second table based closely on the first, with indexes on everything we wanted to query. I then wrote a simple PHP script that works through the first table and copies the data into the second, parsing the JSON along the way to pull the values we wanted to query out into their own indexed columns.
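To make that concrete, the reporting table looked something like this. This is a minimal sketch via PDO: the column names, the JSON keys (event_type, user_id) and the connection details are placeholders for illustration, not the client's actual schema.

```php
<?php
// Sketch only: hypothetical column names and JSON keys; the real schema differed.
$pdo = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');

$pdo->exec(<<<SQL
CREATE TABLE IF NOT EXISTS analytics_report (
    id          BIGINT UNSIGNED NOT NULL PRIMARY KEY,  -- same PK as the source table
    created_at  DATETIME NOT NULL,
    payload     TEXT,                 -- original JSON, kept for reference
    event_type  VARCHAR(64),          -- pulled out of the JSON
    user_id     BIGINT UNSIGNED,      -- pulled out of the JSON
    KEY idx_created_at (created_at),
    KEY idx_event_type (event_type),
    KEY idx_user_id    (user_id)
) ENGINE=InnoDB
SQL);
```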

To make things faster I had the script fetch 1000 rows at a time and insert in batches of 10; the multi-row INSERTs were many times faster than inserting rows one by one. The script also keeps a checkpoint in a file: the last primary-key value it has copied from the source table. That way it remembers where it got to, so only the initial run ever takes a long time and later runs just pick up the new rows.
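Roughly, the copy loop looked like this. Again a sketch under the assumptions above: INSERT IGNORE and the last_id.txt checkpoint file are my illustration of the approach, not the original code.

```php
<?php
// Sketch of the copy loop, using the hypothetical analytics/analytics_report
// tables from above. Names and the checkpoint file are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Checkpoint: the highest source-table primary key copied so far.
$checkpointFile = __DIR__ . '/last_id.txt';
$lastId = is_file($checkpointFile) ? (int) file_get_contents($checkpointFile) : 0;

$select = $pdo->prepare(
    'SELECT id, created_at, payload FROM analytics
     WHERE id > ? ORDER BY id LIMIT 1000'
);

while (true) {
    // Fetch the next 1000 rows after the checkpoint.
    $select->execute([$lastId]);
    $rows = $select->fetchAll(PDO::FETCH_ASSOC);
    if (!$rows) {
        break; // caught up with the live table
    }

    // Insert in multi-row batches: one INSERT statement per 10 rows.
    foreach (array_chunk($rows, 10) as $chunk) {
        $values = [];
        foreach ($chunk as $row) {
            $json = json_decode($row['payload'], true) ?: [];
            array_push(
                $values,
                $row['id'],
                $row['created_at'],
                $row['payload'],
                $json['event_type'] ?? null,
                $json['user_id'] ?? null
            );
        }
        // INSERT IGNORE makes re-running over the same range harmless.
        $sql = 'INSERT IGNORE INTO analytics_report
                (id, created_at, payload, event_type, user_id) VALUES '
             . rtrim(str_repeat('(?,?,?,?,?),', count($chunk)), ',');
        $pdo->prepare($sql)->execute($values);
    }

    // Remember where we got to, so the next run starts from here.
    $last = end($rows);
    $lastId = (int) $last['id'];
    file_put_contents($checkpointFile, (string) $lastId);
}
```

The multi-row INSERT wins because each statement carries 10 rows in a single round trip, and the checkpoint means the script only ever grinds through the full 5.3 million rows once; run it again (from cron, say) and it just copies whatever arrived since the last run.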

I realise there are tools like gh-ost that do online migrations like this, but that felt like overkill (I had initially quoted just a couple of hours for this job as the client wanted it done fast/cheap).