TRY AND ERROR

Notes on things that caught my interest, things I've studied, and other miscellany. Sometimes these posts will be written in English.

AWS Aurora restarted due to an Out Of Memory error

Recently, we had a problem where our Aurora database restarted at the same time every day.
A batch job with a huge query was running at that time, so we guessed it was the cause of the restarts.
We asked AWS Technical Support about the cause of the problem and received the answer below.

We think your guess is almost correct. According to your CloudWatch metrics, the Aurora restarts were probably caused by the batch process.
By default, 75% of the memory on Aurora is assigned to the InnoDB buffer pool (innodb_buffer_pool_size).
This buffer is mainly used as a cache for queries, so everything else, such as table caches, log buffers, and the memory used by each connection, has to fit in the remaining 25% of memory.
Therefore you cannot use the full 25% of memory just for your queries; the amount actually available is less than 25%.
In this case, the memory used by your batch exceeded the memory that was actually available, so an OOM error occurred.
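
To make the arithmetic above concrete, here is a small sketch of our own (it is not part of AWS's answer). The 16 GiB instance size and the function name memory_split are assumptions chosen only for illustration; the real memory available to Aurora depends on the instance class.

```python
GIB = 1024 ** 3


def memory_split(instance_memory_bytes, numerator, denominator):
    """Return (buffer_pool_bytes, remaining_bytes) when the buffer pool
    takes numerator/denominator of the instance memory."""
    buffer_pool = instance_memory_bytes * numerator // denominator
    return buffer_pool, instance_memory_bytes - buffer_pool


# Hypothetical instance with 16 GiB of memory (an assumption for illustration).
total = 16 * GIB

# Default setting: 3/4 (75%) of memory goes to the buffer pool.
default_pool, default_rest = memory_split(total, 3, 4)
# Reduced setting: 2/4 (50%) of memory goes to the buffer pool.
reduced_pool, reduced_rest = memory_split(total, 2, 4)

print(f"default: pool={default_pool / GIB:.0f} GiB, rest={default_rest / GIB:.0f} GiB")
print(f"reduced: pool={reduced_pool / GIB:.0f} GiB, rest={reduced_rest / GIB:.0f} GiB")
```

With the default, only about a quarter of the memory is left for table caches, log buffers, per-connection memory, and the queries themselves, which is why a single heavy batch can exhaust it.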

The possible actions for this problem are as follows.

  • Decrease "innodb_buffer_pool_size" from the default (75%) to 50-60%.

Set "innodb_buffer_pool_size" to {DBInstanceClassMemory*2/4} in the parameter group console.

  • Upgrade the DB instance class.
  • Optimize your query.

Of these, we recommend the first action this time.
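
As an illustration of the first action (again our own sketch, not part of AWS's answer), the same change could be scripted instead of made in the console. The helper names and the parameter group name are hypothetical, "pending-reboot" is a conservative assumption about how the change should be applied, and the boto3 call assumes the AWS SDK for Python is installed and credentials are configured.

```python
def buffer_pool_parameter(numerator, denominator):
    """Build the parameter entry that sets innodb_buffer_pool_size to
    numerator/denominator of DBInstanceClassMemory."""
    if not 0 < numerator < denominator:
        raise ValueError("the fraction must be between 0 and 1")
    return {
        "ParameterName": "innodb_buffer_pool_size",
        # RDS formula syntax, e.g. {DBInstanceClassMemory*2/4}
        "ParameterValue": f"{{DBInstanceClassMemory*{numerator}/{denominator}}}",
        # Assumption: apply the change at the next reboot.
        "ApplyMethod": "pending-reboot",
    }


def apply_to_parameter_group(group_name, parameter):
    """Send the change to AWS. Requires boto3 and valid credentials."""
    import boto3  # third-party AWS SDK, imported here so the sketch runs without it

    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name,
        Parameters=[parameter],
    )


# Only builds and prints the request; nothing is sent to AWS here.
print(buffer_pool_parameter(2, 4))
```

Calling apply_to_parameter_group("my-aurora-params", buffer_pool_parameter(2, 4)) would then submit the change, where "my-aurora-params" stands in for your own custom parameter group.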

This time, we took the first action, and we haven't faced the OOM problem since then.
We learned a lot from AWS Technical Support, and we are very grateful to them.