MySQL - Concurrency Handling and Isolation Levels Simplified

I spent ages wrapping my head around isolation levels and concurrency in MySQL, and had to read definitions from various sources over and over again mainly because they were slightly incorrect, or would leave out some small detail, and with transactions/concurrency the tiny details matter a lot! Perhaps others will benefit from the way I like to define/explain things at the most basic/dumbed-down level I can, but I'm mainly writing this because I am probably going to read this again in a few months, and nobody explains things better to me than myself.

If you spot something incorrect, please point it out in a comment below with a reference to a source, and I will make sure to update accordingly!

Operation

  • The smallest action being performed. E.g. reading or writing to a cell

Query

  • A line of logic that executes operations.
  • E.g. SELECT * FROM `my_table`

Atomic

  • The smallest unit - cannot be divided up
  • All or nothing (all sub parts succeed or fail)

Transaction

  • A single or grouping of queries that form an atomic (single) operation
  • All queries in a transaction will succeed or the changes are rolled back so that all changes made will be undone. If nothing else was simultaneously going on, then the database would be in the same original state as it was before the transaction started.
  • Usually started with Begin or Start Transaction.
  • If you don't explicitly start/stop a transaction, each query forms an individual transaction. (autocommit=1 by default).

Transactional Good Practice

  • Only queries that are related or dependant on each other should be put into a transaction due to the 'overhead' (computer resources) that need to be used to implement the 'atomic' behaviour.
  • Common example: Transferring money from one account to another.
    • Placing money in one account should only be performed if the money was taken out of the other and vice-versa.
    • Otherwise you may have free money, or a very unhappy customer.

Implicit Commit

  • A step the results in a commit being run.
  • E.g. UNLOCK TABLES will run a commit if any of the tables it unlocks was locked by the current transaction.

Concurrency

  • Lots of stuff happening simultaneously (at the same time!)

Isolation Levels - What you came for!

  • The 'mode' of which a transaction operates in order to handle concurrency. E.g What should the database do if something is being written to whilst it is also being read.
  • There are various 'levels' whereby a higher level could be considered 'safer' but will cost more resources (CPU/RAM etc) to implement.
  • You may think the following, but you would be quite wrong:
    "The computer is not REALLY reading a particular cell and writing to it at the same time, or the chances are minescule, so I don't have to worry about this"
  • Actually the chances are quite high because:
    • The entire set of all queries in a transaction are considered 'undivisible', and a group of read/write queries may take minutes or even hours to complete
    • Hundreds of queries can hit a server at the same time.
    • There may be a queue of queries, so even if your server isn't currently getting 'hit' it may be dealing with a backlog that has built up.
  • They key thing here is that isolation levels are mostly about achieving consistency whilst avoiding "locks" wherever possible because a lock can lead to:
    • Other CPUs/threads doing nothing, even whilst there is a backlog/queue of queries waiting to run. (reduced throughput)
    • A deadlock whereby two transactions currently running cannot complete because they each require the other to finish before proceeding (a computerized traffic jam).

A Note About MyISAM

  • MyISAM is a nontransactional engine, so everything you read about isolation levels here does not apply!
  • MyISAM can only use locks, which is no substitute
  • Rule of thumb: Only use MyISAM for read-only (lookup) tables.
    • MyISAM is especially good/fast with this due to its table (instead of row) level locking.
    • MyISAM supports a compression feature which further increases read speed, but requires read-only mode.

A Note About Locks

Locks are a 'primitive' way to prevent concurrency issues. Locking something essentially means that nothing else can act upon it. A thread can lock something before operating on it, to ensure that other threads don't change it whilst this thread is working. However, preventing other CPUs from running (blocking) slows down overall performance (throughput).

It would be much better if a 'history' of data was kept, so that all processors/cores can operate on the same data at the same time. They now just have to make sure they read/write to the cell's history at the relevant point in time. This point in time is usually (but not always) the time at which a transaction started, and not necessarily what time it is now. Obviously keeping a history requires resources, so you end up with more throughput at the expense of some overhead, so having plenty of RAM helps. This is not necessarily how isolation levels actually work, but is a nice/simplified way to conceptualize it. Precise details of InnoDB's history system is beyond the scope of this post.

Read Locks and Write Locks

  • Read locks and Write locks both prevent writes when in place I.e. a write cannot 'override' an existing read lock
  • However:
    • Queries that update are scheduled ahead of read queries.
    • Read locks 'share'; they do not block other queries that want to aquire a read lock or just want to read without a lock.
    • Write locks do not share, not even with each other, they block everyone and should be avoided wherever possible.

Level 0 - Read Uncommitted - Quick and Dirty

  • Also known as the "dirty read".
  • Queries will read uncommitted data. (uncommited being any transaction that is mid-way through processing and hasn't yet reached it's COMMIT end point. This includes itself!
  • Rarely used because its performance is not that much better than the other isolation levels, but has a massive tradeoff for that.
  • Anyone needing to read this tutorial should probably avoid setting to this level.
  • Demonstration of its self-affecting property (unique to this isolation level)
    # a is currently 1
    Transaction a: START TRANSACTION;
    Transaction a: Read a; # fetches 1
    Transaction a: Set a=2;
    Transaction a: Read a; # fetches 2
    Transaction a: COMMIT;
  • Demonstration of other transactions affecting this one (even if it starts later):
    # a is currently 1
    Transaction a: START TRANSACTION;
    Transaction a: Read a; # returns 1
    Transaction b: START TRANSACTION;
    Transaction b: Set a=2;
    Transaction a: Read a; # returns 2
    Transaction a: COMMIT; # All the fetched rows are returned
    Transaction b: COMMIT; # All the fetched rows are returned

Level 1 - Read Committed

  • The default setting for most databases except mysql
  • Queries will read only data that has been committed by other transactions that finished (committed) before this statement/query started.
  • Important notes/drawbacks
    • Please note the use of the terms "query/statement" in bold, instead of "transaction"
    • This means that two of the exact same SELECT query within the same transaction could return very different results because an "outside" transaction completed between this transaction's two reads! I attempt to demonstrate this below:
      Transaction a: Read table 1; # Rows with some data within are fetched
      Transaction b: Update table 2 so that all row cell's are NULL;
      Transaction b: Commit; # Commit happens before the second read
      Transaction a: Read table 1; # A bunch of empty rows are fetched
      Transaction a: Commit. # All the fetched rows are returned
    • Example of the safety it provides:
      Transaction a: Read table 1; # Rows with some data within are fetched
      Transaction b: Update table 2 so that all row cell's are NULL;
      Transaction a: Read table 1; # Same data fetched from the first read
      Transaction b: Commit; # Commit happens after the second read
      Transaction a: Commit. # All the fetched rows are returned

Level 2 - Repeatable Read

  • MySQL's default level (whew!)
  • Guarantees that all read statements within the same transaction will return the same result
  • Think of this as taking a snapshot of the database at the start of the transaction, and performing all the read statements on that.
  • It's not actually taking a snapshot, but with InnoDbs handling of "phantom rows", you can think of it like this as the end result is the same.
    • Note that if you are not using InnoDB or XtraDB, then this isolation level may suffer from phantom reads
    • A phantom row is an extra row that may appear because an external transaction INSERTED a new row between reads in this transaction.
    • Engines other than InnoDB and XtraDB suffer from phantom reads because they can only handle rows that have a 'history', which fresh INSERTS don't have.
  • Any changes happening whilst this transaction takes place do not affect this transaction's read statements.
  • Important: Other transactions may still be taking place, and even on the very same rows/columns, so this transaction isn't tying up the database and building a massive queue if it takes ages to complete.) This is in comparison with a read lock, which would block writes so, so don't just use locks everywhere!
  • Suddenly, how long the queries within a transaction take doesn't seem to matter any more. (No race condition).
  • Demo example:
    Transaction a: Read table 1; # Rows with some data within are fetched
    Transaction b: Update table 2 so that all row cell's are NULL;
    Transaction b: Commit; # Commit happens before the second read
    Transaction a: Read table 1; # Same data fetched from the first read
    Transaction a: Commit.
  • Use case: You have a series of tables that you just want to pull some rows directly from. You could use "UNION ALL" or JOIN, but running these operations results in combining the tables which can be quite CPU intensive as it merges all the keys/indexes into a temporary table. Instead it is much faster to wrap a couple of SELECT statements in a transaction to ensure that a 'snapshot' view is still taken.

Level 3 - Serializable

  • The most expensive, but safest isolation level.
  • Forces transactions to be ordered so that they can't possibly conflict.
  • Essentially places a lock on every row it reads.
  • Leads to lots of timeouts and deadlocks/rollbacks
  • Rarely used due to the previous points.

References

No comments:

Post a Comment