• 周日. 11月 27th, 2022

5G编程聚合网

5G时代下一个聚合的编程学习网

热门标签

Understand the transaction principle and transaction isolation of MySQL

[db:作者]

1月 6, 2022

A transaction is a set of ungroupable operations , These operations are either successful , Or cancel the execution . The most typical scenario that requires transactions is the transfer between bank accounts : If A The account should be given to B Account transfer 100 element , that A The account should be deducted 100 element ,B The account needs to be increased 100 element , Only when the data of these two accounts are changed successfully can the transfer be regarded as successful .

This article deals with MySQL A lot of knowledge about lock mechanisms , If you don’t know about this, you can read the blog first 《 Quickly understand MySQL Lock mechanism and deadlock checking of 》

ACID principle

Business has four principles : Atomicity (Atomicity)、 Uniformity (Consistency)、 Isolation, (Isolation) And persistence (Durability). These four characteristics are abbreviated as ACID principle .

  1. Atomicity (Atomictiy): A transaction is an atomic operation that contains a series of operations . The atomicity of transactions ensures that all of these operations complete or fail .

  2. Uniformity (Consistency): The execution of the transaction cannot destroy the integrity and consistency of the database data , A transaction before and after execution , Databases must be in a consistent state .

If the database system fails during operation , Some transactions are interrupted before they are completed , Some of the changes made to the database by these unfinished transactions have been written to the physical database , At this point, the database is in an inconsistent state .( In qualified database products , After the database restarts , All inconsistent transactions will be rolled back according to certain rules )

  1. Isolation, (Isolation): The ability of a database to allow multiple concurrent transactions to read, write, and modify its data simultaneously , Isolation can prevent data inconsistency caused by cross execution when multiple transactions are executed concurrently .

  2. persistence (Durabilit): After transaction ends , Changes to data are permanent , Even if the system fails, it will not be lost .

1. Atomicity

Atomicity (Atomicity) We can’t do without MySQL Transaction log for undo log Log type , When a transaction needs to be rolled back, the database state needs to be rolled back before the transaction starts , That is to say, it is necessary to cancel all the successful execution sql sentence .undo log Played a key role .

When a transaction modifies the database ,InnoDB I’m going to generate the corresponding undo log; If the transaction fails or is invoked rollback, Causes the transaction to be rolled back , You can use undo log The information in rollback the data to what it looked like before the modification .

undo log

undo log Mainly for transaction rollback service . In the process of transaction execution , Except for the records redo log, There will be a certain amount of records undo log.undo log Record the status of data before each operation , If rollback is needed during transaction execution , It can be based on undo log Roll back . Rollback of a single transaction , Only operations done by the current transaction are rolled back , It doesn’t affect the operation of other transactions .

undo log The role of :

  1. To ensure the atomicity of the data , Record a version before the transaction , For rollback ,

  2. adopt mvcc and undo log Realization innodb Transaction repeatable read and read committed isolation levels .

among ,undo log It is divided into :

  • insert undo log:insert In operation undo log.

  • update undo log:delete and update The operation produces undo log.

  1. insert operation

because insert Records of operations , Visible only to the transaction itself , Not visible to other things . so undo log You can delete… Directly after the transaction is committed , There is no need for purge operation ,

  1. delete operation

Delete An operation in a transaction does not really delete a row of data , It’s a kind of Delete Mark operation , Mark the record with Delete_Bit, Instead of deleting records . It’s a kind of ” False deletion “, After through purge Thread deletion .( The reason is to support MVCC)

  1. update operation

update There are two cases :update Whether the column of is a primary key index .

If it’s not a primary key index , stay undo log How is direct reverse recording in update Of . namely update It’s direct .

If it’s a primary key index ,update It is divided into two parts : Delete the line first , Insert another row of target lines .

And insert undo log Different ,update undo log journal , When the transaction is committed ,innodb It will not be deleted immediately undo log, The undo log Put it in the delete list , After through purge Thread deletion .

Because it may be used later undo log, For example, the isolation level is repeatable read At the time of the MVCC, Transactions read the latest committed row version when the transaction is opened , As long as the business doesn’t end , The row version cannot be deleted ( namely undo log Can’t delete ), And undo log The allocated pages can be reused to reduce storage space and improve performance .

purge The two main functions of threads are : clear undo Page and clear page With a Delete_Bit Identified data rows .

2. Uniformity

Consistency is through the atomicity of transactions 、 Persistence and isolation .

  • Atomicity : Statements are either fully executed , Or not at all , Is the core feature of transactions , Transactions themselves are defined in terms of atomicity . Implementation mainly based on undo log.

  • persistence : Ensure that data is not lost due to downtime or other reasons after the transaction is committed . Implementation mainly based on redo log.

  • Isolation, : Ensure that transaction execution is as unaffected by other transactions as possible .InnoDB The default isolation level is RR,RR The implementation of isolation is mainly based on the lock mechanism ( contain next-key lock)、MVCC( Include hidden columns of data 、 be based on undo log Version chain of 、ReadView).

3. Isolation,

Isolation means that operations inside a transaction are isolated from other transactions , See the following section for details “ Transaction concurrency ” And “ Solve the problem of transaction concurrency ”.

4. persistence

Persistence is after a transaction is committed , The changes will be saved forever , No data loss due to system failure ,

redo log

stay innoDB In the storage engine of , The transaction log is passed through redo log And log buffer implementation . When the transaction is opened , Operations in transactions , Will be written to the log buffer of the storage engine first , Before transaction commit , These buffered redo log You need to refresh to disk in advance for persistence , This is the log first mechanism (Write-Ahead Logging). When the transaction is committed , The data file mapped in the log buffer will be slowly refreshed to the disk . At this time, if the database crashes or goes down , So when the system restarts for recovery , It can be based on redo log Logs recorded in , Restore the database to its pre crash state . Unfinished business , You can continue to submit , You can also choose to roll back , It depends on the recovery strategy .

When the system starts , The storage engine is already working for redo log Allocate a contiguous block of storage space , Record in the form of sequential addition Redo Log, By order IO To improve performance .redo log The size of the log is fixed , That is to say, when the record is full, write it from the beginning . All transactions share redo log Storage space ,Redo Log In the order in which statements are executed , Record in turn .

Buffered redo log Refresh to disk persistence , The refresh rate is determined by innodb_flush_log_at_trx_commit Variables to control :

  • 0 : Refresh writes to disk per second , When the system crashes , Will lose 1 Seconds of data .

  • 1: The transaction is written to disk every time it is committed .

  • 2: Refresh writes to disk per second , But follow 0 There is a difference .

Transaction concurrency

1. Dirty reading

Dirty read refers to the access to uncommitted data of another transaction in one transaction .

If the session 2 to update age by 10, But in commit Before , conversation 1 At this time to get age, So the value you get is session 2 Not yet commit Value . If the session 2 Updated again age Or executed rollback, And conversation 1 It’s got age=10 Value , This is dirty reading .

2. It can’t be read repeatedly

A non repeatable read is a transaction that reads the same record 2 Time , The results are inconsistent .

Because the data was changed in the middle of the read , So conversation 1 The results obtained during the transaction query are different .

3. Fantasy reading

Unreal read is a transaction read 2 Time , The number of records obtained is inconsistent .

Because in conversation 1 A new data is inserted between , So the two data are different .

Transaction isolation level

In order to solve the problem of transaction concurrency mentioned above ,SQL The standard specifies four isolation levels :

  1. Read uncommitted (Read Uncommited)

  2. Read submitted (Read Commited)

  3. Repeatable (Repeatable Read)

  4. Serialization (Serializable)

Isolation level Dirty reading It can’t be read repeatedly Fantasy reading
Read uncommitted There is There is There is
Read submitted non-existent There is There is
Repeatable non-existent non-existent There is
Serialization non-existent non-existent non-existent

1. Read uncommitted (Read Uncommited)

This isolation level is the least secure , Dirty reading allowed .

2. Read submitted (Read Commited)

There are non repeatable reads , But dirty reading is not allowed . Read submitted only allows access to submitted data .

3. Repeatable (Repeatable Read)

Make sure that during the transaction , When reading the same data multiple times , The values are consistent with the start of the transaction , Therefore, this transaction level prohibits non repeatable read and dirty read , But there may be illusory reading .

4. Serialization (Serializable)

Is the most stringent level of transaction isolation , It requires that all transactions be executed serially , That is, transactions can only be processed one by one , Cannot execute concurrently .

Here’s the default isolation level for mainstream databases :

Database Default isolation Level
Oracle READ_COMMITTED
MySQL REPETABLE_READ
Microsoft SQL Server READ_COMMITTED
PostgreSQL READ_COMMITTED
DB2 CURSOR STABILITY (a.k.a READ_COMMITTED)

Higher isolation level , The more data integrity and consistency can be guaranteed , But the greater the impact on concurrent performance . For most applications , Priority can be given to setting the isolation level of the database system to Read Committed. It avoids dirty reads , And it has better concurrent performance . Although it can lead to non repeatable reading 、 Unreal reading and the second kind of missing update , In business scenarios where such problems may arise , You can use pessimistic lock or optimistic lock to control .

Solve the problem of transaction concurrency

MySQL、Oracle In such a database , For the sake of performance, it’s not exactly in accordance with the above SQL Standards to achieve .

The way that database realizes transaction isolation , It can be divided into the following two kinds .

  1. Before reading the data , Lock it , Prevent other transactions from modifying data .

  2. Don’t put any locks on it , Through a certain mechanism to generate a data request point in time consistent data snapshot (Snapshot), And use this snapshot to provide a certain level ( Statement level or transaction level ) Consistent read of . From the user’s point of view , Databases can provide multiple versions of the same data , therefore , This technology is called data multi version concurrency control (MultiVersion Concurrency Control, abbreviation MVCC or MCC)

MVCC

There are some problems with using locks to control the level of transaction isolation , For example, to achieve RC( Read submitted ) Level , Business a Update a row of data , It needs to be right ( It’s actually an index record ) Add X lock , Block other transactions from reading or writing to the row , Business b To read this line, you have to wait until a Commit or rollback release lock , In this way, the concurrent ability of reading and writing will be greatly limited .

MVCC Is a lockless scheme , To solve transaction read – Write concurrency problem , Can greatly improve reading – Performance of write concurrent operations .

In order to understand MVCC, You need to understand the next version chain and ReadView, The business scenario is as follows :

Create a table book, Just three fields , They are primary keys book_id, name book_name, stock stock. Then insert some data into the table :

INSERT INTO book VALUES(1, ' data structure ', 100);
INSERT INTO book VALUES(2, 'C++ guide ', 100);
INSERT INTO book VALUES(3, ' Master Java', 100);

Version chain

For the use of InnoDB The table that stores the engine , The primary key index record contains two important hidden columns :

  1. Business ID(DB_TRX_ID): Whenever a transaction modifies a record in the primary key index , The current transaction id It was recorded that DB_TRX_ID in .

  2. rollback pointer (DB_ROLL_PTR): Whenever a transaction modifies a record in the primary key index , The old version of the record will be recorded in the undo In the log , adopt DB_ROLL_PTR This pointer can be used to get information about the old version of the record .

If the record is modified more than once in a transaction , Every modification will generate undo journal , And these undo Log pass DB_ROLL_PTR The pointer is concatenated into a version chain , The header node of the version chain is the latest value of the record , The end node is the initial value at the beginning of the transaction .

For example, in the table book Make the following changes :

BEGIN;
UPDATE book SET stock = 200 WHERE id = 1;
UPDATE book SET stock = 300 WHERE id = 1;

that id=1 The version chain is as shown in the figure below :

ReadView

about Read Uncommitted For isolation level transactions , Just read the record of the latest version in the version chain ; about Serializable For isolation level transactions ,InnoDB Using lock to access records ; And for Read Committed and Repeatable Read At the isolation level , All the submitted transaction modification records need to be read , That is to say, if the modification of a version in the version chain is not submitted , The record of this version cannot be read . So we need to make sure Read Committed and Repeatable Read Under isolation level , Which version in the version chain can be read by the current transaction . therefore ReadView To solve this problem .

ReadView Equivalent to a snapshot of a schedule record , In this snapshot, transactions related to the current record can be obtained , Which transactions are committed stable transactions , What are the active transactions , Which transactions are opened after snapshot generation . Thus, the latest version records that can be read in the publishing chain can be judged according to the visibility comparison algorithm .

stay MySQL in ,Read Committed and Repeatable Read The difference at the isolation level is that they generate ReadView The timing is different .

stay Read Committed Under isolation level , Generated every time data is read ReadView; And in the Repeatable Read At the isolation level, only the first transaction select When data is generated ReadView, All subsequent read operations will use the same ReadView.

Visibility comparison algorithm

The visibility comparison algorithm is based on transaction ID Comparison algorithm of . Business id It’s incremental , from ReadView We can get the smallest and largest transactions among the active transactions in the system at the time of snapshot generation id( The biggest business id In fact, the next transaction will be assigned in the system id value ), So you get an active transaction id The scope of the , be called ACTIVE_TRX_ID_RANGE. Then transactions smaller than this range id The corresponding transactions are all committed stable transactions , Transactions larger than this range are all transactions that are opened after the snapshot is generated , And in the ACTIVE_TRX_ID_RANGE In addition to the active transactions in the scope , They are also committed stable transactions .

The specific visibility comparison algorithm process is as follows :

  1. First, judge the version record DB_TRX_ID Fields and ReadView Of creator_trx_id Whether the fields are equal . If equal , It means that the records of this version are generated in the current transaction , Naturally, it can be read by the current transaction ; Otherwise, proceed to the 2 Step .

  2. If the version records DB_TRX_ID Field is less than range ACTIVE_TRX_ID_RANGE, Indicates that the version record is a committed transaction modification record , Visible to the current transaction ; Otherwise proceed to the next step .

  3. Version records DB_TRX_ID Field in range ACTIVE_TRX_ID_RANGE Inside : If the transaction ID The corresponding is not an active transaction ( Has submitted ), Visible to the current transaction ; If the transaction ID Corresponding to active transactions ( Not yet submitted ), Not visible to the current transaction , Then read the next version record in the version chain . Repeat the above steps , Until the version visible to the current transaction is found .

creator_trx_id Field :ReadView Properties in , Record create this record / The transaction that last modified the record ID.

If a version record is judged to be visible to the current transaction through the above steps , Then the query result returns this version record ; Otherwise, read the next version record and continue to judge according to the above steps , To the end of the version chain . If no version visible to the current transaction is found after traversing the version chain , The query result is empty .

Read Committed Under the MVCC

Suppose that Read Committed Under isolation level , The following transactions are in progress , Business id by 10:

BEGIN; //  Turn on Transaction 10
UPDATE book SET stock = 200 WHERE id = 2;
UPDATE book SET stock = 300 WHERE id = 2;

The transaction has not been committed at this time ,id by 2 The record version chain of is shown in the figure below :

Then open a transaction to id by 2 Query the records of :

BEGIN;
SELECT * FROM book WHERE id = 2; // here Transaction 10 Not submitted

When executed SELECT Statement will generate a ReadView, The ReadView Medium ACTIVE_TRX_ID_RANGE by [10, 11),creator_trx_id by 0( Because a separate transaction is allocated when a write operation is performed in the transaction id, Otherwise, the transaction id by 0). As previously described ReadView How it works , Query to the id=2 Of book,stock Value should be 100.

And then the business id by 10 Transaction commit for :

BEGIN; //  Turn on Transaction 10
UPDATE book SET stock = 200 WHERE id = 2;
UPDATE book SET stock = 300 WHERE id = 2;
COMMIT;

Start another transaction at the same time id by 11 The business of , But not submitted :

BEGIN; //  Turn on Transaction 11
UPDATE book SET stock = 400 WHERE id = 2;

here id by 2 The record version chain of is shown in the figure below : 

Then go back to the query transaction and query again id by 2 The record of :

BEGIN;
SELECT * FROM book WHERE id = 2; // here Transaction 10 Not submitted
SELECT * FROM book WHERE id = 2; // here Transaction 10 Has been submitted

When the second execution SELECT Statement will generate another ReadView, The ReadView Medium ACTIVE_TRX_ID_RANGE by [11, 12), Current affairs IDcreator_trx_id Still 0. according to ReadView Working principle of , Query to the id=2 Of book,stock The value is 300.

From the above analysis, we can find that , Because every time a query statement is executed, a new ReadView, So in Read Committed Transactions at the isolation level read the modified data of submitted transactions in the query schedule .

Repeatable Read Under the MVCC

Repeatable Read Transactions at the isolation level are generated only the first time a query is executed ReadView, Subsequent query operations in this transaction will not generate new ReadView, therefore Repeatable Read Under isolation level , Perform the same query multiple times in a transaction , The results are the same , This enables repeatable reading .

Add

redo/undo log and binlog The difference between :

  1. Different levels .redo/undo yes innodb Engine layer maintenance , and binlog yes mysql server Layer maintenance , It has nothing to do with which engine to use , What is recorded is the log record of all engine update operations .

  2. The records are different .redo/undo It records every page / The modification of each data , Belongs to the physical log + The combination of logical logs (redo log It’s a physical log ,undo log It’s a logical log );binlog It’s a logical log , Its record is corresponding to SQL sentence .

  3. The timing of recording is different .redo/undo In the process of transaction execution, it will write , and binlog Is written before the transaction is finally committed .binlog When to refresh to disk and parameters sync_binlog relevant .

The current reading

SELECT xxx LOCK IN SHARE MODE  Statement for the current read , Add S Lock and metadata read lock , Until the end of the transaction .

SELECT xxx FOR UPDATE Statement for the current read , Add X Lock and metadata read lock , Until the end of the transaction

This read mode reads the latest version of the record , Called current read . In addition to INSERT、DELETE、UPDATE operation , You also need to read the record first , Get recorded X lock , This process is also a current reading . Due to the need to lock the records , Block other transaction writes , So it’s also called lock or block reading .

stay MySQL Of Repeatable Read Under isolation level , Currently, the key lock is used to solve the problem of unreal reading .

Read the snapshot

stay Read Committed and Repeatable Read Under isolation level , ordinary SELECT Inquire about ( barring select … lock in share mode, select … for update) All read MVCC One version in the version chain , Equivalent to reading a snapshot , So it’s called snapshot read . This read mode will not lock , So read operations are non blocking , It’s also called non blocking reading .

In standard Repeatable Read Under isolation level, read operation will add S lock , Until the end of the transaction , Therefore, write operations of other transactions can be blocked ; But in MySQL Of Repeatable Read Read operation is unlocked under isolation level , Do not prevent other transactions from writing to the same record .

Be careful : The time point of snapshot reading is the first select To confirm . So even after the transaction starts update Again select, that select The data obtained corresponds to update After the action ReadView.

stay MySQL Of Repeatable Read Under isolation level , Snapshot read uses MVCC Solve the phantom reading problem .

发表回复

您的电子邮箱地址不会被公开。 必填项已用*标注