In the previous article, 《Illustrated: How Linux Receives Network Packets》, we walked through the entire path a packet takes through Linux. The kernel's receive process can be roughly divided into several stages: the RingBuffer, hard interrupt handling, and ksoftirqd soft interrupt processing. In the soft interrupt stage, the ksoftirqd kernel thread takes packets off the RingBuffer, passes them up the protocol stack, and finally places them into the receive queue of the user process's socket.
Once you understand how Linux works, there are two more important things. The first is hands-on monitoring: being able to actually see the overall state of packet reception. The second is tuning: when your server has a problem, being able to find the bottleneck and knowing which kernel parameters to adjust.
Let’s start with a few tools
Before the main content starts, let's get to know a few tools Linux offers for monitoring the NIC.
ethtool
The first tool is ethtool, which we mentioned in the previous article. It is used to view and set NIC parameters. The tool itself only provides a generic interface; the real implementation lives in each NIC driver. Because the functionality is implemented by the driver itself, the options and statistics available differ from card to card.
The command has many options; let's pick out the few we will use today:
- -i: display NIC driver information, such as the driver name and version
- -S: view NIC receive/transmit statistics
- -g/-G: view or modify the RingBuffer size
- -l/-L: view or modify the number of NIC queues
- -c/-C: view or modify the hard interrupt coalescing settings
Let's check the NIC driver on an actual machine:
# ethtool -i eth0
driver: ixgbe
......
Here we can see that the NIC driver on my machine is ixgbe. With the driver name in hand, you can locate the corresponding code in the kernel source tree. For ixgbe, the driver source lives under drivers/net/ethernet/intel/ixgbe. The functions that back ethtool are implemented in ixgbe_ethtool.c, so if anything about ethtool is unclear, you can read the source there. What's more, the NAPI poll callback used for packet reception in 《Illustrated: How Linux Receives Network Packets》, as well as the NIC's open function, are also implemented in this directory.
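If you want to try this yourself and you have a kernel source tree checked out, a quick way to find those callbacks is simply to grep for them. The function names ixgbe_poll and ixgbe_open below are the ones the ixgbe driver uses for its NAPI poll and open callbacks; for a different NIC, grep for your own driver's name instead:
# grep -rnE "ixgbe_poll|ixgbe_open" drivers/net/ethernet/intel/ixgbe/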
ifconfig
The network management tool ifconfig can not only configure a NIC's IP address and enable or disable the NIC, it also reports some NIC statistics. For example, here is the output for eth0:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.162.42.51 netmask 255.255.248.0 broadcast 10.162.47.255
inet6 fe80::6e0b:84ff:fed5:88d1 prefixlen 64 scopeid 0x20<link>
ether 6c:0b:84:d5:88:d1 txqueuelen 1000 (Ethernet)
RX packets 2953454 bytes 414212810 (395.0 MiB)
RX errors 0 dropped 4636605 overruns 0 frame 0
TX packets 127887 bytes 82943405 (79.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
- RX packets: total number of packets received
- RX bytes: total bytes received
- RX errors: total number of receive errors
- RX dropped: packets that had already entered the RingBuffer but were dropped for other reasons
- RX overruns: FIFO overruns, i.e. drops caused by the RingBuffer running out of space
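If you only care about whether the drop-related counters are growing, a quick one-liner (eth0 here is an assumption; substitute your own interface name) is:
# ifconfig eth0 | grep -E 'errors|dropped|overruns'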
Pseudo file system /proc
The Linux kernel exposes the /proc pseudo filesystem; through it you can inspect internal kernel data structures and change kernel settings. Before we get back on topic, let's take a quick look at what this pseudo filesystem contains:
- /proc/sys: view or modify kernel parameters
- /proc/cpuinfo: CPU information
- /proc/meminfo: memory information
- /proc/interrupts: statistics for all hard interrupts
- /proc/softirqs: statistics for all soft interrupts
- /proc/slabinfo: memory usage of the kernel's slab data structures
- /proc/net/dev: NIC statistics
Let's look at the pseudo file /proc/net/dev in a bit more detail; through it we can see the kernel's per-NIC statistics. It contains the following fields:
- bytes: The total number of bytes of data sent or received
- packets: The total number of packets sent or received by the interface
- errs: The total number of send or receive errors detected by the device driver
- drop: The total number of packets dropped by the device driver
- fifo: The number of FIFO buffer errors
- frame: The number of packet framing errors
- colls: The number of collisions detected on the interface
So the pseudo file /proc/net/dev can also serve as one of our tools for viewing NIC statistics.
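Because these are cumulative counters since boot, it is usually more informative to watch how fast they change than to read them once. A minimal sketch; watch -d highlights the fields that changed between refreshes:
# watch -d -n 1 cat /proc/net/dev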
Pseudo file system sysfs
sysfs is similar to /proc: it is also a pseudo filesystem, but it is newer than proc and has a cleaner structure. The directory /sys/class/net/eth0/statistics/ also contains the NIC's statistics.
# cd /sys/class/net/eth0/statistics/
# grep . * | grep tx
tx_aborted_errors:0
tx_bytes:170699510
tx_carrier_errors:0
tx_compressed:0
tx_dropped:0
tx_errors:0
tx_fifo_errors:0
tx_heartbeat_errors:0
tx_packets:262330
tx_window_errors:0
OK, now that we have a basic feel for these tools, let's officially start today's journey.
RingBuffer Monitoring and tuning
As we saw earlier, when a data frame on the wire reaches the NIC, its first stop is the RingBuffer (the NIC uses DMA to copy frames into the RingBuffer). So the first thing to monitor and tune is the NIC's RingBuffer. Let's use ethtool to take a look at it:
# ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 512
RX Mini: 0
RX Jumbo: 0
TX: 512
Here we can see that this NIC allows a maximum RingBuffer size of 4096, while the current setting is 512.
A small detail: what ethtool shows here is actually the Rx bd (receive buffer descriptor) count. The Rx bd lives in the NIC and acts like a pointer; the RingBuffer lives in memory, and the Rx bd entries point into it, with a one-to-one correspondence between Rx bd entries and RingBuffer elements. When the NIC starts up, the kernel allocates the RingBuffer in memory for the NIC's Rx bd and sets up this mapping.
In the Linux network stack, the RingBuffer plays the role of a staging area for send and receive work. On the receive side, the NIC writes received frames into the RingBuffer, and the ksoftirqd kernel thread takes them out for processing. As long as ksoftirqd works fast enough, this staging area causes no problems. But imagine that at some moment a burst of packets arrives and ksoftirqd cannot keep up. The RingBuffer can fill up in an instant, and any packets that arrive after that are simply dropped by the NIC, with no processing at all!
So how do we check whether our server is dropping packets for this reason? All four tools introduced above can show the relevant drop statistics; taking ethtool as an example:
# ethtool -S eth0
......
rx_fifo_errors: 0
tx_fifo_errors: 0
If rx_fifo_errors is not 0 (in ifconfig this shows up as a growing overruns counter), it means packets were dropped because the RingBuffer could not hold them. So how do we solve this? The first thing that comes to mind is naturally to enlarge this "staging area". We can do that with ethtool:
# ethtool -G eth0 rx 4096 tx 4096
With this, the NIC gets a somewhat larger "staging area", which can absorb occasional bursts and avoid instantaneous packet loss. But this approach has a small side effect: the more packets queue up, the higher the processing latency. So a better solution is to make the kernel drain the RingBuffer faster, rather than letting packets loiter in the RingBuffer queue. How do we speed up the kernel's consumption of the RingBuffer? Don't worry, read on...
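If you do enlarge the RingBuffer, it is worth confirming that the new size took effect and that the drop counters stop climbing. A small sketch, again assuming eth0 and that your driver exposes fifo/drop counters under these names:
# ethtool -g eth0
# watch -d -n 1 "ethtool -S eth0 | grep -iE 'fifo|drop'"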
Hard interrupt monitoring and tuning
After data has been received into the RingBuffer, the next step is raising the hard interrupt. Let's first look at how to monitor hard interrupts, then talk about how to optimize them.
Monitoring
Hard interrupts can be checked through the pseudo file /proc/interrupts provided by the kernel:
$ cat /proc/interrupts
CPU0 CPU1 CPU2 CPU3
0: 34 0 0 0 IO-APIC-edge timer
......
27: 351 0 0 1109986815 PCI-MSI-edge virtio1-input.0
28: 2571 0 0 0 PCI-MSI-edge virtio1-output.0
29: 0 0 0 0 PCI-MSI-edge virtio2-config
30: 4233459 1986139461 244872 474097 PCI-MSI-edge virtio2-input.0
31: 3 0 2 0 PCI-MSI-edge virtio2-output.0
The output above is from one of my virtual machines. It contains a lot of information; let's go through it together:
- The NIC's input queue virtio1-input.0 has interrupt number 27
- Interrupt 27 is handled entirely by CPU3
- The total interrupt count is 1109986815
There are two details here that deserve a closer look.
1) Why are all the interrupts for this input queue handled on CPU3?
This is due to a kernel configuration item that you can also see in the pseudo filesystem:
#cat /proc/irq/27/smp_affinity
8
smp_affinity is the interrupt's CPU affinity bitmask. 8 is binary 1000: the 4th bit is set, which stands for the 4th CPU core, i.e. CPU3.
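As a quick way to see where every receive queue is pinned, you can dump the affinity mask of each input-queue interrupt. The interrupt numbers 27 and 30 below come from the /proc/interrupts output above and are specific to my machine:
# for irq in 27 30; do echo -n "irq $irq: "; cat /proc/irq/$irq/smp_affinity; done
Each value is a hexadecimal CPU bitmask, read the same way as the 8 → CPU3 example above.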
2) For the receive path, is the total hard interrupt count equal to the total number of packets Linux has received?
No. The number of hard interrupts does not represent the total number of network packets. First, the NIC can be configured for interrupt coalescing, so multiple frames may trigger only one interrupt. Second, while NAPI is running, hard interrupts are disabled and packets are harvested via poll instead.
Multi-queue NIC tuning
Mainstream NICs today basically all support multiple queues. We can assign different queues to different CPU cores, which speeds up the Linux kernel's processing of network packets. This is one of the most useful optimizations.
Each queue has its own interrupt number and can independently raise a hard interrupt toward some CPU core, which then polls packets from that queue. By spreading incoming packets across different memory queues, multiple CPUs can consume from different queues at the same time. This feature is called RSS (Receive Side Scaling). The ethtool tool can show the NIC's queue configuration:
# ethtool -l eth0
Channel parameters for eth0:
Pre-set maximums:
RX: 0
TX: 0
Other: 1
Combined: 63
Current hardware settings:
RX: 0
TX: 0
Other: 1
Combined: 8
The output above shows that this NIC supports a maximum of 63 queues and currently has 8 enabled. With this configuration, at most 8 cores can participate in packet reception. If you want to raise the kernel's receive capacity, simply increase the number of queues; this is more useful than a bigger RingBuffer, because a bigger RingBuffer only gives frames more room to keep queuing, while more queues let packets be processed by the kernel earlier. With ethtool, the number of queues is changed like this:
#ethtool -L eth0 combined 32
We said earlier that the soft interrupt is processed on the same core where the hard interrupt fired. So by increasing the number of NIC queues, both the hard interrupt work and the soft interrupt work get more cores involved.
Each queue has an interrupt number, and each interrupt number is bound to a particular CPU. If you are not happy with one of these bindings, you can change it via /proc/irq/{irq number}/smp_affinity.
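For example, a minimal sketch of moving interrupt 27 (the input queue from the earlier output) onto CPU2 would be to write the mask 4 (binary 0100) into its smp_affinity file:
# echo 4 > /proc/irq/27/smp_affinity
# cat /proc/irq/27/smp_affinity
The value is a hexadecimal CPU bitmask, and the new binding takes effect for subsequently raised interrupts.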
Once things are handled this far, packet reception usually has no major problems. But what if you want to push further, or you simply do not have more CPU cores to throw at it? Don't worry, we also have ways to make a single core receive packets faster.
Hard interrupt coalescing
Let's start with a real-life analogy. Suppose you are a developer and your product manager has 10 small requests that need your help. She has two ways to interrupt you:
- Option 1: as soon as she thinks of a requirement, she comes over, explains the details, and asks you to change it right away
- Option 2: when she thinks of a requirement she does not disturb you; she waits until she has saved up 5 of them, then comes over once, and you handle them in one focused batch
Setting aside timeliness for the moment and considering only overall throughput, under which scheme is your work efficiency higher? Or put another way, which working mode do you prefer? Obviously, any normal developer would say the second. For the human brain, frequent interruptions derail your plans; the technical solution you had half thought through may have to be scrapped. When the product manager leaves and you try to pick up the interrupted work, it likely takes some time to recall where you were before you can continue.
It is the same for the CPU: before it can do something new, it has to load the process's address space, load its code, read its data, and slowly warm up every level of cache. So if we can appropriately reduce the interrupt frequency, accumulating more packets before raising a single interrupt, it helps the CPU's efficiency. That is why the NIC allows us to coalesce hard interrupts.
Now let's look at the NIC's hard interrupt coalescing configuration:
# ethtool -c eth0
Coalesce parameters for eth0:
Adaptive RX: off TX: off
......
rx-usecs: 1
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
......
Here is roughly what these fields mean:
- Adaptive RX: adaptive interrupt coalescing; the NIC driver decides on its own when to coalesce and when not to
- rx-usecs: after this many microseconds, an RX interrupt is generated
- rx-frames: after this many frames have been received, an RX interrupt is generated
If you want to change any of these parameters, just use ethtool -C, for example:
ethtool -C eth0 adaptive-rx on
One caveat: reducing the number of interrupts raises Linux's overall throughput, but it also increases the latency of some packets, so be careful when you use it.
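Besides adaptive mode, you can also set the thresholds explicitly. A sketch with purely illustrative values (not every driver supports every parameter, so check ethtool -c afterwards):
# ethtool -C eth0 rx-usecs 8 rx-frames 32
Roughly speaking, whichever threshold is reached first triggers the RX interrupt, though the exact semantics are driver-dependent.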
Soft interrupt monitoring and tuning
After the hard interrupt comes the soft interrupt processing in the ksoftirqd kernel thread. As we said before, a soft interrupt is processed on the same core as the hard interrupt that raised it. So once the hard interrupts above have been spread across multiple cores, the soft interrupt work follows along and becomes multi-core as well. That said, soft interrupts also have optimization knobs of their own.
Monitoring
Soft interrupt information can be read from /proc/softirqs:
$ cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI: 0 2 2 0
TIMER: 704301348 1013086839 831487473 2202821058
NET_TX: 33628 31329 32891 105243
NET_RX: 418082154 2418421545 429443219 1504510793
BLOCK: 37 0 0 25728280
BLOCK_IOPOLL: 0 0 0 0
TASKLET: 271783 273780 276790 341003
SCHED: 1544746947 1374552718 1287098690 2221303707
HRTIMER: 0 0 0 0
RCU: 3200539884 3336543147 3228730912 3584743459
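These too are cumulative counters, so what matters is how fast, and on which CPUs, NET_RX grows. A simple sketch to observe that:
# watch -d -n 1 "grep -E 'CPU|NET_RX|NET_TX' /proc/softirqs"
If NET_RX climbs on only one CPU, your hard interrupts are most likely all landing on that core, which is exactly what the multi-queue and smp_affinity tuning above addresses.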
Soft interrupt budget adjustment
You may have heard of the Pomodoro Technique: the idea is to set aside an uninterrupted block of time and concentrate on one task, with a suggested block length of 25 minutes.
The ksoftirqd thread that handles soft interrupts in Linux works in a similar, Pomodoro-like way. Once triggered by a hard interrupt, it starts working and concentrates on processing a batch of network packets (not just one) before going off to do something else.
How big is that batch? The policy is a bit involved; let's just look at the easiest piece to understand, the net.core.netdev_budget kernel parameter.
# sysctl -a | grep netdev_budget
net.core.netdev_budget = 300
What this means is that ksoftirqd processes at most 300 packets in one round, and once it has handled that many it voluntarily yields the CPU so that other Linux tasks can run. If our goal right now is to make the kernel handle network packets more efficiently, we can let ksoftirqd keep receiving packets a little longer before giving up the CPU. How? Just raise the value of this parameter:
#sysctl -w net.core.netdev_budget=600
If you want the setting to survive a reboot, write it into /etc/sysctl.conf.
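A minimal sketch of making the change permanent, using the value 600 from the example above:
# echo 'net.core.netdev_budget = 600' >> /etc/sysctl.conf
# sysctl -p
sysctl -p reloads /etc/sysctl.conf, so the value takes effect immediately as well as after the next reboot.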
Soft interrupt GRO merging
GRO is similar in spirit to hard interrupt coalescing, but it happens at a different stage: hard interrupt coalescing takes place before the interrupt is raised, while GRO happens in the soft interrupt context.
If the application is transferring large files, most packets are just chunks of data. Without GRO, every packet is passed up the protocol stack (the IP receive function, the TCP receive function) one at a time. With GRO enabled, Linux intelligently merges the packets and hands one large packet to the protocol handlers, which also improves CPU efficiency.
# ethtool -k eth0 | grep generic-receive-offload
generic-receive-offload: on
If GRO is not enabled on your NIC, you can turn it on like this:
# ethtool -K eth0 gro on
GRO only optimizes the receive side of a packet's journey; the counterpart on the send side is GSO.
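GSO is a topic for another time, but if you are curious you can check whether it is enabled in the same ethtool -k output; generic-segmentation-offload is the feature name ethtool uses for it:
# ethtool -k eth0 | grep generic-segmentation-offload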
Summary
In networking, too much of our knowledge stays at the theoretical level. You may feel you know networking theory inside out, but when something goes wrong with your online service you still don't know how to investigate or how to optimize. That's because you only understand the theory; you are not clear about which kernel mechanisms Linux uses to implement it, how they work together, and which parameters each component exposes for tuning. In these two articles we walked through the Linux packet receive path in detail, along with how to view the statistics at each stage and how to tune them. I believe that after digesting these two articles, your understanding of networking will go up a level, and you will be much more at home keeping online services under control.
More hardcore articles from this series:
- 《Illustrated: How Linux Receives Network Packets》
- 《Monitoring and Tuning the Linux Network Packet Receive Path》
- 《On the Time Cost of Establishing TCP Connections》
My official account is 「Develop Internal Skill and Practice」. There I don't just talk about technical theory, nor only about hands-on experience; I combine the two, using practice to deepen the understanding of theory and using theory to improve practical ability. Welcome to follow the account, and please share it with your friends~