Linux performance tools: the three swordsmen awk, grep, and sed

January 6, 2022

Preface

Linux has many tools for text processing, for example sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, and tail. The lazy way to learn Linux text processing (not necessarily the best way) may be to learn just grep, sed, and awk.

With these three tools you can solve roughly 99% of the text-processing problems on a Linux system without memorizing the separate commands and options listed above.

Once you have learned and used all three, you will also understand how they differ, that is, which tool is good at solving which kind of problem.

An even lazier way might be to learn a scripting language (Python, Perl, or Ruby) and use it for all text processing.

Overview

awk, grep, and sed are the three sharpest tools for text manipulation on Linux, and they are among the commands every Linux user should master.

All three process text, but each has a different focus. awk is the most powerful and also the most complicated. grep is best suited to simply searching for or matching text, sed to editing the text that matches, and awk to formatting text, including more complex reformatting.

In brief:

grep: searching for and locating data
awk: slicing data
sed: modifying data
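
The division of labor above can be sketched as a single pipeline. This is only an illustration: the four input words are made up for the example, and `result` is a hypothetical variable name.

```shell
# grep searches, sed modifies, awk slices/formats.
# Stage 1: keep only lines containing "boo".
# Stage 2: rewrite the matched text.
# Stage 3: prefix each surviving line with its record number.
result=$(printf '%s\n' boot machine book bark |
  grep 'boo' |
  sed 's/boo/BOO/' |
  awk '{print NR ": " $0}')
printf '%s\n' "$result"
```

Each stage reads the previous stage's output from standard input, which is how the three tools are usually combined.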

grep = global regular expression print

In the simplest terms, grep (global regular expression print) finds the lines in a file that match a pattern. Starting with the first line of the file, grep copies a line into a buffer, compares it against the search string, and, if the comparison succeeds, prints the line to the screen. grep repeats this process until it has searched every line of the file.

Note: at no point in this process does grep store lines, change lines, or search only part of a line.
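
The read, compare, print loop described above can be imitated in a few lines of shell. This is a rough sketch for fixed strings only, not how grep is actually implemented; `mini_grep` is a hypothetical helper name and the input words are illustrative.

```shell
# mini_grep NEEDLE: print each line of standard input that contains NEEDLE.
mini_grep() {
  while IFS= read -r line; do              # copy one line into a buffer
    case $line in
      *"$1"*) printf '%s\n' "$line" ;;     # comparison passed: print the line
    esac                                   # otherwise move on; nothing is stored
  done
}

printf '%s\n' boot machine book bark | mini_grep boo
```

The loop keeps no state between lines, which matches the note above that grep does not store or change lines.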

Sample data file

Cut and paste the following data into a file named "sampler.log":

boot
book
booze
machine
boots
bungie
bark
aardvark
broken$tuff
robots
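
If you prefer to create the file from the command line, a here-document is one way to do it. In this sketch `workdir` is a hypothetical scratch directory; the quoted 'EOF' delimiter keeps the shell from expanding the `$` in `broken$tuff`.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory for the examples
cat > "$workdir/sampler.log" <<'EOF'
boot
book
booze
machine
boots
bungie
bark
aardvark
broken$tuff
robots
EOF

# Count the lines that were written (an empty pattern matches every line).
line_count=$(grep -c '' "$workdir/sampler.log")
printf '%s\n' "$line_count"
```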

A simple example

The simplest grep example is:

grep "boo" sampler.log

In this case, grep goes through every line of the file "sampler.log" and prints each line that contains the string "boo":


boot
book
booze
boots

When you are working with large files, however, it is often more useful to know which line of the file each match is on, so that you can find the string easily if you need to open the file in an editor and make changes. You can get this by adding the -n option:

grep -n "boo" sampler.log

This gives a more useful result, showing which lines match the search string:

1:boot
2:book
3:booze
5:boots

Another interesting option is -v, which inverts the result. In other words, grep prints every line that does not match the search string instead of the lines that do.

In the following case, grep prints every line that does not contain the string "boo", with line numbers as in the previous example:

grep -vn "boo" sampler.log
4:machine
6:bungie
7:bark
8:aardvark
9:broken$tuff
10:robots

The -c option tells grep to suppress printing of the matching lines and display only the number of lines that match the query. For example, the following prints the number 4, because 4 lines in sampler.log contain "boo":

grep -c "boo" sampler.log
4

The -l option prints only the names of the files that contain at least one line matching the search. This is very useful when you want to search several files for the same string, like this:

grep -l "boo" *
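
To see -l at work, here is a small sketch with three throwaway files. The file names a.txt, b.txt, and c.txt and the `workdir` scratch directory are assumptions made up for this example.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot machine > "$workdir/a.txt"
printf '%s\n' bark robots  > "$workdir/b.txt"
printf '%s\n' book         > "$workdir/c.txt"

# Only the names of files with at least one matching line are printed.
matching_files=$(cd "$workdir" && grep -l "boo" *)
printf '%s\n' "$matching_files"
```

Here b.txt is left out because none of its lines contains "boo".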

For searching files that are not code, a more useful option is -i, which ignores case. With this option, upper and lower case are treated as equal when matching the search string. In the following example, lines containing "boo" are printed even though the search string is uppercase:

grep -i "BOO" sampler.log
boot
book
booze
boots

The -x option matches whole lines only. In other words, the following command returns no results, because no line consists of exactly "boo":

grep -x "boo" sampler.log

Finally, -A lets you request extra lines of context after each match, so you get the matching line plus the lines that follow it. For example:

grep -A2 "mach" sampler.log
machine
boots
bungie
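
The companion options -B (lines before) and -C (lines on both sides) work the same way. They are available in GNU and BSD grep, although POSIX does not require them. A minimal sketch, with `workdir` as a hypothetical scratch directory holding a copy of the sample data:

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

before=$(grep -B1 "mach" "$workdir/sampler.log")   # one line before the match
around=$(grep -C1 "mach" "$workdir/sampler.log")   # one line on each side
printf '%s\n' "$before"
printf '%s\n' "$around"
```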

Regular expressions

Regular expressions are a compact way of describing complex patterns in text.

With grep you search using a pattern. Other tools use regular expressions (regexp) in more elaborate ways, and the plain strings used with grep so far are in fact very simple regular expressions. If you have used wildcards such as '*' or '?', for example when listing file names, the idea will be familiar, although regular expression syntax differs from shell wildcards. By default grep searches with basic regular expressions.

For example, to search a file for lines ending in the letter e:

grep "e$" sampler.log
booze
machine
bungie

If you need the more extensive regular expression syntax, you must use grep -E.

For example, the regular expression operator ? matches the preceding character zero or one times:

grep -E "boots?" sampler.log
boot
boots

You can also use the pipe character (|), which means "or", to combine several searches:

grep -E "boot|boots" sampler.log
boot
boots
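
Extended expressions also allow grouping with parentheses and repeat counts in braces. The two patterns below are illustrative choices, not taken from the original text, and `workdir` is a hypothetical scratch directory.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

# "boo", then t or k, then an optional s, then end of line.
grouped=$(grep -E 'boo(t|k)s?$' "$workdir/sampler.log")

# Lines that are exactly "b" followed by three more characters.
four_letters=$(grep -E '^b.{3}$' "$workdir/sampler.log")

printf '%s\n' "$grouped"
printf '%s\n' "$four_letters"
```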

Special characters

What if you want to search for a special character? If you want to find every line that contains the dollar character "$", you cannot simply run grep "$" a_file, because '$' is interpreted as a regular expression that matches the end of a line, and you would get every line in the file. The solution is to "escape" the symbol with a backslash, so you would use:

grep '\$' sampler.log
broken$tuff

You can also use the -F option, which stands for "fixed string" (or "fast"), because it searches for literal strings rather than regular expressions.
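
The three behaviors described above can be compared side by side. A small sketch, with `workdir` as a hypothetical scratch directory holding the sample data:

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

unescaped=$(grep -c '$' "$workdir/sampler.log")    # end-of-line anchor: every line
escaped=$(grep '\$' "$workdir/sampler.log")        # backslash makes $ literal
fixed=$(grep -F '$' "$workdir/sampler.log")        # -F treats the pattern literally

printf '%s\n' "$unescaped" "$escaped" "$fixed"
```

The single quotes matter here: they stop the shell itself from treating `$` specially before grep sees the pattern.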

More regexp examples

Reference: http://gnosis.cx/publish/prog…

AWK

A text pattern scanning and processing language created by Aho, Weinberger, and Kernighan.

AWK is quite complex, so this is not a complete guide, but it should give you a sense of what awk can do. The basics are easy to use, and it is strongly recommended.

AWK basics

An awk program operates on each line of the input file. It can have an optional BEGIN { } section containing commands that are executed before anything in the file is processed, then the main { } section, which runs on every line of the file, and finally an optional END { } section whose commands are executed after the file has been read:

BEGIN { ... initialization awk commands ... }
{ ... awk commands for each line of the file ... }
END { ... finalization awk commands ... }

For each line of the input file, awk checks whether there is a pattern-matching instruction; if so, the commands run only on lines that match the pattern, otherwise they run on all lines. These pattern-matching commands can contain the same regular expressions as grep.

awk commands can perform quite sophisticated mathematical and string operations, and awk also supports associative arrays. AWK treats each line as a series of fields separated by a field separator. By default this is one or more whitespace characters, so the line:

this is a line of text

contains 6 fields. In awk, the first field is called $1, the second $2, and so on, and the whole line is called $0.

The field separator is set by the awk internal variable FS, so if you set FS=":" each line is split on ':', which is very useful for files such as /etc/passwd. Other useful internal variables are NR, the current record (line) number, and NF, the number of fields in the current line.
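
As a sketch of FS, NR, and NF together, the example below feeds awk two made-up lines in the /etc/passwd layout. The account data is illustrative only and is not read from a real system; `fields` is a hypothetical variable name.

```shell
fields=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' |
  awk 'BEGIN { FS = ":" } { print NR, NF, $1, $NF }')
printf '%s\n' "$fields"
```

For each record this prints the record number, the number of fields, the first field, and the last field ($NF means "the field whose number is NF"). The same separator can be set on the command line with awk -F:.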

AWK can operate on any input, including standard input, in which case it is usually used together with the '|' pipe, for example in combination with grep or other commands.

For example, if I list all the files in the current directory:

ls -l
total 140
-rw-r--r-- 1 root root 55121 Jan  3 17:03 combined_log_format.log
-rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
-rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

I can see that the file size is reported in the fifth column. If I want to know the total size of the files in this directory, I can do:

ls -l | awk 'BEGIN {sum=0} {sum=sum+$5} END {print sum}'
135836

Note that 'print sum' prints the value of the variable sum, so if sum = 2 then 'print sum' outputs '2', whereas 'print $sum' prints '1', because the second field contains the value '1'.

It is therefore straightforward to write an awk command that calculates the mean and standard deviation of a column of numbers: accumulate 'sumx' and 'sumx2' in the main section, then use the standard formulas in the END section to compute the mean and standard deviation.
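
One way this could look is sketched below. The eight input numbers are made up for the example, and the formula used is the population standard deviation, sqrt(sumx2/n - mean^2), which is an assumption about which "standard formula" is meant.

```shell
stats=$(printf '%s\n' 2 4 4 4 5 5 7 9 |
  awk '{ sumx += $1; sumx2 += $1 * $1 }      # main section: accumulate sums
       END {
         mean = sumx / NR                    # NR is the number of records read
         sd = sqrt(sumx2 / NR - mean * mean) # population standard deviation
         printf "mean=%.2f sd=%.2f\n", mean, sd
       }')
printf '%s\n' "$stats"
```

For a sample standard deviation you would divide by NR - 1 instead, and for data with a large mean this one-pass formula can lose precision.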

AWK supports loops ('for' and 'while') and branching (using 'if'). So, if you want to thin out a file and operate only on every 3rd line, you can do this:

ls -l | awk '{for (i=1;i<3;i++) {getline}; print NR,$0}'
3 -rw-r--r-- 1 root root 80644 Jan  3 17:03 combined_log_format_w_resp_time.log
4 -rw-r--r-- 1 root root    71 Jan  3 17:55 sampler.log

The for loop uses the "getline" command to step through the input, so that only every 3rd line is printed.

Note that because the input has 4 lines, which is not divisible by 3, the final getline calls reach the end of the input early, so the last "print" command prints line 4. You can see that the line number is printed as well, using the NR variable.

AWK pattern matching

AWK is a line-oriented language. The pattern comes first, followed by the action. Action statements are enclosed in { and }. Either the pattern or the action may be omitted, but of course not both. If the pattern is missing, the action is performed for every input record. A missing action prints the entire record.

AWK patterns include regular expressions (using the same syntax as "grep -E") and combinations using the special symbols "&&" for logical AND, "||" for logical OR, and "!" for logical NOT.

You can also use relational patterns, pattern groups, ranges, and so on.
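
A few of these pattern forms, sketched against the sample data. The specific patterns are illustrative choices and `workdir` is a hypothetical scratch directory.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

# Regular expression AND relational pattern; no action, so the record is printed.
long_b=$(awk '/^b/ && length($0) > 4' "$workdir/sampler.log")

# Negated pattern with an action: count the lines that do not contain "boo".
no_boo=$(awk '!/boo/ { n++ } END { print n }' "$workdir/sampler.log")

# Range pattern: from the first line matching /machine/ through /bark/.
range=$(awk '/machine/,/bark/' "$workdir/sampler.log")

printf '%s\n' "$long_b" "$no_boo" "$range"
```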

AWK control statements

if (condition) statement [ else statement ]
while (condition) statement
do statement while (condition)
for (expr1; expr2; expr3) statement
for (var in array) statement
break
continue
exit [ expression ]
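
As a sketch of 'for (var in array)' and 'if' working with an associative array, the example below counts how many lines of the sample data start with each letter. The names `count`, `first`, and `letter` are hypothetical, as is the `workdir` scratch directory. Because awk does not define the order of a for-in traversal, the output is piped through sort.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

counts=$(awk '{ first = substr($0, 1, 1); count[first]++ }
  END {
    for (letter in count) {
      if (count[letter] > 1)
        print letter, count[letter], "(repeated)"
      else
        print letter, count[letter]
    }
  }' "$workdir/sampler.log" | sort)
printf '%s\n' "$counts"
```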

AWK input/output

Note: the printf command lets you specify the output format more precisely, in a way similar to C; for example, you can specify an integer of a given width, a floating-point number, a string, and so on.
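
A minimal sketch of the format specifiers mentioned above. The values and field widths are arbitrary choices for illustration.

```shell
# %5d     integer, right-aligned in 5 columns
# %8.3f   floating point, 8 columns wide, 3 decimal places
# %-6s    string, left-aligned in 6 columns
formatted=$(awk 'BEGIN { printf "%5d|%8.3f|%-6s|\n", 42, 3.14159, "awk" }')
printf '%s\n' "$formatted"
```

Unlike print, printf does not add a newline on its own, so the format string ends with \n.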

AWK mathematical functions

AWK string functions

AWK command line and usage

You can use the '-v' flag as many times as you need to pass variables to an awk program, for example:

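awk -v skip=3 '{for (i=1;i<skip;i++) {getline}; print $0}' sampler.log
booze
bungie
broken$tuff
robots

As in the ls -l example, the final line (robots) is printed because getline reaches the end of the input before the loop finishes.

You can also write an awk program in an editor and save it as a script file, for example: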

$ cat awk_strip
#!/usr/bin/awk -f
# only print out every 3rd line of input file
BEGIN {skip=3}
{for (i=1;i<skip;i++) {getline}; print $0}

You can then use it as a new command:

chmod u+x awk_strip
./awk_strip sampler.log

sed = stream editor

sed performs basic text transformations on an input stream (a file or input from a pipeline) in a single pass through the stream, so it is very efficient. It is sed's ability to filter text in a pipeline that particularly distinguishes it from other types of editors.

sed basics

sed can be used on the command line or in a shell script to edit files non-interactively. Perhaps its most useful function is to "search and replace" one string with another. You can embed sed commands in the command line that invokes sed using the '-e' option, or put them in a separate file such as 'sed.in' and invoke sed with the '-f sed.in' option. The latter is most commonly used when the sed commands are complex and involve many regexps. For example:

sed -e 's/input/output/' sampler.log

echoes every line of sampler.log to standard output, changing the first occurrence of 'input' on each line to 'output'. Note that sed is line-oriented, so if you want to change every occurrence on every line, you need to make the search and replace global with the 'g' flag, as shown below:

sed -e 's/input/output/g' sampler.log
boot
book
booze
machine
boots
bungie
bark
aardvark
broken$tuff
robots

The expression between the slashes can be a literal string or a regular expression. Note that by default the output is written to stdout. You can redirect it to a new file, or, if you want to edit the existing file in place, use the '-i' flag:

sed -e 's/input/output/' sampler.log > new_file
sed -i -e 's/input/output/' sampler.log
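
Since the sample file does not contain the word 'input', the effect of the 'g' flag is easier to see on a line that contains it twice. The test line below is made up for this sketch.

```shell
# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' 'input to input' | sed -e 's/input/output/')

# With g, every occurrence on the line is replaced.
every=$(printf '%s\n' 'input to input' | sed -e 's/input/output/g')

printf '%s\n' "$first_only" "$every"
```

The in-place flag is left out of this sketch on purpose: GNU sed accepts a bare -i, while BSD and macOS sed expect a backup suffix argument after it (an empty one is allowed), so check the local man page before relying on it.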

sed and regular expressions

What if one of the characters you want to use in the search command is a special symbol, such as '/' (for example, in a file name) or '*'? Then you have to escape the symbol, just as with grep (and awk). Say you want to edit a shell script so that it refers to /usr/local/bin instead of /bin; you could do this:

sed -e 's/\/bin/\/usr\/local\/bin/' my_script > new_script
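
When a pattern is full of slashes, it is often easier to pick a different delimiter for the s command, since sed accepts other characters in place of '/'. A sketch comparing the two spellings on a made-up shebang line:

```shell
# Escaped slashes with the usual / delimiter.
escaped=$(printf '%s\n' '#!/bin/sh' | sed -e 's/\/bin/\/usr\/local\/bin/')

# Same substitution using | as the delimiter, so no escaping is needed.
piped=$(printf '%s\n' '#!/bin/sh' | sed -e 's|/bin|/usr/local/bin|')

printf '%s\n' "$escaped" "$piped"
```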

What if you want to use a wildcard in your search: how do you write the output string? You use the special symbol "&", which stands for the text that the pattern matched. So, if you want to take each line in your file that starts with a number and put that number in parentheses:

sed -e 's/^[0-9][0-9]*/(&)/'

where [0-9] is a regexp range matching any single digit and '*' is a repeat count meaning zero or more of the preceding item. Writing [0-9][0-9]* requires at least one digit; a bare [0-9]* also matches the empty string, so it would put an empty "()" in front of lines that do not start with a number. You can also use position anchors in a regexp, and you can even save parts of the match in the pattern buffer so that they can be reused elsewhere.
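
A short sketch of both ideas: "&" for the whole match, and saved parts of the match written as \( ... \) and recalled as \1, \2. The input lines are made up for the example.

```shell
# & stands for whatever the pattern matched (here, the leading number).
wrapped=$(printf '%s\n' '42 apples' | sed -e 's/^[0-9][0-9]*/(&)/')

# \( \) saves part of the match; \1 and \2 reuse the saved parts, here swapped.
swapped=$(printf '%s\n' 'boots 12' | sed -e 's/\([a-z]*\) \([0-9]*\)/\2 \1/')

printf '%s\n' "$wrapped" "$swapped"
```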

Other sed commands

The general form is

sed -e '/pattern/ command' sampler.log

where 'pattern' is a regular expression and 'command' can be 's' = search & replace, 'p' = print, 'd' = delete, 'i' = insert, 'a' = append, and so on. Note that the default behavior is to print every line whether or not it matches, so if you want to suppress this you need to invoke sed with the '-n' flag; you can then use the 'p' command to control what is printed. So, if you want a list of all subdirectories, you can use:

ls -l | sed -n -e '/^d/ p'

Because each line of the long listing starts with a 'd' if the entry is a directory, this prints only the lines that start with 'd'. Similarly, if you want to delete all comment lines, that is, lines beginning with '#', you can use:

sed -e '/^#/ d' sampler.log

You can also use the range form

sed -e '1,100 command' sampler.log

to execute "command" on lines 1-100. You can also use the special line number $ to mean "end of file". So, if you want to delete all lines except the first 10 lines of the file, you can use:

sed -e '11,$ d' sampler.log

You can also use the pattern range form, where the first regular expression defines the start of the range and the second the end. So, for example, if you want to print all the lines from 'boot' to 'machine', you can do this:

sed -n -e '/boot$/,/mach/p' sampler.log
boot
book
booze
machine

This prints (because of -n) only the lines in the range given by the regexps.
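
The numeric range forms can be tried on the sample data as well. The line numbers below are arbitrary choices for this sketch, and `workdir` is a hypothetical scratch directory.

```shell
workdir=$(mktemp -d)   # hypothetical scratch directory
printf '%s\n' boot book booze machine boots bungie bark aardvark \
  'broken$tuff' robots > "$workdir/sampler.log"

middle=$(sed -n -e '2,4 p' "$workdir/sampler.log")    # print only lines 2-4
first_three=$(sed -e '4,$ d' "$workdir/sampler.log")  # delete line 4 to the end
last=$(sed -n -e '$ p' "$workdir/sampler.log")        # $ is the last line

printf '%s\n' "$middle" "$first_three" "$last"
```

The single quotes keep the shell from trying to expand `$` before sed sees it.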

Further reading

There is much more you can do with sed. For details, see: http://www.grymoire.com/Unix/…

Summary

The three Linux swordsmen, awk, sed, and grep, are widely used in performance modeling, performance monitoring, and performance analysis. They are also a frequent interview topic for testing positions at major Internet companies and one of the essential skills for mid-level and senior testers.
