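Using Grep and Regular Expressions to Search for Text Patterns in Linux

Introduction

One of the most useful and versatile commands in a Linux terminal environment is the "grep" command. The name "grep" stands for "global regular expression print". This means that grep checks the input it receives against a specified pattern and prints the lines that match.

This seemingly trivial program is extremely powerful when used correctly. Its ability to filter input based on complex rules makes it a popular link in many command chains.

We will explore some of its options and then dive into using regular expressions. All of the techniques discussed in this guide can be applied to managing your VPS server.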

Table of Contents

  1. Basic Usage
  2. Regular Expressions
  3. Extended Regular Expressions
  4. Conclusion

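Basic Usage

In its simplest form, grep can be used to match literal patterns within a text file. This means that if you pass grep a word to search for, it will print out every line in the file that contains that word.

Let's try an example. We will use grep to search for every line that contains the word "GNU" in the GNU General Public License version 3 on an Ubuntu system.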

cd /usr/share/common-licenses
grep "GNU" GPL-3
                    GNU GENERAL PUBLIC LICENSE
  The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
  Developers that use the GNU GPL protect your rights with two steps:
  "This License" refers to version 3 of the GNU General Public License.
  13. Use with the GNU Affero General Public License.
under version 3 of the GNU Affero General Public License into a single
...
...

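The first argument, "GNU", is the pattern we are searching for, while the second argument, "GPL-3", is the input file we wish to search.

The resulting output will be every line containing the pattern text. In some Linux distributions, the pattern searched for will be highlighted in the resulting lines.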
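
For readers without the Ubuntu license files at hand, the same literal search can be tried on a throwaway file. The following is a minimal sketch: the file contents and the variable names (sample, from_file, from_pipe) are made up for illustration. It also shows that grep reads standard input when no file is given, which is what makes it useful in command chains.

```shell
# Create a small throwaway file (contents are illustrative only).
sample=$(mktemp)
printf '%s\n' \
  'The GNU project began in 1983.' \
  'Linux is a kernel.' \
  'GNU tools run on many kernels.' > "$sample"

# Search a file: the pattern comes first, the file name second.
from_file=$(grep "GNU" "$sample")

# Search standard input: with no file argument, grep reads from the pipe.
# (cat stands in for any command that produces text.)
from_pipe=$(cat "$sample" | grep "GNU")

printf '%s\n' "$from_file"
rm -f "$sample"
```

Both variables end up holding the same two lines, since only the source of the input differs.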

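Common Options

By default, grep searches for the exact specified pattern within the input file and returns the lines it finds. We can make this behavior more useful by adding some optional flags to grep.

If we want grep to ignore the case of our search pattern and match both upper-case and lower-case variations, we can specify the "-i" or "--ignore-case" option.

We will search for each instance of the word "license" (in upper, lower, or mixed case) in the same file as before.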

grep -i "license" GPL-3
                    GNU GENERAL PUBLIC LICENSE
 of this license document, but changing it is not allowed.
  The GNU General Public License is a free, copyleft license for
  The licenses for most software and other practical works are designed
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price.  Our General Public Licenses are designed to make sure that you
(1) assert copyright on the software, and (2) offer you this License
  "This License" refers to version 3 of the GNU General Public License.
  "The Program" refers to any copyrightable work licensed under this
...
...

As you can see, the results include "LICENSE", "license", and "License". If there were an instance of "LiCeNsE", that would have been returned as well.
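
The effect of "-i" is easy to check on a small file. This sketch uses an invented sample file and the "-c" option, which is not covered in this guide: it makes grep print the number of matching lines instead of the lines themselves.

```shell
# Illustrative sample file with several spellings of the same word.
sample=$(mktemp)
printf '%s\n' 'LICENSE' 'a license file' 'This License' 'LiCeNsE' 'no match here' > "$sample"

# Case-sensitive: only the line with the exact lower-case spelling matches.
exact_count=$(grep -c "license" "$sample")

# Case-insensitive: all four spellings match.
icase_count=$(grep -ci "license" "$sample")

echo "case-sensitive: $exact_count, case-insensitive: $icase_count"
rm -f "$sample"
```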

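If we want to find all lines that do not contain a specified pattern, we can use the "-v" or "--invert-match" option.

We can search for every line that does not contain the word "the" in the BSD license with the following command: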

grep -v "the" BSD
All rights reserved.

Redistribution and use in source and binary forms, with or without
are met:
   may be used to endorse or promote products derived from this software
   without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

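As you can see, because we did not specify the "ignore case" option, the last two lines were returned as not containing "the": they contain only the upper-case "THE".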
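
Combining "-v" with "-i" removes the upper-case lines as well. The sketch below runs both forms against an invented three-line file so the difference is visible side by side.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'the first line' 'THE SECOND LINE' 'a third line' > "$sample"

# -v alone is case-sensitive, so the upper-case "THE" line is still printed.
inverted=$(grep -v "the" "$sample")

# Adding -i also filters out the upper-case variant.
inverted_icase=$(grep -vi "the" "$sample")

printf 'with -v:\n%s\n' "$inverted"
printf 'with -vi:\n%s\n' "$inverted_icase"
rm -f "$sample"
```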

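It is often useful to know the line number on which a match occurs. This can be accomplished with the "-n" or "--line-number" option.

With this flag added, the previous example returns the following text: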

grep -vn "the" BSD
2:All rights reserved.
3:
4:Redistribution and use in source and binary forms, with or without
6:are met:
13:   may be used to endorse or promote products derived from this software
14:   without specific prior written permission.
15:
16:THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
17:ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
...
...

Now we can reference the line numbers if we want to make changes to every line that does not contain "the".
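
Because "-n" separates the number from the line with a colon, the numbers are easy to extract for use by another tool. A small sketch on an invented file, using cut to keep only the first colon-separated field:

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'alpha' 'the beta' 'gamma' 'the delta' > "$sample"

# -n prefixes each printed line with its line number and a colon.
numbered=$(grep -n "the" "$sample")

# Keep only the numbers, for example to hand them to an editor or script.
line_numbers=$(grep -n "the" "$sample" | cut -d: -f1)

printf '%s\n' "$numbered"
rm -f "$sample"
```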

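Regular Expressions

In the introduction, we stated that grep stands for "global regular expression print". A "regular expression" is a text string that describes a particular search pattern.

Different applications and programming languages implement regular expressions slightly differently. We will only explore a small subset of the way that grep describes its patterns.

Literal Matches

In the examples above, when we searched for the words "GNU" and "the", we were actually searching for very simple regular expressions that matched the exact strings of characters "GNU" and "the".

It is helpful to always think of these as matching a string of characters rather than matching a word. This distinction will become more important as we learn more complex patterns.

Patterns that exactly specify the characters to be matched are called "literals" because they match the pattern literally, character for character.

All alphabetic and numeric characters (as well as certain other characters) are matched literally unless modified by other expression mechanisms.

Anchor Matches

Anchors are special characters that specify where in the line a match must occur to be valid.

For instance, using anchors, we can specify that we only want to know about the lines that match "GNU" at the very beginning of the line. To do this, we can use the "^" anchor before the literal string.

The following example will only match "GNU" if it occurs at the very beginning of a line.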

grep "^GNU" GPL-3
GNU General Public License for most of our software; it applies also to
GNU General Public License, you may choose any version ever published

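Similarly, the "$" anchor can be used after a string to indicate that the match will only be valid if it occurs at the very end of a line.

We will match every line ending with the word "and" using the following regular expression: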

grep "and$" GPL-3
that there is no warranty for this free software.  For both users' and
  The precise terms and conditions for copying, distribution and
  License.  Each licensee is addressed as "you".  "Licensees" and
receive it, in any medium, provided that you conspicuously and
    alternative is allowed only occasionally and noncommercially, and
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
provisionally, unless and until the copyright holder explicitly and
receives a license from the original licensors, to run, modify and
make, use, sell, offer for sale, import and otherwise run, modify and
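
The two anchors can also be combined to match a whole line. The sketch below uses an invented sample file. The patterns are written in single quotes so the shell leaves the "$" character untouched.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'GNU at the start' 'ends with GNU' 'GNU' 'has GNU inside it' > "$sample"

starts=$(grep '^GNU' "$sample")    # lines beginning with GNU
ends=$(grep 'GNU$' "$sample")      # lines ending with GNU
whole=$(grep '^GNU$' "$sample")    # lines consisting of GNU and nothing else

printf 'whole-line match: %s\n' "$whole"
rm -f "$sample"
```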

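Matching Any Character

The period character (.) is used in regular expressions to mean that any single character can exist at the specified location.

For example, if we want to match anything that has two characters followed by the string "cept", we can use the following pattern: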

grep "..cept" GPL-3
use, which is precisely where it is most unacceptable.  Therefore, we
infringement under applicable copyright law, except executing it on a
tells the user that there is no warranty for the work (except to the
License by making exceptions from one or more of its conditions.
form of a separately written license, or stated as exceptions;
  You may not propagate or modify a covered work except as expressly
  9. Acceptance Not Required for Having Copies.
...
...

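As you can see, we have instances of both "accept" and "except", as well as variations of the two words. The pattern would also have matched "z2cept" if that had been found.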
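
Each period must consume exactly one character, so strings with fewer than two characters before "cept" do not match. A quick check on an invented word list:

```shell
# Illustrative word list, one word per line.
sample=$(mktemp)
printf '%s\n' 'accept' 'except' 'z2cept' 'cept' 'xcept' > "$sample"

# Two arbitrary characters, then the literal string "cept".
matches=$(grep "..cept" "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```

The lines "cept" and "xcept" are skipped because they have zero and one leading characters respectively.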

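Bracket Expressions

By placing a group of characters within brackets ("[" and "]"), we can specify that the character at that position can be any one character found within the bracket group.

This means that if we want to find the lines that contain "too" or "two", we can specify those variations succinctly by using the following pattern: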

grep "t[wo]o" GPL-3
your programs, too.
freedoms that you received.  You must make sure that they, too, receive
  Developers that use the GNU GPL protect your rights with two steps:
a computer network, with no transfer of a copy, is not conveying.
System Libraries, or general-purpose tools or generally available free
    Corresponding Source from a network server at no charge.
...
...

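We can see that both variations are found within the file.

Bracket notation also gives us some interesting options. We can have the pattern match anything except the characters within a bracket by beginning the list of characters within the brackets with a "^" character.

This example is like the pattern ".ode", but it will not match the string "code":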

grep "[^c]ode" GPL-3
  1. Source Code.
    model, to give anyone who possesses the object code either (1) a
the only significant mode of use of the product.
notice like this when it starts in an interactive mode:

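You will notice that the second line returned does, in fact, contain the word "code". This is not a failure of the regular expression or of grep.

Rather, this line was returned because earlier in the line, the string "mode", found within the word "model", matched the pattern. The line was returned because it contains at least one instance that matches.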
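
To see which part of a line actually matched, the "-o" option prints only the matched text instead of the whole line. It is not covered in this guide and is not part of POSIX, but GNU and BSD grep both provide it. The sketch below uses an invented sample file.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'too' 'two' 'the object code follows a model' 'source code' > "$sample"

# "t", then either "w" or "o", then "o".
either=$(grep "t[wo]o" "$sample")

# Any character except "c", followed by "ode".
not_code=$(grep "[^c]ode" "$sample")

# -o shows that the match came from "model", not from "code".
matched_text=$(grep -o "[^c]ode" "$sample")

printf 'line: %s\nmatched text: %s\n' "$not_code" "$matched_text"
rm -f "$sample"
```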

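Another helpful feature of brackets is that you can specify a range of characters instead of individually typing every available character.

This means that if we want to find every line that begins with a capital letter, we can use the following pattern: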

grep "^[A-Z]" GPL-3
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License.  Each licensee is addressed as "you".  "Licensees" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...

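Because of some legacy sorting issues (in some locales, a range such as "A-Z" can follow the locale's collation order rather than plain ASCII order), it is often more accurate to use POSIX character classes instead of character ranges like the one we just used.

There are many character classes that are outside the scope of this guide, but an example that accomplishes the same thing as above uses the "[:upper:]" character class within a bracket selector: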

grep "^[[:upper:]]" GPL-3
GNU General Public License for most of our software; it applies also to
States should not allow patents to restrict development and use of
License.  Each licensee is addressed as "you".  "Licensees" and
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
System Libraries, or general-purpose tools or generally available free
Source.
User Product is transferred to the recipient in perpetuity or for a
...
...
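
If a range must be used, one common way to make it predictable is to pin the locale for that single command. The sketch below is an illustration on an invented file: setting LC_ALL=C for the grep call is an addition not taken from this guide, and it makes "A-Z" mean exactly the 26 ASCII capital letters.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'Upper first' 'lower first' '9 digits first' ' Leading space' > "$sample"

# Range, with the locale pinned to C for this one command.
range_matches=$(LC_ALL=C grep '^[A-Z]' "$sample")

# POSIX class: names the intent directly, independent of collation order.
class_matches=$(grep '^[[:upper:]]' "$sample")

printf 'range: %s\nclass: %s\n' "$range_matches" "$class_matches"
rm -f "$sample"
```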

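Repeat Pattern Zero or More Times

Finally, one of the most commonly used meta-characters is the "*", which means "repeat the previous character or expression zero or more times".

If we want to find each line that contains an opening and closing parenthesis with only letters and spaces in between, we can use the following expression. In grep's basic syntax, unescaped parentheses are matched literally: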

grep "([A-Za-z ]*)" GPL-3
 Copyright (C) 2007 Free Software Foundation, Inc. 
  
distribution (with or without modification), making available to the
than the work as a whole, that (a) is included in the normal form of
Component, and (b) serves only to enable use of the work with that
(if any) on which the executable work runs, or a compiler used to
    (including a physical distribution medium), accompanied by the
    (including a physical distribution medium), accompanied by a
    place (gratis or for a charge), and offer equivalent access to the
...
...
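
"Zero or more" includes zero, so an empty pair of parentheses also matches, while a pair containing digits does not. A small check on an invented file:

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' \
  'see (a) and (b)' \
  'empty () pair' \
  'year (2007) only' \
  'two words (with or without)' \
  'no parens' > "$sample"

# Literal "(", any number of letters or spaces (including none), literal ")".
matches=$(grep "([A-Za-z ]*)" "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```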

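Escaping Meta-Characters

Sometimes, we may want to search for a literal period or a literal opening bracket. Because these characters have special meaning in regular expressions, we need to "escape" them to tell grep that we do not wish to use their special meaning in this case.

We can escape characters by placing the backslash character (\) before the character that would normally have a special meaning.

For instance, if we want to find any line that begins with a capital letter and ends with a period, we can use the following expression. The ending period is escaped so that it represents a literal period instead of the usual "any character" meaning: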

grep "^[A-Z].*\.$" GPL-3
Source.
License by making exceptions from one or more of its conditions.
License would be to refrain entirely from conveying the Program.
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
SUCH DAMAGES.
Also add information on how to contact you by electronic and paper mail.
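
The difference the backslash makes shows up when a line ends with some other character. This sketch compares the escaped and unescaped forms on an invented file. It swaps in the "[[:upper:]]" class for the capital-letter range so the result does not depend on the locale.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'Ends with a period.' 'Ends with a bang!' 'lowercase start.' 'Source.' > "$sample"

# Unescaped: the final "." matches any character, so the "!" line also matches.
unescaped=$(grep '^[[:upper:]].*.$' "$sample")

# Escaped: "\." matches only a literal period.
escaped=$(grep '^[[:upper:]].*\.$' "$sample")

printf 'unescaped:\n%s\n' "$unescaped"
printf 'escaped:\n%s\n' "$escaped"
rm -f "$sample"
```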

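Extended Regular Expressions

Grep can be used with an even more extensive regular expression language by using the "-E" flag or by calling the "egrep" command instead of grep. Recent GNU grep releases treat egrep as obsolescent, so "grep -E" is the safer spelling.

These options open up the capabilities of "extended regular expressions". Extended regular expressions include all of the basic meta-characters, along with additional meta-characters to express more complex matches.

Grouping

One of the simplest and most useful abilities that extended regular expressions open up is the ability to group expressions together so they can be manipulated or referenced as one unit.

Group expressions together using parentheses. If you would like to group with parentheses without using extended regular expressions, you can escape them with a backslash to enable this functionality.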

grep "\(grouping\)" file.txt
grep -E "(grouping)" file.txt
egrep "(grouping)" file.txt

The above three expressions are functionally equivalent.
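
Grouping becomes visible once an operator is applied to the group. In this sketch (invented sample file), "*" repeats the whole group "ab" rather than a single character, and the last command shows that bare parentheses in the basic syntax are plain literal characters.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'abab' 'aab' '(ab)' > "$sample"

# Basic syntax: escaped parentheses form a group, which "*" then repeats.
bre=$(grep '^\(ab\)*$' "$sample")

# Extended syntax: bare parentheses form the same group.
ere=$(grep -E '^(ab)*$' "$sample")

# Basic syntax with bare parentheses: they match literal "(" and ")".
literal=$(grep '(ab)' "$sample")

printf 'basic: %s\nextended: %s\nliteral: %s\n' "$bre" "$ere" "$literal"
rm -f "$sample"
```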

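Alternation

Similar to how bracket expressions can specify different possible choices for single character matches, alternation allows you to specify alternative matches for strings or expression sets.

To indicate alternation, we use the pipe character "|". Alternatives are often placed within a parenthetical group to specify that one of two or more possibilities should be considered a match.

The following will find either "GPL" or "General Public License" in the text: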

grep -E "(GPL|General Public License)" GPL-3
  The GNU General Public License is a free, copyleft license for
the GNU General Public License is intended to guarantee your freedom to
GNU General Public License for most of our software; it applies also to
price.  Our General Public Licenses are designed to make sure that you
  Developers that use the GNU GPL protect your rights with two steps:
  For the developers' and authors' protection, the GPL clearly explains
authors' sake, the GPL requires that modified versions be marked as
have designed this version of the GPL to prohibit the practice for those
...
...

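Alternation can select between more than two choices by adding further choices within the selection group, separated by additional pipe (|) characters.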
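
As a minimal sketch of a three-way alternation, the following searches an invented file for any of three license names:

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' \
  'licensed under the GPL' \
  'see the BSD license' \
  'an MIT-style notice' \
  'public domain' > "$sample"

# Any one of the three alternatives is enough for the line to match.
matches=$(grep -E "(GPL|BSD|MIT)" "$sample")

printf '%s\n' "$matches"
rm -f "$sample"
```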

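Quantifiers

Like the "*" meta-character, which matches the previous character or character set zero or more times, there are other meta-characters available in extended regular expressions that specify the number of occurrences.

To match a character zero or one times, you can use the "?" character. In essence, this makes the character or character set that came before it optional.

The following matches both "copyright" and "right" by putting "copy" in an optional group: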

grep -E "(copy)?right" GPL-3
 Copyright (C) 2007 Free Software Foundation, Inc. 
  
  To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights.  Therefore, you have
know their rights.
  Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
  "Copyright" also means copyright-like laws that apply to other kinds of
...
...

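The "+" character matches an expression one or more times. This is almost like the "*" meta-character, but with the "+" character, the expression must match at least once.

The following expression matches the string "free" followed by one or more characters that are not whitespace: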

grep -E "free[^[:space:]]+" GPL-3
  The GNU General Public License is a free, copyleft license for
to take away your freedom to share and change the works.  By contrast,
the GNU General Public License is intended to guarantee your freedom to
  When we speak of free software, we are referring to freedom, not
have the freedom to distribute copies of free software (and charge for
you modify it: responsibilities to respect the freedom of others.
freedoms that you received.  You must make sure that they, too, receive
protecting users' freedom to change the software.  The systematic
of the GPL, as needed to protect the freedom of users.
patents cannot be used to render the program non-free.
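
The three quantifiers seen so far can be compared directly. In this sketch (invented sample file), the only difference between the last two commands is "+" versus "*", which decides whether a bare "free" at the end of a line counts as a match.

```shell
# Illustrative sample file.
sample=$(mktemp)
printf '%s\n' 'copyright' 'right' 'free' 'freedom' 'free, as in speech' > "$sample"

# "?" makes the group "copy" optional.
optional=$(grep -E "(copy)?right" "$sample")

# "+" requires at least one non-whitespace character after "free".
one_or_more=$(grep -E "free[^[:space:]]+" "$sample")

# "*" also accepts zero such characters, so the bare "free" line matches too.
zero_or_more=$(grep -E "free[^[:space:]]*" "$sample")

printf 'with +:\n%s\n' "$one_or_more"
printf 'with *:\n%s\n' "$zero_or_more"
rm -f "$sample"
```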

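Specifying Match Repetition

If we need to specify the number of times that a match is repeated, we can use the brace characters ("{" and "}"). These characters are used to specify an exact number, a range, or an upper or lower bound on the number of times an expression can match.

If we want to find all of the lines that contain three consecutive vowels, we can use the following expression: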

grep -E "[AEIOUaeiou]{3}" GPL-3
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
receive it, in any medium, provided that you conspicuously and
give under the previous paragraph, plus a right to possession of the
covered work so as to satisfy simultaneously your obligations under this
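
An exact count and a lower bound can be compared on a short word list. This is a sketch on invented data: "{3}" asks for exactly three vowels in a row, while "{2,}" asks for two or more.

```shell
# Illustrative word list, one word per line.
sample=$(mktemp)
printf '%s\n' 'previous' 'queue' 'boat' 'cat' > "$sample"

# Exactly three consecutive vowels somewhere in the line.
three_vowels=$(grep -E "[AEIOUaeiou]{3}" "$sample")

# Two or more consecutive vowels: "boat" now matches as well.
two_or_more=$(grep -E "[AEIOUaeiou]{2,}" "$sample")

printf 'three in a row:\n%s\n' "$three_vowels"
printf 'two or more:\n%s\n' "$two_or_more"
rm -f "$sample"
```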

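If we want to match words that have between 16 and 20 letters, we can use the following expression. Because the pattern is not anchored to word boundaries, it also matches longer words that contain such a run of letters: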

grep -E "[[:alpha:]]{16,20}" GPL-3
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
    c) Prohibiting misrepresentation of the origin of that material, or
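
An unanchored range such as "{16,20}" only limits how many characters one match consumes, so a 22-letter word still contains a matching run. One way to enforce the upper bound, sketched here on invented data, is to require a non-letter or the edge of the line on both sides of the run. The bounded pattern is an illustration and is not taken from this guide.

```shell
# Illustrative word list, one word per line.
sample=$(mktemp)
printf '%s\n' 'short' 'responsibilities' 'counterrevolutionaries' > "$sample"

# Unanchored: any run of 16 to 20 letters matches, even inside a longer word.
unanchored=$(grep -E '[[:alpha:]]{16,20}' "$sample")

# Bounded: a non-letter (or the line edge) must sit on both sides of the run.
bounded=$(grep -E '(^|[^[:alpha:]])[[:alpha:]]{16,20}([^[:alpha:]]|$)' "$sample")

printf 'unanchored:\n%s\n' "$unanchored"
printf 'bounded:\n%s\n' "$bounded"
rm -f "$sample"
```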

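Conclusion

There are many times when grep will be useful for finding patterns within files or within the file system hierarchy. It is worthwhile to become familiar with its options and syntax to save yourself time when you need it.

Regular expressions are even more versatile and can be used with many popular programs. For instance, many text editors implement regular expressions for searching and replacing text.

Furthermore, most modern programming languages use regular expressions to perform procedures on specific pieces of data. Regular expressions are a skill that transfers to many common computer-related tasks.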

Author: Justin Ellingwood