`grep`

To be continued…

`sed`

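Stream EDitor. A stream-oriented command-line editor inspired by ed, the early Unix line editor written by Ken Thompson. It shares much of ed's command syntax.

Mainly used for text editing; it is strongest at line-oriented processing.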

`tldr`

  sed

  GNU stream editor for filtering and transforming text.
  See also: `awk`, `ed`.
  More information: https://www.gnu.org/software/sed/manual/sed.html.

  - [s]ubstitute all occurrences of "apple" with "mango" on all lines, print to stdout:
    command | sed 's/apple/mango/g'

  - Replace "apple" with "mango" in-place in a file (overwriting original file):
    sed --in-place 's/apple/mango/g' path/to/file

  - Run multiple substitutions in one command:
    command | sed -e 's/apple/mango/g' -e 's/orange/lime/g'

  - Use a custom delimiter (useful when the pattern contains slashes):
    command | sed 's#////#____#g'

  - [d]elete lines 1 to 5 of a file and back up the original file with a .orig extension:
    sed --in-place=.orig '1,5d' path/to/file

  - [p]rint only the first line to stdout:
    command | sed --quiet '1p'

  - [i]nsert a new line at the beginning of a file, overwriting the original file:
    sed --in-place '1i\your new line text\' path/to/file

  - Delete blank lines (with or without spaces/tabs) from a file, overwriting the original file:
    sed --in-place '/^[[:space:]]*$/d' path/to/file

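Command format

sed [options] 'address command' file (to run several commands under one address, wrap them in braces)

By default, sed reads the input line by line, runs the commands on each line, and then prints that line.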
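
A minimal sketch of that default print cycle and of grouping commands with braces. The three-line input is made up for illustration.

```shell
# Hypothetical three-line input used only for illustration.
input=$(printf 'one\ntwo\nthree\n')

# Without -n every line is printed automatically after the commands run,
# so the extra p on line 2 makes that line appear twice.
default_out=$(printf '%s\n' "$input" | sed '2p')

# Braces group two commands under the single address 2:
# substitute first, then print. -n suppresses the automatic print.
grouped_out=$(printf '%s\n' "$input" | sed -n '2{s/two/TWO/;p;}')

printf '%s\n' "$default_out"
printf '%s\n' "$grouped_out"
```

The first command prints four lines (two appears twice); the second prints only TWO.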

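Common Options

-i: edit the file in place (by default sed leaves the original file untouched and writes the modified text to stdout).
-n: quiet mode; turns off the automatic print after each line. Combine it with the p command to print only matching lines.
-e: adds a command to the script, so several commands run in order. Format: sed [options] -e 'address command' -e 'address command' ... file
-E: use extended regular expressions, so (), +, ?, |, and {n,m} work without backslashes.
-f: load the sed script from a file (one command per line).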
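
The options above can be tried out roughly as follows. The sample text and the temporary script file are assumptions for illustration; -i is left out because it rewrites files on disk.

```shell
# Hypothetical sample text.
input=$(printf 'apple pie\norange juice\nplain water\n')

# -n with the p flag: print only lines where the substitution succeeded.
only_changed=$(printf '%s\n' "$input" | sed -n 's/apple/mango/p')

# -e twice: both substitutions run, in order, on every line.
multi=$(printf '%s\n' "$input" | sed -e 's/apple/mango/' -e 's/orange/lime/')

# -E: grouping and alternation without backslashes.
extended=$(printf '%s\n' "$input" | sed -E 's/(apple|orange)/fruit/')

# -f: load the same two commands from a script file, one per line.
script=$(mktemp)
printf 's/apple/mango/\ns/orange/lime/\n' > "$script"
from_file=$(printf '%s\n' "$input" | sed -f "$script")
rm -f "$script"

printf '%s\n' "$only_changed" "$multi" "$extended" "$from_file"
```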

Address

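An address takes one of two forms: a line number or a regular expression.

Line numbers:
- n: line n; as a special case, $ is the last line.
- n,m: lines n through m (line m included).
- n~m: every m-th line starting from line n (a GNU sed extension).
Regular expressions:
- /regex/: lines that contain a match for regex.
- /re1/,/re2/: the range from a line matching re1 through the next line matching re2 (both included).
The two forms can be mixed: /regex/,m covers the range from a line matching regex through line m.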
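
A short sketch of these address forms on a made-up six-line input. The n~m form is omitted so the commands also run on non-GNU sed.

```shell
# Hypothetical input: the numbers 1 to 6, one per line.
input=$(printf '1\n2\n3\n4\n5\n6\n')

# A single line number, and $ for the last line.
second=$(printf '%s\n' "$input" | sed -n '2p')
last=$(printf '%s\n' "$input" | sed -n '$p')

# Inclusive numeric range.
range=$(printf '%s\n' "$input" | sed -n '2,4p')

# Regex range: from the first line matching 3 through the next line matching 5.
regex_range=$(printf '%s\n' "$input" | sed -n '/3/,/5/p')

# Mixed: from the line matching 4 through line 5.
mixed=$(printf '%s\n' "$input" | sed -n '/4/,5p')

printf '%s\n' "$second" "$last" "$range" "$regex_range" "$mixed"
```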

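Common Commands

[s]ubstitute: s/pattern/replacement/[flags]
- flags: by default s replaces only the first match on each line; flags change which matches are replaced or add extra behavior.
  - [g]lobal: replace every match on the line.
  - number: a number N replaces only the N-th match on the line.
  - [i]gnore case: match case-insensitively (a GNU sed extension, also written I).
  - [w]rite: write each line where a substitution succeeded to another file: sed 's/error/ERROR/w error.log' journal.txt.
  - [p]rint: print the line when a substitution succeeded; normally paired with -n.
- The character right after s is the delimiter, and almost any single character can be used. If the pattern or replacement contains the delimiter, it must be escaped. Common choices are /, #, and @, for example: s@abc@123@g
[p]rint: print the line; normally paired with -n.
[d]elete: delete the line.
[i]nsert: insert text before the line.
[a]ppend: append text after the line.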
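
The commands above, sketched on a made-up two-line input. The i and a commands use the portable multi-line form (a backslash followed by a newline) rather than the GNU one-line shorthand.

```shell
# Hypothetical input with three fields joined by dashes on each line.
input=$(printf 'a-a-a\nb-b-b\n')

# s without flags touches only the first match on each line.
first_only=$(printf '%s\n' "$input" | sed 's/-/+/')
# g replaces every match; a number replaces only that occurrence.
all=$(printf '%s\n' "$input" | sed 's/-/+/g')
second_match=$(printf '%s\n' "$input" | sed 's/-/+/2')
# A custom delimiter avoids escaping slashes in paths.
custom_delim=$(printf '/usr/bin\n' | sed 's@/usr@/opt@')

# d deletes the addressed line.
deleted=$(printf '%s\n' "$input" | sed '1d')

# i inserts text before the addressed line, a appends text after it.
inserted=$(printf '%s\n' "$input" | sed '1i\
header')
appended=$(printf '%s\n' "$input" | sed '$a\
footer')

printf '%s\n' "$first_only" "$all" "$second_match" "$custom_delim" "$deleted" "$inserted" "$appended"
```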

`awk`

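Named after the surname initials of its three creators: Alfred Aho, Peter Weinberger, and Brian Kernighan.

Mainly used for extracting and formatting text; it is strongest at column and field processing.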

`tldr`

  awk

  A versatile programming language for working on files.
  Note: Different implementations of AWK often make this a symlink of their binary.
  See also: `gawk`.
  More information: https://github.com/onetrueawk/awk.

  - Print the fifth column (a.k.a. field) in a space-separated file:
    awk '{print $5}' path/to/file

  - Print the second column of the lines containing "foo" in a space-separated file:
    awk '/foo/ {print $2}' path/to/file

  - Print the last column of each line in a file, using a comma (instead of space) as a field separator:
    awk -F ',' '{print $NF}' path/to/file

  - Sum the values in the first column of a file and print the total:
    awk '{s+=$1} END {print s}' path/to/file

  - Print every third line starting from the first line:
    awk 'NR%3==1' path/to/file

  - Print different values based on conditions:
    awk '{if ($1 == "foo") print "Exact match foo"; else if ($1 ~ "bar") print "Partial match bar"; else print "Baz"}' path/to/file

  - Print all the lines which the 10th column value is between a min and a max:
    awk '($10 >= min_value && $10 <= max_value)' path/to/file

  - Print table of users with UID >=1000 with header and formatted output, using colon as separator (%-20s mean: 20 left-align string characters, %6s means: 6 right-align string characters):
    awk 'BEGIN {FS=":";printf "%-20s %6s %25s\n", "Name", "UID", "Shell"} $4 >= 1000 {printf "%-20s %6d %25s\n", $1, $4, $7}' /etc/passwd

Command format

Most common form: awk [options] 'pattern { action }' file

Full form:

awk [options] '
BEGIN { action }
pattern { action }
END { action }' \
file

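pattern: the action runs only on lines for which the pattern is true. If the pattern is omitted, the action runs on every line.
- Regex match: /regex/
- Logical expression: $3 > 10 && $NF < 20
- Range: NR == 10, NR == 20 selects lines 10 through 20.
BEGIN, END: run once before the first line is read and once after the last line has been processed, respectively.
action: if omitted, defaults to { print $0 }, which prints the whole line.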
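
A small sketch of the BEGIN / pattern / END structure, plus a pattern with no action and a range pattern. The name and score data is invented for illustration.

```shell
# Hypothetical input: a name and a score on each line.
input=$(printf 'alice 30\nbob 12\ncarol 25\n')

report=$(printf '%s\n' "$input" | awk '
BEGIN { print "name score" }      # runs once, before any input is read
$2 > 20 { print $1, $2; n++ }     # runs for lines whose second field exceeds 20
END { print n " rows matched" }   # runs once, after all input
')

# A pattern with no action: the default action prints the whole line.
pattern_only=$(printf '%s\n' "$input" | awk '/bob/')

# A range pattern: from line 2 through line 3.
range=$(printf '%s\n' "$input" | awk 'NR == 2, NR == 3')

printf '%s\n' "$report" "$pattern_only" "$range"
```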

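Common Options

-F: set the field separator. By default awk splits fields on runs of spaces and tabs.
-v: assign a variable from outside the program (var=value), either to predefine a value or to pass in a shell variable. *** This is a better choice than wrapping the program in double quotes! ***
-f: load the awk program from a file.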
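
A sketch of the three options together. The colon-separated data, the threshold shell variable, and the temporary program file are all assumptions for illustration.

```shell
# Hypothetical colon-separated input: a name and a numeric id.
input=$(printf 'alice:1001\nbob:999\ncarol:1500\n')
threshold=1000   # a shell variable to hand over to awk

# -F sets the separator; -v copies the shell value into the awk variable min,
# so the program itself can stay inside single quotes.
names=$(printf '%s\n' "$input" | awk -F ':' -v min="$threshold" '$2 >= min { print $1 }')

# -f reads the program from a file instead of the command line.
prog=$(mktemp)
printf '%s\n' '{ print $2 }' > "$prog"
ids=$(printf '%s\n' "$input" | awk -F ':' -f "$prog")
rm -f "$prog"

printf '%s\n' "$names" "$ids"
```

Keeping the program in single quotes and passing values with -v avoids the shell expanding $1, $2 and similar inside a double-quoted program.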

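Common Built-in Variables

Data stream:
- $0: the whole line.
- $number: field number `number` of the current line.
- NF: number of fields in the current line; $NF is the usual way to reach the last column.
- NR: total number of lines read so far (counting from 1).
- FNR: number of lines read so far from the current file (counting from 1).
Separators:
- FS: input field separator (in fact, BEGIN { FS=":" } is equivalent to -F ':').
- OFS: output field separator, a single space by default.
- RS: input record (line) separator, \n by default.
- ORS: output record (line) separator, \n by default.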
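
The variables above in action, as a rough sketch. The input lines and the two temporary files are made up for illustration.

```shell
# Hypothetical colon-separated input with a different field count per line.
input=$(printf 'a:b:c\nd:e\n')

# NR is the line number, NF the field count, $NF the last field.
info=$(printf '%s\n' "$input" | awk -F ':' '{ print NR, NF, $NF }')

# OFS is applied when the line is rebuilt; assigning $1 = $1 forces that.
rejoined=$(printf '%s\n' "$input" | awk 'BEGIN { FS = ":"; OFS = "-" } { $1 = $1; print }')

# With two input files NR keeps counting while FNR restarts for each file.
f1=$(mktemp)
f2=$(mktemp)
printf 'x\ny\n' > "$f1"
printf 'z\n' > "$f2"
counts=$(awk '{ print NR, FNR }' "$f1" "$f2")
rm -f "$f1" "$f2"

printf '%s\n' "$info" "$rejoined" "$counts"
```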

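Common Actions

Printing:
- print: e.g. print $1, $2. The comma-separated items are joined with OFS, and a newline (ORS) is appended automatically.
- printf: formatted output, similar to C. Example: printf "Integer: %-3d String: %10s Float: %5.2f\n", $1, $3, $NF. A - left-aligns the field (right-aligned by default), and the number after the dot in a float format sets the number of decimal places.
Arithmetic: variables need no declaration.
- Common operators: +, -, *, /, %, ^ (power), ++, += and so on. Apart from ^ meaning exponentiation, they follow C (including prefix and postfix increment and decrement), except that all arithmetic is floating point, so 7 / 2 is 3.5.
Conditionals:
- Logical operators: ||, &&, !
- if (condition) { action } else if (condition) { action } ... else { action }
- Ternary operator: condition ? a : b
Associative arrays: similar to a Python dict.
- array[key] = value
- Often used for deduplication and counting.

Example: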

{ count[$1]++ } # hash table: count how many times each distinct value appears in column 1
END {
    for (ip in count) # iterate over the keys
        print ip, "access", count[ip], "times"
}
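
To see the counting example run, here is a sketch on an invented access log, followed by a one-liner that exercises printf and the ternary operator. The order of a for-in loop is unspecified in awk, so the output is piped through sort.

```shell
# Hypothetical access log: client address, then requested path.
log=$(printf '10.0.0.1 /index\n10.0.0.2 /about\n10.0.0.1 /login\n')

summary=$(printf '%s\n' "$log" | awk '
{ count[$1]++ }
END {
    for (ip in count)
        print ip, "access", count[ip], "times"
}' | sort)

# printf with a left-aligned integer, a two-decimal float, and a ternary.
formatted=$(printf '7 3.14159\n' | awk '{ printf "%-3d|%5.2f|%s\n", $1, $2, ($1 > 5 ? "big" : "small") }')

printf '%s\n' "$summary" "$formatted"
```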

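Built-in functions:
- Strings:
  - length(string)
  - substr(string, start, length): extract a substring; note that characters are counted from 1.
  - tolower(string), toupper(string)
  - gsub(regex, replacement, target): similar to sed 's/regex/replacement/g ...'; modifies target in place ($0 if target is omitted).
- Math:
  - int(): truncate to an integer (awk does arithmetic in double-precision floating point by default).
  - rand(): return a random number r with 0 <= r < 1.
- Shell calls: system("command") runs a shell command.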
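
A brief sketch of several of these functions on a made-up input line. rand() is left out because its output is not reproducible.

```shell
# String functions applied to a hypothetical line of text.
strings=$(printf 'Hello World\n' | awk '{
    n = length($0)                 # number of characters in the line
    first5 = substr($0, 1, 5)      # five characters starting at position 1
    upper = toupper($1)            # first field in upper case
    changed = gsub(/o/, "0")       # edits $0 in place, returns the replacement count
    print n, first5, upper, changed, $0
}')

# int() truncates toward zero.
ints=$(awk 'BEGIN { print int(7 / 2), int(-3.7) }')

# system() runs a shell command and returns its exit status.
rc=$(awk 'BEGIN { print system("true") }')

printf '%s\n' "$strings" "$ints" "$rc"
```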

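In fact, as a Turing-complete language, awk can do far more than what is listed above! Still, for your own sake and everyone else's, reach for a scripting language such as Python when the task gets too complex!

A slightly more complex example: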

Apple  10
Banana 5
Apple  20
Cherry 30

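#!/usr/bin/awk -f

BEGIN {
    print "----------------------"
}

$1 == "Apple" { # Only lines whose first column is Apple enter this block.
    count++
    sum += $2
    print "Found a record: " $2 # No comma between the items, so they are concatenated into one string.
}

END {
    printf "Found %d records, total amount: %d\n", count, sum
}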
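
One way to run the script against the sample data, as a sketch: both are written to temporary files (the file handling is an assumption for illustration) and the script is loaded with -f. The script here is a condensed copy with English messages.

```shell
data=$(mktemp)
script=$(mktemp)

# The four sample lines from above.
printf 'Apple  10\nBanana 5\nApple  20\nCherry 30\n' > "$data"

# A condensed copy of the script; the quoted heredoc keeps $1 and $2 literal.
cat > "$script" <<'EOF'
BEGIN { print "----------------------" }
$1 == "Apple" {
    count++
    sum += $2
    print "Found a record: " $2
}
END { printf "Found %d records, total amount: %d\n", count, sum }
EOF

output=$(awk -f "$script" "$data")
rm -f "$data" "$script"

printf '%s\n' "$output"
```

The two Apple lines are matched, so the final line reports 2 records with a total of 30.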
