sh-1.md - sebastiano.tronto.net - Source files and build scripts for my personal website

1 # The man page reading club: sh(1) - part 1: shell grammar
2
3 *This post is part of a [series](../../series)*
4
5 After [last time's short entry](../2022-07-07-shutdown) and a
6 relatively long hiatus, we are back in business with a big one!
7
8 ## A new day
9
10 *After a good night of sleep and a cup of whatever people call
11 coffee in the post-apocalypse, you turn your computer back on. You
12 would like to learn more stuff, but you are unsure where to start
13 from. You vaguely remember a `man afterboot` being mentioned
14 somewhere, so you start from there.*
15
16 ```
17 DESCRIPTION
18 Starting out
19 This document attempts to list items for the system administrator to
20 check and set up after the installation and first complete boot of the
21 system. The idea is to create a list of items that can be checked off so
22 that you have a warm fuzzy feeling that something obvious has not been
23 missed. A basic knowledge of UNIX is assumed, otherwise type:
24
25 $ help
26 ```
27
28 *You do have some knowledge of UNIX, someone might call it "basic",
29 but you believe "scattered" is a more appropriate adjective. In any
30 case, a review won't hurt. You type the command*
31
32 ```
33 $ help
34 ```
35
36 *And a manual page shows up. You could have typed `man help` instead
37 to get the same result. After skimming through the introduction,
38 you discover something worth digging into.*
39
40 ```
41 The Unix shell
42 After logging in, some system messages are typically displayed, and then
43 the user is able to enter commands to be processed by the shell program.
44 The shell is a command-line interpreter that reads user input (normally
45 from a terminal) and executes commands. There are many different shells
46 available; OpenBSD ships with csh(1), ksh(1), and sh(1). Each user's
47 shell is indicated by the last field of their corresponding entry in the
48 system password file (/etc/passwd).
49 ```
50
51 *You have a look at `/etc/passwd` and you see that your user's shell
52 is `ksh`. So you type `man ksh` and start reading.*
53
54 ```
55 DESCRIPTION
56 ksh is a command interpreter intended for both interactive and shell
57 script use. Its command language is a superset of the sh(1) shell
58 language.
59 ```
60
61 *You are quite rusty on the Math jargon - some of your friends used
62 to talk like that in real life, but you never bothered to learn -
63 but "superset" sounds like "it is larger than". Is this another
64 [`less` vs `more`](../2022-06-08-more) kind of thing, where one
65 command is just a simpler version of the other? Let's see what
66 `sh(1)` has to say about it*
67
68 ```
69 This version of sh is actually ksh in disguise.
70 ```
71
72 *Ah-ah! Exactly as you thought. Just like the other time, you prefer
73 to go with the simpler version. Enough of this "fun is precious"
74 bullshit, you want to learn as soon as possible!*
75
76 ## sh(1)
77
78 *Follow along at [man.openbsd.org](https://man.openbsd.org/OpenBSD-7.1/sh)*
79
80 Despite having fewer features than more complex shells like `ksh`
81 or `bash`, the manual page for `sh` is still very long. So we are
82 going to split it into two or more parts.
83
84 The main sections I intend to cover are BUILTINS, SHELL GRAMMAR and
85 COMMANDS. Parts of SPECIAL PARAMETERS and ENVIRONMENT are quoted
86 and explained in other sections, so I am probably going to skip
87 these. I think we can also skip the invocation options, since we
88 are mostly going to run our shell implicitly when logging in or
89 when executing a script. Finally, COMMAND HISTORY AND COMMAND LINE
90 EDITING is best explained after we cover `vi(1)`, so we'll skip
91 that too. This still leaves us with a big chunk of the man page to
92 discuss.
93
94 A technical manual page is not a novel: the content is often laid
95 out in an arbitrary order, to make it easier to find what you are
96 looking for (e.g. in alphabetic order) and not to make a top-to-bottom
97 read entertaining. So I felt like reordering things a bit: not only
98 will I cover the sections in a different order than what you find in
99 the manual page, but I will also shuffle the content of each section
100 when it makes sense to me.
101
102 Since I am very much a theoretical, grammar-first kind of person,
103 my totally subjective best way to dive into this is starting with
104 the grammar section!
105
106 ## Part 1: shell grammar
107
108 After reading the input, either from a file or from the standard
109 input, `sh` does the following:
110
111 1. It breaks the input into words and operators (special characters).
112 2. It expands the text according to the rules in the **Expansion** section below.
113 3. It splits the text into commands and arguments.
114 4. It performs input / output redirection (see the **Redirection** section below).
115 5. It runs the commands.
116 6. It waits for the commands to complete and collects the exit status.
117
118 The next three sub-sections (Redirection, Expansion and Quoting) are found
119 in the exact opposite order in the manual page.
120
121 ### Redirection
122
123 Together with *piping*, which we will cover in one of the next episodes,
124 redirection is one of the key features of UNIX.
125
126 ```
127 Redirection is used to open, close, or otherwise manipulate files, using
128 redirection operators in combination with numerical file descriptors. A
129 minimum of ten (0-9) descriptors are supported; by convention standard
130 input is file descriptor 0, standard output file descriptor 1, and
131 standard error file descriptor 2.
132 ```
133
134 If the number `[n]` is not specified, it defaults to either `0`
135 (standard input) or `1` (standard output), depending on whether the
136 angle bracket points to the left or to the right.
137
138 The main redirectors are `[n]<file`, to read input from `file`
139 instead of typing it in manually, and its counterpart `[n]>file`
140 to write standard output (or whatever is described by the file
141 descriptor `[n]`) to file. For example, if you want to log every
142 error message of `command` to `file.log`, you can use
143
144 ```
145 $ command 2>file.log
146 ```
147
148 The `[n]>>file` redirector is similar, but it appends stuff to
149 `file` instead of overwriting it. Both `>` and `>>` create the file
150 if it does not exist.
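
Here is a small sketch of `>`, `>>`, `2>` and `<` in action. The
temporary directory from `mktemp -d` and the file names in it are my
own choice for the example, not something the man page prescribes.

```shell
# Work in a throwaway directory so that nothing real gets overwritten.
# (Remove it with rm -r "$dir" when you are done.)
dir=$(mktemp -d)
log="$dir/file.log"

echo "first"  > "$log"     # creates file.log
echo "second" > "$log"     # overwrites it: only "second" remains
echo "third" >> "$log"     # appends a second line

# ls complains about the missing file on standard error (descriptor 2);
# "|| true" just ignores the failing exit status.
ls "$dir/does-not-exist" 2> "$dir/errors.log" || true

lines=$(wc -l < "$log")    # [n]<file with n defaulting to 0
first_line=$(head -n 1 "$log")
echo "$lines lines, the first one is: $first_line"
```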
151
152 There is also `[n]<<`:
153
154 ```
155 [n]<< This form of redirection, called a here document, is used to copy
156 a block of lines to a temporary file until a line matching
157 delimiter is read. When the command is executed, standard input
158 is redirected from the temporary file to file descriptor n, or
159 standard input by default.
160 ```
161
162 For example
163
164 ```
165 $ cat <<BYEBYE
166 > one line,
167 > another line
168 > and so on
169 > BYEBYE
170 ```
171
172 This outputs those three lines. It is useful in shell scripts, when you
173 want to output a block of text. The variant `[n]<<-` strips leading
174 `Tab` characters from each line.
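
In a script the same thing looks like the sketch below. One detail
not quoted above: as long as the delimiter is written without quotes,
parameters inside the here document are expanded. The variable names
are just placeholders.

```shell
name=world

# The here document becomes cat's standard input; $(...) captures
# whatever cat prints.
text=$(cat <<BYEBYE
hello, $name
another line
BYEBYE
)

# grep -c '' counts the lines it receives.
count=$(printf '%s\n' "$text" | grep -c '')
echo "$count lines: $text"
```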
175
176 Another useful one is `[n]>&fd`, which "merges" the file descriptors
177 `[n]` and `fd`. For example, if you want to make your command
178 completely silent, you can merge standard output and standard error
179 and redirect them both to `/dev/null` with
180
181 ```
182 $ command >/dev/null 2>&1
183 ```
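
A detail that is easy to get wrong: redirections are processed from
left to right, so the order matters. The sketch below uses a made-up
helper function, `noisy`, that writes one line to each stream.

```shell
# Hypothetical helper: one line on standard output, one on standard error.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Descriptor 1 goes to /dev/null first, then 2 is pointed at the same place.
silent=$(noisy >/dev/null 2>&1)

# Descriptor 2 is pointed at the *current* standard output (the capture),
# and only afterwards is 1 sent to /dev/null.
only_err=$(noisy 2>&1 >/dev/null)

echo "silent=[$silent] only_err=[$only_err]"
```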
184
185 ### Expansion
186
187 There are essentially five kinds of expansion that the shell performs:
188 tilde expansion, parameter expansion, command expansion, arithmetic
189 expansion and filename expansion.
190
191 **Tilde expansion** is quite straightforward, so let's just quote
192 the man page:
193
194 ```
195 Firstly, tilde expansion occurs on words beginning with the `~'
196 character. Any characters following the tilde, up to the next colon,
197 slash, or blank, are taken as a login name and substituted with that
198 user's home directory, as defined in passwd(5). A tilde by itself is
199 expanded to the contents of the variable HOME. This notation can be used
200 in variable assignments, in the assignment half, immediately after the
201 equals sign or a colon, up to the next slash or colon, if any.
202
203 PATH=~alice:~bob/jobs
204 ```
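
A quick sketch of the cases above, assuming `HOME` is set in your
environment. The directory names `bin` and `jobs` are only examples.

```shell
home=~                  # a tilde by itself expands to the contents of HOME
dirs=~/bin:~/jobs       # in an assignment, also expanded after each colon
literal="~"             # inside quotes the tilde is just a character

echo "$home"
echo "$dirs"
echo "$literal"
```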
205
206 **Parameters** can be variable names or special parameters. Variables
207 can be assigned with the simple syntax `variable=value` and their
208 value can be "accessed" with `$variable`. In case of ambiguity you
209 need to enclose the variable name in curly braces `{}`: say you
210 want to type the string `subroutines` and you have a variable
211 `prefix=sub`. The shell reads `$prefixroutines` as a variable named
212 `prefixroutines`, which is unset and expands to nothing, so you have
213 to use `${prefix}routines`.
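
A minimal sketch of the ambiguity, assuming the `-u` option is not in
effect (without it, an unset name simply expands to an empty string):

```shell
prefix=sub

wrong="$prefixroutines"      # looks up a variable called "prefixroutines"
right="${prefix}routines"    # the braces mark where the name ends

echo "wrong=[$wrong] right=[$right]"
```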
214
215 The most useful special parameters are:
216
217 * Numbers `1`, `2`, `3`... that refer to the *positional parameters*:
218
219 ```
220 These parameters are set when a shell, shell script, or shell function is
221 invoked. Each argument passed to a shell or shell script is assigned a
222 positional parameter, starting at 1, and assigned sequentially.
223 ```
224
225 * The number `0`, which refers to the name of the shell or of the shell
226 script being executed.
227 * The symbols `@` and `*` which expand to all positional parameters
228 at once; they behave differently when enclosed in double quotes:
229 with `"$@"` each parameter becomes its own field, with `"$*"` they are joined into one.
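
To see the difference between `"$@"` and `"$*"`, here is a sketch
with a made-up helper, `count_args`, that prints how many arguments
it received. I use `set --` to fake some positional parameters.

```shell
# Hypothetical helper: reports the number of arguments it was given.
count_args() { echo "$#"; }

set -- "how many" words "are there"    # three positional parameters

with_at=$(count_args "$@")      # each parameter stays its own field
with_star=$(count_args "$*")    # everything is joined into a single field
unquoted=$(count_args $*)       # no quotes: split again on blanks

echo "$with_at $with_star $unquoted"
```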
230
231 There are some useful constructs to expand a parameter in special
232 ways. The constructs `${parameter:-[word]}` and `${parameter:=[word]}`
233 expand to `[word]` if `parameter` is unset or empty, with the second
234 one also assigning the value `[word]` to `parameter` for subsequent
235 use. Instead, `${parameter:+[word]}` expands to `[word]` *unless*
236 `parameter` is unset or empty, in which case it expands to the empty
237 string. If the colon is omitted, only the "unset" case is tested:
238 a `parameter` that is set but empty is treated like any other set value.
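
These rules are easier to digest with a few concrete values. In this
sketch `name`, `empty` and the word `guest` are arbitrary; the lone
`:` is the shell's do-nothing builtin, used here only so that the
`:=` expansion gets evaluated.

```shell
unset name
empty=""

a=${name:-guest}       # name is unset: expands to "guest", name stays unset
b=${empty:-guest}      # with the colon, empty counts as missing: "guest"
c=${empty-guest}       # without the colon only "unset" matters: ""
: "${name:=guest}"     # expands to "guest" and also assigns it to name
d=${name:+is set}      # name is now set and non-empty: "is set"
e=${empty:+is set}     # empty, with the colon: ""
f=${empty+is set}      # set (even if empty), without the colon: "is set"

echo "a=$a b=$b c=$c name=$name d=$d e=$e f=$f"
```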
239
240 Another useful one is `${#parameter}`, which expands to the length
241 of `parameter`. Finally there are some constructs that can be used
242 to remove prefixes or suffixes from the expansion of a parameter:
243
244 | Construct | Effect |
245 |:---:|:---:|
246 | `${parameter%[word]}` | Delete smallest possible suffix matching word |
247 | `${parameter%%[word]}` | Delete largest possible suffix matching word |
248 | `${parameter#[word]}` | Delete smallest possible prefix matching word |
249 | `${parameter##[word]}` | Delete largest possible prefix matching word |
250
251 What unfortunately is not explained in the man page of `sh(1)` (but
252 can be found in that of `ksh(1)`) is that `[word]` in this case can
253 be a *pattern*. See [glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7)
254 for a description of patterns, which are the same that are used for
255 filename expansion (with the exception that slashes and dots are
256 treated as normal characters).
257
258 For example, using `*` which means "any sequence of zero or more
259 characters":
260
261 ```
262 $ x="we can,separate,stuff,with commas"
263 $ echo ${x#*,}
264 separate,stuff,with commas
265 $ echo ${x##*,}
266 with commas
267 ```
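
The suffix variants work the same way. A classic use is taking a
path apart; the path below is made up for the example.

```shell
path=/home/alice/archive.tar.gz

base=${path##*/}     # largest prefix matching */   -> archive.tar.gz
dir=${path%/*}       # smallest suffix matching /*  -> /home/alice
stem=${base%%.*}     # largest suffix matching .*   -> archive
no_gz=${base%.*}     # smallest suffix matching .*  -> archive.tar
len=${#base}         # length of "archive.tar.gz"   -> 14

echo "$dir | $base | $stem | $no_gz | $len"
```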
268
269 Then there is **command expansion**:
270
271 ```
272 Command expansion has a command executed in a subshell and the results
273 output in its place. The basic format is:
274
275 $(command)
276 or
277 `command`
278
279 The results are subject to field splitting and pathname expansion; no
280 other form of expansion happens. If command is contained within double
281 quotes, field splitting does not happen either.
282 ```
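
The remark about field splitting is worth a sketch. Both helper
functions here are made up: `two_lines` prints two lines of two words
each, and `count_args` prints how many arguments it got.

```shell
# Hypothetical helpers for the demonstration.
two_lines()  { printf 'a b\nc d\n'; }
count_args() { echo "$#"; }

unquoted=$(count_args $(two_lines))    # split on blanks and newlines: 4 fields
quoted=$(count_args "$(two_lines)")    # double quotes: a single field
old_style=`echo nested`                # the backquote form does the same job

echo "$unquoted $quoted $old_style"
```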
283
284 **Arithmetic expansion** uses the syntax `$((expression))`. An
285 `expression` can be a combination of integers (no floating point
286 arithmetic in the shell!), parameter names and the usual arithmetic
287 operations. I won't copy them here; if you are familiar with C or
288 C-like languages, you can use pretty much all the operations you
289 are used to, including logic operations (resulting in 0 or 1),
290 assignment operations like `+=` and bitwise operations like `~`,
291 `&` and `<<`. Even the *ternary if* `expression ? expr1 : expr2`
292 is available.
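
A few of these operations in a row, as a sketch (the variable `x` and
the numbers are arbitrary):

```shell
x=7

sum=$((x + 3))                      # 10
quot=$((x / 2))                     # integer division: 3
rem=$((x % 2))                      # 1
cmp=$((x > 5))                      # logic operations give 0 or 1: here 1
shifted=$((1 << 4))                 # 16
masked=$((x & 3))                   # 7 is 111 in binary, 3 is 011: 3
inverted=$((~0))                    # all bits flipped: -1
pick=$((x % 2 == 0 ? 100 : 200))    # ternary if: 7 is odd, so 200
: $((x += 1))                       # assignment inside the expression: x is 8

echo "$sum $quot $rem $cmp $shifted $masked $inverted $pick $x"
```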
293
294 Finally, **filename expansion** uses the aforementioned rules of
295 [glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7) to expand
296 filenames. To sum them up:
297
298 * As we have already seen, `*` matches any sequence of characters.
299 * `?` matches any single character.
300 * `[..]` matches any one of the characters listed in place of the double dot, or any
301 character *not* listed if the first is an exclamation mark.
302 * `[[:class:]]` matches any character of a certain class; for example
303 `class` could be `alnum` for alphanumeric characters or `upper` for
304 uppercase letters.
305 * `[x-y]` matches any character in the range between `x` and `y`.
306
307 To illustrate what all of this means, check this out (the command `ls` is
308 used to list all files in the current directory):
309
310 ```
311 $ ls
312 box file3 mbox typescript
313 count_args.sh file4 mnt videos
314 file1 git music
315 file2 mail phone-laptop-swap
316 $ echo m*
317 mail mbox mnt music
318 $ echo m???
319 mail mbox
320 $ echo file[2-4]
321 file2 file3 file4
322 ```
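
If you want to try the remaining patterns without cluttering your
home directory, here is a sketch that sets up a scratch directory
first. The file names, including `Notes`, are chosen just for this
example.

```shell
# Create a scratch directory with a known set of files.
# (Remove it with rm -r "$dir" when you are done.)
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1 file2 file3 file4 mail mbox mnt music Notes

m_all=$(echo m*)               # mail mbox mnt music
m_four=$(echo m???)            # mail mbox
range=$(echo file[2-4])        # file2 file3 file4
negated=$(echo file[!1-2])     # file3 file4
upper=$(echo [[:upper:]]*)     # Notes

echo "$negated"
echo "$upper"
```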
323
324 ### Quoting
325
326 Sometimes we may want to write some of the special characters
327 described above, such as dollar signs, without their special meaning.
328 You can do so by *escaping*, or *quoting* them. There are essentially
329 three ways to quote a character or a group of characters:
330
331 * Backslash:
332
333 ```
334 A backslash (\) can be used to quote any character except a newline. If
335 a newline follows a backslash the shell removes them both, effectively
336 making the following line part of the current one.
337 ```
338
339 This means that a backslash can also effectively be used to split
340 long lines into multiple lines, for example for ease of editing a
341 shell script.
342
343 * Single quotes:
344
345 ```
346 A group of characters can be enclosed within single quotes (') to quote
347 every character within the quotes.
348 ```
349
350 * And double quotes:
351
352 ```
353 A group of characters can be enclosed within double quotes (") to quote
354 every character within the quotes except a backquote (`) or a dollar sign
355 ($), both of which retain their special meaning. A backslash (\) within
356 double quotes retains its special meaning, but only when followed by a
357 backquote, dollar sign, double quote, newline, or another backslash. An
358 at sign (@) within double quotes has a special meaning (see SPECIAL
359 PARAMETERS, below).
360 ```
361
362 Basically the difference between single and double quotes is that
363 the former turn literally everything they enclose into simple text,
364 while the latter still parse and expand some special characters
365 (for example the dollar sign `$` for variables).
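
The three styles side by side, as a sketch (the variable `user` and
the strings are invented for the example):

```shell
user=alice

single='$user has $5'      # everything is literal
double="$user has \$5"     # $user is expanded, \$ stays a dollar sign
bare=\$user                # a backslash quotes just the next character
joined="one \
line"                      # backslash + newline vanish: one line

echo "$single"
echo "$double"
echo "$bare"
echo "$joined"
```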
366
367 In addition, remember that anything enclosed in single or double
368 quotes is considered a single field (word). This was briefly mentioned
369 in the Expansion section, but I skipped it. To illustrate what I
370 mean, let's write a short script and run it first with some words
371 as arguments and then with the same words enclosed in quotes:
372
373 ```
374 $ echo 'echo $#' > count_args.sh
375 $ sh count_args.sh how many words are there
376 5
377 $ sh count_args.sh "how many words are there"
378 1
379 ```
380
381 ## Until next time
382
383 This was a very long post, but it made sense to keep all the grammar
384 rules together. To finish this manual page we are going to need
385 another long post, or two shorter ones.
386
387 See you next time!
388
389 *Next in the series: [sh(1) - part 2: commands and builtins](../2022-09-20-sh-2)*
