# The man page reading club: sh(1) - part 1: shell grammar

*This post is part of a [series](../../series)*

After [last time's short entry](../2022-07-07-shutdown) and a
relatively long hiatus, we are back in business with a big one!

## A new day

*After a good night of sleep and a cup of whatever people call
coffee in the post-apocalypse, you turn your computer back on. You
would like to learn more stuff, but you are unsure where to start.
You vaguely remember a `man afterboot` being mentioned
somewhere, so you start from there.*

```
DESCRIPTION
Starting out
This document attempts to list items for the system administrator to
check and set up after the installation and first complete boot of the
system. The idea is to create a list of items that can be checked off so
that you have a warm fuzzy feeling that something obvious has not been
missed. A basic knowledge of UNIX is assumed, otherwise type:

$ help
```

*You do have some knowledge of UNIX, someone might call it "basic",
but you believe "scattered" is a more appropriate adjective. In any
case, a review won't hurt. You type the command*

```
$ help
```

*And a manual page shows up. You could have typed `man help` instead
to get the same result. After skimming through the introduction,
you discover something worth digging into.*

```
The Unix shell
After logging in, some system messages are typically displayed, and then
the user is able to enter commands to be processed by the shell program.
The shell is a command-line interpreter that reads user input (normally
from a terminal) and executes commands. There are many different shells
available; OpenBSD ships with csh(1), ksh(1), and sh(1). Each user's
shell is indicated by the last field of their corresponding entry in the
system password file (/etc/passwd).
```

*You have a look at `/etc/passwd` and you see that your user's shell
is `ksh`. So you type `man ksh` and start reading.*

```
DESCRIPTION
ksh is a command interpreter intended for both interactive and shell
script use. Its command language is a superset of the sh(1) shell
language.
```

*You are quite rusty on the math jargon - some of your friends used
to talk like that in real life, but you never bothered to learn -
but "superset" sounds like "it is larger than". Is this another
[`less` vs `more`](../2022-06-08-more) kind of thing, where one
command is just a simpler version of the other? Let's see what
`sh(1)` has to say about it:*

```
This version of sh is actually ksh in disguise.
```

*Ah-ha! Exactly as you thought. Just like the other time, you prefer
to go with the simpler version. Enough of this "fun is precious"
bullshit, you want to learn as soon as possible!*

## sh(1)

*Follow along at [man.openbsd.org](https://man.openbsd.org/OpenBSD-7.1/sh)*

Despite having fewer features than more complex shells like `ksh`
or `bash`, the manual page for `sh` is still very long. So we are
going to split it into two or more parts.

The main sections I intend to cover are BUILTINS, SHELL GRAMMAR and
COMMANDS. Parts of SPECIAL PARAMETERS and ENVIRONMENT are quoted
and explained in other sections, so I am probably going to skip
these. I think we can also skip the invocation options, since we
are mostly going to run our shell implicitly when logging in or
when executing a script. Finally, COMMAND HISTORY AND COMMAND LINE
EDITING is best explained after we cover `vi(1)`, so we'll skip
that too. This still leaves us with a big chunk of the man page to
discuss.

A technical manual page is not a novel: the content is often laid
out in an arbitrary order, to make it easier to find what you are
looking for (e.g.
in alphabetical order) and not to make a top-to-bottom
read entertaining. So I felt like reordering things a bit: not only
will I cover the sections in a different order than what you find in
the manual page, but I will also shuffle the content of each section
when it makes sense to me.

Since I am very much a theoretical, grammar-first kind of person,
my totally subjective best way to dive into this is starting with
the grammar section!

## Part 1: shell grammar

After reading the input, either from a file or from the standard
input, `sh` does the following:

1. It breaks the input into words and operators (special characters).
2. It expands the text according to the rules in the **Expansion** section below.
3. It splits the text into commands and arguments.
4. It performs input / output redirection (see the **Redirection** section below).
5. It runs the commands.
6. It waits for the commands to complete and collects the exit status.

The next three sub-sections (Redirection, Expansion and Quoting) are found
in the exact opposite order in the manual page.

### Redirection

Together with *piping*, which we will cover in one of the next episodes,
redirection is one of the key features of UNIX.

```
Redirection is used to open, close, or otherwise manipulate files, using
redirection operators in combination with numerical file descriptors. A
minimum of ten (0-9) descriptors are supported; by convention standard
input is file descriptor 0, standard output file descriptor 1, and
standard error file descriptor 2.
```

If the number `[n]` is not specified, it defaults to either `0`
(standard input) or `1` (standard output), depending on whether the
angle bracket points to the left (`<`) or to the right (`>`).
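
To see these defaults in action, here is a small sketch. It assumes
`mktemp` is available (it is on OpenBSD and on most other systems), and
the variable names are made up for illustration. Each redirection is
written twice, once with the descriptor left out and once spelled out:

```sh
#!/bin/sh
# Scratch file for the experiment (assumption: mktemp is installed).
tmp=$(mktemp)

# No number before '>': file descriptor 1 (standard output) is implied.
echo "hello" >"$tmp"
# The same descriptor written explicitly; '>>' appends instead of overwriting.
echo "world" 1>>"$tmp"

# No number before '<': file descriptor 0 (standard input) is implied.
read -r implicit <"$tmp"
# The same descriptor written explicitly.
read -r explicit 0<"$tmp"

# Keep a copy of the whole file before cleaning up.
contents=$(cat "$tmp")
rm -f "$tmp"

echo "implicit: $implicit, explicit: $explicit"
```

Both `read` commands pick up the same first line (`hello`), because
`<file` and `0<file` are the same redirection, just like `>file` and
`1>file`.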

The main redirectors are `[n]<file`, to read input from `file`
instead of typing it in manually, and its counterpart `[n]>file`,
to write standard output (or whatever is described by the file
descriptor `[n]`) to `file`. For example, if you want to log every
error message of `command` to `file.log`, you can use

```
$ command 2>file.log
```

The `[n]>>file` redirector is similar, but it appends to
`file` instead of overwriting it. Both `>` and `>>` create the file
if it does not exist.

There is also `[n]<<`:

```
[n]<< This form of redirection, called a here document, is used to copy
a block of lines to a temporary file until a line matching
delimiter is read. When the command is executed, standard input
is redirected from the temporary file to file descriptor n, or
standard input by default.
```

For example

```
$ cat <<BYEBYE
> one line,
> another line
> and so on
> BYEBYE
```

prints those three lines (the `>` at the start of each line is the
shell's continuation prompt, not something you type). It is useful in
shell scripts, when you want to output a block of text. The variant
`[n]<<-` strips leading `Tab` characters from each line.

Another useful one is `[n]>&fd`, which makes `[n]` a copy of `fd`,
in effect "merging" the two file descriptors. For example, if you
want to make your command completely silent, you can merge standard
output and standard error and redirect them both to `/dev/null` with

```
$ command >/dev/null 2>&1
```

The order matters here: redirections are processed from left to
right, so `2>&1` must come after `>/dev/null`.

### Expansion

There are essentially five kinds of expansion that the shell performs:
tilde expansion, parameter expansion, command expansion, arithmetic
expansion and filename expansion.

**Tilde expansion** is quite straightforward, so let's just quote
the man page:

```
Firstly, tilde expansion occurs on words beginning with the `~'
character.
Any characters following the tilde, up to the next colon,
slash, or blank, are taken as a login name and substituted with that
user's home directory, as defined in passwd(5). A tilde by itself is
expanded to the contents of the variable HOME. This notation can be used
in variable assignments, in the assignment half, immediately after the
equals sign or a colon, up to the next slash or colon, if any.

PATH=~alice:~bob/jobs
```

**Parameters** can be variable names or special parameters. Variables
can be assigned with the simple syntax `variable=value` and their
value can be "accessed" with `$variable`. In case of ambiguity you
need to enclose the variable name in curly braces `{}`: say you
want to type the string `subroutines` and you have a variable
`prefix=sub`. If you write `$prefixroutines`, the shell looks for a
variable named `prefixroutines`, which is most likely unset and
expands to the empty string, so you have to use
`${prefix}routines`.

The most useful special parameters are:

* Numbers `1`, `2`, `3`... that refer to the *positional parameters*:

```
These parameters are set when a shell, shell script, or shell function is
invoked. Each argument passed to a shell or shell script is assigned a
positional parameter, starting at 1, and assigned sequentially.
```

* The number `0`, which refers to the name of the shell or of the shell
  script being executed.
* The symbols `@` and `*` which expand to all positional parameters
  at once; they behave differently when enclosed in double quotes:
  with `"$@"` each parameter becomes a separate field, with `"$*"`
  they are all joined into a single field.

There are some useful constructs to expand a parameter in special
ways. The constructs `${parameter:-[word]}` and `${parameter:=[word]}`
expand to `[word]` if `parameter` is unset or empty, with the second
one also assigning the value `[word]` to `parameter` for subsequent
use.
Instead, `${parameter:+[word]}` expands to `[word]` *unless*
`parameter` is unset or empty, in which case it expands to the empty
string. In all these cases, if the colon is omitted the shell only
checks whether `parameter` is unset: a parameter that is set but
empty counts as set.

Another useful one is `${#parameter}`, which expands to the length
of `parameter`. Finally there are some constructs that can be used
to remove prefixes or suffixes from the expansion of a parameter:

| Construct | Effect |
|:---:|:---:|
| `${parameter%[word]}` | Delete smallest possible suffix matching word |
| `${parameter%%[word]}` | Delete largest possible suffix matching word |
| `${parameter#[word]}` | Delete smallest possible prefix matching word |
| `${parameter##[word]}` | Delete largest possible prefix matching word |

What unfortunately is not explained in the man page of `sh(1)` (but
can be found in that of `ksh(1)`) is that `[word]` in this case can
be a *pattern*. See [glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7)
for a description of patterns, which are the same that are used for
filename expansion (with the exception that slashes and dots are
treated as normal characters).

For example, using `*` which means "any sequence of zero or more
characters":

```
$ x="we can,separate,stuff,with commas"
$ echo ${x#*,}
separate,stuff,with commas
$ echo ${x##*,}
with commas
```

Then there is **command expansion**:

```
Command expansion has a command executed in a subshell and the results
output in its place. The basic format is:

$(command)
or
`command`

The results are subject to field splitting and pathname expansion; no
other form of expansion happens. If command is contained within double
quotes, field splitting does not happen either.
```

**Arithmetic expansion** uses the syntax `$((expression))`.
An
`expression` can be a combination of integers (no floating point
arithmetic in the shell!), parameter names and the usual arithmetic
operations. I won't copy them here; if you are familiar with C or
C-like languages, you can use pretty much all the operations you
are used to, including logical operations (resulting in 0 or 1),
assignment operations like `+=` and bitwise operations like `~`,
`&` and `<<`. Even the *ternary if* `expression ? expr1 : expr2`
is available.

Finally, **filename expansion** uses the aforementioned rules of
[glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7) to expand
filenames. To sum them up:

* As we have already seen, `*` matches any sequence of characters.
* `?` matches any single character.
* `[..]` matches any one of the characters listed in place of the
  double dot, or any character *not* listed if the first one is an
  exclamation mark.
* `[[:class:]]` matches any character of a certain class; for example
  `class` could be `alnum` for alphanumeric characters or `upper` for
  uppercase letters.
* `[x-y]` matches any character in the range between `x` and `y`.

To illustrate what all of this means, check this out (the command `ls` is
used to list all files in the current directory):

```
$ ls
box             file3           mbox                typescript
count_args.sh   file4           mnt                 videos
file1           git             music
file2           mail            phone-laptop-swap
$ echo m*
mail mbox mnt music
$ echo m???
mail mbox
$ echo file[2-4]
file2 file3 file4
```

### Quoting

Sometimes we may want to write some of the special characters
described above, such as dollar signs, without their special meaning.
You can do so by *escaping*, or *quoting* them. There are essentially
three ways to quote a character or a group of characters:

* Backslash:

```
A backslash (\) can be used to quote any character except a newline.
If
a newline follows a backslash the shell removes them both, effectively
making the following line part of the current one.
```

This means that a backslash can also effectively be used to split
long lines into multiple lines, for example for ease of editing a
shell script.

* Single quotes:

```
A group of characters can be enclosed within single quotes (') to quote
every character within the quotes.
```

* And double quotes:

```
A group of characters can be enclosed within double quotes (") to quote
every character within the quotes except a backquote (`) or a dollar sign
($), both of which retain their special meaning. A backslash (\) within
double quotes retains its special meaning, but only when followed by a
backquote, dollar sign, double quote, newline, or another backslash. An
at sign (@) within double quotes has a special meaning (see SPECIAL
PARAMETERS, below).
```

Basically the difference between single and double quotes is that
the former turn literally everything they enclose into simple text,
while the latter still parse and expand some special characters
(for example the dollar sign `$` for variables).

In addition, remember that anything enclosed in single or double
quotes is considered a single field (word). This was briefly mentioned
in the Expansion section, but I skipped it. To illustrate what I
mean, let's write a short script and run it first with some words
as arguments and then with the same words enclosed in quotes:

```
$ echo 'echo $#' > count_args.sh
$ sh count_args.sh how many words are there
5
$ sh count_args.sh "how many words are there"
1
```

## Until next time

This was a very long post, but it made sense to keep all the grammar
rules together. To finish this manual page we are going to need
another long post, or two shorter ones.

See you next time!

*Next in the series: [sh(1) - part 2: commands and builtins](../2022-09-20-sh-2)*