sebastiano.tronto.net

# The man page reading club: sh(1) - part 1: shell grammar

*This post is part of a [series](../../series)*

After [last time's short entry](../2022-07-07-shutdown) and a
relatively long hiatus, we are back in business with a big one!

## A new day

*After a good night of sleep and a cup of whatever people call
coffee in the post-apocalypse, you turn your computer back on. You
would like to learn more stuff, but you are unsure where to start
from. You vaguely remember a `man afterboot` being mentioned
somewhere, so you start from there.*

```
DESCRIPTION
   Starting out
     This document attempts to list items for the system administrator to
     check and set up after the installation and first complete boot of the
     system.  The idea is to create a list of items that can be checked off so
     that you have a warm fuzzy feeling that something obvious has not been
     missed.  A basic knowledge of UNIX is assumed, otherwise type:

	   $ help
```

*You do have some knowledge of UNIX, someone might call it "basic",
but you believe "scattered" is a more appropriate adjective. In any
case, a review won't hurt. You type the command*

```
$ help
```

*And a manual page shows up. You could have typed `man help` instead
to get the same result. After skimming through the introduction,
you discover something worth digging into.*

```
   The Unix shell
     After logging in, some system messages are typically displayed, and then
     the user is able to enter commands to be processed by the shell program.
     The shell is a command-line interpreter that reads user input (normally
     from a terminal) and executes commands.  There are many different shells
     available; OpenBSD ships with csh(1), ksh(1), and sh(1).  Each user's
     shell is indicated by the last field of their corresponding entry in the
     system password file (/etc/passwd).
```

*You have a look at `/etc/passwd` and you see that your user's shell
is `ksh`. So you type `man ksh` and start reading.*

```
DESCRIPTION
     ksh is a command interpreter intended for both interactive and shell
     script use.  Its command language is a superset of the sh(1) shell
     language.
```

*You are quite rusty on the Math jargon - some of your friends used
to talk like that in real life, but you never bothered to learn -
but "superset" sounds like "it is larger than". Is this another
[`less` vs `more`](../2022-06-08-more) kind of thing, where one
command is just a simpler version of the other? Let's see what
`sh(1)` has to say about it:*

```
     This version of sh is actually ksh in disguise.
```

*Ah-ah! Exactly as you thought. Just like the other time, you prefer
to go with the simpler version. Enough of this "fun is precious"
bullshit, you want to learn as soon as possible!*

## sh(1)

*Follow along at [man.openbsd.org](https://man.openbsd.org/OpenBSD-7.1/sh)*

Despite having fewer features than more complex shells like `ksh`
or `bash`, the manual page for `sh` is still very long. So we are
going to split it into two or more parts.

The main sections I intend to cover are BUILTINS, SHELL GRAMMAR and
COMMANDS.  Parts of SPECIAL PARAMETERS and ENVIRONMENT are quoted
and explained in other sections, so I am probably going to skip
these.  I think we can also skip the invocation options, since we
are mostly going to run our shell implicitly when logging in or
when executing a script. Finally, COMMAND HISTORY AND COMMAND LINE
EDITING is best explained after we cover `vi(1)`, so we'll skip
that too. This still leaves us with a big chunk of the man page to
discuss.

A technical manual page is not a novel: the content is often laid
out in an arbitrary order, to make it easier to find what you are
looking for (e.g. in alphabetic order) and not to make a top-to-bottom
read entertaining. So I felt like reordering things a bit: not only
will I cover the sections in a different order than the one you find
in the manual page, but I will also shuffle the content of each
section when it makes sense to me.

Since I am very much a theoretical, grammar-first kind of person,
my totally subjective best way to dive into this is starting with
the grammar section!

## Part 1: shell grammar

After reading the input, either from a file or from the standard
input, `sh` does the following:

1. It breaks the input into words and operators (special characters).
2. It expands the text according to the rules in the **Expansion** section below.
3. It splits the text into commands and arguments.
4. It performs input / output redirection (see the **Redirection** section below).
5. It runs the commands.
6. It waits for the commands to complete and collects the exit status.

The next three sub-sections (Redirection, Expansion and Quoting) are found
in the exact opposite order in the manual page.

### Redirection

Together with *piping*, which we will cover in one of the next episodes,
redirection is one of the key features of UNIX.

```
	Redirection is used to open, close, or otherwise manipulate files, using
	redirection operators in combination with numerical file descriptors.  A
	minimum of ten (0-9) descriptors are supported; by convention standard
	input is file descriptor 0, standard output file descriptor 1, and
	standard error file descriptor 2.
```

If the number `[n]` is not specified, it defaults to either `0`
(standard input) or `1` (standard output) depending on whether the
angle bracket points to the left or to the right.

The main redirectors are `[n]<file`, to read input from `file`
instead of typing it in manually, and its counterpart `[n]>file`
to write standard output (or whatever is described by the file
descriptor `[n]`) to `file`. For example, if you want to log every
error message of `command` to `file.log`, you can use

```
$ command 2>file.log
```

The `[n]>>file` redirector is similar, but it appends stuff to
`file` instead of overwriting it. Both `>` and `>>` create the file
if it does not exist.
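
To see the difference between `>`, `>>` and `2>>` in practice, here
is a small sketch. The scratch file comes from `mktemp` and `warn`
is a made-up helper that writes to standard error; both are my own
assumptions for this example and not something the `sh` manual page
talks about.

```sh
# Scratch file for the demonstration (mktemp is assumed to be available).
log=$(mktemp)

# Hypothetical helper that writes its message to file descriptor 2.
warn() { echo "warning: $1" >&2; }

echo "first" > "$log"     # > truncates the file, then writes
echo "second" >> "$log"   # >> appends to the existing content
echo "third" > "$log"     # > again: the previous two lines are gone
echo "fourth" >> "$log"

# The message travels on standard error, so only 2>> catches it.
warn "disk almost full" 2>> "$log"

first_line=$(head -n 1 "$log")
last_line=$(tail -n 1 "$log")
rm -f "$log"

echo "$first_line"
echo "$last_line"
```

The file ends up with three lines: `third`, `fourth` and the warning.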

There is also `[n]<<`:

```
[n]<<  This form of redirection, called a here document, is used to copy
       a block of lines to a temporary file until a line matching
       delimiter is read. When the command is executed, standard input
       is redirected from the temporary file to file descriptor n, or
       standard input by default.
```

For example

```
$ cat <<BYEBYE
> one line,
> another line
> and so on
> BYEBYE
```

This prints those three lines. Here documents are useful in shell
scripts, when you want to output a block of text. The variant
`[n]<<-` strips leading `Tab` characters from each line.
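
In a script there is no `>` prompt, the lines simply follow the
command. A minimal sketch (the delimiter names `BYEBYE` and `END`
are arbitrary, and I use `wc -l` only as an example of a command
that reads standard input):

```sh
# cat reads the here document from standard input and prints it back;
# the command substitution stores that output in a variable.
text=$(cat <<BYEBYE
one line,
another line
and so on
BYEBYE
)

# Any other command that reads standard input works the same way.
lines=$(wc -l <<END
one line,
another line
and so on
END
)

echo "$text"
echo "number of lines: $lines"
```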

Another useful one is `[n]>&fd`, which makes the file descriptor
`[n]` a copy of `fd`, so that both write to the same place.
Redirections are processed from left to right, so their order
matters. For example, if you want to make your command completely
silent, you can first send standard output to `/dev/null` and then
point standard error to the same place with

```
$ command >/dev/null 2>&1
```
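
Since redirections are processed from left to right, swapping two
of them can change the result. Here is a sketch to convince
yourself; `noisy` is a made-up function that writes one line to each
stream, and `$(...)` (command expansion, described further down in
this post) is used only to capture whatever still reaches standard
output.

```sh
# Hypothetical command that writes one line to each stream.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdout goes to /dev/null first, then stderr is pointed to the same
# place: nothing is left to capture.
silent=$(noisy >/dev/null 2>&1)

# Reversed order: stderr is pointed to wherever stdout goes at that
# moment (the command substitution), and only then is stdout sent
# to /dev/null.
leaky=$(noisy 2>&1 >/dev/null)

echo "silent: [$silent]"
echo "leaky:  [$leaky]"
```

The first variable is empty, the second one contains `to stderr`.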

### Expansion

There are essentially five kinds of expansion that the shell performs:
tilde expansion, parameter expansion, command expansion, arithmetic
expansion and filename expansion.

**Tilde expansion** is quite straightforward, so let's just quote
the man page:

```
     Firstly, tilde expansion occurs on words beginning with the `~'
     character.  Any characters following the tilde, up to the next colon,
     slash, or blank, are taken as a login name and substituted with that
     user's home directory, as defined in passwd(5).  A tilde by itself is
     expanded to the contents of the variable HOME.  This notation can be used
     in variable assignments, in the assignment half, immediately after the
     equals sign or a colon, up to the next slash or colon, if any.

	   PATH=~alice:~bob/jobs
```

**Parameters** can be variable names or special parameters. Variables
can be assigned with the simple syntax `variable=value` and their
value can be "accessed" with `$variable`. In case of ambiguity you
need to enclose the variable name in curly braces `{}`: say you
want to type the string `subroutines` and you have a variable
`prefix=sub`. The shell reads `$prefixroutines` as a reference to
a variable called `prefixroutines`, which is most likely unset and
expands to the empty string, so you have to use
`${prefix}routines`.
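
The same example as a snippet you can paste into a shell (the
`unset` is there only to make sure the longer name is really not
defined):

```sh
prefix=sub
unset prefixroutines

# Without braces the shell looks up a variable named "prefixroutines".
wrong="$prefixroutines"

# With braces the name ends at the closing brace.
right="${prefix}routines"

echo "wrong: [$wrong]"
echo "right: [$right]"
```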

The most useful special parameters are:

* Numbers `1`, `2`, `3`... that refer to the *positional parameters*:

```
    These parameters are set when a shell, shell script, or shell function is
    invoked.  Each argument passed to a shell or shell script is assigned a
    positional parameter, starting at 1, and assigned sequentially.
```

* The number `0`, which refers to the name of the shell or of the shell
  script being executed.
* The symbols `@` and `*` which expand to all positional parameters
  at once; they behave differently when enclosed in double quotes:
  `"$@"` expands to one field per parameter, while `"$*"` joins them
  all into a single field.
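
The difference between `"$@"` and `"$*"` is easier to see than to
explain. In this sketch `count_args` is a made-up helper and
`set --` is used to fill the positional parameters by hand, so that
no separate script is needed.

```sh
# Hypothetical helper: print the number of arguments it receives.
count_args() { echo "$#"; }

# set -- replaces the positional parameters of the current shell.
set -- "first arg" second third

with_at=$(count_args "$@")     # one field per parameter
with_star=$(count_args "$*")   # all parameters joined into one field
unquoted=$(count_args $*)      # no quotes: split again at every blank

echo "$with_at $with_star $unquoted"
```

With the default field separators this prints `3 1 4`.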

There are some useful constructs to expand a parameter in special
ways.  The constructs `${parameter:-[word]}` and `${parameter:=[word]}`
expand to `[word]` if `parameter` is unset or empty, with the second
one also assigning the value `[word]` to `parameter` for subsequent
use. Instead, `${parameter:+[word]}` expands to `[word]` *unless*
`parameter` is unset or empty, in which case it expands to the empty
string. In all these cases, if the colon is omitted only the unset
case is tested, so an empty `parameter` is treated like any other
value.
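
A quick sketch of these constructs; the variable names and the
values are of course made up.

```sh
unset name
a="${name:-nobody}"     # unset: the default is used, name stays unset
b="${name:=guest}"      # unset: the default is used and assigned to name
c="${name:+has value}"  # name is now "guest", so the alternative is used

empty=""
d="${empty:-fallback}"  # with the colon, empty counts like unset
e="${empty-fallback}"   # without it, empty is a value like any other
f="${empty+has value}"  # same here: empty is set, the alternative is used

echo "a=$a b=$b name=$name c=$c"
echo "d=$d e=[$e] f=$f"
```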

Another useful one is `${#parameter}`, which expands to the length
of the value of `parameter`. Finally there are some constructs that
can be used to remove prefixes or suffixes from the expansion of a
parameter:

| Construct | Effect |
|:---:|:---:|
| `${parameter%[word]}` | Delete smallest possible suffix matching word |
| `${parameter%%[word]}` | Delete largest possible suffix matching word |
| `${parameter#[word]}` | Delete smallest possible prefix matching word |
| `${parameter##[word]}` | Delete largest possible prefix matching word |

Unfortunately, the man page of `sh(1)` does not explain (but that
of `ksh(1)` does) that `[word]` in this case can be a *pattern*.
See [glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7) for a
description of patterns, which are the same ones used for filename
expansion (with the exception that slashes and dots are treated as
normal characters).

For example, using `*` which means "any sequence of zero or more
characters":

```
$ x="we can,separate,stuff,with commas"
$ echo ${x#*,}
separate,stuff,with commas
$ echo ${x##*,}
with commas
```
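
The suffix variants work the same way from the other end. A typical
use is taking a path apart; the path below is made up and nothing
needs to exist on disk, since these are pure string operations.

```sh
path=/home/alice/archive.tar.gz

file=${path##*/}   # largest prefix ending in a slash removed
dir=${path%/*}     # smallest suffix starting with a slash removed
base=${file%%.*}   # largest suffix starting with a dot removed
ext=${file#*.}     # smallest prefix ending in a dot removed
len=${#file}       # length of the value of file

echo "$dir | $file | $base | $ext | $len"
```

This prints `/home/alice | archive.tar.gz | archive | tar.gz | 14`.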

Then there is **command expansion**:

```
     Command expansion has a command executed in a subshell and the results
     output in its place.  The basic format is:

	   $(command)
     or
	   `command`

     The results are subject to field splitting and pathname expansion; no
     other form of expansion happens.  If command is contained within double
     quotes, field splitting does not happen either.
```
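
To make the remark about field splitting concrete, here is a small
sketch. I use `set --` to load the result into the positional
parameters and `$#` to count them; the commands inside `$(...)` are
arbitrary examples.

```sh
# The output of the inner command takes the place of $(...).
greeting="$(echo hello), world"

# Unquoted: the output is split into separate fields.
set -- $(echo one two three)
unquoted=$#

# Quoted: the whole output stays a single field.
set -- "$(echo one two three)"
quoted=$#

echo "$greeting"
echo "unquoted: $unquoted, quoted: $quoted"
```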

**Arithmetic expansion** uses the syntax `$((expression))`. An
`expression` can be a combination of integers (no floating point
arithmetic in the shell!), parameter names and the usual arithmetic
operations. I won't copy them here; if you are familiar with C or
C-like languages, you can use pretty much all the operations you
are used to, including logical operations (resulting in 0 or 1),
assignment operations like `+=` and bitwise operations like `~`,
`&` and `<<`.  Even the *ternary if* `expression ? expr1 : expr2`
is available.
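
A few examples, with values picked at random. Note that parameter
names can be used inside `$((...))` without the dollar sign.

```sh
x=7

sum=$((x + 3))
quot=$((x / 2))            # integer division: the remainder is dropped
rem=$((x % 2))
bigger=$((x > 5))          # comparisons give 1 (true) or 0 (false)
shifted=$((1 << 4))        # bitwise left shift
capped=$((x > 5 ? 5 : x))  # ternary if

echo "$sum $quot $rem $bigger $shifted $capped"
```

This prints `10 3 1 1 16 5`.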

Finally, **filename expansion** uses the aforementioned rules of
[glob(7)](https://man.openbsd.org/OpenBSD-7.1/glob.7) to expand
filenames.  To sum them up:

* As we have already seen, `*` matches any sequence of characters.
* `?` matches any single character.
* `[..]` matches any one of the characters listed in place of the
  double dot, or any character *not* listed if the first one is an
  exclamation mark.
* `[[:class:]]` matches any character of a certain class; for example
  `class` could be `alnum` for alphanumeric characters or `upper` for
  uppercase letters.
* `[x-y]` matches any character in the range between `x` and `y`.

To illustrate what all of this means, check this out (the command `ls` is
used to list all files in the current directory):

```
$ ls
box                  file3                mbox                 typescript
count_args.sh        file4                mnt                  videos
file1                git                  music
file2                mail                 phone-laptop-swap
$ echo m*
mail mbox mnt music
$ echo m???
mail mbox
$ echo file[2-4]
file2 file3 file4
```
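
If you want to play with patterns without cluttering your home
directory, you can do it in a scratch directory. This is just a
sketch: the file names are made up, and `mktemp -d` is an assumption
on my side (it is not part of `sh`, but it is commonly available).
It also tries the negated list and the character class, which the
example above does not cover.

```sh
old=$PWD
dir=$(mktemp -d)
cd "$dir" || exit 1
touch file1 file2 file3 mail mbox mnt music Notes

starts_m=$(echo m*)         # everything starting with m
four=$(echo m???)           # m followed by exactly three characters
range=$(echo file[2-3])     # file2 and file3, but not file1
others=$(echo [!fm]*)       # names not starting with f or m
upper=$(echo [[:upper:]]*)  # names starting with an uppercase letter

cd "$old" && rm -r "$dir"

echo "$starts_m"
echo "$four"
echo "$range"
echo "$others"
echo "$upper"
```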

### Quoting

Sometimes we may want to write some of the special characters
described above, such as dollar signs, without their special meaning.
You can do so by *escaping*, or *quoting* them. There are essentially
three ways to quote a character or a group of characters:

* Backslash:

```
     A backslash (\) can be used to quote any character except a newline.  If
     a newline follows a backslash the shell removes them both, effectively
     making the following line part of the current one.
```

This means that a backslash can also effectively be used to split
long lines into multiple lines, for example for ease of editing a
shell script.

* Single quotes:

```
     A group of characters can be enclosed within single quotes (') to quote
     every character within the quotes.
```

* And double quotes:

```
     A group of characters can be enclosed within double quotes (") to quote
     every character within the quotes except a backquote (`) or a dollar sign
     ($), both of which retain their special meaning.  A backslash (\) within
     double quotes retains its special meaning, but only when followed by a
     backquote, dollar sign, double quote, newline, or another backslash.  An
     at sign (@) within double quotes has a special meaning (see SPECIAL
     PARAMETERS, below).
```

Basically the difference between single and double quotes is that
the former turn literally everything they enclose into simple text,
while the latter still parse and expand some special characters
(for example the dollar sign `$` for variables).
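
Here are the three quoting styles side by side, in a short sketch
with a made-up variable:

```sh
name=world

single='hello $name'    # single quotes: the dollar sign is plain text
double="hello $name"    # double quotes: the variable is expanded
escaped="hello \$name"  # the backslash quotes the dollar sign

# Backslash followed by a newline: both are removed and the two lines
# are read as one.
joined=one\
two

echo "$single"
echo "$double"
echo "$escaped"
echo "$joined"
```

The four lines of output are `hello $name`, `hello world`,
`hello $name` and `onetwo`.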

In addition, remember that anything enclosed in single or double
quotes is considered a single field (word). This is briefly mentioned
in the Expansion section of the manual page, but I skipped it above.
To illustrate what I mean, let's write a short script and run it
first with some words as arguments and then with the same words
enclosed in quotes:

```
$ echo 'echo $#' > count_args.sh
$ sh count_args.sh how many words are there
5
$ sh count_args.sh "how many words are there"
1
```

## Until next time

This was a very long post, but it made sense to keep all the grammar
rules together. To finish this manual page we are going to need
another long post, or two shorter ones.

See you next time!

*Next in the series: [sh(1) - part 2: commands and builtins](../2022-09-20-sh-2)*