Added blog post - sebastiano.tronto.net - Source files and build scripts for my personal website

commit e3a075239cdf98e42435106d111b187d056c4150
parent a8a05450f8056d8bbb7de8376118235b2578021a
Author: Sebastiano Tronto <sebastiano@tronto.net>
Date:   Sun, 14 Aug 2022 08:54:15 +0200

Added blog post

Diffstat:
A src/blog/2022-08-14-website/website.md  | 286 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/blog/blog.md  | 1 +
M src/blog/feed.xml  | 7 +++++++

3 files changed, 294 insertions(+), 0 deletions(-)
diff --git a/src/blog/2022-08-14-website/website.md b/src/blog/2022-08-14-website/website.md
@@ -0,0 +1,286 @@
+# How I update my website
+
+When I created my website, I decided that I wanted to understand 100% of what
+I did. In practice, this means that I did not want to use any framework, not
+even a simple one. Someone might call this *minimalism*, someone else might
+call it *being a control freak*. I think I like the second one more.
+
+The principles I follow, which I only worked out after a few months
+of trial and error, are roughly these:
+
+* **Minimalism**, a.k.a. **control-freakism**: As mentioned above, I want to
+  only use tools that I understand completely,
+  at the cost of writing code by hand sometimes.
+* **Ease of use**: I want to write the bulk of my pages in
+  [Markdown](https://en.wikipedia.org/wiki/Markdown), which is easier to
+  read and edit than html code.
+* **Reproducibility**: Ideally, building new html pages (from a
+  newly-written Markdown document) and uploading them to my server
+  should be done by issuing simple commands such as `make` and `make deploy`.
+
+In more practical terms, this boils down to writing a couple of CSS and html
+files, writing a script that adds a header and a footer to the output of
+[lowdown](https://kristaps.bsd.lv/lowdown/), and using
+[rsync](https://en.wikipedia.org/wiki/Rsync) to deploy the files to my server.
+All of this is available on
+[my git page](https://git.tronto.net/sebastiano.tronto.net/), but I won't
+explain every detail of these build scripts here. In particular, my script
+also builds a [gemini](https://sebastiano.tronto.net/blog/2022-06-04-gemini/)
+version of my website, which I won't discuss here.
+
+## Prerequisites
+
+My website is hosted on an OpenBSD virtual machine on a remote server. I can
+access this virtual machine via
+[SSH](https://en.wikipedia.org/wiki/Secure_Shell), which gives me root
+access to the operating system. I use rsync for uploading my files to this
+server. As long as you have an http server that can serve static html
+files and a way to upload your files to this server, you can easily manage
+your website in a similar way. I am not going to explain how to do all of this
+here; if you need help I suggest you have a look at
+[Roman Zolotarev's website](https://rgz.ee).
+
+As for my local machine, the one I am actually using to write these blog
+posts, I just use a Markdown translator, a text editor, rsync
+and other basic UNIX tools.
+
+## Directory structure
+
+In my working directory
+there are two main folders with the exact same sub-folder structure: the first
+one is `src`, which contains all the markdown files I write, plus any other
+file I need for my pages, such as pictures; the other is `http`, which contains
+the html pages exactly as they are on my website. The `http` folder is
+generated from the `src` folder when I run the `build.sh` script - more
+on this later! (You won't find the `http` folder on my git page.)
+
+There is one small caveat here:
+I like my urls to be clean and I want them stripped of the `.html`
+extension. To do this, I set up my `src` folder so that every
+subfolder contains at most one `.md` (Markdown) file, which is converted to
+an `index.html` file in the corresponding subfolder of `http`.
+In practice, if there are the following files:
+
+```
+├── src
+│   ├── git
+│   │   └── git-or-any-other-name.md
+```
+
+The following is generated by `build.sh`:
+
+```
+├── http
+│   ├── git
+│   │   └── index.html
+```
+
+So that when one accesses
+[sebastiano.tronto.net/git](https://sebastiano.tronto.net/git/)
+the web server automatically serves the `index.html` file.
+Without this trick the correct URL would have been
+`sebastiano.tronto.net/git.html` or something, which I don't like.
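+
+The path mapping itself is easy to try in isolation. Here is a minimal
+sketch; the function name `dest_for` is made up for this example and
+does not appear in my build script:
+
+```shell
+#!/bin/sh
+
+# Map a Markdown file under src/ to the index.html that replaces it
+# under http/, e.g. src/git/any-name.md -> http/git/index.html
+dest_for() {
+	dir=$(dirname "$1")
+	printf '%s/index.html\n' "$(printf '%s\n' "$dir" | sed 's|^src|http|')"
+}
+
+dest_for src/git/git-or-any-other-name.md
+```
+
+This prints `http/git/index.html`, whatever the name of the `.md` file.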
+
+The main working directory also contains the 
+[`top.html`](https://git.tronto.net/sebastiano.tronto.net/file/top%2Ehtml%2Ehtml)
+and
+[`bottom.html`](https://git.tronto.net/sebastiano.tronto.net/file/bottom%2Ehtml%2Ehtml)
+files. These files are not uploaded directly to my server - they are not
+even well-formed html files! - but they are used to build all other
+html pages.
+
+## Building the pages
+
+The basic idea behind `build.sh` is very simple. If we want to create the
+html file corresponding to, say, `src/page/file.md`, we just need to create
+a file called `http/page/index.html` and copy there the contents of
+`top.html`, followed by the output of `lowdown src/page/file.md`, followed
+by the contents of `bottom.html`. In shell language:
+
+```
+cat top.html > http/page/index.html
+lowdown src/page/file.md >> http/page/index.html
+cat bottom.html >> http/page/index.html
+```
+
+Of course you don't have to use lowdown as I do: any Markdown translator
+works. Indeed, if you want to use a different markup language to write
+your pages you just need to replace the second line in the code above.
+
+We would also like to make a small change to the header while
+we build this page, namely we want the title of the page (the one
+displayed in your browser's top bar or tab) to match the actual
+title of the page or blog post. To do this easily I have left a
+placeholder `TITLE` in `top.html`, which we just need to replace with the
+actual title of the page. To find out what the title is we just need to
+get the text following
+the first `# ` (hash space) in the Markdown file - that is, the first
+"big title" of the page. We can do this
+thanks to the classic UNIX tools sed, grep and head:
+
+```
+sed "s/TITLE/$(grep '^\# ' < src/page/file.md \
+	| head -n 1 | sed 's/^\# //')/" < top.html > http/page/index.html
+lowdown src/page/file.md >> http/page/index.html
+cat bottom.html >> http/page/index.html
+```
+
+The first two lines might be a bit complicated to work out if you
+are not familiar with these commands. Let's break them down!
+
+The main command is `sed "s/TITLE/...stuff.../"`, which replaces the
+first occurrence of the string `TITLE` on each input line with that
+complicated stuff.
+The end of the second line tells sed to use `top.html` as input and
+write the output to `http/page/index.html`. The complicated stuff
+that is going to replace `TITLE` is enclosed in `$()`, which means
+that it is the result of a command. This command is itself a chain
+of commands: first we find all lines that start with `# ` with
+`grep '^\# '` on the correct file (`< src/page/file.md`), then
+we take the first of these lines (`head -n 1`) and finally we
+trim the leading `# ` with `sed`. As you can see, the UNIX shell
+is quite a powerful tool!
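+
+One caveat: the title is pasted verbatim into the `sed` expression, so a
+title containing `/` or `&` would break or garble the substitution. The
+code above does not guard against this. The following is only a sketch
+of a possible fix; the function names `page_title` and `sed_escape` are
+made up for this example:
+
+```shell
+#!/bin/sh
+
+# Print the text of the first "# " heading of a Markdown file
+page_title() {
+	grep '^# ' < "$1" | head -n 1 | sed 's/^# //'
+}
+
+# Escape the characters that are special in the replacement part of
+# sed's s/// command: the backslash, the / delimiter and &
+sed_escape() {
+	sed 's|[\\/&]|\\&|g'
+}
+
+# Example with a throwaway file
+tmp=$(mktemp)
+printf '# Tips & tricks for a/b testing\n\nSome text.\n' > "$tmp"
+title=$(page_title "$tmp" | sed_escape)
+echo '<title>TITLE</title>' | sed "s/TITLE/$title/"
+rm -f "$tmp"
+```
+
+The escaping step uses `|` as its own delimiter so that `/` can appear
+in the bracket expression without any special treatment.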
+
+Now we just need to do all of this recursively on the `src` folder.
+The final result looks something like this:
+
+```
+#!/bin/sh
+
+recursivebuild() {
+	local destdir="$(echo "$1" | sed 's|^src|http|')"
+	mkdir -p "$destdir"
+	for file in $(ls "$1"); do
+		if [ -d "$1/$file" ]; then
+
+			# Recursively build subdirectories
+			mkdir -p "$destdir/$file"
+			recursivebuild "$1/$file"
+		else
+			extension=$(echo "$file" | sed 's/.*\.//')
+			if [ "$extension" = "md" ]; then
+
+				# Process Markdown files, as above
+				sed "s/TITLE/$(grep '^\# ' < "$1/$file" \
+					| head -n 1 \
+					| sed 's/^\# //')/" < top.html \
+					> "$destdir/index.html"
+				lowdown "$1/$file" >> "$destdir/index.html"
+				cat bottom.html >> "$destdir/index.html"
+			else
+				
+				# Copy all other files as they are
+				cp "$1/$file" "$destdir/$file"
+			fi
+		fi
+	done
+}
+
+recursivebuild src
+```
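+
+Looping over the output of `ls` splits file names on whitespace. This
+is fine as long as no file or folder has a space in its name. If
+yours do, a glob is a safer way to walk a directory. A small sketch of
+the idea, with a made-up function name `list_entries`:
+
+```shell
+#!/bin/sh
+
+# Print every entry of a directory, one per line, without parsing ls
+list_entries() {
+	for path in "$1"/*; do
+		# An empty directory leaves the pattern unexpanded: skip it
+		[ -e "$path" ] || continue
+		printf '%s\n' "${path##*/}"
+	done
+}
+
+# Example: a directory containing a file with a space in its name
+tmp=$(mktemp -d)
+touch "$tmp/my page.md" "$tmp/picture.png"
+list_entries "$tmp"
+rm -r "$tmp"
+```
+
+The same `for path in "$1"/*` loop could replace the `ls` loop in
+`recursivebuild`, using `${path##*/}` where the file name is needed.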
+
+## Extras: the blog index and RSS feed
+
+The [blog index page](https://sebastiano.tronto.net/blog/) is also
+generated by the build script, but the corresponding Markdown file
+in `src` is not created by hand. Instead, this file is generated by
+scanning the `src/blog` subfolder. For each post, the date is deduced
+from the name of the folder containing the markdown file, which always
+starts with the date itself in the `yyyy-mm-dd` format.
+
+While we scan the blog directory to create a list of posts, we might as
+well make an [RSS feed](https://en.wikipedia.org/wiki/RSS) file for the
+blog. This is a file used by feed reader applications to check if there
+is any new post. The format is quite simple: check out
+[mine](https://sebastiano.tronto.net/blog/feed.xml).
+
+The code to accomplish this looks something like this:
+
+```
+makeblog() {
+	bf=src/blog/blog.md    # Blog index file
+	ff=src/blog/feed.xml   # RSS feed file
+
+	printf "# Blog\n\n[RSS Feed](feed.xml)\n\n" > $bf
+	cp feed-top.xml $ff
+
+	for i in $(ls src/blog | sort -r); do
+		if [ -d src/blog/$i ]; then
+
+			# Get basic data of the post (date, title)
+			f="src/blog/$i/*.md"
+			d=$(echo $i | grep -oE '^[0-9]{4}-[0-9]{2}-[0-9]{2}')
+			t=$(head -n 1 $f | sed 's/# //')
+
+			# Add blog post to the list
+			echo "* $d [$t]($i)" >> $bf
+
+			# Create RSS feed item
+			echo "<item>" >> $ff
+			echo "<title>$t</title>" >> $ff
+			echo "<link>https://sebastiano.tronto.net/blog/$i</link>" >> $ff
+			echo "<description>$t</description>" >> $ff
+			echo "<pubDate>$d</pubDate>" >> $ff
+			echo "</item>" >> $ff
+			echo "" >> $ff
+		fi
+	done
+
+	# Close the RSS feed file
+	echo "" >> $ff
+	echo "</channel>" >> $ff
+	echo "</rss>" >> $ff
+}
+```
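+
+Two details of the feed are worth a note: the RSS specification expects
+`pubDate` in the RFC 822 date format, and characters such as `&` and `<`
+in a title must be escaped in XML. The code above does neither. Below is
+a rough sketch of how both could be handled; the function names are made
+up for this example, and `date -d` is a GNU extension that OpenBSD's
+`date` does not have:
+
+```shell
+#!/bin/sh
+
+# Convert yyyy-mm-dd to an RFC 822 date at midnight UTC.
+# Assumes GNU date; LC_ALL=C keeps day and month names in English.
+rfc822_date() {
+	LC_ALL=C date -u -d "$1" '+%a, %d %b %Y %H:%M:%S +0000'
+}
+
+# Escape the characters that are special in XML text.
+# The ampersand must be handled first.
+xml_escape() {
+	sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
+}
+
+rfc822_date 2022-08-14
+echo 'sed & grep <3' | xml_escape
+```
+
+The first command prints `Sun, 14 Aug 2022 00:00:00 +0000`, the second
+one `sed &amp; grep &lt;3`.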
+
+## Deploying with make
+
+Updating or adding a page is now very easy: I just need to edit the
+corresponding Markdown file, run `./build.sh` to build the new html
+pages and run
+
+```
+rsync -rv --delete --rsync-path=openrsync http/ \
+	tronto.net:/var/www/htdocs/sebastiano.tronto.net
+```
+
+to sync the `http` directory with my server. I need to use the
+`--rsync-path` option because the `rsync` binary has a different 
+name on my local system (Linux) than on my server (OpenBSD). But apart from
+this the command is straightforward.
+
+Of course I don't want to type this lengthy command every time. It is very
+convenient in this case to write a short Makefile:
+
+```
+all: clean
+	./build.sh
+
+clean:
+	rm -rf http
+	mkdir -p http
+
+deploy:
+	rsync -rv --delete --rsync-path=openrsync http/ \
+		tronto.net:/var/www/htdocs/sebastiano.tronto.net
+
+.PHONY: all clean deploy
+```
+
+So that I just need to run `make` to build and `make deploy` to upload the
+new files. Watch out: if you want to reproduce this on your system, make
+sure that the user on your server has sufficient permissions to run
+that rsync command - in particular you need write permission on the
+`/var/www/htdocs` folder.
+
+If you are not familiar with the [make(1)](https://man.openbsd.org/make)
+syntax, this step is completely optional and you can simply type
+the full commands every time, or make another small script called
+`deploy.sh` and run that instead.
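+
+Such a `deploy.sh` could look roughly like the sketch below. The
+`DRY_RUN` variable is invented for this example: when it is set, the
+`-n` (dry run) option is added, so that rsync only lists what it would
+transfer or delete.
+
+```shell
+#!/bin/sh
+
+# Sync the http folder with the server, as in the Makefile above.
+# Set DRY_RUN=1 to only list what would be transferred (rsync -n).
+deploy() {
+	opts="-rv"
+	if [ -n "$DRY_RUN" ]; then
+		opts="${opts}n"
+	fi
+	rsync "$opts" --delete --rsync-path=openrsync http/ \
+		tronto.net:/var/www/htdocs/sebastiano.tronto.net
+}
+```
+
+With a final line calling `deploy` added to the script, `./deploy.sh`
+uploads the files and `DRY_RUN=1 ./deploy.sh` only shows the changes.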
+
+## Follow-up?
+
+I am sure my build scripts will keep evolving over time, so at some point I 
+might write a new post about the same topic. I am also probably going to write
+something about how I generate my [git page](https://git.tronto.net/) using
+[stagit](https://codemadness.org/stagit.html), if anything just to document
+my post-receive hooks. So, if you liked this post, stay tuned for more!
diff --git a/src/blog/blog.md b/src/blog/blog.md
@@ -2,6 +2,7 @@
 
 [RSS Feed](feed.xml)
 
+* 2022-08-14 [How I update my website](2022-08-14-website)
 * 2022-07-07 [The man page reading club: shutdown(8)](2022-07-07-shutdown)
 * 2022-06-12 [The UNIX shell as an IDE: look stuff up with sed](2022-06-12-shell-ide-sed)
 * 2022-06-08 [The man page reading club: more(1)](2022-06-08-more)
diff --git a/src/blog/feed.xml b/src/blog/feed.xml
@@ -9,6 +9,13 @@ Thoughts about software, computers and whatever I feel like sharing
 </description>
 
 <item>
+<title>How I update my website</title>
+<link>https://sebastiano.tronto.net/blog/2022-08-14-website</link>
+<description>How I update my website</description>
+<pubDate>2022-08-14</pubDate>
+</item>
+
+<item>
 <title>The man page reading club: shutdown(8)</title>
 <link>https://sebastiano.tronto.net/blog/2022-07-07-shutdown</link>
 <description>The man page reading club: shutdown(8)</description>
