# My minimalistic RSS feed setup

A couple of years ago I started using
[RSS](https://en.wikipedia.org/wiki/Rss)
(or [atom](https://en.wikipedia.org/wiki/Atom_(standard)))
feeds to stay up to date with websites and blogs I wanted to read.
This method is more convenient than what I used before (opening
Firefox and loading each website I wanted to follow in a new tab,
one by one), but unfortunately not every website provides an RSS
feed these days.

At first I used [newsboat](https://newsboat.org), but I soon started
disliking the curses interface - see also my rant on curses at the
end of [this other blog post](../2022-12-24-ed). Then I discovered
`sfeed`.

## sfeed

[`sfeed`](https://codemadness.org/sfeed-simple-feed-parser.html)
is an extremely minimalistic RSS and atom reader: it reads the XML
content of a feed from standard input and outputs one line per
feed item, with tab-separated timestamp, title, link and so on. This
tool comes bundled with other commands that can be combined with it,
such as `sfeed_plain`, which converts the output of `sfeed` into
something more readable:

```
$ curl -L https://sebastiano.tronto.net/blog/feed.xml | sfeed | sfeed_plain
  2023-06-16 02:00  UNIX text filters, part 0 of 3: regular expressions                    https://sebastiano.tronto.net/blog/2023-06-16-regex
  2023-05-05 02:00  I had to debug C code on a smartphone                                  https://sebastiano.tronto.net/blog/2023-05-05-debug-smartphone
  2023-04-10 02:00  The big rewrite                                                        https://sebastiano.tronto.net/blog/2023-04-10-the-big-rewrite
  2023-03-30 02:00  The man page reading club: dc(1)                                       https://sebastiano.tronto.net/blog/2023-03-30-dc
  2023-03-06 01:00  Resizing my website's pictures with ImageMagick and find(1)            https://sebastiano.tronto.net/blog/2023-03-06-resize-pictures
...
```
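
Since the raw `sfeed` output is plain tab-separated text, it also
composes well with the standard UNIX text tools. For instance, this
little sketch (the field order is documented in sfeed(5)) prints only
the title and link of each item:

```
$ curl -sL https://sebastiano.tronto.net/blog/feed.xml | sfeed | cut -f 2,3
UNIX text filters, part 0 of 3: regular expressions	https://sebastiano.tronto.net/blog/2023-06-16-regex
...
```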

One can also write a configuration file with all the desired feeds
and fetch them all with `sfeed_update`, or even use the `sfeed_curses`
UI. But the reason I tried out `sfeed` in the first place is that
I *did not* want to use a curses UI, so I decided to stick with
`sfeed_plain`.
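
For reference, the configuration for `sfeed_update` is itself a small
shell script, `$HOME/.sfeed/sfeedrc`, defining a `feeds` function. A
minimal sketch, based on the sfeed documentation (the second feed here
is just an example), might look like this:

```
# sfeedrc: call "feed <name> <url>" once per feed to fetch.
feeds() {
	feed "tronto.net" "https://sebastiano.tronto.net/blog/feed.xml"
	feed "codemadness" "https://codemadness.org/atom.xml"
}
```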

## My wrapper script - old versions

On the project's homepage, the following short script is presented
to demonstrate the flexibility of `sfeed`:

```
#!/bin/sh
url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
	sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
test -n "${url}" && $BROWSER "${url}"
```

The first command shows a list of feed items in
[dmenu](https://tools.suckless.org/dmenu)
to let the user select one; the second opens the selected item
in a web browser. I was impressed by how simple and clever this
example was, and I decided to expand on it to build "my own" feed
reader UI.

In the first version I made, my feeds were stored one per file and
organized in folders, and one could select multiple feeds or even
entire folders via dmenu using
[dmenu-filepicker](https://git.tronto.net/scripts/file/dmenu-filepicker.html)
for file selection.
Once the session was terminated, all shown feeds were marked as
"read" by writing the timestamp of the last read item to a cache
file, and they were not shown again on successive calls.

This system worked fine for me, but at some point I grew tired of
feeds being marked as "read" automatically. I also disliked the
complexity of my own script. So I rewrote it from scratch, giving
up the idea of marking feeds as read. This second version can still
be found in the *old* folder of my
[scripts repo](https://git.tronto.net/scripts), but I may remove it
in the future. You will still be able to find it in the git history.

I happily used this second version for more than a year, but I had
some minor issues with it. The main one was that, as I kept adding
websites to my feed list, fetching them took longer and longer - up
to 20-30 seconds. While the feeds were loading I could not start
doing other things, because dmenu would later grab my keyboard while
I was typing. Moreover, having a way to filter out old feed items
is quite useful when you check your feed relatively often. A few
weeks ago I had enough of this, and I decided to rewrite my wrapper
script once again.

## My wrapper script - current version

In its current version, my `feed` script accepts four sub-commands:
`get` to update the feed, `menu` to prompt a dmenu selection, `clear`
to remove the old items and `show` to list all the new items.
Since `clear` is a separate action, I no longer have the problem I
had with my first version, namely that feeds were automatically
marked as read even when I did not want them to be.

Let's walk through the latest iteration of this script - you can
find it in my scripts repository, and I'll also include it in full
at the end of this section.

First I define some variables (mostly file names), so that I can
easily adapt the script if one day I want to move things around:

```
dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url
```

Here `open-url` is another one of my utility scripts.
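
I won't reproduce `open-url` here, but for the purpose of this post
you can imagine it as a thin wrapper around a web browser; a
hypothetical stand-in could be as simple as:

```
#!/bin/sh
# Hypothetical minimal open-url: open each given URL in $BROWSER,
# falling back to firefox if $BROWSER is unset.
for url in "$@"; do
	"${BROWSER:-firefox}" "$url"
done
```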

To update the feed, I loop over the files in my feed folder. Each
file contains a single line with the feed's URL, and the name of
the file is the name / title of the website. The output of `sfeed`
is piped into `sfeed_plain` and then appended to a file, and the
most recent timestamp for each feed is updated:

```
getnew() {
	for f in "$feeddir"/*; do
		read -r url < "$f"
		name=$(basename "$f")
		d="$destdir/$name"
		r="$readdir/$name"

		[ -f "$r" ] && read -r lr < "$r" || lr=0

		# Get new feed items
		tmp=$(mktemp)
		curl -s "$url" | sfeed | \
		awk -v lr="$lr" '$1 > lr {print $0}' | \
		tee "$tmp" | sfeed_plain >> "$d"

		# Update last time stamp
		awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"

		# Clean up the temporary file
		rm -f "$tmp"
	done
}
```
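
For illustration, here is what the feed folder might look like (the
feed names here are made up):

```
$ ls $HOME/box/sfeed/urls
codemadness  tronto.net
$ cat $HOME/box/sfeed/urls/tronto.net
https://sebastiano.tronto.net/blog/feed.xml
```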

The next snippet is used to show the new feed items.
The `for` loop could be replaced by a simple
`cat "$destdir"/*`, but I also want to prepend each line with
the name of the website.

```
show() {
	for f in "$destdir"/*; do
		ff=$(basename "$f")
		if [ -s "$f" ]; then
			while read -r line; do
				printf '%20s    %s\n' "$ff" "$line"
			done < "$f"
		fi
	done
}
```
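
With the hypothetical feed folder from above, its output would look
something like this:

```
$ feed show
          tronto.net    2023-06-16 02:00  UNIX text filters, part 0 of 3: regular expressions  https://sebastiano.tronto.net/blog/2023-06-16-regex
```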

Finally, the following one-liner prompts the user to select the
desired items with dmenu and opens them in a browser; since the URL
is the last whitespace-separated field of each line, `awk '{print $NF}'`
extracts it and `xargs` hands it to the URL opener:

```
selectmenu() {
	$menu | awk '{print $NF}' | xargs $urlopener
}
```

The "clear" action is a straightforward file management routine,
and the rest of the script is just shell boilerplate to parse the
command-line options and sub-commands. Putting it all together,
the script looks like this:

```
#!/bin/sh

# RSS feed manager

# Requires: sfeed, sfeed_plain (get), dmenu, open-url (menu)

# Usage: feed [-m menu] [get|menu|clear|show]

dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url

usage() {
	echo "Usage: feed [get|menu|clear|show]"
}

getnew() {
	for f in "$feeddir"/*; do
		read -r url < "$f"
		name=$(basename "$f")
		d="$destdir/$name"
		r="$readdir/$name"

		[ -f "$r" ] && read -r lr < "$r" || lr=0

		# Get new feed items
		tmp=$(mktemp)
		curl -s "$url" | sfeed | \
		awk -v lr="$lr" '$1 > lr {print $0}' | \
		tee "$tmp" | sfeed_plain >> "$d"

		# Update last time stamp
		awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"

		# Clean up the temporary file
		rm -f "$tmp"
	done
}

show() {
	for f in "$destdir"/*; do
		ff=$(basename "$f")
		if [ -s "$f" ]; then
			while read -r line; do
				printf '%20s    %s\n' "$ff" "$line"
			done < "$f"
		fi
	done
}

selectmenu() {
	$menu | awk '{print $NF}' | xargs $urlopener
}

while getopts "m:" opt; do
	case "$opt" in
		m)
			menu="$OPTARG"
			;;
		*)
			usage
			exit 1
			;;
	esac
done

shift $((OPTIND - 1))

if [ -z "$1" ]; then
	usage
	exit 1
fi

case "$1" in
	get)
		getnew
		countnew=$(cat "$destdir"/* | wc -l)
		echo "$countnew new feed items"
		;;
	menu)
		show | selectmenu
		;;
	clear)
		d="$olddir/$(date +'%Y-%m-%d-%H-%M-%S')"
		mkdir "$d"
		mv "$destdir"/* "$d/"
		;;
	show)
		show
		;;
	*)
		usage
		exit 1
		;;
esac
```
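
A typical session, using the sub-commands and the `-m` option defined
above, might look like this (the item count is made up):

```
$ feed get                     # fetch new items
42 new feed items
$ feed show                    # list them, prefixed with the site name
$ feed -m "dmenu -l 30" menu   # pick items with a custom menu command
$ feed clear                   # archive everything into the old folder
```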

I personally like this approach of taking a simple program that
only uses standard input and standard output and wrapping it in
a shell script to make it do exactly what I want. The bulk of the
work is done by the "black box" program, and the shell script glues
it together with the "configuration" files (in this case, my feed
folder) and presents the results to me, interactively (e.g. via
dmenu) or otherwise.

At this point my feed-consumption workflow would be something like
this: first I run `feed get`, then I do other stuff while the feed
loads, and later, after a couple of minutes or so, I run `feed show`
or `feed menu`. This is still not ideal, because whenever I want to
check my feeds I still have to wait for them to be downloaded. The
only way around this would be to have `feed get` run automatically
when I am not thinking about it...

## Setting up a cron job

My personal laptop is not always connected to the internet, and in
general I do not like having too many network-related jobs running
in the background. But I do have a machine that is always connected
to the internet: the VM instance hosting this website.

Since my new setup saves my feed updates to local files, I can have
a [cron job](https://en.wikipedia.org/wiki/Cron_job) fetch the new
items and update files in a folder sync'd via
[syncthing](https://syncthing.net) (yes, I do have that *one* network
service constantly running in the background...). This setup is
similar to the one I use to [fetch my email](../2022-10-19-email-setup).

I rarely use cron, and I am always a little intimidated by its
syntax. But in the end, to have `feed get` run every hour I just
needed to add the following two lines via `crontab -e`:

```
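# An empty MAILTO stops cron from mailing me the output;
# the time fields are: minute hour day-of-month month weekday.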
MAILTO=""
0 * * * * feed get
```

This is my definitive new setup, and I like it. It also has the
advantage that I only need to install `sfeed` on my server and not
locally, though I still prefer to keep it around.

So far I have found one little caveat: if my feed gets updated after
I read it but before I run `feed clear`, some items may be cleared
away before I ever see them. This is easily worked around by running
a quick `feed show` right before clearing the feeds, but it is still
worth keeping in mind.

## Conclusions

This is a summary of my latest script-crafting adventure. As I was
writing this post I realized I could probably use `sfeed_update`
to simplify the script a bit, since I do not separate feeds into
folders anymore. I have also found out that `sfeed_mbox` exists
(at least I *think* it was not there the last time I checked), and I
could use it to browse my feed with a mail client - see also
[this video tutorial](https://josephchoe.com/rss-terminal) for a demo.
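
For example, keeping the raw `sfeed` output around (my script
currently stores only the `sfeed_plain` text), a sketch of that
workflow could be:

```
$ curl -s https://sebastiano.tronto.net/blog/feed.xml | sfeed > tronto.tsv
$ sfeed_mbox tronto.tsv > feeds.mbox
$ mutt -f feeds.mbox
```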

With all of this, did I solve my problem in the best possible way?
Definitely not. But does it work for me? Absolutely! Did I learn
something new while doing this? Kind of, but mostly I have just
exercised skills that I already had.

All in all, it was a fun exercise.