# My minimalistic RSS feed setup

A couple of years ago I started using
[RSS](https://en.wikipedia.org/wiki/Rss)
(or [atom](https://en.wikipedia.org/wiki/Atom_(standard)))
feeds to stay up to date with websites and blogs I wanted to read.
This method is more convenient than what I used before (i.e. open
Firefox and open each website I want to follow in a new tab, one
by one), but unfortunately not every website provides an RSS feed
these days.

At first I used [newsboat](https://newsboat.org), but I soon started
disliking the curses interface - see also my rant on curses at the
end of [this other blog post](../2022-12-24-ed). Then I discovered
`sfeed`.

## sfeed

[`sfeed`](https://codemadness.org/sfeed-simple-feed-parser.html)
is an extremely minimalistic RSS and atom reader: it reads the XML
content of a feed from standard input and outputs one line per feed
item, with tab-separated timestamp, title, link and so on. This tool
comes bundled with other commands that can be combined with it, such
as `sfeed_plain`, which converts the output of `sfeed` into something
more readable:

```
$ curl -L https://sebastiano.tronto.net/blog/feed.xml | sfeed | sfeed_plain
2023-06-16 02:00 UNIX text filters, part 0 of 3: regular expressions https://sebastiano.tronto.net/blog/2023-06-16-regex
2023-05-05 02:00 I had to debug C code on a smartphone https://sebastiano.tronto.net/blog/2023-05-05-debug-smartphone
2023-04-10 02:00 The big rewrite https://sebastiano.tronto.net/blog/2023-04-10-the-big-rewrite
2023-03-30 02:00 The man page reading club: dc(1) https://sebastiano.tronto.net/blog/2023-03-30-dc
2023-03-06 01:00 Resizing my website's pictures with ImageMagick and find(1) https://sebastiano.tronto.net/blog/2023-03-06-resize-pictures
...
```

One can also write a configuration file with all the desired feeds
and fetch them with `sfeed_update`, or even use the `sfeed_curses`
UI. But the reason I tried out `sfeed` in the first place is that
I *did not* want to use a curses UI, so I decided to stick with
`sfeed_plain`.

## My wrapper script - old versions

On the project's homepage the following short script is presented to
demonstrate the flexibility of `sfeed`:

```
#!/bin/sh
url=$(sfeed_plain "$HOME/.sfeed/feeds/"* | dmenu -l 35 -i | \
    sed -n 's@^.* \([a-zA-Z]*://\)\(.*\)$@\1\2@p')
test -n "${url}" && $BROWSER "${url}"
```

The first line shows a list of feed items in
[dmenu](https://tools.suckless.org/dmenu)
to let the user select one; the second line opens the selected item
in a web browser. I was impressed by how simple and clever this
example was, and I decided to expand on it to build "my own" feed
reader UI.

In the first version I made, my feeds were organized in folders,
one file per feed, and one could select multiple feeds or even
entire folders via dmenu using
[dmenu-filepicker](https://git.tronto.net/scripts/file/dmenu-filepicker.html)
for file selection.
Once the session was terminated, all shown feeds were marked as
"read" by writing the timestamp of the last read item to a cache
file, and they were not shown again on successive calls.

This system worked fine for me, but at some point I grew tired of
feeds being marked as "read" automatically. I also disliked the
complexity of my own script. So I rewrote it from scratch, giving
up the idea of marking feeds as read.
This second version can still be found in the *old* folder of my
[scripts repo](https://git.tronto.net/scripts), but I may remove it
in the future. You will still be able to find it in the git history.

I happily used this second version for more than a year, but I had
some minor issues with it. The main one was that, as I started
adding more and more websites to my feed list, fetching them took
longer and longer - up to 20-30 seconds; while the feeds were
loading I could not start doing other stuff, because dmenu would
later grab my keyboard while I was typing. Moreover, having a way
to filter out old feed items is kinda useful when you check your
feeds relatively often. A few weeks ago I had enough and I decided
to rewrite my wrapper script once again.

## My wrapper script - current version

In its current version, my `feed` script accepts four sub-commands:
`get` to update the feeds, `menu` to prompt a dmenu selection,
`clear` to remove the old items and `show` to list all the new
items. Since `clear` is a separate action, I no longer have the
problem I had with my first version, i.e. that feeds were
automatically marked as read even when I did not want them to be.

Let's walk through my last iteration of this script - you can find
it in my scripts repository, but I'll include it at the end of this
section too.

First I define some variables (mostly filenames), so that I can
easily adapt the script if one day I want to move stuff around:

```
dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url
```

Here `open-url` is another one of my utility scripts.

To update the feeds, I loop over the files in my feed folder. Each
file contains a single line with the feed's URL, and the name of
the file is the name / title of the website. The output of `sfeed`
is piped into `sfeed_plain` and then saved to a file, and the most
recent timestamp for each feed is updated.

```
getnew() {
    for f in "$feeddir"/*; do
        read -r url < "$f"
        name=$(basename "$f")
        d="$destdir/$name"
        r="$readdir/$name"

        [ -f "$r" ] && read -r lr < "$r" || lr=0

        # Get new feed items
        tmp=$(mktemp)
        curl -s "$url" | sfeed | \
            awk -v lr="$lr" '$1 > lr {print $0}' | \
            tee "$tmp" | sfeed_plain >> "$d"

        # Update last time stamp
        awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"
    done
}
```

The next snippet is used to show the new feed items.
The `for` loop could be replaced by a simple
`cat "$destdir"/*`, but I also want to prepend each line with
the name of the website.

```
show() {
    for f in "$destdir"/*; do
        ff=$(basename "$f")
        if [ -s "$f" ]; then
            while read -r line; do
                printf '%20s %s\n' "$ff" "$line"
            done < "$f"
        fi
    done
}
```

Finally, the following one-liner can be used to prompt the user to
select and open the desired items in a browser using dmenu:

```
selectmenu() {
    $menu | awk '{print $NF}' | xargs $urlopener
}
```

The "clear" action is a straightforward file management routine,
and the rest of the script is just shell boilerplate code to parse
the command line options and sub-commands.
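To give a more concrete picture of the files involved, this is
roughly what the state directory could look like after a `feed get`.
The layout follows the variables defined above; the feed names and
the exact contents shown here are only illustrative:

```
$ ls ~/box/sfeed/urls                # one file per feed, named after the website
codemadness  tronto.net
$ cat ~/box/sfeed/urls/tronto.net    # each file contains just the feed URL
https://sebastiano.tronto.net/blog/feed.xml
$ cat ~/box/sfeed/last/tronto.net    # UNIX timestamp of the newest item seen so far
1686873600
$ head -n 1 ~/box/sfeed/new/tronto.net
2023-06-16 02:00 UNIX text filters, part 0 of 3: regular expressions https://sebastiano.tronto.net/blog/2023-06-16-regex
```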
Putting it all together, the script looks like this:

```
#!/bin/sh

# RSS feed manager

# Requires: sfeed, sfeed_plain (get), dmenu, open-url (menu)

# Usage: feed [-m menu] [get|menu|clear|show]

dir=$HOME/box/sfeed
feeddir=$dir/urls
destdir=$dir/new
olddir=$dir/old
readdir=$dir/last
menu="dmenu -l 20 -i"
urlopener=open-url

usage() {
    echo "Usage: feed [get|menu|clear|show]"
}

getnew() {
    for f in "$feeddir"/*; do
        read -r url < "$f"
        name=$(basename "$f")
        d="$destdir/$name"
        r="$readdir/$name"

        [ -f "$r" ] && read -r lr < "$r" || lr=0

        # Get new feed items
        tmp=$(mktemp)
        curl -s "$url" | sfeed | \
            awk -v lr="$lr" '$1 > lr {print $0}' | \
            tee "$tmp" | sfeed_plain >> "$d"

        # Update last time stamp
        awk -v lr="$lr" '$1 > lr {lr=$1} END {print lr}' <"$tmp" >"$r"
    done
}

show() {
    for f in "$destdir"/*; do
        ff=$(basename "$f")
        if [ -s "$f" ]; then
            while read -r line; do
                printf '%20s %s\n' "$ff" "$line"
            done < "$f"
        fi
    done
}

selectmenu() {
    $menu | awk '{print $NF}' | xargs $urlopener
}

while getopts "m:" opt; do
    case "$opt" in
    m)
        menu="$OPTARG"
        ;;
    *)
        usage
        exit 1
        ;;
    esac
done

shift $((OPTIND - 1))

if [ -z "$1" ]; then
    usage
    exit 1
fi

case "$1" in
get)
    getnew
    countnew=$(cat "$destdir"/* | wc -l)
    echo "$countnew new feed items"
    ;;
menu)
    show | selectmenu
    ;;
clear)
    d="$olddir/$(date +'%Y-%m-%d-%H-%M-%S')"
    mkdir "$d"
    mv "$destdir"/* "$d/"
    ;;
show)
    show
    ;;
*)
    usage
    exit 1
    ;;
esac
```

I personally like this approach of taking a simple program that
only uses standard input and standard output and wrapping it in a
shell script that makes it do exactly what I want. The bulk of the
work is done by the "black box" program, and the shell script glues
it together with the "configuration" files (in this case, my feed
folder) and presents the results to me, interactively (e.g. via
dmenu) or otherwise.

At this point my feed-consumption workflow would be something like
this: first I run `feed get`, then I do other stuff while the feeds
load, and later, after a couple of minutes or so, I run `feed show`
or `feed menu`. This is still not ideal, because whenever I want to
check my feeds I still have to wait for them to be downloaded. The
only way around it would be to have `feed get` run automatically
when I am not thinking about it...

## Setting up a cron job

My personal laptop is not always connected to the internet, and in
general I do not like having too many network-related jobs running
in the background. But I do have a machine that is always connected
to the internet: the VM instance hosting this website.

Since my new setup saves my feed updates to local files, I can have
a [cron job](https://en.wikipedia.org/wiki/Cron_job) fetch the new
items and update files in a folder synced via
[syncthing](https://syncthing.net) (yes, I do have that *one* network
service constantly running in the background...). This setup is
similar to the one I use to [fetch my email](../2022-10-19-email-setup).

I rarely use cron, and I am always a little intimidated by its
syntax.
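For reference, a crontab entry is just five time fields followed by
the command to run; the fields are, in order, minute, hour, day of
month, month and day of week:

```
# ┌───────── minute       (0-59)
# │ ┌─────── hour         (0-23)
# │ │ ┌───── day of month (1-31)
# │ │ │ ┌─── month        (1-12)
# │ │ │ │ ┌─ day of week  (0-6, 0 = Sunday)
# │ │ │ │ │
# * * * * *  command
```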
But in the end, to have `feed get` run every hour I just needed to
add the following two lines via `crontab -e`:

```
MAILTO=""
0 * * * * feed get
```

(The empty `MAILTO` prevents cron from sending me an email with the
command's output every hour.)

This is my definitive new setup, and I like it. It also has the
advantage that I only need to install `sfeed` on my server and not
locally, though I prefer to still keep it around.

So far I have found one little caveat: if my feed gets updated after
I read it and before I run a `feed clear`, some items may be cleared
away before I ever see them. This is easily worked around by running
a quick `feed show` before clearing the feeds up, but it is still
worth keeping in mind.

## Conclusions

This is a summary of my latest script-crafting adventure. As I was
writing this post I realized I could probably use `sfeed_update`
to simplify the script a bit, since I do not separate feeds into
folders anymore. I have also found out that `sfeed_mbox` now exists
(at least I *think* it was not there the last time I checked) and I
could use it to browse my feed with a mail client - see also
[this video tutorial](https://josephchoe.com/rss-terminal) for a demo.

With all of this, did I solve my problem in the best possible way?
Definitely not. But does it work for me? Absolutely! Did I learn
something new while doing this? Kind of, but mostly I just exercised
skills that I already had.

All in all, it was a fun exercise.
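For the record, here is a rough sketch of what the `sfeed_update` +
`sfeed_mbox` route mentioned above might look like, assuming I
remember the defaults correctly (a `feeds()` function in
`~/.sfeed/sfeedrc`, fetched files saved under `~/.sfeed/feeds`); the
second entry is just a placeholder:

```
# Sketch of ~/.sfeed/sfeedrc, the configuration file read by sfeed_update
feeds() {
    # feed <name> <url>
    feed "tronto.net" "https://sebastiano.tronto.net/blog/feed.xml"
    feed "example"    "https://example.com/feed.xml"
}
```

After running `sfeed_update`, the fetched TSV files could then be
turned into a mailbox with something like
`sfeed_mbox ~/.sfeed/feeds/* > feeds.mbox` and browsed with
`mutt -f feeds.mbox`.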