Grimoire-Command.es

GNU+Linux command memo

Web Daily Diff, sent by email

Automatically check the content of a web page every day

1. WebDailyDiff.sh

WebDailyDiff.sh

#! /bin/bash
# Name      : WebDailyDiff.sh
# Author    : Simon Descarpentries - simon /\ acoeuro [] com
# Date      : 2013-11-26, 2018-12-12, 2021-09-07
# Licence   : GPLv3
# Usage     : WebDailyDiff.sh title URL dest@monitorer ["exclude|pattern"] (1)

cd ~/.sbin || exit 1  # stop here rather than write files into the wrong directory

BASELINE='Baseline'
CURDATE=`date +%F_%X`
HTML='.html'
TXT='.txt'

wget -U Mozilla --quiet "$2" -O "$1${CURDATE}${HTML}"
html2text "$1${CURDATE}${HTML}" > "$1${CURDATE}${TXT}"

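EXCLUDE=${4:-'^$'}  # no 4th argument: '^$' matches none of the '>' lines, so nothing is excluded
CURDIFF=$(diff -d "$1${BASELINE}${TXT}" "$1${CURDATE}${TXT}" | grep -e '^>' | grep -v -E "$EXCLUDE")  # (2)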

if [ "$CURDIFF" != '' ]; then
    echo "$CURDIFF" | mail -E -s "[$1] Diff on $2" "$3"
    cp "$1${CURDATE}${TXT}" "$1${BASELINE}${TXT}"
fi

rm "$1${CURDATE}${HTML}"
rm "$1${CURDATE}${TXT}"  # (3)
(1) Here are example lines to add with crontab -e:
1 1 * * * /home/$user/WebDailyDiff.sh $title http://a.com/ monitorer@dom.tld "exclude|pattern"
3 7 * * * ~/.sbin/WebDailyDiff.sh "Alerte_FDN" "http://fdn.ldn-fai.net/" siltaar@XXX.fr
(2) The last grep on this line drops the lines matching a given regular expression, for instance lines that change every day (e.g. "Local time|UTC time|Load Average").
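(3) For the very first run, comment out this line and run the script so that the dated .txt file is kept; it can then be copied to <title>Baseline.txt, the name the script compares against.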
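
To see what the filtering described in note 2 does without fetching a real page, here is a minimal sketch. The two text files, their content and the variable names (workdir, exclude, changes) are made up for the illustration; only the diff and grep pipeline comes from the script above.

```shell
#!/bin/bash
# Hypothetical demo of the filtering step; the file contents are invented.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A fake baseline and a fake "today" version of the same page, as text.
printf 'Welcome\nLocal time: 10:00\nStatus: OK\n' > "$workdir/baseline.txt"
printf 'Welcome\nLocal time: 11:30\nStatus: DEGRADED\n' > "$workdir/current.txt"

# Lines matching this extended regular expression are considered noise.
exclude='Local time|UTC time|Load Average'

# Keep only the lines added in the current version ('>' lines of the diff),
# then drop those matching the exclusion pattern.
changes=$(diff -d "$workdir/baseline.txt" "$workdir/current.txt" | grep -e '^>' | grep -v -E "$exclude")

echo "$changes"
```

Only the status line is printed here: the "Local time" line changed too, but it is filtered out, so it would not trigger an email on its own.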

2. urlwatch

On Debian-based distributions, the urlwatch package provides the urlwatch command, which performs the same task.