Automatically checking the content of a web page every day
1. WebDailyDiff.sh
WebDailyDiff.sh
#! /bin/bash
# Name : WebDailyDiff.sh
# Author : Simon Descarpentries - simon /\ acoeuro [] com
# Date : 2013-11-26, 2018-12-12, 2021-09-07
# Licence : GPLv3
# Usage : WebDailyDiff.sh title URL dest@monitorer ["exclude|regex"] (1)
cd ~/.sbin || exit 1
BASELINE='Baseline'
CURDATE=`date +%F_%X`
HTML='.html'
TXT='.txt'
wget -U Mozilla --quiet "$2" -O "$1${CURDATE}${HTML}"
html2text "$1${CURDATE}${HTML}" > "$1${CURDATE}${TXT}"
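EXCLUDE=${4:-'^$'}   # optional 4th argument; '^$' never matches a diff line, so nothing is dropped
CURDIFF=`diff -d "$1${BASELINE}${TXT}" "$1${CURDATE}${TXT}" | grep -e '^>' | grep -v -E -e "$EXCLUDE"` (2)
if [ "$CURDIFF" != '' ]; then
echo "$CURDIFF" | mail -E -s "[$1] Diff on $2" "$3"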
cp "$1${CURDATE}${TXT}" "$1${BASELINE}${TXT}"
fi
rm "$1${CURDATE}${HTML}"
rm "$1${CURDATE}${TXT}" (3)
1 | Here are example lines to add with crontab -e:
1 1 * * * /home/$user/WebDailyDiff.sh $title http://a.com/ monitorer@dom.tld "exclude|pattern"
3 7 * * * ~/.sbin/WebDailyDiff.sh "Alerte_FDN" "http://fdn.ldn-fai.net/" siltaar@XXX.fr |
2 | The last grep on this line drops the added lines that match the optional RegEx given as 4th argument, typically lines that change every day (e.g. "Local time|UTC time|Load Average"). |
3 | For the first run, comment out this line and run the script, then rename the remaining text file to create your baseline (e.g. Alerte_FDN<date>.txt to Alerte_FDNBaseline.txt). |
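Instead of commenting out line (3) by hand, the baseline could be created automatically on the first run. The following is a minimal sketch of that idea, not part of the original script: the function name check_update is hypothetical, and it only covers the diff and baseline logic on local text files (no wget, html2text or mail).

```bash
#!/bin/bash
# Hypothetical helper, not part of the original WebDailyDiff.sh.
# Usage: check_update TITLE NEW_TXT [EXCLUDE_REGEX]
check_update() {
    local baseline="${1}Baseline.txt"   # same naming as ${1}${BASELINE}${TXT}
    local new="$2"
    local exclude=${3:-'^$'}            # '^$' never matches a diff line
    local added

    # First run: no baseline yet, so store the current text and report nothing
    if [ ! -f "$baseline" ]; then
        cp "$new" "$baseline"
        return 0
    fi

    # Keep only the lines added since the baseline, minus the excluded ones
    added=$(diff -d "$baseline" "$new" | grep -e '^>' | grep -v -E -e "$exclude")

    # Print and refresh the baseline only when something new appeared
    if [ -n "$added" ]; then
        printf '%s\n' "$added"
        cp "$new" "$baseline"
    fi
}
```

With this approach the script can be launched straight from cron on day one; the trade-off is that the first run never sends an alert, since there is nothing to compare against yet.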
2. urlwatch
In the urlwatch package of Debian-based distributions, you can find the urlwatch command, which fulfills the same task.
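A minimal sketch of the same FDN watch with urlwatch, assuming a 2.x release; option names may differ in the version your distribution ships, so check urlwatch --help first:

```bash
# Install the command (Debian-based distributions)
sudo apt install urlwatch

# Register the page to watch; name and URL reuse the crontab example of section 1
urlwatch --add url=http://fdn.ldn-fai.net/,name=Alerte_FDN

# List the registered jobs to check the result
urlwatch --list

# Enable a reporter (e.g. email) in the YAML configuration file
urlwatch --edit-config

# Then run it daily from crontab -e, e.g.:
# 3 7 * * * urlwatch
```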