.. _backup-and-monitoring:

Backups and Monitoring
======================

System Backup Strategies
------------------------

The most important asset of a live Registration Server is the content of its
MySQL database.

The Registration Server's MySQL databases that need to be backed up are named
``td2reg`` and (optionally) ``td2apilog``. They use MySQL's InnoDB storage
engine to provide transaction support, fast recovery and consistency.

The backup schedule depends on the number of users, their activity and your
recovery point objective. We recommend running a backup at least once a day.
The backups should be stored safely on another system.

Ideally, the time and frequency of the Registration Server backup should be
synchronized with the backup schedule used on the associated Host Server(s)
|---| this ensures that the information about Users and their Space Depots is
consistent across these servers.

In a virtualized environment, the use of VM snapshots is highly recommended,
as these provide atomic and instant full-system copies across multiple systems
that can be backed up offline.

The MySQL backup can be performed using any established MySQL backup method,
e.g. running ``mysqldump`` via a cron job, or using more sophisticated tools
like Percona XtraBackup or Oracle's MySQL Enterprise Backup. Other commercial
backup solutions usually offer MySQL-specific plugins or extensions as well.

An example MySQL backup job using ``mysqldump`` could look as follows. The SQL
dump is piped through ``gzip`` for compression before it is written to the
directory ``/backup``, using a time stamp in the file name::

    [root@regserver ~]# mysqldump -u root -p --single-transaction \
    --databases td2reg td2apilog \
    | gzip > /backup/td-regserver-mysql-$(date +%Y-%m-%d_%H.%M).sql.gz

See the MySQL documentation at
https://dev.mysql.com/doc/refman/5.1/en/backup-and-recovery.html for more
details and hints on how to define a MySQL backup strategy.

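
To run the ``mysqldump`` job above unattended from cron, it can be wrapped in a small script. The following is only a minimal sketch under stated assumptions, not an official TeamDrive script: the function names, the ``RETENTION_DAYS`` setting and the pruning step are hypothetical additions, and the MySQL credentials are assumed to come from an option file such as ``~/.my.cnf``, because a cron job cannot answer the ``-p`` password prompt.

```shell
#!/bin/sh
# Sketch of a daily backup wrapper around the mysqldump call shown above.
# BACKUP_DIR and RETENTION_DAYS are illustrative assumptions.
BACKUP_DIR="${BACKUP_DIR:-/backup}"
RETENTION_DAYS="${RETENTION_DAYS:-14}"

# Print the dump file path for the current time, using the same
# naming scheme as the manual example.
backup_file() {
    echo "$BACKUP_DIR/td-regserver-mysql-$(date +%Y-%m-%d_%H.%M).sql.gz"
}

# Delete dumps in BACKUP_DIR that are older than roughly RETENTION_DAYS days.
prune_old_backups() {
    find "$BACKUP_DIR" -maxdepth 1 -type f \
        -name 'td-regserver-mysql-*.sql.gz' \
        -mtime +"$RETENTION_DAYS" -exec rm -f {} +
}

# Dump both databases, compress the result and prune old dumps.
# The dump is written to a plain file first so that a mysqldump failure
# is detected; in a pipeline, POSIX sh only reports gzip's exit status.
# Assumption: credentials are read from an option file (e.g. ~/.my.cnf).
run_backup() {
    target=$(backup_file)
    dump="${target%.gz}"
    if mysqldump --single-transaction \
            --databases td2reg td2apilog > "$dump"; then
        gzip -f "$dump" && prune_old_backups
    else
        rm -f "$dump"
        return 1
    fi
}

# Uncomment to perform a backup when the script is executed:
# run_backup
```

With the last line enabled and the script saved under a path of your choice (for example the hypothetical ``/usr/local/sbin/td-regserver-backup.sh``), a crontab entry such as ``30 2 * * * /usr/local/sbin/td-regserver-backup.sh`` would run it once a day. Writing the dump to a plain file first needs more temporary disk space than the piped variant, but it keeps a failed ``mysqldump`` run from leaving a truncated archive behind.
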
If the I/O overhead introduced by running the backup job on the production
database is a concern, we recommend setting up a MySQL replication slave on
another host and using it to perform the backup. This second MySQL instance can
also function as a hot standby server for high-availability purposes. More
details about MySQL replication and high availability can be found in the
MySQL reference manual at
https://dev.mysql.com/doc/refman/5.1/en/replication.html and
https://dev.mysql.com/doc/refman/5.1/en/ha-overview.html.

In addition to the MySQL databases, we recommend creating backup copies of the
Server's configuration files and the email templates located in
``$PRIMEBASEHOME/setup/scripts/template/``. Please refer to the *TeamDrive
Registration Server Installation Guide* for details on the relevant
configuration files. These files should be backed up at least every time you
change them.

These backups can be performed using any file-based backup method, e.g. using
``tar``, ``rsync`` or more sophisticated backup tools such as Amanda or Bacula.

System Monitoring
-----------------

We highly recommend setting up system monitoring so that you receive
notifications in case of critical conditions or failures.

Since the TeamDrive Registration Server is based on standard Linux components
like the Apache HTTP Server and the MySQL database, almost any system
monitoring solution can be used to monitor the health of these services.

We recommend using Nagios or a derivative like Icinga or Centreon. Other
well-established monitoring systems like Zabbix or Munin will also work. Most
of these offer standard checks to monitor CPU usage, memory utilization, disk
space and other critical server parameters.

In addition to these basic system parameters, the existence and operational
status of the following services/processes should be monitored:

- The MySQL Server (system process ``mysqld``) is up and running and
  answering SQL queries
- The Apache HTTP Server (``httpd``) is up and running and answering HTTP
  requests. This can be verified by accessing the following URL:
  https://regserver.yourdomain.com/pbas/td2as/reg/ping.xml?tdns=$true (remove
  the ``?tdns=true`` part, if your Registration Server is not connected to
  the TeamDrive Name Service TDNS)
- The ``teamdrive`` auto task is running (process name ``pbac``)
- The mail service (e.g. a local ``postfix`` instance) is up and running and
  mails are sent out correctly
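
The process and URL checks listed above could be combined into a single Nagios-style check. The following is a rough sketch under stated assumptions rather than a supported plugin: the function names and the ``PING_URL`` variable are hypothetical, the URL is the placeholder host name from the list with the ``tdns`` parameter left out, and the mail service check is omitted because it depends on the local mail setup. It only verifies that the processes exist; testing that MySQL actually answers queries would need an additional step such as ``mysqladmin ping`` with suitable credentials.

```shell
#!/bin/sh
# Nagios-style check sketch: return code 0 = OK, 2 = CRITICAL.
# The process names mysqld, httpd and pbac are taken from the list above;
# PING_URL is a placeholder and must be adjusted to your Registration Server.
PING_URL="${PING_URL:-https://regserver.yourdomain.com/pbas/td2as/reg/ping.xml}"

# Succeed if at least one process with exactly this name is running.
process_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# Succeed if the URL is answered without an HTTP error within 10 seconds.
url_answers() {
    curl --silent --fail --max-time 10 --output /dev/null "$1"
}

# Run all checks, print one status line and return a Nagios exit code.
check_regserver() {
    failed=""
    for proc in mysqld httpd pbac; do
        process_running "$proc" || failed="$failed $proc"
    done
    url_answers "$PING_URL" || failed="$failed ping.xml"
    if [ -n "$failed" ]; then
        echo "CRITICAL - failing checks:$failed"
        return 2
    fi
    echo "OK - all Registration Server checks passed"
    return 0
}

# Uncomment to use the script as a plugin:
# check_regserver; exit $?
```

Returning 0 for OK and 2 for CRITICAL follows the exit code convention used by Nagios plugins, so Nagios, Icinga and compatible systems can call the script directly once the last line is enabled.
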