Backups and Monitoring¶
Host Server Backup Considerations¶
The two most important assets of a TeamDrive Host Server are the storage volumes that hold the actual TeamDrive Spaces and the MySQL database that stores the related metadata.
The backup schedule depends on the number of users, their activity and your recovery point objective. We recommend running a backup at least once a day. The backups should be stored safely on another system.
Ideally, the time and frequency of the Host Server backup should be synchronized with the backup schedule used on the associated Registration Server — this ensures that the information about Users and their Space Depots is consistent across these systems.
In a virtualized environment, the usage of VM snapshots is highly recommended, as these provide atomic and instant full-system copies across multiple instances that can be backed up offline.
The backup of the Host Server’s Space Volume(s) can be performed with any file system backup tool.
When planning a backup of the volume containing the TeamDrive Spaces, keep in mind that the last.log files, located in the protolog directory inside each Space directory, are frequently updated by the TeamDrive Clients. New Space data and events are constantly appended to these files. When a log file reaches a certain size (currently 8 MB, but this value is not fixed and could change in later versions of TeamDrive), it is renamed and a new last.log file is created. This operation is initiated by the Clients. The naming scheme is to rename last.log to <number>.log, where <number> is the next free number, starting from 0. Previously renamed log files are no longer modified, but they must remain available to the Clients, because these logs are read when a Space is joined.
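Because renamed log files never change again, a backup copy of a Space can be sanity-checked by comparing these files byte for byte with the live ones. The following is a minimal sketch of such a check and not part of TeamDrive; the function name check_rotated_logs and the backup path in the usage note are hypothetical. It skips last.log, which may legitimately differ between the live system and the backup.

```shell
#!/bin/bash
# Sketch: compare the renamed log files (<number>.log) of one protolog
# directory with their backup copies. Hypothetical helper, not part of TeamDrive.
# Usage: check_rotated_logs <live protolog dir> <backup protolog dir>
check_rotated_logs() {
    local live="$1" backup="$2" status=0 f name stem
    for f in "$live"/*.log; do
        [ -e "$f" ] || continue          # the glob matched nothing
        name=$(basename "$f")
        stem=${name%.log}
        case "$stem" in
            ''|*[!0-9]*) continue ;;     # skips last.log and other non-numeric names
        esac
        # Renamed logs are never modified again, so any difference is an error.
        if ! cmp -s "$f" "$backup/$name" 2>/dev/null; then
            echo "MISSING OR DIFFERENT: $name"
            status=1
        fi
    done
    return "$status"
}
```

A call could look like check_rotated_logs /spacedata/vol01/3/protolog /backup/vol01/3/protolog, where /backup is only a placeholder for your backup location. The function returns 0 if every renamed log file has an identical copy in the backup.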
To create a consistent backup, the best approach is to take a snapshot of the entire Space Volume file system, preferably after shutting down the Apache HTTP Server. If you are using an incremental backup method like rsync, keep in mind that some Spaces may change while the rsync job is still running. For consistency, we suggest performing a full rsync run while the service is running (to sync the bulk of the changes), then briefly changing the volume’s status to Standby or shutting down the Apache HTTP Server and running rsync once more to transfer the changes that have occurred in the meantime. Once the second rsync job has finished, the Apache HTTP Server can be started again.
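The two-pass approach can be sketched as a small shell function. This is an illustration under stated assumptions, not a supported TeamDrive tool: it assumes that Apache runs as a systemd service named httpd, it uses the variant that stops Apache rather than setting the volume to Standby, and the names run and two_pass_rsync are hypothetical. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
#!/bin/bash
# Sketch of the two-pass rsync approach. Assumes a systemd-managed Apache
# service named "httpd"; adjust the commands for your distribution.

# Print commands instead of running them when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: two_pass_rsync <Space Volume directory> <backup directory>
two_pass_rsync() {
    local src="$1" dest="$2" rc=0
    # Pass 1: service still running, transfers the bulk of the data.
    # rsync exit code 24 (source files vanished during the transfer) is
    # tolerated here because Clients keep changing files.
    run rsync -a "$src/" "$dest/" || rc=$?
    if [ "$rc" -ne 0 ] && [ "$rc" -ne 24 ]; then
        return "$rc"
    fi
    rc=0
    # Pass 2: stop Apache so that Clients cannot change Spaces, then sync the rest.
    run systemctl stop httpd || return 1
    run rsync -a "$src/" "$dest/" || rc=$?
    # Always start Apache again, even if the second pass failed.
    run systemctl start httpd || rc=$?
    return "$rc"
}
```

Running DRY_RUN=1 first, e.g. with /spacedata/vol01 as the source and a backup directory of your choice as the destination, shows the four commands in order without touching the service. The sketch does not pass --delete to rsync, so files removed on the server remain in the backup copy.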
The MySQL databases must also be backed up periodically, ideally at the same time as the Space Volume(s). This ensures a consistent snapshot of the file system and the related metadata stored in the MySQL database.
The Host Server’s MySQL databases that need to be backed up are named pspace and (optionally) hostapilog. They use MySQL’s InnoDB storage engine to provide transaction support, fast recovery and consistency. The size of the Host Server’s MySQL databases is usually quite small if API logging is not enabled.
The MySQL backup can be performed using any established MySQL backup method, e.g. running mysqldump via a cron job, or using more sophisticated tools like Percona XtraBackup or Oracle’s MySQL Enterprise Backup. Other commercial backup solutions usually offer MySQL-specific plugins or extensions as well.
An example MySQL backup job using mysqldump could look as follows. The SQL dump is piped through gzip for compression before it is written to the directory /backup, using a time stamp in the file name:
[root@hostserver ~]# mysqldump -u root -p --single-transaction \
    --databases pspace hostapilog \
    | gzip > /backup/td-hostserver-mysql-$(date +%Y-%m-%d_%H.%M).sql.gz
See the MySQL documentation at https://dev.mysql.com/doc/refman/5.1/en/backup-and-recovery.html for more details and hints on how to define a MySQL backup strategy.
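The interactive command above prompts for a password (-p), which does not work in a cron job. The following sketch shows one possible non-interactive variant. It reads the credentials from a MySQL option file, writes to a temporary file first and only keeps the dump if gzip can verify it. The function name dump_hostserver_dbs, the variables MYSQLDUMP and DATABASES, and the option file location are assumptions made for this example.

```shell
#!/bin/bash
# Sketch of a non-interactive variant of the mysqldump job for use from cron.
# MYSQLDUMP and DATABASES can be overridden; remove hostapilog from DATABASES
# if that database does not exist on your server.
MYSQLDUMP="${MYSQLDUMP:-mysqldump}"
DATABASES="${DATABASES:-pspace hostapilog}"

# Usage: dump_hostserver_dbs <output directory> <MySQL option file>
dump_hostserver_dbs() (
    set -o pipefail                  # a failing mysqldump must fail the pipeline
    outdir="$1"
    cnf="$2"                         # option file with a [client] user and password
    out="$outdir/td-hostserver-mysql-$(date +%Y-%m-%d_%H.%M).sql.gz"
    # --defaults-extra-file must be the first option on the command line.
    # $DATABASES is intentionally unquoted so that it splits into separate names.
    "$MYSQLDUMP" --defaults-extra-file="$cnf" --single-transaction \
        --databases $DATABASES | gzip > "$out.tmp" \
        && gzip -t "$out.tmp" \
        && mv "$out.tmp" "$out" \
        && echo "$out" \
        || { rm -f "$out.tmp"; exit 1; }
)
```

On success the function prints the name of the finished dump file. On failure it removes the partial file and returns a non-zero status, so cron can report the error. The option file should be readable by root only.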
If the I/O overhead introduced by running the backup job on the production database is a concern, we recommend setting up a MySQL replication slave on another host and running the backup against that instance. This second MySQL instance can also function as a hot standby server for high-availability purposes.
More details about MySQL replication and high availability can be found in the MySQL reference manual at https://dev.mysql.com/doc/refman/5.1/en/replication.html and https://dev.mysql.com/doc/refman/5.1/en/ha-overview.html.
In addition to the Space Volumes and MySQL databases, we recommend creating backup copies of the Server’s configuration files. Please refer to the TeamDrive Host Server Installation Guide for details on the relevant configuration files.
These files should be backed up at least whenever you change them. These backups can be performed using any file-based backup method, e.g. tar, rsync or more sophisticated backup tools such as Amanda or Bacula.
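As a minimal sketch of the tar variant, the following function archives whatever files or directories it is given and then reads the archive back to verify it. The function name backup_config is hypothetical, and the list of configuration files is deliberately left to the caller, because the relevant paths are documented in the Installation Guide and not here.

```shell
#!/bin/bash
# Sketch: archive a list of configuration files and verify the archive.
# Usage: backup_config <archive.tar.gz> <file or directory>...
backup_config() {
    local archive="$1"
    shift
    [ "$#" -gt 0 ] || { echo "no files given" >&2; return 1; }
    # Create the compressed archive, then list it to confirm it is readable.
    tar -czf "$archive" "$@" && tar -tzf "$archive" > /dev/null
}
```

Using a time stamp in the archive name, as in the mysqldump example, keeps older versions of the configuration available.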
Restoring individual Spaces or Volumes¶
If the data of a single Space or of a complete Volume is corrupted or lost, it is possible to restore the Space or Volume data from a previously created backup.
An example scenario would be a Space that was entirely deleted by a user by accident, or the recovery of a file that was moved to the Space’s Trash Folder and the Trash was then emptied.
Note
It is not possible to restore an individual file from a particular Space on the Host Server: due to the client-side encryption, the correct file cannot be determined on the server side. However, it is possible to restore the entire state of a Space and all of its files to a previous version, which allows the user to extract the missing file(s) on the client side.
An additional challenge is identifying the Space(s) you want to restore; by default, Space names are not stored on the Host Server and are only referenced by their ID. Take extra caution and double check you’re working on the correct Space.
The process of restoring a Space or Volume involves the following steps:
- Identify the ID of the Space or Volume you want to restore.
- Deactivate the Space or all Spaces of a Volume by setting the “Deactivated for restore” status, to prevent TeamDrive Clients from accessing the affected TeamDrive Spaces.
- Restore the Space(s) by restoring the necessary Space directory or entire volume directory from your backup to the corresponding location.
- Reactivate the Space(s) to make it/them available to the TeamDrive Clients again.
- The Clients will be notified that a Space recovery is required, which should be performed according to the procedure outlined in the TeamDrive Client documentation.
If a restore of a single Space is required, the task of (de-)activating it can be performed via the Host Server’s Administration Console. Open the Space Details page, check the Deactivated for restore checkbox and click Save to change the state.
After the restore has finished, click Complete Restore to re-enable the Space.
Alternatively, the Space deactivation and reactivation can be performed on the command line, as outlined below.
If you need to restore an entire volume, all Spaces contained in this volume need to be marked for restore. You can use a script included in the Host Server installation, which performs the tasks of deactivating and reactivating all Spaces on the affected volume.
In any case, the actual task of restoring the Space(s) from backup has to be performed manually by the administrator, and the TeamDrive Clients will have to perform a local Space recovery to get the local Spaces back into a consistent state.
The following examples describe how to use the “Restore Script” to restore an individual Space as well as an entire Space Volume.
Load and start the Restore Script¶
The Restore Script has to be executed using the Yvva Runtime Environment’s command-line shell, yvva:
[root@hostserver ~]# yvva
Welcome to yvva shell (version 1.2.0).
Enter "go" or end the line with ';;' to execute submitted code.
For a list of commands enter "help".
> execute file 'RestoreSpace.pbt' location 'setup/scripts/Restore';;
Usage:
-------
[] List volumes available for restore:
RestoreSpace:volumes();
[] List spaces availbale for restore:
RestoreSpace:spaces();
[] Deactivate volume before restoring:
RestoreSpace:deactivateVolume(<ID>);
i.e. 'RestoreSpace:deactivateVolume(5)'; to deactivate volume with ID 5.
[] Deactivate space before restoring:
RestoreSpace:deactivateSpace(<ID>);
i.e. 'RestoreSpace:deactivatSpace(7);' to deactivate space with ID 6.
[] Reactivate volume after restoring:
RestoreSpace:reactivateVolume(<ID>);
i.e. 'RestoreSpace:reactivateVolume(5);' to reactivate volume with ID 5.
[] Reactivate space after restoring:
RestoreSpace:reactivateSpace(<ID>);
i.e. 'RestoreSpace:reactivateSpace(7);' to reactivate space with ID 7.
Loading the Restore Script does not execute any task yet; it only displays brief usage information as shown above. You can execute a task by entering its name and providing the respective Volume or Space ID in parentheses. Finish the command with two consecutive semicolons to execute the submitted code immediately.
Identify the Space you want to Restore¶
Usually, the ID of the Space to restore should be obtained from the TeamDrive Client. The Space Information displayed in the Client window for each Space contains a field Space ID that holds the ID used on the Host Server. The actual Space data is stored in a subdirectory below the Space volume, using the Space ID as the directory name.
To list all available Spaces on the Hosting Service, execute the command RestoreSpace:spaces():
> RestoreSpace:spaces();;
.----------------------------------------------------------------------------.
| Spaces                                                                     |
|----------------------------------------------------------------------------|
| Volume | ID     | Title                         | ResID | Status           |
|--------|--------|-------------------------------|-------|------------------|
| vol01  | 3      |                               | 2     | 0                |
| vol01  | 4      |                               | 3     | 0                |
'----------------------------------------------------------------------------'
A list of all active Spaces will be displayed which should look similar to the list shown above. Identify the ID of the Space you want to restore.
Deactivate the Space to Restore¶
After identifying the Space you want to restore, deactivate it by passing its ID to the command RestoreSpace:deactivateSpace(<ID>):
> RestoreSpace:deactivateSpace(3);;
140610 13:49:56 [Note] Deactivate space [3]
140610 13:49:56 [Note] Space with ID '3' deactivated.
In the example above the Space with the ID ‘3’ has been deactivated. After a short while, the Client will notice this change and mark the Space accordingly on its side.
Restore Backup¶
After deactivating a Space, you can restore its data by copying the backup of that Space into the corresponding location on the Space volume, e.g. /spacedata/vol01/3 in our case.
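The copy step can be sketched as follows. This is one possible approach under stated assumptions, not an official procedure: instead of overwriting the damaged Space directory, the function moves it to a separate directory first and then copies the backup into place. The function name restore_space_dir and the backup and aside paths in the usage note are hypothetical.

```shell
#!/bin/bash
# Sketch: replace a deactivated Space directory with its backup copy.
# Usage: restore_space_dir <backup dir> <Space dir> <aside dir>
restore_space_dir() {
    local backup="$1" target="$2" aside="$3"
    [ -d "$backup" ] || { echo "backup not found: $backup" >&2; return 1; }
    [ -d "$aside" ]  || { echo "aside directory not found: $aside" >&2; return 1; }
    # Keep the current (damaged) data instead of overwriting it.
    if [ -e "$target" ]; then
        mv "$target" "$aside/$(basename "$target").$(date +%Y%m%d%H%M%S)" || return 1
    fi
    # -a preserves ownership, permissions and time stamps (when run as root).
    cp -a "$backup" "$target"
}
```

For the example Space this could be called as restore_space_dir /backup/vol01/3 /spacedata/vol01/3 /root/restore-aside, run only after the Space has been deactivated. Keeping the old directory outside the Space volume makes it possible to go back if the wrong backup was chosen.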
Reactivate Space¶
After copying the backup to the deactivated Space, you have to reactivate the Space to make it available to the TeamDrive Clients again. To reactivate a Space, execute the command RestoreSpace:reactivateSpace(<ID>):
> RestoreSpace:reactivateSpace(3);;
140610 13:54:07 [Note] Reactivate space [3]
140610 13:54:07 [Note] /spacedata/vol01/3
140610 13:54:07 [Note] Space '3' reactivated successful. Restore ID: 3.
Restore Log No: 0, Restore Log Offset: 855
In the example above the Space with the ID ‘3’ has been reactivated.
The Client will now notify the user that the Space has been reactivated and that a local Space recovery operation has to be performed.
Identify the Volume you want to Restore¶
To list all available volumes on the Hosting Service, execute the command RestoreSpace:volumes():
> RestoreSpace:volumes();;
.--------------------------------------------------------.
| Volumes                                                |
|--------------------------------------------------------|
| ID    | Name  | Status                                 |
|-------|-------|----------------------------------------|
| 1     | vol01 | Operational                            |
| 2     | vol02 | Operational                            |
'--------------------------------------------------------'
A list with the volumes will be displayed which should look similar to the list shown above. Identify the ID of the volume you want to restore.
Deactivate the Volume to Restore¶
After identifying the volume you want to restore, you have to deactivate the volume with all its Spaces. To deactivate a volume, execute the command RestoreSpace:deactivateVolume(<ID>):
> RestoreSpace:deactivateVolume(1);;
140603 17:13:21 [Note] Deactivate volume [1]
140603 17:13:21 [Note] 2 Spaces of Volume 'vol01' left to deactivate.
140603 17:13:21 [Note] Space with ID '3' deactivated.
140603 17:13:21 [Note] Space with ID '4' deactivated.
In the example above, the volume with the ID ‘1’ and all Spaces within that volume have been deactivated.
Restore Backup¶
After deactivating a volume, you can now restore its data by copying the backup of that volume into the corresponding location.
Reactivate Volume¶
After copying the backup to the deactivated Volume, you have to reactivate the Volume to make its Spaces available to the TeamDrive Clients again. To reactivate a Volume, execute the command RestoreSpace:reactivateVolume(<ID>):
> RestoreSpace:reactivateVolume(1);;
140603 17:15:37 [Note] Reactivate volume [1]
140603 17:15:37 [Note] Reactivate space [3]
140603 17:15:37 [Note] /spacedata/vol01/3
140603 17:15:37 [Note] Space '3' reactivated successful. Restore ID: 2.
Restore Log No: 0, Restore Log Offset: 855
140603 17:15:37 [Note] Reactivate space [4]
140603 17:15:37 [Note] /spacedata/vol01/4
140603 17:15:37 [Note] Space '4' reactivated successful. Restore ID: 2.
Restore Log No: 0, Restore Log Offset: 384
In the example above all Spaces on the Volume with the ID ‘1’ have been reactivated.
Exit the yvva session¶
You can close the yvva session by typing quit or pressing Ctrl+D at the > prompt.
Setting up Server Monitoring¶
It’s highly recommended to set up some kind of system monitoring, to receive notifications in case of any critical conditions or failures.
Since the TeamDrive Host Server is based on standard Linux components like the Apache HTTP Server and the MySQL database, almost any system monitoring solution can be used to monitor the health of these services.
We recommend using Nagios or a derivative like Icinga or Centreon. Other well-established monitoring systems like Zabbix or Munin will also work. Most of these offer standard checks to monitor CPU usage, memory utilization, disk space (especially the file systems providing the TeamDrive Space Volumes) and other critical server parameters.
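For the disk space of a Space Volume, a check in the style of a Nagios plugin could look like the following sketch. It is only an illustration: the function names and the thresholds are hypothetical, and the monitoring systems named above already ship comparable checks. The return codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```shell
#!/bin/bash
# Sketch of a Nagios-style disk usage check for a Space Volume.

# Print the usage of the file system containing <path> as an integer percentage.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Usage: check_volume_space <path> <warning %> <critical %>
check_volume_space() {
    local path="$1" warn="$2" crit="$3" used
    used=$(disk_usage_percent "$path")
    if [ -z "$used" ]; then
        echo "UNKNOWN: cannot determine usage of $path"; return 3
    elif [ "$used" -ge "$crit" ]; then
        echo "CRITICAL: $path is ${used}% full"; return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING: $path is ${used}% full"; return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

A call such as check_volume_space /spacedata/vol01 80 90 would warn at 80% and report a critical state at 90% usage; suitable thresholds depend on how quickly your Spaces grow.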
In addition to these basic system parameters, the existence and operational status of the following services/processes should be monitored:
- The MySQL Server (system process mysqld) is up and running and responding to SQL queries
- The Apache HTTP Server (httpd) is up and running and responding to HTTP requests (this can be verified by accessing the files http://hostserver.yourdomain.com/ping.xml and http://hostserver.yourdomain.com/admin/ping.xml)
- The td-hostserver service is up and running (process name yvvad)
- For Host Servers using an Amazon S3 compatible object store (see Setting up an Amazon S3-Compatible Object Store for details): the s3d process is up and running
- For Host Servers using TeamDrive Scalable Hosting Storage (TSHS, see TeamDrive Scalable Hosting Storage for details): the tshs process is up and running (and all related MySQL nodes are up and running, too)
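The process and HTTP checks from the list above can be sketched with pgrep and curl. The function names are hypothetical and the script is a starting point, not a replacement for a monitoring system. It does not test whether MySQL answers SQL queries, which requires database credentials.

```shell
#!/bin/bash
# Sketch of basic service checks. Process names and ping.xml paths are taken
# from the list above; the function names are made up for this example.

# Check that a process with exactly this name exists.
check_process() {
    if pgrep -x "$1" > /dev/null 2>&1; then
        echo "OK: process $1 is running"
    else
        echo "CRITICAL: process $1 is not running"; return 2
    fi
}

# Check that a URL answers with a successful HTTP status within 10 seconds.
check_url() {
    if curl -fsS --max-time 10 -o /dev/null "$1" 2> /dev/null; then
        echo "OK: $1 is reachable"
    else
        echo "CRITICAL: $1 is not reachable"; return 2
    fi
}

# Run all checks and return 2 if any of them failed.
# Usage: check_hostserver <host name>
check_hostserver() {
    local host="$1" rc=0 p
    for p in mysqld httpd yvvad; do
        check_process "$p" || rc=2
    done
    check_url "http://$host/ping.xml" || rc=2
    check_url "http://$host/admin/ping.xml" || rc=2
    return "$rc"
}
```

On installations that use an S3 compatible object store or TSHS, s3d or tshs can be added to the process list in check_hostserver.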