How to Re-Use VirtualWisdom Data in 3rd-Party Tools

Everyone I’ve personally met who has witnessed the detail of VirtualWisdom metrics tends to be first amazed, then relates it to LAN and ethernet tools, then questions why we haven’t seen this before. The next question in very large organizations is “how can we re-use this data in our [insert home-grown tool here] ?”

Incorporating VirtualWisdom into an organization has various points of “friction”: training on a new tool, understanding the metrics, collection of data to help VirtualWisdom correlate, and beginning to use it internally. As a Virtual Instruments Field Application Engineer (AE or FAE), I tend to see the initial friction (collection of data, such as nicknames, or grouping Business-Units as UDCs. The less common friction is “OK, we love VirtualWisdom, but our expansive storage team want to exploit the metrics in our custom home-grown planning tools”.

Converting VirtualWisdom into basic data-collector ignores the reporting, recording, and alerting capabilities it offers; re-using its data in multiple entities of a corporation is an expansion on VirtualWisdom’s utility, and I’m more than happy to help a customer do that. The more we can help our customers make an informed decision — including leveraging more data in their own tools — the more we can help our customers “free their data” and improve the performance and reliability of a complex data-storage environment.

My entries here on the Virtual Instruments Bast Practices blog tend to be of a how-to nature; in this article, I’d like to show how the opensource tool “MailDropQueue” can help push VirtualWisdom data into your home-grown toolset.

There was a time that customers tried to find our reports by digging through the “exports” directory after a new report was produced — because the most recent report.csv.zip is the correct one, right? This ran into problems when users generated reports just before scheduled reports, and when the scheduled “searcher” went searching, the wrong report would be found. Additionally, some reports took a long time, and would not always be finished by the time the customer’s scripts went searching. Customers typically knew what script they could run to consume the data and push it to their own systems, but the issues in finding that file (moreso due to a lack of shell on Windows) caused this solution to become over-complex.

Replication at the database level gives us the same problem: the data is in a schema, and it’s difficult to make sense of without the reporting engine.

A while ago, VirtualWisdom gained the ability to serialize colliding reports: if a user asks for a report at the same time the system or another user is generating a report, the requests get serialized. This allows VirtualWisdom to avoid deadlock/livelock situations at the risk of the delay we’re used to at printers: your two-page TPS Reports are waiting behind a 4000 page print of the History of the World, Part 1A. The benefit of a consistently-responsive VirtualWisdom platform are well worth this benefit. Unfortunately, the API that many users ask for poses this same risk: adding a parallel load onto VirtualWisdom that needs an immediate response, adding delay in both responses and risking concurrency delays at the underlying datastore.

The asynchronous approach — wherein VirtualWisdom can generate data to share through its reporting engine — is more cooperative to VirtualWisdom’s responsiveness, but returns us to the issue of “how do I find that report on the filesystem?  The correct report?”

MailDropQueue is a tool in the traditional nature of UNIX: small things that do specific jobs. UNIX was flush with small tools such as sed, awk, lpr, wc, nohup, nice, cut, etc that could be streamed to achieve complex tasks. In a similar way, MailDropQueue receives an email, strips off the attachment, and for messages matching certain criteria, executes actions for each.

It’s possible for VirtualWisdom to generate “the right data” (blue section, above), send it to MailDropQueue (red portion, above), and have MailDropQueue execute the action on that attachment (green part above).  In our example, let’s consider where a customer knows what they want to do with a CSV file; suppose they have a script such as:

@echo off
call DATABASE-IMPORT.BAT TheData.CSV

The actual magic in this script isn’t as important as the fact that we can indeed trigger it for every attachment we see to a certain destination. Now all we need is to make a destination trigger this script (ie the green portion of the diagram above):


<?xml version='1.0' ?>
<actions>

  <trigger name="all">
    <condition type="true"/>
    <action>IMPORT</action>
  </trigger>

  <script id="IMPORT" name="import" script="DATABASE-IMPORT.BAT" parameters="$attachmentname"/>

</actions>

From the above, the “condition type=true” stands out, but it is possible to constrain this once we know it works, such as to trigger that specific script only when the recipient email matches “ftp@example.com”:


  <condition type="equal">
    <recipient/>
    <value>ftp@example.com</value>
  </condition>

Also, it’s not so obvious, but the result of every received email that matches the condition (“true”) is to run the script with the attachment as the first parameter. This means that if an email arrives with an attachment “performance.csv.zip”, MailDropQueue would Runtime.exec("DATABASE-IMPORT.BAT performance.csv.zip").

For reference, I’m running this on a host called fakemta.example.com, compiled to use the default port (8463) as:

java -jar maildropqueue.jar -c maildropqueue.xml

Where maildropqueue.jar is compiled by defaults (./configure && make) from a “git clone”, and maildropqueue.xml contains the configuration above. There’s a downloadable

Finally, We need to configure VirtualWisdom to generate and send this data attached to an email; this is a fairly simple problem for any VirtualWisdom administrator to do.  Following is a walk-thru up to confirming that the content is being generated and sent to the MailDropQueue; the composition of the report and the handler script “IMPORT-DATABASE.BAT” is too environmentally-specific to cover in this article.

  1. Create the report (outside the scope of this article) — confirm that it produces output. The following snapshot uses our internal demo-database, not actual customer data:

    Capacity / Performance Statistical Report

  2. Create a Schedule to regularly generate and send it:
    1. In the Report Generation Configuration, check that you have the hourly summary if so desired:
    2. Check that all probes are used, but you don’t need to keep the report very long:
    3. Confirm that you have file-format set to CSV unless your handler script can dismantle XLS, or you intend to publis a PDF:
    4. Choose to send email; this is the key part. The message can include anything as subject and body, but you must check “E-mail report as attachment”:
    5. …and finally: you may not yet have the distribution list set up, consider the following example. Note that the port number is 8025, and server is localhost because in this test, I’m running the MailDropQueue on the same server. The sender and recipient don’t matter unless you later determine which actions to run bases on triggers matching sender or recipient:
    6. Check that your MailDropQueue is running on the same port (this is an example running MailDropQueue using VirtualWisdom’s enclosed Java and the config example above; the two “non-body MIME skipped:” messages are from clicking “Send Test E-Mail” twice):
  3. Finally, run your MailDropQueue. The skip used above is shown here (except that running it requires removing the “-V”, highlighted), as well as the config, and an output of “java -jar maildropqueue.jar -V” to show how MailDropQueue parsed the configfile:
  4. Clicking “Run Now” on the Scheduled Action for the report generation shows end-to-end that VirtualWisdom can generate a report, send it to MailDropQueue, and cause a script to be triggered on reception. Of course, if the script configured into MailDropQueue hasn’t been written, a Java error will result, as shown:
  5. Now the only things left to do are:
    1. Write the report so that the correct data is sent as tables and Statistical Summary reports (only one View per section)
    2. Write the IMPORT-DATABASE.BAT so that it reacts correctly to the zipped archive of CSV files

Use VirtualWisdom Alarms to Schedule Daily Tasks

The VirtualWisdom Service part of the VirtualWisdom Platform doesn’t necessarily do everything: our customers’ SANs differ in the small details as well as the larger ones, necessitating VI Services to help with some customization. In many cases, we set things up to run daily, such as grabbing zone info to convert to Nicknames, or converting Nicknames to UDCs and Filters.

In some cases, customers cannot edit the Windows Scheduler to run these, and do not have a UNIX-like system with an available scheduler. This can be due to access, or corporate policy. I wanted to share a workaround for this situation: (mis-)use the Alarm system to do so.

The following image may explain more efficiently than a walk-through:

Example daily alarm to run a batch file

As you can see by the name, the alarm policy should only be applied to one ProbeSW — to one SAN Switch.

The alarm will trigger when any data flows — you can see the trigger set to “> 0, 1 matching interval in domain of 1 interval”, and all it does it runs an external program. The configuration of that external program is also opened in the editor, and you can see that it simply runs a script (using full pathname).

The re-arm of that Alarm Policy Rule is “MB/sec != -1″. Because MB/sec can only go down to zero, “-1″ is impossible, so this rule will always match. The trick is that this has to match one triggered, and has to match for 288 intervals (288 x 5 minutes = 24 hours). Effectively, this is a logic statement that says “don’t run more often than every 24 hours”.

This Alarm Policy Rule effectively runs immediately after the Portal Service is restarted or the Alarm Policy is applied to a switch, and will run every 24 hours thereafter (understanding that 288 might need to be 287 to avoid a 5-minute skew daily).

The “meat” or complexity here would be in the BAT file: the Alarm uses the “External Script” action to run our batch file daily. This avoids configuring the OS Scheduler, but at a cost of not being able to choose the exact time. Additionally, the BAT file executes with the permissions of the Portal Server, which typically cannot view Network Shares and other remote resources.

Move Your VirtualWisdom Backups into Your Backed-Up Space

VirtualWisdom has an easy backup system: quite simple to configure for backups as easily as any scheduled event: as frequently as daily, at any time, and with multiple schedules possible, re-using the same configuration for each. The issue of a new filename every time — chosen by VirtualWisdom to avoid overwriting a good backup with one that might run into some exception and be incomplete — often causes a new backup file each week to be present, and no simple method of aging-out old backups.

The Post-Backup Script in the Backup Service Configuration runs after every backup, if activated: it simply executes a script with a few parameters. This allows the VirtualWisdom Administrator a certain flexibility in writing any manner of script that can run as the VirtualWisdom process to accomplish the automated moving around of backup files — or, logically, any task, even unrelated to the backup.

As defined by the underlying database vendor, our database files need to remain untouched by backup and antivirus processes which tend to lock the files for long periods. Any locked data file tends to block database writes, slowing throughput, and risking corruption of the data. This requirement also means that backups are typically outside of corporate backup tools and policies; the risk of a backup not being preserved in a catastrophic filesystem exception is clearly significant. Even though VirtualWisdom only handles measurements and data about the data, it does not handle data itself, and does not form a critical path in data I/O, loss of VirtualWisdom is loss of measurement and analysis tools which may be critical to resolve storage issues. Clearly we want the backup for VirtualWisdom to be safely archived.

In this article, I’d like to share one example of how successful backups can be moved into the filesystems covered by corporate backup policies, replacing past backups to avoid ever-increasing disk usage. My content here on the Virtual Instruments SAN Best Practices blog tends to be of a technical “how-to” nature; we hope this article may help define a customer’s backup config, giving safety to the data so that focus can return to the performance and availability of the SAN.

Overview

The basic backup process is a sequence such as:

  1. lock the database (database becomes read-only)
  2. quickly duplicate all database files
  3. unlock the database and let processing continue
  4. aggregate the backup files into a single file, optionally compressing

The feature we want to exploit to improve this process is the optional “Execute the following command upon completion” entry on a Backup Service Configuration to move the backup file to where it should be. In most cases, “where it should be” is a disk covered by corporate backup processes with sufficient space to hold the backup, compressed, accounting for organic growth (database backup grows as number of monitored ports, VMs, ESXs, and ITLs increase over time).

For our example, that is the “X” drive. Bear in mind that the backup script runs as the VirtualWisdom process, which runs as a service hence has no access to network drives. In our example, the “X” drive might even be a SAN LUN: even though we recommend that the disk not be on a SAN LUN due to the risk of being affected by the performance problems and exceptions that VirtualWisdom is trying to help users track and resolve, the backup may be on a SAN LUN because delays in the archived backup do not directly affect performance of the VirtualWisdom platform.

Example Backup Service Configuration

Typically, your backup schedule would look like the following: (except that my work server is small, so I have disabled mine by unchecking the checkbox beside the scheduled time)

Typical Backup Service Config without Post-backup script

… with a Backup Service Configuration such as:

Typical Backup Service Config without Post-backup script

Improved Backup Service Configuration

Instead of merely doing the backup, we can use the “post-backup script” to do the work for us. The “Post-Backup Script” is the name I’ve started using for the script that gets listed in the box for “Execute the following command upon completion”. An example script may be as simple as the following:

Example Post-Backup Script

As we can see, when the second parameter given to the script (“%2“) is a 1, then the filename given as the first parameter (“%1“) is moved to the consistent filename X:\Backups\VirtualWisdomBackup.zip. The X:\ drive would be within normal backup policy, so routine backups would protect the database archive.

This batch file is run by entering it as a “post-backup script” as follows. NOTE: where possible, use a full pathname to ensure the script is found, and it’s the correct script.

Example Backup Service Confg with a post-backup script

As we can see in this Backup Service Configuration, we have enabled the “Execute the following command upon completion” checkbox, and listed our script as the script to run. The two parameters are selectable with the “Insert” box, or may be directly typed free-form.

When the script runs after a backup is complete, the $BACKUP_STATUS$ is replaced by a 1 or a 0 depending whether the backup was successful — and as noted above, if this value is “1″, the working file is moved; otherwise, it’s untouched. Perhaps an enhancement might be to raise an alert that the backup failed (VirtualWisdom logs backup failures in the Portal log, but makes no other indication), or to delete or move aside a failed backup as well for analysis and fault-resolution.

When the backup is complete, and a new backup file is created named after the time that the backup started: backup - yyyy-mm-dd-hh-MM.zip, where yyyy is the year, mm is the month (zero-padded), dd is the day (zero-padded), HH is the hour (24-hour time), MM is the minutes (zero-padded) — yes, this is intentionally very close to ISO8601 that is the basis for RFC3339, HTML5, and XML date format. With a new pseudo-random always-incrementing filename, new backups will never overwrite previous backups, but they are difficult to track down. The $BACKUP_FILE$ token is replaced by this filename, allowing the post-backup script to work with the correct filename every time.

Of course, in order to summarize the underlying behaviour, we do change the name of the schedule itself, but it’s not critical:

Backup Configuration with post-backup script

In most articles, we include complete examples, but the development and explanation of this relatively simple example is a complete example. Of course, changes will have to be made for each individual unique environment. Most backups do not run to the C:\ drive because there would not be sufficient space; rather, most configurations have a D:\ drive or E:\ drive for data, and that drive is used as a working drive during backups.