Revision 541 – Added Parsing of DCNM Data for Nicknames

In revision 541, the basic capability to ask a DCNM service for the Nickname/WWPN mapping has been started. This has the potential to reuse the information already entered into DCNM rather than re-entering it manually, and to avoid pulling zone files from each fabric for parsing via a --nickname=file:// or --nickname=http:// source. Additionally, in situations where aliases are not used, we can collect the port labels or private aliases that DCNM may not share down to the switch.

This method is only an initial implementation and still requires a bit more work.

The same parser logic is used as was added for OnCommand in OnCommand Query and for BNA in BNA Query. Like the BNA and OnCommand work, the DCNM query simply reformats a query and sends it through the array of parsers to vote upon:

java -jar vict.jar --nickname=dcnmsql://user:pass@server:port/ --nicknameout=\VirtualWisdomData\DeviceNickname\nicknames.csv

The default user/pass should be accurate, but additional testing is needed to confirm that. Like the BNA parser, this method hits the underlying database directly, so it needs direct access to the server (through any firewalls/filters), and it is vulnerable to schema changes; the schema used here is based on the 5.2 documentation.
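
To make the voting idea concrete, here is a minimal sketch of the pattern; the class and method names are hypothetical, since the actual interfaces inside vict.jar are not shown here:

    import java.util.*;

    // Each parser scores how strongly it recognizes the input, and the
    // highest-scoring parser gets to produce the WWPN -> nickname mapping.
    interface NicknameParser {
        int vote(List<String> lines);                   // 0 = not mine, higher = more confident
        Map<String, String> parse(List<String> lines);  // WWPN -> nickname
    }

    class ParserVote {
        static Map<String, String> parseBest(List<String> lines, List<NicknameParser> parsers) {
            NicknameParser best = null;
            int bestVote = 0;
            for (NicknameParser p : parsers) {
                int v = p.vote(lines);
                if (v > bestVote) { bestVote = v; best = p; }
            }
            return (best == null) ? Collections.emptyMap() : best.parse(lines);
        }
    }

The DCNM query, like the BNA and OnCommand queries, simply produces text that one of these parsers will claim.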

In the meantime, cisco-shows2wwncsv.awk is also provided to build port-label nicknames as import CSVs; this awk script needs the output of two “show” commands from every switch. NOTE: every switch, not just every fabric, and it needs to see the output of “show flogi database” before it sees “show interface description”.
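
A hypothetical invocation might look like the following; the capture-file names are made up, and what matters is that each switch’s “show flogi database” output precedes its “show interface description” output within its file:

awk -f cisco-shows2wwncsv.awk switch01-shows.txt switch02-shows.txt > nicknames.csv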

Revision 537 – Confirm DB Backup is OK

I had a few customers who had no idea their backups were not running. Typically this was because of space exhaustion, but sometimes it was because the backup schedule was never set, or had become deactivated.

PHC on/after version 0.2-537 will now confirm that a backup was completed within the last 14 days; this value was chosen based on the recommendation that backups be done weekly, and the longest possible “staleness” of a backup being 7 days, just as the next backup is about to run. Missing two backups in a row is a critical concern. If a backup was never scheduled, then no backup will be logged; if a backup fails due to space, it will still be caught, because no completed backup shows up. Either way, we can now catch when a portal service has no protective backup and is at risk during an upgrade.
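
The check itself boils down to comparing the newest completed-backup timestamp against a 14-day window. A minimal sketch, assuming the timestamp has already been pulled from the backup log (how PHC actually locates that record is not shown here):

    import java.time.Duration;
    import java.time.Instant;

    class BackupFreshnessCheck {
        static final Duration MAX_AGE = Duration.ofDays(14);   // two missed weekly backups

        // lastCompleted is the newest completed-backup timestamp,
        // or null if no completed backup was ever logged
        static boolean backupIsHealthy(Instant lastCompleted, Instant now) {
            if (lastCompleted == null) {
                return false;    // never scheduled, or never completed
            }
            return Duration.between(lastCompleted, now).compareTo(MAX_AGE) <= 0;
        }
    }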

Revision 536 – FixNickNameHistory for Database Back-Edits

In Revision 536, I implemented “--fixnicknamehistory”, which applies any loaded nicknames (for example, those gained using OnCommand Extraction, those Parsed from Cisco Zones, or those extracted from BNA) to the underlying Portal Server database retroactively.

So long as the Portal Service is shut down, this can be done on a customer’s live server, but it’s more useful in analysis situations where nicknames arrived late and the analyst cannot wait another week or so for nickname-ful summaries to be collected.
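
As a usage sketch only (the exact flag combination is not spelled out above, so treat this as an assumption), a run might look like:

java -jar vict.jar --nickname=file://nicknames.csv --fixnicknamehistory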

Revision 533 – Semi-Statistical Summary Insertion Delay

In this revision, I incorporated Welford’s method (via Knuth) into the running statistical calculation, providing a running mean/deviation to show where the greater portion of insertion delays fall.

Why?

The “mean” of a value is nearly useless for planning: it’s just an average. Understanding how that value varies gives a much better picture, as does any ability to match the behaviour to a predictable curve. A mean +/- 1 deviation gives a basic idea; a mean +/- 3 deviations shows where, generally, “all the non-unusual ones” land. Put another way, mean +/- 3 deviations tells you where all the values land once the aberrant outlying points are removed.

Currently, this means that the summary calculation now shows the mean delay plus a mean + 1 deviation “ceiling”: a mean +/- deviation provides two values, of course, but we only worry about the worst case when the risk is taking too long to insert a summary row.
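
For reference, Welford’s update is simple enough to sketch in a few lines; the class below is illustrative only, not the actual PHC code:

    // Welford's online algorithm (via Knuth, TAOCP vol. 2): running mean and
    // deviation without keeping every sample in memory.
    class RunningStats {
        private long n = 0;
        private double mean = 0.0;
        private double m2 = 0.0;   // running sum of squared differences from the mean

        void add(double x) {
            n++;
            double delta = x - mean;
            mean += delta / n;
            m2 += delta * (x - mean);
        }

        double mean()      { return mean; }
        double deviation() { return (n < 2) ? 0.0 : Math.sqrt(m2 / (n - 1)); }
        double ceiling()   { return mean() + deviation(); }   // the mean+1deviation "ceiling"
    }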

[Figure: PHC-0.2-536 showing mean+1deviation]

Currently, I’m not sure whether the mean+1deviation is a more critical value than the reported mean, nor whether we should shift to mean+3deviation as a more appropriate metric.

Revision 529 – Check Summary Average Delay

Revision 529 is where the FileSystem Layout Detection (r527) is used to start highlighting risks in Summary Insertion Delay.

I would prefer to also look at the insertion time rather than just the delay: the delay may well remain constant for a while even as the actual insertion cost grows and eventually produces longer delays. That’s like watching the speed difference rather than the distance between two race cars: differences in speed alert more quickly and predict changes in distance. Similarly, more metrics can be watched once I figure them out. Just like most users, I’m looking from the outside, but I can fake up the interpreted content in a chosen-plaintext-attack-like way, using conditioned input to the parser to tell me what I’m seeing.

Revision 527 – Improved FileSystem Layout Detection

Internal: the detection of filesystem layout and relative file locations was all twisted, and was flaky, so I’m ripping it out. It’s mostly refactored, but in order to leverage it sooner, both methods are currently active. The flaky one will finish dying very soon, but in the meantime only most of the code benefits from this structural change rather than all of it.

The way I’m doing it now is a straight Booch-ian method of taking multiple objects and using them to encapsulate the logic. They are interchangeable; the most applicable one is chosen by, you guessed it, voting: whichever looks most likely gets used. This lets me abstract away discovery of pathnames and locations as they change between a live server, PortalLogs, and a database backup without huge changes in code and whackloads of if/else loops.
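
A rough sketch of the shape of that voting (names are hypothetical; the real detector classes are not listed here):

    import java.nio.file.Path;
    import java.util.List;
    import java.util.Optional;

    // Each candidate layout scores how well it matches what is actually on disk
    // (live server, PortalLogs capture, or database backup); the highest score wins.
    interface LayoutDetector {
        int score(Path root);           // 0 = no match, higher = more likely
        Path summarymonLog(Path root);  // where this layout keeps the summarymon log
    }

    class LayoutVote {
        static Optional<LayoutDetector> pick(Path root, List<LayoutDetector> candidates) {
            LayoutDetector best = null;
            int bestScore = 0;
            for (LayoutDetector d : candidates) {
                int s = d.score(root);
                if (s > bestScore) { bestScore = s; best = d; }
            }
            return Optional.ofNullable(best);
        }
    }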

The behavior I wanted to leverage more quickly is the reading of the summarymon log to check performance on the database-insertion side.

Revision 526 – Vote02-ZoneSplit2onTargetPrefix.awk

Every customer is different, and their zone-naming conventions are no exception. There is no clear consistency in zone names that would lead near-consistently to Attached Device Names. Nearly.

The Zone-Vote Algorithm is always an assumption-led guess; it has only a slim chance of being 100% accurate.

In cases where the targets in zone names all start with the same prefix, this Stage 2 of the process can help narrow things down. I added Vote02-ZoneSplit2onTargetPrefix.awk to split zone names based on the prefix of the target device (for example, if all targets start with STOR*).
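
Roughly, the splitting rule looks like this (a Java rendering of the idea, not the awk script itself; the zone name and the STOR prefix are made-up examples):

    class ZoneSplitOnTargetPrefix {
        // If the zone name contains the known target prefix, everything before it
        // is treated as the initiator name and everything from the prefix onward
        // as the target name; otherwise this stage abstains from the vote.
        static String[] split(String zoneName, String targetPrefix) {
            int at = zoneName.indexOf(targetPrefix);
            if (at <= 0) {
                return null;
            }
            String initiator = zoneName.substring(0, at).replaceAll("[_-]+$", "");
            String target = zoneName.substring(at);
            return new String[] { initiator, target };
        }

        public static void main(String[] args) {
            // e.g. a zone named ESX01_STORARRAY1_P1 where all targets start with STOR
            String[] parts = split("ESX01_STORARRAY1_P1", "STOR");
            System.out.println(parts[0] + " / " + parts[1]);   // prints: ESX01 / STORARRAY1_P1
        }
    }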