Alice Developer's Journal

This section documents the day-to-day development of Alice. Only the most recent entries are shown on this page; earlier entries are available on the archive page. You may direct comments or questions to aravin@mentis.ca.

Confused about these weird dates we use? Check our Frequently Asked Questions for an explanation.

2001:120
Alice has now been running several days without a crash. Reservedly, i will say that i think we've adequately addressed the timeout problem. If this turns out to be so, then Alice 2.0 will be officially gold.

2001:114
I should have known it wouldn't be so simple. Modifying the timeout handler wasn't enough; the problem persisted. However, we're fairly certain that the problem only appeared after the timeout handler was inserted, so i don't quite know where the source of the problem lies. Nonetheless, i've tried to remedy it by moving the error checking into QueryFailed. It's inelegant, but then so is the Inet control.

Another minor issue: Trevor reported that in the Alerts window the copy-to-clipboard functions weren't working. It was easily fixed; it turns out that one has to clear the clipboard before setting it. Why the SetText method doesn't handle that itself, i cannot fathom.

2001:101
I think we've quashed a rare bug today that has been interfering with the long-term stability of Alice. It turns out that the explicit timeout handling we built into NetworkMonitor failed to do one very important thing: inform the Inet control that we've timed out! This causes the dreaded "Still executing last request" error whenever a query takes longer than 15 seconds, which isn't very often. That's why it was hard to trace; Trevor finally caught it today while running Alice from the VB environment. If after this fix Alice is able to run continuously for a week, we'll probably release it.
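
The failure described here comes down to a watchdog that gives up on a query without telling the transport layer to give up as well. A minimal sketch of the corrected pattern follows, in Python rather than VB for brevity. The class name `QueryWatchdog`, the `cancel_request` callback (standing in for whatever call cancels the Inet control's pending request), and the injectable clock are assumptions made for illustration, not Alice's actual code.

```python
import time

QUERY_TIMEOUT_SECONDS = 15.0  # the limit quoted in the journal


class QueryWatchdog:
    """Aborts a query that runs too long AND cancels the underlying request."""

    def __init__(self, cancel_request, timeout=QUERY_TIMEOUT_SECONDS,
                 clock=time.monotonic):
        self._cancel_request = cancel_request  # hypothetical transport-level cancel
        self._timeout = timeout
        self._clock = clock
        self._started_at = None  # None means no query is in flight

    def query_started(self):
        self._started_at = self._clock()

    def query_finished(self):
        self._started_at = None

    def poll(self):
        """Call periodically (e.g. from a timer). Returns True if a query was aborted."""
        if self._started_at is None:
            return False
        if self._clock() - self._started_at < self._timeout:
            return False
        # The step that was missing: without it the transport still believes
        # a request is executing and rejects the next one.
        self._cancel_request()
        self._started_at = None
        return True
```

Passing the clock in as a parameter is a design choice that lets the 15-second boundary be exercised without actually waiting.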

2001:52
Significant news: we now have Alice communicating with the real LG from my development machine. James managed to set up a redirector at the office, so now i can test Alice using live numbers. Already this effort has paid off, as it allowed me to isolate some elusive bugs in the alerting code. Some calculations were running into integer range limits, but only under certain conditions--conditions that only obtained on the real LG and not on my simulation.
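
The journal does not say which calculations overflowed, so the following is only a hypothetical illustration of how such a bug can hide on small simulated numbers and surface on live ones. It assumes a 16-bit signed integer type (as with classic VB's Integer) and an invented percentage formula; the function names are made up for the example, and Python stands in for VB.

```python
# Hypothetical example: the formula and names below are not from Alice.
INT16_MIN, INT16_MAX = -32768, 32767  # range of a 16-bit signed integer


def checked_int16(value: int) -> int:
    """Return value unchanged, or raise where 16-bit arithmetic would overflow."""
    if not INT16_MIN <= value <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in 16 bits")
    return value


def percent_offline_16bit(offline: int, total: int) -> int:
    """Multiply before dividing, with every intermediate held in 16 bits."""
    scaled = checked_int16(offline * 100)
    return checked_int16(scaled // total)


def percent_offline_wide(offline: int, total: int) -> int:
    """Same formula with a wide intermediate (Python ints are unbounded)."""
    return offline * 100 // total
```

With 30 of 100 modems offline the 16-bit version is fine, but 400 of 1000 makes the intermediate 40000 and exceeds the range, while the wide version returns 40.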

We're now looking into shoring up some of the more rarefied problems with Alice. For example, how to handle nodes that disappear completely from a card, a situation which at present goes undetected. A few other issues of similar subtlety remain; i expect that once those are addressed we should be ready for release.

Development is going to be slow since much of our attention is being directed towards setting up Mentis.

2001:38
Alice development has been resumed after a long hiatus, and much has happened in the interim. Ix is now a division of Mentis, a joint venture between Trevor and myself.

As for the program itself, there have been three items of note. First, all the features specified in the previous entry have been implemented. Second, a small but fatal bug has been corrected. It had to do with the alerts log, which only allowed 2 digits for percentages. Whenever a value of 100% had to be displayed, the program would crash.

The third item is the most significant. We discovered that under certain conditions, an LG failure could cause all subsequent querying to be suspended. This happened when the LG timed out after returning a response header. A timeout event is provided by the Inet control, but this only applies when no response header is received. The solution was to build timeout handling directly into the NetworkMonitor class, which will now abort any query that takes longer than 15 seconds for any reason.

2000:363
All the features mentioned in the previous entry have been implemented. The process of gradual refinement continues, with the following changes:

  • The Alerts Log is now shown with the most recent entries towards the top.
  • If no Active Alerts are selected, invoking Copy will copy all alerts to the clipboard.
  • The CSV log now excludes comments, which weren't accepted by Excel.

The following changes are slated for the next round:

  • The alerts log should be kept to a maximum of 100 entries, with the oldest entries being removed as new ones are added. This is mainly to facilitate running Alice non-stop for extended periods, but also because the VB textbox supports a maximum of 32 KB of text.
  • In addition to the CSV log, a snapshot of the Active Alerts should be written to a file each time the list changes. This file should be in HTML format to allow viewing via a web browser. The paths of both of these files should also be configurable.
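
The 100-entry cap in the first item can be sketched with a bounded queue. This is a Python illustration under assumptions, not the VB implementation: the class name `AlertsLog` and its methods are invented, and entries are kept newest-first to match the display order described earlier in this entry.

```python
from collections import deque

MAX_LOG_ENTRIES = 100  # the cap proposed in the journal


class AlertsLog:
    """Keeps only the most recent entries, newest first."""

    def __init__(self, max_entries=MAX_LOG_ENTRIES):
        # A deque with maxlen silently discards from the opposite end when full.
        self._entries = deque(maxlen=max_entries)

    def add(self, entry: str) -> None:
        # New entries go on the left (top); the oldest falls off the right.
        self._entries.appendleft(entry)

    def text(self) -> str:
        """The log as it would appear in a text box, one entry per line."""
        return "\n".join(self._entries)

    def __len__(self) -> int:
        return len(self._entries)
```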

Also, there is the question of sorting the Active Alerts list. Since most of the data in the list is non-alphabetic, the intrinsic sorting capability of the listview control is inadequate. I can easily write my own sorting mechanism, but the bugaboo is imposing my order upon the list. MSDN indicates that the Index property of items can be "read or set", but in fact the property is read-only. Why can't MSDN be precise and accurate?! Anyway, i'm going to post on the About VB forum to see if anyone can provide a solution.
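
To illustrate why a purely textual sort falls short for this kind of data, here is a small sketch of a numeric-aware ordering, in Python rather than VB. The row layout (dictionaries of display strings) and the names `numeric_aware_key` and `sort_alerts` are assumptions for the example. It covers only the ordering itself, not the open question of how to impose that order on the listview control.

```python
def numeric_aware_key(text: str):
    """Sort key: numbers (optionally ending in '%') order by value;
    anything else sorts alphabetically after the numbers."""
    try:
        return (0, float(text.rstrip("%")), "")
    except ValueError:
        return (1, 0.0, text.lower())


def sort_alerts(rows, column, descending=False):
    """Return a new list of rows ordered by the given column."""
    return sorted(rows, key=lambda row: numeric_aware_key(row[column]),
                  reverse=descending)
```

A textual sort would place "10" and "100" before "9"; the key above compares them as numbers instead.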

2000:359
Curse their oily hides, a pox upon their house, etc. Bringing up the config window still sometimes causes querying to pause.

Anyway the significant development is that logging is working. An alert will track the maximum failure on its node as long as it is active, and those statistics will be written to the log when the alert expires. It seems to work quite well.
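
The lifecycle described above (track the worst failure while an alert is active, write the statistics out when it expires) might be sketched as follows. This is a Python illustration under assumptions: the threshold-based trigger, the names `Alert` and `AlertTracker`, and the `on_expire` callback standing in for the log writer are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    node_id: str
    max_failure_percent: float = 0.0  # worst value seen while active


class AlertTracker:
    def __init__(self, threshold_percent, on_expire):
        self.threshold_percent = threshold_percent
        self.on_expire = on_expire  # called once with each finished Alert
        self.active = {}            # node_id -> Alert

    def update(self, node_id, failure_percent):
        """Feed the latest reading for one node."""
        if failure_percent >= self.threshold_percent:
            # Start the alert if needed, then keep the running maximum.
            alert = self.active.setdefault(node_id, Alert(node_id))
            alert.max_failure_percent = max(alert.max_failure_percent,
                                            failure_percent)
        elif node_id in self.active:
            # The node has recovered: the alert expires and is logged.
            self.on_expire(self.active.pop(node_id))
```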

Things to be done:

  • The log should simultaneously be written to a CSV file for import to other apps.
  • Need ability to clear the log.
  • Node evaluation should occur for all nodes on a UBR after it's been updated, rather than as each node is processed. (call from UBRclass.CommitData rather than UBRclass.UpdateData.)
  • Should show the number of active alerts on the Network window status bar.

2000:358
I made quite a bit of progress yesterday.

  • Alert conditions can be modified through the Config window, and Alice now saves settings on exit.
  • Trevor reported that it was difficult to gauge the extent of failure with the absolute style meters on the UBR window. It was a combination of two factors: the total number of modems displayed for a card or upstream is much smaller than for a UBR, and the meters themselves for cards and upstreams are smaller. He suggested that on the UBR window the absolute style meters should use an upper bound of 500 modems instead of 1000 as on the Network window, and i've made the change. This makes such perfect sense that i'm surprised we didn't pick up on it earlier.
  • I got rid of the menu on the Network window and moved the configuration command to the toolbar. It was sometimes happening that using the menu would pause the Query Queue. Yet another flaw in Windows or VB that i've had to work around.
  • Keyboard shortcuts have been implemented on the Network window; they'll be added to the UBR window as well.
  • I realized that i didn't need two buttons to change the meter style; a single toggle button would have sufficed. However, Trevor told me to leave it as is. It will be difficult to resist the temptation to change it, being the perfectionist that i am!
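
The effect of the meter change in the second item can be shown with a little arithmetic. The sketch below is in Python and assumes that an absolute style meter fills in proportion to a modem count divided by a fixed upper bound; that reading, and the name `meter_fill`, are assumptions rather than a description of Alice's drawing code.

```python
NETWORK_WINDOW_UPPER_BOUND = 1000  # modems, per the journal
UBR_WINDOW_UPPER_BOUND = 500       # modems, per the journal


def meter_fill(modem_count: int, upper_bound: int) -> float:
    """Fraction of the meter to fill, clamped to the range 0.0 to 1.0."""
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    return max(0.0, min(1.0, modem_count / upper_bound))
```

Under this assumption, 100 modems fill a tenth of a meter scaled to 1000 but a fifth of one scaled to 500, which is easier to read on a small meter.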

Next on the agenda is the logging of alerts. There are a few design questions that remain unanswered in this regard, so i'll have to talk to Trevor before i can proceed.

2000:356
Trevor came over last night, and we hammered out a design for the alerts. The main features to be implemented now are:

  • Sorting the alerts list by various criteria.
  • Modifying alert conditions through the config dialog.
  • Copying selected alerts to the clipboard.
  • Maintaining a log of alerts.

Trevor's got a sharp eye for details; it's good to have him around as an assistant programmer. "Igor, the fluid!"

2000:354
A very rough cut of the Alerts window is in place. It's pretty bug-infested at the moment, but it appears that the data structure chosen will be adequate for the job. I'll need to consult with Trevor to come up with an efficient interface layout, and from there it'll be straightforward coding. I anticipate no problems completing the Alerts system by year's end.

2000:346
Development certainly has slowed down considerably over the last few weeks. It seems soon after my vacation was over, my holidays began! Well, i was assisting Trevor with Project Mango, though in sooth that took but little of my time.

Anyway, i don't think the pace of development will pick up until after year's end, but i'll try to at least get the alerts ready before then. I've been working on the configuration window, and things got a little messy. I accidentally set up a circular flow structure and caused some nasty stack overflow bugs. That's all been sorted out now, and i'm ready to move on to the alerts proper. I'm going to attempt the simpler of the two alert models i mentioned earlier.

2000:338
Alright, my vacation's over, time to resume development.

First order of business: Putting together the configuration dialog. I've got to organize the various controls into groups of related items and design the form, then code the behaviours. Properly i should make the Config object persistable, but that can wait.

Second order of business: Determining the data structure to use for the alerting system. It would be easy enough to create an Alert class and maintain a list of Alert objects. The difficulty is that each node in the system must be aware of its status. I'm thinking of using a circular reference: provide each node with a reference to an Alert object, and each Alert object with a reference to its node. This would require rewriting the node from a simple type into a class, and that's where the bulk of the work would lie. An alternative is to use a flag in each node, and a compact node identifier in the alert object. It would be much easier to implement, and i'm investigating how this would impact the extensibility of the system. If the impact is minimal, i'll go with this option.
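
The simpler alternative (a flag in each node plus a compact identifier in the alert) might look like the following sketch. It is written in Python rather than VB, and the identifier fields (UBR, card, upstream) and all class and method names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    """Compact, hashable identifier; the fields are hypothetical."""
    ubr: int
    card: int
    upstream: int


@dataclass
class Node:
    alert_active: bool = False  # the per-node flag


@dataclass
class Alert:
    node_id: NodeId  # refers to the node by identifier, not by object reference


class Network:
    def __init__(self):
        self.nodes = {}   # NodeId -> Node
        self.alerts = []  # list of Alert

    def raise_alert(self, node_id):
        node = self.nodes[node_id]
        if node.alert_active:
            return  # the flag prevents duplicate alerts for the same node
        node.alert_active = True
        self.alerts.append(Alert(node_id))

    def clear_alert(self, node_id):
        self.nodes[node_id].alert_active = False
        self.alerts = [a for a in self.alerts if a.node_id != node_id]
```

Because the alert holds only an identifier, nodes can stay simple value types and no circular reference is needed; the cost is a lookup whenever an alert needs its node's data.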

Continued on the journal archive page...