Features of mon 0.99.2 
mon was developed under Linux, and it is known to work under Solaris 2.5
and 2.6. Since the clients and server are written
completely in Perl, portability should not be much of an issue.
The following is a list of some of the features of mon:
    -  Monitors
    
-  "Monitors" are programs that check for a particular condition,
    	and report success or failure to the server, along with
	any output.
    	They are independent of mon, so to add a test for a
	new service, you can just write your monitor in any language,
	put it in the monitor directory, and it just works.
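
	As a rough illustration of how small a monitor can be, here is a
	sketch in Python. It assumes a calling convention that is not spelled
	out above: the hosts to check arrive as command-line arguments, a zero
	exit status means success, and the first line of output summarizes the
	failed hosts. The names run_monitor and tcp_probe, and the choice of
	port 80, are hypothetical.

```python
"""Sketch of a monitor: report which hosts refuse a TCP connection.

Assumptions (not taken from the feature list): hosts arrive as command-line
arguments, exit status 0 means success, and the first line of output is a
summary of the failed hosts.
"""
import socket


def tcp_probe(host, port=80, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def run_monitor(hosts, probe=tcp_probe):
    """Probe every host; return (exit_status, output_lines)."""
    failed = [h for h in hosts if not probe(h)]
    if not failed:
        return 0, []
    # Summary line first, then one detail line per failed host.
    lines = [" ".join(failed)]
    lines += [f"{h}: connection failed" for h in failed]
    return 1, lines


def main(argv):
    """Print the monitor output and return the exit status."""
    status, lines = run_monitor(argv)
    for line in lines:
        print(line)
    return status

# A real script would finish with:
#   import sys; sys.exit(main(sys.argv[1:]))
```

	The probe is passed in as a parameter so the reporting logic can be
	exercised without touching the network.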
	
     
-   Asynchronous Events
    
-  Support for asynchronous events communicated to the
        mon server. This is an open-ended protocol, like
        the monitor and alert scripts, so that you can trigger on
        anything. One obvious use is acting on SNMP traps. Traps
	generated by remote entities can be programmed to behave
	in the same manner as failures noticed by local polling
	monitors, so it is possible to build a distributed monitoring
	architecture. For example, remote monitoring domains (such as
	sites separated by slow WAN lines) can collect their own
	data locally and report significant events to a centralized
	location, such as a NOC.
        
     
-  Alerts
    
-  "Alert" scripts send a message or otherwise act
    	on a failure that mon detects. These alerts, like
	the monitors, are not part of mon, and are easy to add.
	"Upalerts" are also supported, which are used to trigger
        an alert when a server comes back up after being down for
        a long time.
	
     
-  Alert Management and Failure Handling
    
-  Failure of any monitor can trigger any (and multiple) alerts,
    	to different people at different times. You can effectively
	construct "on call" schedules using this feature. For
	example, if a resource goes down before 8PM you can page all
	system administrators, while after 8PM you can page only Joe
	and send email to everyone else.
	
	Many alert throttling controls are implemented.
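
	The 8PM example can be pictured as a small routing function. This
	is only a sketch of the idea in Python, not mon's configuration
	syntax; the staff list and the function name recipients are
	hypothetical, and treating exactly 8PM as "after hours" is an
	arbitrary choice.

```python
from datetime import time

ADMINS = ["joe", "ann", "raj"]  # hypothetical staff list
EVENING = time(20, 0)           # the 8PM cutoff from the example


def recipients(now):
    """Return who to page and who to email for a failure at time `now`."""
    if now < EVENING:
        # Before 8PM: page every administrator.
        return {"page": list(ADMINS), "email": []}
    # From 8PM on: page only Joe, email everyone else.
    return {
        "page": ["joe"],
        "email": [a for a in ADMINS if a != "joe"],
    }
```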
	 
     
-  Parallelization
    
-  Parallelizes the checking of services on different
    	hosts or groups of hosts. For example, mon can ping your routers
	while it is also pinging your WWW servers. There's
	no queue that can postpone the scheduled testing
	of other services.
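
	The effect can be sketched in Python with one worker per group, so
	that a slow group never delays another. This uses threads purely
	for illustration and says nothing about how mon itself schedules
	its monitors; check_group, check_all, and the probe callback are
	hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def check_group(name, hosts, probe):
    """Run one check over a group of hosts; return (name, failed_hosts)."""
    return name, [h for h in hosts if not probe(h)]


def check_all(groups, probe):
    """Check every group concurrently, one worker per group."""
    workers = max(1, len(groups))  # no group ever waits for a free worker
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(check_group, name, hosts, probe)
            for name, hosts in groups.items()
        ]
        return dict(f.result() for f in futures)
```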
	
     
-  Repetitive Alert Suppression
    
-  Repetitive alerts can be suppressed. For example, only
    	send email once an hour if a service continues to fail.
	As an option, small, transient failures of a service may be ignored.
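
	Both controls can be modeled with a few lines of state. The Python
	sketch below is an illustration of the behavior described here, not
	mon's implementation; the class AlertThrottle and its parameters
	interval and min_failures are hypothetical names.

```python
class AlertThrottle:
    """Decide whether a failure should produce an alert.

    interval:     minimum seconds between alerts for a continuing failure.
    min_failures: consecutive failures required before the first alert,
                  so that short transient failures are ignored.
    """

    def __init__(self, interval=3600, min_failures=1):
        self.interval = interval
        self.min_failures = min_failures
        self.failures = 0
        self.last_alert = None

    def on_success(self):
        """The service recovered: forget the failure streak."""
        self.failures = 0
        self.last_alert = None

    def on_failure(self, now):
        """Record a failure at time `now` (seconds); return True to alert."""
        self.failures += 1
        if self.failures < self.min_failures:
            return False  # still treated as a transient failure
        if self.last_alert is not None and now - self.last_alert < self.interval:
            return False  # already alerted within the interval
        self.last_alert = now
        return True
```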
	
     
-  Dependencies
    
-  Inter-service dependencies and even correlation. For example,
	if the router between the monitoring host and your WWW
	server is down, HTTP won't work, so only send an alert that
	the router is down. This prevents the cascading of zillions
	of alerts that happens when some critical resource is not
	accessible. Dependencies can be understood as a hierarchical
	form (a tree), and when a failure occurs, the tree is traversed
	towards the node which has no unresolved dependencies. However,
	complex dependencies can be described using a generic graph, since
	the actual implementation does not require a hierarchical layout.
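
	The traversal can be sketched as a search over a dependency graph:
	a failed service raises an alert only if nothing it depends on,
	directly or indirectly, has also failed. This Python sketch is an
	assumption about the general idea, not mon's actual algorithm; the
	function root_causes and the shape of the deps mapping are
	hypothetical, and the graph is assumed to be free of cycles.

```python
def root_causes(deps, failing):
    """Return the failing services that have no failing dependency.

    deps maps each service to the services it depends on. It may be any
    acyclic directed graph, not only a tree. failing is the set of
    services currently known to be down.
    """

    def has_failing_dependency(service):
        seen = {service}
        stack = list(deps.get(service, []))
        while stack:
            dep = stack.pop()
            if dep in seen:
                continue  # shared dependency already visited
            seen.add(dep)
            if dep in failing:
                return True
            stack.extend(deps.get(dep, []))
        return False

    return sorted(s for s in failing if not has_failing_dependency(s))
```

	With deps = {"http": ["router"], "router": []} and both services
	down, only "router" is returned, which matches the router example.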
	
     
-  Flexible Configuration
    
-  A very flexible (and extensible) configuration file.
    	Hosts can be grouped together, and each host or group
	can have multiple services. Example configuration files,
	including an m4-based one, are available.
	
     
-  Client/Server Model
    
-  Has interactive command-line,
	WWW-based, and SkyTel 2-Way
    	alphanumeric pager-based clients
    	that query the server for status and history. The protocol is simple,
	and it is very easy to make clients of your own.
	Multiple authentication methods are supported (including PAM),
	along with per-user access control.
	A Perl module API can be used to query the server, so writing
	alternate interfaces is simple (such as one which takes
	advantage of WAP, Wireless Access Protocol). At this point
	there are several WWW interfaces actively being maintained by
	different parties, each with its own report and goal.
	
	 
     
-  View-based Status Reports
    
-  To help with large configurations, "views" can be generated
	 to simplify reports for customers who do not need to know
	 the status of all services being monitored. For example, 
	 a "network" view can be generated which includes the status of
	 all networking gear, just as a "servers" view can show all
	 info pertaining to servers. Views can be configured on a per-customer
	 basis if needed, and customers have control over their own views.
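
	Conceptually, a view is a filter over the full status report. The
	Python sketch below only illustrates that idea; the VIEWS table,
	the group names, and render_view are hypothetical and do not
	reflect how mon stores or configures views.

```python
# Hypothetical views: each names the host groups it should show.
VIEWS = {
    "network": {"routers", "switches"},
    "servers": {"www", "mail"},
}


def render_view(status, view_name):
    """Return only the status entries whose group belongs to the view."""
    wanted = VIEWS[view_name]
    return {group: state for group, state in status.items() if group in wanted}
```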
	
     
-  Run-time Alert Acknowledgement and Disabling
    
-  A service failure can be acknowledged so that alerts are
	suppressed until the problem is fixed. This "ack" state
	is retrievable from the client interface so that users
	can see that support staff are working on the problem.
	Also, alerts for particular hosts, groups, or services can
    	be temporarily disabled and re-enabled by the client, without
	stopping and restarting the server.
	If you're upgrading a particular server, you can disable
	the alert while you're doing the work, and re-enable it
	when you're done.
	
     
-  History
    
-  Keeps a historical list (queried by the clients)
    	of both failures that were detected and alerts that were
	triggered.
	
     
-  Portability
    
-  The server and clients are written entirely in Perl 5, so
    	there is nothing to compile. This should help portability.
trockij@linux.kernel.org