- report version dynamically
- report version in a text file in the deployment directory — if applicable
- report version inside the .war file — probably in the Manifest
- embed version in any generated client-facing pages
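As a minimal sketch of reporting the version from the Manifest, the helper below reads `Implementation-Version` from a manifest stream. The class name `VersionInfo` and the `"unknown"` fallback are assumptions for illustration. In a servlet container the stream would typically come from `ServletContext.getResourceAsStream("/META-INF/MANIFEST.MF")`, and the build would need to write that attribute into the .war.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper: not part of any framework.
final class VersionInfo {
    private VersionInfo() {}

    /** Returns the Implementation-Version attribute, or "unknown" if it is missing or unreadable. */
    static String readVersion(InputStream manifestStream) {
        try {
            Manifest manifest = new Manifest(manifestStream);
            String version = manifest.getMainAttributes()
                    .getValue(Attributes.Name.IMPLEMENTATION_VERSION);
            return version != null ? version : "unknown";
        } catch (IOException e) {
            // Assumption: a missing version should not break the page that reports it.
            return "unknown";
        }
    }
}
```

The same string could then feed the dynamic version report and any generated client-facing pages, so every report draws on a single source.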
- provide an “alive” URL for monitoring
- provide URL for deeper monitoring that validates “healthy” internal processes
- provide URL to validate *connectivity* to remote system dependencies
- Note that we just care about connectivity here. The remote system should have its own health monitor.
- provide URL for report of error conditions, current and past
- report errors (push) to a remote system (in addition to logging)
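The monitoring URLs above could be backed by a small registry of named checks, sketched below. `HealthChecks`, `register`, `run`, and `tcpReachable` are hypothetical names, and the registry is not thread-safe as written. The TCP check only verifies that a connection can be opened, which matches the connectivity-only intent: the remote system's own health is not inspected.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical registry; a servlet or controller would render run() as the "healthy" page.
final class HealthChecks {
    private final Map<String, BooleanSupplier> checks = new LinkedHashMap<>();

    void register(String name, BooleanSupplier check) {
        checks.put(name, check);
    }

    /** Runs every check in registration order; a check that throws is reported as DOWN. */
    Map<String, String> run() {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, BooleanSupplier> entry : checks.entrySet()) {
            boolean up;
            try {
                up = entry.getValue().getAsBoolean();
            } catch (RuntimeException e) {
                up = false;
            }
            results.put(entry.getKey(), up ? "UP" : "DOWN");
        }
        return results;
    }

    /** Connectivity only: can a TCP connection be opened within the timeout? */
    static BooleanSupplier tcpReachable(String host, int port, int timeoutMillis) {
        return () -> {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return true;
            } catch (IOException e) {
                return false;
            }
        };
    }
}
```

The “alive” URL can stay trivial (return a fixed response), while the deeper URLs return the map produced by `run()`.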
- for monitoring, break logging into
- INFO one-time initialization, periodic milestones, cycle start/stop, metrics
- WARN conditions to watch over time
- ERROR a condition that someone needs to look at and evaluate
- CRITICAL a system failure that needs immediate resolution
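One way to express these four levels, assuming `java.util.logging` is the logging backend (the list does not say which library is used), is an enum that maps each level onto a JUL `Level`. JUL has no built-in CRITICAL, so the sketch defines a custom level above `SEVERE`. The names `MonitorLevel` and `CustomLevel` are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical mapping of the four monitoring levels onto java.util.logging.
enum MonitorLevel {
    INFO(Level.INFO),          // initialization, milestones, cycle start/stop, metrics
    WARN(Level.WARNING),       // conditions to watch over time
    ERROR(Level.SEVERE),       // needs someone to look at and evaluate
    CRITICAL(new CustomLevel("CRITICAL", Level.SEVERE.intValue() + 100)); // needs immediate resolution

    private final Level julLevel;

    MonitorLevel(Level julLevel) {
        this.julLevel = julLevel;
    }

    Level julLevel() {
        return julLevel;
    }

    void log(Logger logger, String message) {
        logger.log(julLevel, message);
    }

    // Level's constructor is protected, so a custom level requires a subclass.
    private static final class CustomLevel extends Level {
        private static final long serialVersionUID = 1L;

        CustomLevel(String name, int value) {
            super(name, value);
        }
    }
}
```

Because CRITICAL ranks above `SEVERE`, a handler filtered at `SEVERE` still receives it, and an alerting handler can be set to fire on CRITICAL alone.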
- identify tasks. Each task will have an elapsed time, and at least one count associated with it
- provide running counts & elapsed time measurements of important system functions
- provide running counts of error conditions
- provide running counts aggregated into categories based on processing data
- provide in-memory aggregation & statistics about key running counts
- system-wide count of errors and warnings
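The counting items above can be sketched as a small in-memory registry in which each named task carries a running count and a total elapsed time. `TaskMetrics` and its method names are assumptions for illustration. The same structure could hold error counts or per-category counts by choosing the task name accordingly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical in-memory metrics registry, safe for concurrent updates.
final class TaskMetrics {
    private static final class Stat {
        final LongAdder count = new LongAdder();
        final LongAdder totalNanos = new LongAdder();
    }

    private final Map<String, Stat> stats = new ConcurrentHashMap<>();

    /** Adds one completed run of the task with the given elapsed time. */
    void record(String task, long elapsedNanos) {
        Stat stat = stats.computeIfAbsent(task, key -> new Stat());
        stat.count.increment();
        stat.totalNanos.add(elapsedNanos);
    }

    /** Times the work and records it, even if the work throws. */
    void time(String task, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            record(task, System.nanoTime() - start);
        }
    }

    long count(String task) {
        Stat stat = stats.get(task);
        return stat == null ? 0 : stat.count.sum();
    }

    /** Mean elapsed time in milliseconds, or 0.0 if the task has never run. */
    double meanMillis(String task) {
        Stat stat = stats.get(task);
        if (stat == null || stat.count.sum() == 0) {
            return 0.0;
        }
        return stat.totalNanos.sum() / 1_000_000.0 / stat.count.sum();
    }
}
```

`LongAdder` is used rather than `AtomicLong` because these counters are written far more often than they are read. Note that the count and total are read separately, so a mean taken during heavy updates is approximate.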
- documentation should be in the code, not separate. Don’t repeat yourself!
- generated API documentation
- generated configuration documentation
- generated logging documentation
- All instantiated objects should have lifecycle managed by the application context — create, initialize, destroy
- It should be easy to hook objects into the application context lifecycle, either declaratively or through discovery.
- Note: things like object pools or execution pools need to be shut down gracefully and cleaned up!
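The lifecycle requirement can be illustrated without any particular framework. The sketch below is a hypothetical `LifecycleRegistry` that closes managed resources in reverse order of registration and shuts execution pools down gracefully before forcing them. A real application context (for example a dependency-injection container) would normally provide this itself, so treat it as an illustration of the behavior only.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical registry: resources are destroyed in reverse order of creation.
final class LifecycleRegistry implements AutoCloseable {
    private final Deque<AutoCloseable> managed = new ArrayDeque<>();

    synchronized void manage(AutoCloseable resource) {
        managed.push(resource);
    }

    /** Registers a pool so that close() drains it, then forces shutdown after the timeout. */
    synchronized void manageExecutor(ExecutorService pool, long timeoutSeconds) {
        manage(() -> {
            pool.shutdown(); // stop accepting new tasks, let queued ones finish
            try {
                if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                    pool.shutdownNow(); // interrupt tasks that did not finish in time
                }
            } catch (InterruptedException e) {
                pool.shutdownNow();
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
            }
        });
    }

    /** Closes everything; one failing resource does not prevent the rest from closing. */
    @Override
    public synchronized void close() {
        while (!managed.isEmpty()) {
            AutoCloseable resource = managed.pop();
            try {
                resource.close();
            } catch (Exception e) {
                System.err.println("Shutdown failed: " + e);
            }
        }
    }
}
```

Calling `close()` from the container's shutdown hook (for a .war, a `ServletContextListener` is one common place) keeps pools and threads from leaking across redeployments.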
- WIP the system needs a plan for how to handle asynchronous processing in a way that’s standard throughout the system.
- e.g. JMS Queue? Execution thread pool? Batch from database table? In-memory? Persistent?
- standard mechanism for parallelizing tasks by breaking tasks into batches that can be distributed across machines
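The batching step on its own is simple to sketch. The hypothetical `Batches.partition` helper below splits a list of work items into fixed-size batches. How the batches are then distributed (queue, database table, thread pool) is the open design question noted above and is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for splitting work into batches.
final class Batches {
    private Batches() {}

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch is a natural unit for the task metrics described earlier: one elapsed time and one count per batch.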
State Management / Distributed Caching
- WIP the system should have a highly available, horizontally scalable mechanism for saving processing state for longer transactions (such as user sessions or inputs to a long-lived messaging conversation)
- e.g. external (memcached), internal with broadcasting (Ehcache), NoSQL, relational, in-memory, persistent
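Whichever backend is chosen, hiding it behind a small interface keeps the decision reversible. The sketch below shows a hypothetical `StateStore` interface with an in-memory implementation. The in-memory version is neither highly available nor horizontally scalable, so it stands in only for tests and local runs. A memcached, Ehcache, or database implementation would sit behind the same interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over whichever state backend is eventually chosen.
interface StateStore {
    void put(String key, String value);

    Optional<String> get(String key);

    void remove(String key);
}

// In-memory stand-in: single JVM only, state is lost on restart.
final class InMemoryStateStore implements StateStore {
    private final Map<String, String> entries = new ConcurrentHashMap<>();

    @Override
    public void put(String key, String value) {
        entries.put(key, value);
    }

    @Override
    public Optional<String> get(String key) {
        return Optional.ofNullable(entries.get(key));
    }

    @Override
    public void remove(String key) {
        entries.remove(key);
    }
}
```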