As developers, we tend to write log details with ourselves in mind. This is fine in a DevOps organization where the development team also handles the ops. But many larger organizations take an Information Technology Infrastructure Library (ITIL) approach to error and problem management, or you may have a product that people deploy in locations beyond your reach, so we need to think further ahead. One of the important aspects of ITIL is its definition of a known error:
A Known Error is a problem that has a documented root cause and a Workaround. Known Errors are managed throughout their lifecycle by the Problem Management process. The details of each Known Error are recorded in a Known Error Record stored in a Known Error Database (KEDB). As a rule, Known Errors are identified by Problem Management, but Known Errors may also be suggested by other Service Management disciplines, e.g., Incident Management. (Source: IT Process Maps wiki)
In simple terms, an organization will keep a record of errors with resolutions. This is why we assign error codes to errors. An error code allows us to provide a simple lookup for an error that can be linked to the appropriate documentation. The documentation should describe the error and provide details, including a remedial set of actions to perform (this is essential if the remedy involves bringing the system back to an optimal state without corrupting or losing data).
If, of course, the log event is recorded and acted upon before things really go wrong, then the actions could be preventative in nature.
The cause of the error could be either from a user action or a bug in the application that has been caught; either way, we should add error codes to the log information. It is best not to pop up error codes in a user interface (UI), as you’re likely to undermine user confidence in your product. But that shouldn’t stop you from linking error codes to messages suitable for users when a user action triggers a problem.
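The separation between the coded log entry and the user-facing message can be sketched as follows. This is a minimal illustration, not a prescribed design: the catalogue, the code `APP-0042`, the message texts and the `example.com` documentation URL are all hypothetical.

```python
import logging

# Hypothetical error catalogue: the code, texts and URL are illustrative only.
ERROR_CATALOGUE = {
    "APP-0042": {
        "log": "Order store rejected write",
        "user": "We could not save your order. Please try again.",
        "doc": "https://example.com/kedb/APP-0042",
    },
}

logger = logging.getLogger("orders")


def report_error(code: str, detail: str) -> str:
    """Log the coded error for operations and return the user-facing text."""
    entry = ERROR_CATALOGUE[code]
    # The error code goes into the log so operations can look it up.
    logger.error("%s %s: %s (see %s)", code, entry["log"], detail, entry["doc"])
    # The UI gets a plain message without the code.
    return entry["user"]
```

Calling `report_error("APP-0042", "timeout after 30s")` writes the coded entry to the log and returns only the friendly text for display.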
Error codes make it straightforward for customers to look up errors, their descriptions and recommended responses, and to incorporate them into a known error database (KEDB). Building such error code content may seem very demanding, but this can be far from the case.
While developing software, the simplest solution is to have a collaborative spreadsheet that allocates error IDs, ensuring the IDs are unique. Then capture the expected cause with a brief description from the developer. Building out resolution documentation can always be done later.
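If the shared spreadsheet is exported as CSV, a small check can confirm that the allocated IDs really are unique. The sketch below assumes a column named `error_id`; that column name and the function name are assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def duplicate_error_ids(csv_text: str) -> list[str]:
    """Return every error ID that appears more than once in the export."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Assumes the export has an "error_id" column (hypothetical name).
    counts = Counter(row["error_id"].strip() for row in rows)
    return sorted(code for code, seen in counts.items() if seen > 1)
```

A check like this could run whenever the spreadsheet changes, so that a clash is caught before two developers ship the same code for different problems.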
One of the benefits of using error codes is that it becomes pretty easy to standardize and internationalize the documentation about the errors. The error codes are language- and locale-agnostic; once you have the code, you can then look up the code documentation in an appropriate language.
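Because the code itself is locale-agnostic, the lookup only needs the code plus a language. A minimal sketch, assuming a hypothetical in-memory store keyed by code and language, with a fallback to a default language:

```python
# Hypothetical documentation store keyed by (error code, language).
ERROR_DOCS = {
    ("APP-0042", "en"): "Order store rejected write. Check database connectivity.",
    ("APP-0042", "fr"): "Écriture refusée. Vérifiez la connexion à la base de données.",
}


def lookup_doc(code: str, locale: str, default_language: str = "en") -> str:
    """Find documentation for a code, falling back to the default language."""
    # Reduce a locale such as "fr-CA" or "de_DE" to its language part.
    language = locale.replace("_", "-").split("-")[0].lower()
    for candidate in (language, default_language):
        text = ERROR_DOCS.get((code, candidate))
        if text is not None:
            return text
    raise KeyError(f"No documentation recorded for {code}")
```

In practice the store would more likely be a set of documentation files or a database than a dictionary; the point is only that the same code resolves to text in whichever language is available.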
There are all sorts of additional tricks that you can incorporate into the software development processes, such as including the documentation in the code management tool. That way, you release the documentation with the code, so the details are linked to your release process. Code-quality tooling could look for error or fatal log entries and apply a regular expression to see if an error code is linked to each one, and so on.
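The regular-expression check could look something like the sketch below. It assumes log calls are written as `logger.error(...)`, `log.fatal(...)` and similar, and that error codes follow a hypothetical `ABC-1234` pattern; both conventions would need adjusting to match a real codebase.

```python
import re

# Assumed conventions: calls like logger.error( or log.fatal(, codes like APP-0042.
LOG_CALL = re.compile(r"\b(?:logger|log)\.(?:error|fatal|critical)\(")
ERROR_CODE = re.compile(r"\b[A-Z]{2,5}-\d{4}\b")


def lines_missing_error_code(source: str) -> list[int]:
    """Return 1-based line numbers of error-level log calls with no error code."""
    missing = []
    for number, line in enumerate(source.splitlines(), start=1):
        if LOG_CALL.search(line) and not ERROR_CODE.search(line):
            missing.append(number)
    return missing
```

This only inspects one line at a time, so a log call whose message is split across several lines would need a more careful parser.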
Here are a few tips for creating your error code numbering:
Some technologies provide codes to indicate success and errors that have been well documented, such as those for HTTP (RFC); others include SMTP (email services) and Oracle WebLogic Server.
Using such codes in logs helps provide more context through a common meaning and understanding — as long as they are used correctly. For example, the temptation to simply do everything through the standard HTTP 200 or 400 codes doesn’t help. Using an HTTP 413 code to tell the requester they sent too much data is far more effective and meaningful, not to mention that this will show up in the logs for any network routing devices.
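As a rough illustration of choosing a specific status over a blanket 400, the sketch below picks a response code for an upload. The size limit, the accepted content type and the function name are assumptions made for the example.

```python
from http import HTTPStatus

MAX_BODY_BYTES = 1_000_000  # hypothetical limit for this example


def status_for_upload(body: bytes, content_type: str) -> HTTPStatus:
    """Pick the most specific HTTP status for an incoming upload."""
    if len(body) > MAX_BODY_BYTES:
        # 413: the requester sent too much data.
        return HTTPStatus(413)
    if content_type != "application/json":
        # 415: the payload format is not one this endpoint accepts.
        return HTTPStatus.UNSUPPORTED_MEDIA_TYPE
    return HTTPStatus.OK
```

Logging the returned status alongside the request gives operators the same meaning that any intermediate network device will record.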
The use of predefined error codes does need to be judged with care. Exception classes in software can be considered a special form of error code, but relying on them in that role could also reduce clarity.
Errors are the most important messages to make uniquely identifiable. The principle of associating codes with documentation for operational (rather than user) processes means that you can hook the event back to specific operational recommendations that could range from performing database optimization processes to archiving log files.
You’ve just read an excerpt from the Manning book “Logging Best Practices” — you can download a complimentary copy of the entire book.