Archive for September, 2009

Enterprise Systems Management Concepts

In this article Feathered Owl explains some basic Enterprise Systems Management concepts. This piece was originally titled "Idiot's Guide to Enterprise Systems Management". Think of it that way if you prefer....

A Brief History of ESM

ESM as a discipline has developed as a result of the widespread migration from centralised mainframe and/or midrange computers to smaller distributed computers that has taken place throughout the IT industry over the past 10 to 15 years. ESM aims to solve one of the major difficulties associated with an IT infrastructure comprising large numbers of small computers distributed over a network, namely that these are inherently more difficult to manage than a smaller number of larger computers. As well as the computer and network hardware and operating systems, applications which run on the computers and which often span multiple computers, networks and geographical locations must also be managed. Many modern distributed computing infrastructures are highly complex; as a result, management of these has become a major challenge.

ESM as a recognised discipline has been around for about the last 10 years, and originated in the networking arena, where the problems of remotely monitoring and administering multiple devices were encountered before the big rush toward distributed computing. A special management communications protocol, SNMP (Simple Network Management Protocol), was developed as part of the overall protocol suite (TCP/IP) which allowed network devices from different manufacturers to exchange data with each other. As the computers attached to the network themselves became smaller, more numerous and more widely dispersed, network management techniques were extended into the server arena. SNMP is still used for the vast majority of network management applications, and remains the only globally-supported standard for server and application monitoring and management.
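As a rough illustration of how SNMP polling works in practice, here is a minimal sketch in Python. It assumes the third-party pysnmp library; the device address, community string and polled OID are placeholders rather than anything from the original article.

```python
# A minimal sketch of polling a device over SNMP, assuming the pysnmp library.
# The host, community string and OID are illustrative placeholders.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error_indication, error_status, error_index, var_binds = next(
    getCmd(SnmpEngine(),
           CommunityData('public'),                   # SNMPv2c community string
           UdpTransportTarget(('192.0.2.10', 161)),   # device address and port
           ContextData(),
           ObjectType(ObjectIdentity('SNMPv2-MIB', 'sysUpTime', 0)))
)

if error_indication:
    print(f"SNMP poll failed: {error_indication}")
else:
    for name, value in var_binds:
        print(f"{name} = {value}")
```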

Basic Principles of ESM

The most basic challenge that any ESM solution aims to solve is monitoring the status of all components of the IT infrastructure and, ideally, the applications that run within it. Like any machine, computers need to be checked periodically to ensure that they are functioning optimally. Consider a scenario where a single systems administrator (SA) is available to monitor all the computers in a company’s IT infrastructure, and it takes the SA five minutes to log on to each computer and perform a basic set of healthchecks. It is not unusual for large companies to have hundreds or even thousands of computers in their datacentres; at five minutes per computer, every 100 computers would tie up one SA for more than a full 8-hour working day (100 × 5 minutes ≈ 8.3 hours) just performing basic daily healthchecks. Clearly this is not an efficient use of resources.

The solution to this problem is to install “agent” software on each computer and have this perform the healthchecks instead, freeing up the SA’s time for more useful activities. A further advantage is that the agent can be configured to carry out healthchecks at frequent, regular intervals; in practice, checking a computer once a day is simply not sufficient – where business-critical functions are supported, computers need to be checked at intervals of five minutes or less. Although each agent software licence may cost a few hundred pounds, this is quickly recouped in savings on SA time, which may cost anything up to £500 per day.
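As an illustration of the kind of work such an agent does, the sketch below runs a simple healthcheck loop every five minutes. It assumes the third-party psutil library for reading CPU and disk utilisation; send_alert() is a stand-in for whatever mechanism the agent uses to forward alerts to the management server.

```python
# A minimal monitoring-agent sketch: periodic local healthchecks with
# threshold-based alerting. Assumes the psutil library; send_alert() is a
# placeholder for the real agent-to-management-server channel.
import time
import psutil

CHECK_INTERVAL_SECONDS = 300   # check every five minutes
DISK_WARN_PERCENT = 90
CPU_WARN_PERCENT = 95

def send_alert(message):
    # Placeholder: a real agent would forward this to the management server.
    print(f"ALERT: {message}")

def run_healthchecks():
    cpu = psutil.cpu_percent(interval=1)      # CPU utilisation over a 1s sample
    disk = psutil.disk_usage("/").percent     # root filesystem usage
    if cpu > CPU_WARN_PERCENT:
        send_alert(f"CPU utilisation high: {cpu:.0f}%")
    if disk > DISK_WARN_PERCENT:
        send_alert(f"Root filesystem {disk:.0f}% full")

if __name__ == "__main__":
    while True:
        run_healthchecks()
        time.sleep(CHECK_INTERVAL_SECONDS)
```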

If an agent detects a problem, it can be configured to generate an alert message to let the SA know that there is something he or she needs to look at and fix. This is known as reactive alerting. Better still, if an agent can detect that a problem is about to happen and send an alert to the SA in time for preventative measures to be taken, there is no interruption to service at all. This kind of proactive monitoring is typically what ESM system designers aim for.

Another kind of action which can be taken by a monitoring agent when an actual or impending fault is detected is to initiate a corrective action to prevent or resolve the issue. This is usually only possible where the detected fault (actual or impending) is a recognised one for which a standard fix is available. In such a case, the agent may be configured to execute a command or script to apply the fix. A notification may still be sent to the relevant support staff, but in this case it informs them that a known problem has occurred and has been automatically resolved – no further action is required, although it may be desirable to investigate further to prevent the problem from recurring.
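A minimal sketch of this kind of automated remediation is shown below, assuming a hypothetical “known fault” (a temporary directory filling up) with a standard scripted fix; the paths, threshold and notification mechanism are illustrative only.

```python
# A sketch of automated remediation for a recognised fault: if a known
# problem is detected and a standard fix exists, apply it and notify
# support that it has already been resolved. Paths and thresholds are
# illustrative placeholders.
import shutil
import subprocess

KNOWN_FIXES = {
    # fault name -> command that applies the standard fix
    "tmp_full": ["find", "/tmp", "-type", "f", "-mtime", "+7", "-delete"],
}

def notify(message):
    print(f"NOTIFY: {message}")   # placeholder for the real notification channel

def check_and_remediate():
    usage = shutil.disk_usage("/tmp")
    percent_used = usage.used / usage.total * 100
    if percent_used > 90:
        # Known fault with a standard fix: apply it automatically...
        subprocess.run(KNOWN_FIXES["tmp_full"], check=False)
        # ...and tell support it has already been handled.
        notify(f"/tmp was {percent_used:.0f}% full; old files purged automatically")

check_and_remediate()
```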

The scenarios described above can be thought of as an evolution from basic to more sophisticated monitoring functionality: -

Monitoring Maturity Timeline

ESM Architectures

ESM systems need their own architecture to function effectively. As well as agents which actually do the monitoring, some kind of management server is required to receive and process alerts sent by the agents, with a console to allow alert viewing and agent configuration. A simple generic ESM architecture can be depicted as follows: -

Generic ESM Architecture

ESM is now generally accepted to be an essential element of any well-designed and well-managed distributed IT infrastructure. Without an ESM capability, ensuring that everything functions properly would require support personnel to check all computers and applications constantly by hand. In many larger organisations, ESM is the responsibility of dedicated teams.

ESM Vendors

As distributed IT infrastructures have become larger, more powerful and more complex, the requirement to monitor and control them has increased proportionally. A small number of major ESM “framework” vendors (HP, IBM, CA and BMC) continue to dominate, but an increasing number of smaller players have entered the marketplace, aiming to fill gaps not well covered by the framework vendors, to concentrate on one specific area and do it better than vendors offering a wider overall capability, or simply to provide equivalent functionality at much lower cost.

Agentless Monitoring

An alternative to deploying agents to every monitored system is to use an agentless approach, with checks on status and health being performed remotely from other computers dedicated to this task. Typically, agentless monitoring is carried out using generic means of accessing the monitored computers provided by their manufacturers. The advantage of the agentless approach is that no additional software has to be deployed to the monitored systems. Typically, a user account and password are created so that the monitoring system can periodically log on to the monitored system and perform the desired checks; any detected anomalies result in an alert being raised and sent to the relevant support personnel in the same manner as for the agent-based monitoring described above.
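To illustrate the remote-login approach, the sketch below performs a single disk-space check over SSH. It assumes the third-party paramiko library; the hostname, monitoring account and threshold are placeholders.

```python
# A sketch of agentless monitoring: the management host logs in to the
# monitored system over SSH and runs a check remotely. Assumes the paramiko
# library; hostname, account and command are illustrative placeholders.
import paramiko

def check_disk_remotely(host, username, password):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, password=password, timeout=10)
    try:
        # POSIX df: percentage used on the root filesystem
        _stdin, stdout, _stderr = client.exec_command("df -P / | tail -1")
        fields = stdout.read().decode().split()
        percent_used = int(fields[4].rstrip("%"))
        if percent_used > 90:
            print(f"ALERT: {host} root filesystem {percent_used}% full")
    finally:
        client.close()

check_disk_remotely("server01.example.com", "monitor", "********")
```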

The main disadvantages of agentless monitoring are that it can result in higher levels of network traffic between monitored systems and the management server, generally making it less suitable for use over slow network links, and that if contact is lost between the management server and a monitored system, no monitoring can be performed at all (agent-based monitoring solutions invariably support alert buffering at the agent, with store-and-forward capability allowing alerts to be sent to the management server once contact is re-established).
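The store-and-forward behaviour described above can be sketched roughly as follows; deliver() stands in for the real agent-to-server transport, and the example simply queues alerts until delivery stops raising connection errors.

```python
# A sketch of agent-side store-and-forward: alerts are queued locally while
# the management server is unreachable and flushed once contact is restored.
# deliver() is a placeholder for the real agent-to-server transport.
from collections import deque

class AlertBuffer:
    def __init__(self, deliver):
        self.deliver = deliver     # callable that sends one alert; may raise
        self.pending = deque()     # alerts waiting for the server to come back

    def send(self, alert):
        self.pending.append(alert)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.deliver(self.pending[0])
            except ConnectionError:
                return             # server still unreachable; keep buffering
            self.pending.popleft() # delivered: drop it from the buffer
```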

The main advantage of agentless monitoring is that it is generally easier to deploy and maintain, as no additional software needs to be deployed to monitored systems and all monitoring configuration is held on the management server. Agentless monitoring also lends itself well to situations where it is desirable to test the availability of a service remotely (e.g. a website or email relay server), as this reflects how users actually access the service.

Network Monitoring

As mentioned earlier, the technologies and techniques used in ESM originated in the network arena, and continue to be used in pretty much the same way to the present day. As well as monitoring network devices (hubs, switches, routers, remote access servers and so on) for faults and performance statistics, network monitoring solutions also map the network topology and the connectivity (logical and physical) between devices. This connectivity information is vitally important in detecting, troubleshooting and resolving network problems. Perhaps the most widely-used application for network device, topology and connectivity monitoring is HP OpenView Network Node Manager (NNM).

A further key function is to manage the configurations of network devices. Network device configurations take the form of instruction sets stored on the hard disk or in flash memory on the device, and can be manually amended via a remote login, or uploaded via FTP (file transfer protocol). In any large network, it is not feasible to manually maintain configurations on all devices individually; it is preferable to store, maintain and update configurations on a central computer and distribute these out to devices on the network as required. Where configuration updates have to be distributed to a number of devices, it may be desirable to schedule these automatically and perform them overnight or at weekends to minimise disruption. If any problems with new configurations are encountered, the ability to “roll back” to a previous configuration which was known to work properly is also valuable.
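A very simple sketch of centralised configuration versioning and rollback might look like the following; the storage path and the push mechanism are placeholders for whatever the real configuration manager uses.

```python
# A sketch of centralised device-configuration management: each new
# configuration is versioned in a central store, so a previous known-good
# version can be rolled back if a change causes problems.
from pathlib import Path
from datetime import datetime

STORE = Path("/var/lib/netconfig")      # central configuration repository

def save_config(device, config_text):
    """Version a device configuration in the central store."""
    device_dir = STORE / device
    device_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    (device_dir / f"{stamp}.cfg").write_text(config_text)

def previous_config(device):
    """Return the text of the most recent saved configuration, for rollback."""
    versions = sorted((STORE / device).glob("*.cfg"))
    return versions[-1].read_text() if versions else None

def rollback(device, push):
    """Push the last known-good configuration back to the device."""
    config = previous_config(device)
    if config is not None:
        push(device, config)            # push() would use FTP/SSH in practice
```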

CiscoWorks is a network device configuration manager used at many sites running Cisco network kit.

Hardware Element Managers

Although they may share common architectural principles or be based on generic standards, each computer hardware vendor’s products are generally sufficiently different to require a proprietary software tool to monitor hardware health and manage hardware configurations such as firmware revisions. Such tools are known as “element managers”, and a comprehensive ESM solution will typically contain a number of different element managers from the manufacturers of each type of hardware device in use within the organisation’s IT infrastructure.

Because their functionality is limited to monitoring of specific proprietary hardware, element managers are typically integrated into other generic monitoring tools designed to accept alerts from multiple disparate sources via standards-based interfaces such as SNMP.

Examples of hardware element managers include HP Insight Manager (HP and Compaq server hardware), Sun Management Centre (Sun Microsystems server hardware) and Dell OpenManage (Dell server hardware). CiscoWorks (see “Network Monitoring”) is also an element manager.

Application Monitoring

Any IT infrastructure exists solely to support the activities of the organisation using it. Functionality is provided to users by software applications; whilst monitoring the availability and performance of the underlying infrastructure will go some way toward ensuring that applications function correctly, a more complete and accurate picture emerges if the applications themselves can be monitored directly.

Much of the activity in the ESM marketplace in recent years has been concentrated around application monitoring. Infrastructure monitoring is now a relatively mature activity dominated by a small number of established players with official or de facto standards in place. In contrast, application monitoring is a rapidly growing sector with hundreds of vendors providing a plethora of mainly proprietary solutions to satisfy demand created by organisations looking to improve the quality of application service delivery. The adoption of a “service orientated” IT delivery model (where services delivered to users are the focus, as opposed to how well individual elements of the IT infrastructure are working) is further fuelling the interest in application monitoring and encouraging further new entrants into this marketplace.

Although few if any standards have emerged in the area of application monitoring, some common principles and techniques are, at least, used by all solution vendors. Most if not all applications operate via transactions. A transaction is an activity or series of linked activities, which are performed using an application and which deliver a useful result. Examples of transactions include performing a search via a Web search engine, creating a new customer record in a CRM system, running an online management report, making a credit card payment via a secure website and so on.

As discussed earlier, modern IT infrastructures tend to be highly distributed with tens, hundreds or even thousands of computers interlinked via networks being used to deliver applications to users. It is the norm for the successful execution of application transactions to be reliant upon multiple infrastructure components and network connections, all of which must be functioning effectively. Applications may also be reliant upon other applications, which themselves have their own infrastructure and perhaps application dependencies. Take the example of a Web-based application provided by an airline which allows customers to view schedules and book seats. The infrastructure which supports the application is as follows: -

Generic Web Application Architecture

A typical transaction would be to request information on available flights and prices for a specified origin, destination and date as the first step in booking a flight. The sequence of operations is as follows: -

  1. User enters address for website in their Web browser
  2. User selects link to “book flights” form (page); form is loaded from Webserver into Web browser
  3. User enters “From” and “To” airport, date and whether it is a one way or return flight; click on “Search Flights”
  4. Information entered into form is sent to application server which in turn issues a “select” statement against the back end database to find all data matching the criteria entered into the “book flights” form
  5. Data matching the selection criteria are sent back to the application server by the database server
  6. Data are formatted for display in the “book flights” webpage by a script running on the application server
  7. Formatted Web page containing requested data is uploaded to the Web server
  8. User’s Web browser displays updated web page with available flights, times and prices

So, even a relatively simple everyday transaction is made up of a number of steps and reliant upon a number of different pieces of infrastructure and software, all intercommunicating over a network. Each of these components is monitored using the appropriate element manager (server hardware), agent & management server (server operating system, database, application and webserver processes), network monitoring tool (network devices, topology & connectivity) and agentless monitoring tool (flight booking website availability). To complete the monitoring picture, it is necessary to monitor the individual steps making up the overall transaction to ensure that: -

a) Each transaction step completes successfully

b) Each transaction step completes within the required time limit

c) The overall transaction completes successfully

d) The overall transaction completes within the required time limit

Typically, a number of approaches may be taken to monitor the application transaction. One way is to use a robot to periodically perform the transaction in order to simulate what is being done by real users. If this synthetically-generated transaction fails or takes too long, then it is reasonable to assume that the same problem would be experienced by a real user.
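A minimal sketch of such a synthetic transaction check is shown below, assuming the third-party requests library; the URL, search parameters, response check and time limit are illustrative placeholders for the airline example above.

```python
# A sketch of a synthetic ("robot") transaction: periodically execute the
# flight-search step of the booking transaction and alert if it fails or
# exceeds its time limit. The URL, parameters and thresholds are placeholders.
import requests

SEARCH_URL = "https://www.example-airline.com/book-flights/search"
MAX_RESPONSE_SECONDS = 5.0

def run_synthetic_search():
    try:
        response = requests.get(
            SEARCH_URL,
            params={"from": "LHR", "to": "JFK", "date": "2009-10-01", "return": "no"},
            timeout=30,
        )
    except requests.RequestException as exc:
        return f"ALERT: flight search transaction failed: {exc}"

    elapsed = response.elapsed.total_seconds()
    if response.status_code != 200 or "flights" not in response.text.lower():
        return f"ALERT: flight search returned unexpected result (HTTP {response.status_code})"
    if elapsed > MAX_RESPONSE_SECONDS:
        return f"ALERT: flight search took {elapsed:.1f}s (limit {MAX_RESPONSE_SECONDS}s)"
    return f"OK: flight search completed in {elapsed:.1f}s"

print(run_synthetic_search())
```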

Another approach would be to install a piece of agent software onto the PC from where the user is accessing the application, and to monitor the stream of instructions being processed by the PC operating system as the transaction is executed. An agent could also be installed on the application server and work in a similar manner. The advantage of this approach is that all transactions being processed by the application server are visible, as opposed to the far lower number of transactions that would be executed on a single user’s PC. These approaches are known as client and server instrumentation.
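The sketch below gives a much-simplified flavour of server-side instrumentation by wrapping a hypothetical transaction handler so that every execution is timed; real instrumentation products hook far deeper into the operating system and runtime than this.

```python
# A simplified illustration of server-side instrumentation: wrap a transaction
# handler so that every execution is timed and slow transactions are reported.
import functools
import time

def instrumented(name, max_seconds):
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return handler(*args, **kwargs)
            finally:
                elapsed = time.monotonic() - start
                if elapsed > max_seconds:
                    print(f"ALERT: transaction '{name}' took {elapsed:.2f}s")
        return wrapper
    return decorator

@instrumented("book-flights-search", max_seconds=2.0)
def search_flights(origin, destination, date):
    ...   # real application logic would query the back-end database here
```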

Yet another approach is to use a network probe to capture data being transmitted over the network and to deduce from the captured data which transactions are being performed, by whom, from which locations, how long they are taking and whether they are successful.

In each case, the monitoring technology used needs to be configured to recognise the application transactions to be monitored. In the case of synthetic transaction generation, this is done by “recording” transactions executed by a real user; in the case of client or server instrumentation or network data capture, it can be done either by performing sample transactions on an otherwise quiet server/network infrastructure or, as is more common nowadays, by letting the technology capture data for an extended period so that it can “learn” which transactions are taking place and what they do from the information contained within the captured data.

The choice of approach for application monitoring will depend on many factors such as the infrastructure platforms used, application architecture, number of users, transaction volumes, size of network and other considerations, not least price. Once configured, the chosen monitoring tool will generally need to be integrated into the incumbent framework application (if present); alternatively, most application monitoring vendors supply their own fully-capable management server, administration and event browser consoles.

Optimising IT Service Delivery through Enterprise Systems Management

In this article, Feathered Owl describes the concepts and methodology we follow during an Enterprise Systems Management engagement. It's a long one, so make sure you're sitting comfortably...

What is ESM?

Enterprise Systems Management is concerned with the control, monitoring and management of IT infrastructure and applications in order to optimise IT service delivery.

It’s been around for a number of years and came into existence as a direct result of the almost universal adoption of distributed network computing and the new set of management challenges this created. ESM is essentially based on a marriage between remote monitoring and configuration techniques originally developed for distributed networks and control and management practices borrowed from mainframe and midrange computing environments.

A well-designed and properly-implemented ESM solution allows IT personnel to support and manage a larger, more complex and more geographically-dispersed IT infrastructure than would otherwise be the case. This is achieved primarily through automation of monitoring tasks which would otherwise require periodic manual checks of every system to ensure that the network devices, servers and applications used by an organisation were functioning properly.

Automated notification of detected faults allows personnel to rely on the ESM systems to tell them when there is a problem that requires their attention, enabling them to use their time more productively on project delivery and other “value add” activities. In some cases it is possible to delegate the resolution, as well as detection, of problems to the ESM systems, providing further opportunities for efficiency savings.

Performance monitoring, tuning and capacity management techniques – again borrowed from larger host-based computing environments – also fall under the “ESM” banner. Other activities such as job scheduling, software distribution, IT inventory management and data backup/restore may be included in a wider definition of ESM depending on the particular requirements, culture and structure of an organisation.

In recent years, ESM has evolved from a primarily technology-centric to an increasingly service-centric discipline, as organisations embrace IT Service Management (ITSM) in the drive to deliver continued competitive advantage through technology. The emergence of ITSM best practice frameworks such as ITIL has prompted a paradigm shift in which the “traditional” ESM disciplines of network, server and application monitoring, performance tuning and capacity management have become components of a more holistic, business process-aligned effort to maximise IT service quality, availability and continuity. It is becoming increasingly common to see ESM being used as an enabler for ITSM by providing better visibility of the availability and quality of IT services delivered via IT infrastructure and applications.

The principal benefits which organisations derive from ESM are as follows:-

  • Reduced IT headcount required to support a given size of IT infrastructure leading to salary savings and/or increased resource availability for project & development work
  • More efficient utilization of valuable technology assets allowing upgrade or expansion costs to be avoided or deferred
  • Increased availability and performance of technology infrastructure and applications meaning that users can work more efficiently and business is not lost to competitors when applications are down
  • Optimisation of IT service delivery leading to improved customer satisfaction and perception of IT as a business enabler rather than just a cost

As globalisation and the rise of the Internet dictate ever-increasing reliance on technology to remain competitive, ESM is becoming an essential aspect of organisations’ IT activities, and a large number of products from many vendors are now available to support their requirements in this field.

How we can help

Many companies’ ESM environments consist of isolated solutions which may monitor the health and performance of servers, networks, websites and so on reasonably effectively, but which fail to provide an overall view of the health of the IT services the business uses to go about its daily activities.

We can provide best practice insights into your existing IT processes, the data they employ and the applications supporting them. We help you produce an architecture, deployment plan and technology recommendations that support your business needs, taking you from immediate tactical benefits towards a fully integrated ESM environment.

We work to ensure our customers reap the significant benefits which can be provided by ESM solutions if the right tools are selected and implemented properly in line with real requirements according to established IT best practice.

We deliver an ESM solution precisely tailored to each customer’s specific requirements, leveraging existing tools wherever possible or based on the most suitable products selected from the ESM marketplace – a marketplace which we know inside out.

Our service is aimed at:

  • Companies undertaking IT Service Management (ITIL) programmes
  • Companies with existing investments in ESM tools considering replacing them or otherwise unhappy with them
  • Companies with an existing ESM capability that is primarily technology/silo focussed and who want to develop “up the stack” into application and service management – frequently this will be as part of a wider IT Service Management initiative (see above)
  • Growing companies moving (or wanting to move) from “cottage industry” to “industrial strength” IT
  • Companies with application performance issues
  • Companies with large complex IT estates that are difficult to support with the available staff
  • Companies wishing to downsize IT support departments
  • Companies who have recently merged with/taken over other companies

Our Approach

It is rare these days to find an organisation which has no ESM capability whatsoever; even the smallest company may have some basic monitoring or performance management systems already in place. However, whether you have nothing, are just starting out or have already invested extensively in ESM solutions, don’t worry – our service offering is aimed at all companies regardless of their current level of ESM capability.

The steps in our approach are as follows: -

Current State Analysis

We begin with an analysis of the client’s business activities, organisational structure and IT systems. We look in detail at existing ESM activities undertaken by various teams within the organisation, including identification of any existing processes whether formal or de-facto, documented or undocumented.

The purpose of the analysis is to allow us to generate the ‘ESM Big Picture’, which fully characterises the client’s current ESM setup. It shows business processes, the applications associated with these and the infrastructure elements that support them.

The main purpose of the ‘big picture’ is to show the client how they are currently set up; it also forms the basis for mapping out the changes to current working practices which may be required to help the company migrate to a new ESM strategy. The company can see the amount of effort the migration requires, the benefits it will bring and the tools needed to fully manage applications and their underlying infrastructure from a business perspective, increasing performance and availability while helping to reduce costs.

Process Redesign

Having established the current ESM capability a company has, we then focus on redesigning processes tailored to meeting its business objectives and creating an ESM solution. When designing the new way of working we leverage best practice frameworks such as ITIL and FCAPS and, most importantly, draw upon our consultants’ extensive experience of designing, implementing and supporting ESM systems in some of the largest companies in the world. The current level of capability has a bearing on what we would do; the following commonly-encountered scenarios illustrate this point: -

No ESM Capability

These companies are usually:

  • New start-ups building technology infrastructure from scratch, or established companies with few if any management tools in place
  • Organisations undergoing complete technology refresh across some or all of their IT infrastructure and/or application stack

In this case we would:

  • Define monitoring requirements for infrastructure and applications
  • Align requirements with wider IT processes e.g. ITSM, application development lifecycle, operational handover (this stuff is in itself another big opportunity we should look at in more detail sometime)
  • Define ESM service in terms of activities, roles, responsibilities & processes, including integration with and amendment of existing processes if any
  • Design the new ESM functional architecture.

Immature and Fragmented ESM

In this instance ESM systems tend to have grown up in a fragmented fashion over a period of time. Different teams and/or silos have implemented vendor-supplied or home-grown solutions to satisfy their own requirements. Functionality is not always consistent across silos/departments and there may be little or no integration between the various tools and/or correlation of the information generated by them. There may also be no clear ownership of or accountability for ESM, and processes around ESM may be inconsistent or absent altogether.

To address these issues we would do some or all of the following: -

  • Work with client to define “target mode of operation” for ESM and define any replacement or new processes, roles & responsibilities to deliver this. We would also include linkages or enhancements to other IT processes e.g. Change Management, Release Management, Operational Acceptance, Application Development Lifecycle, Datacentre Installation etc.
  • Identify unmonitored technology platforms and applications and define requirements for these
  • Define consistent monitoring requirements for platforms which are already monitored but inconsistently
  • Where fit-for-purpose monitoring exists but is undocumented, “reverse engineer” monitoring requirements documentation
  • Identify tools within the existing portfolio which are fit for purpose and agree their adoption as standard with the relevant stakeholders; this may require going to market (RFI/ITT etc.)
  • For functional gaps, identify suitable new tools or extensions to existing tools to fill these
  • Design overall functional and technical architecture for ESM and identify required product integrations.

Too Many ESM Tools/Vendors

In this instance companies face an inflated cost of ownership due to the requirement to maintain a diverse range of skills within the organisation, higher overall support contract costs with numerous individual vendors, and increased complexity and product integration overhead within the ESM architecture.

It also results in a higher unit cost for new product acquisition due to reduced supplier leverage, as well as inconsistent monitoring functionality across platforms supported by groups using different tools. Incident troubleshooting may be more difficult because different tools are used across the service delivery infrastructure.

To address these issues we would do some or all of the following: -

  • Define and/or reverse engineer functional requirements for monitoring
  • Determine which tools from their portfolio can satisfy which requirements
  • Invite proposals from vendors to extend the coverage of their tools and buy out other vendors’ products
  • Design new overall technical architecture for ESM
  • Present and agree proposed product selection(s) and architecture with stakeholders to get their buy-in.

Mature Component-Based ESM, Needs to go to Next Level

Such organisations have probably been doing ESM for a while and have good coverage at the component level (network, servers, databases, webservers, storage etc) but lack the visibility of end-to-end application and service availability and performance.

There is likely to be a dedicated ESM organisation working to defined processes and agreed service levels; alternatively ESM accountability and activities may be distributed across a number of teams (e.g. server support, DBAs, Network ops).

Typically, organisations in this position need to extend their ESM capabilities to provide monitoring and reporting for end to end applications and IT services delivered to the business. They will have evolved their overall IT operation to the point where they have started, or are contemplating adoption of IT Service Management in order to optimise IT service delivery.

To assist in this process, we would align with any ITSM programme being undertaken and identify the additional functional requirements associated with new processes being implemented within the IT organisation. If there is existing data and/or documentation of business processes and the end-to-end applications which support execution of these, these can be used to build up metadata defining the relationships between components, applications and services (this is one of the cornerstones of ITSM; ITIL prescribes the enshrining of these in a configuration management database or CMDB). Alternatively it may be necessary to undertake some business process analysis to help in the definition of these CMDB relationships.

What is then required is to select new and/or additional tools which will: -

  • Deliver the required CMDB capability to store the component-application-service relationships and appropriate processes or process enhancements/linkages to ensure that these are kept up to date (e.g. Change Management, Moves/adds/changes, decommissioning etc.) Ideally, CMDB relationships are protected and maintained through an over-arching Configuration Management process that is implemented as part of wider ITSM implementation
  • Present real-time and historical ESM data (typically alerts and performance data) within the context of e2e applications and IT services delivered to the business (e.g. server goes down which impacts three applications all using a database resident on it which in turn impacts three different lines of business – as opposed to “a server is down”)
  • Allow correlation between alerts from multiple sources (e.g. server monitoring, database monitoring, network monitoring, end-to-end (e2e) app transaction monitoring, external website availability monitoring) and determination of root cause (e.g. a server has become inaccessible, but only because the network link to it has gone down). Instead of generating multiple alerts, only the root cause alert is escalated; symptomatic alerts are suppressed or presented in association with the root cause (a minimal sketch of this kind of suppression follows this list)
  • Do the same as above, but retrospectively. This capability allows historical root cause analysis and identification of “systemic errors” within the IT environment – which is what ITIL calls “problem management”
  • Allow a knowledge base of known alerts and their resolutions to be built up and presented to operations staff via the ESM tools when alerts are detected. The endgame of this process is to automate fixes to known errors within the ESM tools themselves, removing the need for human intervention altogether
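As a rough sketch of the root-cause suppression described above, the example below uses a small dependency map (in practice these relationships would come from the CMDB) to separate root-cause alerts from symptomatic ones; the component names are purely illustrative.

```python
# A minimal sketch of root-cause correlation: given a dependency map between
# components, an alert on a component is treated as symptomatic when something
# it depends on has already failed, and only the root cause is escalated.
DEPENDS_ON = {
    "app-crm": ["db-server-01", "net-link-07"],
    "app-billing": ["db-server-01"],
    "db-server-01": ["net-link-07"],
}

def classify_alerts(failed_components):
    """Split failed components into root causes and symptoms."""
    root_causes, symptoms = [], []
    for component in failed_components:
        upstream_failed = any(dep in failed_components
                              for dep in DEPENDS_ON.get(component, []))
        (symptoms if upstream_failed else root_causes).append(component)
    return root_causes, symptoms

roots, symptomatic = classify_alerts({"app-crm", "app-billing", "db-server-01", "net-link-07"})
print("Escalate:", roots)         # ['net-link-07'] – the underlying network failure
print("Suppress:", symptomatic)   # everything that failed only because of it
```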

There are many other possible scenarios, and whatever your current ESM capability, we tailor our service to create a comprehensive solution to meet your specific business requirements.

Technology Selection

We are experts in translating strategic ESM business requirements into a technical specification: a document which defines the specific ESM technical requirements based on the solution we have designed and the new working practices.

We document the requirements in some or all of the following depending on the scope of the engagement: -

Network Management

  • Automated monitoring of distributed IP networks
  • Fault, Configuration, Availability, Performance and Security management
  • Network performance management
  • Network inventory and configuration management.

Systems Management

  • Automated monitoring of distributed UNIX and Wintel computers; also midrange systems e.g. VAX, AS/400
  • Server performance management
  • Monitoring of storage infrastructure (SAN, NAS, Backup/restore etc)
  • Database monitoring
  • Inventory management.

Application Monitoring

  • End-to-end monitoring of distributed (client-server, N-tier etc) and host-based (UNIX, VMS, mainframe etc) applications e.g. C++, Visual Basic, .NET, J2EE, SAP, Exchange
  • Application performance troubleshooting and optimisation.

Capacity Management

  • Server, network, SAN, database, webserver, Email etc. performance troubleshooting & optimization
  • Application capacity management
  • Optimization of application transaction performance
  • Application governance – ensuring applications perform in line with SLA when released and on an ongoing basis throughout their lifetime.

IT Service Management

  • Integration of Systems, Network & Application monitoring tools with IT Service Management dashboards & Configuration Management databases
  • ITIL Service Delivery (Service Level, Availability, Financial etc)
  • ITIL Service Support (Incident, Problem, Change etc)
  • ITIL Configuration Management.

The deliverable is a detailed design specification which includes the new ESM big picture alongside specific requirements describing the new processes, workflows and data flows.

We are supplier-independent, which allows us to be objective when selecting the appropriate technology and supplier to deliver the solution. At the same time, we know the latest tools and technologies, which allows us to target only suppliers with the relevant technologies, saving time spent ‘searching’ for appropriate solutions. We have in-depth knowledge of:

  • HP OpenView
  • BMC Patrol
  • Micromuse (IBM) Netcool
  • NetIQ
  • Microsoft Operations Manager
  • Mercury BTO
  • Managed Objects Formula
  • Others – we are constantly reviewing the latest in ESM technologies.

We usually go through a two-stage approach, filtering out inappropriate solutions and then allowing three potential suppliers to show us how they would map your requirements to their technology and how it would deliver the benefits sought.

We help you put together selection criteria covering areas such as: the company and its background; its financial stability; the technologies it uses (are they appropriate and up to date?); where it has done similar projects before and whether they are referenceable; how well its solution meets your functional and technical requirements; the implementation costs; and the overall five-year costs. We also review the SLAs to make sure they provide the support your business needs, and we can help negotiate contracts to get you the best deal.

Technology Implementation

ESM projects are prone to failure when they are not managed correctly; the result is a supplier delivering a solution that ‘looks pretty’ but does not actually do what you wanted it to do.

No company buys an ESM solution for its own sake. Companies buy the benefits which improved technology can bring. There is no return to be had on investing in software while preserving the status quo.

As a result, the successful introduction of a solution is always accompanied by improvements to working practices and business processes. This introduces a number of challenges: -

  • The ESM solution must be purchased and installed to match the improved ways of working
  • The introduction of this new technology and the changes in working practices consume management time, and normally the same individuals are key to both processes and to the day-to-day running of the company. This is the main reason why IT projects finish late and/or fail to deliver the benefits sought as time cannot always be dedicated to making sure the solution is delivered and works
  • While most suppliers will do a competent job of configuring a solution that contains a given set of technical capabilities, they do not possess the skills (and are not contractually obliged) to ensure that the solution is correctly used (or used at all!)
  • There are significant “transition” issues for which the supplier will not be responsible. These include ensuring that your staff are trained and ready to start new working practices on a given date, and finding a way to initialise the system without stopping business operations.

These are the things that a company must do internally to be ready for the new system, and hence there are a number of tasks which need to be undertaken on top of those contracted to the supplier. We help you set up an appropriate internal project team and facilitate the delivery of the new solution and its benefits.

1. Project Governance

The supplier is contracted to provide a configured solution. They will provide a project manager to ensure that their tasks remain on time and on budget. However, the supplier is not responsible for organising your staff, who will have a long list of tasks which they must complete (these include development, testing, migration planning and training for the end users). Responsibility for ensuring that these tasks are scheduled correctly and completed on time is the company’s.

We put in place project governance and a project manager who will be responsible for managing the day-to-day running of the project, both internally and with the supplier, and for assessing the risks inherent in any project. Actions can be taken to reduce and mitigate risk; however, many of these actions cost money, reduce the benefits accruing from the project or have other downsides. It is the project manager’s job to ensure that a sensible balance is maintained in the risk profile of the project.

We create the internal project plan detailing all of the tasks which must be carried out, who is responsible for each, what needs to be done, by when and in what format the output needs to be.

2. Requirements authority

The supplier will expect a single point of contact in your project team who can give them rapid and clear answers to questions that arise during the construction of the solution. It is inevitable that issues and questions will arise as ideas are fully thought through, and this thinking will be happening right up to delivery of the solution.

We manage this interaction. Our technical consultant and project manager will closely monitor the supplier during the design and build stages and will be the main point of contact for all process and/or technical questions from their project manager and their developers. This is a key stage in the project: if the design and build do not exactly reflect the ESM requirements, the supplier will deliver a solution that meets only a few of your project objectives.

3. Test management

Once the supplier has built the solution they will send it over to you for final testing. You need to test it to make sure that it does ‘what it says on the tin’.

Hence it is very important that your project team fully test the solution before they rely on it. This is an onerous task – every feature in the solution needs to be tested with example data, including situations which may arise only rarely. Tests are also required to ensure that things which should not work are correctly rejected.

We assist you in creating test scripts for all process and situation eventualities, help set up the testing environment and facilitate testing of the system by the internal project team. We will be able to identify whether a problem found is due to the business process or a bug in the solution, and can then go back to the supplier, explain the problem and get them to fix it.

4. Training

The supplier will provide training in the use of their solution, but this only covers the mechanics of how the solution is set up. The supplier has an incomplete understanding, at best, of the conventions of your new ESM strategy and how the solution works to attain that strategy. Training in the use of the solution needs to be given alongside a description of the changes in working practices. These changes are often politically sensitive and have to be presented carefully.

We help your project team put together a training plan. We recommend that the key project members are fully trained on the system and then train the other individuals who will use it; this allows your company to spread knowledge of the system widely and take responsibility for training others itself.

5. Go-live

There are always problems that arise on the days before and after go-live. Very often, tasks in the last few days in the project overrun and contingency plans have to be put in place to allow the system to go live without all the pieces of the jigsaw in place. There are always users who have forgotten their training or who find an unusual situation that wasn’t addressed. This phase often coincides with the supplier handing over the project from development to support teams, which can raise problems of its own. We will be on hand with the key project team to ensure the smooth transition to the new system and to deal with any problems that arise.

Post Implementation Review

Following implementation and handover of the ESM solution but before project closure we will conduct a post implementation review (PIR) to verify that the project has done what it set out to do, i.e. that all the objectives have been met and that each identified requirement has been satisfied. The PIR is the final activity in the quality management process which goes on throughout the project’s lifetime.

As many of the benefits of ESM solutions are realised over time after implementation, we will help customers define ESM key performance indicators (KPIs) and develop data capture and reporting mechanisms – often as part of the solution itself – that will allow the improvements in IT service availability and quality following project completion to be measured.

Final Thoughts

ESM is now seen as a business critical area which allows IT departments to better serve the other functional elements of a business.

Feathered Owl Technology has a thorough understanding of the ESM marketplace and strong relationships with leading vendors. Our in-depth understanding of IT infrastructure, operations and support, together with real-world experience of IT Service Management and the use of ESM to support its adoption, allows us to create a comprehensive ESM strategy tailored to a company’s individual needs.

We do not ‘sell’ a technology and are therefore better positioned to take an objective view when selecting the most appropriate technology. We manage and execute a full ESM project lifecycle:

  • Current situation analysis (if applicable)
  • Requirements analysis
  • Project scoping, definition of objectives, benefits calculation and business case preparation
  • Solution design, product selection & assistance with procurement cycle (if desired)
  • Solution development, product customization, integration & testing
  • Operational process development and/or optimization
  • Solution deployment, training, documentation and handover
  • Post implementation review

Unlike other consulting firms which are only technology-focussed, we are business-focussed and look at creating working practices that have a direct impact on a company’s bottom line. Irrespective of a company’s size or business sector, we have many years’ experience of ESM and a comprehensive knowledge of the ESM marketplace – we really have “been there and done it”.

We offer a free, no obligation ESM “healthcheck” during which we will conduct an initial audit of any existing ESM tools and processes, carry out initial requirements analysis and suggest ways in which your organisation’s ESM capability can be enhanced to improve IT service quality and deliver lasting benefit to the business.

To find out how we can help you with your ESM needs or to arrange a healthcheck, contact us via the Feathered Owl Technology website.


I.T. Service Management

In this article, Feathered Owl gives the lowdown on IT Service Management....

“I.T. Service Management”; “Business Service Management”; “ITIL”. It would be very surprising indeed if you haven’t encountered at least one of these terms in the last 12 months, or received a call from a salesperson trying to sell you an “ITIL-compliant” solution of some description. Maybe you’ve been tasked with “implementing ITIL processes” or adopting a more “service-centric” way of working in your team or department. But what does it all mean?

In the Beginning there was Technology - lots of it.

Up until a few years ago, IT was about technology. Designing, building, supporting, enhancing and everything else was all about making sure that the servers, networks, databases, data storage or any of the many components which made up IT infrastructures worked as well as possible. Datacentres were organised by technology “silo” and each area had its own people specialising in that particular technology. Monitoring, measurement and reporting and perhaps even service levels were all focussed on making sure that, for example, critical network links never went above 80% utilisation. Or that there was always enough storage space on the fileservers to cope with the amount of data being saved by users.

As well as all that infrastructure stuff there were applications (not that some people in datacentres seemed to notice). Usually, these were the responsibility of a completely separate group within the IT department from Infrastructure and the focus was entirely on requirements analysis, development of elegant code and functional testing to ensure that the code did what the requirements said. Once that was done the apps would be thrown over the fence to the Datacentre guys and the next interesting development project would begin. Everybody was happy. Well, almost….

Don’t forget the Users

As IT environments grew ever more complex and more and more money disappeared into them, the people who used the applications which ran on all that expensive infrastructure began to ask why they never seemed to perform properly or do precisely what was required to support the activities of the business in question. Or why, when new applications were released, something important like training the users in the new application always seemed to get overlooked.

At the top of the pile, business management started to complain to IT management that the systems just weren’t delivering to the required level, no matter how many charts they were shown of servers performing well below maximum CPU utilisation. In fact, exactly how much value were they getting for all that cash they were stumping up to fund the annual IT spend?

So what was wrong?

The problem was that, although all the component parts of IT may have been working fine in isolation, overall they were failing to provide the right services to the business users. In fact, for the most part, nobody in IT really knew what the right services were in the first place. As a rule, IT users don’t care about servers, databases or storage; what interests them are activities like sales, order processing, accounts, despatch and so on and the applications which they log on to and use to perform these business activities. Basically, IT was missing the point – effective management of technology silos alone was never going to deliver the right IT services to the users.

What’s IT Service Management?

IT Service Management (ITSM) was born out of the frustration of user departments with the quality of service they were getting from IT. As a recognisable discipline it originated in the late 1980s as a result of the Office of Government Commerce (OGC, formerly the Central Computer and Telecommunications Agency) in the UK being asked to do something to help the British public sector get better value from its IT investments. The result of this investigation was the IT Infrastructure Library (ITIL), a set of vendor- and technology-independent best practices and process guidelines based on received wisdom within the IT industry at large and, importantly, what was observed to work well by the OGC in the organisations it studied.

Then there was ITIL

ITIL was released into an unsuspecting IT industry in the early 1990s. For several years not much was heard of it; this was, after all, the decade when everyone was busy getting rid of their mainframes, midrange computers and terminals and replacing them with cheaper, easier-to-implement and more agile distributed computing infrastructures made up of PCs and minicomputers talking to each other over IP networks.

In a way, then, ITIL appeared at exactly the right moment; it’s just that nobody realised it at the time. By now we are all familiar with the headaches of managing complex distributed IT infrastructures and the applications they run to support IT services. Had it been adopted by the industry at large from the outset, ITIL could have saved everyone a whole lot of bother.

ITIL is published by the OGC as a set of manuals which give detailed information on a number of important IT practices, down to the level of checklists, tasks, procedures and even roles and responsibilities. The areas covered by ITIL, divided into Service Support and Service Delivery, are summarised in the following sections. Unless you’ve been living under a stone for the past few years you’ll recognise at least some of them and appreciate that they encompass most of the things that should probably have been thought about at the same time as the rush to distributed computing was under way, in order to keep it all manageable to at least some degree.

Service Support Disciplines

Service Desk

Provides a central interface and point of contact between users and IT, handling incidents reported by users and requests for new services and acting as the interface into other processes as required.

Incident Management

Provides a means of restoring normal operation as quickly as possible following a service impacting outage, if necessary by effecting a temporary fix or workaround based on previous experience.

Problem Management

Seeks to identify the underlying root cause of incidents and to implement permanent fixes or changes to remove these and so prevent recurrence of the same or similar incidents.

Change Management

Manages the risk associated with changes to any part of the IT infrastructure to ensure that the desired outcome is achieved without adversely affecting the service in question or causing any unforeseen knock-on effects.

Release Management

Considers everything that needs to be done to ensure that a major release (such as a new application rollout) into the IT infrastructure is successful, including support staff and user training, documentation, operational handover, testing etc.

Configuration Management

Seeks to manage the configuration and versions of all technology components, applications and other IT assets, providing a logical model of the IT infrastructure and the relationships between “configuration items”.

ITIL Service Delivery Disciplines

Service Level Management

Defines expected levels of IT service, documents these in service level agreements (SLAs), implements monitoring and reporting to measure whether they are being met, and seeks to “design in” the ability to meet SLAs from the outset of IT projects.
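
As an aside, the arithmetic behind measuring availability against an SLA is simple enough to sketch in a few lines of Python. The 99.9% target, the 30-day month and the downtime figure below are purely illustrative and not taken from any real SLA: -

    # Compare achieved availability for one service against an SLA target.
    # All figures here are illustrative.
    SLA_TARGET_PCT = 99.9              # availability promised in the SLA
    MINUTES_IN_MONTH = 30 * 24 * 60    # 43,200 minutes in a 30-day month
    downtime_minutes = 50              # unplanned downtime recorded this month

    achieved_pct = 100.0 * (MINUTES_IN_MONTH - downtime_minutes) / MINUTES_IN_MONTH
    allowance_minutes = MINUTES_IN_MONTH * (100.0 - SLA_TARGET_PCT) / 100.0

    print(f"Achieved availability: {achieved_pct:.3f}% against a target of {SLA_TARGET_PCT}%")
    print(f"Downtime allowance: {allowance_minutes:.0f} minutes; actual downtime: {downtime_minutes} minutes")
    print("SLA met" if achieved_pct >= SLA_TARGET_PCT else "SLA breached")

Real service level reporting is rarely this tidy (planned maintenance windows, partial outages and the like all muddy the water), but the principle is the same.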

Availability Management

Does everything possible to ensure that IT services are available at the required times to the right people, including designing for resilience, monitoring and reporting on service availability, and optimising processes for availability.

Capacity Management

Performs continuous monitoring, analysis and optimisation of production IT services to ensure continued delivery in line with SLAs, supports pre-deployment performance testing and optimisation, and assesses the impact of changes on service performance.

Financial Management for I.T. Services

Provides guidelines for effective IT financial management including recovery of costs through usage-based charging.

I.T. Service Continuity Management

Seeks to ensure continuity of IT service provision through effective backup and recovery, DR/failover solutions and supporting processes, so that services can be recovered in the event of a disaster in line with contingency plans for business recovery.

So what’s in it for me?

Having read the above list you’re probably thinking that this all makes perfect sense and is just what every organisation should be doing in order to manage its IT resources and services effectively. In practice, however, it can be difficult to do all this stuff well and every organisation has different specific requirements depending on its technology, people, processes and culture.

Where ITIL scores is in the fact that it doesn’t dictate a standard way of doing things using specific tools. Rather, it recommends best practices that are pragmatic and can be tailored to the requirements of virtually any organisation, large or small, which makes use of IT to go about its business. You can even leave bits out if they’re not relevant to you. Fantastic! It’s this flexibility and pragmatism (or put another way, a common sense approach) which has seen ITIL adopted the world over as the “industry standard” framework for managing IT as a service as opposed to a set of technology platforms.

Simply put, ITIL works. If someone had packaged it up and sold it they’d have made a fortune by now. Luckily, ITIL is in the public domain; for a few hundred pounds an organisation can buy the OGC manuals and off they go. An active ITSM community with its own representative body, the IT Service Management Forum (itSMF), constantly shares information and promotes the wider application and further development of the best practice guidelines which make up ITIL, and there are now recognised professional qualifications for individuals involved in applying these in their own organisations or in providing ITIL consultancy and related services. In many cases, an ITIL qualification or accreditation of some kind is becoming a requirement rather than a nice-to-have when looking for that next position in IT, the clearest evidence of all that IT Service Management is here to stay (for a while at least).

SAN, NAS or Both?

Posted on 1 Comment

In another of our series of technical articles, Feathered Owl gives a few pointers to help you decide whether SAN or NAS is the right storage solution for you.

Over the past twenty years there has been a worldwide migration from host-based to distributed computing. This has had numerous effects, many of them unforeseen when organisations first began augmenting or replacing their mainframe and midrange systems with mini- and microcomputers. One such effect is the increasing move toward storage consolidation and the emergence of Storage as an I.T. discipline in its own right. Today’s I.T. managers have a huge range of vendors and technologies available to them in the Storage arena, and one key decision which must be reached is whether to use Storage Area Network (SAN), Network Attached Storage (NAS), or, indeed, both to satisfy an organisation’s storage consolidation requirements.

Why Consolidate Storage?

Once many organisations had installed dozens, hundreds or even thousands of small computers in their machine rooms, each with their own locally-attached storage, the following issues typically arose: -

  • Storage attached to one computer could not be readily accessed by users or applications on another computer
  • Management of storage resources across the distributed I.T. estate became more onerous
  • Overall utilisation of storage space was inefficient, because slack space on one computer’s local disks could not be used by any other computer (a quick illustration follows this list)
  • Data backup and recovery became a significant challenge
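
To put a rough figure on that slack space problem, here is a back-of-an-envelope Python sketch. The numbers (500 servers, 100 GB apiece, 60% average utilisation) are entirely made up, and are there only to show how quickly stranded capacity adds up: -

    # Back-of-an-envelope estimate of storage stranded on local disks.
    # All figures are illustrative.
    servers = 500                  # small computers, each with local disks
    disk_per_server_gb = 100       # locally-attached storage per server
    average_utilisation = 0.6      # 60% used on average, the rest sitting idle

    stranded_gb = servers * disk_per_server_gb * (1 - average_utilisation)
    print(f"Stranded slack space: {stranded_gb:,.0f} GB (roughly {stranded_gb / 1024:.1f} TB)")

With consolidated storage, that spare capacity can be pooled and handed out to whichever system actually needs it.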

The Emergence of SAN and NAS

To address these problems, it is now common practice, in I.T. environments of any significant size, to treat data storage as a centralised resource and provide shared access to it via a network. The simplest form of this is the use of file servers to house personal or workgroup data. A file server can be any computer on a network whose storage has been rendered accessible to other computers via a file-sharing protocol such as NFS or CIFS (more on protocols in a while).

File servers became widespread when PC network operating systems such as Netware and Windows for Workgroups emerged; these originally came into being to support sharing of files and printers attached to desktop PCs in office environments. NAS devices evolved from file servers, as manufacturers introduced dedicated file serving devices to reduce the cost and management overhead associated with multiple server operating systems whose only role in life was to make storage accessible on the network.

At the other end of the spectrum, a more heavyweight solution for consolidating and sharing storage for “enterprise” applications such as databases emerged – the SAN. Historically, a SAN was a means of linking storage devices to multiple computers via a dedicated Fibre Channel network separate from the main data network.

Some More Technical Detail

So a SAN is for big enterprise storage and NAS is for workgroup files and home drives? Yes and no, or maybe not really. We’ll talk about this in a bit. But first there’s some more technical stuff which is important to help us understand the differences between the two technologies. No self-respecting technology is complete without an acronym, so here are a few explained for you in as simple a fashion as possible: -

SCSI

The Small Computer System Interface was ratified by ANSI in 1986 and quickly became an almost universal standard means of attaching storage to mini- and microcomputers via a parallel connection. Other standards which have evolved from it are SCSI-2, SCSI-3, iSCSI and Fibre Channel. As you may be aware, the Fibre Channel network protocol is very important in the world of Storage and SANs.

Fibre Channel

Fibre Channel came about as an alternative to SPI (the SCSI Parallel Interface), getting round the main limitations of parallel SCSI, which are: -

  • Parallel SCSI cable has length limitations due to crosstalk within copper cables and external interference
  • Parallel SCSI is limited to a maximum of 16 devices on a bus
  • It’s not practical to connect more than one computer to the same storage device

Fibre Channel is a serial protocol which uses fibre optic cable, allowing single cable runs of up to 10 kilometres (Fibre Channel can also run over copper cable, but that’s another story). As well as supporting longer distances, Fibre Channel supports (in theory) up to 16 million devices on the same network, meaning that storage devices can be readily shared amongst multiple computers at the network level.

NFS

The Network File System, part of the TCP/IP protocol suite, was developed by Sun Microsystems and released to the public in 1984. Since then it has become the standard means of sharing filesystems in the UNIX world.

CIFS

The Common Internet File System is the commonest protocol used to share files in Windows environments; it is a dialect of Microsoft’s SMB protocol, which historically ran over NetBIOS. Functionally, NFS and CIFS are analogous; they just tend to be found in UNIX and Windows environments respectively.

Needless to say it’s all vastly more complex than this, but, in essence, SANs are an evolution of SCSI and NAS is an evolution of NFS and CIFS.
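
To make the protocol side of this a little more concrete, here is a minimal sketch of a Linux client mounting one NFS export and one CIFS share and then treating them as ordinary directories. The server name, export and share paths and username are all hypothetical, and the commands assume root privileges plus the usual NFS and CIFS client packages: -

    # Minimal sketch: mount an NFS export and a CIFS share on a Linux client,
    # then read a file from the NFS mount like any other local path.
    # Server, paths and username are hypothetical.
    import subprocess

    # UNIX-style file sharing: mount an NFS export from the NAS device.
    subprocess.run(["mount", "-t", "nfs", "nas01:/vol/home", "/mnt/home"], check=True)

    # Windows-style file sharing: mount a CIFS share from the same device.
    subprocess.run(["mount", "-t", "cifs", "//nas01/workgroup", "/mnt/workgroup",
                    "-o", "username=fowl"], check=True)

    # Once mounted, a share is just another directory in the client's filesystem.
    with open("/mnt/home/readme.txt") as f:
        print(f.read())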

SAN and NAS Defined

Returning now to the issue of distinguishing between SAN and NAS, rather than just considering size and physical architecture, we can use the following definitions: -

A SAN is storage shared at the device level via a serial SCSI protocol such as Fibre Channel

NAS is a computer or device dedicated to sharing files via NFS or CIFS
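
From the host’s point of view the distinction looks something like the following Python sketch. The device path and share path are hypothetical, and reading a raw LUN in this way would normally need root privileges: -

    # Minimal sketch of the host's-eye view of SAN versus NAS storage.
    # Device and share paths are hypothetical.

    # SAN: the host sees a raw block device (here an imaginary Fibre Channel LUN)
    # and puts its own filesystem or database files directly on it.
    with open("/dev/sdb", "rb") as lun:
        first_block = lun.read(512)          # read the first 512-byte block
        print(f"Read {len(first_block)} raw bytes from the LUN")

    # NAS: the host sees files on a share owned by another machine's filesystem
    # and simply opens them by path over NFS or CIFS.
    with open("/mnt/nas_share/report.txt") as shared_file:
        print(shared_file.readline())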

Technological Convergence in Storage

These definitions of SAN and NAS work well when you consider that, since their inception, the uses to which SANs and NAS are put and the hardware on which they are implemented have converged significantly.

As speed, size and reliability have improved, NAS devices have begun to be used for “enterprise” applications such as databases, email and data archiving as opposed to just home drives and shared workgroup directories. SANs have got smaller and more manageable and it is now possible to buy a “SAN in a box” – basically a SAN appliance which contains both storage and SAN fabric to which computers can simply be connected via a Fibre Channel network card or HBA (host bus adapter).

Another recent development is the NAS gateway, which allows SAN storage to be presented via NFS or CIFS over an IP network instead of as a raw chunk or logical unit (LUN) of disk over Fibre Channel.

So what’s the Difference Again?

In summary, the key difference to bear in mind is that SAN is good for making consolidated storage available to multiple computers as raw devices, whilst NAS is good for making it available via network shares. Where an application requires or works best with raw device storage (as is typically the case with enterprise database software), you need a SAN. Where an application will happily access data via a network share, NAS will, more often than not, fit the bill.