SOA is a compelling technique for developing software applications that best align with business models. However, SOA increases the level of cooperation and coordination required between business and information technology (IT), as well as among IT departments and teams. This cooperation and coordination is provided by SOA governance, which covers the tasks and processes for specifying and managing how services and SOA applications are supported.
What is SOA Governance?
In general, governance means establishing and enforcing how a group agrees to work together. Specifically, there are two aspects to governance:
Establishing chains of responsibility, authority, and communication to empower people, determining who has the rights to make what decisions.
Establishing measurement, policy, and control mechanisms to enable people to carry out their roles and responsibilities.
Governance is distinct from management in the following ways:
Governance determines who has the authority and responsibility for making the decisions.
Management is the process of making and implementing the decisions.
To put it another way, governance says what should be done, while management makes sure it is getting done.
A more specific form of governance is IT governance, which does the following:
Establishes decision-making rights associated with IT.
Establishes mechanisms and policies used to measure and control the way IT decisions are made and carried out.
That is, IT governance is about who’s responsible for what in an IT department and how the department knows those responsibilities are being performed.
SOA adds the following unique aspects to governance:
Acts as an extension of IT governance that focuses on the lifecycle of services to ensure the business value of SOA.
Determines who should monitor, define, and authorize changes to existing services within an enterprise.
Governance becomes more important in SOA than in general IT. In SOA, service consumers and service providers run in different processes, are developed and managed by different departments, and require a lot of coordination to work together successfully. For SOA to succeed, multiple applications need to share common services, which means they need to coordinate on making those services common and reusable. These are governance issues, and they’re much more complex than in the days of monolithic applications or even in the days of reusable code and components.
As companies use SOA to better align IT with the business, they can ideally use SOA governance to improve overall IT governance. Employing SOA governance is key if companies are to realize the benefits of SOA. For SOA to be successful, SOA business and technical governance is not optional; it is required.
SOA Governance in Practice
In practice, SOA governance guides the development of reusable services, establishing how services will be designed and developed and how those services will change over time. It establishes agreements between the providers of services and the consumers of those services, telling the consumers what they can expect and the providers what they’re obligated to provide.
SOA governance doesn’t design the services, but guides how the services will be designed. It helps answer many thorny questions related to SOA: What services are available? Who can use them? How reliable are they? How long will they be supported? Can you depend on them to not change? What if you want them to change, for example, to fix a bug? Or to add a new feature? What if two consumers want the same service to work differently? Just because you decide to expose a service, does that mean you are obligated to support it forever? If you decide to consume a service, can you be confident that it will not be shut down tomorrow?
SOA governance builds on existing IT governance techniques and practices. A key aspect of IT governance when using object-oriented technologies like Java 2 Platform, Enterprise Edition (J2EE) is code reuse. Code reuse also illustrates the difficulties of IT governance. Everyone thinks reusable assets are good, but they’re difficult to make work in practice: Who’s going to pay to develop them? Will development teams actually strive to reuse them? Can everyone really agree on a single set of behavior for a reusable asset, or will everyone have their own customized version which isn’t really being reused after all? SOA and services make these governance issues even more important and, thus, their consequences even more significant.
Governance is more of a political problem than a technological or business one. Technology focuses on matching interfaces and invocation protocols. Business focuses on functionality for serving customers. Technology and business are focused on requirements. While governance gets involved in those aspects, it focuses more on ensuring that everyone is working together and that separate efforts are not contradicting each other. Governance does not determine what the results of decisions are, but what decisions must be made and who will make them.
The two parties, the consumers and the providers, have to agree on how they’re going to work together. Much of this understanding can be captured in a service-level agreement (SLA), measurable goals that a service provider agrees to meet and that a service consumer agrees to live with. This agreement is like a contract between the parties, and can, in fact, be a legal contract. At the very least, the SLA articulates what the provider must do and what the consumer can expect.
SOA governance is enacted by a center of excellence (COE), a board of knowledgeable SOA practitioners who establish and supervise policies to help ensure an enterprise’s success with SOA. The COE establishes policies for identification and development of services, establishment of SLAs, management of registries, and other efforts that provide effective governance. COE members then put those policies into practice, mentoring and assisting teams with developing services and composite applications.
Once the governance COE works out the policies, technology can be used to manage those policies. Technology doesn’t define an SLA, but it can be used to enforce and measure compliance. For example, technology can limit which consumers can invoke a service and when they can do so. It can warn a consumer that the service has been deprecated. It can measure the service’s availability and response time.
A good place for the technology to enforce governance policies is through a combination of an enterprise service bus (ESB) and a service registry. A service can be exposed so that only certain ESBs can invoke it. Then the ESB/registry combination can control the consumers’ access, monitor and meter usage, measure SLA compliance, and so on. This way, the services focus on providing the business functionality, and the ESB/registry focuses on aspects of governance.
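As a rough illustration of the enforcement described above, the following minimal sketch stands in for the ESB/registry combination. The names ServicePolicy and GovernanceGateway, the one-second response target, and the sample "quote" service are all assumptions made for this example; they are not part of any particular ESB product.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ServicePolicy:
    """Hypothetical policy record a registry might hold for one service."""
    allowed_consumers: set
    deprecated: bool = False
    max_response_seconds: float = 1.0  # assumed SLA target


@dataclass
class GovernanceGateway:
    """Stands in for the ESB/registry pair: enforces policy, not business logic."""
    policies: dict
    providers: dict
    metrics: list = field(default_factory=list)

    def invoke(self, consumer, service, *args):
        policy = self.policies[service]
        # Access control: only consumers named in the policy may call.
        if consumer not in policy.allowed_consumers:
            raise PermissionError(f"{consumer} may not invoke {service}")
        # Deprecation warning is returned alongside the result.
        warnings = [f"{service} is deprecated"] if policy.deprecated else []
        # Metering: time the call so SLA compliance can be measured later.
        start = time.perf_counter()
        result = self.providers[service](*args)
        elapsed = time.perf_counter() - start
        within_sla = elapsed <= policy.max_response_seconds
        self.metrics.append((service, consumer, elapsed, within_sla))
        return result, warnings


# Usage: the provider only doubles a number; the gateway does the governing.
gateway = GovernanceGateway(
    policies={"quote": ServicePolicy({"billing"}, deprecated=True)},
    providers={"quote": lambda amount: amount * 2},
)
result, warnings = gateway.invoke("billing", "quote", 21)
```

The provider function contains no access checks or timing code, which mirrors the division of labor in which services supply business functionality and the ESB/registry handles governance.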
Aspects of SOA Governance
SOA governance is not just a single set of practices, but many sets of practices coordinated together. The sections that follow provide a brief overview of the various aspects of SOA governance.
Service Definition
The most fundamental aspect of SOA governance is overseeing the creation of services. Services must be identified, their functionality described, their behavior scoped, and their interfaces designed. The governance COE may not perform these tasks, but it makes sure that the tasks are being performed. The COE coordinates the teams that are creating and requiring services, to make sure needs are being met and to avoid duplicate effort.
Often, it is not obvious what should be a service. The function should match a set of repeatable business tasks. The service’s boundaries should encapsulate a reusable, context-free capability. The interface should expose what the service does, but hide how the service is implemented and allow for the implementation to change or for alternative implementations. When services are designed from scratch, they can be designed to model the business; when they wrap existing function, it can be more difficult to create and implement a good business interface.
An interesting example of the potential difficulties in defining service boundaries is where to set transactional boundaries. A service usually runs in its own transaction, making sure that its functionality either works completely or is rolled back entirely. However, a service coordinator may want to invoke multiple services in a single transaction (ideally through a specified interaction like WS-AtomicTransaction). This task requires the service interface to expose its transaction support so that it can participate in the caller’s transaction. But such exposure requires trust in the caller and can be risky for the provider. For example, the provider may lock resources to perform the service, but if the caller never finishes the transaction (it fails to commit or roll back), the provider will have difficulty cleanly releasing the resource locks. As this scenario shows, deciding the scope of a service and who has control of it is not always easy.
Service Deployment Lifecycle
Services do not come into being instantaneously and then exist forever. Like any software, they need to be planned, designed, implemented, deployed, maintained, and ultimately, decommissioned. The application lifecycle can be public and affect many parts of an organization, but a service’s lifecycle can have even greater impact because multiple applications can depend on a single service.
The lifecycle of services becomes most evident when you consider the use of a registry. When should a new service be added to the registry? Are all services in a registry necessarily available and ready for use? Should a decommissioned service be removed from the registry?
While there is no one-size-fits-all lifecycle that is appropriate for all services and all organizations, a typical service development lifecycle has five main stages:
Plan
A new service has been identified and is being designed, but it has not yet been implemented or is still being implemented.
Test
Once implemented, a service must be tested. Some testing may need to be performed in production systems, which use the service as though it were active.
Active
This is the stage for a service available for use and what we typically think of as a service. It’s a service, it’s available, it really runs and really works, and it hasn’t been decommissioned yet.
Deprecate
This stage describes a service which is still active, but won’t be for much longer. It is a warning for consumers to stop using the service.
Sunset
This is the final stage of a service, one that is no longer being provided. Registries may want to keep a record of services that were once active, but are no longer available. This stage is inevitable, and yet frequently is not planned for by providers or consumers.
Sunsetting effectively turns the service version off, and the sunset date should be planned and announced ahead of time. A service should be deprecated a suitable amount of time before it is sunset, to programmatically warn consumers so that they can plan accordingly. The schedule for deprecation and sunsetting should be specified in the SLA.
One stage which may appear to be missing from this list is “maintenance.” Maintenance occurs while a service is in the active state; it can move the service back into test to reconfirm proper functionality, although this can be a problem for existing users depending on an active service provider.
Maintenance occurs in services much less than you might expect; maintenance of a service often involves not changing the existing service, but producing a new service version.
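The five stages can be modeled as a small state machine. The sketch below is one possible reading of the lifecycle, not a prescribed one: the transition table is an assumption, including the choices that an active service may return to test for maintenance and that a service must be deprecated before it is sunset. The class and service names are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    PLAN = "plan"
    TEST = "test"
    ACTIVE = "active"
    DEPRECATE = "deprecate"
    SUNSET = "sunset"


# Assumed transition rules. ACTIVE -> TEST models maintenance that sends a
# service back for retesting; SUNSET is terminal.
ALLOWED = {
    Stage.PLAN: {Stage.TEST},
    Stage.TEST: {Stage.ACTIVE},
    Stage.ACTIVE: {Stage.TEST, Stage.DEPRECATE},
    Stage.DEPRECATE: {Stage.SUNSET},
    Stage.SUNSET: set(),
}


class ServiceVersion:
    """Tracks the lifecycle stage of one version of a service."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.PLAN
        self.history = [Stage.PLAN]

    def advance(self, target):
        # Reject any move the transition table does not allow.
        if target not in ALLOWED[self.stage]:
            raise ValueError(
                f"{self.name}: cannot move from {self.stage.value} to {target.value}"
            )
        self.stage = target
        self.history.append(target)


# Usage: walk one hypothetical service version through its whole life.
orders_v1 = ServiceVersion("orders-v1")
for stage in (Stage.TEST, Stage.ACTIVE, Stage.DEPRECATE, Stage.SUNSET):
    orders_v1.advance(stage)
```

Because sunset has no outgoing transitions, reviving a retired service in this model means publishing a new version rather than reactivating the old one.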
Service Versioning
No sooner is a service made available than its users start needing changes. Bugs need to be fixed, new functionality added, interfaces redesigned, and unneeded functionality removed. The service reflects the business, so as the business changes, the service needs to change accordingly.
With existing users of the service, however, changes need to be made judiciously so as not to disrupt their successful operation. At the same time, the needs of existing users for stability cannot be allowed to impede the needs of users desiring additional functionality.
Service versioning meets these contradictory goals. It enables users satisfied with an existing service to continue using it unchanged, yet allows the service to evolve to meet the needs of users with new requirements. The current service interface and behavior is preserved as one version, while the newer service is introduced as another version. Version compatibility can enable a consumer expecting one version to invoke a different but compatible version.
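How compatibility between versions is decided is a policy choice. The following minimal sketch assumes one common convention, which the text does not mandate: versions carry a major and minor number, and an offered version is compatible with a requested one when the major numbers match and the offered minor number is at least the requested one. The function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Version(NamedTuple):
    major: int
    minor: int


def is_compatible(requested: Version, offered: Version) -> bool:
    """Assumed rule: same major version, and no older than what was requested."""
    return offered.major == requested.major and offered.minor >= requested.minor


def select_version(requested: Version, available) -> Optional[Version]:
    """Pick the lowest compatible version, the one closest to the request."""
    candidates = [v for v in available if is_compatible(requested, v)]
    return min(candidates) if candidates else None


# Usage: a consumer built against 1.1 is routed to 1.2, never to 2.0.
available = [Version(1, 0), Version(1, 2), Version(1, 3), Version(2, 0)]
chosen = select_version(Version(1, 1), available)
```

Choosing the lowest compatible version is a conservative design choice for this sketch; a provider could equally route consumers to the newest compatible version.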
While versioning helps solve these problems, it also introduces new ones, such as the need to migrate.
Service Migration
Even with service versioning, a consumer cannot depend on a service, or more specifically, a desired version of that service, to be available and supported forever. Eventually, the provider of a service is bound to stop providing it. Version compatibility can help delay this “day of reckoning” but won’t eliminate it. Versioning does not obsolete the service development lifecycle, but it enables the lifecycle to play out over successive generations.
When a consumer starts using a service, it is creating a dependency on that service, a dependency that has to be managed. A management technique is for planned, periodic migration to newer versions of the service. This approach also enables the consumer to take advantage of additional features added to the service.
However, even in enterprises with the best governance, service providers cannot depend on consumer migration alone. For a variety of reasons (for example, legacy code, manpower, budget, or priorities), some consumers may not migrate in a timely fashion. Does that mean the provider must support the service version forever? Can the provider simply disable the service version one day after everyone should have already migrated?
Neither of those extremes is desirable. A good compromise is a planned deprecation and sunsetting schedule for every service version, as described in “Service Deployment Lifecycle.”
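A planned schedule of this kind reduces to simple date arithmetic. The sketch below is illustrative only: the 180-day migration window and 90-day notice period are made-up defaults, and an actual SLA would state its own values.

```python
from datetime import date, timedelta


def plan_retirement(successor_active, migration_days=180, notice_days=90):
    """Return (deprecation_date, sunset_date) for the superseded version.

    The old version stays fully active for migration_days after its
    successor goes live, then runs deprecated for notice_days before sunset.
    """
    deprecation = successor_active + timedelta(days=migration_days)
    sunset = deprecation + timedelta(days=notice_days)
    return deprecation, sunset


def stage_on(day, deprecation, sunset):
    """Report which lifecycle stage the old version is in on a given day."""
    if day >= sunset:
        return "sunset"
    if day >= deprecation:
        return "deprecate"
    return "active"


# Usage: version 2 went live on a hypothetical date; schedule version 1.
go_live = date(2024, 1, 1)
deprecation, sunset = plan_retirement(go_live)
```

Publishing both dates when the successor goes live gives consumers a fixed horizon to plan against, and gives the provider a defensible date on which to turn the old version off.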
Service Registries
How do service providers make their services available and known? How do service consumers locate the services they want to invoke? These are the responsibilities of a service registry. It acts as a listing of the services available and the addresses for invoking them.
The service registry also helps coordinate versions of a service. Consumers and providers can specify which version they need or have, and the registry then makes sure to only enumerate the providers of the version desired by the consumer. The registry can manage version compatibility, tracking compatibility between versions, and enumerating the providers of a consumer’s desired version or compatible versions. The registry can also support service states, like test and deprecated, and only make services with these states available to consumers that want them.
When a consumer starts using a service, a dependency on that service is created. While each consumer clearly knows which services it depends on, globally throughout an enterprise these dependencies can be difficult to detect, much less manage. Not only can a registry list services and providers, but it can also track dependencies between consumers and services. This tracking can help answer the age-old question: Who’s using this service? A registry aware of dependencies can then notify consumers of changes in providers, such as when a service becomes deprecated.
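The registry responsibilities described in this section (listing addresses, filtering by version and state, tracking dependencies, and notifying consumers) can be sketched in a few lines. This is a toy in-memory model under assumed names; the ServiceRegistry class, its methods, and the example address are hypothetical and do not correspond to any registry product.

```python
class ServiceRegistry:
    """Toy in-memory registry keyed by (service name, version)."""

    def __init__(self):
        self._entries = {}     # (name, version) -> {"address": ..., "state": ...}
        self._consumers = {}   # (name, version) -> set of consumer names
        self.notifications = []

    def publish(self, name, version, address, state="active"):
        self._entries[(name, version)] = {"address": address, "state": state}

    def lookup(self, consumer, name, version, accept_states=("active",)):
        """Return the address, or None if the version is absent or unwanted."""
        entry = self._entries.get((name, version))
        if entry is None or entry["state"] not in accept_states:
            return None
        # A successful lookup records a dependency of consumer on the service.
        self._consumers.setdefault((name, version), set()).add(consumer)
        return entry["address"]

    def who_uses(self, name, version):
        return sorted(self._consumers.get((name, version), set()))

    def set_state(self, name, version, state):
        """Change a service's state and notify every known dependent."""
        self._entries[(name, version)]["state"] = state
        for consumer in self.who_uses(name, version):
            self.notifications.append((consumer, f"{name} {version} is now {state}"))


# Usage with made-up service and consumer names.
registry = ServiceRegistry()
registry.publish("orders", "1.0", "http://orders.example/v1")
registry.publish("orders", "2.0", "http://orders.example/v2", state="test")
address = registry.lookup("billing", "orders", "1.0")
registry.lookup("shipping", "orders", "1.0")
registry.set_state("orders", "1.0", "deprecated")
```

Because dependencies are recorded at lookup time, the question of who is using a service is answered as a side effect of normal operation rather than by a separate survey.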
Service Message Model
In a service invocation, the consumer and provider must agree on the message formats. When separate development teams are designing the two parts, they can easily have difficulty finding agreement on common message formats. Multiply that by dozens of applications using a typical service and a typical application using dozens of services, and you can see how simply negotiating message formats can become a full-time task.
A common approach for avoiding message format chaos is to use a canonical data model. A canonical data model is a common set of data formats that is independent of any one application and shared by all applications. In this way, applications don’t have to agree on message formats; they can simply agree to use existing canonical data formats. A canonical data model addresses the format of the data in the message, so you still need agreement around the rest of the message format (for example, header fields, what data the message payload contains, and how that data is arranged), but the canonical data model goes a long way toward reaching agreement.
A central governance board can act as a neutral party to develop a canonical data model. As part of surveying the applications and designing the services, it can also design common data formats to be used in the service invocations.
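To make the idea concrete, the following sketch shows two applications mapping their own customer records onto one shared canonical type. Everything here is invented for illustration: the CanonicalCustomer fields, the "billing" and "CRM" record layouts, and the adapter names. The point is the shape of the solution: with n application formats, each application needs only a translation to and from the canonical form (2n translators) instead of one for every ordered pair of applications (n(n-1) translators).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CanonicalCustomer:
    """Hypothetical canonical format shared by all applications."""
    customer_id: str
    full_name: str
    country_code: str  # upper-case two-letter code, by assumed convention


def from_billing(record: dict) -> CanonicalCustomer:
    """Adapter for an assumed billing layout: numeric ID, split name."""
    return CanonicalCustomer(
        customer_id=str(record["cust_no"]),
        full_name=f'{record["first"]} {record["last"]}',
        country_code=record["country"].upper(),
    )


def from_crm(record: dict) -> CanonicalCustomer:
    """Adapter for an assumed CRM layout: string ID, single name field."""
    return CanonicalCustomer(
        customer_id=record["id"],
        full_name=record["name"].strip(),
        country_code=record["iso_country"].upper(),
    )


# Usage: two differently shaped records describe the same customer.
billing_view = from_billing(
    {"cust_no": 42, "first": "Ada", "last": "Lovelace", "country": "gb"}
)
crm_view = from_crm({"id": "42", "name": " Ada Lovelace ", "iso_country": "GB"})
```

Once both records are in canonical form they compare equal, so a service that accepts CanonicalCustomer never needs to know which application produced the data.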
Service Monitoring
If a service provider stops working, how will you know? Do you wait until the applications that use those services stop working and the people that use them start complaining?
A composite application, one that combines multiple services, is only as reliable as the services it depends on. Since multiple composite applications can share a service, a single service failure can affect many applications. SLAs must be defined to describe the reliability and performance consumers can depend on.
Service providers must be monitored to ensure that they’re meeting their defined SLAs.
A related issue is problem determination. When a composite application stops working, why is that? It may be that the application head, the UI that the users interface with, has stopped running. But it can also be that the head is running fine, but some of the services it uses, or some of the services that those services use, are not running properly. Thus it’s important to monitor not just how each application is running, but also how each service (as a collection of providers) and each individual provider is running. Correlation of events between services in a single business transaction is critical.
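One simple way to correlate events is to tag every event in a business transaction with a shared identifier. The sketch below assumes such a scheme; the event fields (correlation_id, service, status) and the sample service names are hypothetical, not taken from any monitoring product.

```python
from collections import defaultdict


def failing_services(events):
    """Map each correlation ID to the services that reported an error, in order."""
    by_transaction = defaultdict(list)
    for event in events:
        if event["status"] == "error":
            by_transaction[event["correlation_id"]].append(event["service"])
    return dict(by_transaction)


# Usage: transaction t-1 succeeds end to end; t-2 fails in a nested service,
# and that failure then surfaces in the service that called it.
events = [
    {"correlation_id": "t-1", "service": "order-ui", "status": "ok"},
    {"correlation_id": "t-1", "service": "pricing", "status": "ok"},
    {"correlation_id": "t-2", "service": "order-ui", "status": "ok"},
    {"correlation_id": "t-2", "service": "inventory", "status": "error"},
    {"correlation_id": "t-2", "service": "pricing", "status": "error"},
]
failures = failing_services(events)
```

Grouping by transaction rather than by service shows that the first error in t-2 came from the inventory service, which is the kind of evidence problem determination needs when the application head itself is running fine.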
Such monitoring can help detect and prevent problems before they occur. It can detect load imbalances and outages, providing warning before they become critical, and can even attempt to correct problems automatically. It can measure usage over time to help predict services that are becoming more popular so that they can run with increased capacity.
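The measurements that monitoring collects can be checked against an SLA mechanically. The following minimal sketch assumes an SLA expressed as two targets, a minimum availability and a maximum 95th-percentile response time; the Sla type, the thresholds, and the sample data are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sla:
    min_availability: float  # required fraction of successful calls
    max_p95_seconds: float   # required 95th-percentile response time


def evaluate(samples, sla):
    """Check samples, a list of (succeeded, seconds) pairs, against an SLA."""
    if not samples:
        raise ValueError("no samples to evaluate")
    count = len(samples)
    availability = sum(1 for succeeded, _ in samples if succeeded) / count
    durations = sorted(seconds for _, seconds in samples)
    # Nearest-rank percentile, computed with integer arithmetic.
    rank = max(1, (95 * count + 99) // 100)
    p95 = durations[rank - 1]
    compliant = availability >= sla.min_availability and p95 <= sla.max_p95_seconds
    return {"availability": availability, "p95_seconds": p95, "compliant": compliant}


# Usage: 19 fast successful calls and one slow failed call.
samples = [(True, 0.1)] * 19 + [(False, 2.0)]
report = evaluate(samples, Sla(min_availability=0.90, max_p95_seconds=0.5))
```

The same samples would fail a stricter SLA that demanded 99 percent availability, which is why the targets belong in the agreement rather than in the monitoring code.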
Service Ownership
When multiple composite applications use a service, who is responsible for that service? Is that person or organization responsible for all of them? One of them; if so, which one? Do others think they own the service? Welcome to the ambiguous world of service ownership.
Any shared resource is difficult to acquire and care for, whether it’s a neighborhood park, a reusable Java framework, or a service provider. Yet a needed pooled resource provides value beyond any participant’s cost: Think of a public road system.
Often an enterprise organizes its staff reporting structure and finances around business operations. To the extent that an SOA organizes the enterprise’s IT around those same operations, the department responsible for certain operations can also be responsible for the development and run time of the IT for those operations. That department owns those services. Yet the services and composite applications in an SOA often don’t follow an enterprise’s strict hierarchical reporting and financial structure, creating gaps and overlap in IT responsibilities.
A related issue is user roles. Because a focus of SOA is to align IT and business, and another focus is enterprise reuse, many different people in an organization have a say in what the services will be, how they will work, and how they’ll be used. These roles include business analyst, enterprise architect, software architect, software developer, and IT administrator. All of these roles have a stake in making sure the services serve the enterprise needs and work correctly.
An SOA should reflect its business. Usually this means changing the SOA to fit the business, but in cases like this, it may be necessary to change the business to match the SOA. When this is not possible, increased levels of cooperation are needed between multiple departments to share the burden of developing common services. This cooperation can be achieved by a cross-organizational standing committee that, in effect, owns the services and manages them.
Service Testing
The service deployment lifecycle includes the test stage, during which the team confirms that a service works properly before activating it. If a service provider is tested and shown to work correctly, does the consumer need to retest it as well? Are all providers of a service tested with the same rigor? If a service changes, does it need to be retested?
SOA increases the opportunity to test functionality in isolation and increases the expectation that it works as intended. However, SOA also introduces the opportunity to retest the same functionality repeatedly by each new consumer who doesn’t necessarily trust that the services it uses are consistently working properly. Meanwhile, because composite applications share services, a single buggy service can adversely affect a range of seemingly unrelated applications, magnifying the consequences of those programming mistakes.
To leverage the reuse benefits of SOA, service consumers and providers need to agree on an adequate level of testing of the providers and need to ensure that the testing is performed as agreed. Then a service consumer need only test its own functionality and its connections to the service, and can assume that the service works as advertised.
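One way to make the agreed level of testing explicit is a shared conformance suite that every provider of a service must pass. The sketch below is purely illustrative: the "add tax" service, its agreed cases, and both provider functions are invented for this example.

```python
# Agreed cases for a hypothetical "add tax" service: (net, rate) -> gross.
AGREED_CASES = [
    ((100.0, 0.2), 120.0),
    ((0.0, 0.2), 0.0),
    ((50.0, 0.0), 50.0),
]


def conformance_failures(provider, cases=AGREED_CASES, tolerance=1e-9):
    """Run every agreed case against a provider and return the mismatches."""
    failures = []
    for args, expected in cases:
        actual = provider(*args)
        if abs(actual - expected) > tolerance:
            failures.append((args, expected, actual))
    return failures


def provider_a(net, rate):
    return net * (1 + rate)


def provider_b(net, rate):
    return net + rate  # deliberately buggy: adds the rate instead of applying it


# Usage: the same suite is applied to every provider with the same rigor.
results = {
    "provider_a": conformance_failures(provider_a),
    "provider_b": conformance_failures(provider_b),
}
```

A consumer that knows a provider has passed the agreed suite can limit its own tests to its own logic and its connection to the service.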
Service Security
Should anyone be allowed to invoke any service? Should a service with a range of users enable all users to access all data? Does the data exchanged between service consumers and providers need to be protected? Does a service need to be as secure as the needs of its most paranoid users or as those of its most lackadaisical users?
Security is a difficult but necessary proposition for any application. Functionality needs to be limited to authorized users and data needs to be protected from interception. By providing more access points to functionality (that is, services), SOA has the potential to greatly increase vulnerability in composite applications.
SOA creates services that are easily reusable, even by consumers who ought not to reuse them. Even among authorized users, not all users should have access to all data the service has access to. For example, a service for accessing bank accounts should only make a particular user’s accounts available, even though the code also has access to other accounts for other users. Some consumers of a service have greater needs than other consumers of the same service for data confidentiality, integrity, and nonrepudiation.
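The bank account example can be sketched as a service that filters its data by the caller's identity. The account records, user names, and the list_accounts function below are made up for illustration, and the sketch assumes the identity has already been authenticated and propagated to the service by the invocation layer.

```python
# The service's code can see every account, including other users' accounts.
ACCOUNTS = [
    {"id": "A-1", "owner": "alice", "balance": 120},
    {"id": "B-7", "owner": "bob", "balance": 45},
    {"id": "A-2", "owner": "alice", "balance": 300},
]


def list_accounts(caller_identity, accounts=ACCOUNTS):
    """Return only the account IDs owned by the propagated caller identity."""
    if not caller_identity:
        raise PermissionError("caller identity is required")
    return [a["id"] for a in accounts if a["owner"] == caller_identity]


# Usage: alice sees her two accounts and nothing of bob's.
alice_accounts = list_accounts("alice")
```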
Service invocation technologies must be able to provide all of these security capabilities. Access to services has to be controlled and limited to authorized consumers. User identity must be propagated into services and used to authorize data access. Qualities of data protection have to be represented as policies within ranges. This enables consumers to express minimal levels of protection and maximum capabilities and to be matched with appropriate providers who may, in fact, include additional protections.
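Matching policies expressed as ranges amounts to finding where two ranges overlap. The sketch below assumes protection is graded on a simple integer scale (for instance 0 for none, 1 for transport-level encryption, 2 for message-level encryption, 3 for message-level encryption with signing); that scale and the names ProtectionRange and negotiate are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionRange:
    minimum: int  # lowest protection level this party will accept
    maximum: int  # highest protection level this party can support


def negotiate(consumer, provider):
    """Return the highest level both parties support, or None if none exists."""
    low = max(consumer.minimum, provider.minimum)
    high = min(consumer.maximum, provider.maximum)
    return high if low <= high else None


# Usage: the consumer accepts level 1 or 2; the provider insists on at least 2.
agreed = negotiate(ProtectionRange(1, 2), ProtectionRange(2, 3))
```

In this example the provider's stricter minimum raises the agreed level above what the consumer required, which corresponds to a provider including additional protections. Picking the highest common level is a design choice of this sketch; picking the lowest acceptable level would also satisfy both policies.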
