(Reproduced with kind permission of Wrox Press: https://www.wrox.com)
Professional Windows DNA: Building Distributed Web Applications with VB, COM+, MSMQ, SOAP, and ASP
Why Do We Need DNA?
Windows DNA is a platform for building distributed applications using the Microsoft Windows operating system and related software products. That statement leaves a considerable amount of room for confusion. What is a "platform"? What are "related products"? Even the term distributed computing has been given various definitions.
It's worth the effort to sort out the confusion, because multi-tier architectures like DNA are widely regarded as the future of large-scale development. As we will see in this chapter, such architectures offer significant benefits in the construction of mission-critical and enterprise-scale systems. This book will introduce, and give an overview of, just about everything you want to know about building modern, forward-looking applications for Windows. The Windows DNA architecture is a blueprint and set of guidelines for doing this.
Before we can begin, however, we need to examine a few concepts. When we talk about modern applications, we are almost always talking about network applications. It's rare to develop an application today that does not either make use of distributed resources or incorporate technologies developed for the Web. Even standalone desktop applications make use of techniques and technologies influenced by distributed programming, such as data source abstraction and component development.
In this chapter, we'll look at:
Before we embark on our voyage through Windows DNA, let's examine the nature and needs of a modern application. Building such applications is what Windows DNA is about.
Internet Application Design
In case you haven't noticed, most of the new software development taking place today centers around delivering functionality via the Internet. What does the typical Internet-based application look like today?
There really is no single answer to that question. There are so many types of business solutions being built using Internet technology that there are probably thousands of different types of Internet-based applications. However, there is one question we can probably answer: what are the basic characteristics of these applications?
Internet Application Requirements
In general, an Internet-based application will:
Now let's take a look at each of these characteristics in a little more detail.
Presenting a Unified View of Data From Multiple Data Sources
Presenting a unified view of data coming from multiple sources creates a number of problems. A lot of businesses have multiple types of data stores, ranging from mainframe-based VSAM applications, to SQL Server databases, to Oracle databases, to e-mail stores and directory services. There needs to be a way to "tie" all of this data together. So obviously we are going to need a robust data access mechanism, which allows us to access multiple types of data sources that might even reside on different platforms. In addition, we might need host, or mainframe, integration capability. Just getting data from a mainframe application to a web page is a huge technical feat.
Allowing a User to Update Data
If our application allows a user to purchase something, or initiate financial transactions, or update personal data, we're going to need transactional capability. Transactional capability means making certain that either all parts of a piece of work – a transaction – complete successfully, or none of the transaction is allowed to occur.
To make matters more complicated, we already know that the likelihood of having data in multiple sources is fairly high. The consequence is we will need the ability to define transactions that span multiple data sources, while still having full two-phase commit and rollback.
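The book's transaction examples use VB and COM+, but the all-or-nothing principle itself can be sketched in a few lines of Python. The account names, the `transfer` function, and the `TransferError` type below are purely illustrative – a real system would delegate the snapshot-and-rollback work to a transaction manager:

```python
class TransferError(Exception):
    pass

def transfer(accounts, src, dst, amount):
    """All-or-nothing transfer: both the debit and the credit
    take effect, or neither change is kept."""
    snapshot = dict(accounts)          # remember state for rollback
    try:
        if accounts[src] < amount:
            raise TransferError("insufficient funds")
        accounts[src] -= amount        # debit one data source
        accounts[dst] += amount        # credit the other
    except Exception:
        accounts.clear()               # rollback: restore the snapshot
        accounts.update(snapshot)
        raise

accounts = {"checking": 100, "savings": 50}
transfer(accounts, "checking", "savings", 30)   # succeeds: both sides applied
```

If the debit succeeds but the credit cannot, the rollback restores the snapshot, so a reader never sees a half-completed transfer – which is exactly the guarantee two-phase commit extends across multiple data sources.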
Full e-Commerce Capabilities
Providing full e-commerce capability is a must these days. If you're selling products over the Internet, you will need a framework that provides a shopping cart, and management tools to manage product catalogs, run promotions and sales, and present cross-selling opportunities to your users. Additionally, you will need this framework to be extensible so you can incorporate your own business logic, such as calculating taxes in foreign countries. It would also be really nice if this e-commerce framework used the same transactional capability described above when users commit their purchases.
Being Fast and Efficient
You might get users to your web site the first time, but if your site is so slow that they have a bad user experience, they might never come back. So you're going to need to architect a solution that solves your business problem, but is fast at the same time.
Being able to distribute requests among many machines is one way to help achieve a speedy solution. Other design characteristics, such as performing work asynchronously, can also help speed up things.
An example, albeit a crude one, might be an online purchasing application. When a user actually places an order, they probably don't need any kind of response other than "we received your order, here is your confirmation number". You could then place the order on a queue and process it later. The user doesn't know their order won't be processed until later, but they've got the result they wanted quickly.
The implication here is that you'll need a queuing or messaging mechanism at your disposal, so you can incorporate it into your application design.
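A minimal sketch of that design, using Python's standard `queue` module as a stand-in for a real messaging product such as MSMQ (the order layout and function names here are invented for illustration):

```python
import queue

orders = queue.Queue()    # stand-in for a durable message queue (e.g. MSMQ)
processed = []

def place_order(order):
    """Fast path: acknowledge the user immediately, defer the real work."""
    orders.put(order)
    return f"we received your order, confirmation #{orders.qsize()}"

def process_pending():
    """Slow path: a background worker drains the queue later."""
    while not orders.empty():
        processed.append(orders.get())

confirmation = place_order({"item": "widget", "qty": 2})
# the user already has a confirmation; fulfilment happens whenever the worker runs
process_pending()
```

The user's request returns as soon as the order is queued; the expensive processing happens asynchronously, which is precisely why perceived response time improves.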
Scalable to Thousands of Concurrent Users
Not only does your site need to be fast, it probably needs to support thousands of concurrent users. Again, load balancing across multiple machines will help solve this problem. Not only will load balancing help you handle more users, it will also improve your "uptime", and help ensure you are running 24x7x365 like all users expect these days.
Platform Capability Requirements
So now you have a better picture of some characteristics of applications being built today. There's obviously a lot of infrastructure needed to build these applications. What we can do now is take these characteristics, and from them create a list of the capabilities we're going to need from our platform. These include:
We'll see later in the book that Windows 2000 provides these, but that's jumping the gun a little. We should probably turn our attention first to identifying the things that drive us when designing applications.
Application Design Goals
For most Internet-based applications you can usually identify two sets of goals:
Obviously, these are traits we would like for any type of application. You're probably thinking to yourself that these are common sense – so what's the big deal?
Look at the demands being placed on application developers today, and the software they write. These goals are no longer just expected – they are demanded. This fast-paced, Internet economy has turned loose a whole new group of users. These users are not just people in the cubes down the hall – they are your customers!
These folks don't understand, nor do they care, how hard it is to write Internet software that integrates with all of the other systems in your company. They don't care that we work 18 hours a day to meet our deadlines. They don't care that some of us have invested millions of dollars in developing mainframe-based software using 25+-year-old technology, and we need to make our web servers talk to it! These users want to be able to do everything with a click of the mouse instead of talking to a person on the telephone. They want all of the functionality currently available via phone and paper to be available on the company web site, and they want it now!
Whew! The world of the software application developer has changed drastically over the last 5 years. Once the Internet burst onto the scene, and we started developing applications that were being used by people outside the walls of our companies, the pressure was turned up substantially. Couple all of those demands with the fact that these people want all of this available to them all day, every day, and it appears that we as software developers are in big trouble.
Let's take a moment to look at some of the goals in more detail:
Now that we've looked at our own common application design goals, we're going to examine some of the problems inherent in designing network applications, and then go on to see how architecture has evolved to try to solve these problems.
If you had to pick a single concept to represent Windows DNA, "network applications" would be it. Any single technology in Windows DNA can be applied to specific problems or features, but taken together, the tools that make up the DNA architecture are all about applications that live on, use, or are accessed by networks.
You could run a multi-user database (like SQL Server) or a directory service without a network, but what would be the point? Web-based applications effectively require a network.
Breaking applications into their functional pieces and deploying them across a network lets us make the best possible use of an organization's resources. Once you do this, however, you become reliant on the network for the full use of your applications. This implies requirements for reliability, scalability, and security, and soon you realize you need a well-planned architecture. In our case, that's Windows DNA. It's about adopting network applications as the future of general purpose computing, then developing an architecture that supports them.
The definition of network applications given above is rather vague. You probably know what is meant intuitively, but intuition doesn't go very far in programming. So let's look at the characteristic problems that network applications will need to deal with.
Network Application Characteristics
We've just said that network applications break their implementation into functional modules and rely on (or at least substantially benefit from) the presence of a network of computers. Some characteristics follow from that:
Let's look at these in more detail.
The first point is essentially a given one – if my applications work through a network, they must have some means of communication. However, as we'll see in a little while, communications can become a deep topic. There are issues of protocols and data formats that arise in network applications. Simply running a cable between two computers and configuring them for the network is the easy part. Life for the application programmer can get very interesting.
While you may not actually have to worry about the detailed implementation of network communications, you must be concerned with the issues that arise when applications span multiple computers.
You could deploy network applications in a single-user manner. You might insist that every client be matched with an individual server, or you might have a multi-user server that forced clients to wait for service in series. This would simplify the programming task, but it would also negate many of the benefits of using networks in the first place. We want applications to dynamically access some bit of software on the network, obtain service, and go about the rest of their processing. Forcing them through the iron gates of single-use software would mean incurring all the overhead of distributed processing while also incurring the limitations of standalone software. You'd feel cheated, wouldn't you?
Even if you don't develop multi-user software, you rely on concurrent access to system services and network servers. I can write a web page with single-user client-side script, but I want the web server it accesses to be able to accommodate multiple users at the same time. Could you imagine a corporate database application denying a user service because one other user somewhere in the organization was already connected?
Some part of a network application, then, must handle the tough tasks of concurrent access. These include multithreading, concurrency, and integrity. Multithreading is what enables a single piece of software to have more than one task underway in a program at any given time. Concurrency is what keeps one task distinct from another. Most importantly, integrity concerns itself with how to maintain the integrity of the data or process, when different users want to modify the same bit of information.
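The discipline that preserves integrity under concurrent access can be sketched in Python. The vote-counting scenario is invented, and `threading.Lock` simply plays the role of whatever synchronization primitive the platform provides:

```python
import threading

counter = 0
lock = threading.Lock()

def add_votes(n):
    """Each thread is one 'user'; the lock keeps their updates distinct."""
    global counter
    for _ in range(n):
        with lock:            # integrity: only one writer touches counter at a time
            counter += 1

# four concurrent 'users', each adding 1000 votes
threads = [threading.Thread(target=add_votes, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could read the same value of `counter`, both increment it, and lose one update – the classic integrity failure the text describes.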
State management is closely related to concurrency. Technically, it is another facet of concurrency, but we'll consider it as a characteristic in its own right, because it's a topic that almost all programmers will encounter when writing network applications.
If an application is using other applications or components on the network, it must keep track of where it is in the process – the state of the process. Single-user, standalone applications find it easy to maintain state. The values of the variables are the state of your data, while the line of code that is currently executing defines where you are in the overall process.
Multiuser, distributed applications have it harder. Suppose I have an e-commerce web site whose implementation involves sending a message to another application and receiving a reply. I have to maintain a set of data for each user. When I send a message, I have to record where that particular user is in the overall process, together with the current state of that user's data. When a reply comes in, I have to be able to determine which user is affected by the reply and retrieve the data I saved.
If you've worked with the Session and Application objects in Active Server Pages, you've programmed state management information. You've told the ASP component to keep track of something you'll need again later. The more widely distributed you make your application, the more state information needs to be coordinated. It's best to try to minimize state information on remote servers using a stateless server model, which we'll see a little more about later in the book.
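Stripped of any particular product, per-user state management boils down to a server-side store keyed by a session identifier. This Python sketch is hypothetical – it is not the ASP Session API, just the shape of the idea:

```python
import uuid

sessions = {}   # server-side store: session id -> that user's state

def begin_session():
    """Issue an opaque id the client presents on each request."""
    sid = str(uuid.uuid4())
    sessions[sid] = {"cart": []}
    return sid

def add_to_cart(sid, item):
    """Look up this user's state by session id before acting on it."""
    sessions[sid]["cart"].append(item)

# two users, two independent sets of state on the same server
alice, bob = begin_session(), begin_session()
add_to_cart(alice, "book")
add_to_cart(bob, "lamp")
```

Every piece of state held this way is something a remote server must keep, coordinate, and eventually discard – which is why the stateless server model tries to minimize it.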
How long does it take to communicate with other components on the network? The time attributed solely to the network is the network latency of the application.
This would seem to be too small to worry about at first glance. How fast are electrons in a wire? It turns out that for practical purposes, the speed of electrons in copper wire is slightly less than the speed of light. A good rule of thumb is 200 meters per microsecond.
Surely that's good enough, you might say. In a standalone application, though, the time to access a function might be a fraction of a millisecond. Now measure the path through your network – seldom a straight line on a LAN – and multiply by two. Add the time imposed by routers or switches, and you find that networks have latency that is significant compared to the time to execute instructions within a component.
Latency is especially important for Internet applications. The distance alone is significant. A round trip across the United States should take, in theory, 50 milliseconds. But that's a direct hop. When I send a packet from Philadelphia to a particular server in San Francisco, I find that it takes almost three times as long to get there and back. My packet is bouncing around my service provider, then heading south to the major interconnect MAE East in Virginia, then making its way across country. Each router or switch takes its toll along the way.
This is an extreme case, but even crossing the office is more expensive than moving between addresses in a single computer's memory. Programmers and architects need to consider how they can minimize the number of times their systems need to call on a remote server if they want to maintain acceptable performance. Latency changes the way we design applications.
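The arithmetic behind these figures is easy to check. Using the 200-meters-per-microsecond rule of thumb from above (the distances are rough assumptions, and routers and switches are ignored):

```python
SIGNAL_SPEED_M_PER_US = 200          # rule of thumb: ~200 m per microsecond in copper

def propagation_us(distance_m):
    """One-way signal time in microseconds, propagation delay only."""
    return distance_m / SIGNAL_SPEED_M_PER_US

# round trip across the United States: roughly 4,700 km each way
us_round_trip_ms = 2 * propagation_us(4_700_000) / 1000   # ~47 ms

# round trip across a 100 m office LAN
office_round_trip_us = 2 * propagation_us(100)            # ~1 microsecond
```

Even the "fast" LAN round trip is enormous next to a local function call measured in nanoseconds, and the cross-country figure matches the in-theory 50 milliseconds quoted above – before any router takes its toll.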
Encapsulation is a technique in which you hide – or encapsulate – the details of some implementation from the software using that implementation. Object oriented programming is a classic example of encapsulation. An application using an object has no idea how the object maintains its data or implements its methods. Structured programming, the traditional function-by-function method of building an application, may also practice encapsulation by hiding the implementation details of a subroutine from the main program. An Application Programming Interface (API) is an encapsulation.
In a standalone application, encapsulation was a good idea. It helped programmers develop and maintain software effectively and efficiently. Network applications have no choice but to practice rigorous encapsulation – different programming teams may write the components of a network application. One team may not have any influence over another, or even know who wrote the component.
If I were to write a shipping application that relied on information from the Federal Express tracking application on the Web, for example, I would have no choice but to use their application using their HTTP-based API, as I have no other access to the application. Certainly, I cannot call Federal Express and ask them to make some internal modifications for me. Network applications live and die by clean interfaces, behind which implementation details are encapsulated.
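The package-tracking scenario shows the general shape of an encapsulated service. In this toy Python sketch, the `TrackingService` class and its data are entirely hypothetical – the point is only that callers see the `track` interface and nothing behind it:

```python
class TrackingService:
    """All a caller ever sees is track(); the storage format is hidden."""

    def __init__(self):
        # internal detail: could be a dict today, a mainframe lookup tomorrow,
        # and no caller would know the difference
        self._shipments = {"1Z999": "in transit"}

    def track(self, number):
        """The clean interface: tracking number in, status string out."""
        return self._shipments.get(number, "unknown")

svc = TrackingService()
status = svc.track("1Z999")
```

As long as `track` keeps its contract, the implementation behind it can change freely – which is exactly why network applications live and die by clean interfaces.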
We're now going to take a possibly familiar trip down memory lane, tracing the history of applications from monoliths to component-based distributed applications. You may have seen it before, but it's still necessary to discuss this because it's central to the concept of DNA.
Evolution of Network Applications
The earliest computer applications – and many of the applications still in use – were monolithic. All the logic and resources needed to accomplish an entire programming task were found in one program executing on a single computer. There was neither need nor provision for a network.
As computer science evolved in its capabilities, the desirability of the client-server model became evident. Clients would obtain critical services, either data or computing, from server software that usually resided on another computer. As networks approached ubiquity, though, the advantages and challenges of distributed computing emerged. The basic model for addressing these challenges is variously called the 3-tier or n-tier model. These models of computing did not spring out of a vacuum. Rather, each evolved from its predecessor as the challenges of the old model were solved, thereby uncovering new challenges.
Monolithic applications are a bit like rocks. Everything you need to make a rock is found inside the rock, indivisible from all other parts and invisible to an outside observer. Similarly, a monolithic application is what you get when you write a program that does not rely on outside resources and cannot access or offer services to other applications in a dynamic and co-operative manner. Clearly, even a simple application has some I/O – keyboard input, reading and writing disk files – but basically, these applications rely strictly on local resources. They read and write data locally, and all logical operations are embedded in a single executable program.
In many ways, this simplifies the life of an application programmer. The program runs on one operating system, so the entire program has access to the same set of services. It executes wholly on one computer, so connectivity isn't an issue. The program runs or it doesn't; if it runs out of resources, it declares a failure and exits (gracefully, one hopes).
Security is simple, as well. There is one user at a time, and his identity is either known to the application or is unimportant to the execution of the program.
Finally, the program can use native formats, as the data never leaves home. Not only are the data structures consistent and known throughout the application, but also the underlying system representations – the size and bit ordering of primitives like integers – are consistent, because the program is only concerned with a single platform.
The very strength of a monolithic application – its unity – becomes a weakness as the scope of the programming task increases. Everything must be implemented within the single application. There is little or no reuse of clever and well-tested utilities. The program cannot defer a challenging task to a more powerful computer. All the data it needs must be local. Monolithic applications are ideal for simple tasks, but they simply are not up to handling complicated or mission-critical jobs.
Somewhere along the way, theorists began to address the reuse problem. This was to be the strong point of object oriented programming. Functions and features would be implemented and tested once in well-encapsulated entities, then reused whenever the same problem came up.
Unfortunately, the level of reuse was at the level of source code, or, in the case of precompiled object libraries, at the link level. A body of code could be reused, but once added to an application, it was not visible to outside applications. Within the big, opaque block of the monolithic application, the pieces of the application became co-operative. From the outside, however, the application looked the same. From a distributed perspective, programs built from objects were still monolithic.
The Road to Client-Server
The relational database community made the advantages of the client-server model apparent. Any database is inherently a centralized resource. All users must have the same view of the data they are permitted to see, which implies that they are all reading from and writing to the same database. If multiple users have access to the same body of data, then the data must reside on a central server accessible via a network. Several advantages become apparent when moving to client-server:
The last point is especially important. While data integrity is the major concern for relational databases, any sort of business rule may be enforced. The rules are implemented and enforced by the entity managing the data, so it can guarantee that data remains secure and business rules are always observed. Because there is a single, central implementation, there is no chance for application developers to implement different, conflicting rules, which would certainly lead to data corruption over time.
In the illustration below, we see how application architecture evolved from monolithic applications. The client, represented by the cube on the left, is granted access to data hosted by a server. Access is subject to permissions in the server's access control list (ACL) for the user, and access is made through a set of API calls that collectively define the interface to the database. Access is not entirely encapsulated, however. The interface is specific to the vendor of the server application, and native data formats are used:
Now consider what changes result from moving from a monolithic model to a client-server model. The challenge of connectivity immediately arises (except, of course, where the client-server model is used for logical purposes only, and both client and server reside on the same machine).
It may not be obvious, but attention must be paid to differing data formats. It's not uncommon, for example, to host a large database on a Unix-based server, but access it from Windows clients. As a result, the interface libraries for clients and servers must provide for translation between different data representations. In practice, this is not so burdensome as might be imagined. The server software vendor typically writes both sides of the libraries. Even when a third party writes database drivers, they are writing to the server vendor's structures. Data translation, then, is restricted primarily to coping with platform differences.
Standards like SQL (Structured Query Language) muddy this distinction somewhat, as vendors may have internal structures that must be mapped to SQL structures and types. For our purposes, however, the abstraction between the standards layer and the vendor layer means that the application programmer is concerned with a single translation, from application-native formats to the server formats exposed at the interface.
Security and resource utilization are tightly bound to the number and identity of the clients. This is good for security, as very specific permissions may be granted for each user. The database can maintain very accurate audit trails of what user performed what transaction. For large numbers of users, of course, this means very long ACLs. This brings us to resource utilization. Each client actively using the server consumes resources: connections, session memory, transactions, etc. The database scales exactly as the client load scales. This is suitable for workgroup and department level applications, but not suitable for exposing a critical database to an entire enterprise. Some means of pooling resources must be devised.
From the standpoint of reuse, the client-server model makes only modest gains. Reuse is confined to a single service and the interface to that service. A client accessing more than one server must support as many interfaces (to include interface code and data formats) as there are servers. In practice, that means the client computer must have drivers installed and configured for each database it plans to access. The server is the smallest unit of reuse. If there is a smaller module that provides useful features, it is lost to networked clients.
The notion of separating clients from servers, though, provided programmers a great service. No longer were they tied to the resources of a single machine for the accomplishment of their tasks. Resources and implementations could be moved to the computer that best accomplished the task. In fact, as database applications grew in complexity, it became apparent that multiple classes of servers would be needed.
Relational databases implemented the ability to perform processing so that database administrators could implement data integrity rules. Triggers and stored procedures began to look like small programs in their own right. At some point, it became obvious that databases were implementing more processing than was strictly necessary for data integrity. They were implementing business rules: units of processing or algorithms that represent some concept of importance to the organization using the database. This might consist of how discounts are calculated, for example.
Because business rules are broadly applicable, it's desirable to implement them once, on a centrally managed server. Since they are not directly related to data integrity, however, it's not clear that they should be implemented on a relational database using data-oriented tools and languages. The n-tier architecture model resulted from this.
In this model, sometimes also known as the 3-tier model, clients remain focused on presenting information and receiving input from users. This is known as the presentation tier.
Data, meanwhile, is hosted on one or more data servers in the data tier. Only that processing required to access data and maintain its integrity gets implemented on this tier. This includes SQL query engines and transaction managers for commercial software, as well as triggers and stored procedures written by database administrators. Unlike the client-server model, however, these triggers and procedures are limited in scope to managing the integrity of the data residing on this tier.
Business rules are moved to the application logic tier, sometimes referred to as the business services or middle tier.
Stored procedures on the database are sometimes used to implement and to enforce business rules. While this can lead to performance gains, the approach does not fit the purist concept of n-tier or DNA, in which the data tier is kept strictly for data.
The term n-tier comes from the fact that the application logic tier is often subdivided into further logical tiers dedicated to one task or another; the application tier is seldom homogeneous. Some programmers view the division as three tiers, while others view the different classes of application logic as individual tiers.
Dividing a task into three or more tiers brings the following benefits:
Separation of Presentation and Function
The separation of functional behavior – calculations and algorithms – from their visual presentation is important in two ways. First, it becomes easy to change visual presentation in isolation from tested functionality. You can change a display from a grid representation, for example, to a graph. You can support different classes of users with different views of the same data, providing each with a view appropriate to their needs. In fact some users will not need visual presentation – they might be other programs consuming the data.
Second, with the sort of separation we have been talking about, the client computer need only be powerful enough to perform the rendering tasks and make requests of the server. Perhaps you've seen an intranet application in which complex functionality was implemented on the server, allowing users to have rich visual access anywhere they have access to a web browser and the hosting server. In fact, this scenario is enjoying surging popularity right now, and Windows DNA supports it quite well.
It's not inconceivable that a PDA (personal data assistant) could be a thin client (where minimal processing is performed on the client itself), even while a very powerful workstation (a fat or rich client) connects to the same application logic layer to offer its user a different, richer view of the data. Regardless of the presentation, the code on the server that implements the application logic remains the same.
Performance Optimization
Any programmer who has profiled and optimized an application has been surprised by the performance bottlenecks found. As each problem is resolved, new issues are uncovered. So it is with a distributed system. Each subsystem has its own unique challenges, each with an optimal solution.
These subsystems exist in monolithic and client-server applications, but there they are bound up with each other. With an n-tier architecture, it becomes possible to isolate each and make appropriate adjustments.
In the simplistic case, you can throw expensive hardware at the servers. The data tier needs high-powered servers with redundant disks. Redundant disks are important for high availability, but are quite expensive. There are ways to write application logic tier components, as we shall see elsewhere in this book, so that this tier, while it needs powerful machines, does not need redundant disks. Because the tiers are distinct, the machines that support the application logic may be less expensive than the computers hosting the data services tier. The client can be an inexpensive computer.
At a more sophisticated level, software can be adjusted for the needs of each tier. Servers are tuned for background processes, having little or no interaction with a foreground user. The exact opposite condition exists on the client. Relational databases involve complicated mixes of disk and memory performance, while the application logic tier operates largely in memory and prefers CPU and network I/O performance. If these functions were housed on the same machine, some compromise would have to be reached. Overall performance, in consequence, would be less than the ideal case for each subsystem.
Parallel Development
A monolithic application has few opportunities for parallel development. While multiple teams can be spun off for different parts of the application, there are so many dependencies that each progresses at the pace of the slowest.
Client-server improves the process somewhat. The client team can work, at least initially, in isolation from the server team, using a very limited stub application in lieu of the server. Unfortunately, the server possesses all the interesting logic. Recall that the functions of the application logic tier are generally implemented as stored procedures and triggers in a client-server application. The client quickly reaches an impasse, awaiting interesting results the stub cannot provide. The server software, in turn, will not be fully challenged until it is tested against the sort of interesting requests and conditions that arise only when multiple, fully-featured clients begin accessing it.
The situation is even worse if application logic is split between the client and the server. In such a case, each team can make limited progress without the other, and development resembles monolithic application programming.
Things are a bit better in n-tier architectures. Stubs are still needed at each level, but now they are selectively removed as each tier makes progress. It is a more finely tuned sort of development. If the application tier progresses ahead of the data tier, clients can still resolve more difficult issues because they are going against the live component. The underlying data is still stub data, so the client cannot be fully tested, but the client team can work on more challenging issues than if they had both stub logic and stub data.
In fact, as the world moves to n-tier architecture as the model for large and robust systems, development will be continuous and some programming teams will have no knowledge of one another. If the tiers communicate through open protocols like HTTP and open formats like XML, new client applications can be developed for existing servers without access to the server software programming team. Similarly, servers may be upgraded or data migrated without upsetting clients. The key is clean interfaces between the tiers.
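As a small illustration of how an open format decouples tiers, the sketch below (in Python for brevity, though the book's examples use VB; the `<order>` element names are invented for the example) parses an XML message without any knowledge of the platform or team that produced it:

```python
import xml.etree.ElementTree as ET

# An XML payload a server on any platform might emit over HTTP.
# The <order> schema here is invented purely for illustration.
payload = "<order><sku>A-100</sku><qty>2</qty></order>"

# The client depends only on the agreed format, not on the
# server's language, operating system, or development team.
root = ET.fromstring(payload)
sku = root.findtext("sku")
qty = int(root.findtext("qty"))
print(sku, qty)  # A-100 2
```

Because the contract is the format itself, either side can be rewritten or replatformed without disturbing the other.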
Somewhat related to this is functional reuse. Because the key pieces of an application are broken out and integrated using clean interfaces, they are ready for reuse in other applications having need of the same functionality.
Object oriented programming made a bid for software reuse, but it required source code level reuse. With the source code available, it was too easy to modify software when the interfaces were insufficient. Since the software could be easily changed, less emphasis was placed on interface design.
Component software, such as the VBX controls of the early nineties and the COM controls and components of today, advances the cause of reuse. Since components are reused at run-time, and the source is seldom available to the using program, greater effort must go into interface design. Interfaces must be clean and broadly useful. Designers must consider how another program will use these components.
Still, when these components are deployed on the same computer as the client application, which is the most typical case, the problems of optimizing a monolithic application recur. For the same reason, anyone who wants to reuse the functionality must acquire the software and host it themselves. This includes configuring and maintaining the component. When server code is deployed on multiple tiers, it can be deployed once and managed by an organization that understands the proper configuration and tuning of the software. More importantly, it is controlled by the organization that is responsible for the business function that the software represents. Since they control the software, clients can be assured they will keep it current.
Windows is no longer a single platform. It spans a range from Windows CE on palmtop devices, through consumer and small business desktops running Windows 98, up to highly critical systems running Windows 2000. In fact, Windows 2000 itself is available in four variations (Professional and three grades of Server) depending on your needs.
As much as we would like you to use Microsoft Windows for your platform (and we will make a very strong case for it in this book), we have to concede that there is life apart from Windows. This diversity isn't some horrible accident caused by programmers' egos and the marketplace. The fact is that different platforms serve different needs. Different hardware is required to service different tasks. Choices and design tradeoffs are made throughout a job as difficult as bringing an operating system to market, and a wise software architect will select the platform that offers functionality closest to the requirements they have. Occasionally, the choice is as simple as sticking with a platform you know best – productivity matters.
You may find that different layers of your application work best on different platforms. Monolithic applications had no choice – everything ran on the same platform. Client-server applications brought programmers their first choice of platform freedom, and it became quite common to use different platforms for the client and server. Still, some compromises had to be made.
n-Tier architecture lets you divide an application into pieces as small as you desire (within some practical limits, of course). You are free to use multiple platforms for each tier, and even multiple platforms within each tier. This exacerbates the problems of application integration, so it's important that your architecture provides for integration solutions.
As we shall see, Windows DNA takes a careful approach. There is ample provision for integration within the DNA infrastructure, as well as robust support for open protocols such as HTTP. If you stay with the Windows platform, you can take advantage of the tight integration enabled by the Windows infrastructure. If your requirements take you to a mixed environment, you can integrate the tiers using open protocols and data formats. You have more choice with Windows DNA.
DNA Design Objectives
Now that we have seen the benefits of the n-tier architecture over its predecessors, we can look at the various other issues that DNA was designed to overcome. We'll see how the current DNA architecture succeeds in these areas in the next chapter.
Windows DNA had five major design objectives. These are common themes that run through the architecture, guiding the design decisions at each step. Without these, the architecture would be incoherent, and would not address the challenges of network applications. These design objectives are: autonomy, reliability, availability, scalability, and interoperability.
We'll discuss each of these in turn.
Autonomy
Autonomy is the extension of encapsulation to include control of critical resources. When a program uses encapsulation, as with object oriented programming, each object protects the integrity of its data against intentional or accidental corruption by some other module. This is just common sense extended to programming: if something is important to you, you will take care of it. No one else will protect it as well as you will.
Unfortunately, client-server computing violates encapsulation when it comes to resources. A server, no matter how well written and robust, can only support a finite number of connections. System memory, threads, and other operating system objects limit it. Database programmers see this expressed in the number of concurrent connections an RDBMS (Relational Database Management System) can support.
Conservation of system resources was a sensitive topic in departmental level client-server applications. It becomes an essential issue as we build mission-critical systems at enterprise scale. The sheer multiplicity of components and applications in a distributed system puts pressure on any one server's resources. The dynamic nature of resource usage makes tracking and managing resources hard.
Extending encapsulation to resources should suggest that the server, which is critically interested in conserving its resources, is best positioned to manage those resources. It is also best suited to track their utilization. The server is the only entity that has a global view of the demands on it. Consequently, it will try to balance those demands. It must also manage secure access to its resources – access is a resource no less important than a physical entity like memory or database connections.
The addition of the application logic tier between presentation tier clients and data servers clouds the issue of resource management. The data server knows the demands presented to it by the components and servers on the application logic tier. It cannot know, however, what kind of load is coming into that tier from the clients on the presentation tier (or from other servers within the application logic tier, for that matter). Thus, application logic tier components must practice autonomy as well.
Just as a component is a client of data servers, it is also a server to presentation clients. It must pool and reuse critical resources it has obtained from other servers, as well as managing the resources that originate within it. A component managing access to a database, for example, will likely acquire a pool of connections based on its expected demand, then share them across clients that request its services.
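The pooling behaviour described above can be sketched as follows. This is a minimal illustration in Python rather than the platform services the book covers; `open_connection` and `ConnectionPool` are hypothetical stand-ins for a real database driver and pooling service:

```python
import queue

def open_connection(dsn):
    # Hypothetical stand-in for a real database driver's connect call.
    return {"dsn": dsn, "open": True}

class ConnectionPool:
    """Acquire a fixed pool of connections up front, then share
    them across the clients that request the component's services."""
    def __init__(self, dsn, size):
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(open_connection(dsn))

    def acquire(self):
        # Blocks when all connections are in use, throttling clients
        # instead of exhausting the data server's finite connections.
        return self._pool.get()

    def release(self, conn):
        # Return the connection for reuse rather than closing it.
        self._pool.put(conn)

pool = ConnectionPool("accounts-db", size=3)
conn = pool.acquire()
# ... issue queries on behalf of a presentation-tier client ...
pool.release(conn)
```

In practice, MTS/COM+ supplies this kind of resource pooling as a platform service, so application components rarely implement it by hand.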
Servers in any tier, then, practice autonomy in some form. They practice native autonomy on resources they originate, and they act as a sort of proxy for servers they encapsulate. In the latter case, they must not only share resources, but also ensure that access control is respected. In the case of a database, for example, the component acting as a proxy knows the identity (and by implication the access permissions) of the requesting client. Because the component acts as a proxy, using shared resources, the data server no longer has direct access to this information. Instead, the database typically establishes access based on roles. The proxy component is responsible for mapping a particular client's identity to a role within the system as it grants access to the resources it serves from its pool.
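The identity-to-role mapping can be made concrete with a short sketch (in Python for brevity; the identities, roles, and permissions here are invented for the example):

```python
# The proxy component, not the data server, knows each client's
# identity; it maps that identity to a role before granting access
# to a resource from its shared pool.
ROLE_OF = {
    "alice": "teller",
    "bob": "auditor",
}

ROLE_PERMISSIONS = {
    "teller": {"read_account", "post_transaction"},
    "auditor": {"read_account"},
}

def authorize(client_identity, operation):
    """Return True if the client's role permits the operation.
    The data server never sees the client's identity, only the
    shared, role-based access the proxy exercises on its behalf."""
    role = ROLE_OF.get(client_identity)
    if role is None:
        return False
    return operation in ROLE_PERMISSIONS[role]
```

For example, `authorize("bob", "post_transaction")` is refused because the auditor role carries read-only permissions.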
Reliability
Computers are reliable, aren't they? Surely if we submit the same set of inputs to the same software we'll obtain the same results every time. This is certainly true from the vantage point of application programmers (hardware engineers might have a few quibbles). As soon as we open an application to the network, however, the challenge of maintaining the integrity of the overall system – reliability – requires our involvement.
If you have ever programmed a database application, you have encountered the classic bank account example. When an application transfers funds from one account to another, it must debit the losing account and credit the gaining account. A system failure between the two operations must not corrupt the integrity of the bank.
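A minimal sketch of the transfer (in Python, against an in-memory dictionary rather than a real database) shows why the two operations must succeed or fail as a unit:

```python
class IntegrityError(Exception):
    """Raised when a transfer would corrupt the books."""

def transfer(accounts, src, dst, amount):
    # Apply the debit and credit to a snapshot first, so a failure
    # part-way through leaves the original accounts untouched.
    pending = dict(accounts)
    if pending[src] < amount:
        raise IntegrityError("insufficient funds")
    pending[src] -= amount
    pending[dst] += amount
    accounts.update(pending)  # "commit": both changes land, or neither

accounts = {"checking": 100, "savings": 50}
transfer(accounts, "checking", "savings", 30)
print(accounts)  # {'checking': 70, 'savings': 80}
```

A real transaction service provides the same all-or-nothing guarantee, and a distributed one extends it across process and machine boundaries.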
Client-server applications only had to concern themselves with the failure of the server or loss of a single connection to ensure the integrity of the application. A three-tier distributed system introduces many more points of failure. The challenges we enumerated earlier for network applications, especially connectivity, resource collection, and availability, offer the possibility of failures that are harder to unravel.
Relational databases offer transactions within their bounds; a network application using multiple data stores requires a distributed transactional capability. If the system experiences a loss of connectivity, the transaction service must detect this and roll back the transaction. Distributed transactions are difficult to implement, but are critically important to the success of a network application. Their importance is such that distributed transactions must be viewed as critical resources requiring the protection and management we discussed under the goal of autonomy.
Availability
This goal is concerned with the ability of the network application to perform its functions. Such an application contains so many resources prone to failure that some failure must be expected during the course of operation. Optimal availability, then, requires that the network application take the possibility of failure into account, either by providing redundancy in the form of extra hardware or duplicate software resources, or by making provision for dealing with failure gracefully.
A network is inherently redundant in that it comprises multiple computers. To achieve high availability, a network application must be designed with the known points of failure in mind, and must provide redundancy at each point. Sometimes this is a matter of hardware, such as RAID disk drives and failover clusters, and other times it's a matter of software, as in web server farms. Software detects the loss of one resource and redirects a request to an identical software resource.
In the monolithic world, if we had our computer, we had our application. Network applications, however, give the illusion of availability to the user whenever their machine is available. The actual state of the network application's resources, however, may be very different. It's the goal of availability to ensure that the resources of the network are deployed in such a way that adequate resources are always available, and no single failure causes the failure – or loss of availability – of the entire application.
Availability is the goal behind such buzzwords as "five nines", that is, 99.999% uptime. If you are going to the expense of fielding a network and writing a distributed application, you expect the application to be available. A monolithic application running on commodity PC hardware is scarcely capable of hosting mission-critical functions. Windows DNA aspires to host such functions on networks of commodity computers. Availability is a make or break point for Windows DNA.
Scalability
It would be close to pointless to deploy a network for a single-user application. One of the points of network application architecture is to efficiently share resources across a network on behalf of all users. Consequently, we should expect our network applications to handle large volumes of requests. Each of, say, 100 users should have a reasonable experience with the server, not 1/100th of the experience and performance provided to a single user of the application.
Scalability measures the ability of a network application to accommodate increasing loads. Ideally, throughput – the amount of work that can be completed in a given period of time – scales linearly with the addition of available resources. That is, if I increase system resources five times (by adding processors or disks or what have you), I should expect to increase throughput five times.
In practice, the overhead of the network prevents us from realizing this ideal, but the scalability of the application should be as close to linear as possible. If the performance of an application drops off suddenly above a certain level of load, the application has a scalability problem. If I, say, double resources but get only a 10% increase in throughput, I have a bottleneck somewhere in the application.
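The arithmetic is easy to check. The helper below (an illustrative sketch, not a formula from the book) expresses scaling efficiency as the achieved speedup divided by the resource increase:

```python
def scaling_efficiency(base_throughput, scaled_throughput, resource_factor):
    """Ratio of achieved speedup to the resource increase.
    1.0 is ideal linear scaling; values far below 1.0 suggest
    a bottleneck somewhere in the application."""
    speedup = scaled_throughput / base_throughput
    return speedup / resource_factor

# Doubling resources but gaining only 10% more throughput:
print(scaling_efficiency(100, 110, 2))   # 0.55 - a bottleneck
# Five times the resources yielding nearly five times the work:
print(scaling_efficiency(100, 480, 5))   # 0.96 - close to linear
```

The first case is the warning sign described above: resources doubled, but barely half the ideal throughput gain was realized.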
The challenges of network applications and the responses we make to them – distributed transactions, for example – work against scalability. Architects of network applications must continually balance the overhead of distributed systems against scalability. A scalable architecture provides options for growing the scalability of an application without tearing it down and redesigning it. An n-tier architecture like DNA helps. If you encounter a bottleneck on any single machine, such that adding additional resources to that machine does not alleviate the bottleneck, you are able to off-load processing to other machines or move processing between tiers until the bottleneck is broken.
Interoperability
The challenge of platform integration arises from the fact that organizations will end up possessing dissimilar hardware and software platforms over time. In the past, organizations sought to fight this through standardizing on one platform or another. The sad reality of practical computing, however, is that standardization is nearly impossible to maintain over time.
Sometimes there are sound technical reasons for introducing heterogeneous platforms – a given platform may not be sufficiently available or scalable for a particular need, for example. Other times, the problem arises from a desire to protect the investment in outdated hardware. Sometimes, it's simply a matter of the human tendency to independence and diversity. Whatever the cause, interoperability – the goal of being able to access resources across dissimilar platforms and cooperate on a solution – is the answer. Any architecture that claims to be suitable for network applications must address the problems of differing system services and data formats that we described under the challenge of platform integration.
Summary
We've used this chapter to set the scene for why we need to consider a specific architecture like Windows DNA.
We saw what requirements we're likely to have of our applications and of our platform – they should present a unified view of data from multiple data sources, allow a user to update data, possibly provide full e-commerce capabilities, be fast, and be scalable to thousands of concurrent users. We also saw some of the problems inherent in network applications (communications, concurrency, state, latency, and encapsulation). We then went on a sideline track to see how network applications have evolved from monoliths to distributed component-based n-tier architectures. Finally, we looked at what DNA professes to achieve for your applications (autonomy, reliability, availability, scalability, and interoperability).
Although we've not seen exactly what DNA is, we have seen what it's supposed to do for us. In the next chapter, we'll take a look at what's in DNA, and see how these things can solve the problems we've looked at here.