Shubhrajit Chatterjee
We have come a long way since the early days, when computing was based on stand-alone systems serving one or more users. The mantra today is network computing: businesses are rapidly connecting their stand-alone systems to each other and developing new, geographically distributed systems in order to do business with reduced latency. The focus of this paper is to understand how scalable distributed systems can be built with ease.
Protocols are the communication language between the parties involved in a transaction, so that they can understand each other. Since our mission is to allow computer systems with different architectures to talk to each other, they must exchange data in an agreed format that both sides understand. That format of data exchange is a protocol.
According to the OSI network model, network protocols can be divided into 7 layers. These are:
• Physical Layer
• Data Link Layer
• Network Layer
• Transport Layer
• Session Layer
• Presentation Layer
• Application Layer
However, the layers overlap with each other, and in practice the model is often simplified to four layers: Data Link, Network, Transport and Application.
TCP stands for Transmission Control Protocol and IP stands for Internet Protocol. Together they name a suite of protocols that lie in the Transport Layer and the Network Layer. TCP/IP is the de facto protocol suite of the Internet: it is an open standard and is supported by all major operating systems. Although there are proprietary protocols such as IBM's LU6.2, TCP/IP is the most widely accepted, and most modern distributed systems use it. This is also important because the standard networking API of the Java platform (java.net) is built on the TCP/IP suite. A detailed study of TCP/IP is in the realm of systems programming and is outside the scope of this paper. However, it is worthwhile to know a few basic points about TCP/IP.
The IP part of the suite operates at a lower level and primarily concerns the identification of a node (computer system) in a network by a 32-bit number called the IP address, which can be written as 8 hexadecimal digits. For clarity it is usually written as four decimal numbers separated by dots, in the form xxx.xxx.xxx.xxx, and networks are classified on the basis of their IP address pattern. TCP is a connection-oriented, stateful protocol (i.e. the connection is maintained between the nodes) in which data packets are exchanged between nodes of the network. The TCP/IP suite also contains a transport protocol that plays a similar role to TCP but is connectionless and stateless. This is known as UDP (User Datagram Protocol).
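The two notations for an address can be related with a short sketch. It assumes an IPv4 address, i.e. a 32-bit value made of four octets; the class name Ipv4Address and the sample address are illustrative only and do not come from any library.

```java
// Sketch: one IPv4 address shown in dotted-decimal and in 8-digit hexadecimal form.
// Ipv4Address is a hypothetical helper, not part of the Java class library.
final class Ipv4Address {
    private Ipv4Address() {}

    /** Parses dotted-decimal text such as "192.168.1.10" into a 32-bit value. */
    static long parse(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four octets: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("octet out of range: " + part);
            }
            value = (value << 8) | octet; // shift earlier octets left, append this one
        }
        return value;
    }

    /** Formats the 32-bit value as eight hexadecimal digits. */
    static String toHex(long value) {
        return String.format("%08X", value);
    }
}
```

For example, Ipv4Address.toHex(Ipv4Address.parse("192.168.1.10")) yields "C0A8010A": each decimal octet becomes two hexadecimal digits.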
HTTP (Hypertext Transfer Protocol) is a simple application-level protocol used to transfer data over the network between applications. It is deliberately designed to be simple, so that it can be implemented by a wide range of systems.
HTTP is designed to be stateless, i.e. two parties communicating through HTTP should not expect state to be maintained between exchanges. Since it is built over TCP, which is a stateful protocol, an HTTP 1.0 client and server deliberately close the TCP connection after each HTTP exchange is over.
Since it is expensive to establish a TCP connection between two nodes, HTTP 1.1 allows the network connection to persist if both applications taking part in the HTTP exchange support it. However, the applications cannot rely on the availability of the old connection, thus maintaining the stateless nature of the protocol.
HTTP is a request-response protocol, i.e. the HTTP client sends a request naming an HTTP method to the server (along with data and/or resources if applicable). The HTTP 1.1 request methods include GET, POST, HEAD, PUT, DELETE, OPTIONS and TRACE, though only GET and POST are generally used.
On receiving the request, the server sends back a response code to the client along with resources (if applicable). The HTTP exchange is also controlled by HTTP headers, which may be sent by either the server or the client. HTTP 1.1 defines 46 headers, of which only one (Host, in requests) is mandatory.
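The shape of a request and of the first line of a response can be sketched as plain text handling. This is a minimal illustration, assuming an HTTP 1.1 GET with no body; the class HttpMessageSketch and the host name used in the usage note are hypothetical, and a real client would normally use an HTTP library rather than build messages by hand.

```java
// Sketch: the text of a minimal HTTP 1.1 GET request and the parsing of a status line.
// HttpMessageSketch is a hypothetical helper written for this illustration.
final class HttpMessageSketch {
    private HttpMessageSketch() {}

    /** Builds a GET request: request line, headers, then an empty line. */
    static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"        // the one mandatory request header
             + "Connection: close\r\n"         // ask the server not to keep the connection open
             + "\r\n";                         // blank line ends the header section
    }

    /** Extracts the numeric response code from a status line such as "HTTP/1.1 200 OK". */
    static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }
}
```

Calling HttpMessageSketch.buildGet("www.example.com", "/index.html") produces the text a client would write to the TCP connection, and statusCode reads the code from the first line the server sends back.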
HTTP response codes can be classified into 5 groups.
• 1xx indicates an informational message only
• 2xx indicates success of some kind
• 3xx redirects the client to another URL
• 4xx indicates an error on the client's part
• 5xx indicates an error on the server's part
Common HTTP response codes include 200 (OK) and 404 (Not Found).
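Because the group is determined by the first digit, a client can classify any response code with integer division. The following is a small sketch; the class name HttpStatusClass and the wording of the returned labels are assumptions made for the example.

```java
// Sketch: map a response code to one of the five groups by its first digit.
// HttpStatusClass is a hypothetical helper, not a library class.
final class HttpStatusClass {
    private HttpStatusClass() {}

    static String describe(int code) {
        switch (code / 100) {
            case 1: return "informational";
            case 2: return "success";
            case 3: return "redirection";
            case 4: return "client error";
            case 5: return "server error";
            default:
                throw new IllegalArgumentException("not an HTTP response code: " + code);
        }
    }
}
```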
SMTP (Simple Mail Transfer Protocol), as the name suggests, is a protocol for sending mail to a mail server (also known as an SMTP server). It is a conversational protocol: the client and the server exchange a sequence of commands and replies to hand the mail over to the server. The mail server then decides the destination of the mail from the mail header information. A mail may be routed through various SMTP servers before reaching the final destination.
FTP (File Transfer Protocol) is an application-level protocol built over TCP/IP, which is suitable for transferring files between the client and the server.
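The client side of such a conversation can be sketched as the ordered list of lines a client sends. This is a simplified illustration under stated assumptions: it uses the basic HELO, MAIL FROM, RCPT TO, DATA and QUIT commands, a single recipient and a one-line body, and it omits the numeric replies the server sends between commands. The class name and all addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: the lines an SMTP client sends, in order, to hand one message to a server.
// Server replies are omitted; SmtpDialogueSketch is a hypothetical helper.
final class SmtpDialogueSketch {
    private SmtpDialogueSketch() {}

    static List<String> clientCommands(String clientHost, String from, String to, String body) {
        List<String> lines = new ArrayList<>();
        lines.add("HELO " + clientHost);        // client introduces itself
        lines.add("MAIL FROM:<" + from + ">");  // envelope sender
        lines.add("RCPT TO:<" + to + ">");      // envelope recipient
        lines.add("DATA");                      // announce that message text follows
        lines.add(body);                        // headers and body of the mail (one line here)
        lines.add(".");                         // a line holding a single dot ends the message
        lines.add("QUIT");                      // end the conversation
        return lines;
    }
}
```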
Telnet is an application level protocol built over TCP/IP to set up a remote console session with the server.
Most of the applications on the Internet that we know today are based on HTTP. HTTP is popular because of its simplicity, but it lacks sophistication because it is stateless, i.e. no information is carried over from one exchange to the next within a session. This is a major drawback when designing sophisticated business applications.
Most applications overcome this problem with additional techniques. Java web applications, for example, store the session data in an indexed table. The key to this table is exchanged with the client to identify the session state.
Cookies
Cookies are small chunks of data exchanged between the client and the server. The server may ask the client to save a cookie, which the client then sends back with later requests so that the server can identify the client. Cookies may be used in a variety of ways to enhance the client application. Cookies are, however, not mandatory, and a client may not implement them.
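On the wire, the server's request to save a cookie travels in a Set-Cookie response header, and the client returns the saved values in a Cookie request header. The sketch below shows both directions in simplified form; it ignores cookie attributes such as expiry or path, and the class name CookieSketch and the cookie names in the usage note are illustrative assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: the header a server sends to set a cookie, and parsing of the header
// a client sends back. Attributes (expiry, path, ...) are ignored here.
// CookieSketch is a hypothetical helper.
final class CookieSketch {
    private CookieSketch() {}

    /** Header line the server adds to its response. */
    static String setCookieHeader(String name, String value) {
        return "Set-Cookie: " + name + "=" + value;
    }

    /** Parses the value of a Cookie request header, e.g. "a=1; b=2". */
    static Map<String, String> parseCookieHeader(String headerValue) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String pair : headerValue.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) { // skip fragments without a name=value shape
                cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
            }
        }
        return cookies;
    }
}
```

For instance, parsing "SESSIONID=abc123; theme=dark" gives a map with two entries, which the server can use to recognise the client.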
Session
When a client and a server carry out a set of related HTTP exchanges, it is known as a session. As discussed earlier, since HTTP is stateless, it is not possible to establish a session naturally over HTTP. Different server software uses different techniques to maintain the session. Java server software, for example, stores the session information on the server side in a memory table indexed by a key called the session id, and uses two techniques to pass this key to the client: it is sent either as a cookie or, in case the client does not support cookies, as a part of the URLs (Uniform Resource Locators) that are sent to the client. The session id is sent to the client when the first transactional HTTP exchange takes place. During further HTTP exchanges, the server reads back the cookie or parses the request URL to get back the key, which is then used to retrieve the session data.
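The mechanism can be sketched in plain Java: a table keyed by session id, plus helpers that embed the id in a URL and read it back. This is a simplified model and not the servlet container's own implementation; the class SessionTableSketch is hypothetical, and the ";jsessionid=" path parameter is used here because it is the form servlet containers conventionally use for URL rewriting.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: server-side session table indexed by session id, with URL rewriting
// as the fallback when cookies are unavailable. SessionTableSketch is hypothetical.
final class SessionTableSketch {
    private static final String MARKER = ";jsessionid=";
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    /** Creates an empty session and returns the key to send to the client. */
    String createSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new ConcurrentHashMap<String, Object>());
        return id;
    }

    /** Returns the session data for the key, or null if the key is unknown. */
    Map<String, Object> lookup(String sessionId) {
        return sessionId == null ? null : sessions.get(sessionId);
    }

    /** Embeds the session id in a URL, before any query string. */
    static String rewriteUrl(String url, String sessionId) {
        int q = url.indexOf('?');
        String suffix = MARKER + sessionId;
        return q < 0 ? url + suffix : url.substring(0, q) + suffix + url.substring(q);
    }

    /** Reads the session id back out of a rewritten URL, or returns null. */
    static String sessionIdFromUrl(String url) {
        int start = url.indexOf(MARKER);
        if (start < 0) {
            return null;
        }
        start += MARKER.length();
        int end = url.indexOf('?', start);
        return end < 0 ? url.substring(start) : url.substring(start, end);
    }
}
```

On a later request the server would call sessionIdFromUrl on the request URL (or read the cookie) and pass the result to lookup to recover the session data.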
The Java 2 Platform, Enterprise Edition (J2EE) defines the standard for developing multi-tier enterprise applications. J2EE simplifies enterprise applications by basing them on standardized, modular components, by providing a complete set of services to those components, and by handling many details of application behavior automatically, without complex programming.
Goal: To develop mission-critical distributed applications, which will be:
• Reliable
• Secure
• Fast
• Scalable
Constraints: The development needs to be cost effective.
Solution: J2EE (Java 2 Enterprise Edition) is a component based framework for developing server side applications, which satisfies our goals, without violating the constraints.
J2EE is a specification and is currently at version 1.3. Several components form part of the J2EE architecture; they can work individually or together to build an application that spans physical locations.
Since J2EE is an evolving framework, many components that were earlier part of J2EE have been moved into J2SE, and many new components have been added. Some of the major J2EE components are noted below.
Java Servlet technology lets us define HTTP-specific Servlet classes. A Servlet class extends the capabilities of servers that host applications accessed by way of a request-response programming model. Although Servlets can respond to any type of request, they are commonly used to extend the applications hosted by web servers.
The Servlet specification is generic, but due to the immense popularity of HTTP as a distributed protocol, application frameworks are widely available for HTTP Servlets.
Whenever a request is made to the application server for a specific URI (Uniform Resource Identifier), the Servlet that has been mapped to the URI is invoked. The application server passes control to the service method of the servlet as the entry point to the application-specific component. The servlet then invokes business logic and sends back the response to the client. Servlets therefore extend the functionality of an application server.
JavaServer Pages (JSP) technology combines snippets of Java programming language code with static markup in a text-based document. A JSP page is a text-based document that contains two types of text: static template data, which can be expressed in any text-based format such as HTML, WML, and XML, and JSP elements, which determine how the page constructs dynamic content.
JSPs are essentially servlets, designed so that web designers can put content in a web document. The JSP engine in the application server translates each JSP into a servlet internally. JSPs are powerful enough to contain business logic, but modern design practice holds that it is better to use JSPs for presentation logic only, with the business logic executed in a servlet or other J2EE components.
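The dispatch step can be modelled without the Servlet API itself. The sketch below is a deliberately simplified stand-in: MiniServlet and MiniContainer are hypothetical types invented for this illustration, requests and responses are plain strings, and a real application would extend the HttpServlet class of the Servlet API and declare its URI mapping in the deployment descriptor.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a toy "container" that maps URIs to components and passes control to
// their service method. These types are stand-ins, NOT the javax.servlet API.
interface MiniServlet {
    /** Entry point: receives the request data and returns the response body. */
    String service(String requestBody);
}

final class MiniContainer {
    private final Map<String, MiniServlet> mappings = new HashMap<>();

    /** Registers the component that should handle requests for the given URI. */
    void map(String uri, MiniServlet servlet) {
        mappings.put(uri, servlet);
    }

    /** Finds the component mapped to the URI and invokes its service method. */
    String dispatch(String uri, String requestBody) {
        MiniServlet servlet = mappings.get(uri);
        if (servlet == null) {
            return "404 Not Found";
        }
        return "200 OK: " + servlet.service(requestBody);
    }
}
```

Mapping "/hello" to a component that returns a greeting and then dispatching a request for "/hello" mirrors the sequence described above: lookup by URI, call to the service method, response back to the client.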
Based on this paradigm, JSP based applications are now divided into two types:
Model I JSP: Business logic is written in the JSP. This model is suited only to small applications, especially when the presentation is also designed by developers. It may also be used to quickly deploy proofs of concept.
Model II JSP (also known as an MVC, or Model-View-Controller, application): In this paradigm, JSPs, servlets and JavaBeans/EJBs each cater to one part of the application. JSPs are used to design the presentation, JavaBeans/EJBs are used to model the business process, and a central servlet is used to control application flow.
An enterprise bean is a body of code with fields and methods to implement modules of business logic. You can think of an enterprise bean as a building block that can be used alone or with other enterprise beans to execute business logic on the J2EE server.
There are three kinds of enterprise beans: session beans, entity beans, and message-driven beans, each described below.
EJBs are business components designed to leverage the strengths of distributed computing and re-usability while keeping the nuances of distributed programming away from business programmers.
EJBs are hosted in an EJB server, which may not be the same server in which the rest of the web application is hosted. EJBs are designed to model business components that can be integrated seamlessly with different applications (both web applications and stand-alone Java applications). EJBs can be invoked over different network protocols such as RMI and IIOP, but this is transparent to the application developer.
Session EJB
These are components that typically model business rules. They can be stateless, i.e. all instances of the EJB are identical and do not hold any internal data about client state. An example of a stateless EJB is Operation, which may have a business method to transfer funds between two accounts.
Stateful session EJBs maintain a record of their internal state, i.e. the different EJBs are not identical. An example of a stateful EJB can be a ShoppingCart in a B2C application.
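The difference between the two kinds can be seen in plain Java classes, leaving out the EJB interfaces and deployment details entirely. The sketch below reuses the Operation and ShoppingCart names from the examples above; the method signatures and the use of an in-memory map for account balances are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: plain classes (not real EJBs) contrasting stateless and stateful components.

/** Stateless: no field describes a particular client, so any instance can serve any call. */
final class Operation {
    void transfer(Map<String, Long> balances, String from, String to, long amount) {
        long available = balances.get(from); // assumes both accounts exist in the map
        if (amount <= 0 || amount > available) {
            throw new IllegalArgumentException("invalid amount: " + amount);
        }
        balances.put(from, available - amount);
        balances.put(to, balances.get(to) + amount);
    }
}

/** Stateful: each instance accumulates the items of one client's conversation. */
final class ShoppingCart {
    private final List<String> items = new ArrayList<>();

    void add(String item) {
        items.add(item);
    }

    List<String> items() {
        return new ArrayList<>(items); // defensive copy
    }
}
```

Two Operation instances are interchangeable because everything they need arrives as method arguments, whereas two ShoppingCart instances differ as soon as different clients add items to them.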
Entity EJB
These components model an entity in persistent storage. Each entity EJB may hold a database record object. An example of an entity EJB is Account, which stores account-specific information about a user.
Entity EJBs can also be of two types, depending on how their persistence is managed. If the EJB container is responsible for persisting and synchronizing the EJB with the data store, it is a Container Managed Persistence (CMP) EJB. If this is taken care of by the bean itself, it is called a Bean Managed Persistence (BMP) EJB.
Message Driven EJB
These are EJBs that act as message consumers in a JMS (Java Message Service) system. When a message is placed in a JMS queue in which the EJB is interested, it reads the message and performs some business logic. JMS is discussed later in this document.
Message-driven beans are a new addition in EJB version 2.0; they were not present in version 1.1.
Often, in complicated applications, there may be a requirement for asynchronous processing, where the client initiates a request and continues its own processing without waiting for a response. A typical requirement is a request for a huge report, which might take considerable time to execute. These kinds of requirements are catered to by the JMS API.
The JMS API is a messaging standard that allows J2EE application components to create, send, receive, and read messages. It enables distributed communication that is loosely coupled, reliable, and asynchronous.
The JMS messaging model is broadly divided into two types.
Message-Queue Model: In this model, JMS clients put a message in a ‘queue’ (message repository), which the server program can read in order to perform the necessary processing.
Publish-Subscribe Model: In this model, a JMS publisher puts a message in a ‘topic’ (message repository). The message is then sent to all interested clients who have subscribed to this topic.
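The two delivery models differ in who receives each message, which a small in-memory model makes concrete. The classes below are hypothetical stand-ins and not the JMS API: they run in a single thread, deliver synchronously, and offer none of the reliability or asynchrony of a real JMS provider.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Sketch: in-memory stand-ins (NOT the javax.jms API) for the two messaging models.

/** Message-queue model: each message is taken by exactly one receiver. */
final class MiniQueue {
    private final Queue<String> messages = new ArrayDeque<>();

    void send(String message) {
        messages.add(message);
    }

    /** Removes and returns the oldest message, or null when the queue is empty. */
    String receive() {
        return messages.poll();
    }
}

/** Publish-subscribe model: each message goes to every current subscriber. */
final class MiniTopic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();

    void subscribe(Consumer<String> subscriber) {
        subscribers.add(subscriber);
    }

    void publish(String message) {
        for (Consumer<String> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}
```

A message sent to the MiniQueue can be received once and is then gone, while a message published to the MiniTopic reaches every subscriber that registered before the publish call.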
The Java Transaction Service (JTS) API technology ensures interoperability with sophisticated transaction resources such as transactional application programs, resource managers, transaction processing monitors and transaction managers. Since these components are provided by different vendors, JTS provides open, standard access to these transaction resources.
The Java Transaction API (JTA) provides a standard interface for demarcating transactions. The J2EE architecture provides a default auto commit to handle transaction commits and rollbacks. An auto commit means that any other applications viewing the data will see the updated data after each database read or write operation. However, if your application performs two separate database access operations that depend on each other, you will want to use the JTA API to demarcate where the entire transaction, including both operations, begins, rolls back, and commits.
The J2EE SDK implements the transaction manager with the Java Transaction Service.
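The idea of demarcating two dependent operations can be sketched with an in-memory stand-in. In a J2EE application the begin, commit and rollback calls would be made on the JTA UserTransaction interface against real resources; the MiniTransaction class below is hypothetical, works on a copy of a map, and only illustrates that either both writes become visible or neither does.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a toy transaction over an in-memory map (NOT the JTA API), showing
// begin / commit / rollback around two writes that depend on each other.
final class MiniTransaction {
    private final Map<String, Long> committed; // data visible to other readers
    private Map<String, Long> working;         // private copy while a transaction is open

    MiniTransaction(Map<String, Long> store) {
        this.committed = store;
    }

    void begin() {
        working = new HashMap<>(committed);
    }

    void write(String key, long value) {
        requireActive();
        working.put(key, value);
    }

    void commit() {
        requireActive();
        committed.clear();
        committed.putAll(working); // both writes become visible together
        working = null;
    }

    void rollback() {
        working = null; // discard every write made since begin()
    }

    private void requireActive() {
        if (working == null) {
            throw new IllegalStateException("no active transaction");
        }
    }

    /** Debits one account and credits another as a single unit of work. */
    static boolean transfer(Map<String, Long> store, String from, String to, long amount) {
        MiniTransaction tx = new MiniTransaction(store);
        tx.begin();
        try {
            long fromBalance = store.get(from);
            long toBalance = store.get(to);
            tx.write(from, fromBalance - amount);          // first operation
            if (fromBalance < amount) {
                throw new IllegalStateException("insufficient funds");
            }
            tx.write(to, toBalance + amount);              // second, dependent operation
            tx.commit();
            return true;
        } catch (RuntimeException e) {
            tx.rollback();                                 // the debit is never made visible
            return false;
        }
    }
}
```

With auto commit, the debit would be visible to other applications even if the credit failed; with explicit demarcation, a failure after the first write leaves the store unchanged.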
The JDBC API lets you invoke SQL commands from Java programming language methods.
The JDBC API has two parts: an application-level interface used by the application components to access a database, and a service provider interface to attach a JDBC driver to the J2EE platform.
Many Internet applications need to send email notifications so the J2EE platform includes the JavaMail API with a JavaMail service provider that application components can use to send Internet mail. The JavaMail API has two parts: an application-level interface used by the application components to send mail, and a service provider interface.
XML is a language for representing and describing text-based data so the data can be read and handled by any program or tool that uses XML APIs. Programs and tools can generate XML files that other programs and tools can read and handle.
For example, a J2EE application can use XML to produce reports, and different companies that receive the reports can handle the data in a way that best suits their needs. One company might put the XML data through a program to translate the XML to HTML so it can post the reports to the web, another company might put the XML data through a tool to create a marketing presentation, and yet another company might read the XML data into its J2EE application for processing.
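As an illustration of a program reading such a report, the sketch below parses a small XML document with the DOM parser that ships with the Java platform and totals one attribute. The element and attribute names (report, region, sales) and the class name XmlReportSketch are invented for the example; a real report would follow whatever format the companies involved have agreed on.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Sketch: read a hypothetical XML sales report and add up the "sales" attribute
// of every <region> element. The report format is an assumption for illustration.
final class XmlReportSketch {
    private XmlReportSketch() {}

    static long totalSales(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList regions = doc.getElementsByTagName("region");
            long total = 0;
            for (int i = 0; i < regions.getLength(); i++) {
                Element region = (Element) regions.item(i);
                total += Long.parseLong(region.getAttribute("sales"));
            }
            return total;
        } catch (Exception e) {
            // Malformed XML or a non-numeric attribute ends up here.
            throw new IllegalArgumentException("invalid report XML", e);
        }
    }
}
```

Another consumer of the same document could instead transform it to HTML or load it into a different tool; the XML text itself does not change.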
The Connector API is used by J2EE tools vendors and system integrators to create resource adapters that support access to enterprise information systems that can be plugged into any J2EE product. A resource adapter is a software component that allows J2EE application components to access and interact with the underlying resource manager. Because a resource adapter is specific to its resource manager, there is typically a different resource adapter for each type of database or EIS.
The Java Authentication and Authorization Service (JAAS) provides a way for a J2EE application to authenticate and authorize a specific user or group of users to run it. JAAS is a Java programming language version of the standard Pluggable Authentication Module (PAM) framework that extends the Java 2 platform security architecture to support user-based authorization.
Java Naming and Directory Interface API
Designed to standardize access to a variety of naming and directory services, the Java Naming and Directory Interface (JNDI) API provides a simple mechanism for J2EE components to look up other objects they require.