From Client/Server to Multi-Tier
This book is about a methodology for developing database applications that is commonly referred to as the "multi-tier" model, meaning that it is based on the principle that data access should be separated into multiple levels, or tiers. The minimum number of tiers for a system to be considered multi-tier is three, and indeed most projects consist of just three tiers, but there are also architectures where additional tiers are added to the solution, yielding four or more.
To understand why multi-tier database access is the preferred solution for most scenarios these days, and has all but replaced its predecessors, we first need to take a step back and look at the principles that were applied to data access before.
A Step Back into the 20th Century: Client/Server
Back in the '80s and '90s (and before), there were basically two types of database applications being written: so-called "Desktop" and "Client/Server" applications.
The term desktop database application most commonly referred to single user applications that talked directly to a local database – often something like dBase, Paradox or FoxPro – installed on the user's computer. This was in the days before all-pervasive networks, and these applications were usually standalone programs that did not interact with the outside world or share data with other systems or users. Today, these kinds of applications might use SQLite or other "embedded" database types, or frameworks like Core Data (which itself is a wrapper around SQLite, but abstracts the actual database away and presents itself to the application as something that persists the content of objects – an Object Persistence Framework, or OPF).
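To make the embedded model concrete, here is a minimal sketch using Python's built-in sqlite3 module (the table and data are made up for illustration): the database lives in a local file – here, in memory for simplicity – and is accessed in-process, with no server and no network involved.

```python
import sqlite3

# Embedded database: opened in-process, no server, no network.
# ":memory:" stands in for a local database file on the user's machine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)", ("Joe", "555-0100"))
rows = conn.execute("SELECT name, phone FROM contacts").fetchall()
assert rows == [("Joe", "555-0100")]
conn.close()
```

Everything happens inside the application's own process – exactly the single-user, standalone scenario described above.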
On the higher-level side, there were Client/Server applications, which shared many characteristics with their desktop counterparts in how the application was structured, but accessed a "big" database system, usually running on a dedicated server or server farm on the company's network and shared by many users. These databases would be Oracle, Informix, Sybase, Microsoft SQL Server (which actually evolved from Sybase), Interbase, or the like. Client applications would communicate directly with the database, typically over the local area network, the LAN, or (as the Internet started becoming more popular) over VPN connections that simulated extensions of the LAN. Architecturally, Desktop and Client/Server applications really had more in common than set them apart, the only major difference being whether the back-end database was single-user and located on the same machine as the application, or not.
Both Desktop and Client/Server applications had in common that they communicated directly with the database engine (whether local or remote) using the proprietary protocol defined by the database vendor. Both had direct and full access to the physical database, or at best coarsely controlled access (at the table level, or via read-only vs. read-write permissions) based on their login string. And – as we'll see, the major weakness of the Client/Server paradigm – in both cases virtually all business logic was hard-coded into the application itself, with the database relegated to its traditional role of data storage, nothing more. Because the two models were so similar, at least in terms of what matters for the discussion at hand, for the remainder of this text we will use the term "Client/Server" to refer collectively to all of these applications.
Disadvantages of the Client/Server Model
Client/Server applications suffered from several drawbacks inherent to their architecture: because all business logic was implemented in the client application, the code enforcing this business logic was spread all across the network and duplicated on each workstation. Changes to business logic or business rules usually implied redeploying new client software to all users – a big administrative effort at best, and downtime for large parts of the workforce or user base at worst.
Another problem associated with having business rules enforced on the client tier was (and still is) that it made the system vulnerable to attacks, as client systems could be compromised. With client software deployed on hundreds of computers throughout the company or even outside the controlled network, it cannot be guaranteed that malicious users do not try and succeed at “hacking” the client software, or even write their own replacement software – directly accessing the database and bypassing all enforcement of business rules altogether.
While all databases use their own, proprietary wire protocol for communication with clients, those protocols are usually well-documented, and/or client libraries are provided by the database vendors to connect to databases and query data (after all, how else would client applications that could talk to the databases be written?). In turn, this means that in addition to authorized clients, it is very easy for anyone to directly connect to the database and perform their own queries, be it with a standard database administration tool or a custom written client. All that stands between an attacker and full database access is usually a connection string, containing a database username and password. And since client applications had to connect to the database in question, they had to contain that connection string, making it very easy for anyone with access to the client application to reverse-engineer it, or to monitor the network, to obtain it.
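How easy is it to recover a hard-coded connection string? The following sketch (with a made-up connection string and a simulated client binary) shows the idea behind tools like Unix's `strings`: any credentials compiled into a shipped executable can be found simply by scanning it for runs of printable characters.

```python
# Hypothetical Client/Server anti-pattern: credentials baked into the client.
CONNECTION_STRING = "Server=db.example.com;User=appuser;Password=s3cret;"

def extract_strings(data: bytes, min_len: int = 8) -> list[str]:
    """Return runs of at least min_len printable ASCII characters."""
    runs, current = [], []
    for b in data:
        if 32 <= b < 127:                 # printable ASCII range
            current.append(chr(b))
        else:
            if len(current) >= min_len:
                runs.append("".join(current))
            current = []
    if len(current) >= min_len:
        runs.append("".join(current))
    return runs

# Simulate a compiled client binary with the string embedded among raw bytes.
fake_binary = b"\x00\x01MZ\x90" + CONNECTION_STRING.encode() + b"\xff\xfe"
found = extract_strings(fake_binary)
assert any("Password=s3cret" in s for s in found)
```

No reverse-engineering skill is required – which is exactly why a connection string in a widely distributed client is as good as public.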
Somewhat related to the above point is the fact that, again due to the very nature of the Client/Server architecture, the back-end database needed to be opened up to be accessible across the network by all clients; if users were expected to connect to the database from the public internet, this meant opening the firewall so that the database could be accessed world-wide – by anyone. And while all modern database systems do provide sophisticated authentication mechanisms, opening a database to the network (let alone the world) is always a risky endeavor. Even leaving aside that hackers could easily obtain the login information stored in the client application, all commonly used database back-ends bring with them well-known attack surfaces that hackers can exploit to compromise the system. Over the course of the years, virtually every common database system has had exploits published that, if not patched quickly, would have allowed any hacker full access to the database.
Another drawback of Client/Server applications when used on the Internet is that the network interface provided by most back-end database systems was designed for access over the local network, using fast connections and no firewalls. Nowadays, much client software needs to run outside the controlled network – be it from employees' home offices or from laptops on unsecured connections on the internet, in airport lounges or internet cafes. Even ignoring the security risks of opening up a database server to these scenarios, outlined above, in many cases such connections will be unreliable or too inefficient for Client/Server database protocols to work well over them.
Lastly, while most database access protocols are well documented (which, mind you, is not the same as them being documented well), some aren't, and all protocols have in common that they are pretty complicated, prone to changes between versions and in general not very easy to implement yourself. As such, application developers virtually never implement access to a database system themselves (by means of opening a TCP socket and figuring out what bytes to send to make the database understand their requests). Rather, application developers rely on so-called database drivers, usually provided either by the database manufacturer or the development tool provider (or a mixture of both), to perform this task for them. Database drivers encapsulate the underlying wire protocol, often also abstracting differences between versions of the back-end database, and provide a higher-level API for developers to use to talk to the database. Oftentimes, development tools would provide even higher-level abstractions of those drivers (such as, for example, the ADO.NET classes in Microsoft's .NET Framework).
These database drivers needed to be installed and configured on the client machines, before applications could use them to access the database (for example, client applications talking to an Oracle database would need the Oracle Client Interface (OCI) libraries installed on the computer).
This led to two challenges: for one, installing and configuring these drivers added complexity to the client installation process, and created possible conflicts if two applications needed different versions or different configurations of the client tools (to this day, it is tricky to install both Interbase and Firebird client libraries on the same system without the two interfering with each other, for example). But what's more important: database drivers for your database of choice had to exist for your platform of choice, before you could even begin to worry about deploying them. For the longest time, it was next to impossible to talk to a Microsoft SQL Server from Mac OS X or Linux, for example, and virtually no database vendor ships native iOS drivers.
Client/Server vs. iOS
The above is a pretty comprehensive (but far from all-encompassing) list of why the Client/Server model was a really bad idea for desktop applications. But let’s look at the list again with an eye on creating applications for iOS, and we will see that what can be considered a “pretty bad idea” for desktop apps can be a deal-breaker on a mobile platform:
- Database drivers. Right there we hit a show-stopper issue. Even if we wanted to directly access our database from the iPhone or iPad – for most database systems, there are no iOS drivers, meaning we'd have to implement the wire protocol ourselves, or try to port an existing driver (assuming it's open source) to iOS (which might be doable, especially if there is a Mac OS X version, but it would be time-consuming).
- Storing the connection string inside your app. Think about this for a second. You are submitting your app to the App Store, where everybody (or everybody willing to part with the 99c you charge for your app) can download it, and reverse-engineer it. Do you really want this app to contain the key to the crown jewels, the login to your MySQL database? Remember that this would be one connection string for everybody, so whoever hacked this could access the data of all your users. What happens if that connection string does get out and you need to change it? All installed apps out in the wild would stop working, until you send out an update.
- Business Logic on the client. Once again: with one connection to your database for every install, can you rely on the client application to enforce that user Joe only sees his data, and Jane only sees hers? What if there's a bug in your client, and suddenly users see other people's data – and you can't even fix it without sending out new clients?
- Exposing your database server. Unless you're developing enterprise applications for a niche market where you can require your users to be on a VPN, your iOS app will need to talk to your server via the "public" internet, meaning whatever it talks to needs to be exposed to the world – and as we've seen above, you don't really want to do that with your Oracle or MySQL or Microsoft SQL Server.
- Connection stability. Users will run your iOS apps connected over WiFi (at best) or 3G/CDMA (at worst). They might be trying to use your application in front of Moscone West while WWDC is going on, and AT&T's 3G network is on the verge of breaking down. You cannot let your application rely on a verbose network protocol designed in the '80s for always-connected Ethernet – your application needs to handle flaky connections that are slow, get interrupted, or fail altogether, and cope with these failures reasonably.
- Connection security. With iPhones and iPads, users will be connected to unsecured networks all the time, whether it’s via 3G or the free WiFi at Starbucks that they are sharing with 20 strangers. Regardless of the sensitivity of data in your application, you will want to be communicating through secure channels such as HTTPS or otherwise encrypted protocols.
As you see, right off the bat, there are a lot of problems with the Client/Server architecture that make it next to impossible to use successfully in iOS applications. So what can we do?
Multi-Tier to the Rescue!
The good news is that the solution to all of the problems outlined above lies in going multi-tier. How does a multi-tier architecture solve these problems? Simple: by partitioning data access into more tiers than the traditional Client/Server model (which is also sometimes referred to as two-tier), with each tier performing the tasks for which it is best suited and can be trusted.
Essentially, a multi-tier architecture inserts a third tier between your client application (on your iOS device) and the database (running in your, or someone's, data center). This third tier, also called the "middle tier", will usually be located physically close to the database server – typically in the same datacenter – and expose just the services your clients need to the outside world. You can also think of this as splitting the client tier of a Client/Server application into two parts, and moving the part that gave us all the worries (business rules, tricky connection to the database) from a location where it is largely out of our control – on the user's device – to where we have tight control over it, instead.
(In theory, the multi-tier architecture, as indicated by the name, can consist of a variable number of tiers, but the most commonly used scenario is a three-tier solution.)
This change takes care of virtually all the problems we have seen above: all the constraints of a Client/Server database connection now only apply to the connection between the middle tier and the back-end database, as in essence the middle tier is taking the place of the "Client" in the Client/Server model, as far as the data access is concerned. But because the middle and database tiers are located so close to each other, none of the concerns about the database communication layer apply at this level:
- Middle tier and database will be in the same data center (possibly even on the same server or servers), so speed and security of the network protocol is no longer an issue, as the network connection between the two will be as close to perfect as things will get.
- The database connection string is known to the middle-tier, but since that is running in your data center, there is no concern of regular users reverse-engineering the application to obtain it since users don’t have access to your middle-tier server application.
- Since middle tier and database communicate locally, you can close up your firewall to the outside really tight, so no-one can directly access your database server. You don’t need to worry about known vulnerabilities in the database server software, or even about the connection string getting out, since your database server is unreachable from the outside world. (Your developers and DBAs might still need to access the database directly, of course, but that is a problem that can be solved by other means, for example by letting them connect via a VPN or using tools that allow remote management of the database without opening the database ports. For example, Relativity Server and Schema Modeler, introduced in chapters 10 and 11, allow developers to create the middle tier without direct database access.)
- Since the middle tier will run on one of a handful of standard server platforms (usually Linux, but maybe also Mac OS X Server or Windows Server), there should be no shortage of database drivers for the platform, whether for the common big database types or even more esoteric ones.
The communication between clients and middle-tier server is no longer tied to a protocol dictated by the database; instead, the way clients talk to the middle-tier can be fine-tuned for the demands that remote clients put on the system and the protocol can be designed around these requirements:
- Communication can be done via HTTP or HTTPS, using efficient binary protocols that transfer only the minimal amount of data and can cope with connections being slow or connections being lost altogether; alternatively or additionally, open standards such as SOAP, OData and JSON can be used to expose a middle tier to different clients using protocols that are widely understood.
- For the same reason, there's no need for database drivers on the client; in fact, the client no longer has any knowledge of what kind of database is run on the back-end at all. All the client needs to know is how to communicate with the middle tier – which is taken care of for you by the multi-tier framework, but also commonly is a fairly straightforward and easy-to-implement protocol.
- The client application no longer needs to contain a connection string or, indeed, any type of hardcoded login; instead, client applications can authenticate with a username and password entered by the user (something we’ll look at in more detail when we talk about business logic). At the most, the client contains a hardcoded URL to your service, such as https://api.example.com.
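The authentication scheme described above can be sketched as follows. This is a simplified, in-process model (the user store, functions and the api.example.com URL are illustrative, not any particular framework's API): the client ships only a service URL, and per-user credentials are exchanged for a short-lived session token on the server.

```python
import hashlib, hmac, os, secrets

# The only thing hard-coded into the shipped client: the service URL.
SERVICE_URL = "https://api.example.com"

# Server side: store salted password hashes, never plain passwords.
def _hash(password, salt):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

_salt = os.urandom(16)
USERS = {"joe": _hash("joes-password", _salt)}   # hypothetical user store
SESSIONS = {}                                    # token -> username

def login(username, password):
    """Exchange user credentials for a session token; None if they're wrong."""
    stored = USERS.get(username)
    if stored is None or not hmac.compare_digest(stored, _hash(password, _salt)):
        return None
    token = secrets.token_hex(16)
    SESSIONS[token] = username
    return token

assert login("joe", "wrong-password") is None
token = login("joe", "joes-password")
assert SESSIONS[token] == "joe"
```

The key point: no credential in the client grants access to anything beyond what that one user is allowed to do, so there is no single "key to the crown jewels" to steal.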
All of these may already seem like major architectural improvements, and they are, of course. But the single biggest advantage lies in the fact that all the business logic is transferred from the client application into the middle tier.
What does that mean? It means that rather than relying on the client application to "know what it is doing", to only touch data it is allowed to touch, and to make sure any data changes are consistent, this is now handled on the server.
Remember how in the Client/Server model, the client connected directly to the database using some hardcoded login? This meant that the client could see all the data in the database, and it would be up to the client to not show data the user was not allowed to see. It also meant the client was technically allowed to write any data it pleased (maybe within some constraints) into the database, and once again it was up to the client application to make sure data entered by the user was valid and stored in the database correctly and that the database was not put into an inconsistent state.
With the multi-tier approach, all of this is still true – for the middle tier. The middle tier still has full access to the database, and it is the middle tier's task to make sure not to give out data a client is not allowed to see, and to make sure data from the client is consistent before it gets into the database. But this is not a problem, as the middle tier server is under much tighter control – it runs in your own data center, and no one unauthorized can easily mess with or replace it to circumvent its rules.
The client application authenticates with the end user's login, and the middle tier will only ever send it data that it is allowed to see. No matter what a client application does, how badly it is compromised, it cannot get past the middle-tier server. Even if a hacker were to completely reverse-engineer your app and create his own version for malicious purposes, there's no way he could do any harm – because the middle tier will only let him do what he's allowed to do in the first place.
The middle tier holds the final control over what data goes in or out. A client application might ask for data it is not allowed to see, but the server will either reject the request or automatically down-filter the data it sends out, based on its knowledge of who the client is. (For example, a client application might ask "give me all messages", by sending a "SELECT * FROM Messages", but the middle tier, knowing which user the client authenticated as, would automatically translate this to only return the messages the user is allowed to see, effectively converting the request to "SELECT * FROM Messages WHERE [can be seen by Joe]"; in another case the client might ask for private data, such as admin tables or statistics, and the server would simply refuse the request, if the user has no admin privileges.) Similarly, a client might ask the server to update the database in ways that are not allowed, or might leave the data in an inconsistent state, but the server would catch and reject such a change, or possibly fix the change. (For example, a rule might say that updating rows in a certain table requires an UpdatedDate to be adjusted. Rather than rejecting a change where the client neglected to update this field, the server could simply update the date itself, while processing the rest of the change as requested.)
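Both behaviors just described – down-filtering reads and repairing or rejecting writes – can be sketched in a few lines. The Messages data, field names and rules below are hypothetical, standing in for the SQL examples above:

```python
from datetime import datetime, timezone

# Stand-in for the Messages table from the SQL example above.
MESSAGES = [
    {"id": 1, "owner": "joe",  "text": "hi"},
    {"id": 2, "owner": "jane", "text": "secret"},
]

def get_messages(authenticated_user):
    # "SELECT * FROM Messages" becomes
    # "SELECT * FROM Messages WHERE [can be seen by user]".
    return [m for m in MESSAGES if m["owner"] == authenticated_user]

def update_message(authenticated_user, change):
    msg = next((m for m in MESSAGES if m["id"] == change["id"]), None)
    if msg is None or msg["owner"] != authenticated_user:
        raise PermissionError("not allowed")      # reject disallowed writes
    # Rule: every update must adjust UpdatedDate. Rather than rejecting a
    # change that forgot it, the server fixes it up itself.
    change.setdefault("UpdatedDate",
                      datetime.now(timezone.utc).isoformat())
    msg.update(change)
    return msg

assert [m["id"] for m in get_messages("joe")] == [1]
updated = update_message("joe", {"id": 1, "text": "hello"})
assert "UpdatedDate" in updated
```

However a client phrases its request, it only ever reaches this choke point – which is the "funnel" role of the middle tier described below.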
Essentially, with the Client/Server model, there was a disconnect between what the client application could do and what the user could do. Application and user were separate entities with different levels of trust – even though the client application was running on the user's device and, as such, should not really be trusted any more than the user. With the multi-tier model, this changes: the client application is treated as identical with the user; the application only sees what the user is allowed to see, and data that is not intended for the user never even makes it out onto the wire.
To give a concrete example, imagine you are writing the next Facebook, and your database will eventually contain the private and personal data of thousands, if not millions of users. You would not want to leave it up to the application on Joe's phone to decide whether to download and show Jane's private information; you would want that kind of decision to happen on your server, where it is under your control. If Joe were not allowed to see a certain piece of information about Jane, then Joe's client application – acting on behalf of Joe – should not have the right to access this information, either.
Basically, the middle tier acts as a filter, or a funnel, with access to the full database on the back end and tight control over who can access or modify what on the front.
Business Rules on the Client Tier
Of course there is still a case to be made for having some business logic on the client tier as well, but this business logic should complement the rules that are enforced on the server, mainly for user convenience. In other words, the client should try and enforce rules where it can – not letting the user enter bad data, or warning him early when he does – but only so that the user gets faster feedback about problems, or can avoid them altogether, without waiting for the server to reject them. The server should never rely on the client to have enforced these rules.
For example, think of Twitter. A good client will enforce the 140 character limit locally, maybe show a counter, and stop you from sending a tweet that is too long, for example by graying out the Send button. That's client-side rule enforcement. But even if every client did that, it would still be expected of the server to double-check the rules, and reject a tweet that is too long, if it somehow slipped through the cracks. As a rule of thumb, client-side checks are for convenience, and for convenience only; the middle-tier server is and must be authoritative for what is allowed and what is not.
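The division of responsibility in the Twitter example can be sketched like this (the function names are illustrative; the point is the duplication of the rule, with only the server-side copy being authoritative):

```python
TWEET_LIMIT = 140  # the character limit discussed above

def client_can_send(text):
    """Client-side convenience check: used to gray out the Send button early."""
    return len(text) <= TWEET_LIMIT

def server_accept(text):
    """Authoritative server-side check: enforced even if a client misbehaves."""
    if len(text) > TWEET_LIMIT:
        raise ValueError("tweet too long")
    return text

ok = "just setting up my twttr"
too_long = "x" * 141
assert client_can_send(ok) and not client_can_send(too_long)

# A buggy or hostile client might skip its own check entirely;
# the server still rejects the bad request:
try:
    server_accept(too_long)
    assert False, "server must not accept an over-long tweet"
except ValueError:
    pass
```

The client check saves the user a round-trip; the server check is what actually keeps the rule true.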
In this chapter, we took a look back at how database applications started out, last century, using the Client/Server model. While this architecture had its time and place, we saw what drawbacks it has in the current world of computing, where the internet and mobile devices have taken the user out of the local company network and into home offices and onto the street. We’ve learned about multi-tier architecture, which was designed to solve these problems, and touched briefly on some of the benefits a middle-tier server brings, especially for mobile clients.
In the remaining chapters, we will delve deeper into the various concepts behind multi-tier architecture. We will take a close look at business logic and all it entails in Chapter 18, and then continue on to look at database abstraction, and how multi-tier benefits the development of diverse clients to complement your iOS application. Finally, we will look at Data Abstract, a commercial multi-tier framework for iOS (and many other platforms) developed by our employer, RemObjects Software, and its approach to multi-tier development. You have already seen Data Abstract in practical use throughout Part Two.