Dark Clouds: Demanding an End to Outages of Online Services

In January, Salesforce.com’s “on-demand” services for managing customer relationships–popular with Merrill Lynch, Morgan Stanley and other Wall Street firms–went down for about an hour.

In the middle of May, Google went down for two hours, leaving users unable to access not only the search engine but also email, documents and other Google services popular with small business users that were hidden in its widely dispersed computing “cloud.”

Then, in early June, customers of Amazon’s Elastic Compute Cloud (EC2) cloud computing services were offline for more than four hours after lightning struck one of the company’s data centers.

The prospect of such problems used to make Bob Barry wary of cloud-based computing. For the president of Barry Capital Management Inc., a small wealth management firm based in Hackettstown, New Jersey, reliability of computing services is paramount.

Barry used Goldmine customer relationship software on desktop computers for 15 years. When he originally learned about Salesforce.com, he heard–and worried–that the system had a lot of downtime.

He finally made the switch to Salesforce.com a year ago, after getting frustrated with his previous system, a desktop software package.

“Allegedly you could do all these wonderful customizations–over 15 years we did a fair amount of them–but they were 100 times more difficult than they needed to be, and didn’t work near as well when they finished,” he said.

He wanted to move quickly, as he fought the economic crunch–and not hire a programmer to make changes that suit his business. With Salesforce, he doesn’t need to hire outside staff.

“I made four customizations this morning between 9:10 and 9:30, before I went to a customer meeting,” he said.

Plus, Barry says he is very happy with the reliability of the Salesforce service. He hasn’t personally experienced any unscheduled downtime when he tries to tap into “the cloud.”

“I’m generally on the system all day,” he said. “And at night until 12 o’clock or 1 in the morning, seven days a week.”

Not that Salesforce.com never goes offline. There is scheduled downtime for maintenance in the middle of the night, typically around 2:30 a.m.

For his capital management company, which handles about $25 million in assets, that’s okay. “They’ve managed to tuck the maintenance into a spot where it hasn’t affected us at all,” he said.

But for Wall Street firms looking to move mission-critical applications into the cloud, any downtime at all is bad news–whether scheduled or not.

Cloud computing providers are moving to increase reliability and reduce downtime, said Larry Scott, president of global industries for Hackensack, NJ-based IT consulting firm Ness Technologies.

“The reliability question is being addressed through a strategy–I don’t think entirely quite reality yet–of dynamic, redundant, integrated clouds,” he said.

In effect, a cloud is a remote collection of servers, often called a server farm, or a network of such farms. If one farm goes down, the others can step in and take over the load, without customers noticing the switch.
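The failover idea described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Farm` class, the `route` function and the farm names are hypothetical and do not correspond to any vendor's product or API.

```python
from dataclasses import dataclass


@dataclass
class Farm:
    """A hypothetical server farm with a simple health flag."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        # A farm that is down refuses the request.
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} served {request}"


def route(request: str, farms: list[Farm]) -> str:
    """Try each farm in order and return the first successful response."""
    for farm in farms:
        try:
            return farm.handle(request)
        except ConnectionError:
            continue  # this farm is down, so fall through to the next one
    raise RuntimeError("all farms are down")


farms = [Farm("farm-a"), Farm("farm-b")]
farms[0].healthy = False            # simulate an outage at the first farm
print(route("GET /report", farms))  # the second farm answers instead
```

The caller of `route` never learns which farm answered, which is the sense in which customers would not notice the switch. Real systems would also need health checks and data replication, which this sketch leaves out.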

Applications can be moved easily to other clouds–other networks of server farms aka data centers–with management coming in the form of data center automation tools that manage the interactions of these clouds.

Several vendors help firms run their applications from a different cloud if the first cloud goes down. For example, TIBCO Software Inc. has recently released TIBCO Silver, a software “platform” that supports applications residing “in the cloud.” The code allows companies to balance the demands of their cloud-based applications across different servers in the same cloud, said Rourke McNamara, head of product marketing.

This can help companies avoid potential downtime, he explained. For example, the recent Amazon outage only affected a couple of servers–out of all of the machines running as part of the cloud.

“An application distributed among different zones or different machines would not go down at all,” he said.

In addition, TIBCO is working on bringing in additional cloud providers, so that if one cloud vendor goes down completely, applications can be seamlessly moved to other clouds.

“That’s our goal,” he said. “As soon as we find other providers who have the same level of quality, of reliability, we plan to allow our customers to move applications from one cloud provider to another. If there’s a serious long-term outage on all of Amazon, our customers would be able to pull up their systems on another provider.”

TIBCO also offers automated balancing of computing loads, so that applications are distributed among more or fewer servers as needed. Today, those servers are all within a single cloud. “But it would be trivial to change that to use multiple cloud providers,” he said.
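As a rough illustration of what spreading an application across "more or fewer servers as needed" might involve, the sketch below computes a server count from the current load. It is not TIBCO's method; the function name, the per-server capacity figure and the minimum server count are all assumptions made up for the example.

```python
import math


def servers_needed(requests_per_sec: float,
                   capacity_per_server: float,
                   min_servers: int = 1) -> int:
    """Return how many servers cover the load, never fewer than min_servers."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    return max(min_servers, math.ceil(requests_per_sec / capacity_per_server))


# Hypothetical numbers: each server is assumed to handle 100 requests/sec.
print(servers_needed(450, 100))                 # busy period: scale up
print(servers_needed(20, 100, min_servers=2))   # quiet period: keep a floor
```

Keeping a floor of two servers even when load is light is one simple way to avoid a single point of failure.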

And at TIBCO itself, the firm schedules maintenance on individual systems, so there’s never any system-wide downtime, McNamara said.

McNamara dismissed concerns that a big event–such as the kind of “Black Friday” surge in usage that disrupted Amazon’s site and those of Walmart and Disney on the day after Thanksgiving 2006–might put significant pressure on cloud providers and hurt delivery of services to the brokerage firms that use them on that day. For example, retailers may be heavy users of cloud systems on that day, but not other sectors, he said. “And the markets are closed.”

In general, it makes sense not to tie a firm to any particular cloud vendor–or any utility, for that matter, he said. “It’s important for other cloud providers to step forward and provide services. Telecom providers, for example, are getting into the game–and they’re used to providing 24/7 service,” McNamara said.

For example, AT&T first launched its cloud service–AT&T Synaptic Hosting–last summer, and recently expanded the product with AT&T Synaptic Storage as a Service. AT&T was named a cloud services leader by technology research firm Gartner in a report issued earlier this month.

And Verizon rolled out its Computing as a Service cloud offering in June of this year, focusing specifically on large and mid-sized enterprise customers. Like Amazon and AT&T, Verizon can host both applications and storage in its cloud.

Meanwhile, providers of computing capacity in the cloud also are working to improve reliability.

Currently, Amazon provides a service level agreement for its EC2 cloud server that promises 99.95% availability for each geographical region–or about five minutes of downtime a week. There are currently two Amazon EC2 regions–one for the U.S., and one for Europe. Each region consists of multiple server locations, with each location engineered to be isolated from failure in other locations, the company said.
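The availability percentage translates into allowed downtime with simple arithmetic: the unavailable fraction multiplied by the length of the period. A minimal sketch follows; the function name is an assumption for illustration, and the calculation treats the 99.95% figure as applying evenly over whatever period is chosen.

```python
MINUTES_PER_WEEK = 7 * 24 * 60    # 10,080 minutes
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float) -> float:
    """Minutes of downtime permitted by an availability percentage."""
    return (1 - availability_pct / 100) * period_minutes


weekly = allowed_downtime_minutes(99.95, MINUTES_PER_WEEK)
yearly_hours = allowed_downtime_minutes(99.95, MINUTES_PER_YEAR) / 60

print(f"{weekly:.2f} minutes per week")      # 5.04 minutes per week
print(f"{yearly_hours:.2f} hours per year")  # 4.38 hours per year
```

On these numbers, the four-hour June outage mentioned earlier would by itself use up most of a year's allowance.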

Another company that promises to help firms move applications between different cloud providers is New York-based GigaSpaces, with connections to both Amazon’s EC2 and VMware’s vCloud.

It’s possible to move applications between EC2 and vCloud, said GigaSpaces chief technology officer Nati Shalom, but he warned that there is no such thing as “zero lock-in,” or complete vendor independence. Some work may be required with each move, depending on how the applications are written, and whether they require access to proprietary data services such as Amazon’s S3 data storage service.

“But we do as much work as possible to abstract the vendor layer,” he said. “People are able to move existing applications over in a matter of days.”

Amazon also provides software tools to companies that can help them tie into Amazon’s databases and storage facilities, said Ivan Casanova, chief marketing officer at DataSynapse, which built its business on helping companies split computing tasks among grids of computers in a server farm and now helps companies manage computing and storage in clouds.

“Cloud vendors are pushing their customers to use very specific tools on these cloud platforms which, yes, will lock them in” to using data storage, bandwidth and computing capacity only from that vendor, he said.

The more a firm avoids these tools, the less dependent it will be on that particular cloud if there’s a failure.

DataSynapse currently works with about 100 customers globally, the majority in financial services. Its software enables firms to run their own in-house clouds or to connect to outside cloud providers, including Amazon, VMware, and IBM. The goal is to allow companies to move the applications that currently sit in their in-house data centers into a cloud at the touch of a button. “It is an oversimplification,” he said, but it is the direction the automation is taking.

Another way to get high reliability from a cloud is to go with a vendor closely connected to your core business. For example, a brokerage typically would not put its trading application into a cloud because of speed, reliability and security issues, said Tony Bishop, CEO of technology vendor Adaptivity and former chief architect at Wachovia.

“But you could put it into a cloud provider–if the cloud provider is someone like the New York Stock Exchange,” he said. Exchanges are starting to get into this business, he said. “They can guarantee security and best execution,” he said.

Stock exchanges already provide hosted services–everything from trading itself to analytics, order management systems, and other functionality. Some exchanges are going further–the New York Stock Exchange, for example, is in the process of building a data center which it will share with both market participants and competitors. True cloud computing–in which a firm can have its own software running on cloud servers owned or operated by an exchange–is just a short step away.

Adaptivity is currently working with 15 clients, Bishop said, including seven top-ten banks, in helping them use internal and external cloud environments.

Keeping Your Clouds Moving

1. DO DUE DILIGENCE. Check them out. Cloud providers come in different forms and reliability levels. Circumstances can change quickly.

2. FOCUS ON YOURSELF. Don’t assume that if a cloud provider is good enough for other firms, it’s good enough for you. Your needs and requirements differ.

3. DIVERSIFY. Many cloud services, including Amazon EC2, allow customers to run their applications on different, isolated parts of the cloud. If one area of the cloud goes down, the other stays up.

4. KEEP THE HOME FIRE BURNING. Some cloud technology firms integrate their capacity with customers’ internal clouds. If the external cloud goes down, the in-house cloud can take over.

5. DOUBLE UP. If one cloud is good, two clouds are better. Several vendors promise to help firms deploy the same application to multiple outside clouds.

6. LOCK OUT LOCK-IN. Many cloud providers offer tools and features that help clients get more use out of the cloud, such as quick integration between application clouds and the associated storage clouds. Each bell and whistle, however, makes it harder to switch cloud providers later on.

—

Read full article at Information Management. Article originally appeared in Securities Industry News, which has since closed down.