Friday, April 22, 2011

Amazon Glitch Hobbles Websites

Haven't blogged for a while. Thought this might be worth it...

Technical problems at an Amazon.com Inc. data center caused several websites and Internet services like Foursquare and Reddit to crash or have limited availability on Thursday.

Amazon, which rents Web servers and storage to companies, said it was experiencing "instance connectivity, latency and error rates" with a data center in northern Virginia that handles operations for the U.S. East Coast. The company said it was working to correct the problem.

I've long predicted that if cloud technologies can take off they will go a long way toward putting Microsoft out of business (a positive outcome from my point of view). Even if Microsoft could be one of the cloud players (and no doubt they will try) the profit margins are *much* lower than they are used to making selling fiat certificates to use their products which were largely finalized in the 90s.

Apple will have this problem too. A big deal is made of their iTunes business, but they make most of their profits selling gadgets that fit into a narrow range... servers gone, desktops declining, iPods declining. In other words the company (and a very successful one) is now built on laptops and cell phones.

But the big cloud players now, Google and Amazon to name two, are going to have to learn that not only are margins low, but risk is high.

Think about it... how many computers running Windows have lost or corrupted vital information? Countless. Yet Microsoft bears no responsibility; they make that clear in their TOS, which essentially says "If our product turns out to be a piece of sh*t, don't come crying to us!"

The remarkable thing is that they have gotten millions of people to agree to this, including large corporations and government agencies. Who has bucked the trend? Banks, the military, and intelligence agencies, all of whom know they can't risk certain sensitive data and tasks to a half-assed *toy* operating system (that's pretty much what mainframers used to call Windows, and I think the implications are accurate).

Oh, lest you think I am a fanboi, though: I am not happy that twenty years of lackadaisical attitudes about security have lulled even cloud proponents into a false sense of "who cares?"

I got an e-mail message from Google today telling me that some of the files (MP3s I guess) I've uploaded to Google Docs "might" be lost or to quote directly: "Your uploaded audio files should be fully restored at this time."

"Should be"? Well, are they or are they not? What am I supposed to do at this point, take an inventory? Listen to all 200 hours of them to make sure they are OK? Download them and do comparisons with my originals? Or just wait until some point in the future when I am totally dependent on them being all there and all correct only to find out that that is not the case?

I like the idea that my data in the Google cloud is "backed up" by being on multiple machines at any given time (is it two, or three, or nine? They never say). But what is missing from today's cloud architectures, and what is probably not possible in the same way we did it on mainframes (multi-generation backup tapes going back months and years), is an "oooops" prevention system. Our existing cloud systems do nicely in handling the situation when a particular "cheap" server goes down. We've all read about that and it is marvelous.

But what about when a Google or Amazon employee slips up and runs a program that deletes all my files, and that deletion propagates to all the copies of all the files everywhere? Is that situation covered? What about when a lot of e-mail went missing a while back? Is that really acceptable? Yes, it's probably better than the aggregate e-mail lost from millions of Windows computers, hundreds of thousands of which are probably in some state of brokenness at any given time. But remember, Microsoft has been saying "f*ck you" to users for years and getting away with it. Users likely won't be so tolerant of Google or Amazon or even Microsoft saying the same thing when it comes to a *new* service which is highly touted as better in every way.
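
To make the distinction concrete, here is a toy sketch of the two ideas. It is purely illustrative: the class names are made up, and I have no idea how Google or Amazon actually implement replication or backups. The point is only that copying data to several machines protects against a dead server, while a mistaken delete gets copied just as faithfully unless older generations are kept somewhere separate.

```python
import copy


class ReplicatedStore:
    """Toy model: every write and delete is applied to all replicas."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]

    def put(self, name, data):
        for replica in self.replicas:
            replica[name] = data

    def delete(self, name):
        # A mistaken delete propagates exactly like a legitimate one.
        for replica in self.replicas:
            replica.pop(name, None)

    def get(self, name):
        # Any surviving replica can answer a read.
        for replica in self.replicas:
            if name in replica:
                return replica[name]
        return None


class GenerationalBackup:
    """Toy model of multi-generation backups kept apart from the live store."""

    def __init__(self, max_generations=5):
        self.max_generations = max_generations
        self.generations = []  # oldest first

    def snapshot(self, store):
        # Take a point-in-time copy, then keep only the newest generations.
        self.generations.append(copy.deepcopy(store.replicas[0]))
        self.generations = self.generations[-self.max_generations:]

    def restore(self, name):
        # Walk back from the newest generation until the file turns up.
        for generation in reversed(self.generations):
            if name in generation:
                return generation[name]
        return None
```

Put a file in the store, take a snapshot, then delete the file: `get` finds nothing on any replica, while `restore` still returns the data from the earlier generation. That second behavior is the "oooops" prevention I am asking about.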

We've read in the past that the "Bigtable" storage system and other now generic components of cloud services can produce slightly imprecise results, but results which satisfy the needs of search engines quite well. But do all these base components suit the needs of other cloud services which need solid uptime, perfect repeatability, perfect security and multi-level redundancy, to name a few?

The Cr-48 incorporates a "disposable" aspect to the hardware under our fingertips which I think has been missing since the mainframe days (IBM engineers could replace every component of a floor-standing disk drive and recover every byte of data on the old drive in an hour or two, and a long repair might involve flying a part on a chartered airplane from the other side of the country). Users today, including, as I mentioned, many institutional users, are no longer used to that level of service. I hope the cloud, as it matures, and hopefully soon, spoils us once again.