Saturday, August 02, 2008

RackSpace - are they really the best web hosting?

About a month ago I switched from Virtual Private Hosting on webhost4life to dedicated hosting on RackSpace.
It was definitely an improvement, but I'm still dissatisfied.


Here are the painful parts of my experience with RackSpace:

1) RackSpace wanted me to sign a paper contract (WebHost4Life didn't require that). The paperwork took almost a day (several hours of my effort plus some wait time). The sales guy couldn't open several versions of the "Microsoft Office Image Writer" document that I emailed to him, so I had to resend it in a different format.

2) It is a little unpleasant to deal with RackSpace sales guys. They forget (or "forget") to answer some of my questions and use slightly pushy sales techniques. Is that typical of all sales reps, not only RackSpace's?

3) After the contract was signed, it took RackSpace almost 4 days to install the server. I signed the contract on Wednesday, July 3rd 2008, and was hoping to move my web site (www.postjobfree.com) to RackSpace on the night from Saturday to Sunday. But RackSpace set up my server only on Monday, which is not a convenient time for me and my users to do the move.

4) RackSpace promised me that they would help with the migration. They gave some tips, but not all of them were good. For example, they suggested that I shut down my web site for a few hours while I copied my database. That is not a good approach for a 24/7 service. So I was mostly left on my own with the migration.
Fortunately, I used Omar Al Zabir's advice about smooth web hosting migration.
Ironically -- it was Omar's recommendation to use RackSpace for web hosting that made me pick them.

5) The average response time to my ticket requests is a few hours (2-3, maybe?). Sometimes the response time is shorter; sometimes it's longer (up to a day or even more in some cases). It's an improvement over WebHost4Life, but is that really the best in the hosting industry?

6) Most of the time responses are good, rarely great, and in some cases responses are incompetent :-(
For example, I asked RackSpace whether it would be a good idea to run the SMTP server on a separate IP address (same physical machine). A RackSpace guy replied that it would be a good idea, so we made the switch. But it turned out that RackSpace cannot monitor ports on secondary IP addresses (only on the primary IP address). So we updated the SMTP server settings again to listen on both IP addresses. That caused the SMTP server to send emails from the primary IP address, but at that point the smtp.postjobfree.com DNS record was pointing to the second IP address, and this mismatch caused painful issues with spam filters on some servers. In the end we went back to using the primary IP for SMTP, but we went through some pain because of incompetent advice.
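A minimal sketch of a check that would flag this kind of mismatch, assuming the only thing being compared is the IP address the mail server sends from against the addresses its DNS name resolves to. The function names and the sample addresses are hypothetical, and real spam filters may apply other checks as well.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses that hostname currently resolves to."""
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def sender_matches_dns(sending_ip, resolved_ips):
    """True if the IP the mail server sends from is one the name resolves to."""
    return sending_ip in set(resolved_ips)


# Example (performs a real DNS lookup, so it is left commented out):
# print(sender_matches_dns("203.0.113.10", resolve_ipv4("smtp.postjobfree.com")))
```

Running a check like this after every change to SMTP bindings or DNS records would show right away whether the sending address and the DNS record still agree.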

7) RackSpace doesn't seem to be very good at analyzing past problems. They simply ignored my request to dig into this incompetent SMTP advice. Nothing like "sorry, we screwed up and will do this and that to prevent it in the future". Nothing like "sorry, it was a misunderstanding and you (me) should do this and that to avoid problems in the future". I would understand if a cheap hosting provider skipped such "past failure analysis". But if a hosting provider claims "fanatical support", I expect it to do a little better.
Well, maybe it happened because I still haven't had a chance to talk with my account manager. RackSpace doesn't have one for me yet (after I have been with RackSpace for a month).

8) RackSpace's ticketing system creates unneeded noise. For example, after I create a ticket on my.rackspace.com, it adds a meaningless auto-response to the ticket and sends me a notification email. If I update a ticket, I get a notification email again. Why would I need notifications about ticket updates I made myself? That's distracting.

9) Maintenance downtime. :-(
I already mentioned that RackSpace considered a few hours of web site downtime during migration an "ok" practice. That attitude extends to other maintenance too. Today they installed a hardware firewall on my server. They brought my server down for almost an hour (!). Could anybody explain to me why installing a hardware firewall should bring the server down for almost an hour? It should be less than a minute of downtime, or preferably zero downtime. That was disappointing. I managed to migrate my web server from a different hosting provider with no downtime (I copied a 4 GB database between WebHost4Life and RackSpace and still managed to avoid downtime), and now a trivial hardware firewall installation caused almost an hour of downtime.

10) The hardware firewall installation caused another issue as well -- my web site was not able to send out emails after the firewall was installed. RackSpace didn't notice, because their SMTP port monitoring didn't catch the issue. We noticed it a few hours later and reported it in a ticket. We got no reply for a couple of hours, so I had to call RackSpace and remind them that the issue was still there. They were not very good at pinpointing the issue: they kept re-testing the SMTP server, and it worked for them. So we (at PostJobFree) had to find the problem ourselves. The problem was that smtp.postjobfree.com could be pinged from every computer except my server itself, because the hardware firewall didn't resolve the request from my server back to the server itself.
RackSpace techies couldn't grasp that for a while, even after I pointed them in that direction. Eventually they recommended that I update my application to use my new 192.168.x.x IP address for sending emails. Imagine that: update and redeploy my web app to accommodate hardware changes. And do it in a hurry after(!) the hardware firewall is installed.
I suggested a better solution: simply add one record to C:\WINDOWS\system32\drivers\etc\hosts:
127.0.0.1 smtp.postjobfree.com
RackSpace techies still cannot grasp that solution or comment on it. Fortunately, my solution works so far.
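A minimal sketch of how the hosts record above takes effect, assuming the usual hosts-file layout (an IP address followed by one or more names, with `#` starting a comment). The helper names are hypothetical; the sketch only parses text and does not touch the real file.

```python
def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address (first entry wins)."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table


def hosts_override(text, hostname):
    """Return the IP the hosts text assigns to hostname, or None if there is no entry."""
    return parse_hosts(text).get(hostname.lower())


sample = """
# local overrides
127.0.0.1 localhost
127.0.0.1 smtp.postjobfree.com
"""
print(hosts_override(sample, "smtp.postjobfree.com"))
```

With that record in place, the web application keeps using the name smtp.postjobfree.com, but on the server itself the name points at the loopback address, so the mail never has to pass through the firewall.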

11) There were a few other minor issues, but I think my saga is getting too long already.

On the bright side:

1) When my server works (and it usually works), it works really fast. I'm happy with the speed so far, though I'm not sure whether I should attribute it to RackSpace or to having a dedicated server. With WebHost4Life the SQL server was shared with 4 other clients, and in the end I was getting 50+ timeouts/day.

2) Some RackSpace folks taught me some useful stuff, for example about DNS [re-]configuration.

3) RackSpace is expensive, but it's not THAT expensive. I got my server + SQL Server license + hardware firewall for a little over $600/mo (with a 1-year contract).


So, what do you think: is it typical to have issues like these with any hosting provider, or are there better hosting providers out there?

Any other comments?


Update (2008 August 03):
Problems with RackSpace are getting worse.

A RackSpace technician configured my SMTP server as an open relay
:-(

This is how it was worded in the ticket:
"We did find a setting in your SMTP that was set incorrectly, and we corrected that."

In fact, the SMTP server was configured properly and the problem was with the DNS configuration. Instead of fixing DNS (after installing the hardware firewall), the RackSpace guys made the problem much worse by turning my SMTP server into a spam machine and giving all spammers ready access through the newly installed hardware firewall.
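A rough sketch of how an open relay can be detected, assuming the test is run from a machine outside the server's network and only against a server you own. It uses Python's standard smtplib; the function names and the example.com / example.org addresses are placeholders, not anything RackSpace or I actually ran. The idea is to start a mail transaction between two outside addresses and see whether the server accepts the recipient.

```python
import smtplib


def accepts_relay(smtp, outside_sender, outside_recipient):
    """Return True if the server accepts a recipient it should not relay for.

    smtp is a connected smtplib.SMTP (or compatible) object. The transaction
    is reset before any message data is sent, so no mail is delivered.
    """
    smtp.ehlo_or_helo_if_needed()
    code, _ = smtp.mail(outside_sender)
    if code != 250:
        smtp.rset()
        return False
    code, _ = smtp.rcpt(outside_recipient)
    smtp.rset()  # abandon the transaction
    return code in (250, 251)  # 250/251 mean the recipient was accepted


def check_host(host, port=25):
    """Connect to host and run the relay probe (needs network access)."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        return accepts_relay(smtp, "probe@example.com", "probe@example.org")
```

An accepted recipient is a strong hint rather than proof, since some servers accept the recipient and reject the message later, but a properly closed server normally refuses at this step.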

Crazy stuff.

4 comments:

Dennis Gorelik said...

RackSpace problems discussion on WebHostingTalk

Dennis Gorelik said...

Is 1 hour of network reconfiguration downtime unavoidable with every hosting provider?


Today I got an explanation from RackSpace of why they needed to bring my server down for an hour when they installed the hardware firewall (see below).


Could you comment on whether it's really necessary to have 1 hour of downtime in that case?

Here's how I see appropriate maintenance:
- Install the firewall and plug in its power and external network cable without touching the production server (no downtime yet).
- Connect a test machine behind the firewall and test whether the firewall works properly (no downtime yet).
- Switch the network cable so that it connects the production server to the hardware firewall (20 seconds of downtime).

Why couldn't the maintenance be done that way?



Anyway, here's the response from RackSpace:
=======
I have answered questions 1-3 below and will be passing this ticket over to our network technicians in order to get questions 4-6 answered for you.

1) Why installing hardware firewall brought my server down by almost an hour? When working this type of maintenance to add firewall:

1. Mark ticket In Progress
2. Grab parts for maintenance
3. Put Public comment in ticket we are starting maintenance.
4. Log into server(at console or remotely).
5. Verify if other users are logged in.
5a. If users are logged in, we send them a message stating server is shutting down in X minutes.
5b. If no users are logged in, go to next step.
6. Shut server down.
7. Open the server.
8. Remove back plate from an open PCI slot.
9. Install PIX 501 card into PCI slot.
10. Screw PIX 501 power supply card in place.
11. Find open Powersupply connection and connect it to back of PIX 501 power card, since this will provide power for firewall.
12. Put side panel back on whitebox.
13. Install the firewall below the rack. We have to mount it to the rack the whitebox sits on. These are racked underneath the rack for each whitebox server. We use zip ties to hold it to the rack in place.
14. Put server back on rack.
15. Plug in power and network connections.
16. Connect console to server and verify server boots up fine.
17. Log in at console and verify it can ping NAT Gateway IP(192.168.100.1)
18. Verify if server can ping out to google or some other site

19. If we are unable to ping out or ping gateway IP, we will have to double check network connections and work with NetSec to resolve issues. This could be port speed issue since the PIX 501 firewalls require the port speed to be at 10Mb.

20. If server is pinging out fine, DCOPS will come back into DCOPS room and verify we can get to server remotely.
21. If server is not remotely accessible, we will have to go back to console and see if they are running any firewall software that is preventing access or if port RDP is using is changed.
21. Change status of server to online complete.
22. Send Public comment stating server is back online.
23. Close ticket.
24. Route Contract Received ticket for firewall over to Network Security to have them online firewall.

Downtime was necessary to install PIX 501 power card since the firewall gets power through this card.

2) How 1 hour downtime goes together with "Zero downtime" RackSpace slogan? Zero downtime means that your network will be up 100% of the time. However when upgrading your configuration (adding a firewall) there needs to be a certain amount of downtime in order to add this firewall to your configuration. Whenever a hardware upgrade is made there will be hardware downtime involved. The amount of time will vary depending on the hardware upgrade.

3) Why the length of downtime was communicated to me only at the beginning of downtime, and not some time prior to that? As we spoke about over the phone today I apologized for XXX not conveying the amount of downtime you will have during this maintenance. I have already spoken with him about this and moving forward if there are any maintenances that need to take place on your account your new Account Manager YYY and XXX will make sure and go into exact detail about the amount of downtime you should expect. Consider this mis-communication taken care of from now on.
=======


Is it really necessary to shut down a production server just to plug in the hardware firewall's power?

Dennis Gorelik said...

So it seems that $165/mo for a hardware firewall is a "cost savings", and that's why RackSpace needs to save money by bringing my server down.
And of course "Maintenance downtime" is not considered downtime at all. :-(

See full RackSpace explanation below:

==========
At Rackspace, we have a highly skilled and technical team of Electronic and Systems Engineers. These Engineers have designed many custom tools for use in our datacenters over the years. This gives us the means to design and create more efficient ways to pack more servers into limited power constraints. What does this mean to you, the customer? More savings in the end!

========================================================
Dennis: 1) Why installing hardware firewall brought my server down by almost an hour?

RackSpaceGuy: >> I can not speak to the hour downtime, but I do have the answer as to why the server must be brought offline. In this particular low cost setup, the tower servers are packed onto a bread-rack shelf system. Each server is connected to a single power source and the whole rack is finely tuned to max capacity. When installing a firewall to these servers, only one server is connected behind the firewall.

[edit]: Just received a phone call from the Tech whom did the maintenance. Apparently, there was a problem with the initial firewall being used, and he had to take it back down to reconfigure it. There is the hour.

Though the firewall itself has the ability to connect to multiple servers, these configs are not priced or speced out to have multiple servers behind one firewall.

As stated, in an above comment, we take the server offline to install a custom Rackspace designed power supply adapter into the server. This power supply adapter replaced the Cisco provided power brick, feeds off the servers internal PSU, and relieves the need for extra rack power. Extra rack power? The server's internal PSU is not fully being utilized by the internal components. This allows us to feed off the internal PSU to power the firewall. This method is only used in the low cost tower servers and with the Cisco PIX 501. We do not use this method for the larger rack mounted configs and other firewalls.
========================================================
Dennis: 2) How 1 hour downtime goes together with "Zero downtime" RackSpace slogan?

RackSpaceGuy: >> Your SLA of "Zero Downtime" is related to network traffic uptime, and the Hardware replacement SLA for failures is one hour, not during a scheduled maintenance. I'll let your Account Manager field this question further...
========================================================
Dennis: Cisco PIX 501 is not PCI card it's hardware device with own chassis.
Look at http://www.cisco.com/en/US/docs/security/pix/pix62/quick/guide/501quick.html

RackSpaceGuy: >> As stated above, we have designed our own replacement to the Cisco power brick, for power, space, and customer cost saving reasons.
========================================================
Dennis: If it's PCI card - why I can't see it in Devices on server?

RackSpaceGuy: >> The "PCI Card", as it was explained to you, is not actually a PCI interfacing device. There are no gold fingers on this power adapter card to interface with the system bus. It simply clicks in to one of the PCI slots for support, and plugs directly into one of the internal molex connectors, from the server's PSU.

This would be the reason you do not see the device in the device manager.
========================================================
Dennis: 1. Why isn't it possible to plug firewall into separate power source in order to reduce server downtime during maintenance?

RackSpaceGuy: >> Again, due to the construction and cost saving (to you) methods of our lower cost racks, we are limited to the power and space for such a setup. The server downtime should have been explained to you prior to the maintenance.
========================================================
Dennis: 2. What if there are several computers behind that firewall?
Do you mean that in this case firewall would depend on one of these computers?
So, if that computer fails, then firewall will be down and whole web farm will be down?

RackSpaceGuy: >> Only in the low cost systems. Only one server is connected to a firewall in this setup. In order to go with a complex firewall config, you would have to upgrade to a rackmounted server config. Then the firewall would have it's own power supply connection, independent of an individual server. Also, multiple servers would be able to connect to a single firewall. If you plan to add servers to your farm and be behind a single firewall, your account manager can help you with plans to upgrade your account.
========================================================
Dennis: 3. Note, that my question was about my server downtime.
Not about whole length of maintenance. Only about production server downtime.
Could you please name only the steps that cause server downtime and are unavoidable during installation of Hardware Firewall?

RackSpaceGuy: >> Again, an unfortunate lack of communicated steps to you. These steps should have been explained ahead of time. Your account manager, Bryan, has informed me that he will talk more about this with you tomorrow. The downtime is explained earlier in my response.
========================================================
Dennis: Here's how I see appropriate maintenance:
- Install and plug Firewall's power and external network cable without touching production server. (no downtime yet)
- Connect test machine behind firewall and test if firewall works properly (no downtime yet).
- Switch network cable so it connects Production server to Hardware Firewall (20 seconds downtime).

Why couldn't the maintenance be done that way?

RackSpaceGuy: >> See above.
========================================================
Dennis: 2) Isn't Hardware Firewall part of network that supposed to be 100% uptime

RackSpaceGuy: >> During a maintenance, the 100% network uptime does not apply, if the maintenance requires a network disconnect. Should a firewall's hardware fail, that is considered a hardware failure, not network failure, and would fall under the one hour hardware replacement SLA.
========================================================

Anonymous said...

I see no reason to shut-down the server to install a PCI-shaped card that does not interface with the PCI bus, other than the risk of accidentally short-circuiting something if one of the screws gets dropped.

It's best to use cases that can have PCI cards installed without using screws.

PCI is supposedly hot-pluggable, so even cards that connect to the bus can be installed while the server is running.

Email me: blogger at dennisgorelik.com