
Gnax Down? [ Merged ]

Posted by net, 05-20-2008, 08:23 PM
I got a notification from Pingdom that it is down. What's up? Net

Posted by lizardopc, 05-20-2008, 08:23 PM
Is Gnax down? I can't access my servers or the Gnax web page from two locations.

Posted by myusername, 05-20-2008, 08:24 PM
Seems like the whole thing went lights out.

Posted by lizardopc, 05-20-2008, 08:25 PM
Yes, it's down. I can't access my servers.

Posted by gilman01, 05-20-2008, 08:26 PM
It's back...

Posted by Dougy, 05-20-2008, 08:27 PM
Yes, JaguarPC is down as well.

Posted by net, 05-20-2008, 08:27 PM
Hmmm, my server got rebooted and it is up now....

Posted by Peter Anthony, 05-20-2008, 08:27 PM
It's back

Posted by spal911, 05-20-2008, 08:28 PM
All servers at GNAX are not available.

Posted by likwidmon, 05-20-2008, 08:30 PM
Correction: all *your* servers are not available. Mine is.

Posted by lizardopc, 05-20-2008, 08:32 PM
What is your IP?

Posted by jandres4, 05-20-2008, 08:35 PM
Yep, all our servers are up as well; downtime was 6-7 minutes for all of them. Let's hear what GNAX has to say. Regards!

Posted by TotalChoice - Bill, 05-20-2008, 08:36 PM
All our Gnax boxes went down as well.

Posted by Dougy, 05-20-2008, 08:38 PM
Is your uptime intact, or did it power cycle?

Posted by spal911, 05-20-2008, 08:38 PM
Some servers are working fine and some servers are not responding. Last edited by spal911; 05-20-2008 at 08:47 PM.

Posted by Cyborg--, 05-20-2008, 08:39 PM
Mine went down as well, back online now. It did reboot...

Posted by likwidmon, 05-20-2008, 08:42 PM
Uptime intact here, but I noticed my route to the server changed when the drop happened. Weather channels show it is storming badly there.

Posted by TotalChoice - Bill, 05-20-2008, 08:44 PM
All of our servers were power cycled.

Posted by Peter Anthony, 05-20-2008, 08:45 PM
Yep, severe thunderstorm watch. My uptime is intact.

Posted by RyanD, 05-20-2008, 08:56 PM
I'll be very brief as our primary goal is to make sure everyone is online. We did experience a power failure and are in the recovery process now. A full RFO will be posted shortly. Please excuse our lack of responses in these threads as we are all diligently working to make sure 100% of the systems here are online and operational. Thank you for your understanding!

Posted by universal2001, 05-20-2008, 09:01 PM
Various servers are down here too.

Posted by DaiTengu, 05-20-2008, 09:13 PM
Just got my server moved to a host in GNaX yesterday. This isn't normal, is it?

Posted by myusername, 05-20-2008, 09:28 PM
The short answer is that power used to be a very common problem with GNAX, though it has been considerably better since the last datacenter move. Many of us have been told this type of event is NOT supposed to happen again because of all the extra work that was done during the last power-related outage a few months back. I suppose RyanD will tell us why we suffered another power loss after all the preventative measures that went into patching the problem from last time. I don't think they understand that when this happens it can mean hours and hours of work repairing databases. That is AFTER they get the machines back online, which can be hours in itself. Even if the "power outage" only lasted 5 minutes, it can equate to hours of downtime for the host, and frankly it is starting to drive me crazy. I wish they'd look at the bigger picture; it seems to me they think, "well, it was only 5 minutes so everything is fine..." Last edited by myusername; 05-20-2008 at 09:33 PM.

Posted by DaiTengu, 05-20-2008, 09:37 PM
Any idea how long I can expect my server to be down, then? (I'm hosted through Dixie Servers.) At the other datacenters I've been at, when they've had a power outage, the box was usually back up rather quickly. At this point it's been about an hour, I guess.

Posted by Cats-Computing, 05-20-2008, 09:39 PM
Ours are all back. Try contacting Dixie Servers directly.

Posted by universal2001, 05-20-2008, 09:43 PM
I agree, this seems to be the only major issue they keep having, and it's worse than network downtime, as a power reset means lots of servers require fsck, corrupted data, etc.

Posted by Website Rob, 05-20-2008, 09:50 PM
I would also like to know why the diesel generators for backup power did not kick in. This is twice now, within the last few weeks, that servers have been down due to electrical problems with no power backup.

Posted by DaiTengu, 05-20-2008, 09:52 PM
Probably out of fuel; have you SEEN the price of diesel in the US lately? Backup generators sound like a really simple system. In reality, they're horribly complex, especially for a large datacenter.

Posted by Echelon, 05-20-2008, 09:54 PM
I would assume they did kick in, as some servers seem intact uptime-wise, while others are affected by the outage.

Posted by myusername, 05-20-2008, 09:59 PM
Even if they did fire up the generators, that takes a few minutes to do. I believe the order of operations starts with the battery system, which is supposed to keep everything online until they have time to fire the gen sets, which then pick up the slack as the battery power peters out after a few minutes of normal operation. I don't think their facility is rigged up that way though, because even their own equipment goes dead to the world sometimes. If any of you live down in hurricane country, you know what I am talking about. My batteries here at the office last me long enough for most power outages caused by the large electrical storms we get down here. I can sit here and work in the dark for about 45 minutes until the electric company gets their stuff in order. Ma Bell usually has the Internet running just fine when the lights go out. If I have to fire up the generator, that means Ma Bell is probably not feeding me Internet anyway, and a major hurricane came through. I don't see why it's not possible on a grander level than my office. I have PCs that have been running since the last hurricane here... 2 years ago? Last edited by myusername; 05-20-2008 at 10:05 PM.

Posted by Cann, 05-20-2008, 10:03 PM
I'm hosted with RZ and my server was down for more than 30 minutes... For me, it was very scary at first when I realized that everything was down. Of course I didn't know about the power outage until I found news about it at the RZ forums. This is actually my first post at WHT, and I somehow feel better that other people experienced the same thing that I did just now. Of course, I'm not glad that it happened. I'm glad everything is back up, and I am curious as to an explanation.

Posted by Douglas, 05-20-2008, 10:11 PM
In no way should this be taken as an official GNAX post; rather, it's coming from someone that lives in the Atlanta area, a mere 7 miles away from the new GNAX facility. Do not take this as an official excuse, just a POTENTIAL explanation as to what might have happened to trigger this. Atlanta (and the Southeast) has been rocked over the last 45 days with severe storms. Atlanta proper had a twister that was a mere 450 feet or so from my own home... I'm sure you all saw that one or heard about it on the news. A couple weeks back, we had another ugly storm system that brought a LOT of wind and rain. And tonight, we had yet another storm system (and a confirmed tornado touchdown that didn't cause damage) followed by a smaller, but just as rainy, system... accompanied by winds up to 60 MPH and a plethora of lightning strikes. One area reported over 270 lightning strikes in a 15-minute window... and this was par for the course while the system travelled through the state. There may be more than meets the eye here... so please keep that in mind. Again, this is only a potential explanation for the original issue... so take it for what it's worth. Before anyone asks, no, I'm not a GNAX customer, though I was a customer of someone else at GNAX in my past life. I'm speaking specifically as someone that lives in Atlanta, only. Again, this is not an official or unofficial GNAX anything type of post; it's just to enlighten y'all about the storm system that came through. Signed, An Atlantan that is lovin' this weather!

Posted by Echelon, 05-20-2008, 10:14 PM
Sounds right to me, hehe. I guess people don't remember anymore of the lovely quote: Stuff happens.

Posted by Cats-Computing, 05-20-2008, 10:16 PM
Thanks for that insight, Douglas! It can never be as bad as the power here two summers ago; it cut out for at least an hour once a week. Good times!

Posted by Tina J, 05-20-2008, 10:31 PM
We have around 100 servers there and all but 2 are back online. I have to say, they worked their butts off getting power restored and everything fired back up. I'm amazed at how fast they were able to get everything back online. It's my understanding that Atlanta is currently under one hell of a storm. Even with really good backup power, there are some things that require manual intervention during an outage. Anyway, I've been at that DC now for a couple of years and this is only the 2nd power outage I remember. Both times they were amazingly fast at getting us back online. Kudos to the guys at Gnax! --Tina

Posted by sailor, 05-20-2008, 10:50 PM
5/20 Power Outage RFO
--------------------------------------------------------------------------------
Severe storm cells came through the North Georgia region this evening. AtlantaNAP experienced an overcurrent fault outage on one of our 2 main feeds. The affected feed is the original feed, which has the most load currently connected to it. The amount of load connected to a feed determines how much of the lightning and overcurrent will try to pass through it – i.e. if you don't have very much load on it, like our new feed which is currently only at 1/6th load, then current does not try to flow to it very much. Our first system is currently at 65% load, so it tried to absorb much more of the lightning strike than the other one, and hence the main breaker went into overcurrent fault.

I have spoken with all of our key electrical engineers associated with the building at this point. According to Georgia Power and our PSSI and Cummins engineers, we likely took a lightning strike to the utility very near the facility, which caused an overcurrent fault on our main incoming breaker on our first set of switchgear. The breaker is designed to trip in the event of this kind of fault to protect the gear (your computers) inside the building from being burned up by the lightning strike. When this type of fault happens, the computer that is the brains of the switchgear will not start the generators until an engineer verifies where the fault is. This is because a fault inside the wiring plant could also cause this kind of overcurrent in the event of a main short, if a feeder wire carrying main current in the building were to become damaged. In that case it would be very dangerous to turn the power back on manually, or to force a manual start of the gen sets and push current into a system with a fault remaining. Lives and machinery could be lost.

We dispatched several of our staff to visually inspect for faults (we did not want to turn something on and have it fry everyone's gear), found none, verified it was likely a lightning strike, and manually started the generators to restore power. Unfortunately the UPS system is only designed to carry that load for 10 minutes, which was not enough time for us to safely verify and do a manual start. This is apparently a rare event – getting a direct utility strike like this, that close, that does not get dissipated before it hits us. The farther away from your site the strike occurs, the more other load and grounds it has to dissipate through before it gets to you.

The good news is we did not burn up any equipment. Some of you did not lose power because you were connected to the other, lightly loaded feed coming in, and there was not enough load on it to overwhelm that breaker since it is only 18% loaded at this point. Some of you lost network connectivity because the downstream feeder switches that your computers are connected to are only single power supply units. We are in the process of examining a facility-wide network upgrade that will move to a newer chassis-based solution throughout the facility. We started looking at this as a way to offer new services capability that many of you have been asking for. It is a costly upgrade and will bring redundancy, but it also brings some pitfalls as well, since you have more connections into a single chassis. We are still looking at this currently and will keep you up to date as to the direction we decide to move.

They have told me that under normal operating conditions there is really nothing we could have done, and we should simply be glad we had good equipment installed that kept our computers from being fried. I am thankful that I am not looking at a lot of damaged equipment that could not simply be turned back on; that would be a disaster I do not want to deal with. At this point it seems like the new switchgear with overcurrent protection was a good investment.
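
To put rough numbers on that timing, here is a minimal sketch; the 10-minute UPS figure is the one quoted above, while the inspection and generator-start durations are assumptions for illustration only:

    # Rough timeline of the 5/20 event, based on the RFO above.
    # UPS_RUNTIME_MIN is quoted in the RFO; the other two figures are assumptions.
    UPS_RUNTIME_MIN = 10     # UPS carries the load for about 10 minutes
    INSPECTION_MIN = 20      # assumed: walking the plant to rule out an internal fault
    GEN_START_MIN = 2        # assumed: manual generator start and transfer once cleared

    time_to_restore = INSPECTION_MIN + GEN_START_MIN

    if time_to_restore > UPS_RUNTIME_MIN:
        gap = time_to_restore - UPS_RUNTIME_MIN
        print(f"Load drops for roughly {gap} minutes after the UPS runs out.")
    else:
        print("The UPS bridges the outage and no load is dropped.")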

Posted by Tina J, 05-20-2008, 10:58 PM
Well, there you have it. Everything worked as it was supposed to. Direct lightning strike, so everything shut down to avoid frying our servers. Thanks Gnax! --Tina

Posted by ayksolutions, 05-20-2008, 11:16 PM
Thanks again for the insightful post, Jeff. We still have 2 or 3 boxes down, but we did not really get affected that much.

Posted by utropicmedia-karl, 05-21-2008, 09:49 AM
Yes, but that's why UPSs and DC battery plants were created. Regards,

Posted by plumsauce, 05-22-2008, 02:31 AM
Then there are still lessons to be learned from telco environments, because this is not the way telco power systems work. People may despise telcos, but they build their facilities right. There is a reason there is always a dial tone. As for the UPS, it is inadequate based on the time requirements. If it is known that xx minutes are needed for startup preparation, then xx + yy minutes are required of the UPS. 10 minutes is not a whole lot of time. It might take 5 of those precious minutes for the junior person to get hold of the senior person, and another 5 for the senior to get hold of management. That would be 10 minutes.
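
To state that sizing rule in concrete terms, here is a quick sketch; the two 5-minute escalation steps are the hypothetical figures from above, and the extra margin is an added assumption:

    # UPS sizing per the rule above: required runtime = startup prep (xx) + margin (yy).
    junior_to_senior_min = 5       # junior person reaches the senior engineer (hypothetical)
    senior_to_management_min = 5   # senior reaches management for a go/no-go (hypothetical)
    safety_margin_min = 10         # assumed extra margin (the "yy")

    required_ups_runtime = junior_to_senior_min + senior_to_management_min + safety_margin_min
    print(f"The UPS should be sized for at least {required_ups_runtime} minutes, not 10.")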

Posted by danclough, 05-22-2008, 03:18 AM
Telcos do build their facilities to be very fault-tolerant, but what you're forgetting is how much power a phone needs to operate as opposed to a dual Xeon server with four SCSI disks. All the telco needs to do is power its POTS switching equipment and put some power down the line for the signal. GNAX needs to power their edge routers, core routers, and everything down to the switches in every rack. Add to that the servers, which draw orders of magnitude more power than what's in a telephone signal, and it gets pretty expensive to provide the same level of power independence that the local Bell can. Rack-based UPS systems are only designed to run for about 20 minutes. Battery racks are only designed to run for about 5 minutes while the gensets warm up. Have you priced a full InfrastruXure setup from APC? If you want full battery backup capable of powering an entire datacenter for even an hour, be prepared to shell out more than your building is worth. I somewhat agree with your final point. Always have a senior manager (or senior ANYTHING) on staff. The building should always be occupied by at least a few people who have EE backgrounds and have the authority to kick on the gensets. But also, realize that datacenters have miles and miles of cabling just for power. Checking every single PDU and every single circuit panel, even with a 5-man team, can take about 15-20 minutes. I'd say GNAX did a pretty darn good job given the obviously uncontrollable scenario. It's easy in theory and it looks simple on paper, but Mother Nature never goes by the book. Last edited by danclough; 05-22-2008 at 03:25 AM.
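
As a back-of-the-envelope illustration of why an hour of datacenter battery runtime gets so expensive, here is a rough sketch; every number in it is an assumption for illustration, not a GNAX figure:

    # Back-of-the-envelope battery sizing for one hour of datacenter runtime.
    servers = 2000                  # assumed server count
    watts_per_server = 350          # assumed draw for a dual-Xeon box with SCSI disks
    overhead_factor = 1.5           # switches, routers, UPS losses that must also ride through
    pots_line_watts = 3             # rough draw of a single POTS line, for contrast

    load_kw = servers * watts_per_server * overhead_factor / 1000
    battery_kwh_for_one_hour = load_kw * 1.0

    print(f"Critical load: about {load_kw:.0f} kW (vs. ~{pots_line_watts} W for a phone line)")
    print(f"Battery energy for one hour of runtime: about {battery_kwh_for_one_hour:.0f} kWh")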

Posted by sailor, 05-22-2008, 05:41 AM
Most people don't want to pay for that kind of redundancy. I used to work for BellSouth/AT&T and I have been in plenty of COs. Most of the COs were unmanned after hours. It is not affordable to have an EE on staff 24x7 at a CO or a DC.

We put up a redundant system that has an A/B power feed to the equipment from different DC plants, so that if any part of the system fails we have a different feed. This is the best way and is the way most DCs do things. However, most customers refuse to buy a B feed power circuit, but this is their choice; they would rather take the occasional outage than pay a known monthly cost. There is nothing wrong with this; it's a business decision for them, and it's good to have the ability to choose redundancy or no redundancy. The better investment in redundancy is for a customer to buy a redundant power feed, known as a B feed; most decide not to make the monthly investment when it comes down to it and would rather take a downtime risk. These are available. None of our customers who have B feed service from us experienced a power outage.

If I were to put in a massive amount of batteries to cover this very rare kind of event, it would still have to be passed on in increased costs, but then everyone would have to pay it, and many choose not to based on their requirements. I don't think it would be fair to force that on those that don't need or want it. Those that need or want it can buy B feed power and put dual PSUs in their servers for proper redundancy. More batteries won't provide redundancy for failed downstream breakers and other gear, which does happen in plants as well; better to get B feed power.

BTW, our gensets are online and carrying load under auto switchover within 15 seconds. We run parallel sets with N+1 redundancy, so that if one fails or we need to take one down for maintenance during an extended utility outage, we are still fine. We test them every other week under load. One last thing - B feed power....... Last edited by sailor; 05-22-2008 at 05:49 AM.
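
For anyone weighing the B feed option, here is a toy sketch of why a dual-fed server with dual PSUs rides through a single-feed trip like this one; the feed states below are hypothetical and only mirror the 5/20 scenario:

    # A server stays up if any feed it is plugged into is still live.
    def server_has_power(connected_feeds, feed_state):
        return any(feed_state[feed] for feed in connected_feeds)

    feed_state = {"A": False, "B": True}   # the A feed trips, the B feed stays up

    print("Single-feed (A only) server up:", server_has_power(["A"], feed_state))        # False
    print("Dual-feed (A + B) server up:   ", server_has_power(["A", "B"], feed_state))   # True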

Posted by NolF, 05-22-2008, 03:14 PM
So that was why. Sweet, I was part of the 18%; only my network went down for a while ^^ I'm glad that nothing burnt up. Keep up the good work ^^

Posted by realvaluehosting, 05-22-2008, 10:20 PM
Just for everybody's info, a few of our servers took over 12 hours to come back online.

Posted by Echelon, 05-22-2008, 10:26 PM
It happens. Would you rather it not ever come back online due to the equipment taking a spike from the lightning strike? You have to give people some slack. Especially in this situation. This isn't a normal outage.

Posted by Linn_Boyd, 05-23-2008, 10:38 AM
You are so RIGHT! I will defend GNAX in this situation and outage. It appears to be out of your control. Things happen that you cannot plan for. If you really need that extra uptime, buy A+B power, HSRP ports, and put your servers on DC power! Most people here are looking for the "best" deal, but forget to look at what they pay the datacenter for their backup solutions. Generators, engineers, etc. are not cheap! If you want them onsite, then pay for it and let your datacenter know. We have a customer that does pay for it and needs it. This customer understands that a single rack with power costs $3500/mo with these options and has no problem paying it!

Posted by FastServ, 05-23-2008, 11:49 AM
What good is selling customers B feeds when their upstream switches are not redundant?

Posted by sailor, 05-23-2008, 08:06 PM
Please read the posts more carefully before jumping in on a thread. HSRP is available in our facility, which gives you network redundancy. Next time, if you read carefully, you won't look so silly. Have a great weekend.

Posted by sailor, 05-23-2008, 10:38 PM
Well, I guess I did not post it on this thread; it was on another one in another forum, so apologies. I am silly tonight. So, for clarification: we have HSRP dual feeds available in the facility, which is better redundancy since you are also protected against hardware failure. I hope this helps.


