Wholesale internet down?

Posted by RISTMO, 02-19-2005, 07:20 PM
Is it just me or is www.wholesaleinternet.com down? I can't get to their site or mine (69.30.208.46). Rick

Posted by RISTMO, 02-19-2005, 07:32 PM
Not down -- upgrading :-D. Rick

Posted by Rman2003, 02-19-2005, 07:50 PM
Yeah, I've got a server there myself. Just got off the phone with James not too long ago. Seems they're putting in the new router.

Posted by Ankheg, 02-19-2005, 07:50 PM
Upgrading? Was there an announcement we missed?

Posted by Rman2003, 02-19-2005, 07:53 PM
They sent out an email to everyone in the billing system. Unfortunately, I'm not in the new one yet... you may not be either if you didn't get an email.

Posted by CodePHP, 02-19-2005, 08:02 PM
30 minute procedure taking over 2 hours now. I'm not a happy camper.

Posted by Ankheg, 02-19-2005, 08:03 PM
Nope, no email. Ah, well. Is there an ETA for their finishing?

Posted by CodePHP, 02-19-2005, 08:07 PM
No, but you can give James a call at 913-710-7378 Post anything you find out here so other people will know what's going on, thanks.

Posted by Rman2003, 02-19-2005, 09:29 PM
*tap tap tap*

Posted by CodePHP, 02-19-2005, 09:38 PM
Server came back up for a moment, but then went back down again. An update from WSI right about now would be great :-P

Posted by xachen, 02-19-2005, 10:10 PM
Yeah, no doubt. I phoned around 4:30 and was told it would be up again at 5 PM (30 minutes). It's 7:10 now and I'm getting a little upset about this. Their main site is online again but my server isn't (I'm sure yours aren't either). An update would be extremely nice at this time.

Posted by Rman2003, 02-19-2005, 10:14 PM
It's 8:10 where I'm at, and their site is up for me here also.. that's SOME hint that they're headed in the right direction. But you're right.. my service isn't restored yet either. There's no reason this should have taken this long... how long could it possibly take for the BGP routes to clear themselves up? Last edited by Rman2003; 02-19-2005 at 10:25 PM.

Posted by CodePHP, 02-19-2005, 10:39 PM
My server came back online not too long ago, hopefully it'll stay that way.

Posted by CodePHP, 02-19-2005, 10:42 PM
Heh, two seconds later, it's back down again!

Posted by Cirtex, 02-19-2005, 11:18 PM
Didn't get notice here either; hopefully the downtime won't last all night.

Posted by Rman2003, 02-19-2005, 11:21 PM
Hopefully.. because I'm not too fond of the phone calls I'm getting as a result of it. They're definitely not 'friendly' conversation.

Posted by Ankheg, 02-19-2005, 11:21 PM
We're back up now, but can't reach their site. MRTG suggests we were offline right about four hours. :/

Posted by Rman2003, 02-19-2005, 11:24 PM
Their site is up for me at this moment... Unfortunately, my server is not. And from the time I got the first notification of being down (from a client) and called James, I'm looking at about 4 hours, 25 minutes.

Posted by xachen, 02-19-2005, 11:30 PM
It seems to have just come back online. Well, kinda. I see it's kinda flapping still, but for the most part I think we are alright now.

Posted by Rman2003, 02-19-2005, 11:35 PM
Let's hope so.

Posted by Rman2003, 02-20-2005, 12:14 AM
Just out of curiosity... is anyone else still down? Or am I just the unlucky one?

Posted by Cirtex, 02-20-2005, 01:03 AM
still down

Posted by Rman2003, 02-20-2005, 01:05 AM
Ok, so it seems that some are up, and some are still down. It's going on 6 hours now... c'mon guys! YOU CAN DO IT!

Posted by Ankheg, 02-20-2005, 01:35 AM
What errors are people getting? No route to host? Or is routing working, but traffic not making it through? When everybody was down earlier, we were seeing "no route to host" for our server...

Posted by Rman2003, 02-20-2005, 02:07 AM
When I try to ping my server.. it's been 100% loss since the get-go at ~5PM CST. We're now pushing 7 hours. I'm no CCNA, but how long do these things usually take? Shouldn't there be some kind of a contingency plan for something like this, if it really is THAT major? There has to be a way to avoid this much downtime. Hopefully this never happens again, but if it did... what steps could be taken to minimize the impact?

Posted by WII-Aaron, 02-20-2005, 02:45 AM
Basically the issue is that everything that could go wrong did. We've actually been planning for this upgrade for 3 months. It never seems to go into production like it does on the test networks. At this point the current issues are: a. Our Level 3 connection, which for some reason seems confused, keeps flapping. Traffic is getting through on Global Crossing fine but Level 3 is acting weird. b. People with secondary IP ranges. We're having an issue with the new distro switches recognizing the secondary ranges that some of our customers have. At last count we had about 20 servers still offline. We are working to restore these as fast as we can. Believe me. We have all been here all day and really would like to get you back online as soon as possible so we can go home and go to bed. Aaron

Posted by Rman2003, 02-20-2005, 02:48 AM
Aaron, I don't think anybody has any doubts that you and James are there working your rears off to get things going again, and it's CERTAINLY good to hear from one of you two. I'm sure you've been flooded with calls. Heh. At least it's not still 'peak' hours and most people have long since gone to bed. Thank you for the response, at least now we know what's going on.

Posted by CodePHP, 02-20-2005, 04:08 AM
I was up online for quite a while, but now I'm back down again

Posted by NeedUptime, 02-20-2005, 04:50 AM
Yeah, I got the e-mail notice. They said: You talked to James? They must not answer the phone when a call comes from my area code. I called and left a message for him at about 5:30 EDT when I didn't come back up, as instructed. No answer. Pager also went unanswered. I went to bed at around 5 (was up all night dealing with MySQL problems) and then I woke up a few minutes ago. It's 3:43 EST and I am still down. I called, no answer. They probably both turned off the phones and went to bed. Apparently my two servers are part of the 20 that are down. Mine have not come back up at all from what I can tell, and if they did it would have been only for a few moments, not even long enough for the e-mail client to grab mail. Come on Aaron, let's get this taken care of. The amount of downtime my servers have experienced over the past couple of months is inexcusable. My patience is slowly leaving me...

Posted by NeedUptime, 02-20-2005, 04:55 AM
I am no CCNA either, but they are replacing a core router as I understand it. They call it the core router for a reason: it's pretty much the heart of the network. There is no way to replace it without taking stuff down at least briefly. Now, it shouldn't take this long for their datacenter size... but it looks like they are having some other "issues".

Posted by NeedUptime, 02-20-2005, 05:05 AM
Heh, their site is up now. My servers are not.

Posted by NeedUptime, 02-20-2005, 05:08 AM
Actually, I take that back. One server is back up... future looking brighter... unfortunately the server that is back up isn't the one that I need...

Posted by WII-Aaron, 02-20-2005, 06:13 AM
No. I did not turn off my phone and go to bed, and there's only one customer that I don't take calls from. If you sent a page with your IP address then I am working on you. If you're my favorite customer from Ohio then I sent you a request for your root password. Your issue has nothing to do with today's upgrade. Aaron

Posted by WII-Aaron, 02-20-2005, 06:21 AM
James is out of town at his in-laws, probably in bed asleep since it's 4:15am here now. If you are still having issues, PLEASE submit a ticket to support@wholesaleinternet.com and I will get to you. I have 7 people still on my list, a couple whose problems are not related to the upgrade. I let the techs go about 1am and the other engineer left about 3:30. I am finishing up the last loose ends and then will take a break for some sleep. Then I will be back. I will post a full explanation of what went on as soon as the last server has been taken care of. I know this has been a pain in the a$$. Believe me. I would much rather be asleep but I think it will be worth it. So far we have customers reporting that their ping times have dropped by as much as 20ms and one told me his transfer speeds had tripled. This upgrade also sets up the entire operation to be much more scalable. We should not have to do this again. Aaron

Posted by NeedUptime, 02-20-2005, 06:32 AM
I am your favorite customer, Aaron, and you will start picking up the phone when I call. Believe me, I am going to call James tomorrow and I am going to have a chat with him. You will not get away with not supporting our boxes and telling me in a public forum that you don't take calls from me. 1. You sent a request for my root password. a. You have been given my root password so many times you should have it memorized by now. b. I did not get your e-mail because, alas, I have no e-mail access due to the fact that my mail server is offline. c. You have had almost a month to fix the default route issue. You should have set it up correctly the first time, when you built the server. You did not. You have been asked numerous times to fix it. You have not. We have followed all of your instructions on fixing that issue; they do not work. d. I will help you pull your head out of you know where on Monday.

Posted by NeedUptime, 02-20-2005, 06:34 AM
I sent the root password to: aaron@wholesaleinternet.com

Posted by NeedUptime, 02-20-2005, 08:02 AM
Well, I am finally back online. Thank god for small favors. Hope you all are back up now and stay up.

Posted by RISTMO, 02-20-2005, 11:15 AM
Well my pings take slightly longer than before the upgrade. My download speed is the same. I had one customer email me three times throughout the process eventually saying that he had over 6 hours of downtime, and if the process started at 3pm, he must have missed the first 3. I guess at this point, I'm just happy that things are working -- and that my other customers haven't complained yet... But I didn't receive the notification email....I wonder if there's a way I can be added so I know about these things in advance next time ;-). Rick

Posted by NeedUptime, 02-20-2005, 11:59 AM
Actually, I take that back. The other box is online. Both of them are. But on the box that matters, I can SSH to it only. There is no access from the web or anything else. It's like all incoming traffic to the ports has been blocked except for SSH. If I ping a domain from my other box, which is also in their network, it says host unreachable. I talked to James a couple of hours ago; he said he would call Aaron and let me know what was going on. Haven't heard back yet, getting voicemail again. Who knows what has happened there. I know James is out of town, don't want to bother him... but on the other hand... something HAS to be done. This after I got the message from Aaron saying that he was "leaving for a bit" and that I should try typing "service start and the service I want to run", as if this is some server admin problem. This is getting really old. NectarTech didn't suck this bad when they had their powerless datacenter (boy, that was a fun day) because at least you could get ahold of someone. James is out of town and Aaron doesn't answer the phone for his "favorite customer in Ohio".

Posted by CodePHP, 02-20-2005, 02:04 PM
NeedUptime: If you're incessantly calling them while they're trying to get a problem as major as this resolved, then I wouldn't blame Aaron at all for not answering his phone when you call. Even their e-mail stated not to page the techs. You're sounding like a "gimme gimme" kid the way you're demanding support there. Aaron: Pings have also dropped about 20ms for me, and speed has noticeably improved. Although the downtime wasn't fun, I must say these new speeds are fantastic!

Posted by NeedUptime, 02-20-2005, 02:17 PM
No, the message said don't page the techs until after 5 EDT. And it specifically said to "contact them via the usual means" if you are still down after 5. The usual means are ticket, e-mail, phone and/or pager. And guess what, about 20 hours later, I am still dead in the water, so I am entitled to page and ticket and call away, especially when I can't get any answers out of anyone. It would be nice for someone to simply say "We're working on it, it'll be fixed as soon as possible" but I am not even getting that out of them. I am getting silence. He never answers the phone when I call, network problem or not. I would tell you the story but it's not worth the keystrokes. Problems with Aaron are ongoing for me but I will get James to straighten him out one of these days. I doubt anyone else is affected by the problem with Aaron as it is a personal one in nature and goes back a long way. In their defense, the ping times do seem to have gotten better to the one server that I can use... but this other server... the one I need... the one that has DNS and mail... that one just ain't quite doing it. But I need both servers, and in all honesty, I need the other one more than this one and we do pay for two servers... not just one.

Posted by Rman2003, 02-21-2005, 04:19 AM
Seems we're back down again for another round. GUH!

Posted by CodePHP, 02-21-2005, 04:30 AM
Same here

Posted by Ankheg, 02-21-2005, 05:00 AM
Was this one announced that I also missed, or is this a surprise to everyone? At least this is the quietest part of the day for the server we have there...

Posted by CodePHP, 02-21-2005, 05:07 AM
Nope, no announcement, just plain old crappy downtime.

Posted by Rman2003, 02-21-2005, 05:16 AM
Just spoke with James via MSN. He's on his way downtown now.

Posted by NeedUptime, 02-21-2005, 05:35 AM
What's James's MSN nick?

Posted by NeedUptime, 02-21-2005, 05:42 AM
I don't know what's going on; Aaron keeps trying to blame my server. Both of my servers are "online". One is working just fine. Unfortunately, it's not the one I need. Naturally. The other one, I can SSH into it but it isn't reachable by any domains. Aaron keeps saying it's an IP problem on my box. But the odd thing is the box worked just fine until they did the upgrade, and I haven't been in to change anything on the box.
Tracing route to [Redacted] over a maximum of 30 hops:
  1   <10 ms   <10 ms    10 ms  10.35.96.1
  2    10 ms   <10 ms   <10 ms  10.35.96.1
  3   <10 ms    10 ms    10 ms  srp2-0.clmboh1-rtr2.columbus.rr.com [24.95.81.190]
  4   <10 ms    10 ms    10 ms  srp10-0.clmboh1-rtr4.columbus.rr.com [65.25.129.100]
  5    10 ms    10 ms    10 ms  son0-1-2.ncntoh1-rtr0.neo.rr.com [65.25.128.221]
  6    10 ms    20 ms    20 ms  son0-1-3.mtgmoh1-rtr0.cinci.rr.com [65.25.128.246]
  7    10 ms    20 ms    20 ms  pop1-cin-P3-0.atdn.net [66.185.133.9]
  8    10 ms    20 ms    20 ms  bb1-cin-P0-0.atdn.net [66.185.133.0]
  9   130 ms    41 ms   230 ms  bb1-chi-P3-0.atdn.net [66.185.153.50]
 10    20 ms    20 ms    30 ms  pop1-chi-P0-0.atdn.net [66.185.141.85]
 11    20 ms    30 ms    20 ms  GlobalCrossing.atdn.net [66.185.151.222]
 12    50 ms    50 ms    50 ms  so0-0-0-622M.ar3.KCY1.gblx.net [67.17.71.233]
 13    70 ms    50 ms    50 ms  American-Fiber-Systems.ge-0-2-0.ar3.KCY1.gblx.net [208.49.195.126]
 14    50 ms    50 ms    50 ms  206.165.102.2
 15     *        *        *     Request timed out.
 16     *        *        *     Request timed out.
 17     *        *        *     Request timed out.
Now if this is a problem local to my box, I should get at least to the 69.30.199.x network where my box is, but instead I am not even reaching their core router, I don't think. I had several other people on IRC trace the route as well (they don't use my ISP) and they all stop on 206.165.102.2. This second server has been like this all weekend; I have been down on that box since they started the upgrade on Saturday. It never has come back up.
Just talked to James. He is going over to get Aaron and they are going down there; looks like it's not just me that's having problems. Last edited by NeedUptime; 02-21-2005 at 05:49 AM.
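For anyone comparing traces from several vantage points, as described above, a rough Python sketch of pulling out the last hop that actually answered. The function name and the regex are illustrative assumptions, and the sample text is just the tail of the trace in the post; it assumes Windows tracert-style output.

```python
import re

# Tail of the trace from the post above (Windows tracert layout).
SAMPLE = """\
 13    70 ms    50 ms    50 ms  American-Fiber-Systems.ge-0-2-0.ar3.KCY1.gblx.net [208.49.195.126]
 14    50 ms    50 ms    50 ms  206.165.102.2
 15     *        *        *     Request timed out.
 16     *        *        *     Request timed out.
"""

# Dotted-quad matcher; good enough for a sketch, it does not validate octet ranges.
IP_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")


def last_responding_hop(trace_text):
    """Return (hop_number, ip) for the last hop that replied, or None."""
    last = None
    for line in trace_text.splitlines():
        parts = line.split()
        # Hop lines start with the hop number; skip headers and blanks.
        if not parts or not parts[0].isdigit():
            continue
        # Hops that never answered carry no usable address.
        if "timed out" in line:
            continue
        ips = IP_RE.findall(line)
        if ips:
            last = (int(parts[0]), ips[-1])
    return last


print(last_responding_hop(SAMPLE))
```

If traces from unrelated ISPs all report the same last hop, that points at the path beyond that hop rather than at any one sender's route, though it cannot by itself say whether the fault is the next router or the destination box.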

Posted by Rman2003, 02-21-2005, 06:55 AM
Looks like it's back up now. Time to sleep!

Posted by NeedUptime, 02-21-2005, 07:00 AM
James called, said he was rebooting the router. Apparently that has been done, but mine is still down. Last edited by NeedUptime; 02-21-2005 at 07:04 AM.

Posted by NeedUptime, 02-21-2005, 08:10 AM
It's alive.

Posted by WII-Aaron, 02-21-2005, 03:09 PM
Good morning guys. Here is the short version of what happened, with some history.
About a year and a half ago Wholesale Internet moved to a new facility. The new place was massively oversized for us and we filled less than 10% of it. At that point our network was pretty simple: a couple of small routers aggregating our bandwidth into some basic switches. Once we were in the new facility we started to grow... rapidly. The decision was made to move to larger equipment. We purchased a Cisco 7513 and migrated to it. The box is a workhorse and served us well with few exceptions. One of the issues with "older" Cisco equipment is its inability to deal with large amounts of small packets. Unfortunately most attacks come in this form, so every time someone got attacked, the router would choke. The other issue is that when we implemented it we had about 20 VLANs. The 7513 did our inter-VLAN routing in addition to holding our BGP tables. When we took it offline, the 7513 was routing for over 500 VLANs in addition to holding our BGP tables.
Back in October we were approached by some large, well-known companies who wanted to purchase services from us. These companies expressed concerns that our infrastructure wasn't robust enough to continue to provide reliable service if we added their networks and growth to our own. Over the last year our business has been drifting more toward high-end colocation and away from "budget" dedicated servers. These new, large companies committed their business to us with the provision that we upgrade our network infrastructure and capacity in the near future. At this point I started to design the new network. The new design is many times more stable and scalable (It is, I swear...) and brings us up to a level where we are able to compete with larger colo facilities and service providers.
When James said we were replacing a router he was somewhat misleading. We were replacing everything. Edge routers, core routers, core switches, distribution switches, etc... We gutted the entire infrastructure and rebuilt it all from scratch. If everything had gone exactly as planned this would have taken an hour. If we ran into a glitch or two it should have taken 2 hours. Everything was ready to go, all the equipment ready to be moved in. Everything configured and tested. I should have known it wasn't going to go well when we put the first router in the rack. Sitting on the bottom, the holes didn't match up, so we had to have two people lift a 350 lb. piece of equipment 1/4 inch off the floor while a tech screwed it in. This took almost an hour and a half.
Here are some other fun things that happened:
1. Strand of fiber came away from one of the fiber patch panels and had to be reterminated.
2. Tech forgot to label one of the patch panels. 48 lines to retrace.
3. Local customers decided it would be a good time to drop by unannounced and ask questions. We had to put the place in lockdown and post a guard.
4. IOS version on new router kept flushing the routing tables every 20 seconds, causing our BGP sessions to flap like a hummingbird. This caused Level 3 to dampen us and not take routes from us for over an hour. We had to downgrade to a different version.
5. Older version doesn't support multiple subnets on a single VLAN. Needed to fire up the old core router and route any multiple-subnet VLAN through it, adding hours for reconfiguration and re-routing to the downtime for these customers.
6. All the normal issues with repatching hundreds of cables. Cable gets plugged into port 5/13 on switch 12 when it needed to go to 5/13 on switch 23, etc., etc.
Even with all these issues, every single one of our suite, cage and multiple-machine colo customers was back online by 5pm. (Then we had the Level 3 issue at about 1:30 or 2am.) Once we realized that we wouldn't make the 2 hour window we went into triage mode. We started at the top and worked our way down. The last people to come back were anyone with multiple IP ranges on a single server and anyone on a shared block (people that don't have their own VLAN). When everything was done, we went through every support ticket and checked anyone who had submitted one. When I left at 7am Sunday morning all submitted tickets related to connection problems had been resolved. I returned to the DC on Sunday at noon and sorted through the tickets that had been submitted over the 5 hours I wasn't there. (About 10.) As of yesterday afternoon the only connection issues were the normal: reboots, IP screwups, "I locked myself out of my machine", etc.
Here are some of the common questions I've gotten:
What happened last night?
The new core routers did not go down. The old router, which is still servicing the customers mentioned above, did not go down. The issue was with the trunk running from the old router to the switch. It failed due to a misconfiguration. It was brought back up and a failsafe configured and added. Only customers on the 192, 208 and 209 networks, and those having multiple subnets allocated to a single interface, were affected. This accounts for less than 10% of our total customer base.
This was a royal pain in the @$$. Is it going to be worth it?
We hope you think so. With few exceptions, customers have been reporting latency on average 20ms lower than before. They have also been reporting transfer speeds up to 3x as fast.
What happens the next time you need to upgrade?
At this point our core systems are fully redundant and our network design completely modular. If we need to replace a router or upgrade in the future, we can do so without taking everyone offline.
How much did you really upgrade? Is this new setup really going to last?
Our total core processing capacity before was about 100,000 packets per second with 2Gb of throughput. Our new core currently allows for 288 million packets per second with 384Gb of throughput.
Exactly what kind of equipment did you put in?
While we don't release exact specs on the equipment we use for security reasons, rest assured that it is not yesterday's datacenter close-out deal. It is top of the line and still has that new equipment smell.
Last, I would like to thank all our customers who were down longer than expected. The vast majority of you were very understanding. Thanks for not showing up at the DC at 2:30pm so you could get a good seat to watch the upgrade. (It totally blew my mind that people would do this.) And (this is done WAY too rarely in public) thanks to my technical team: Bryan, Andrew, Ty and Nate, who were promised beer at 7pm and ended up pulling an all-nighter instead.
If you are having an issue of any kind, please submit it to our ticket system at support@wholesaleinternet.com. I know James has a sexy voice but when I'm checking to make sure everyone's been taken care of, I check the ticket system. Aaron
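On item 4 in the post above (the upstream dampening the flapping BGP sessions): route flap dampening is usually described as a penalty that grows with each flap and decays exponentially, with the route suppressed above one threshold and accepted again below another. A minimal Python sketch of that model follows. The numbers are commonly cited defaults and are purely assumptions here; nothing in the thread says what Level 3 actually configured.

```python
import math

# ASSUMED parameters (commonly cited defaults, not known to be Level 3's settings).
PENALTY_PER_FLAP = 1000
SUPPRESS_LIMIT = 2000      # suppress the route once the penalty exceeds this
REUSE_LIMIT = 750          # accept the route again once the penalty decays below this
HALF_LIFE_MIN = 15.0       # penalty halves every 15 minutes
MAX_SUPPRESS_MIN = 60.0    # cap on how long a route can stay suppressed

# A ceiling on the penalty is what enforces the maximum suppress time.
MAX_PENALTY = REUSE_LIMIT * 2 ** (MAX_SUPPRESS_MIN / HALF_LIFE_MIN)


def decay(penalty, minutes):
    """Exponentially decay a penalty over the given number of minutes."""
    return penalty * 0.5 ** (minutes / HALF_LIFE_MIN)


def simulate(flap_interval_s, n_flaps):
    """Penalty right after n_flaps flaps spaced flap_interval_s seconds apart."""
    penalty = 0.0
    for _ in range(n_flaps):
        penalty = decay(penalty, flap_interval_s / 60.0) + PENALTY_PER_FLAP
        penalty = min(penalty, MAX_PENALTY)
    return penalty


def minutes_until_reuse(penalty):
    """Minutes of quiet needed before the penalty falls to the reuse limit."""
    if penalty <= REUSE_LIMIT:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_LIMIT)


if __name__ == "__main__":
    for flaps in (2, 3, 50):
        p = simulate(20, flaps)
        print(flaps, round(p), p > SUPPRESS_LIMIT, round(minutes_until_reuse(p), 1))
```

Under these assumed values, a session flapping every 20 seconds crosses the suppress threshold on the third flap, and sustained flapping pins the penalty at its ceiling, so the route stays suppressed for the full cap after the flapping stops. That is in the same ballpark as the "over an hour" mentioned in the post, but the match may be coincidental.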

Posted by RISTMO, 02-21-2005, 04:23 PM
Glad everyone else has better service now, but since the upgrade, my ping times have gone from 50 to 80 on my DSL connection at work... and my MySQL database takes a lot longer to access and update now. Maybe it's just me? Rick

Posted by sightz, 02-21-2005, 04:37 PM
WHY did you decide to upgrade everything at once? I would expect a fundamental redesign like this would be built up and rolled out in stages, but I'm not a network engineer.

Posted by WII-Aaron, 02-21-2005, 06:54 PM
Try now. There was a duplex mismatch on your port. Aaron

Posted by WII-Aaron, 02-21-2005, 06:56 PM
We are. This was phase 1 of 3. The remaining phases involve adding additional edge capacity and new bandwidth providers as well as an infrastructure buildout in our new expansion. They won't involve downtime for any current customers. Aaron

Posted by CodePHP, 02-21-2005, 06:59 PM
What IP range or port was that duplex mismatch on? I've been noticing some of the same thing. Seems to have just improved now though.

Posted by Rman2003, 02-21-2005, 07:07 PM
Glad everything seems to be back online now. Seems like we've gotten through the rough stuff. I think the biggest problem with the upgrade was the number of people not in the new billing system who didn't get any type of notification. Yes, it's on your website, but aside from checking the latest prices, how many people actually visit just to visit? Not I, and judging from the other responses to the downtime, not many others either. I'm hoping that when I make my payment this month, I'll be able to be added to the new system and receive notifications of any future issues. Ping times have stayed roughly the same where I'm at, but transfer speeds have increased dramatically from what they were. As big of a pain in the arse as this was... I still have to give you kudos for staying on top of it as best you could.

Posted by zafarnc, 02-22-2005, 06:21 PM
Is it just me, or is WII down again? My server is too... But at least when it was up I got about 10ms cut off the ping times.

Posted by NeedUptime, 02-22-2005, 06:23 PM
They just went offline 5 min ago. James said Aaron is working on the core router, had to put some settings in again. He said it will be up in a few minutes.

Posted by NeedUptime, 02-22-2005, 06:25 PM
I went from ~110ms to ~57 on average. I ping flooded my server for 12 hours after it came back up, the average was 57 which is pretty damn sweet.

Posted by maxhest, 02-22-2005, 06:26 PM
*edit* Didn't see previous posts

Posted by CodePHP, 02-22-2005, 06:38 PM
Servers are down yet again, this is really starting to get bad. I haven't been able to reliably conduct business for the past 3 days. Last edited by CodePHP; 02-22-2005 at 06:42 PM.

Posted by NeedUptime, 02-22-2005, 06:40 PM
I knew there was a reason I called James. And yes Aaron, the upgrade was worth it - I think. It sounds like you all have beefed up the network to the point that these issues will be a thing of the past. I sure hope so. I was impressed with the increased response time, it was needed. However, you all really should have waited for the recabling effort. That could have waited until a non-peak time, i.e. 4 AM some day and been done one box at a time, that way the outages were only a minute or two per box.

Posted by Rman2003, 02-22-2005, 07:14 PM
Yep, here we go with the phone calls and IMs again. C'mon guys... This isn't good.

Posted by NeedUptime, 02-22-2005, 07:32 PM
A few minutes is turning into an hour...

Posted by Cirtex, 02-22-2005, 07:38 PM
Down again, new router again?

Posted by CodePHP, 02-22-2005, 07:47 PM
We're finding another provider once our server comes back online. This is absolute crap WSI.

Posted by NeedUptime, 02-22-2005, 07:53 PM
Called James's cellphone... his voicemail OGM says something about them having problems with a Lucent switch and migrating those customers off that onto something else... send an e-mail to support@wholesaleinternet.com (even though their site is down) and they are really sorry for the hassle, etc...

Posted by RISTMO, 02-22-2005, 07:55 PM
I think you turned the knob the wrong way. I went from 52 to host cannot be found ;-). ... and I just had another client email me about the past 2 days. As hard as it's been on my reputation being hacked, fixing that, and then being down off and on for the past couple days. They aren't believing me anymore when I say that the downtime was a temporary thing that we fixed. I'll be in a real fix if it goes down while I'm at work tomorrow, and I make the two hour trip only to find that I can't access my server to do billable work. So yeah, hopefully things start staying up soon.... Rick

Posted by Ankheg, 02-22-2005, 07:56 PM
Well, we're back up... with something like 70% packet loss, erratic routing, and what seems like a lot of DNS timeouts. Woo hoo! Sorry, I exaggerated slightly. To be fair to Aaron, James, and everyone else at WII, I only had 66% packet loss on my most recent pings:
--- ns1.redpin.com ping statistics ---
100 packets transmitted, 34 packets received, 66% packet loss
round-trip min/avg/max = 23.5/11.6/429495476.3 ms
But those 23.5ms pings almost make it worth it. Last edited by Ankheg; 02-22-2005 at 08:11 PM.
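A side note on the ping summary above: the loss figure checks out, but the round-trip line does not, since the reported average (11.6) is below the reported minimum (23.5) and the maximum is absurd. The maximum also sits very close to 2^32 / 10 (429496729.6), which would fit a wrapped unsigned counter, though that is only a guess. A small Python sketch of both checks, with function names that are illustrative assumptions:

```python
def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    return 100.0 * (sent - received) / sent


def stats_consistent(rtt_min, rtt_avg, rtt_max):
    """A sane summary must satisfy min <= avg <= max."""
    return rtt_min <= rtt_avg <= rtt_max


# Figures taken from the ping summary quoted in the post.
print(round(loss_percent(100, 34)))                 # 66
print(stats_consistent(23.5, 11.6, 429495476.3))    # False
```

When the summary fails a check like this, the min/avg/max figures are best ignored and only the transmitted/received counts trusted.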

Posted by CodePHP, 02-22-2005, 08:00 PM
My server is still down, maybe they dropped a couple boxes while migrating, not a happy camper here. Last edited by CodePHP; 02-22-2005 at 08:12 PM.

Posted by CodePHP, 02-22-2005, 08:23 PM
2 hours of downtime and counting for today, does anyone have a running total of the downtime we've had over the last three days?
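On keeping a running total: one way is to log each outage as a start/end pair and sum them, merging windows that overlap so nothing is counted twice. A minimal Python sketch follows; the three windows in it are made-up placeholders, not measurements from this thread, and the function names are assumptions.

```python
from datetime import datetime, timedelta


def total_downtime(outages):
    """Sum (start, end) outage windows, merging any that overlap."""
    total = timedelta()
    cur_start = cur_end = None
    for start, end in sorted(outages):
        if cur_end is None or start > cur_end:
            # Previous window is finished; bank it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlaps the current window; extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def availability_percent(downtime, window):
    """Share of the window during which the server was reachable."""
    return 100.0 * (1 - downtime / window)


# HYPOTHETICAL outage windows, for illustration only.
outages = [
    (datetime(2005, 2, 19, 17, 0), datetime(2005, 2, 19, 21, 0)),
    (datetime(2005, 2, 19, 20, 30), datetime(2005, 2, 19, 23, 0)),
    (datetime(2005, 2, 21, 2, 0), datetime(2005, 2, 21, 4, 30)),
]
down = total_downtime(outages)
print(down, round(availability_percent(down, timedelta(days=3)), 2))
```

Feeding it timestamps from a monitoring log (MRTG gaps, failed pings) would give a figure that can be quoted in a ticket or a credit request.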

Posted by NeedUptime, 02-22-2005, 08:30 PM
I have a running total on downtime for my main box... it's been dead like 12 days of the past 30... but I am getting a double whammy here. Not only do I go down for the network, the box has a problem (it was never set up properly) and it loses the default route every time the box gets rebooted. Getting someone to go down there and fix it is like pulling teeth sometimes. James credited me a month of service and said he would have Aaron fix the route. I'll believe that when I see it; Aaron isn't known for being especially prompt at fixing things or keeping his word.

Posted by CodePHP, 02-22-2005, 08:37 PM
Beginning to become "my favorite datacenter from KC."

Posted by chuckt101, 02-22-2005, 09:17 PM
well this stinks...

Posted by xachen, 02-22-2005, 09:36 PM
I am really getting sick of this. Downtime AGAIN. I cannot afford this anymore. Aaron: What am I going to get out of this for all these excessive downtimes? Not to offend you or James, but this is getting highly ridiculous.

Posted by Rman2003, 02-22-2005, 09:39 PM
Was back up for about an hour... down again. Not sure what IP range everyone else is on.. but I'm in the .199's.

Posted by xachen, 02-22-2005, 09:41 PM
.200's here

Posted by Rman2003, 02-22-2005, 09:47 PM
Seems to be back online now... it's like a rollercoaster.

Posted by NeedUptime, 02-22-2005, 10:01 PM
199s and 193s here. I am still down on both boxes... I have a constant ping running on both... I have not gotten a single response since my first post today, so I don't know what the 199ers who say they were up for an hour are talking about. I just ordered another server at another NOC. I may keep the WI boxes around for remote backups or secondary failsafes or something, but they can no longer be trusted to be primary production boxes. Well, what do you know? Their site is back up now. 69.30.201.9. Looks like the 201 people are made in the shade.

Posted by NeedUptime, 02-22-2005, 10:37 PM
All right, come on Aaron. I am losing my patience with you again. It's been something like five hours now. Have you completely lost control of the situation? How much longer are we going to be down? Do you even know what you are doing at all? What did you do, power the router off in the middle of a firmware upgrade or something? Here is an idea. Let's plug the old core back in and route people through it until you all figure out what the major malfunction is with the new one. I wish you all still had the webcams in the office so that I could verify that you haven't gone home for the night. It wouldn't be the first time you have left me down and gone home because you decided you were done for the day. Many other people seem to be up and your site is back up but I am not. Did you intentionally leave my boxes offline yet again? I know just how hard it is to get your sorry behind down there to fix the default route and it seems like I am always the last one on the list to have his box restored when something goes to pot. So let's get this crap taken care of. Urge to kill.... rising!

Posted by NeedUptime, 02-22-2005, 11:09 PM
Is there anyone still down now or am I the only one?

Posted by Ed_Case, 02-22-2005, 11:14 PM
I'm still down, 69.30.198.*. Their homepage states they are having a problem with a switch, so all we can do is wait for it to be fixed/replaced.

Posted by NeedUptime, 02-22-2005, 11:21 PM
Sorry to take pleasure in your demise, friend... but usually when I am the only one still down that means that it's intentional and they have left for the night leaving me down... so I am glad that I am not alone. Usually when their homepage is up and both of my boxes are down that means I have been singled out to be bent over and taken advantage of. Just called James again.... he is "doing my boxes next". Who knows what that means. I hope that whatever it is will bring about a positive result.

Posted by NeedUptime, 02-22-2005, 11:45 PM
Well, both boxes just came back up. Are we out of the woods yet? Time will tell.

Posted by anon-e-mouse, 02-22-2005, 11:48 PM
NeedUptime, would you rather they spent time answering your comments here or working to get you back online?

Posted by xachen, 02-22-2005, 11:55 PM
That is what I was just about to say. Most of us will make our statement each time it goes down but we don't go egging it on and on...

Posted by Rman2003, 02-23-2005, 12:09 AM
It's almost funny how we (as customers of WSI or any other provider) come together when something goes wrong. However, we'd probably not even be aware of each other if nothing had happened. I honestly didn't know there was anyone else that used WSI on this board. I'm obviously Ryan. Aside from the circumstances... it's nice meeting you all.

Posted by xachen, 02-23-2005, 12:12 AM
Nice meeting you too. I knew there were other WSI members here but didn't know who hehe.

Posted by WII-Aaron, 02-23-2005, 02:36 AM
I hate this thread. Years of great stability and then all it takes is a couple bad days to blow it all. Maybe we'll become the flavor of the week? Today we lost a switch. By the looks of things here it happened to be our Webhosting Talk switch. It has been fixed and everyone should be back to normal. If you're not, please let me know. support@wholesaleinternet.com As for the future, I don't foresee this happening again. It was just a fluke that this switch went down at about the same time as we did our upgrade. As for not posting here sooner... When something like this happens we do nothing but focus on solving the problem. Posting here comes after the last customer is back online. This outage affected roughly 30% of our customer base. Aaron

Posted by Rman2003, 02-23-2005, 02:46 AM
I have to agree. In the short (a couple months) time that I've been with WSI, this little fiasco is the first time I've experienced any downtime. As far as losing the "Webhosting Talk Switch"... I'm holding you to that.. go label it now. I don't want someone else stealing our switch if theirs breaks! Good job on getting us back online.

Posted by NeedUptime, 02-23-2005, 08:57 AM
Boy, did you ever. I talked to James for almost an hour this morning, Aaron. He called me at about 4 AM and woke me up to tell me that everything was back on. I know what really happened. Out of respect to James - and because you would probably lose a lot of customers and I don't think it would accomplish anything at all other than make a lot of people angry - I am not going to tell everyone else. You should be thankful that you work for James and not me. Because if it were left up to me - knowing what went wrong and how - I would have canned you tonight - assuming I would have been able to resist the urge to do so at some point over the weekend. You need to think before you do things. You need to plan for the possibility of catastrophe before the catastrophe becomes reality. You know exactly what I am talking about and you know exactly where you failed. This second outage was very preventable and you and I both know that. I am back online now. I hope I stay that way. If I ever hear of an outage happening under the circumstances that this one did again, one of us will be parting ways. It'll either be you or me. Everyone is entitled to make mistakes and boy you sure have made a lot of them this week. But as James said the service has been very reliable up until recently. And as the other poster said none of us knew of each other until WI went down, which is a good thing. You're getting a second chance with me. I would encourage the other customers to give you one as well. Knowing what I know, I think things just snuck up on you and overwhelmed you. And that's all I am going to say about this topic.

Posted by WII-Aaron, 02-23-2005, 11:43 AM
I don't think it's any huge secret. The switch lost its config. Its NVRAM got scrambled. Like all of us, you included if I remember correctly from a month or so ago, my backup was not as current as I would have liked and because of the recent, major changes in the structure of our network I had to rebuild most of it from scratch. You seem to think I have it in for you yet I'm the one that defends your right to host the sites you do when the complaints come in. I'm the one who gave you a home when all your other providers threw you out and then tolerated your payment issues for over a year. I even drilled a hole in our firewall so you could do some IRC stuff when our policy forbids it. I get repaid with 20 emergency support tickets within 2 minutes for a server that wasn't down. When I delete all but one and respond to it I get another 10 support tickets that all say "Stop deleting my f***ing tickets!" I brushed it off when I stepped out to get a bite to eat, leaving my personal cell phone on my desk, and came back 45 minutes later to find that I had missed 37 calls from you. Again, your server was pingable and SSHable. I am not your personal system admin. It is my responsibility to make sure you can access your server. I made sure you could access it via SSH and DirectAdmin. You also seem to have some misconceptions about our company structure. James and I have been in business together for over 5 years. I do not work for James just as he doesn't work for me. Like any couple (I mean that in a strictly platonic, heterosexual way) who spend so much time together we sometimes have our differences. We do not air them in public and when push comes to shove, we will always back each other up. Aaron

Posted by sightz, 02-23-2005, 12:10 PM
Anything in your TOS about being able to "fire customers"? Sounds like you have some real winners. :-)

Posted by NeedUptime, 02-23-2005, 12:24 PM
Aaron, don't sit there and jerk me off. I heard about the USB thumbdrive. And I heard about the type of scrambling that occurred. I am not stupid, Aaron and I can put two and two together. You screwed up and you know it. And yes, I was recently guilty of not keeping a current backup of a server. The difference is that I am not hosting other people's sites. I am not in the hosting business. My lack of keeping a backup only impacted my ability to do business. It did not affect hundreds of other people that pay me to do something. There is no comparison to be made there. I paid James to get the hard drive out of the hacked server since you wouldn't do it. You don't seem to understand something very basic here. You have posted elsewhere in this thread that you refuse to answer my phone calls. There is only one customer that you don't answer calls for, you said. It's not just that you are busy at the moment and can't talk. You also do not return the messages that I leave. I have not spoken to you on the phone for at least six months. You rarely respond to pages. You answer urgent tickets when you feel like it, if you feel like it and how you feel like it. That has gone on for months, it's not something new. You do realize that if you would simply answer my calls and pages the first time I send them, I wouldn't have to call you over and over and over and over. But since you never answer my calls, rarely answer my pages and do tickets when you feel like it, I am left with no choice but to send you repetitive pages and calls in an attempt to encourage you to pick up the phone. Try it. Next time I page you and say a server is down, reply back with "Working on it" or something similar and you'll see that I don't page you over and over and over, or slam your phone with calls. If it's still down several hours later, I'll ask for a status update. As long as I know that someone is still there working on it, that's fine.
But you refuse to give me that satisfaction, you know that I have an extreme dislike for silence and knowing this you ignore me completely and then complain when I get nasty. I have always paged or called once when something happened and then given several hours for a response before I plug your number into dialpad and start calling it over and over and over and over, and copy and pasting the same message into e-mail and SMS'ing it to you numerous times. James has explained this to you numerous times. He has told you that when someone calls you, if you answer it the first time they call you won't hear from them again. But if you ignore their pleas for help, they will keep calling. It's really an easy concept, even the dumb kids can understand this one. It's called customer service, something that you are not very good at and don't seem to care about at all. No, you are not my personal admin. But you built the server, Aaron. You set it up and you charged me to set it up. It *is* your job to set the server up so that it works properly and can connect to your own network when it is turned over to me. You failed to do this. Every single time the server is rebooted, it loses the default route. This has been a problem since day one and you know it. You told me to add a line to the inittab file and I did so. It did not take care of the problem. I have told you this several times and you reply back to tickets with "Your server is back up now" and other meaningless responses. You have gone back and forth with James for some time on this and you have told him you won't fix it and it's not your problem. And I say it is your problem and you will fix it. Our level of service with you is "semi-managed". That means that you will fix things occasionally and they might not all be hardware related. It does not mean that I call you when I forget my e-mail password. It does not mean that you debug my scripts.
But it does mean that you keep the server reachable, especially when the issue in question is a network connectivity issue that originated when you set the server up. The problem that kept us down all weekend long after your upgrade was done had to do with the server losing all of its IPs. The server worked fine until you started the upgrade. James told me you powered the server down to move it into another rack during the upgrade. Because you simply pulled the power cord instead of shutting it down properly (since you have root access to the box, you could have gone in and done "shutdown", I think), I suspect that you caused the routing tables to become corrupt. I had to manually add all of the IPs back into the box. They were present in DA. Again, this is your fault and something you should have fixed. Instead, you responded back to all my tickets saying that SSH was working. I knew that and that wasn't the problem I reported. It's like me telling you my server is down and you respond back and tell me that you paid the power bill. SSH working is irrelevant. I was very specific on the tickets and you deleted all of the tickets on this topic in my account now. There are only two - one that I opened earlier. However, instead of fixing the problem, James told me that you went home and told him "He can SSH to the box so its not my problem". This went on all weekend and left me offline until Sunday afternoon. If James keeps backing you up after you do something amazingly incompetent like this, both of you will find yourselves without a customer. Maybe that's what you want. I don't care one way or the other. It's your business. You leave me with no choice but to air complaints in public, Aaron. You don't answer the phone where I could air them with you privately, and I don't like complaining to James about you because James isn't the source of the problem. And for the record, my other datacenter did not throw me out. I left on my own.
If you remember, I came to you when NectarTech lost power at ePaul200. I don't keep servers in datacenters that aren't capable of keeping the power on, something you have thankfully not yet had a problem with. I went to another host who sold me a reseller account so that I could regroup and get things back together after what I call "The Great NectarTech Clusterf**k" and restore stuff as quickly as possible. That host didn't want adult sites on his network but allowed them to stay until I found a new host, at which time I called you. The only datacenter I have been thrown out of is CI Host. They threw me out when I put something in my server's MOTD that one of their Christian noc techs found offensive. And they aren't well known for their competence or good company ethics, either.
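
The "loses the default route on reboot" symptom described in this post is something a script can detect after boot. What follows is only a rough sketch under stated assumptions: a Linux box that exposes its routing table in the `/proc/net/route` text format, on a little-endian machine such as x86. The function name is hypothetical, and it only reports the problem; it does not restore the route or say anything about why this particular server lost it.

```python
import socket
import struct

# Route flags as defined in the Linux kernel headers.
RTF_UP = 0x0001       # route is usable
RTF_GATEWAY = 0x0002  # destination is reached through a gateway


def find_default_gateway(route_table_text):
    """Return the default gateway as a dotted quad, or None if there is none.

    route_table_text is the full text of /proc/net/route: one header line,
    then whitespace-separated rows with hex-encoded addresses.
    """
    lines = route_table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # malformed or truncated row
        dest, gateway, mask = fields[1], fields[2], fields[7]
        flags = int(fields[3], 16)
        is_default = dest == "00000000" and mask == "00000000"
        if is_default and flags & RTF_UP and flags & RTF_GATEWAY:
            # Assumption: the hex value is in little-endian host byte order.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None
```

To use it, read `/proc/net/route` and pass the contents in; a `None` result right after a reboot would match the symptom described above and could trigger an alert or a ticket.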

Posted by BigBison, 02-23-2005, 01:15 PM
No choice but? I'm sure everyone here has dealt with less-than-satisfactory response times from a host, particularly if they're in crisis mode. It seems a bit of an overreaction to cut-n-paste dozens of messages and use dialpad to simultaneously mail- and phone-bomb a host if they don't immediately reply. I can see where such antics would wear thin, leading to deliberate avoidance of your messages, and I'm surprised WSI hasn't asked you to move on because I know I would have. This behavior doesn't "encourage" anything but silence. Try a polite, professional attitude sometime, and see if that doesn't help.

Posted by WII-Aaron, 02-23-2005, 01:47 PM
Keith, I'm not sure what you are talking about with a thumb drive. I put a copy of the router config and IP lists on one for James so he could trace cables for me and associate them with their VLANs. Was this some kind of mistake? (Since James kept thinking that the network addresses were the IPs of the machines maybe.) I don't see how that has any bearing on anything. I was also wearing a red shirt and khaki pants if that makes a difference to you as well. Oh, and I squat to config a switch, not stand. I also touched my face 177 times, blinked over 500 times and danced a little jig when everyone was back online. As for scrambling... hell yes I scrambled. When someone's down, especially when multiple people are down, I scramble. I really don't even know how you get away with posting here at all since your last user ID was banned for life (posting threats toward the president was it?) and I'm pretty sure it's against forum rules to create a new one to circumvent a ban. I really am a nice guy. Very laid back and tolerant. Even I have my limits though and can be pushed too far. It's obvious to me, and everyone else here, that our relationship isn't working out. BigBison is right. I see no reason to continue it. I will check your billing status and if you are in good standing your service will continue through the end of this cycle. If not you will be terminated immediately. I have given you too much latitude and I have let James talk me out of canceling you too many times. Aaron Last edited by WII-Aaron; 02-23-2005 at 01:58 PM.

Posted by NeedUptime, 02-23-2005, 02:23 PM
Look Aaron, I am sorry if I pissed you off. I don't get paid to squabble with people, I make money when stuff sells and I can't do that when I am not online. All I want is for my service to work right and when something happens I want it to be taken care of. It's possible that I over-reacted to this situation, for which I am sorry. But you have to understand how frustrating it is to experience the problems I have had with this. If you'll fix the default route issue and promise to handle any future situations better, I'll forget about this and I will try to be more professional in my contacts on future issues.

Posted by CodePHP, 02-23-2005, 04:54 PM
Lol, I think it's a little late.

Posted by MaB, 02-23-2005, 05:32 PM
Not to sound harsh, but every single web host in this forum has at one time or another gone into emergency mode, where all of our time is devoted to solving an extreme issue at hand. When this happens, I know that no one would like to see dozens of duplicate tickets or 37 phone calls from one customer - that will only slow down the progress, which is not the goal. The goal in this type of situation is to fix the problem. After reading the way that you and Wholesale interact, it is probably the best decision for you both to go your separate ways. You are allowed to be upset for the downtime, you truly are and I would be too - but everyone grows, and everyone has growing pains, and that is 100% not an excuse, but it's not as if they took your money and ran off - they were working their heads off to get everything back in order, not slacking around. Everyone has had a serious upgrade that was planned and supposed to take 5 minutes go wrong for a dozen reasons - it happens every day and all of the time. In the end, it's probably better for your relationships if you went your separate ways. edit: spelling Last edited by MaB; 02-23-2005 at 05:38 PM.

Posted by maxhest, 02-24-2005, 08:12 PM
OK, I think this thread has had its time, and it's time for it to be closed. One person has already gotten banned, and I think this should be closed. Mods?

Posted by sightz, 02-24-2005, 08:19 PM
Interesting. NeedUptime had a bad attitude and needed to be fired as a customer, but I wonder exactly WHY he was banned?

Posted by Rman2003, 02-24-2005, 08:26 PM
Because he was previously banned under a different username, and created a new one.

Posted by gina_, 02-24-2005, 11:00 PM
Perhaps you could continue your differences in PM? Thanks.


