<COMMUNITY>
Post of the Day
April 16, 1999

From our
Improve the Fool Folder

Posts selected for this feature rarely stand alone. They are usually a part of an ongoing thread, and are out of context when presented here. The material should be read in that light.



Subject: Re: Lots o'glitches, lately
Author: DwightG

Hey There,

First things first: My apologies for the delay in responding to your note. I have been a bit busy and wanted to make sure you got a thorough explanation. Speaking (writing?) of which -

So ya wanna know about the technical problems we are experiencing on the Fool web site? OK. Here's a partial list:

Quotes
Our quote feed was down for 30 hours. The reason was that the power supply on one of S&P's controllers died. S&P provides our quote data, and the controller is the computer that talks to the satellite dish that talks to the satellite to get the quote information. It took us a while to diagnose the problem as we are not supposed to touch the equipment - it belongs to S&P, not us. Once we did figure it out, we had to wrangle with S&P tech support to send us another one. My favorite part of the wrangling:

FoolTech: "Can you send us two of them so we have a backup?"
S&P Tech: "You don't need an extra one. These don't fail."
Uhhhhh ... fascinating information given our situation.

So what are we doing about this? We're getting pricing for a fully redundant quote feed (from the satellite dish to the quote real time database server and all the parts in between).
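
For readers wondering what "fully redundant" buys you, here is a minimal sketch in Python. It is an illustration only, not the Fool's actual code: the function and feed names (first_working_quote, dead_feed, backup_feed) are hypothetical, and each feed is assumed to be a plain function that either returns a price or raises an error.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a feed takes a ticker symbol and returns a price,
# returns None if it has no data, or raises if the feed is down.
Feed = Callable[[str], Optional[float]]


def first_working_quote(feeds: Sequence[Feed], symbol: str) -> Optional[float]:
    """Ask each feed in order and return the first price obtained."""
    for feed in feeds:
        try:
            price = feed(symbol)
        except Exception:
            continue  # this feed is down; fall through to the next one
        if price is not None:
            return price
    return None  # every feed failed or had no data


def dead_feed(symbol: str) -> Optional[float]:
    """Stand-in for a primary feed whose controller has died."""
    raise ConnectionError("controller power supply failed")


def backup_feed(symbol: str) -> Optional[float]:
    """Stand-in for a second, independent feed (made-up price)."""
    return 42.0
```

With only dead_feed in the list, the caller gets nothing back; with backup_feed listed after it, the same call still returns a price. That is the whole point of paying for the second set of hardware.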

Bandwidth
Have you noticed that since last Monday (the 5th) connectivity to the Fool web site has been really bad? If not, you're probably not on UUNet's backbone. Connectivity has been horrible between Global Center (our web hosting provider) and UUNet. I won't go into the long story here or point fingers; suffice to say Global Center is working with UUNet to fix the problem.

Network Appliance Filer
We use a Network Appliance Filer to serve most of our graphics files. Unfortunately it has been crashing frequently over the last two weeks. Have you seen any broken image links or ads that would never paint during that time? This is why.

So what are we doing about this? Earlier this week we took the Filer out of rotation completely. Network Appliance has discovered a bug and they are in the process of coding a fix for it. Once we get the patch, we'll test it for a couple of days and then roll it out to a live server for a couple of days. If it's stable, we'll roll it out to the rest of the site.

Ad Server
Until recently we were using an Adfinity ad server. It was not doing the job, causing the site to be painfully slow and/or the servers to crash. So we recently switched to NetGravity. The rollout of NetGravity was much less than optimal, mostly due to some IIS issues (see below). After two weeks, I think we finally have the system stable.

We're using IFrames for the ads. This is great on IE 3 and later since the page is drawn separately from the ads. The result is that you will see the page content while the ads are painting. On Netscape Navigator it does not work quite as well since the page has to wait for each ad to be drawn before it can draw the rest of the page. Although Internet Explorer and Navigator completely finish drawing the page at roughly the same time, IE will show parts of the page sooner since it supports IFrames for the ads and renders tables more quickly. It will be interesting to see what the upcoming open-source Mozilla 5.0 is like. It looks very promising.

We also added more firepower to the ad servers. We have four monster servers serving up ads. If the ads are slow, it's probably not server power. More likely it's poor connectivity.

Microsoft Transaction Server (MTS)
We have been seeing some problems with MTS. So last Thursday (a week ago), we called the MTS support folks. They promised a 24-hour callback. More than a week later, we have heard nothing from them. I could make all kinds of snide comments here. I won't.

Microsoft SQL Server/Cluster Server
We've had some problems with SQL recently. It just resets itself or fails over. This means that all of our data stuff (ports, boards, etc.) goes down for a minute or two. You've probably seen this. We're still researching this one. In fairness to MSFT, the problem could be with the hardware driver. We should know by Monday.

Microsoft Personalization Server (MPS)
Have you ever seen your favorites or portfolios show up empty when you had favorites or boards before? One of the two reasons for this is MPS. There are so many anomalies here, I cannot list them all. We're currently writing code that will allow us to remove MPS completely from the Fool site.

One important note: If you get a page that says your favorites/boards are empty and they shouldn't be, don't freak. The data is not gone. MPS just doesn't want to share it. Usually hitting Reload will fix the problem. Yes - I know this is a PITA. That's why we're dumping MPS.
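
The Reload advice above amounts to "ask again." A minimal sketch of doing that on the server side, in Python, might look like the following. This is an assumption-laden illustration, not how the Fool site works: fetch_with_retry, flaky_favorites and the board names are all hypothetical, and the data store is modeled as a function that sometimes returns an empty list with no error.

```python
from typing import Callable, Dict, List


def fetch_with_retry(fetch: Callable[[], List[str]], attempts: int = 3) -> List[str]:
    """Call fetch up to `attempts` times, stopping at the first non-empty result."""
    result: List[str] = []
    for _ in range(attempts):
        result = fetch()
        if result:
            break  # got real data; no need to ask again
    return result


# Hypothetical stand-in for a store that comes back empty on the first call.
calls: Dict[str, int] = {"n": 0}


def flaky_favorites() -> List[str]:
    calls["n"] += 1
    if calls["n"] == 1:
        return []  # first request: no data, no error message, nothing
    return ["board-1", "board-2"]
```

The trade-off is that a Fool whose list really is empty costs a few extra lookups, since an honest empty answer and a glitch look the same from the outside. That is why a retry is a Band-Aid and not a fix.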

Microsoft Internet Information Server (IIS)
We use IIS as our web server. It's not very robust. It dies on a regular basis and has to be restarted. Note: the problem here is not Windows NT. Our servers themselves have been quite stable. We very rarely have to reboot them. In contrast IIS needs to be restarted multiple times a day on each server. Warning: Technical jargon ahead!! Read on at your own risk.

We have seen plenty of times when an ISAPI (NetGravity for example) asks IIS for memory but IIS will not allocate it. This despite the fact that we have a couple of hundred MB of free RAM available. IIS crashes shortly after this. Before it does, it freaks out and spews error messages. Ever see an Ouch message on the Fool site? This is why. We also see times when ADO makes a request and gets nothing back - no data, no error message, nothing. That is another reason for blank favorites and ports. We had an object on our servers that was somewhat suspect (the old Adfinity object). This will be removed on Monday. Then we can wrangle with MSFT on this problem. Hopefully they will return our phone call.

One of the strangest problems we have seen with IIS is incorrect 404 errors. Have you ever seen the haiku "page not found" error message ("You step in the stream, but the water has moved on. This page is not here.")? Chances are the page actually is there; IIS just thinks it is not. This is so bizarre. IIS will serve the same page for 3 hours and then decide the page does not exist. We have seen this numerous times and checked to make sure the page is there. We've copied it back up to the server. We've changed the file. Still, IIS refuses to acknowledge it is there. The only way around this is to restart IIS. Truly bizarre.
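
The check described above (the server says 404, but the file is sitting right there on disk) is easy to express in code. Here is a minimal Python sketch; is_false_404 and its parameters are hypothetical names, it assumes URL paths map directly onto files under a document root, and it does no path sanitizing, so treat it as an illustration and not as a monitoring tool.

```python
from pathlib import Path


def is_false_404(status: int, doc_root: Path, url_path: str) -> bool:
    """Return True when the server answered 404 but the file exists on disk."""
    if status != 404:
        return False  # the server found the page, so nothing is wrong here
    # Map the URL path onto the document root, e.g. "/a/b.htm" -> doc_root/a/b.htm
    return (doc_root / url_path.lstrip("/")).is_file()
```

A watchdog built on a check like this could tell a genuinely missing page from a confused web server, and only the second case would call for a restart.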

The Fool And Unix
If you are about to flame me for using NT/IIS and/or suggest that we switch to some flavor of Unix, please don't. I've already had enough of those and really don't need any more :). In fact, I have a couple of my techies playing with Linux and Apache. We're looking at PHP and a couple of other things. What we want is a stable platform that has a tool to run our apps in process and that supports templates. Ideally that would be NT/IIS/ASP. Given our recent experience, I am not optimistic. We're checking out several options. Before we do anything, we want to have a clear idea of what is involved. This kind of change is not something to be taken lightly.

Don't expect anything to happen very soon. Even if we decided to move from NT/IIS to something else, we have a couple hundred thousand lines of ASP to convert. That cannot happen overnight. And yes - I know there are several converters out there. We have not been impressed by them. And yes - I do know that we should take this one small step at a time. And yes - I know there are thousands of consulting firms that would love to help us. Can you tell that I receive a couple hundred emails a day ;)?

This is not to say that the Fool site is gonna be painfully unreliable for the foreseeable future. We've already created a bunch of Band-Aids and work-arounds for the problems we have. We're working on several more. This leads to the next point -

The Speed of FoolTech
I have seen several posts and numerous emails asking why it takes "forever" to see any improvements to the Fool web site. I know that the site does not change as quickly as Fools would like (this includes the techies). We're working as fast as we can. All of my techies are about 120-150% allocated, most putting in 80+ hour weeks (three of 'em pulled all-nighters last night, two the night before - pretty typical). The salient point here is: We have a LOT of work to do and not enough techies to do it. I'm hiring as fast as I can. Unfortunately finding good, smart, Foolish techies is not easy mostly because I won't compromise on any of those qualities, the last one being the most important. It's that whole "chain as strong as the weakest link" thing.

Lest you think I am pointing fingers and disavowing responsibility for the problems on the Fool site, I will be the first to admit that the TechDome is not perfect. We make mistakes. Not all of our code works perfectly. We've even done some things in the past that I can only classify as "boneheaded." However, while we are Fools, we're not idiots. We know what needs to be done and we're working on it.

I know you don't like it when your favorites and/or ports disappear or when the Fool site is slow or when the little things that "should only take a couple of minutes to change" don't get changed or when you get an Ouch message or when the quotes are down. Believe me - we know. I've got the scathing flame mail to prove it. My techs never see that mail (I don't think it is too productive to slam people who are working their butts off) but they understand the customer frustration. While you may not believe it, we actually feel much worse about this stuff than you do because it reflects poorly on us. Suffice to say, we're working on it as fast as we can. I'm looking forward to the day when my techs and I can get more than 5 hours of sleep a night. I don't see that happening any time soon, 'cause we have promises to keep, and miles to go before we sleep, and miles to go before we sleep.

BTW: If you are a techie (or know any), have strong technical kung fu, and would consider working full-time for the Fool in Alexandria, VA (no telecommuters and no consultants please), check out http://tech.fool.com [Very Cool]
[Or, check out all of our
Current Openings.]

Fool On,

Dwight

________________
Dwight J. Gibbs
Chief Techie Geek
The Motley Fool
http://www.fool.com
AOL Keyword: Fool