I'm getting some very odd behavior from one of my SQL Servers.
It appears to be caused by the job which is starting the full text
population on one of my tables.
This morning, when the job started, the SQL Server started kicking out error
messages stating:
2003-09-22 05:05:07.10 spid230 WARNING: EC 303e55e8, 0 waited 300 sec. on
latch ad8c88. Not a BUF latch.
2003-09-22 05:05:07.11 spid 230 Waiting for type 0x4, current count 0xa,
current owning EX 0xA8A55E8.
It kicks these out every 5 minutes until either the full text search starts
or, as happened yesterday, the SQL Server stops responding and we restart
the service.
Now, yesterday, when all this started at 01:59:49.11, I started to receive
the following:
Error: 17883, Severity: 1, State: 0
The Scheduler 0 appears to be hung. SPID 295 ECID 0, UMS Context
0x03407150.
The server has 4 processors and 4 GB of RAM: 2 for SQL, 2 for the rest of
the OS. AWE is not enabled in the OS or in SQL Server. The OS is Windows
2000 Advanced Server, and SQL is 2000 Enterprise SP3, build 760.
This all started after the log files were moved from a single SCSI drive to
a SCSI RAID and the RAID card then failed. A new RAID card was added Saturday
night, and 1 hour later the 17883 errors started showing up. The 17883 errors
didn't appear this morning, however.
I'm currently looking at KB 319892, and it's not being all that helpful.
Both times this problem has come up, I've either been away from an internet
connection or asleep. The first time (Sunday morning, ending Monday
morning ~1am), our sa rebooted the server to get the services back up and
running. This morning it started at 5am and ended at 10:15am, when the full
text index job finished and the index rebuilding job started. Can anyone
shed any light on this mess? I'd like to set up an alert to page me the next
time this happens, so I can start looking at the issue as soon as it starts
and get more info about what's going on on the server when it happens, but
I don't have an error number to set the alert up on. I will set one up for
17883, in case that happens again.
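For the latch warning, which has no error number to alert on, one possible
stopgap is to scan the errorlog text for the message itself. The following is
a minimal Python sketch, assuming the log lines look like the samples quoted
earlier in this post; find_events and both patterns are hypothetical names
made up for illustration, not part of any SQL Server tooling.

```python
import re

# Pattern modelled on the latch warning quoted in this post. The exact
# errorlog format is an assumption based on that single sample.
LATCH_WARNING = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"spid\s*(?P<spid>\d+)\s+"
    r"WARNING: EC \w+, \d+ waited (?P<secs>\d+) sec\. on"
)

# The scheduler-hung message does carry an error number, so match on that.
SCHEDULER_HUNG = re.compile(r"Error: 17883\b")


def find_events(lines):
    """Return one tuple per latch-wait warning or 17883 error found in lines."""
    events = []
    for line in lines:
        match = LATCH_WARNING.search(line)
        if match:
            events.append((
                "latch_wait",
                match.group("ts"),
                int(match.group("spid")),
                int(match.group("secs")),
            ))
        elif SCHEDULER_HUNG.search(line):
            events.append(("scheduler_hung", line.strip()))
    return events
```

Running this over the current errorlog on a schedule and paging whenever the
returned list is non-empty would cover both messages. How the file is read
(path, encoding) is left out because it depends on the installation.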
If anyone would like to see the current and last errorlogs, please let me
know and I'll send them. They are both very small.
A new server is on order to replace this machine. I was planning on copying
the files to the new server and attaching them to the new database server,
but if that will cause this problem to follow, I'll keep the system offline
for longer and copy the data into new database files via DTS.
--
Denny Cherry
DBA
GameSpy Industries
Denny,
It is very unusual (actually unique) for FT Populations to cause these
17883 errors, unless the FT-enabled table that it's reading is somehow
corrupt, or possibly the FT Catalog is corrupt, and either condition is
causing the scheduler to appear to hang. You might want to run DBCC
CHECKTABLE against your FT-enabled table to see if there is any corruption
present in the table.
While there are no real diagnostic tools for checking the MSSearch-managed
FT Catalog files, you should review your Application Event log for any
events with a source of "Microsoft Search" or MssCI (especially MssCI, as
these events will record the corruption stack trace), whether warning, error,
or informational, near the date/time of the 17883 errors. Also, review your
System Event log for any disk controller I/O errors or warnings (especially
any related to low disk space on the drives where your FT Catalog exists).
Also, in regard to the scheduled job, does the FT Population (full or
incremental?) complete successfully before you fire off the job again?
You'll see a master merge informational event in the App log that indicates
a successful completion. The SQL errorlogs are not of much help in
troubleshooting FT Population issues, as the App log is where you should
look for more answers.
Note, you can also post FTS related questions to the newsgroup:
microsoft.public.sqlserver.fulltext
Regards,
John
"Denny" <mrdenny@.gamespy.com> wrote in message