I'm not an index expert, but I think I need help from one.
I have a table with 4.9 million rows. Its PK is a composite key on 2 unique FLOAT columns. The floats are actually long ints; there are no decimals. The 2 fields are a LOW and a HIGH, and together they define a unique numeric range.
Given a number input, I need to retrieve THE row where the input number falls within that row's LOW & HIGH range. Here's essentially what I'm doing:
SELECT * FROM theTable WHERE @.inputNumber BETWEEN RangeLow AND RangeHigh
My question is on the best way to do the indexes -- right now I'm getting results, but the query takes a good 15 seconds. Isn't there a way to make such a search faster, even though there are tons of rows?
Would you expect a search to take that long, in a table with that many rows?
Thanks very much,
Rob
Could you tell how your table looks like, i.e. besides the two float columns, what other columns are available?Also, how many of the rows does the above query normally return on average?
Based on the above information, I might be able to give you some additional suggestions.
Thanks,
Marcel van der Holst
- MSFT|||
Here's additional info on the columns and current indexes:
Columns:
RangeFrom (PK, float, not null)
RangeTo (PK, float, not null)
NameShort (nvarchar(2), null)
...plus 10 more informational NVARCHAR columns, none of which are indexed.
Indexes:
IX_NameShort (Non-Unique, Non-Clustered)
PK_RangeTable (Clustered)
A query should always return the ONE row whose From-To range contains the number passed in.
Thank you for any suggestions,
Rob Wingate
|||if you look at the query plan does it show an index seek or an index scan ?
You might TRY adding a nonclustered index to RangeFrom and to the RangeTo columns and see if it helps.
focus on "SubTree Cost" in the query plan, it's a better indicator than simple duration when determining if you're really improving the query. SubTree goes down, performance improves....!
Cheers,
|||Take a look at the query plan. It should do an index seek. Also, consider adding "top 1" to your query if you know for sure that there should only be one row returned.
|||
The execution plan does the following:
Clustered Index Scan (Country tbl, cost: 49%)
Compute Scalar (Cost: 0%)
Clustered Index Seek (Location tbl, cost: 25%)
Clustered Index Scan (Region tbl, cost: 25%)
Nested Loops (Left Outer Join, cost: 1%)
Nested Loops (Left Outer Join, cost: 0%)
Top (Cost: 0%)
SELECT (Cost: 0%)
I'm not sure whether this info is useful; hopefully it will be.
Thanks again,
Rob
There must be more to the story. If you post the simplified ddl + query, we might be able to offer a better solution.
This example shows that index seek is used for the range lookup.
set showplan_text on
go
declare @.tb table (i float, j float, primary key(i,j))
declare @.i float
select * from @.tb where @.i between i and j
go
set showplan_text off
go
|||
Here's the table:
CREATE TABLE [dbo].[IP2Location](
[IPFrom] [float] NOT NULL,
[IPTo] [float] NOT NULL,
[CountryShort] [nvarchar](2) NULL,
[CountryLong] [nvarchar](64) NULL,
[Region] [nvarchar](128) NULL,
[City] [nvarchar](128) NULL,
[Latitude] [float] NULL,
[Longitude] [float] NULL,
[ZIPCode] [nvarchar](5) NULL,
[TimeZone] [nvarchar](8) NULL,
[ISP] [nvarchar](255) NULL,
[Domain] [nvarchar](128) NULL,
[NetSpeed] [nvarchar](8) NULL,
CONSTRAINT [PK_IP2Location] PRIMARY KEY CLUSTERED
( [IPFrom] ASC,
[IPTo] ASC
) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
END
GO
CREATE NONCLUSTERED INDEX [IX_CountryShort] ON [dbo].[IP2Location]
( [CountryShort] ASC ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
GO
CREATE UNIQUE NONCLUSTERED INDEX [IX_IPFrom_IPTo] ON [dbo].[IP2Location]
( [IPFrom] ASC, [IPTo] ASC ) WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
...and here's the relevant part of the query:
ALTER PROCEDURE [dbo].[s_GetLocationByIP]
@.IPNumber FLOAT
AS
BEGIN
DECLARE @.tblLocation TABLE (
AreaCode VARCHAR(5),
GeographicArea VARCHAR(50),
Country VARCHAR(500),
CountryCode NVARCHAR(2),
Region NVARCHAR(128),
City NVARCHAR(128),
Latitude FLOAT,
Longitude FLOAT,
ZipCode NVARCHAR(5),
TimeZone NVARCHAR(8),
ISP NVARCHAR(255),
Domain NVARCHAR(128),
NetSpeed NVARCHAR(8)
)
INSERT INTO @.tblLocation (AreaCode, GeographicArea, Country, CountryCode, Region,
City, Latitude, Longitude, ZipCode, TimeZone, ISP, Domain, NetSpeed)
SELECT TOP (1) c.Region_str_code AS AreaCode, Region_str_name AS GeographicArea,
Country_str_name AS Country, CountryShort AS CountryCode, Region, City,
Latitude, Longitude, ZipCode, TimeZone, ISP, Domain, NetSpeed
FROM IP2Location ip
LEFT OUTER JOIN Country c ON ip.CountryShort = c.Country_str_code
LEFT OUTER JOIN Region r ON r.Region_str_code = c.Region_str_code
WHERE @.IPNumber BETWEEN IPFrom AND IPTo
I hope this clarifies. Thanks.
Rob
|||49% of the cost is a index scan on the Country table...
Show us the country table...
you need an index on the country table Country_str_Code column and on the IP2Location.CountryShort Column
|||
Here's the Countries table:
CREATE TABLE [dbo].[Country](
[Country_str_code] [varchar](2) NOT NULL,
[Region_str_code] [varchar](100) NOT NULL,
[Country_str_ISO3166_2char_code] [varchar](100) NULL,
[Country_str_ISO3166_3Char_code] [varchar](100) NULL,
[Country_str_ISO3166_numeric_code] [varchar](100) NULL,
[Country_str_type] [varchar](100) NULL,
[Country_str_name] [varchar](500) NULL,
[Country_str_local_name] [varchar](500) NULL,
[Country_str_long_name] [varchar](500) NULL,
[Country_str_local_long_name] [varchar](500) NULL,
[Country_str_controlling_entity] [varchar](500) NULL,
[Country_str_former_name] [varchar](500) NULL,
[Country_str_name_abbreviation] [varchar](500) NULL,
CONSTRAINT [PK_CountryTemp] PRIMARY KEY CLUSTERED
( [Country_str_code] ASC )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [Country.Country_str_code] ON [dbo].[Country]
( [Country_str_code] ASC)WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [Country.Region_str_code] ON [dbo].[Country]
( [Region_str_code] ASC )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]
Thanks,
Rob
|||ip2.countryshort is a "nvarchar" the other table (country) uses a varchar.
I believe this is causing the inefficiencies in the join.
try making them the same data type and see if you get some better index relief.
cheers
|||LEFT OUTER JOIN Country c ON ip.CountryShort = c.Country_str_code
LEFT OUTER JOIN Region r ON r.Region_str_code = c.Region_str_code
WHERE @.IPNumber BETWEEN IPFrom AND IPTo
You'd want to make sure all the columns used in the join and where clause are indexed. Also make sure they're the same datatype and length so there wouldn't be a need for implicit conversion.
|||
Many thanks to Marcel, Gregory, and oj for your suggestions. Changing ip2.CountryShort from nvarchar to varchar cut the query time down.
Best regards,
Rob
Nice.
thought that might be the issue.
make sure those columns are indexed as well
No comments:
Post a Comment