Basic ranked searching for those that don't want to use FULLTEXT

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
13 years ago
So this is my 3rd posting on searching on Nop_Commerce.  Like many of you I probably chased my tail a bit on figuring out that although FULLTEXT is what we all want we really can't use it to its full extent easily. The problem is there are several tables, think variants, categories, etc, that need to be searched.  So combining a meaningful score that we can sort results by isn't very practical.

there are two solutions:

one create a new table called searchdata.  which is populated by triggers and simply contains all the fields we want to search by from other tables.  Indexed by the common ProductID tag.

Now full index that and use that as part of the return search query.


OR.

right a simple Function that gives us some sort of score that we can use right away and forgo the full text.  you will definitely lose the functionality of fulltext but you will get a score that you can tweak.  More than likely you will get proper results.

lets face it most of our sites are not google so they don't need google style searching.  I just need the basics.

So here is my solution for now. (still needs tweaking).  Two functions that I call to return a score that I can sort by:

CREATE FUNCTION [dbo].[Nop_Tokenize_String]
(
  @astring   nvarchar(max)
)
RETURNS nvarchar(max)
with execute as caller
AS
BEGIN

-- first we need to tokenize the search string and search against.  to do so we need to delete all extra spaces
-- and other non alpha chargs.
declare @i int=0
declare @j int=0
declare @valid varchar(37)='0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ '
declare @NSearchString nvarchar(max)=''

set @astring=upper(@astring)

while @j<LEN(@astring)
Begin
set @j=@j+1
if CHARINDEX(SUBSTRING(@astring,@j,1),@valid)>0  set @NSearchString=@NSearchString + SUBSTRING(@astring,@j,1) else set @NSearchString=@NSearchString + ' '
End

while CHARINDEX('  ',@NSearchString)>0
begin
set @i=CHARINDEX('  ',@NSearchString)
if @i>0 set @NSearchString=REPLACE(@NSearchString,'  ',' ')
end


--Now NSearchString has been normalized.  No double spaces, no punctuation marks and all words/tokens are seperated by one space.

return @NSearchString


END


and the main scoring function:

create FUNCTION [dbo].[Nop_Search_Rank]
(
  @SearchString   nvarchar(max),
  @ToSearchString nvarchar(max)
)
RETURNS int
with execute as caller
AS
BEGIN

-- first we need to tokenize the search string and search against.  to do so we need to delete all extra spaces
-- and other non alpha chargs.

set @SearchString=dbo.Nop_Tokenize_String(@SearchString)
set @ToSearchString=dbo.Nop_Tokenize_String(@ToSearchString)
declare @Temp_SS nvarchar(max) = @SearchString
declare @Temp_TSS nvarchar(max)= @ToSearchString
declare @i int
declare @j int
declare @score int =0
declare @token nvarchar(200) =''
-- both strings are normalized for searching.
while LEN(@Temp_SS)>0
begin
set @i=CHARINDEX(' ',@Temp_SS)
if @i>0
  begin
    set @token=substring(@temp_ss,1,@i-1)
    set @Temp_SS=SUBSTRING(@temp_ss,@i+1,999999)
  end
else
  begin
    set @token=@temp_SS
    set @Temp_SS=''
  end
-- how many times does it exist in the ToSearchString.  The more times it shows the more valid the score.
set @j=0
while CHARINDEX(@token,@ToSearchString,@j+1)>0
begin
  set @j=CHARINDEX(@token,@ToSearchString,@j+1)
  if @j>0 set @score=@score + len(@token)
end
end

return @score

END

the score function would be called in Nop_ProductLoadAllPaged.
you would use it in the --search standard content area.
--something like
or (nop_Search_Rank(@keywords,p.name + ' ' + pv.name + ' ' + p.shortdescription <and more etc>) >0)

In the order by clause (which is a bunch of case statements)
you could add at the end
CASE WHEN @KEYWORDS<>''
Then nop_Search_Rank(@keywords,p.name + ' ' + pv.name + ' ' + p.shortdescription <and more etc>) END


so the bad - it is called twice but I did not want to include it in the select call since not all will qualify to be tested in the first place.  It is taking a large amount of data, potentially, which could be a cpu/server killer.

also I am sure some of my code can be improved.  Also the test should be ran both ways.
meaning swap the @keywords and the "to be searched" text around.  This will take care of plurals and other forms of the words.  this will return results with words contained in others.  i.e.  after would hit on rafter.  That could easily be fixed by simply adding a space before or after the search token.

If you are a smaller shop, like me, this should work.


So anyone else got something like this.  maybe even better or a better idea?

Mike
13 years ago
I just implemented on my site with some tweaks and it works great.
at least for me.

If any one is interested i am happy to share the simple code.
This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.