Easy Fuzzy Logic with MySql – The end of “no results found”

As a web programmer I ran into the problem when running a complicated (user) search on Mysql that the results are too strict, and thus giving the well known error “no results found”. While good (although not perfect) results exist!

The problem

When a traditional search query is initiated, sql queries are being generated in the terms of:

User search: where tv_manufacturer=”sony” and tv_description =”%widescreen%” and tv_price < 1000;

A user is asking for a Sony television AND that is widescreen AND less then 1000 dollar. This will show very accurate results. But limits the opportunities when (a best matching) TV is $1050. The users would be okay with paying $50 more in real life. But our query won’t allow it. We want to have that (almost perfect match) results shown!

This query can be rewritten by replacing the AND with OR in the query, but by using OR we get inaccurate results because results will show any TV below 1000 dollar OR any Sony OR any widescreen - useless.

The good news is that we can solve this without having to ask a user the factual and nerdy: WIDESCREEN AND (SONY OR 1000 DOLLAR) – way to difficult.

The answer is in what is named ‘fuzzy logic’. Fuzzy logic is more natural and (semi-) intelligent by mathematical logarithms:

User search: a preferably Sony TV with widescreen support for more or less a 1000 dollars, I prefer less. Please.

A few specialists software company’s offer fuzzy logic software, but this is highly tailored to the specific needs of the system.
But Mysql has a solution, with a few hacks will result in accurate results.

The solution:

The solution is to be found in the “MATCH AGAINST” function of Mysql. It is a text matching system where you can add your preferences, and the query gives points to indicate the score in matching.
Very few people use this, maybe because they are disappointed that it is only matching text. But in this post I will show you how to also integrate a (in the real world less strict) demand like: less then $1000.

We do this by encoding the numbers to a word. In this case the TV price of our tv in the database will be encoded to unique words like “pricemaxthousand”, etc.

All the features of the TV are being stored in a new (text only) column named encodedsqlrow.
So we get this: encodedsqlrow = “sony widescreen pricethousandtotwothousand diagonalthirtyinch”.

With the match against function we can also search “IN BOOLEAN MODE”. This will add ‘preferences’ to every search demand (word) in our query.

The preferences you can give to a demand (word) are in the order of:
+ = Obligated
> = Important
~ = More or less important
- = Without

And last but not least, we can retrieve a score with every results. So the most accurate results can be listed at the top.

With all this together we (a user) can create a search query that will results in more natural human-like picked results.

Creating our query:

if($demandpricemax)

< 1000)
$encodedsearch = “>

sony +widescreen ~pricemaxthousand”;

Getting the score:

Select tv_manufacturer, MATCH (encodedsqlrow) AGAINST (’$encodedsearch’ IN BOOLEAN MODE) as score

Setting the match search:

WHERE MATCH (encodedsqlrow) AGAINST (’$encodedsearch’ IN BOOLEAN MODE) ORDER BY score DESC

Example Page – integrated:
For a dutch website I made this function so it matches all studies (1800) against the many demands of a to-be-student. Like he could say: I am searching for a study obligated in Amsterdam with more or less important in the economic field with important average workload important mostly female on a more or less important university.
Many demands, and this will result in accurate results that include studies in Amsterdam although it has mostly male students.

Have any questions or want to bash this text: email address is on the right hand side of your screen.

Note: the database column (encodedsqlrow) must have an FULLTEXT index (via phpMyAdmin the blue “T” the at ‘actions’. This will make it searchable for the MATCH AGAINST function. Else it won’t work.

Sources:

http://en.wikipedia.org/wiki/Fuzzy_logic
http://www.seattlerobotics.org/encoder/mar98/fuz/flindex.html
http://www.wcc.nl/
http://www.kiesjestudie.nl/l-studietest.html
http://dev.mysql.com/doc/refman/5.0/en/fulltext-boolean.html
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html

Posted by on .