There and Back Again

Query Problems

I’ve been attempting to add some filters to my AJAX resource list. I’ve made an attempt an added them, but they don’t work the way I would like. So lets say your on the library list, a filter that would make a lot of sense, would be to say show me every library thats written in PHP and Provides AJAX. Now if you goto that link you’ll get every library thats written in PHP or Provides AJAX. Now I think the UI worked out pretty well but the or thing stinks, and that all comes down to database queries.

So the database is 3 tables:

Bookmarks
  bookmark_id
  user_id
  url
  ...

bookmark_tag
  bookmark_id
  tag_id

tag
  tag_id
  user_id
  tag

So I end up with a that is basically:

select b.* 
  from 
bookmark b 
  inner join bookmark_tag using(bookmark_id) 
  inner join tag using(tag_id) 
where 
  tag.tag = 'library' and 
  b.bookmark_id in(
    select bookmark_id from bookmark_tag inner join tag using(tag_id) where tag.tag in ('language:php','provides:ajax')
  )

This works great but I export for multiple filter tags means or, since they are in an in list, or subquery for filter which seems like it would be horrible for performance.

Am I just missing a simple solution to the problem or is a subquery for each filter the only option?

Btw: I’m using mysql 4.1.something

7 thoughts on “Query Problems

  1. Alan Knowles

    Done these sort of things for years, in the end, using XML (or HTML lists) for navigation, and DOM parsing, proved to be the most efficient way to do this.. although DB’s can do nested tree’s they are a hasstle to query and not really that efficient..

  2. Joshua Eichorn Post author

    Alan: there is no nested tree, its all tags. The current storage and browsing works great its just the search that gives me the problems, and though pulling the whole thing into the dom and using XPath would work for querying, I can’t see how that would would scale any better then just using a different subquery for each filter added.

  3. Greg Watson

    I’ve been working on a similar problem for an app I’ve been helping to build at work. Same sort of 3 table structure, but in my case using images and keywords. I’ve tried 3 different query types so far:

    * One crazy select with a GROUP and HAVING clause. This worked, but was pretty slow when searching with more than one keyword.
    * A query with nested subselects for each keyword. This can be pretty slow with two keywords that have a large set, but small intersection. One way to speed it up is to do a count on each tag and then order it so that the tag with the fewest bookmarks is nested the deepest (or was it first?).
    * I’ve also been playing around with mysql’s fulltext search. In your case, add an indexed fulltext column to the bookmarks table and then query against it. While this was pretty fast, fulltext search really wasn’t a good match.

    So far I’m sticking with number 2, thought I’m not really happy with the performance for edge cases. In my case the database has 1 million records in the equivalent of your bookmark table, 9 million in the join table, and about 80K in the tag table. For the most part results are quick (1-3) seconds, but other searches take much longer.

    Like you, I can’t help but think there’s a better way to do it.

  4. Joshua Eichorn Post author

    Hmm well I guess from your feedback I should just go the multi-subquery route. Im not planning on using this for anything but my own site so things should perform fine since im never going to have more then a thousand or rows in any table.

    I also just thought of the mysql set datatype (http://dev.mysql.com/doc/mysql/en/set.html) its horrible mysql specific and has a limit of 64 items which makes it worthless for tagging, but maybe something along the same concept (really just a bitmap) might be the solution, if you can think of a way to make it scale.

    I wonder what people like del.ico.us do, maybe store the tags all in one text field and mysql’s FIELD function. Anyhow it sounds like a hard problem i’m glad im not up to your level of tagging.

  5. mike.lively

    I am not entirely sure I understand precisely what you are trying to do.

    I have a media library system that I wrote which uses tags as opposed to directories to catagorize the files. I don’t have any feature to search across multiple tags as I just don’t see it being necessary for what I want to do. I DO however have a feature to search for untagged items, which is close to the same thing and I think the query will scale alright…

    select b.*
    from
    bookmark b
    inner join bookmark_tag using(bookmark_id)
    inner join tag using(tag_id)
    where
    tag.tag in (‘library’, ‘language:php’, ‘provides:ajax’)
    group by
    b.bookmark_id
    having
    count(tag.tag) = 3

    The variables will of course be the tag list (where clause) and the count number.

    Problem is this won’t handle any kind of ‘OR’ filter…

    [offtopic]
    I wrote the backend of the media library script I wrote utilizing ajax. Your site helped quite bit in that endeavor, thanks.
    [/offtopic]

  6. Pingback: There and Back Again » Blog Archive » Tag Filters (Update on Query Problems)