<?xml version="1.0" encoding="utf-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Query Problems</title>
	<atom:link href="http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/</link>
	<description>The weblog of Joshua Eichorn, AJAX, PHP and Open Source</description>
	<pubDate>Fri, 05 Sep 2008 22:21:06 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.1-alpha</generator>
		<item>
		<title>By: Paul M. Jones</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4246</link>
		<dc:creator>Paul M. Jones</dc:creator>
		<pubDate>Wed, 27 Jul 2005 23:09:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4246</guid>
		<description>I know I'm late to the party here, but when I was building the tag system for Solar (http://solarphp.com) I found this resource quite valuable.

http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html

Hope this helps; it may confirm some of your decisions.</description>
		<content:encoded><![CDATA[<p>I know I&#8217;m late to the party here, but when I was building the tag system for Solar (http://solarphp.com) I found this resource quite valuable.</p>
<p><a href="http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html" rel="nofollow">http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html</a></p>
<p>Hope this helps; it may confirm some of your decisions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: There and Back Again &#187; Blog Archive &#187; Tag Filters (Update on Query Problems)</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4241</link>
		<dc:creator>There and Back Again &#187; Blog Archive &#187; Tag Filters (Update on Query Problems)</dc:creator>
		<pubDate>Tue, 26 Jul 2005 16:52:44 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4241</guid>
		<description>[...] From the couple of responses I got to my posting about Query Problems, it looks like there is no great solution to tag filtering. It does look like using a subquery for each tag you want to filter by works reasonably well up to at least hundreds of thousands of tags, so that approach should work just fine for me. [...]</description>
		<content:encoded><![CDATA[<p>[...] From the couple of responses I got to my posting about Query Problems, it looks like there is no great solution to tag filtering. It does look like using a subquery for each tag you want to filter by works reasonably well up to at least hundreds of thousands of tags, so that approach should work just fine for me. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mike.lively</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4240</link>
		<dc:creator>mike.lively</dc:creator>
		<pubDate>Tue, 26 Jul 2005 16:37:01 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4240</guid>
		<description>I am not entirely sure I understand precisely what you are trying to do.

I have a media library system that I wrote which uses tags, as opposed to directories, to categorize the files. I don't have any feature to search across multiple tags, as I just don't see it being necessary for what I want to do. I DO, however, have a feature to search for untagged items, which is close to the same thing, and I think the query will scale alright...

select b.*
from
  bookmark b
  inner join bookmark_tag using(bookmark_id)
  inner join tag using(tag_id)
where
  tag.tag in ('library', 'language:php', 'provides:ajax')
group by
  b.bookmark_id
having
  count(tag.tag) = 3

The variables will of course be the tag list (where clause) and the count number.

Problem is this won't handle any kind of 'OR' filter...

[offtopic]
I wrote the backend of the media library script using Ajax. Your site helped quite a bit in that endeavor, thanks.
[/offtopic]</description>
		<content:encoded><![CDATA[<p>I am not entirely sure I understand precisely what you are trying to do.</p>
<p>I have a media library system that I wrote which uses tags, as opposed to directories, to categorize the files. I don&#8217;t have any feature to search across multiple tags, as I just don&#8217;t see it being necessary for what I want to do. I DO, however, have a feature to search for untagged items, which is close to the same thing, and I think the query will scale alright&#8230;</p>
<p>select b.*<br />
  from<br />
bookmark b<br />
  inner join bookmark_tag using(bookmark_id)<br />
  inner join tag using(tag_id)<br />
where<br />
  tag.tag in ('library', 'language:php', 'provides:ajax')<br />
group by<br />
  b.bookmark_id<br />
having<br />
  count(tag.tag) = 3</p>
<p>The variables will of course be the tag list (where clause) and the count number.</p>
<p>Problem is this won&#8217;t handle any kind of &#8216;OR&#8217; filter&#8230;</p>
<p>[offtopic]<br />
I wrote the backend of the media library script using Ajax. Your site helped quite a bit in that endeavor, thanks.<br />
[/offtopic]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Eichorn</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4238</link>
		<dc:creator>Joshua Eichorn</dc:creator>
		<pubDate>Tue, 26 Jul 2005 05:00:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4238</guid>
		<description>Hmm, well, I guess from your feedback I should just go the multi-subquery route. I'm not planning on using this for anything but my own site, so things should perform fine, since I'm never going to have more than a thousand or so rows in any table.

I also just thought of the MySQL SET datatype (http://dev.mysql.com/doc/mysql/en/set.html). It's horribly MySQL-specific and has a limit of 64 items, which makes it worthless for tagging, but maybe something along the same concept (really just a bitmap) might be the solution, if you can think of a way to make it scale.

I wonder what sites like del.icio.us do; maybe they store all the tags in one text field and use MySQL's FIELD function. Anyhow, it sounds like a hard problem; I'm glad I'm not up to your level of tagging.</description>
		<content:encoded><![CDATA[<p>Hmm, well, I guess from your feedback I should just go the multi-subquery route. I&#8217;m not planning on using this for anything but my own site, so things should perform fine, since I&#8217;m never going to have more than a thousand or so rows in any table.</p>
<p>I also just thought of the MySQL SET datatype (http://dev.mysql.com/doc/mysql/en/set.html). It&#8217;s horribly MySQL-specific and has a limit of 64 items, which makes it worthless for tagging, but maybe something along the same concept (really just a bitmap) might be the solution, if you can think of a way to make it scale.</p>
<p>I wonder what sites like del.icio.us do; maybe they store all the tags in one text field and use MySQL&#8217;s FIELD function. Anyhow, it sounds like a hard problem; I&#8217;m glad I&#8217;m not up to your level of tagging.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Greg Watson</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4237</link>
		<dc:creator>Greg Watson</dc:creator>
		<pubDate>Tue, 26 Jul 2005 04:34:14 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4237</guid>
		<description>I've been working on a similar problem for an app I've been helping to build at work.  Same sort of 3 table structure, but in my case using images and keywords.  I've tried 3 different query types so far:

  * One crazy select with a GROUP and HAVING clause.  This worked, but was pretty slow when searching with more than one keyword.
  * A query with nested subselects for each keyword.  This can be pretty slow with two keywords that have a large set, but small intersection.  One way to speed it up is to do a count on each tag and then order it so that the tag with the fewest bookmarks is nested the deepest (or was it first?).
  * I've also been playing around with mysql's fulltext search.  In your case, add an indexed fulltext column to the bookmarks table and then query against it.  While this was pretty fast, fulltext search really wasn't a good match.

So far I'm sticking with number 2, though I'm not really happy with the performance for edge cases.  In my case the database has 1 million records in the equivalent of your bookmark table, 9 million in the join table, and about 80K in the tag table.  For the most part results are quick (1-3 seconds), but other searches take much longer.

Like you, I can't help but think there's a better way to do it.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve been working on a similar problem for an app I&#8217;ve been helping to build at work.  Same sort of 3 table structure, but in my case using images and keywords.  I&#8217;ve tried 3 different query types so far:</p>
<p>  * One crazy select with a GROUP and HAVING clause.  This worked, but was pretty slow when searching with more than one keyword.<br />
  * A query with nested subselects for each keyword.  This can be pretty slow with two keywords that have a large set, but small intersection.  One way to speed it up is to do a count on each tag and then order it so that the tag with the fewest bookmarks is nested the deepest (or was it first?).<br />
  * I&#8217;ve also been playing around with mysql&#8217;s fulltext search.  In your case, add an indexed fulltext column to the bookmarks table and then query against it.  While this was pretty fast, fulltext search really wasn&#8217;t a good match.</p>
<p>So far I&#8217;m sticking with number 2, though I&#8217;m not really happy with the performance for edge cases.  In my case the database has 1 million records in the equivalent of your bookmark table, 9 million in the join table, and about 80K in the tag table.  For the most part results are quick (1-3 seconds), but other searches take much longer.</p>
<p>Like you, I can&#8217;t help but think there&#8217;s a better way to do it.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Joshua Eichorn</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4236</link>
		<dc:creator>Joshua Eichorn</dc:creator>
		<pubDate>Tue, 26 Jul 2005 00:10:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4236</guid>
		<description>Alan: there is no nested tree, it's all tags.  The current storage and browsing work great; it's just the search that gives me problems. Though pulling the whole thing into the DOM and using XPath would work for querying, I can't see how that would scale any better than just using a different subquery for each filter added.</description>
		<content:encoded><![CDATA[<p>Alan: there is no nested tree, it&#8217;s all tags.  The current storage and browsing work great; it&#8217;s just the search that gives me problems. Though pulling the whole thing into the DOM and using XPath would work for querying, I can&#8217;t see how that would scale any better than just using a different subquery for each filter added.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Alan Knowles</title>
		<link>http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4235</link>
		<dc:creator>Alan Knowles</dc:creator>
		<pubDate>Tue, 26 Jul 2005 00:06:57 +0000</pubDate>
		<guid isPermaLink="false">http://blog.joshuaeichorn.com/archives/2005/07/25/query-problems/#comment-4235</guid>
		<description>I've done this sort of thing for years. In the end, using XML (or HTML lists) for navigation, plus DOM parsing, proved to be the most efficient way to do this. Although DBs can do nested trees, they are a hassle to query and not really that efficient.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve done this sort of thing for years. In the end, using XML (or HTML lists) for navigation, plus DOM parsing, proved to be the most efficient way to do this. Although DBs can do nested trees, they are a hassle to query and not really that efficient.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
