[SQLiteJDBC] Why is this query answered instantly by the sqlite3 application, but slowly using sqlitejdbc?
Gary Helmstetter
gh at theworld.com
Mon Mar 1 16:41:08 EST 2010
Brain,
Thank you for your interest and help! I've been working on this
continuously since receiving your note. Detailed responses inserted
below, in the hope that the day I've spent (all told) on this will not
have to be repeated by others who run into it!
>
> On Sun, Feb 28, 2010 at 4:20 PM, Gary H <gh at theworld.com> wrote:
> I'm using Mac OS X v10.4.11 with the standard Java (1.5.0_19) and
> sqlite3 (3.1.3) that came it. (Yeah, a little old.)
>
> I have a sqlite3 database with a table with a few hundred thousand
> rows, with "name" and "stored" text columns. Name is one of six (so
> far) short strings; stored is a 19-character standard date-time
> string. Each column has a stand-alone index. (Essentially, the table
> tracks events in the life of 6 entities; its other columns describe
> the individual events.) The following query:
>
> > select distinct name from myTable where stored >= date("now");
>
> lists the relevant names instantly when I perform it through the Mac
> OS X "sqlite3" application. I get the same results, but it takes over
> 2 seconds to find each name (total of about 15 seconds), when I do the
> exact same query in the obvious way in my application:
>
> > String q = "SELECT DISTINCT name FROM myTable " +
> > "WHERE stored >= DATE('now');" ;
> > try {
> > Statement stmt = conn.createStatement();
> > ResultSet rs = stmt.executeQuery(q);
> > while (rs.next()) {
> > final String s = rs.getString("symbol");
> > System.err.println("Got " + s);
> > }
> > rs.close();
> > }
> I've tried this with both sqlitejdbc-v054 and sqlitejdbc-v055 . No
> perceptible difference.
>
> A wild speculation, based on the way it behaves, is that when the
> sqlite3 application does the query sqlite uses the indexes optimally
> (ie. applying "stored" first), but for some reason this doesn't happen
> when sqlitejdbc does the same query.
>
> Is this a known deficiency? If not, anyone have any suggestions how to
> attack it ?
>
On Mar 1, 2010, at 11:51 AM, Brain . wrote:
> With such a large time difference, it would be my guess that sqlite
> is not using the index you want it to use. It would be surprising
> that sqlitejdbc would make a difference.
That's my feeling as well. And with 30+ years of dbms experience,
including internals, under my belt, it goes beyond surprising; it
doesn't seem to make any sense at all. Which means it's probably some
kind of misbehavior at a low level - like a version skew, compiler
bug, etc.
> It's pretty much a small wrapper over the native functions.
Well, yes... but the way it's delivered and installed, it effectively
contains a copy of sqlite... which may not (in this case, IS not)
closely synchronized with that which created the database.
>
> Are you using the exact same database in both programs?
> Of course, if you aren't then there could be differences in their
> indexes.
Yes. Positively the same database file.
>
> Sqlitejdbc uses sqlite 3.6.14.2. So maybe that is why there is a
> difference.
Indeed. I had been assuming that newer versions of sqlitejdbc would
work fine with older databases, but of course that's not necessarily
the case!
Let me do some research... OK, I dug through the Crawshaw commit logs
on github to determine the sqlitejdbc--sqlite version correspondences
starting with sqlite 3.4.0. It was a bit of work, so in case it has
value to anyone ever again:
commit date sqlitejdbc sqlite
2007-06-18 034 3.4.0
2007-12-30 038 3.5.4
2008-02-13 040 3.5.6
2008-04-16 044 3.5.8
2008-05-29 047 3.5.9
2008-08-12 053 3.6.1
2008-12-04 054 3.6.3
2009-04-06 054 3.6.10
2009-06-09 055 3.6.14.2
2009-06-09(2) 056 3.6.14.2
In the discussion below, for readability I call "Mac OS X 10.4" Tiger,
and "Mac OS X 10.5" Leopard, and when I refer to "sqlite3" I'm talking
about the Mac OS X command-line SQL interpreter application. Tiger
has sqlite3 v3.1.3, and Leopard has sqlite3 v3.4.0 .
I originally created my databases using Tiger's sqlite3. I could
create one with Leopard's and see if that works any better. Taking
this idea further, I could theoretically create my databases using my
Java program, which would use sqlitejdbc v056 and thus sqlite
3.6.14.2. It seems highly likely the query would then work properly
through sqlitejdbc. However, in my particular situation there are
three problems with that:
(1) It would be a bit of a structural hassle; my project creates its
databases, and does various updates, by feeding a database and a text
file of sql commands to the sqlite3 app. To preserve that scheme I'd
have to write a java program that provided the required sqlite3
functionality. That's not too hard....
(2) I'd then have a database that could probably only be accessed by
my Java programs, since the Tiger or Leopard sqlite3 cannot be
expected to be reliably forward-compatible with databases created by
newer ones eg. 3.6.* . That is a show-stopper; I really need to be
able to use Leopard's sqlite (and preferably Tiger's) to do ad-hoc
queries and data maintenance.
(3) Even just using Leopard's sqlite3 to create the databases, I might
have problems using them on my older iMacs; they have Tiger and I
don't think they can be upgraded to Leopard.
>
> You have several options to debug this.
> 1. You can try using the EXPLAIN command to try to figure out which
> index it is using. You can execute this as a query using
> sqlitejdbc. It might be a little tricky to get the output of it,
> but it works.
I would have tried EXPLAIN if I thought it was possible to get its
results through sqlitejdbc. But assuming I did that and it confirmed
that it used different (or no) indexes with sqlitejdbc, where would we
go from there? (Answer below!)
> 2. You can try using the INDEXED BY clause in your query to force
> it to use an index that you want it to use or give an error why it
> can't.
> http://www.sqlite.org/lang_indexedby.html
Great idea, that should either speed up the query (thus providing at
least a temporary work-around), or give an error message that will
shed more light on the problem. Let me try it...
Eureka! I added "INDEXED BY <name index>" after the table name in the
query, and that modification caused it to run just as fast as when
executed by sqlite3.
> 3. http://www.sqlite.org/optoverview.html
> This page has several helpful sections such as the sqlite ANALYZE
> command which will generate statistics about tables which helps it
> choose which indexes it should use.
Sadly, after further experimentation:
- Using "ANALYZE" to add the sqlite_stats1 table to a database
created by Tiger sqlite3 doesn't help
- Creating the database using Leopard sqlite3 doesn't help
- Performing ANALYZE on a database created using Leopard sqlite3
doesn't help
So it appears that the general problem is that new versions of sqlite
(which are bundled into sqlitejdbc) do not optimize queries properly
when operating against databases created by older versions of sqlite.
In my particular case I have a number of solutions available, all with
uncomfortable issues, and the least uncomfortable is to add "INDEXED
BY" clauses whenever I run into this problem. The only other solution
I can see is to download a "modern" sqlite3 source, build it locally,
and put a private copy (so as not to screw up anything else they might
use sqlite for) on the macs used by this project. That's not much
different than writing my own Java+sqlitejdbc implementation of the
sqlite3 subset I need.
> I hope this helps.
Indeed it did. Thanks again!
- Gary H
More information about the SQLiteJDBC
mailing list