The first alternative technique doesn't use a WHERE clause to filter out our integer values. Let's take a look at some of those queries and their execution results. I mentioned earlier that there were several queries that outperform our basic AND and NOT IN queries (on this server on this day). So, I have managed to convince myself that, despite the two execution times listed above, these two queries are, indeed, the same query as far as SQL Server is concerned - at least on this day, on this server, for these queries (I still gravitate toward the comforting ambiguity of "it depends"). The results regularly go back and forth, much like the heads and tails of a coin toss. I ran the same series of tests on another occasion and the NOT IN query consistently outperformed the AND query. After 100 executions of each query, it seems that execution times for the AND query tend to be lower than those for the NOT IN query (conversion overhead, maybe?): It turns out that SQL likes the AND so much, it converts the NOT IN() clause into a series of AND clauses. |-Compute Scalar(DEFINE:(=CONVERT_IMPLICIT(int,0))) Before running a lot of iterations, let's look at snippets of the query plan text so we know SQL is doing the same thing under the covers for each query. The NOT IN() certainly wins for conciseness. We'll start with the NOT IN query, then follow up with the AND query: PRINT 'Populating sample table (1.000.000 rows. (filterCriterion_sv int identity(1,1) PRIMARY KEY CLUSTERED NOT NULL) The test results were less conclusive at 10,000 rows (with one notable exception), but the trends we see at 1,000,000 rows are clearly discernable by the time we have 100,000 rows. Just to make our queries lengthy enough to be nicely measurable, I arbitrarily chose to insert 1,000,000. Now that we know we're dealing with integer data, we'll set up a table with an integer column, then we'll set up a timing and iteration framework for tracking our query times over multiple executions. The original question was accompanied by a snippet of a WHERE clause evaluating the inequality of a column and four integer values: With all that in mind, let's set off to find out if NOT IN is quicker than a "bunch of " and see a few methods that improve in their performance. I also discovered several other methods (written by people much smarter than myself) that were significantly quicker then either of the two methods I set out to test (the quickest method I tested was a full 35% faster than the quicker of the original two). In actuality, I did find that one of the queries typically outperformed the other. The query engine takes both of these queries and performs them with the exact same sequence of events. This is because SQL is a declarative language, meaning: you tell the computer what you want, not how to get it. It turns out that they are actually the same query, and it should make absolutely no difference. The answer this time was a bit unexpected. This time, however, I thought I'd do some investigation to see if, in fact, we should prefer one method over the other in this case. A coworker recently asked me which was "more efficient - a bunch of or a NOT IN clause?" The answer, I assumed, like almost all things relating to databases, is "it depends." The nice thing about that answer is that it not only camouflages my vast stores of ignorance, it is also quite often true.
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |