In 2002, when I first interviewed at Microsoft one of the
questions I was asked was "how would you delete duplicate rows from
a [sql] database table?" I don't actually recall what
my answer was and I don't suppose it much matters as I was hired by
Microsoft soon after the interview - though I am no longer employed
by Microsoft. For seven years, give or take, I haven't had
occasion to consider this question a second time, until
today. Naturally, I almost immediately performed a web search
and located at least five different methods for deleting duplicate
records…I'm sure there are many more than five. A
Microsoft KB article suggested what seems a robust but
rather involved solution, and most others are a variant on
this. In the end, as before, Pinal Dave came through with a straightforward
solution that accomplished exactly what I wanted - shown here:
DELETE
FROM someTable
WHERE Id NOT IN
(
SELECT MAX(Id)
FROM someTable
GROUP BY someDupColumn, anotherDupColumn)
If you follow my blog or Twitter
feed you no doubt know that I go to some lengths to avoid
writing SQL - namely by using the SubSonic ORM. To be fair, I
still use SQL databases (MSSQL and MySQL) and have not jumped to an
alternative such as Amazon's SimpleDB or Microsoft's Azure Data Storage. While I
prefer SubSonic, even Microsoft has gotten into the ORM space with
LINQ and there are many other solutions here as well.
Disclaimer: I was once a Certified Oracle Developer and have
created more than a few apps where the business logic was contained
almost completely in stored procedures - those days are over,
thankfully.
In my opinion, as a web developer, I do not need to rely on a
deep SQL knowledge any longer. With the above mentioned
technologies, highly-usable database administration UI's, and a
basic grasp of good data design I believe web developers can be
freed from 'having' to use SQL. What a relief!
For a recent app that's fairly data intensive,
though quite simple, I used no SQL and only SubSonic to interact
with the MySQL backend. The most complex 'query' in the app
is as follows:
ListingDetailCollection listings = new Select().From("ListingDetail")
.Where("target").In(new Select("code")
.From("ListingZip").WhereExpression("Latitude")
.IsLessThanOrEqualTo(HighLatitude)
.And("Latitude").IsGreaterThanOrEqualTo(LowLatitude)
.And("Longitude").IsGreaterThanOrEqualTo(LowLongitude)
.And("Longitude").IsLessThanOrEqualTo(HighLongitude))
.ExecuteAsCollection<ListingDetailCollection>();
And that's it - reduced to a few lines, in-line with the
business logic, and using the same conventions as C# in general -
makes it easy to create, debug, and annotate. I hope I can
encourage others I work with this year to move away from SQL to
SubSonic (or LINQ, etc…) as well. It has the
potential to increase productivity, quality, and
sophistication.
Let me know what you think.
-Paul