-
Notifications
You must be signed in to change notification settings - Fork 1
MongoDB Index Benchmarking
I probably don't need to sell you on this, but in case I do.
There are a lot of times that I want to know, "what was the last scan of 'mail1'?". To simulate the importance of indexes I took a scan with 184 hosts and added it to MongoDB until I had 50,000+ records. (Pretend you scan 1,000 hosts a week for a year, it adds up.)
mongos> db.scans.count();
51520
The catch was, every time I entered a host I used a random, scanned_on
date.
date = Time.at(rand * Time.now.to_i)
time = Time.utc(date.year, date.month, date.day)
host_details["scanned_on"] = time
mongos> db.scans.find(
{ "hostname": "mail1" },
{ _id: 0, hostname: 1, scanned_on: 1 }
).sort({scanned_on: -1})
{ "scanned_on" : ISODate("2013-02-25T00:00:00Z"), "hostname" : "mail1" }
{ "scanned_on" : ISODate("2013-02-12T00:00:00Z"), "hostname" : "mail1" }
{ "hostname" : "mail1", "scanned_on" : ISODate("2013-01-13T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2012-12-16T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2012-11-29T00:00:00Z") }
{ "scanned_on" : ISODate("2012-04-13T00:00:00Z"), "hostname" : "mail1" }
{ "hostname" : "mail1", "scanned_on" : ISODate("2011-12-05T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2011-10-26T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2011-07-22T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2011-05-15T00:00:00Z") }
{ "scanned_on" : ISODate("2011-01-21T00:00:00Z"), "hostname" : "mail1" }
{ "hostname" : "mail1", "scanned_on" : ISODate("2010-12-12T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2010-03-19T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-12-10T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-10-02T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-07-25T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-05-11T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-03-24T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2009-03-06T00:00:00Z") }
{ "hostname" : "mail1", "scanned_on" : ISODate("2008-10-24T00:00:00Z") }
Looking at the query profile:
mongos> db.scans.find(
{ "hostname": "mail1" },
{ _id: 0, hostname: 1, scanned_on: 1 }
).sort({scanned_on: -1}).explain();
{
"cursor" : "BasicCursor",
"isMultiKey" : false,
"n" : 280,
"nscannedObjects" : 51520,
"nscanned" : 51520,
"nscannedObjectsAllPlans" : 51520,
"nscannedAllPlans" : 51520,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 3628,
"indexBounds" : {
},
"server" : "devmongodb001:30000",
"millis" : 3628
}
I had to scan 51520 objects to find 280. The only index I used was on _id
(otherwise "cursor" would be set to something other than "BasicCursor"). Because we don't have any indexes this query took 3.6 seconds to run.
We can do better
I'm searching on hostname
, so why not create an index on hostname
?.
mongos> db.scans.ensureIndex({hostname: 1});
mongos> db.scans.find( { "hostname": "mail1" }, { _id: 0, hostname: 1, scanned_on: 1 }).sort({scanned_on: -1}).explain();
{
"cursor" : "BtreeCursor hostname_1",
"isMultiKey" : false,
"n" : 280,
"nscannedObjects" : 280,
"nscanned" : 280,
"nscannedObjectsAllPlans" : 381,
"nscannedAllPlans" : 381,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 106,
"indexBounds" : {
"hostname" : [
[
"mail1",
"mail1"
]
]
},
"server" : "devmongodb001:30000",
"millis" : 106
}
Outstanding! My index means that I only have to scan 280 objects (thank you B-trees) and my query performance is now 106ms. This is a 92% improvement for the low, low cost of a 1.4 MB index.
We can do better
We can create a compound index on hostname
and scanned_on
.
mongos> db.scans.ensureIndex( { "hostname": 1, "scanned_on": -1 } );
The -1 ensures that the scanned_on
portion of the index is done in reverse order. For date / time fields this is good because your latest entry will always be the first one in the index.
mongos> db.scans.find( { "hostname": "mail1" }, { _id: 0, hostname: 1, scanned_on: 1 }).sort({scanned_on: -1}).explain();
{
"cursor" : "BtreeCursor hostname_1_scanned_on_-1",
"isMultiKey" : false,
"n" : 280,
"nscannedObjects" : 0,
"nscanned" : 280,
"nscannedObjectsAllPlans" : 0,
"nscannedAllPlans" : 280,
"scanAndOrder" : false,
"indexOnly" : true,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 1,
"indexBounds" : {
"hostname" : [
[
"mail1",
"mail1"
]
],
"scanned_on" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "devmongodb001:30000",
"millis" : 1
}
We still have to scan 280 objects, but note the "indexOnly" attribute is set to true. Because this query is only querying on indexed fields and only displaying indexed fields we don't even have to retrieve the documents. All the information we want is in the index. Our query time is reduced to 1ms. This only increases our index size by .5mb.
Displaying a non-indexed field nukes the indexOnly improvements:
mongos> db.scans.find( { "hostname": "mail1" }, { _id: 0, hostname: 1, ip: 1, scanned_on: 1 }).sort({scanned_on: -1}).explain();
{
"cursor" : "BtreeCursor hostname_1_scanned_on_-1",
"isMultiKey" : false,
"n" : 280,
"nscannedObjects" : 280,
"nscanned" : 280,
"nscannedObjectsAllPlans" : 280,
"nscannedAllPlans" : 280,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 2,
"indexBounds" : {
"hostname" : [
[
"mail1",
"mail1"
]
],
"scanned_on" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "devmongodb001:30000",
"millis" : 2
}
but it's still pretty damn good.
As a side benefit, queries on hostname
with no sorting on scanned_on
still use this index.
We cannot do better.
- Plan your indexes.
- Your indexes need to support your queries.
- Covered queries are awesome