Forked from cscotta/Output.txt
Created November 6, 2011 20:03
# note that I see *EXACT* same results as the original post with the original code.
# The fix below makes that not happen anymore (use snapshot mode)
# mongo.jar on my machine is the shipped version of 2.7.0, so same exact Java Driver of original test
# ( I was lead on the 2.7.0 release and did the build/release )
b-mac mongodb/mongo-java-driver ‹master*› » scala -cp mongo.jar Repro.scala
Inserting canary...
Inserting test data...
Paging through records...
Spotted the canary!
Updating canary object...
b-mac mongodb/mongo-java-driver ‹master*› »
import com.mongodb._
import java.util.UUID

// Connect to Mongo
val mongo = new Mongo("localhost", 27017)
val db = mongo.getDB("repro_databoxor")
db.dropDatabase() // start clean, so a duplicate can't come from an earlier run having inserted the data already
val collection = db.getCollection("repro")
var canarySightings = 0

// Insert our "canary" object.
println("Inserting canary...")
val canary = new BasicDBObject()
canary.put("name", "canary")
canary.put("value", "value")
collection.insert(canary)

// Insert 100,000 other objects.
println("Inserting test data...")
for (i <- 1 to 100000) {
  val doc = new BasicDBObject()
  doc.put("name", UUID.randomUUID.toString)
  doc.put("value", UUID.randomUUID.toString)
  collection.insert(doc)
}

// The function we'll call to operate on records returned from the DB.
def shipOrderToCustomer(doc: DBObject): Unit = {
  if (doc.get("name") == "canary") {
    canarySightings += 1
    println("Spotted the canary!")
    if (canarySightings > 1) println("Whoops, shipped the same order multiple times!")
  }
}

// In one thread (or process or machine, etc.), read through records and act on them.
val reader = new Thread(new Runnable {
  def run(): Unit = {
    println("Paging through records...")
    // Switch cursor to snapshot mode to ensure we don't see duplicates due to update moves
    val cursor = collection.find().snapshot()
    while (cursor.hasNext) shipOrderToCustomer(cursor.next())
  }
})

// In another thread (or process, machine, etc.), update one of the records.
val updater = new Thread(new Runnable {
  def run(): Unit = {
    println("Updating canary object...")
    val query = new BasicDBObject()
    query.put("name", "canary")
    val newDoc = new BasicDBObject()
    newDoc.put("name", "canary")
    // Build a much larger value so the updated document no longer fits in its original space.
    var value = ""
    for (i <- 1 to 1000) value += UUID.randomUUID.toString
    newDoc.put("value", value)
    collection.update(query, newDoc)
  }
})

// Assumption: the lines that start the threads are missing from this copy.
// Reader first, then updater, to match the order of the output above.
reader.start()
updater.start()
reader.join()
updater.join()
For what it's worth, you'll probably be much happier using Casbah (the Scala driver) instead of the raw Java driver from Scala; proper iterators and support for native Scala types make life much easier.

As for your issue with the record showing up more than once, it looks like a cursor issue. By default, query results are not snapshotted. Your update most likely makes the document exceed its allocated space in the on-disk data file, so it gets relocated to the end. A cursor opened before the update can then return the same record twice: it already read the document at its original location, and it passes over it again at the new one.

An 11-character change fixes this problem: append `.snapshot()` to the `find()` call (see above).
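
To make the relocation effect concrete, here is a toy in-memory sketch in Scala. It does not touch MongoDB at all: `Doc`, `CursorMoveSketch`, and `scan` are made-up names, the "data file" is just a buffer of slots, and snapshot mode is approximated as "skip ids that were already returned". How the server actually implements snapshot mode is not modeled here.

```scala
import scala.collection.mutable

// Toy model only. None of these names come from the MongoDB driver.
final case class Doc(id: Int, value: String)

object CursorMoveSketch {
  /** Scans the slots in storage order. Right after the middle slot has been
    * read, the document `movedId` "grows" and is relocated to the end.
    * Returns the ids in the order the cursor handed them out. */
  def scan(docCount: Int, movedId: Int, snapshot: Boolean): Vector[Int] = {
    val slots = mutable.ArrayBuffer.tabulate[Option[Doc]](docCount)(i => Some(Doc(i, "v")))
    val seen = mutable.Set.empty[Int]
    val visited = Vector.newBuilder[Int]
    var pos = 0
    while (pos < slots.length) {
      slots(pos).foreach { doc =>
        // Assumed stand-in for snapshot mode: never return the same id twice.
        if (!snapshot || seen.add(doc.id)) visited += doc.id
      }
      if (pos == docCount / 2) {
        // Simulated concurrent update: the document outgrows its slot,
        // so its old slot is freed and it is appended at the end.
        val from = slots.indexWhere(_.exists(_.id == movedId))
        slots(from) = None
        slots += Some(Doc(movedId, "v" * 1000))
      }
      pos += 1
    }
    visited.result()
  }
}
```

Calling `CursorMoveSketch.scan(10, 0, snapshot = false)` returns id 0 twice (once at its original slot, once at the end), while the same call with `snapshot = true` returns each id exactly once. That is the same shape as the canary being "shipped" twice in the original repro.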