@toidiu
Created April 16, 2018 15:39
## mongo queries for updating the new thumbs location with the old thumbs data
```
# Timeline of data migration and dual-writing thumbs data
                event 1              event 2
__________________|____________________|__________________
     region 1     |      region 2      |     region 3
                  |                    |
[--------------old data----------------]
                  [--------------new data----------------]
```
`region 1` was a time when we were writing only to the OLD-DB. `region 2` was a
time when we were dual-writing to both the OLD-DB and the NEW-DB. `region 3` is
when we stopped writing to the OLD-DB and were writing only to the NEW-DB.

During `event 2` we migrated the old thumbs data to the NEW-DB, but not
everything was moved because the thumbs data is large. The task, therefore, is
to migrate the old data in its entirety to the NEW-DB. We used the
DataWarehouse (DWH) for that migration, and it seems the DWH is also missing
data (from a certain point in history).

We can assume that since `event 2` all data has been correctly written to the
NEW-DB. Therefore all we are concerned with is the data in the OLD-DB that is
not in the NEW-DB, which is `region 1`.
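The region boundaries above can be sketched as a small helper. This is only a sketch: the two event timestamps are placeholders I made up, not the real cutover times.

```scala
// Hypothetical event timestamps (epoch millis); the real cutover times
// would come from the deploy history, not from this sketch.
val EVENT_1_DUAL_WRITE_START = 1000L
val EVENT_2_OLD_WRITE_STOP   = 2000L

// Classify a record by when it was written:
//   region 1 -> only in OLD-DB
//   region 2 -> dual-written to both DBs
//   region 3 -> only in NEW-DB
def region(writtenAt: Long): Int =
  if (writtenAt < EVENT_1_DUAL_WRITE_START) 1
  else if (writtenAt < EVENT_2_OLD_WRITE_STOP) 2
  else 3
```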
```
OLD {
  _id : sid
  uPIdH
  tuTrks
  tdTrks
}

NEW {
  sid
  cid
  type
  uPIdH
}
```
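Since the new schema carries an explicit `type` field, one way to picture the OLD-to-NEW mapping is one NEW row per track id, split by thumbs direction. This is a sketch only: the case classes, the `type` values, and the assumption that `tuTrks`/`tdTrks` are lists of cids are mine, not confirmed by the schema above.

```scala
// Hypothetical shapes for the old and new documents.
case class OldRec(sid: String, uPIdH: String, tuTrks: List[String], tdTrks: List[String])
case class NewRec(sid: String, cid: String, `type`: String, uPIdH: String)

// Expand one old record into one new record per track, tagged by direction.
def toNewRecs(o: OldRec): List[NewRec] =
  o.tuTrks.map(cid => NewRec(o.sid, cid, "THUMBS_UP", o.uPIdH)) ++
    o.tdTrks.map(cid => NewRec(o.sid, cid, "THUMBS_DOWN", o.uPIdH))
```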
```
def main() {
  val OLD_DB = ...
  val NEW_DB = ...
  val TIME_BEFORE_DUAL_WRITE = ...
  val batchSize = 100
  var itemOffset = 0

  while (true) {
    // do a paged query so as not to pull in all the data
    val oldData = select data in OLD_DB
      limit = batchSize
      offset = (itemOffset * batchSize)

    // we are done if there is no more data to process
    if (oldData.isEmpty) {
      break
    }

    // keep only the items that actually have thumbs data
    val oldItemList = oldData.filter { data => data.thumbsData.exists }

    // iterate over `oldItemList` and find the items that don't exist in the
    // new data. These are the ones we want to insert into the new data.
    for (oldItem <- oldItemList) {
      // query the new data for items that match the stationID and uPIdH.
      // this will contain all the new data we are concerned with
      val newDataBySid = select data in NEW_DB
        where sid = oldItem._id // _id == sid in the old data
        and uPIdH = oldItem.uPIdH

      // create a set of cids that already exist in the new data
      // for this particular stationID
      val newItemCidSet = newDataBySid.map(_.cid).toSet

      // finally, keep only the old thumbs whose cid does not exist in the
      // new data. These were never migrated over, so we migrate them now.
      val insertItems = (oldItem.tdTrks ++ oldItem.tuTrks)
        .filterNot { thumb => newItemCidSet.contains(thumb.cid) }
        .map { thumb => thumb.copy(lm = TIME_BEFORE_DUAL_WRITE) }

      NEW_DB.insert(insertItems)
    }
    itemOffset += 1
  }
}
```
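The core of the inner loop is a set difference on cids. As a standalone sketch, assuming cids are plain strings:

```scala
// Keep only the old cids that are absent from the new data's cid set.
def missingCids(oldCids: List[String], newCids: Set[String]): List[String] =
  oldCids.filterNot(newCids.contains)
```

Using a `Set` for the new-data cids keeps each membership check effectively constant time, which matters when a station has many tracks.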
@lvauthrin

Looks ok. We'll need to split this to be thumbs up and thumbs down so we can create the proper type.

We have two options here for proceeding. Either we write this in javascript and test it out in non-prod against a user. Or we create a scala script and do it that way. Should be pretty straightforward either way. Probably safer as a Scala script.
