## Mongo queries for updating the new thumbs location with the old thumbs data
```
# Timeline of data migration and dual-writing thumbs data

        event 1                                event 2
__________________________________________________________________
| region 1        | region 2                  | region 3          |
|                 |                           |                   |
  [------------old data----------------------]
                  [------------new data--------------------]
```
`region 1` was a time when we were only writing to the OLD-DB. `region 2` was a
time when we were dual-writing to both the OLD-DB and the NEW-DB. `region 3` is
when we stopped writing to the OLD-DB and wrote only to the NEW-DB.

During `event 2` we migrated the old thumbs data to the NEW-DB, but not
everything was moved because the thumbs data is large. The task is therefore to
migrate the old data in its entirety to the NEW-DB. During that migration we
used the DataWarehouse (DWH), and it appears the DWH is also missing data (from
a certain point in history).

We can assume that since `event 2` all data has been correctly written to the
NEW-DB. Therefore all we are concerned with is the data in the OLD-DB that is
not in the NEW-DB, i.e. `region 1`.
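The region boundary above can be expressed as a tiny predicate: a document
belongs to `region 1` (and needs reconciling) only if it was last modified
before dual-writing began at `event 2`. This is a minimal sketch; the use of
epoch-millisecond timestamps and the parameter names are assumptions.

```scala
// Sketch, assuming last-modified timestamps in epoch millis.
// A document is in region 1 iff it was written before dual-writing
// began (event 2); only those documents need reconciling.
def isRegionOne(lastModified: Long, dualWriteStart: Long): Boolean =
  lastModified < dualWriteStart
```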
```
OLD {
  _id : sid
  uPIdH
  tuTrks
  tdTrks
}

NEW {
  sid
  cid
  type
  uPIdH
}
```
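The two document shapes above could be modeled as Scala case classes. This is
a hypothetical sketch: the field names come from the schemas, but all of the
types (strings, lists of cid strings) are assumptions.

```scala
// Hypothetical model of the OLD-DB document; types are assumptions.
case class OldThumbs(
    _id: String,          // station id (sid)
    uPIdH: String,        // user profile id hash
    tuTrks: List[String], // thumbs-up track cids
    tdTrks: List[String]  // thumbs-down track cids
)

// Hypothetical model of the NEW-DB document; one row per thumb.
case class NewThumb(
    sid: String,
    cid: String,
    `type`: String, // thumb type, e.g. "up" or "down" (assumed labels)
    uPIdH: String
)
```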
```
def main() {
  val OLD_DB = ...
  val NEW_DB = ...
  val TIME_BEFORE_DUAL_WRITE = ...
  val batchSize = 100
  var itemOffset = 0 // mutable: advanced after each page

  while (true) {
    // do a paged query so as not to pull all the data into memory
    val oldData = select data in OLD_DB
                    limit = batchSize
                    offset = (itemOffset * batchSize)

    // we are done when there is no more data to process
    if (oldData.isEmpty) {
      break
    }

    // keep only the items which have thumbs data
    val oldItemList = oldData.filter { data => data.thumbsData.exists }

    // iterate over `oldItemList` and see which thumbs don't exist in the
    // new data; those are the ones we want to insert into the new data
    for oldItem in oldItemList {
      // query the new data for items matching the stationID and uPIdH;
      // this contains all the new data we are concerned with
      val newDataBySid = select data in NEW_DB
                           where sid = oldItem._id // _id == sid in old data
                           and uPIdH = oldItem.uPIdH

      // build the set of cids that already exist in the new data
      // for this particular stationID
      val newItemCidSet = newDataBySid.map(_.cid).toSet

      // finally, keep only the old thumbs whose cid does not exist in the
      // new data; these were never migrated over, so we migrate them now
      val insertItems = (oldItem.tdTrks ++ oldItem.tuTrks)
        .filterNot { thumb => newItemCidSet.contains(thumb.cid) }
        .map { thumb => thumb.copy(lm = TIME_BEFORE_DUAL_WRITE) }

      NEW_DB.insert(insertItems)
    }
    itemOffset = itemOffset + 1
  }
}
```
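The core filtering step in `main` can be isolated as a pure function, which
makes it easy to test before running against a real database. A sketch,
assuming the track lists hold cid strings (in the pseudocode they are thumb
records; the simplification is mine):

```scala
// Given the old item's thumbs-down and thumbs-up cids and the set of
// cids already present in the NEW-DB for that (sid, uPIdH), return the
// cids that were never migrated and still need to be inserted.
def missingCids(
    tdTrks: List[String],
    tuTrks: List[String],
    newItemCidSet: Set[String]
): List[String] =
  (tdTrks ++ tuTrks).filterNot(newItemCidSet.contains)
```

For example, `missingCids(List("c"), List("a", "b"), Set("b"))` keeps `"c"`
and `"a"` but drops `"b"`, which the NEW-DB already has.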
Looks OK. We'll need to split this into thumbs up and thumbs down so we can create the proper `type`.
We have two options for proceeding: either we write this in JavaScript and test it out in non-prod against a user, or we create a Scala script and do it that way. It should be pretty straightforward either way, and is probably safer as a Scala script.
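The thumbs-up/thumbs-down split mentioned above could look roughly like this.
The row shape, field names, and the `"up"`/`"down"` labels are assumptions:

```scala
// Hypothetical NEW-DB row; `thumbType` stands in for the `type` field.
case class NewRow(sid: String, cid: String, thumbType: String, uPIdH: String)

// Expand one old document into NEW-DB rows, tagging each cid with the
// proper thumb type so the `type` field can be created on insert.
def toNewRows(
    sid: String,
    uPIdH: String,
    tuTrks: List[String],
    tdTrks: List[String]
): List[NewRow] =
  tuTrks.map(cid => NewRow(sid, cid, "up", uPIdH)) ++
    tdTrks.map(cid => NewRow(sid, cid, "down", uPIdH))
```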