The big fish here, though: all our systems run off a central Microsoft SQL Server database. Nothing special there, and it moved to RDS without a problem. Where we expected (and had) problems was with what we called our Paperwork Database. This was a second Microsoft SQL Server, hosted on premises, holding approximately 6.5TB of binary objects stored via FILESTREAM within the database. For those that don't know it, FILESTREAM lets SQL Server store each binary blob as a file on the NTFS file system, keyed by a GUID column, rather than as a LOB inside the database MDF itself. That means less chance of corruption from a single multi-TB MDF file, and it also offers some other advantages around direct file access that we didn't use.
Now, not only could this database not migrate directly to RDS (FILESTREAM is not supported on RDS); cost-wise, it would have been one helluva stupid move anyway. We evaluated the idea of using an EC2 instance, but decided against it for two reasons:
- Cost. EC2 storage doesn't cost as much as RDS storage, but it's still significantly more expensive than S3.
- “as-a-service”. Part of the drive into AWS was the managed services on offer. Essentially, you pay more for the service, but reduce the need to manage it yourself. Since we were heavy on developers and light on sys-admins, this made a lot of sense for us, and still does.
The plan ended up being the following steps:
- Create an S3 bucket for each table in the database. Not a huge number, but enough to separate the types of attachments we store. Since FILESTREAM enforces a GUID column in the table, we decided to use that as our S3 key. Performance isn't a major issue for us, and AWS recently announced request-rate changes that mean a flat key space in S3 isn't a problem anyway. Since we could have had 10,000 files all called "paperwork.pdf", filenames weren't going to work, and we wanted to keep the bucket flat, so we use the GUID as the S3 key, with an extension. This means we save a file as XXX-XXX-XXX.PDF, and the metadata row in the database tells us the real filename.
- We chose this method so we can add derived file types and keep the originals. For example, we have a Lambda that fires when a XXX-XXX-XXX.DOCX gets uploaded, converts the DOCX to PDF, and writes it back to the S3 bucket as XXX-XXX-XXX.PDF (see the first sketch after this list). Now our "fetch" system can retrieve either the original or the PDF version of the same file.
- We identified that, of all the integration points the paperwork system has, only a handful of places WRITE to the storage; many more read from it. So this is where we decided to start.
- We made a web service API for writing files (see the second sketch after this list). This was in .NET (Visual Basic, as that is where the rest of our stack is at the moment), and ended up being pretty straightforward. It can be called internally from our main application, or externally from our other applications. It takes the required metadata, a directive for which table/bucket to store in, and the binary object. It then inserts the metadata into the database, gets back the GUID from the new row, and puts the binary into S3. In the early stages, it wrote to the old paperwork server as well (so legacy reads would still get the binary), and also to S3.
- This started to give me a feeling of safety, as I knew every single row written from this point forward was in BOTH systems, so even if we had a failure of the on-prem server, I still had the data. Baby steps!
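For what it's worth, the convert-on-upload Lambda boils down to very little code. Here's a minimal sketch in Python with boto3 (our real stack is VB.NET; the converter itself is a stand-in, and the S3 ObjectCreated trigger wiring on the bucket is assumed):

```python
import os
import boto3

s3 = boto3.client("s3")

def convert_docx_to_pdf(docx_bytes: bytes) -> bytes:
    """Stand-in for a real converter (e.g. headless LibreOffice in a layer)."""
    raise NotImplementedError

def handler(event, context):
    # Configured to fire on s3:ObjectCreated:* events for the bucket.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]          # e.g. XXX-XXX-XXX.DOCX
        if not key.upper().endswith(".DOCX"):
            continue
        docx = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        pdf_key = os.path.splitext(key)[0] + ".PDF"  # same GUID, new extension
        s3.put_object(Bucket=bucket, Key=pdf_key, Body=convert_docx_to_pdf(docx))
```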
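And the core flow of the write API, again as a rough Python sketch with boto3 and pyodbc rather than our actual VB.NET service (the table, column, and bucket names here are made up for illustration):

```python
import boto3
import pyodbc

s3 = boto3.client("s3")

# Illustrative mapping from the caller's "directive" to a table/bucket pair.
TARGETS = {"invoice": ("dbo.Invoices", "example-paperwork-invoices")}

def store_paperwork(directive: str, filename: str, body: bytes,
                    conn: pyodbc.Connection) -> str:
    table, bucket = TARGETS[directive]

    # 1. Insert the metadata row and get its GUID back; the database stays
    #    the source of truth for the real filename.
    cur = conn.cursor()
    row_guid = cur.execute(
        f"INSERT INTO {table} (OriginalFilename) "
        "OUTPUT INSERTED.RowGuid VALUES (?)", filename
    ).fetchone()[0]
    conn.commit()

    # 2. Store the binary under GUID + extension, e.g. XXX-XXX-XXX.PDF.
    ext = filename.rsplit(".", 1)[-1].upper()
    key = f"{row_guid}.{ext}"
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    # (During the migration we also wrote the blob to the old on-prem
    # paperwork server at this point, so legacy reads kept working.)
    return key
```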
Limitations of RDS native backup and restore
- We cannot restore a backup from an S3 bucket in a different region from the RDS instance
- We cannot restore backups of databases that contain FILESTREAM data, as RDS SQL Server does not support FILESTREAM
- RDS supports native restores of databases up to 5TB
- If the instance is Multi-AZ, we can only restore backups of databases that use the full recovery model
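As a reference point, native restore on RDS SQL Server is driven through RDS-provided stored procedures in msdb rather than a plain RESTORE DATABASE, and assumes the SQLSERVER_BACKUP_RESTORE option is enabled on the instance's option group. A rough sketch of kicking one off from Python via pyodbc (server, credentials, database name, and backup ARN are all placeholders):

```python
import pyodbc

# Connection details are placeholders.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example.xxxxxxxx.us-east-1.rds.amazonaws.com;"
    "UID=admin;PWD=secret;DATABASE=msdb",
    autocommit=True,
)

# Kick off the restore via the RDS-provided stored procedure.
conn.execute(
    "EXEC msdb.dbo.rds_restore_database "
    "@restore_db_name = ?, @s3_arn_to_restore_from = ?",
    "Paperwork", "arn:aws:s3:::example-backups/paperwork.bak",
)

# The restore runs asynchronously; poll until the task reports SUCCESS.
for row in conn.execute("EXEC msdb.dbo.rds_task_status @db_name = ?", "Paperwork"):
    print(row)
```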