@meiqimichelle
Last active October 22, 2019 02:38

John Q. Curious Public wants his photos to persist on the distributed web.

North star

People need support for persisting data on IPFS, either via clearer paths to co-hosting or via third-party pinning services. We need to convey that ‘saving’ something on IPFS, or more importantly ‘publishing’ or ‘sharing’ something on IPFS, does not necessarily mean that it is always accessible.

Priorities

In the short term, we should provide clear indicators (both visual and textual) of what saving, publishing, and sharing mean in the IPFS ecosystem. We should also give people next steps in our GUI applications for data persistence, which today may be as simple as explanations and links to pinning services and to roll-your-own co-hosting information.

At a deeper technical level, we need to improve our garbage collection user experience. As traversing a DAG to perform GC is essentially unavoidable, we should intentionally experiment with when and how this happens, and how many other things can happen concurrently. Also, we need to address the race condition problem in the API. [^2]
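
To make the GC cost concrete, here is a minimal sketch of what removing content from a node involves today: unpin the root, then run a repo-wide garbage collection. It is written in Go against the go-ipfs HTTP API and assumes a local daemon on localhost:5001; the placeholder CID and the small `call` helper are illustrative, not part of any existing tool.

```go
// Minimal sketch: what "removing" content from an IPFS node involves today.
// Assumes a local daemon exposing the HTTP API on localhost:5001.
package main

import (
	"fmt"
	"io"
	"net/http"
)

const api = "http://localhost:5001/api/v0"

// call POSTs to an IPFS HTTP API endpoint and returns the raw response body.
func call(path string) (string, error) {
	resp, err := http.Post(api+path, "application/json", nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	root := "REPLACE_WITH_ROOT_CID" // placeholder, not a real pin

	// Step 1: unpinning only drops the pin reference; the blocks stay on disk.
	out, err := call("/pin/rm?arg=" + root)
	fmt.Println("pin rm:", out, err)

	// Step 2: garbage collection is what actually reclaims space. It traverses
	// the DAGs of everything still pinned and can take a long time on large
	// repos, which is where the concurrency and race-condition questions bite.
	out, err = call("/repo/gc")
	fmt.Println("repo gc:", out, err)
}
```

Running this against a node with real pins would block further pinning for the duration of the GC pass, which is the user-experience bottleneck described above (see also note [^4]).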

Workflow: pinning service

Hacker “Pinning Service” MacDev: “I really believe in the distributed web, and I want people to have a pleasant user experience around making their data persist and be available.”

John Q.: “I want my photos to stick around on IPFS, but I’m out of space on my personal computer, and I don’t want to, like, buy more computers or learn how to run a server or whatever it is I need to do.”

| Customer workflow | Intermediary workflow | IPFS interaction |
| --- | --- | --- |
| John Q. signs up for an account on a pinning service. | Hacker MacDev’s pinning service manages user accounts. | No immediate IPFS interaction. |
| John Q. clicks “upload” and browses via the pinning service for the directory called “Q. Curious Family History,” and selects it. | Hacker’s service allows browsing for and uploading local directories to IPFS. It does this by tracking IPFS hashes as well as its own metadata. | The pinning service “adds and pins to IPFS” everything added to its interface. |
| John Q. sees a notice estimating the amount of time it will take for his directory to be available on IPFS. | Hacker’s service adds John Q.’s directory to IPFS in the background. | IPFS recursively pins John Q.’s directory. |
| John Q. gets a notification when his directory is available on IPFS. | Hacker’s service replicates John Q.’s directory across a Cluster of peers so that his information is highly available. | IPFS Cluster is used to orchestrate data replication. |
| John Q. pays for the amount of storage he uses. | Hacker’s service tracks the amount of storage each user has on IPFS, and charges them accordingly. [^3] | The IPFS Pinning Service API provides the information the pinning service needs to track the size and number of pins. |
| John Q. updates his directory (adding and deleting some photos, changing names of others), and re-uploads it to his pinning service. | Hacker’s service interfaces with IPFS to add, delete, and rename files. The service replicates data such that there’s no downtime while IPFS unpins and runs GC. [^4] | IPFS pins new files in the directory, removes pins and blocks that should no longer exist, runs GC, and creates a new hash for the root directory. |
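
To ground the middle and right-hand columns above, the sketch below shows roughly what a backend like Hacker’s might do once John Q.’s directory has been added: recursively pin the root CID, then read its cumulative size so storage can be metered and billed. It uses the go-ipfs HTTP API and assumes a local daemon on localhost:5001; the placeholder CID and the `objectStat` struct are illustrative only, not a description of any real pinning service’s implementation.

```go
// Sketch: recursively pin a user's directory root and record its size,
// roughly matching the "adds and pins to IPFS" and billing rows above.
// Assumes a local go-ipfs daemon on localhost:5001.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

const api = "http://localhost:5001/api/v0"

// objectStat holds the one field of /api/v0/object/stat this sketch cares about.
type objectStat struct {
	CumulativeSize int64 // total size in bytes of the DAG rooted at the CID
}

func main() {
	root := "REPLACE_WITH_ROOT_CID" // placeholder for John Q.'s uploaded directory

	// Recursively pin the directory root so GC never collects any of its blocks.
	resp, err := http.Post(api+"/pin/add?arg="+root+"&recursive=true", "application/json", nil)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// Ask the node how large the pinned DAG is, so storage can be metered.
	resp, err = http.Post(api+"/object/stat?arg="+root, "application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var stat objectStat
	if err := json.NewDecoder(resp.Body).Decode(&stat); err != nil {
		panic(err)
	}
	fmt.Printf("pinned %s, cumulative size: %d bytes\n", root, stat.CumulativeSize)
}
```

The Pinning Service API referenced in the table is meant to surface this kind of size and pin-count information more directly, so services would not need to re-derive it themselves.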

Success metrics

  • Ease of task completion via usability tests with pinning service employees/founders and customers
  • Pinning service-reported metrics (maybe they’re willing to share aggregate information)
    • Number of users
    • Number of users storing over X amount of data
    • Data pinned
  • User frustration/happiness self-reporting
    • Support pinning services in adding a “Was this helpful?”-type questionnaire to their apps
  • Pain points solved for pinning services (removing friction from their workflows via technical changes/shipping features on IPFS)

Note: building Cube (below) is part of the metrics picture for the pinning service use case because it is essentially a very basic, open-source pinning service application that we can run metrics on and gather feedback from ourselves.

Workflow: community co-hosting

Librarian Alex: “I want my library to be a trusted home for my community. Now that so much of life is digital, I want the library to be a home for my community’s digital stories and histories, too.”

John Q.: “I’m getting the hang of this ‘data stewardship’ thing, and want to host my family history. The pinning service is nice, but that’s a for-profit enterprise. Maybe there’s another option.”

| Customer workflow | Intermediary workflow | IPFS interaction |
| --- | --- | --- |
| John Q. claims his account on his local library’s Cube. | Librarian Alex’s Cube manages membership accounts via library card number. Each member gets 10 GB of storage on the Community Cube, which Alex hosts on AWS. | No immediate IPFS interaction. |
| John Q. clicks “upload” and browses via the Cube interface for the directory called “Q. Curious Family History,” and selects it. | The Cube online interface allows browsing for and uploading local directories to IPFS, up to limits set by an administrator. It does this by tracking IPFS hashes as well as its own member metadata. | Cube “adds and pins to IPFS” everything added to its interface. |
| John Q. sees a notice estimating the amount of time it will take for his directory to be available on the Community Cube. | Cube adds John Q.’s directory to IPFS in the background. | IPFS recursively pins John Q.’s directory. |
| John Q. gets a notification when his directory is available on the Community Cube. | Alex’s Cube replicates John Q.’s directory across a Cluster of peers so that his information is highly available. | IPFS Cluster is used to orchestrate data replication. |
| John Q. sends a link to the Community Cube. His daughter can see his information, as well as other directories that other members have contributed. | Cube provides a central interface on the IPFS Gateway that shows all ‘contributed’ information. The Cube provides further availability for this information, in addition to members’ local Public IPFS folders. [Could this be self-hosted/self-gateway’d?] | The IPFS Pinning Service API provides the information Cube needs to track the size and number of pins. |
| John Q. updates his directory (adding and deleting some photos, changing names of others), and re-uploads it to Cube. | Cube interfaces with IPFS to add, delete, and rename files. The service replicates data such that there’s no downtime while IPFS unpins and runs GC. [^4] | IPFS pins new files in the directory, removes pins and blocks that should no longer exist, runs GC, and creates a new hash for the root directory. |
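
The last row of both workflow tables (updating a previously uploaded directory) maps fairly naturally onto a pin update followed by a carefully scheduled GC. The sketch below shows that sequence against the go-ipfs HTTP API; the CIDs are placeholders, a local daemon on localhost:5001 is assumed, and a real Cube or pinning service would stagger the GC step across replicated peers as described in note [^4] rather than running it inline.

```go
// Sketch: swap a pin from the old directory root to the new one after a
// re-upload, then reclaim space. Placeholder CIDs; assumes a local daemon
// on localhost:5001.
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

const api = "http://localhost:5001/api/v0"

// post POSTs to an IPFS HTTP API endpoint with query args and returns the body.
func post(path string, args url.Values) (string, error) {
	resp, err := http.Post(api+path+"?"+args.Encode(), "application/json", nil)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	oldRoot := "REPLACE_WITH_OLD_ROOT_CID" // placeholder: previously pinned version
	newRoot := "REPLACE_WITH_NEW_ROOT_CID" // placeholder: re-uploaded directory

	// pin/update pins the new root recursively and unpins the old one in a
	// single call, walking only the parts of the DAG that changed.
	out, err := post("/pin/update", url.Values{"arg": {oldRoot, newRoot}})
	fmt.Println("pin update:", out, err)

	// Blocks unique to the old version stay on disk until GC runs; on large
	// repos this is the slow, pin-blocking step that replication has to cover
	// for (see note [^4]).
	out, err = post("/repo/gc", url.Values{})
	fmt.Println("repo gc:", out, err)
}
```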

Success metrics

  • User engagement rates via opt-in metrics
    • Number of interactions with IPFS Cube per week
    • Amount of data across peers
    • Number and types of peers
    • Number of accounts/users
  • Number of downloads
  • Opt-in share of error logs
  • User frustration/happiness self-reporting
    • “Was this helpful”-type questionnaire where appropriate, with open-ended box or issue for optional feedback
  • Ease of user task completion via usability tests
    • Adding and removing peers
    • Creating follower links; adding and removing followers
    • Being added as a follower (from the follower peer perspective)
    • Adding and removing data

Long-term vision

  • Alex, on top of her other librarian duties, happens to have inherited all local admin tasks, so it’s up to her to keep the website lights on, etc.
  • She learns about IPFS Cube, an out-of-the-box solution for IPFS community co-hosting. Quite literally out-of-a-box: you can order Cubes online, and they’re not even very expensive. The local Code for America chapter that meets in a library conference room once a month can help her set it up at the next hack night.
  • Once the little box is plugged in, its simple screen asks for a key to her IPFS Drive account so Alex can manage files from her phone or desktop computer.
  • As a simple start, Alex connects her Cube’s Public folder to the big screen at the library entrance. Pictures of local family and friends are set to rotate automatically. She’s got plans for recorded family histories that people can view via ‘checking out’ a viewing pod at the library.
  • Next, Alex is excited about publicising the library’s new capabilities. She sends an email blast and talks to people in the library about joining the “Community Cube.” All they have to do is go to a URL that she’s created, and they can get approved to join. They can start adding their own family stories, and not have to worry about a for-profit website hosting service going out of business and taking all of their data with it.
  • As Alex’s Cube experiment gets rave reviews, the library is able to purchase several more Cubes to provide better availability and redundancy for their own datasets, setting up the new Cubes as “followers” to the original. They really like buying these devices because they work out of the box: no set-up needed, and they work as soon as they’re turned on.
  • Other regional libraries follow Alex’s lead, and buy their own self-hosted pinning services. They come to an agreement to help each other host data, providing even stronger availability and redundancy in case, for example, the power goes out at one location. They arrange themselves as a “federated” Cluster.

Notes

  • [^2] In the medium term, we can look forward to UnixFSv2 and selectors landing, which will make our lower-level architecture more effective and remove some of our existing user experience bottlenecks. This work, however, does not block the short-term actions laid out above.
  • [^3] See this comment for more detail on Pinning Service API endpoints for payment types: https://github.com/ipfs/notes/issues/378#issuecomment-519125912
  • [^4] “Part of [the challenge here] is just due to some of the limitations in how IPFS works in pinning. The scale we're running at -- tens of thousands of hashes per node -- can make content discovery difficult. This is mostly due to how we simply can't announce all of the content we have fast enough before content announcements expire. Some of this is rooted in challenges the DHT has with undialable nodes. We'll have to see how future IPFS updates affect this.

    The big problem is that GC on IPFS doesn't really work like a normal file system. When you delete something, it doesn't immediately go away. You have to "unpin it" and then run a garbage collection process to get rid of it. Right now, our nodes take roughly 10 hours to GC, and when that happens we can't pin. In the beginning, we got around it by … not GC'ing. Now we replicate across multiple nodes and have to intelligently schedule garbage collections to make sure content is always online. This is a really tough problem to solve and as we scale this might not be the best solution.” (A toy sketch of the GC-rotation idea described here appears after this list.)
  • The relationship of IPLD selectors and third-party use of the Pin API: [link coming soon]
  • IPFS Cube Product Proposal: https://docs.google.com/document/d/1yfef8xdpyeLXz_PQp3qofZ6tzEhDXfUZ0oEBBxfB3QQ/edit#
  • Pinning Service API: https://github.com/ipfs/notes/issues/378
  • IPFS Cluster <> Filecoin Integration Proposal: https://docs.google.com/document/d/1BUt7stI6gtIBuLrQYzNJ5hEdRaHrbU3a-Um7MfyYAYw/edit#
  • Experiment in MFS-based cohosting: https://github.com/ipfs-shipyard/cohosting/pull/2
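
As a companion to note [^4], here is a toy sketch (in Go, with placeholder node addresses and only minimal error handling) of the “intelligently schedule garbage collections” idea: when every node in a Cluster holds a full replica, GC can be rotated through the nodes one at a time so the content stays available throughout. This is only an illustration of the approach quoted above, not how any existing service schedules its GC.

```go
// Toy sketch of rotating GC across replicated nodes so content stays online,
// per the approach quoted in note [^4]. Node addresses are placeholders.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// gc triggers a garbage collection pass on one node via its HTTP API.
func gc(node string) error {
	resp, err := http.Post(node+"/api/v0/repo/gc", "application/json", nil)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	// Each node is assumed to hold a full replica (e.g. via IPFS Cluster), so
	// collecting one node at a time never makes content unavailable.
	nodes := []string{
		"http://node-a:5001", // placeholders, not real hosts
		"http://node-b:5001",
		"http://node-c:5001",
	}

	for _, node := range nodes {
		fmt.Println("running GC on", node)
		if err := gc(node); err != nil {
			fmt.Println("GC failed on", node, ":", err)
			continue
		}
		// Wait before moving on, so the node that just finished GC is fully
		// back in service and re-announcing its content.
		time.Sleep(time.Hour)
	}
}
```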