Document Storage
Status: Active
Last Modified: 2024-10-18
Related Issue: #2277
Deciders: Matt Dragon, Aaron Couch, Kai Siren, Michael Chouinard, Lucas Brown
Tags: nofo, document, attachment, s3, storage
Context and Problem Statement
Opportunities include supporting documents that help define the opportunity, provide more instructions about applying, or otherwise supplement the Opportunity Listing. These documents represent individual files, sometimes within a folder/directory hierarchy that are provided to the Grant Seekers as a single Zip download currently. Among these files is one very special file, the Notice of Funding Opportunity (NOFO) that every Opportunity must publish.
Decision Drivers
Use the AWS and Nava platforms whenever feasible
Minimize cost per file (there will be a large number of files, but most will be rarely, if ever, accessed once the Opportunity closes)
Ease of processing - Use the best tools, with already supported libraries as they're intended
Follow best practices
Options Considered
AWS Simple Storage Service (S3) bucket(s)
Decision Outcome
Chosen option: "AWS S3", because it represents the lowest Total Cost of Ownership (TCO) and industry best practices, including baked in support for access control for files, backups, etc. We will use 2 buckets, one for Published Opportunities, and one for DRAFT Opportunities. When an Opportunity is Published we'll ensure that the associated documents are copied to the Published bucket, making them accessible to the general public. Prior to publishing the documents will be accounted for in S3, so that the file storage is consistent throughout the lifecycle, but only the Publishing Service will have access to the files, ensuring they are not released to the public before the Opportunity and they can be revoked from public view if the Opportunity is accidentally Published.
Positive Consequences
Cost can be managed in trade off with performance profile of requests for files
Directly integrates with the AWS Content Delivery Network(CDN), CloudFront
Existing tooling/API allows for manipulation of files from the Publishing System and manually if needed.
S3 API is standard mimicked/supported by other cloud storage providers if we ever wanted to move these files elsewhere.
Negative Consequences
Requires it's own management if we wanted to sync the files to another environment
Disconnects the lifecycle from data in the DB, so any archiving/deleting of files doesn't happen automatically
Pros and Cons of the Options
AWS Simple Storage Service (S3) bucket(s)
Utilize the AWS S3 Service to store/host files. This problem is precisely what S3 was built to solve. It provides strong tooling, monitoring, logging, all built and ready to use. We can architect in such a way that files get scanned before being placed in the final bucket, and get very fine grained support for file versioning, backups, lifecycle, etc.
Pros
Cost can be managed in trade off with performance
Integrates with AWS Content Delivery Network(CDN), CloudFront
Existing tooling/API
S3 API is standard mimicked/supported by other cloud storage providers if we ever wanted to move these files elsewhere.
Built in support for auto-expiring links (which we want at least in the near term until we come up with more of a final structure/naming strategy)
Cons
Another separate resource to manage if we're trying to sync/simulate Prod with other environments
Store files in PostgreSQL
The existing system stores the contents of the files in the Oracle DB. This is also possible in PostgreSQL
Pros
Single data source to backup, move between environments, etc.
Simplified architecture as all communication is just with the DB server
Files and DB records share the same lifecycle so full end-to-end delete/clean up is easier
Cons
Bloats the DB with file storage which likely will rapidly outpace proper DB row storage
Makes the DB a bigger performance bottleneck as it's now handling both app data and file storage/serving responsibilities
Difficult if not impossible to virus/malware scan files stored in this way
Makes backups more costly and difficult to move around due to increased size
Other off-the-shelf or homegrown storage solution
Implement an existing off-the-shelf file storage server or build our own
Pros
If we built our own it would be a custom fit, do exactly what we needed and nothing more
Off-the-Shelf might be cheaper
Cons
Off-the-Shelf
We own everything, storage redundancy, security, patching/upgrades, Ops
Additional vendor contract, security assessment, relationship to manage
Data leaves our AWS VPC Secure Environment
Roll our own
This isn't the core value of the system that justifies building our own
Links
Last updated