Simpler.Grants.gov Public Wiki
Document Storage


  • Status: Active

  • Last Modified: 2024-10-18

  • Related Issue:

  • Deciders: Matt, Aaron, Kai, Michael, Lucas

  • Tags: nofo, document, attachment, s3, storage

Context and Problem Statement

Opportunities include supporting documents that help define the opportunity, give additional instructions for applying, or otherwise supplement the Opportunity Listing. These documents are individual files, sometimes organized in a folder/directory hierarchy, that are currently provided to Grant Seekers as a single Zip download. Among them is one especially important file: the Notice of Funding Opportunity (NOFO), which every Opportunity must publish.
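S3 object keys are flat strings, so a folder hierarchy like the one described above is usually represented by using "/" as a delimiter inside the key. The Python sketch below (standard library only) shows one way to map a document's relative path to a key and to bundle a set of files into the single Zip download Grant Seekers receive today. The `opportunities/<id>/` prefix and both function names are hypothetical and are not part of this decision.

```python
import io
import zipfile
from pathlib import PurePosixPath


def to_object_key(opportunity_id: int, relative_path: str) -> str:
    """Map a document's folder path to a flat, S3-style object key.

    The "opportunities/<id>/" prefix is a hypothetical naming scheme;
    "/" only acts as a delimiter, it does not create real directories.
    """
    path = PurePosixPath(relative_path)
    # Reject paths that could escape the opportunity's prefix
    if path.is_absolute() or ".." in path.parts:
        raise ValueError(f"unsafe path: {relative_path!r}")
    return f"opportunities/{opportunity_id}/{path.as_posix()}"


def build_zip(files: dict[str, bytes]) -> bytes:
    """Bundle files (relative path -> contents) into one Zip archive,
    preserving the folder hierarchy inside the archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for relative_path, contents in sorted(files.items()):
            zf.writestr(relative_path, contents)
    return buffer.getvalue()
```

For example, `to_object_key(42, "forms/sf424.pdf")` yields `opportunities/42/forms/sf424.pdf`, so listing the prefix `opportunities/42/` returns every file for that Opportunity.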

Decision Drivers

  • Use the AWS and Nava platforms whenever feasible

  • Minimize cost per file (there will be a large number of files, but most will be rarely, if ever, accessed once the Opportunity closes)

  • Ease of processing: use the best tools, and use already-supported libraries as they are intended to be used

  • Follow best practices

Options Considered

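  • AWS Simple Storage Service (S3) bucket(s)

  • Store files in PostgreSQL

  • Other off-the-shelf or homegrown storage solution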

Decision Outcome

Chosen option: "AWS S3", because it offers the lowest Total Cost of Ownership (TCO) and follows industry best practices, including built-in support for file access control, backups, etc. We will use two buckets: one for Published Opportunities and one for DRAFT Opportunities. When an Opportunity is Published, we'll ensure that its associated documents are copied to the Published bucket, making them accessible to the general public. Before publishing, the documents will already be stored in S3, so file storage stays consistent throughout the lifecycle, but only the Publishing Service will have access to them. This ensures the files are not released to the public before the Opportunity itself, and that they can be revoked from public view if the Opportunity is Published by accident.
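The two-bucket flow can be sketched as an in-memory Python model. This is not real S3 code: `Bucket`, `PublishingService`, and its methods are hypothetical names. A production version would perform the copy with S3's CopyObject operation (for example boto3's `copy_object`) and would rely on bucket policies/IAM, not Python objects, to restrict DRAFT access to the Publishing Service.

```python
from dataclasses import dataclass, field


@dataclass
class Bucket:
    """In-memory stand-in for an S3 bucket: object key -> file contents."""
    name: str
    objects: dict[str, bytes] = field(default_factory=dict)


class PublishingService:
    """Hypothetical service; in this model it is the only code that
    touches the DRAFT bucket."""

    def __init__(self, draft: Bucket, published: Bucket) -> None:
        self.draft = draft
        self.published = published

    def upload_draft(self, key: str, contents: bytes) -> None:
        # Files live in S3 from the start, but only in the DRAFT bucket
        self.draft.objects[key] = contents

    def publish(self, keys: list[str]) -> None:
        # Copy (not move) to the public bucket; raises KeyError if a
        # document was never uploaded to DRAFT
        for key in keys:
            self.published.objects[key] = self.draft.objects[key]

    def unpublish(self, keys: list[str]) -> None:
        # Revoke public access by deleting only the published copies;
        # the DRAFT copies remain for a later re-publish
        for key in keys:
            self.published.objects.pop(key, None)
```

Keeping the DRAFT copy after publishing means an accidental publish can be undone by deleting from the Published bucket alone, without re-uploading anything.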

Positive Consequences

  • Cost can be traded off against the performance profile required for file requests

  • Directly integrates with the AWS Content Delivery Network (CDN), CloudFront

  • Existing tooling/API allows files to be manipulated from the Publishing System, and manually if needed.

  • The S3 API is a de facto standard mimicked/supported by other cloud storage providers, should we ever want to move these files elsewhere.
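One way to act on the cost trade-off, assuming most files go cold once the Opportunity closes, is an S3 lifecycle rule that moves older objects to a cheaper storage class. The Python sketch below only builds the configuration dictionary, in the shape accepted as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`; the prefix, the 180-day threshold, and the choice of Glacier Instant Retrieval (`GLACIER_IR`) are illustrative assumptions, not part of this decision.

```python
def archive_lifecycle_rule(prefix: str, days_until_archive: int) -> dict:
    """Build a lifecycle configuration that transitions objects under
    `prefix` to a cheaper storage class after `days_until_archive` days.

    Prefix, threshold, and storage class are hypothetical choices.
    """
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.rstrip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Transition age is counted from object creation,
                    # not from the Opportunity's close date
                    {"Days": days_until_archive, "StorageClass": "GLACIER_IR"}
                ],
            }
        ]
    }
```

Because lifecycle transitions key off object age rather than the Opportunity's close date, a rule like this only approximates "rarely accessed after closing".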

Negative Consequences

  • Requires its own management if we want to sync the files to another environment

  • Decouples the file lifecycle from the data in the DB, so archiving or deleting files does not happen automatically
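Because S3 does not know which objects the DB still references, archiving or deleting files needs an explicit job. A minimal reconciliation sketch, assuming the bucket keys are listed separately (for example with boto3's `list_objects_v2` paginator) and the referenced keys come from a hypothetical attachments table, simply compares the two sets:

```python
from collections.abc import Iterable


def find_orphaned_keys(bucket_keys: Iterable[str], referenced_keys: Iterable[str]) -> list[str]:
    """Objects stored in S3 that no DB record references any more
    (candidates for archiving or deletion)."""
    return sorted(set(bucket_keys) - set(referenced_keys))


def find_missing_keys(bucket_keys: Iterable[str], referenced_keys: Iterable[str]) -> list[str]:
    """DB records pointing at objects that are absent from S3
    (for example a failed upload or a premature delete)."""
    return sorted(set(referenced_keys) - set(bucket_keys))
```

In this sketch the functions only report differences; acting on them stays a separate, deliberate step so that a faulty DB query cannot empty the bucket.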

Pros and Cons of the Options

AWS Simple Storage Service (S3) bucket(s)

Use the AWS S3 service to store and host files. This is precisely the problem S3 was built to solve: it provides strong tooling, monitoring, and logging, all built in and ready to use. We can architect the system so that files are scanned before being placed in the final bucket, and we get fine-grained support for file versioning, backups, lifecycle management, etc.

  • Pros

    • Cost can be traded off against performance

    • Integrates with the AWS Content Delivery Network (CDN), CloudFront

    • Existing tooling/API

    • The S3 API is a de facto standard mimicked/supported by other cloud storage providers, should we ever want to move these files elsewhere.

    • Built-in support for auto-expiring links (which we want, at least in the near term, until we settle on a final structure/naming strategy)

  • Cons

    • Another separate resource to manage if we're trying to sync/simulate Prod with other environments
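The scan-before-promote idea in the S3 description above could look roughly like this Python sketch. Dictionaries stand in for a hypothetical quarantine bucket and the final bucket, and `scan` is a placeholder for whatever malware scanner is eventually chosen; this ADR does not pick one.

```python
from collections.abc import Callable
from enum import Enum


class ScanResult(Enum):
    CLEAN = "clean"
    INFECTED = "infected"


def promote_scanned_files(
    quarantine: dict[str, bytes],
    final: dict[str, bytes],
    scan: Callable[[bytes], ScanResult],
) -> list[str]:
    """Scan every file waiting in quarantine: clean files move to the
    final bucket, infected files are dropped. Returns the rejected keys.

    `scan` is a hypothetical hook for a real malware scanner.
    """
    rejected: list[str] = []
    for key in list(quarantine):  # copy keys: we mutate the dict
        contents = quarantine.pop(key)
        if scan(contents) is ScanResult.CLEAN:
            final[key] = contents
        else:
            rejected.append(key)
    return rejected
```

Since nothing reaches the final bucket without passing `scan`, the public-facing bucket never holds unscanned files in this model.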

Store files in PostgreSQL

The existing system stores the contents of the files in its Oracle DB. The same approach is also possible in PostgreSQL.

  • Pros

    • Single data source to backup, move between environments, etc.

    • Simplified architecture as all communication is just with the DB server

    • Files and DB records share the same lifecycle so full end-to-end delete/clean up is easier

  • Cons

    • Bloats the DB with file contents, which will likely quickly outgrow the regular row data

    • Makes the DB a bigger performance bottleneck, as it now handles both app data and file storage/serving

    • Difficult, if not impossible, to scan files stored this way for viruses/malware

    • Makes backups more costly and difficult to move around due to increased size

Other off-the-shelf or homegrown storage solution

Adopt an existing off-the-shelf file storage server, or build our own.

  • Pros

    • If we built our own, it would be a custom fit, doing exactly what we need and nothing more

    • An off-the-shelf solution might be cheaper

  • Cons

    • Off-the-shelf

      • We own everything: storage redundancy, security, patching/upgrades, and Ops

      • An additional vendor contract, security assessment, and relationship to manage

      • Data leaves our secure AWS VPC environment

    • Roll our own

      • File storage isn't the core value of the system, so it doesn't justify building our own

Links

  • Alternatives

    • AWS Simple Storage Service (S3)

    • Ceph

    • Backblaze B2

    • Wasabi Hot Cloud Storage

    • List of Alternatives

  • #2277