At Embrace, we constantly strive to improve our infrastructure to handle the massive amount of data we process daily. As a mobile app observability platform, one of our more significant challenges has been storing and retrieving large objects, in particular sessions, which are our largest object type. For context, our infrastructure processes tens of millions of user sessions per hour, with individual sessions ranging from 30 KB to 80 KB in size.
Our previous solution used Cassandra to store these objects and fetch them by ID when needed. That approach led to several issues, prompting us to look for a more efficient method. The result is a new component that stores multiple objects in a single S3 file, which drastically reduces costs and simplifies our infrastructure.
Because this solution has worked well for our needs and applies to many other cases of storing large objects in the cloud, we decided to open source the module and share it with the community so that anyone can use it!
TL;DR: We reduced storage costs by 70% with our new approach. You can head to our s3-batch-object-store GitHub repository to check out the module and try it out. Let us know what you think, as we welcome feedback, issues, and pull requests to continue improving the module.
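To make the idea of storing multiple objects in one file concrete, here is a minimal Python sketch of the general technique. It is not the s3-batch-object-store API: the function names (`pack_objects`, `read_object`, `range_header`) and the in-memory index format are assumptions made purely for illustration. Objects are concatenated into one byte buffer, and each object's offset and length are recorded so that it can later be fetched on its own with a byte-range read.

```python
import io
from typing import Dict, Tuple

# Hypothetical index format: object ID -> (offset, length) within the batch file.
Index = Dict[str, Tuple[int, int]]


def pack_objects(objects: Dict[str, bytes]) -> Tuple[bytes, Index]:
    """Concatenate payloads into one buffer and record where each one lives."""
    buffer = io.BytesIO()
    index: Index = {}
    for object_id, payload in objects.items():
        offset = buffer.tell()  # position where this payload starts
        buffer.write(payload)
        index[object_id] = (offset, len(payload))
    return buffer.getvalue(), index


def read_object(batch: bytes, index: Index, object_id: str) -> bytes:
    """Slice a single object back out of the batch using its index entry."""
    offset, length = index[object_id]
    return batch[offset:offset + length]


def range_header(index: Index, object_id: str) -> str:
    """Build an HTTP Range header value (inclusive end) for a ranged GET."""
    offset, length = index[object_id]
    return f"bytes={offset}-{offset + length - 1}"


if __name__ == "__main__":
    batch, index = pack_objects({"session-a": b"hello", "session-b": b"world!"})
    print(read_object(batch, index, "session-b"))
    print(range_header(index, "session-b"))
```

In a real deployment, the batch would be uploaded as a single S3 object, and the range value would be passed on a ranged GET request so that only the bytes for one object are transferred. Where the index itself is kept is a separate design decision that this sketch leaves open.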