April 28, 2025 by Yotta Labs

Use Cases for Integrating Decentralized Storage into Yotta Platform

Integrating a decentralized storage solution provides a robust data-centric ecosystem that enhances Yotta OS with high data availability, secure data ownership, and performance acceleration. In this article, we explore how Yotta OS can leverage decentralized storage, as summarized in Figure 1.

Model Parameters and Training Data

A fundamental use case for decentralized storage in Yotta OS is hosting model parameters and training datasets. This can be done explicitly, where users create storage buckets on the Yotta Platform and specify them for storing data. Alternatively, it can be handled implicitly, with the serverless platform utilizing decentralized storage under the hood to manage model parameters and datasets seamlessly.
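
To make the two paths concrete, the sketch below models them against a hypothetical storage client; the `StorageClient` class, its method names, and the bucket and key names are illustrative stand-ins rather than an actual Yotta OS or Greenfield API.

```python
import hashlib

# Hypothetical decentralized-storage client. The class, method names,
# and bucket layout are illustrative stand-ins, not a real API.
class StorageClient:
    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, bytes]] = {}

    def create_bucket(self, name: str) -> None:
        self._buckets.setdefault(name, {})

    def put(self, bucket: str, key: str, data: bytes) -> str:
        self._buckets[bucket][key] = data
        # Return a content-derived ID, as content-addressed stores do.
        return hashlib.sha256(data).hexdigest()

    def get(self, bucket: str, key: str) -> bytes:
        return self._buckets[bucket][key]

# Explicit path: the user creates a bucket and uploads model weights.
client = StorageClient()
client.create_bucket("llm-checkpoints")
weights = b"..."  # placeholder for serialized model parameters
cid = client.put("llm-checkpoints", "llama-7b/params.bin", weights)

# Implicit path: the serverless runtime would resolve the same
# bucket/key under the hood when a function requests the model.
assert client.get("llm-checkpoints", "llama-7b/params.bin") == weights
```

Content addressing, as hinted at by the digest-based ID, also makes stored objects verifiable, which pairs naturally with the data-ownership guarantees discussed in this article.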

Media Data

For AI-generated content, such as images, videos, and music, the generated output is typically stored in blob storage, while only the retrieval URL is returned to the requester. In these scenarios, Yotta OS can leverage decentralized storage to securely host generated data and provide users with access links for retrieval. This ensures scalable and efficient storage for AI-generated media.
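
A minimal sketch of this blob-storage pattern follows, assuming a content-addressed store behind a hypothetical gateway; the `GATEWAY` endpoint and in-memory blob dict are illustrative only.

```python
import hashlib

# Minimal sketch of the blob-storage pattern described above.
# The gateway URL format is hypothetical, not a real endpoint.
GATEWAY = "https://storage.example/blob"

def store_generated_media(blob: bytes, blobs: dict[str, bytes]) -> str:
    """Persist an AI-generated blob and return only its retrieval URL."""
    object_id = hashlib.sha256(blob).hexdigest()
    blobs[object_id] = blob          # stand-in for a decentralized write
    return f"{GATEWAY}/{object_id}"  # the requester gets a link, not bytes

blobs: dict[str, bytes] = {}
image = b"\x89PNG..."  # placeholder for generated image bytes
url = store_generated_media(image, blobs)
print(url)  # e.g. https://storage.example/blob/5f1e...
```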

Prompt Cache

In many AI inference scenarios, applications repeatedly use the same context across multiple API calls. For example:

  • A user engages in long, multi-turn conversations with a chatbot where the conversation history remains relevant.
  • A developer iteratively extends a codebase while maintaining contextual consistency across multiple requests.

By reusing previously seen input tokens (or prompts), prompt caching reduces redundant computations and accelerates response times.

Yotta OS can use decentralized storage to implement an efficient prompt cache. However, storing user interactions raises privacy concerns, so strong data ownership protections are required to keep cached prompts secure and private.
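
As a minimal sketch of the idea, the snippet below keys a cache on a digest of the shared prompt prefix; `run_model`, the in-memory cache, and the per-user namespacing are illustrative assumptions rather than the platform's actual mechanism.

```python
import hashlib

# Minimal prompt-cache sketch: map a digest of the shared prompt
# prefix to the model's processed state and reuse it on later calls.
cache: dict[str, object] = {}

def run_model(prefix: str) -> object:
    # Stand-in for the expensive prefill pass over the shared context.
    return {"processed_tokens": len(prefix.split())}

def cache_key(user_id: str, prefix: str) -> str:
    # Namespacing keys by user keeps one user's interactions from
    # being served to another, in line with the ownership concerns above.
    return hashlib.sha256(f"{user_id}:{prefix}".encode()).hexdigest()

def prefill(user_id: str, prefix: str) -> object:
    key = cache_key(user_id, prefix)
    if key not in cache:                # cache miss: pay full compute
        cache[key] = run_model(prefix)
    return cache[key]                   # cache hit: reuse prior work

history = "User: Hi\nAssistant: Hello!\nUser: Summarize our chat."
state1 = prefill("alice", history)      # computes and stores
state2 = prefill("alice", history)      # reuses the cached state
assert state1 is state2
```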

Attention Cache

The attention cache is a technique proposed by a co-founder of Yotta Labs [1]. Transformer-based large language models (LLMs) are widely adopted due to their superior inference accuracy and high throughput. However, they are also computationally intensive, leading to long inference times.

To address this challenge, we introduce memoization to accelerate self-attention computation in transformers. Our key observation is that there is significant similarity in attention computation across different inference sequences. By building a memoization database and employing a novel embedding technique to identify semantically similar inputs, we can reuse previously computed attention scores — effectively reducing computation time. This memoization database is referred to as the attention cache.

Like the prompt cache, the attention cache can be built on decentralized storage. The distributed architecture enables Yotta OS to co-locate common attention computation results with transformer execution, further improving performance.
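
For intuition, the toy memoization loop below embeds each input, searches the memo database for a sufficiently similar entry, and reuses its attention output on a hit. The mean-pooled embedding, cosine threshold, and in-memory lists are deliberate simplifications, not the actual technique in [1].

```python
import numpy as np

# Toy attention memoization in the spirit of AttMEMO [1]: reuse
# cached attention outputs for semantically similar inputs.
memo_keys: list[np.ndarray] = []  # embeddings of previously seen inputs
memo_vals: list[np.ndarray] = []  # their cached attention outputs

def embed(x: np.ndarray) -> np.ndarray:
    # Toy embedding: mean-pool the token vectors.
    return x.mean(axis=0)

def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def memoized_attention(x: np.ndarray, threshold: float = 0.99) -> np.ndarray:
    e = embed(x)
    for key, val in zip(memo_keys, memo_vals):
        cos = e @ key / (np.linalg.norm(e) * np.linalg.norm(key) + 1e-9)
        if cos >= threshold:        # similar enough: reuse cached scores
            return val
    out = attention(x, x, x)        # miss: compute self-attention
    memo_keys.append(e)
    memo_vals.append(out)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))       # 8 tokens, 16 dimensions
first = memoized_attention(x)          # computed and stored
second = memoized_attention(x + 1e-6)  # near-identical input: cache hit
assert np.allclose(first, second)
```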

Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is an advanced technique that enhances LLMs by integrating retrieval-based and generative models. In RAG:

  • The retrieval-based model fetches relevant information from large data repositories.
  • The generative model uses this retrieved data to produce more accurate and contextually relevant responses.

RAG is particularly effective in reducing hallucinations in LLM outputs by ensuring that responses are grounded in factual, retrievable data.

To implement RAG, Yotta OS can store relevant documents in Greenfield’s decentralized storage. These documents can then be indexed in a vector database for efficient retrieval, enhancing both performance and reliability.
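
The sketch below traces that retrieval path end to end; the in-memory document dict stands in for Greenfield storage, and a toy character-frequency embedding stands in for a real embedding model and vector database.

```python
import numpy as np

# Minimal RAG sketch. The document dict stands in for decentralized
# storage; the toy embedding stands in for a real vector index.
docs = {
    "doc1": "Yotta OS schedules AI workloads across distributed GPUs.",
    "doc2": "Greenfield provides decentralized object storage.",
    "doc3": "Croissants are a French pastry.",
}

def embed(text: str) -> np.ndarray:
    # Toy embedding: normalized character-frequency vector over a-z.
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) + 1e-9)

index = {doc_id: embed(text) for doc_id, text in docs.items()}

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(index, key=lambda d: float(q @ index[d]), reverse=True)
    return ranked[:k]

query = "Where does Yotta OS store objects?"
context = "\n".join(docs[d] for d in retrieve(query))
prompt = f"Context:\n{context}\n\nQuestion: {query}"  # fed to the generator
print(prompt)
```

In a production setting, the retrieval step would query a real vector database over embeddings of the documents held in decentralized storage, but the control flow is the same: retrieve, assemble context, then generate.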

Conclusion

By integrating with a decentralized storage solution such as Greenfield or Walrus, Yotta OS can enhance its capabilities across multiple domains, including model storage, media asset management, caching strategies, and retrieval-based AI models. This integration not only improves data availability and performance but also ensures stronger data ownership and security — key factors in the evolution of AI-driven platforms.

References

[1] Y. Feng, H. Jeon, F. Blagojevic, C. Guyot, Q. Li, and D. Li, “AttMEMO: Accelerating Transformers with Memoization on Big Memory Systems,” 2023. [Online]. Available: https://arxiv.org/abs/2301.09262