Actors
There are four actors within the network:
Client
The client submits a request for a Text-to-Speech (TTS) task, specifying parameters such as voice characteristics (e.g., pitch and tone), acoustic features, and the text to be synthesized. If the network lacks a pre-trained model for these specifications, the client must provide training data (e.g., voice samples).
The client deposits tokens into a smart contract, which manages the payment distribution across the nodes involved in the task.
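The payment flow above can be sketched as a small in-memory escrow. This is only an illustration under stated assumptions: the class name `TaskEscrow`, its methods, the node identifiers, and the rule that unspent tokens are refunded to the client are all hypothetical and are not taken from the actual smart contract.

```python
from dataclasses import dataclass, field


@dataclass
class TaskEscrow:
    """Toy stand-in for the payment smart contract (hypothetical design)."""

    deposit: int  # tokens deposited by the client
    fees: dict[str, int] = field(default_factory=dict)  # node id -> agreed fee

    def assign(self, node_id: str, fee: int) -> None:
        """Reserve part of the deposit for a node working on the task."""
        committed = sum(self.fees.values())
        if fee < 0 or committed + fee > self.deposit:
            raise ValueError("fee exceeds the remaining deposit")
        self.fees[node_id] = self.fees.get(node_id, 0) + fee

    def settle(self, approved: set[str]) -> tuple[dict[str, int], int]:
        """Pay approved nodes; return (payouts, refund to the client).

        Assumption: fees reserved for non-approved nodes go back to the client.
        """
        payouts = {n: f for n, f in self.fees.items() if n in approved}
        refund = self.deposit - sum(payouts.values())
        return payouts, refund


escrow = TaskEscrow(deposit=100)
escrow.assign("trainer-1", 40)
escrow.assign("processor-1", 30)
escrow.assign("validator-1", 10)
payouts, refund = escrow.settle(approved={"trainer-1", "validator-1"})
```

Keeping the fee table separate from the settlement step mirrors the idea that payment is released only after the work has been checked.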
Training Node
These nodes store acoustic models and voice-specific embeddings derived from user-provided training data. They are responsible for fine-tuning or adapting models to match specific voice characteristics.
Training nodes adjust the model weights and acoustic parameters based on provided training data, ensuring that the synthesized speech closely matches the requested voice style and tone.
Processor Node
Processor nodes execute the TTS pipeline, which comprises all the stages described in the TTS explanation section of this documentation.
A task can be split among multiple processor nodes, with each specializing in a specific stage of the TTS process. Each node advertises the subtasks it can handle, and the cost per subtask.
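As a rough sketch of how advertised subtasks and prices could be used to split a task, the example below picks the cheapest advertising node for each stage. The stage names (`text_normalization`, `acoustic_model`, `vocoder`), the `Advertisement` type, and the cheapest-offer rule are assumptions made for illustration; the network may select nodes on other criteria.

```python
from dataclasses import dataclass


@dataclass
class Advertisement:
    """What a processor node announces (hypothetical structure)."""

    node_id: str
    prices: dict[str, int]  # subtask name -> cost in tokens


def plan_pipeline(stages: list[str], ads: list[Advertisement]) -> dict[str, tuple[str, int]]:
    """Map each stage to the (node id, cost) of the cheapest offer for it."""
    plan = {}
    for stage in stages:
        offers = [(ad.prices[stage], ad.node_id) for ad in ads if stage in ad.prices]
        if not offers:
            raise LookupError(f"no node advertises stage {stage!r}")
        # Ties on cost are broken by node id, purely to keep the result deterministic.
        cost, node_id = min(offers)
        plan[stage] = (node_id, cost)
    return plan


ads = [
    Advertisement("p1", {"text_normalization": 2, "acoustic_model": 10}),
    Advertisement("p2", {"acoustic_model": 8, "vocoder": 5}),
    Advertisement("p3", {"vocoder": 5}),
]
plan = plan_pipeline(["text_normalization", "acoustic_model", "vocoder"], ads)
total_cost = sum(cost for _, cost in plan.values())
```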
These nodes use deep neural networks or other generative models for the TTS process, with the aim of accurate synthesis and high-quality output.
Validator Node
Validator nodes ensure the accuracy and quality of outputs from both training and processor nodes. They validate that model adjustments reflect the correct voice characteristics and that the TTS output matches the input text.
They use methods such as speech-to-text (STT) for content validation, voice embedding comparison for identity verification, and signal quality assessment to ensure high-fidelity audio.
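Two of these checks can be illustrated in a few lines. The sketch below assumes the validator already has an STT transcript of the synthesized audio and voice embeddings as plain lists of floats; it compares the transcript with the input text using word error rate and compares the embeddings using cosine similarity. The function names and the thresholds (`max_wer`, `min_similarity`) are hypothetical, and signal quality assessment is left out.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution (free when the words match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h)))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def passes_checks(text, transcript, target_emb, output_emb,
                  max_wer=0.1, min_similarity=0.8) -> bool:
    """Content check (WER) and identity check (embedding similarity)."""
    return (word_error_rate(text, transcript) <= max_wer
            and cosine_similarity(target_emb, output_emb) >= min_similarity)
```

In practice the thresholds would need tuning per model and language; the values here are placeholders.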
Validation decisions are aggregated using consensus mechanisms, where majority votes determine whether the task outputs meet the quality standards.
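A minimal sketch of majority-vote aggregation follows. It assumes each validator casts a single unweighted approve or reject vote and that a tie counts as rejection; the function name and both of these rules are illustrative assumptions, since the actual consensus mechanism may weight votes or require a larger quorum.

```python
def majority_decision(votes: dict[str, bool]) -> bool:
    """Return True when strictly more than half of the validators approve.

    `votes` maps a validator id to its approve (True) or reject (False) vote.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    approvals = sum(votes.values())
    return approvals * 2 > len(votes)


accepted = majority_decision({"v1": True, "v2": True, "v3": False})
tied = majority_decision({"v1": True, "v2": False})
```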
Although a single node could perform both training and processing for efficiency, the system is designed with a modular architecture to give participants flexibility. This modularity enables specialization and scalability, allowing nodes to focus on specific tasks.
To participate, nodes must stake tokens as a security deposit, which incentivizes correct behavior and provides collateral for potential penalties if nodes fail to meet quality standards. This staking mechanism ensures commitment to the system’s integrity.
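The staking rule can be sketched as a simple registry. Everything named here is an assumption for illustration: the `StakeRegistry` class, the minimum stake, and the percentage-based penalty are not specified by the text above, which only states that stakes act as collateral for penalties.

```python
class StakeRegistry:
    """Toy ledger of node stakes (hypothetical design)."""

    def __init__(self, min_stake: int, slash_percent: int = 10):
        self.min_stake = min_stake          # stake required to participate
        self.slash_percent = slash_percent  # share of stake lost per penalty
        self.stakes: dict[str, int] = {}

    def deposit(self, node_id: str, amount: int) -> None:
        if amount <= 0:
            raise ValueError("stake must be positive")
        self.stakes[node_id] = self.stakes.get(node_id, 0) + amount

    def is_eligible(self, node_id: str) -> bool:
        """A node may take tasks only while its stake meets the minimum."""
        return self.stakes.get(node_id, 0) >= self.min_stake

    def slash(self, node_id: str) -> int:
        """Deduct the penalty from a node's stake and return the amount taken."""
        penalty = self.stakes.get(node_id, 0) * self.slash_percent // 100
        self.stakes[node_id] = self.stakes.get(node_id, 0) - penalty
        return penalty


registry = StakeRegistry(min_stake=100, slash_percent=20)
registry.deposit("processor-1", 120)
eligible_before = registry.is_eligible("processor-1")
penalty = registry.slash("processor-1")
eligible_after = registry.is_eligible("processor-1")
```

In this sketch a penalized node that falls below the minimum must top up its stake before it can take further tasks.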