planning: ichigo framework overview #171

Open
PodsAreAllYouNeed opened this issue Jan 23, 2025 · 0 comments

Overall Design Pattern

Image

For the speech portion of the framework, we need to design around these 4 primitive models and the quantizer. Our research strategy will also be built around this structure, and the package will follow it as well.

This is not how existing packages are currently designed. For example, Whisper is a single model; to fit within our framework, we need to break it into Whisper Encoder = s2r and Whisper Decoder = r2t.

In implementation, examples of how this will work are:

ASR task
transcription = r2t(s2r(audio))

TTS
speech = r2s(t2r(text))

How this will be implemented

r2t, t2r, r2s, and s2r will all inherit from nn.Module and implement forward methods that do exactly what their names say. They are generic classes, and the specific implementation should be specified through a YAML config.
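A minimal sketch of what this could look like for the ASR direction, assuming PyTorch. The class names `S2R`, `R2T`, and `ASR`, the constructor parameters, and the toy linear layers are all placeholders for illustration, not the planned Ichigo API; a real s2r would wrap something like the Whisper encoder.

```python
import torch
from torch import nn


class S2R(nn.Module):
    """Speech-to-representation: waveform -> sequence of feature vectors."""

    def __init__(self, frame_size: int = 160, dim: int = 32):
        super().__init__()
        self.frame_size = frame_size
        # Placeholder encoder: one linear projection per audio frame.
        self.proj = nn.Linear(frame_size, dim)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, samples) -> frames: (batch, n_frames, frame_size)
        n_frames = audio.shape[-1] // self.frame_size
        frames = audio[..., : n_frames * self.frame_size]
        frames = frames.reshape(audio.shape[0], n_frames, self.frame_size)
        return self.proj(frames)  # (batch, n_frames, dim)


class R2T(nn.Module):
    """Representation-to-text: feature vectors -> token ids."""

    def __init__(self, dim: int = 32, vocab_size: int = 100):
        super().__init__()
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, rep: torch.Tensor) -> torch.Tensor:
        # Greedy per-frame token choice; a real decoder would be autoregressive.
        return self.head(rep).argmax(dim=-1)  # (batch, n_frames)


class ASR(nn.Module):
    """Pipeline that composes the two primitives: r2t(s2r(audio))."""

    def __init__(self, s2r: nn.Module, r2t: nn.Module):
        super().__init__()
        self.s2r = s2r
        self.r2t = r2t

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        return self.r2t(self.s2r(audio))


audio = torch.randn(2, 1600)        # two fake clips of 1600 samples each
tokens = ASR(S2R(), R2T())(audio)   # token ids, one per 160-sample frame
```

Because the pipeline only relies on each primitive's `forward` contract, swapping in a different s2r or r2t would not require touching `ASR` itself.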

ASR, TTS, and Ichigo are pipelines composed from these fundamental models, and they will also be defined via a YAML config.

There can be built-in configs to achieve one-line implementations, such as for IchigoASR, but users should also be able to just define their own custom config.
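One way config-driven pipelines might be wired up is sketched below. Everything here is an assumption for illustration: the `REGISTRY`, the `register` and `build_pipeline` helpers, and the `stages` / `type` / `params` schema are hypothetical. The dict stands in for a parsed YAML file, and the components are plain stub callables rather than nn.Module subclasses so the example stays dependency-free.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping config names to component factories.
REGISTRY: Dict[str, Callable[..., Callable[[Any], Any]]] = {}


def register(name: str):
    """Decorator that records a factory under a config-visible name."""
    def decorator(factory):
        REGISTRY[name] = factory
        return factory
    return decorator


@register("dummy_s2r")
def make_s2r(frame_size: int = 4):
    # Stub s2r: split a list of samples into fixed-size frames.
    def s2r(audio: List[float]) -> List[List[float]]:
        return [audio[i:i + frame_size] for i in range(0, len(audio), frame_size)]
    return s2r


@register("dummy_r2t")
def make_r2t(symbol: str = "x"):
    # Stub r2t: emit one symbol per frame.
    def r2t(rep: List[List[float]]) -> str:
        return symbol * len(rep)
    return r2t


def build_pipeline(config: Dict[str, Any]) -> Callable[[Any], Any]:
    """Instantiate each stage from the config and chain them in order."""
    stages = [REGISTRY[s["type"]](**s.get("params", {})) for s in config["stages"]]

    def pipeline(x):
        for stage in stages:
            x = stage(x)
        return x

    return pipeline


# What a parsed built-in config might look like (assumed schema).
asr_config = {
    "name": "asr",
    "stages": [
        {"type": "dummy_s2r", "params": {"frame_size": 4}},
        {"type": "dummy_r2t", "params": {"symbol": "a"}},
    ],
}

asr = build_pipeline(asr_config)
transcription = asr([0.0] * 12)  # 12 samples -> 3 frames -> 3 symbols
```

A user-defined config would follow the same path: only the dict (or the YAML file it is loaded from) changes, while `build_pipeline` stays the same.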

Related work

This fundamentally just extends the ideas of the framework introduced by WhisperSpeech.

@PodsAreAllYouNeed PodsAreAllYouNeed self-assigned this Jan 23, 2025
@github-project-automation github-project-automation bot moved this to Investigating in Menlo Jan 23, 2025
@tuanlda78202 tuanlda78202 self-assigned this Jan 23, 2025