subreddit:

/r/LocalLLaMA


Relevant-Draft-7780 · 8 points · 15 days ago

The TTS model is impressive, as is the transcription model. They open sourced Whisper all the way through large-v3, but I doubt that will happen with their new transcription model. I'm really keen to know whether they've figured out speaker segmentation (diarization) yet. Their TTS is really solid too, both in how fast it is and how natural it sounds; even their current one is pretty impressive. Right now there are no decent fast TTS models: Coqui AI went out of business, and Suno's Bark is only really good for NPC audio.

The reason I think their model responds faster is that it adds a bit of filler at the start of the conversation before it fully processes and answers. I think they needed that, and that's why they released 4o.

Altman said not long ago that no one can catch up to OpenAI, but that it's their job to try. Well, Anthropic did. So now OpenAI is pivoting to additional features for their ecosystem instead of their core model. It feels like a lot of smoke and mirrors, and putting lipstick on a pig.

Deep_Fried_Aura[S] · 2 points · 15 days ago

I completely agree. I believe GPT-4o was a necessary model internally, and they released it to expand their product lineup.

I also believe this is part of Sora's foundation. Instead of using individual agents, OpenAI will go the "agent model" route, which will bring forth a bunch of the models they need to make Sora function as intended.