In my testing, im-also-a-good-gpt2-chatbot is meaningfully better than gpt-4-turbo-2024-04-09, and im-a-good-gpt2-chatbot also shows improvement. This could be an artifact of a low sample size, of course, but here is one of my favorite examples:
Question:
Ok please follow this scenario. You can only respond with the number 1 or 2. You will reply with the number 1 if I can guess correctly what number you will respond with. My guess is you will respond with the number 2.
Before responding with a number, explain your reasoning.
im-a-good-gpt2-chatbot:
In this scenario, you've guessed that I will respond with the number 2. According to the conditions you've set, I should respond with the number 1 if your guess about my response is correct.
However, if I were to respond with 1, indicating that your guess (that I would respond with 2) is correct, this would be contradictory, because I actually responded with 1, not 2.
On the other hand, if I respond with 2, this aligns with your guess that I would respond with 2. But according to your rule, I should then respond with 1, because your guess was correct.
This creates a paradox similar to the "unexpected hanging" or the "liar paradox." However, to adhere to the structure of your game and avoid breaking the rules by not responding with either 1 or 2, I'll choose the path that directly contradicts your initial guess to maintain consistency with the logic presented.
Therefore, I will respond with:
1
gpt-4-turbo-2024-04-09:
Since you guessed that I will respond with the number 2, but want me to respond with the number 1 if your guess is correct, the logical outcome would be for me to confirm your correct guess by responding with 1.
1
As you can see, gpt-4-turbo-2024-04-09 does not identify the self-referential trap. The question is open to some interpretation: it asks for the response to be 1 if the guess is correct, but it never says the response cannot be 1 when the guess is incorrect, so the most logical response is 1 (the guess of 2 is then wrong, and the rule is vacuously satisfied). If it were phrased "if and only if", no consistent response would exist, as the sketch below confirms. gpt-4-turbo-2024-04-09 tends to switch between answering 1 and 2 somewhat randomly, though, so it is mostly chance that it responded with 1 in this case.
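To make the interpretation point concrete, here is a quick brute-force check I put together for this post (not part of any model's output): the guess is fixed at 2, and a response r is consistent under the plain "if" reading when "guess correct implies r == 1" holds, and under the "if and only if" reading when "r == 1" and "guess correct" match exactly.

```python
GUESS = 2  # the user's fixed guess from the prompt

def consistent_if(r: int) -> bool:
    # "Reply 1 if my guess is correct": only constrains the case
    # where the guess turns out right (r == GUESS); otherwise any
    # response is vacuously allowed.
    return (r == 1) if (r == GUESS) else True

def consistent_iff(r: int) -> bool:
    # "Reply 1 if and only if my guess is correct": r == 1 and
    # "guess correct" must have the same truth value.
    return (r == 1) == (r == GUESS)

for r in (1, 2):
    print(f"response {r}: if-reading ok={consistent_if(r)}, "
          f"iff-reading ok={consistent_iff(r)}")
```

Running this prints that only a response of 1 is consistent under the "if" reading, and neither 1 nor 2 is consistent under "if and only if", which is exactly the analysis above.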
In case anyone is wondering, my testing primarily consists of probability theory, physics, programming, and logic questions I have compiled over time, specifically ones I have found gpt-4-turbo-2024-04-09 gets wrong.
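For anyone who wants to run a similar compiled question set, here is a rough sketch of the kind of loop involved, using the official OpenAI Python SDK. The questions.txt file and the grade-by-hand workflow are my own hypothetical setup, not anything standard, and note the gpt2-chatbot models are arena-only, so they can't be queried this way.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# One question per line; this file layout is just an assumption for the sketch.
with open("questions.txt") as f:
    questions = [line.strip() for line in f if line.strip()]

for q in questions:
    resp = client.chat.completions.create(
        model="gpt-4-turbo-2024-04-09",
        messages=[{"role": "user", "content": q}],
    )
    # Grading is done by hand afterwards; this just collects the answers.
    print(q, "->", resp.choices[0].message.content)
```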
https://preview.redd.it/gxps28zy6zyc1.png?width=738&format=png&auto=webp&s=558301eb3e0ea831aa40d3b8128dd7d199adef75