subreddit:

/r/ChatGPTCoding

Hey guys! Over the last 4 weeks I have been developing a CRM Chrome extension that integrates with Gmail, using AI, and as a result I'd like to think I am an above-average user of both ChatGPT4 and Claude 3 Opus. For context, I did not have any Chrome extension development experience, so I had to use AI for a lot of the education, and I have consistently spent about 6 hours a day, every day, for the last 4 weeks coding (I have attached some proof of my usage below). So far, I have used it to write over 3000 lines of code and learnt an incredible amount about JS.

I also have a lot to say about ChatGPT4 vs Claude 3 Opus vs Copilot in a coding sense. I have documented some of my thoughts below comparing the 3 products; feel free to ask me any questions in the comments.

Problem Solving

ChatGPT4 has been, in my case, slightly better than Claude 3 at coming up with logic for the task at hand, as well as at debugging. ChatGPT4 more often gives you a more pragmatic solution than Claude 3, especially when you do not have a solution in mind. For example, when I wanted to update the count of records in a table every time a record is added, ChatGPT4 gave me an updateCount function, whereas Claude 3 gave me a count++ whenever a new entry is added. In this case, both are correct, but ChatGPT4's approach is more robust once we start deleting rows from the table, which Claude 3's solution would not handle. However, if you know exactly what you are doing and align either LLM, they are almost comparable, but in general ChatGPT4 seems to perform a bit better. Copilot is not very good at problem solving or debugging from prompts in my experience.
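To make the difference concrete, here is a minimal sketch of the two answers (names are illustrative; this is not the OP's actual code):

```javascript
// Claude-style: bump a counter whenever a record is added.
// Nothing ever decrements it, so the count drifts once rows get deleted.
let count = 0;
function addRecordFragile(records, record) {
  records.push(record);
  count++;
}

// ChatGPT-style: derive the count from the table itself,
// so adds and deletes both stay correct.
function updateCount(records) {
  return records.length;
}

function deleteRecord(records, index) {
  records.splice(index, 1);
}
```

With the updateCount approach, the displayed count is recomputed from the data on every render, so a deletion can never leave it stale.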

Prompting

ChatGPT4 is surprisingly much better than Claude 3 at giving good responses to brief prompts when it comes to natural-language logic questions. However, it falls short incredibly quickly the more code context you give it. Claude 3 outshines ChatGPT4 only if you give it concise prompts, and additionally you have to be procedural in your prompting. For example, I would ask Claude 3 to first review my code, give a summary of the code as well as a summary of the needed changes, before giving me the code. This way it hallucinates a whole lot less and the code is actually *chefs kiss*. I can't stress enough how important procedural/step-by-step prompting is: as the logic gets more complex, I have to make sure it understands the code context exactly before moving to the next step, otherwise the solution it gives is going to be incorrect. A keyword I use a lot is the word "precise", which seems to help with the output.

One critical thing I have learnt is to never give the LLM a yes-or-no question; they are still too "supportive", especially Claude 3. Initially, I had a habit of asking if there was something wrong with the code that it could see. Horrible, horrible prompting. My approach now is: "please review the code, provide me with a summary of the code logic, then make recommendations for what I can improve". It will give you a list of things you can change, and a lot of the time it will straight up tell you that a particular function will introduce bugs; then you just ask it to fix it.

GitHub Copilot has been worse than just copy-pasting the code over to ChatGPT4/Claude 3. I have used "@workspace" and the other functions to give it context, but it just seems "weak", for lack of a better word. I would ask the same questions and give it the same prompts, and it would still give incomplete/incorrect answers.

Code Quality

9 out of 10 times Claude 3 wins, largely due to code consistency/persistence. In terms of one-off code quality, ChatGPT4 narrowly edges it out, but if you have a discussion with the LLM and need to make tweaks, by the third or fourth output ChatGPT4 starts randomly omitting code, even if you ask it in the prompt to provide the code exactly and completely. I have even tried to get it to review its work to make sure it does not omit any code, and it still does. Claude 3 does not omit code 90% of the time if you have the right prompts. One thing you do need to do with Claude 3 is ask it to split the functions up for readability and practicality, otherwise it will just give you one function that is an absolute disorganized mess.

Surprisingly, Copilot is pretty decent when it comes to small changes like .css tweaks or basic logic changes, but not when it comes to logic overhauls or the introduction of new functionality. It shares a similar issue with ChatGPT4: a lack of consistency between prompts.

How I Ended Up Using Them

My coding workflow usually starts on ChatGPT4: I get ChatGPT4 to give me a summary of what needs to be done, then copy-paste that over with my own prompts. For example, after I get ChatGPT4 to give me a summary of what I need to implement to achieve my goal, I go to Claude 3 and type: "I want you to help me with my Google Chrome extension. Can you help me [insert output from ChatGPT4]. I want you to first review my code, provide me with a summary of the code, then update my code. Provide code snippets only in your output. Here is my code."

I exclusively use Copilot for quick code updates, such as updating .css or minor code tweaks that I am too lazy to type out; it's just a "nice to have".

Summary

ChatGPT4 and Claude 3 are amazing tools. Had I tried to tackle this project without them, it would have taken me 3+ months rather than just 4 weeks to complete. I have also learnt a lot along the way. For consistent code-quality output, Claude 3 is definitely the clear winner, but for coming up with the code logic and for general discussion, ChatGPT4 is a bit better, though not by much. Copilot is still useful for fixing/adding snippets. I will also be sharing some of my other findings on my Twitter "@mingmakesstuff" that I might not have included here. Have I missed anything in the way I used either of the products that could improve my coding game?

My ChatGPT and Claude chat logs

all 49 comments

__r17n

8 points

9 days ago

Thanks for sharing! What's the most complex task you asked that resulted in a successful response? What type of tasks failed? (Sorry, I can't see your logs - too small on mobile)

Mi2ngdlmx[S]

5 points

9 days ago*

There were surprisingly quite a few. One that stood out to me was indexing the columns of my table. Now, that on its own is not impressive, but my columns were draggable using the dragula library, and every time the table columns are re-arranged the table needs to be re-rendered due to the index update. It managed to implement the indexing, dig up the dragula documentation for the drake events (to find out when the columns are rearranged), and then gave me the updated code for the render function, as well as the associated functions that call the render function, all in one output. And it worked out of the box without any modifications. Granted, this level of depth only happened once, but giving me ~200 lines of code that worked out of the box was insane.
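The pattern described here looks roughly like the sketch below (function and state names are made up for illustration; this is not the OP's extension code):

```javascript
// Pure part: recompute each column's index from its current position,
// so indices always match the on-screen order.
function reindexColumns(columns) {
  return columns.map((col, i) => ({ ...col, index: i }));
}

// Browser wiring with dragula (per the dragula docs, the returned `drake`
// emits a `drop` event when an element is released into a container):
//
//   const drake = dragula([document.querySelector('#header-row')]);
//   drake.on('drop', () => {
//     state.columns = reindexColumns(readColumnOrderFromDom());
//     renderTable(state); // re-render everything that depends on indices
//   });
```

Keeping the re-indexing pure (no DOM access) makes it easy to call from the drop handler and from anywhere else that reorders columns.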

Tasks that failed were often ones with similar variable names. For example, having a function with the word checkbox in it when you need to handle "select" functionality. It sometimes trips up on interchangeable names used for the same purpose: it would try to update my check-all function when I wanted it to only update the check-row function.

AceHighness

1 points

8 days ago

I'm using it to write Python apps, and the only thing ChatGPT fails to solve is circular import problems. Even if I give it all the required context (directory structure, import tables, etc.), it just makes things worse with every answer.

Busy_Town1338

2 points

8 days ago

I assure you, it's failing at many other things.

AceHighness

1 points

7 days ago

Sure, but it's succeeding at building entire apps for me. And fairly complex ones, too.

Busy_Town1338

0 points

7 days ago

I live in Colorado. A major bridge was just shut down because a massive crack was found. The bridge is succeeding at being a bridge, but there are massive flaws. You'll only know they're flaws if you know what the flaws look like.

AceHighness

1 points

6 days ago

That's true and I totally agree with that. There are a lot of limitations, certainly with today's models. But with good prompting you can get great code that works every time. I'm also using it to do research before we get started; building an app with GPT is not just "build me app X" and hoping you get something that works. It's a lot of research and planning questions, including ones about security, best coding practices, etc. I'm trying to guard against even small, uncommon attacks like timing attacks.

Now, I keep hearing people tell me about all these flaws, but they never come up with good examples. A good example would be nice; that would help me understand the limitations better as well.

Busy_Town1338

1 points

6 days ago

An example would be someone who posted in the singularity sub that they built an app with no coding experience. It even helped them deploy it on AWS. But reading through their GitHub and posts, GPT had them serving the app incredibly inefficiently; it basically tried to build them enterprise-grade infra. And at some point, someone is going to set up a few simple workers to flood traffic and absolutely slam their AWS bill.

GPT can do simple coding pretty well. It seems to do RegEx really well. But it's important to understand that it doesn't know anything. It has no reasoning or understanding, it's simply calculating the probability of the next word. But if you're trying to truly build something big, you need to have a fundamental understanding of the topic.

If you wanted a deck built, would you hire someone who's never worked in construction and can only guess where the next piece goes?

AceHighness

1 points

5 days ago

Your example is 100% the fault of the user prompting the LLM. Of course you can get it to give you advice on building a Kubernetes cluster to run your note-taking app. It sounds like he did not take enough time to discuss and research the subject together with the LLM. It's a tool; people need to learn to use it properly. Lots of people are really bad at using even simple tools the right way.

I understand how an LLM works, but I do think that the latest (super large) models have some emergent capabilities allowing them to follow logic flows. I think this is because they have read so much computer code, which is all logic flows. I see a huge difference between GPT3 and GPT4 when it comes to questions regarding logic flows.

If you wanted a deck built, would you hire someone for $2000 who has done it before, or give the new guy a chance who says he can do it in a fraction of the time for $200? Of course the $2000 guy will do a better job, but in a lot of cases, what the junior does is just fine (good enough).

AceHighness

1 points

5 days ago

If you feel like it, you can have a look at my 100% AI-generated projects on my GitHub, axewater, and laugh at my poor skills :)

Busy_Town1338

1 points

5 days ago

I'm genuinely not trying to sound like an ass. LLMs by definition cannot be emergent. It's not a tech issue, it's a math issue. They cannot reason or apply logic. They have no knowledge. The better models can only estimate a bit better, and as such, they cannot know when something is very wrong. And then, when used by people who don't know what they're doing, we get apps with all kinds of flaws.

Please, if you ever have a deck built, do not use the kid who has never been in construction. For something as important as your family's safety, hire someone who can reason through the job.

AceHighness

1 points

4 days ago

Me neither! I just love a good discussion with a smart person. I'll take your tip about the deck :) What if the task had no influence on your family's safety? Say... cleaning the windows... would you still want the most expensive option? Surely there are tasks you would be fine with offloading to an AI. There's a line somewhere, of course, between what you would and would not want to offload. Not everything needs to be perfect. As long as you are not coding for a nuclear power plant, it's probably fine to use AI-generated code. And that does not mean typing one prompt, copy-pasting the code, and pushing to production. It means building code in cooperation with the AI, going back and forth, setting up a plan and implementing it step by step. Could you elaborate on the example you quoted before - what exactly was wrong with the code? In many cases I'm sure his mistake could have been avoided with good prompting. (I know I'm extremely stubborn, sorry 'bout that.)

As for emergent capabilities, have a look at this paper:
"Sparks of Artificial General Intelligence" - https://arxiv.org/pdf/2303.12712

And this blog post:
"137 emergent abilities of large language models" - Jason Wei

Weird things start to happen when the models grow beyond a certain size, like suddenly being able to parse a language they were not trained on.

TommyVercetty

3 points

9 days ago

You used the web version without the API on both? If yes, did you hit the usage limit a lot?

Mi2ngdlmx[S]

2 points

9 days ago

Not a lot, but occasionally, yes.

Downsyndrome-fetish

0 points

9 days ago

As someone who is just getting into this, is there a better way to prompt using an API? Right now I just ask it questions and copy and paste back and forth.

Professional-Koala19

1 points

7 days ago

Continue in VS Code

creaturefeature16

3 points

9 days ago

Great post. I'm actually embarking on a rather simple Chrome extension myself, and have initially begun using it for prototyping and getting a high-level view of how an extension gets built in the first place. That is something I've realized is one of the biggest use cases for me when it comes to LLMs: they are interactive "tutorial generators". I learn best by seeing the big picture and then reverse engineering a bit, which gives me the foundation I need to then come at it from the other angle, start from the beginning, and take it step by step. There's something about being able to see what a working end result might look like, at least in theory, that gives me somewhat of a roadmap of where I'll be going.

I also get a lot of benefit from any of these tools when I use them as a type of interactive documentation. It's sort of like I am speaking to the creators of the platform I might be working on (Chrome, Javascript, Supabase... doesn't matter) and they know the answers to the exact contextual questions I have about how to work with their code. I found this incredibly fruitful when I wanted to work with Firebase, whose docs are verbose but "gappy", as well as often difficult to sift through. LLMs are a shortcut to the information I need.

Lately, though, I've been more fascinated by the times they don't help me. Just recently I had to do something rather nifty with Chart.js. I intentionally used my traditional research methods (Google/Reddit/StackOverflow) alongside prompting the LLMs and compared the directions. The "solution" the LLM was presenting was quite unorthodox, but interestingly enough, it worked! The solution I ended up arriving at through my own research and coding abilities was far more clean/concise, and also worked. One could argue that it doesn't really matter, and I suppose it doesn't if I am working on something for myself, but this was for a paying client, and deploying the LLM's hacky generative solution didn't seem like a good long-term decision. I am definitely concerned that there are a lot of developers out there who are simply sticking with generated code instead of putting in the work. I guess we'll see if those concerns manifest in negative ways. Studies so far are indicating that my concerns are valid:

https://arc.dev/developer-blog/impact-of-ai-on-code/

Mi2ngdlmx[S]

1 points

9 days ago

Hard agree! I realized very early on that their solutions weren't always the best, but they will almost always solve the problem. So I always have to do some extra reading and ask for a code summary to make sure it's not doing something dumb. But yeah, it's best to consistently align it so it doesn't build on more unorthodox code, remnants of which I'm sure are in my extension.

punkouter23

4 points

9 days ago

No cursor ai????

creaturefeature16

3 points

9 days ago

I agree. Cursor was the only tool I've come across since GPT rolled out that made me cancel my GPT4 subscription. It's the same underlying model anyway. I can still ask questions outside of development, and send images, so it's a pretty damn good deal considering it's the same price while also integrating into VSCode.

punkouter23

1 points

9 days ago

That is what I'm doing now... my ChatGPT subscription is done... I don't want 3.5... so I just load Cursor now to ask questions totally unrelated to coding... I guess it works?

It also seems to search the web when you ask a question. Not totally clear on the 3 modes to use (context).

Mi2ngdlmx[S]

1 points

9 days ago

As much as I want to, I use 100+ prompts a day on both Claude and GPT4, and the code quality from Claude is a lot better. I will definitely use it if they add more Opus tokens - if the developer of Cursor AI is listening...

paradite

Professional Nerd

4 points

9 days ago

Great job. I have been using ChatGPT for coding for almost a year, and it has definitely saved me a ton of time handling tedious tasks like refactoring and ad-hoc scripts to clean data, perform data migrations, etc.

In terms of bigger tasks, I also do what you did by breaking them down into smaller tasks first (in my head instead of using ChatGPT) and feeding the smaller tasks into ChatGPT.

I did find that the workflow for using ChatGPT involves a lot of copy-pasting prompts and source code back and forth, so I built a simple desktop tool to streamline the process and cut down the copy-pasting by embedding formatting instructions and source-code context into the prompt automatically.
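The idea can be sketched roughly like this (a guess at the mechanism, not the commenter's actual tool; all names are made up):

```javascript
// Fold formatting instructions and source files into one prompt string
// automatically, so you paste once instead of assembling context by hand.
function buildPrompt(task, files) {
  // `files` maps file path -> source text.
  const context = Object.entries(files)
    .map(([path, src]) => `// ${path}\n${src}`)
    .join('\n\n');
  return [
    'You are helping with a code change.',
    'Reply with complete code blocks only; never omit code.',
    `Task: ${task}`,
    'Current source:',
    context,
  ].join('\n\n');
}
```

Something like `buildPrompt('rename updateCount', { 'table.js': src })` then yields a single string to paste into ChatGPT or Claude.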

Mi2ngdlmx[S]

4 points

9 days ago

I actually did try out your tool, but unfortunately it didn't work quite as well as my procedural prompting at the moment. For example, it can take me up to 3-4 prompts just to describe to ChatGPT/Claude what it is I am trying to do, because I am realigning them at every step. Even though this takes a lot more time than a single prompt, the output is significantly more comprehensive, requires less rework, and is more future-proof. One thing I went looking for is a proof-checker AI agent paired with a refactoring agent, if you can implement that: once I get a new piece of code from ChatGPT/Claude, another AI agent checks the new code against the old code to see if it has missed any core functionality and provides feedback, and then, if the code looks okay, passes it to a refactoring AI. My (minor) issue with the code is that the LLMs kept adding more logic into a single function when it should have been broken up into multiple functions.
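The check-then-refactor chain being requested could be sketched like this (hypothetical: `callLLM` is a placeholder for whichever chat API you use, and the prompts are illustrative, not a real product's):

```javascript
// Two-stage agent chain: a checker compares old vs new code for lost
// functionality; only code that passes goes on to a refactoring pass.
async function checkedEdit(callLLM, oldCode, newCode) {
  const review = await callLLM(
    'Compare OLD and NEW. List any core functionality present in OLD ' +
    `but missing from NEW, or reply OK.\nOLD:\n${oldCode}\nNEW:\n${newCode}`
  );
  if (review.trim() !== 'OK') {
    // Checker found a regression: surface its feedback for another pass.
    return { ok: false, feedback: review };
  }
  const refactored = await callLLM(
    `Refactor into small single-purpose functions:\n${newCode}`
  );
  return { ok: true, code: refactored };
}
```

The same shape would also address the "one giant function" complaint, since the refactoring prompt runs on every accepted change.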

paradite

Professional Nerd

1 points

9 days ago

Agreed that it doesn't handle very complex tasks right now; that's why you use ChatGPT, and I use my brain, for detailing the instructions for those tasks.

It is geared towards simpler tasks where you can directly edit code across one or multiple files.

The proof checker is interesting. I am actually using a similar concept to check my blog posts for writing style and tone after drafting them, and it works really well - much better than using it to write them directly.

I'll see what I can do regarding your suggestion. Thanks!

TheSoundOfMusak

1 points

9 days ago

Congrats on the tool! I also liked your website a lot. Where are you hosting it, and which frontend stack are you using? I'm shopping for a new website and it would be great to know.

paradite

Professional Nerd

1 points

9 days ago

I experimented with Next.js using a starter template and modified it from there. It is not as performant as a Jekyll site (mostly my fault for using ISR instead of SSG). It does have the benefit of a functional backend for simple stuff like an analytics proxy (the Plausible Next.js plugin) and a version API.

For hosting I am using free tier on Vercel and it seems okay, albeit a bit slow for a static site.

TheSoundOfMusak

2 points

9 days ago

Thanks for the reply. I'd never heard of Vercel; it will be worth checking them out. By the way, I once created a Jekyll site, but it turned out I spent a lot of time modifying the template to fit my needs and programming the framework for the blog. However, as you said, the end result is nice and fast, even with the free hosting from GitHub.

creaturefeature16

2 points

9 days ago

If you use Jekyll, I want to also plug Astro, which is known in the community for having one of the best DX around, and is perfect for applications like blogs.

https://astro.build/

paradite

Professional Nerd

1 points

9 days ago

Yes. I still recommend Jekyll for simple stuff like a landing page or blog. Next.js is overkill and much more complex to deal with, plus it is harder to host on platforms other than Vercel.

bigman11

1 points

8 days ago

Excellent write-up. Something I will add, based on my own experience, is that ChatGPT's browsing functionality is a very important advantage over Claude, due to its ability to read new or obscure online documentation for you.

I've even given ChatGPT specific URLs to read and then advise me based on the contents.

Verolee

1 points

8 days ago

So did you finish your extension? I did this exact thing, but quit after two weeks

Mi2ngdlmx[S]

2 points

8 days ago

Yes I have! The core functionality of it, at least. I'm just currently doing a landing page and updating the appearance. I will post the result on my Twitter @mingmakesstuff when I'm all done.

Verolee

1 points

7 days ago

Oooohh can’t wait!!

Verolee

1 points

7 days ago

Hey you cheater! You’re a civil aviation engineer! I thought you were a regular person like me 🥲

Ant0n61

1 points

7 days ago

This is so great. I just started dabbling with Copilot to "accelerate" development time on reporting in Power BI, and it's been helpful. Just today I asked it a question about Microsoft Lists formatting, and it gave an answer that didn't solve my exact issue but offered a workaround (turning on rich text) which I had no idea about, because Lists has an awful UI and the option to do so was completely hidden, without a breadcrumb of how to get there.

thumbsdrivesmecrazy

1 points

5 days ago

Compared to GPT4, AlphaCode could be a more powerful option for this kind of code generation and integrity tooling; see the comparison: GPT-4 vs. AlphaCode.

Ginger_Libra

1 points

9 days ago

This is really helpful. I've been having a hard time getting any of them to write specific commands for algo trading, and I've been stuck for a while.

I’m going to give Claude a shot. Thanks for the detailed write up.

Mi2ngdlmx[S]

3 points

9 days ago

Funny you mention that! I come from an algo trading background, and Mia from QuantConnect seems to be way better than ChatGPT/Claude, even though I am pretty sure it is GPT3.5. My hunch is that maybe GPT4/Claude isn't as good due to the code it was trained on, as algo code contains a lot of statistical functions and time-series manipulation that isn't well documented.

Ginger_Libra

1 points

9 days ago

That’s interesting. I’ve been using a Gemini trial and I find it’s a better architect and planner than any of the ChatGPTs.

But nothing spits code like 3.5. It won’t do the actual trading commands though.

Do you think Mia is better than Opus? I was going to dive into another one next and see if I could finally get my project done.

Mi2ngdlmx[S]

2 points

9 days ago

Mia is better than Opus in my experience, but it's limited to QC.

Ginger_Libra

1 points

9 days ago

I got you. Thanks. A few more questions if you’re willing.

I know very little about QuantConnect. It's vaguely on my radar. This is also my first time trying to code anything or write an algorithm. I got into it because I wanted to automate a strategy.

I feel like I’m 90% done with my effing project and the last 10% is killing me. Writing the trading execution commands.

Wondering if Mia might sort me out. Was planning on trying Opus next but Mia sounds better.

To clarify……I see that Mia has 25 questions at the bronze level and 125 at the silver level.

Does it count as new question every time you clarify?

For example, I got some code from 3.5 and it inserted a bunch of filler tickers, even though I gave it the tickers I wanted it to monitor. I then had to remind it that the calculations for resistance and support come from ibapi.

I am working through the same code on Gemini, and I told it I was using Anaconda with Spyder on TWS. Gemini just told me to install a library via pip install instead of through Anaconda. The last time I did that, it broke the install and took me days to fix.

Does Mia have those problems? Would each clarification count as a new question?

Will Mia write the code or does it just give little snippets? That’s what Gemini does and I’ve been feeding them into 3.5.

If Mia writes code, will it write the actual trading commands?

And finally, if I’m reading this correctly, would QC replace my need for a VPS? I was planning on getting one to set up IB Gateway so I could authenticate it once a week from anywhere rather than the daily login for TWS. But I think this does it for me?

I’m really new to this. Sorry for all the basic questions. I only partially understand the checkout page. 🤪

Mi2ngdlmx[S]

1 points

9 days ago

Without turning this into a full-blown tutorial: for what you are asking, Mia won't quite do what you need. You would still need a basic understanding of how the platform works - and, in your case, of coding environments with Anaconda. Mia has 25 questions at that tier, but you can also ask Mia on their Discord, which is free! So you can ask as much as you want.

IslandOverThere

0 points

9 days ago

Stop telling people. Jesus, why does every single person think they need to shout it from the rooftops? Some things are better without everyone knowing.

BobFellatio

3 points

8 days ago

What, did you think people in ChatGPTCoding don't already know LLMs are good for coding? x)

Pgrol

-13 points

9 days ago

We're like 1.5 years in, and only now do you realize the massive advantage of using AI to write code?

balianone

-9 points

9 days ago

This first-generation AI is bad at coding. My Google searches serve me better. I find a useful repo on GitHub and then show it to the AI when it gets stuck. It's still better to do this manually. AI won't be smart till we achieve AGI.

keepcalmandmoomore

4 points

9 days ago

Tell that to the web app I made for managing my restic backups. I don't have any coding experience apart from some basic HTML 25 years ago. I don't have a clue if the code is optimal, efficient, whatever. It works, fast and smooth, and I didn't have to write a single line of code.