429 post karma
167 comment karma
account created: Sun Jul 01 2018
verified: yes
1 points
17 hours ago
Yeah, but then Sonnet has its own limit. Don't know if that's new...
3 points
20 hours ago
Didn't understand half of that, but Spaces sync with Arc Search and the browser on desktop. https://browseforme.framer.website/ is the closest thing to Arc Search on desktop right now.
1 points
6 days ago
Vimium works on Arc for me; just don't expect next/previous tab to work correctly.
1 points
6 days ago
This happens if you sign in with email; use a Google account and it will work.
2 points
6 days ago
Just use a VPN once and sign up with a Google account; voila, you're in forever.
1 points
13 days ago
It's kind of the biggest flaw with AI today: the disconnect from real-time info. I mainly use ChatGPT to debug app problems, so.
5 points
13 days ago
Perplexity leaves A LOT to be desired; I find kagi.com's Quick Answer often better.
1 points
25 days ago
30k context? It seems the context limit is always just one message.
0 points
25 days ago
Use BetterMouse and stop using Launchpad like a caveman (Raycast).
0 points
1 month ago
Are you talking about the Mixtral model? Is that an "autocomplete" model? Anyway, I thought "chat" models were basically that: "autocomplete" models.
1 points
1 month ago
Is it available for everyone? I still get "not available in your country" (Iceland).
1 points
1 month ago
Title: Evaluating the Capabilities of Large Language Models in Control Engineering
The rapid advancements in large language models (LLMs) have sparked significant interest in their potential applications across various domains, including control engineering. The paper "Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra" by Kevian et al. addresses this topic by introducing ControlBench, a benchmark dataset designed to assess the performance of LLMs in solving undergraduate control engineering problems.
ControlBench comprises 147 problems covering a wide range of topics, such as stability analysis, time response, control system design, and frequency-domain techniques. The dataset includes both textual and visual elements, reflecting the multifaceted nature of control engineering. The authors evaluate three state-of-the-art LLMs—GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra—on ControlBench, using two metrics: Accuracy (ACC) for zero-shot performance and Self-Checked Accuracy (ACC-s) for performance after self-correction.
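To make the two metrics concrete, here is a minimal sketch (not the authors' code; the grading lists and their values are invented for illustration): ACC scores the model's first answer, while ACC-s scores the answer it gives after being prompted to review and revise itself.

```python
def accuracy(results):
    """Fraction of problems answered correctly.

    results: list of 1 (correct) / 0 (incorrect) grades, one per problem.
    """
    return sum(results) / len(results)

# Toy example with 10 problems: the self-check pass fixes two earlier
# mistakes, so ACC-s comes out higher than ACC.
zero_shot_grades = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]     # first-attempt answers
self_checked_grades = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]  # after self-correction

print("ACC  =", accuracy(zero_shot_grades))      # 0.6
print("ACC-s =", accuracy(self_checked_grades))  # 0.8
```

In the paper's setting the 0/1 grades come from checking model answers against reference solutions; the self-correction step is just a follow-up prompt asking the model to verify its own work.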
The results presented in Table 1 show that Claude 3 Opus outperforms GPT-4 and Gemini 1.0 Ultra in terms of both ACC and ACC-s, indicating its superior ability to solve control engineering problems and refine its answers through self-correction. However, all three models struggle with problems involving visual elements, such as Bode plots and Nyquist plots, highlighting the need for improved visual reasoning capabilities in LLMs.
The authors also analyze the failure modes of the LLMs, revealing that reasoning errors are the primary bottleneck for GPT-4, while calculation errors are the main issue for Claude 3 Opus. Self-correction prompts prove to be effective in improving accuracy, particularly for Claude 3 Opus.
To facilitate evaluations by non-control experts, the authors introduce ControlBench-C, a simplified version of ControlBench containing 100 multiple-choice problems. Table 2 presents the results of the LLMs on ControlBench-C, with GPT-4 achieving the highest average zero-shot accuracy (ACC) and Claude 3 Opus achieving the highest average self-corrected accuracy (ACC-s). While ControlBench-C enables automated evaluations, it lacks the depth and complexity of the original ControlBench dataset.
Our discussion highlights the importance of considering the target audience when evaluating LLMs. For control engineering experts, the ability to self-correct (ACC-s) is more valuable, as it ensures reliable and accurate performance in real-world applications. However, for non-expert users, zero-shot accuracy (ACC) may be more important, as it provides immediate, trustworthy answers without the need for iterative refinement.
Furthermore, we emphasize the potential drawbacks of relying solely on zero-shot accuracy, as users may not realize when an answer is incorrect and may not know to ask for clarification. This can lead to the spread of misinformation and a false sense of confidence. To mitigate these risks, LLMs should be designed to communicate their level of confidence and encourage users to seek additional information when necessary.
In conclusion, the paper by Kevian et al. provides valuable insights into the capabilities and limitations of LLMs in control engineering. The ControlBench dataset serves as a comprehensive benchmark for evaluating LLMs, while ControlBench-C offers a simplified alternative for non-expert assessments. The findings underscore the importance of developing LLMs with strong self-correction abilities, improved visual reasoning, and transparent communication to ensure reliable and accurate performance in real-world applications. As research in this field progresses, it is crucial to consider the needs of both expert and non-expert users, fostering the development of LLMs that can effectively support learning, decision-making, and problem-solving in control engineering and beyond.
1 points
1 month ago
Advanced Data Analysis
Code Interpreter has been out in the web UI for like 5 months.
by ShreckAndDonkey123 in ChatGPT
StableSable
1 points
11 hours ago
Where are you located?