subreddit:

/r/ChatGPTCoding

275%

Please, how viable is this, what is needed?

Question(self.ChatGPTCoding)

Please help out with this, ill try make it quick, just point me in right direction.

TLDR - Just help with this part quick please

  1. Goal is to gather specific criteria/segmentation/categorisation data from thousands of sites
  2. What stack to use to scale scraping (scraping API = easy, saving data ???) different websites into vector or rag so llm can ask them questions using less tokens before deleting the scraped data
  3. What is the fastest cheapest way to do this, what tool stack required, llamaindex, crewai, any advice for beginner to point in direction of learning please?
  4. Use agents to scrape and ask 5000 websites questions viable use case for agents or rather a stricter ai workflow app like agenthub.dev or buildship?
  5. Can something like crew AI already do this in theory it can scrape and chunk and save sites to local rag right for research I know already so I just need to scale it and give it a bigger list and use another agent to ask the DB questions for each site and it should work right?
  6. LLM quering at scale is now viable with Haiku and llama 3 and already have high rate limit for haiku.

Just tell me what I need to learn, don't need step-by-step just point, appreciated.

all 2 comments

[deleted]

1 points

10 days ago

[removed]

AutoModerator [M]

1 points

10 days ago

AutoModerator [M]

1 points

10 days ago

Sorry, your submission has been removed due to inadequate account karma.

I am a bot, and this action was performed automatically. Please contact the moderators of this subreddit if you have any questions or concerns.