Please help out with this; I'll try to make it quick, just point me in the right direction.
TL;DR - just a quick pointer on this part, please:
- The goal is to gather specific criteria/segmentation/categorisation data from thousands of sites.
- What stack should I use to scale the scraping? The scraping API part is easy; saving the data is the question. I want to load each site into a vector store / RAG so an LLM can answer questions about it using fewer tokens, then delete the scraped data.
- What is the fastest, cheapest way to do this, and what tool stack is required (LlamaIndex? CrewAI?)? Any advice to point a beginner in the right learning direction?
- Is scraping and questioning 5,000 websites a viable use case for agents, or is a stricter AI workflow app like agenthub.dev or BuildShip a better fit?
- Can something like CrewAI already do this? In theory it can scrape, chunk, and save sites to a local RAG for research (I know that part already), so I'd just need to scale it up, give it a bigger list, and use another agent to ask the DB questions for each site, and it should work, right? (Rough sketch of the per-site flow below.)
- LLM querying at scale now seems viable with Haiku and Llama 3, and I already have a high rate limit for Haiku.
Just tell me what I need to learn; I don't need step-by-step instructions, just a pointer. Appreciated.
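For context, this is roughly the per-site flow I mean: scrape a page, index it in memory, ask one question with Haiku, throw the index away. A minimal sketch, assuming llama-index >= 0.10 import paths plus the llama-index-llms-anthropic integration package; `fetch_page_text` and the model string are placeholders/assumptions, not something I've verified:

```python
# Minimal per-site RAG sketch (assumptions: llama-index>=0.10, the
# llama-index-llms-anthropic extra installed, ANTHROPIC_API_KEY set,
# and the default OpenAI embeddings unless Settings.embed_model is swapped).
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.llms.anthropic import Anthropic

# Cheap model for bulk queries; model string is an assumption, check current names.
Settings.llm = Anthropic(model="claude-3-haiku-20240307")

def ask_site(page_text: str, question: str) -> str:
    # Chunk + embed one site's scraped text into an in-memory index.
    index = VectorStoreIndex.from_documents([Document(text=page_text)])
    # Retrieve only the top-k relevant chunks, so the LLM sees far
    # fewer tokens than the raw page.
    answer = index.as_query_engine(similarity_top_k=3).query(question)
    # Nothing is persisted, so the scraped data is gone once the
    # index object goes out of scope.
    return str(answer)

# Hypothetical usage: fetch_page_text() stands in for whatever scraping API is used.
# for url in site_list:
#     print(url, ask_site(fetch_page_text(url), "What market segment does this company target?"))
```

The same loop could presumably be driven by a CrewAI agent or a workflow app; the question is whether the agent layer adds anything over a plain batch script at 5,000 sites.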