web_crawl
- baf.utils.web_crawl.crawl_website(initial_url, max_depth=2, max_pages=20, format='markdown', base_url_prefix=None)
Breadth-first (BFS) crawler that starts at initial_url and collects pages whose URLs start with base_url_prefix (if provided).
- Parameters:
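initial_url – str, starting point of the crawl
max_depth – int, maximum link depth (default 2)
max_pages – int, maximum number of pages to crawl (default 20)
format – str, output format of the page content, either 'html' or 'markdown' (default 'markdown')
base_url_prefix – str, optional; if given, only URLs starting with this prefix are included (default None)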
- Returns:
A dictionary mapping each crawled URL to its content in the requested format (html or markdown).
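
To illustrate how max_depth, max_pages and base_url_prefix might interact in a breadth-first crawl, here is a minimal sketch. It is not the library's implementation: bfs_crawl_sketch, its fetch argument and FAKE_SITE are hypothetical names introduced only for this example, it returns raw HTML only (no markdown conversion), and it assumes that the initial URL counts as depth 0 and is always fetched.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class _LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def bfs_crawl_sketch(initial_url, fetch, max_depth=2, max_pages=20,
                     base_url_prefix=None):
    """Hypothetical BFS crawl. fetch(url) returns HTML text, or None on failure."""
    results = {}
    seen = {initial_url}
    queue = deque([(initial_url, 0)])  # (url, depth); assumption: start is depth 0
    while queue and len(results) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        results[url] = html
        if depth >= max_depth:
            continue  # keep the page, but do not follow its links
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link in seen:
                continue
            if base_url_prefix and not link.startswith(base_url_prefix):
                continue
            seen.add(link)
            queue.append((link, depth + 1))
    return results


# Tiny in-memory "site" so the sketch runs without network access.
FAKE_SITE = {
    "https://example.com/docs/": '<a href="a.html">A</a> <a href="/blog/">Blog</a>',
    "https://example.com/docs/a.html": '<a href="b.html">B</a>',
    "https://example.com/docs/b.html": '<a href="c.html">C</a>',
    "https://example.com/docs/c.html": "end",
    "https://example.com/blog/": "off-prefix",
}

pages = bfs_crawl_sketch(
    "https://example.com/docs/",
    FAKE_SITE.get,
    max_depth=2,
    base_url_prefix="https://example.com/docs/",
)
print(sorted(pages))
```

In this sketch the /blog/ link is skipped because it does not start with the prefix, and c.html is never fetched because it sits three links away from the start. A corresponding call to the documented function would look like crawl_website("https://example.com/docs/", max_depth=2, base_url_prefix="https://example.com/docs/"); whether it treats depth and the initial URL the same way is not stated in this reference.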