Reference for the scrape command in Genie, the AI-powered CLI companion.
The scrape command extracts content from web pages by HTML element. It is useful for gathering data such as headings, paragraphs, or links from websites and saving the results to a file for further analysis.
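To make the idea of element-based extraction concrete, here is a minimal Python sketch of what collecting the text of every element with a given tag could look like. It is not Genie's implementation: the class ElementTextExtractor and the function extract_elements are hypothetical names chosen for illustration, and the sketch uses only Python's standard html.parser module.

```python
from html.parser import HTMLParser


class ElementTextExtractor(HTMLParser):
    """Collects the text content of every element with a given tag name.

    Hypothetical helper for illustration; not part of Genie.
    """

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0      # number of matching tags currently open
        self.buffer = []    # text pieces of the element being read
        self.results = []   # one string per matched element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost matching element closed: store its text.
                self.results.append("".join(self.buffer).strip())
                self.buffer = []

    def handle_data(self, data):
        # Keep text only while inside a matching element.
        if self.depth > 0:
            self.buffer.append(data)


def extract_elements(html, tag):
    """Return the text of each <tag> element found in an HTML string."""
    parser = ElementTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    page = "<h1>Title</h1><p>First <b>bold</b> text.</p><p>Second</p>"
    for text in extract_elements(page, "p"):
        print(text)
```

Passing "p" as the tag plays the same role as -e p: only paragraph elements are kept, and text inside nested inline tags is included in the result.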
-e, --element string
: Specifies the HTML element to extract from the web page, such as h1, p, or a. For example, -e p extracts all paragraph elements.
-l, --limit int
: Sets the maximum number of pages to scrape. For example, -l 5 scrapes only the first five pages.
-o, --output string
: Defines the file path where the scraped data is saved, so the results can be stored for later use. For example, -o output.txt.
-p, --pagination string
: CSS selector for pagination links, allowing the scraper to navigate through multiple pages automatically. Use this if the website has a "Next" button or similar pagination control. For example, -p ".next-page".
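The interaction between a pagination selector and a page limit can be sketched as a simple loop: fetch a page, look for the "next" link, and stop when there is no link or the limit is reached. The Python sketch below is an illustration under stated assumptions, not Genie's actual behavior. The names NextLinkFinder and crawl are hypothetical, the fetch argument is a hypothetical callable that maps a URL to its HTML, and only simple class selectors such as ".next-page" are handled.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Finds the href of the first <a> element carrying a given class name.

    Hypothetical helper for illustration; not part of Genie.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if self.class_name in classes:
            self.href = attr_map.get("href")


def crawl(start_url, fetch, next_selector, limit):
    """Return the URLs visited, following "next" links for at most `limit` pages.

    Assumption: `next_selector` is a plain class selector like ".next-page".
    """
    class_name = next_selector.lstrip(".")
    visited = []
    url = start_url
    # Stop at the page limit, at a missing "next" link, or on a repeated URL.
    while url and len(visited) < limit and url not in visited:
        visited.append(url)
        finder = NextLinkFinder(class_name)
        finder.feed(fetch(url))
        finder.close()
        # Resolve relative links against the current page URL.
        url = urljoin(url, finder.href) if finder.href else None
    return visited


if __name__ == "__main__":
    # In-memory stand-in for a website, so the sketch runs without a network.
    pages = {
        "https://example.com/1": '<a class="next-page" href="/2">Next</a>',
        "https://example.com/2": '<a class="btn next-page" href="/3">Next</a>',
        "https://example.com/3": "<p>End</p>",
    }
    print(crawl("https://example.com/1", pages.__getitem__, ".next-page", 2))
```

In this sketch the limit counts pages visited rather than links followed, which matches the description of -l 5 as scraping the first five pages.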