AI-Powered Dynamic Web Scraper
Extract structured data from web URLs using custom columns, custom instructions, and AI-powered parsing
xlwings Lite: Dynamic Web Scraper Excel App
Transform Excel into a versatile web scraping platform that extracts structured data from multiple web pages using AI-powered analysis.
How xlwings Lite Works: The Python code is embedded directly inside the Excel file itself! Download the Excel file from the link below to get the complete app with all code included.
What This App Does
This xlwings Lite app turns Excel into a powerful web scraping platform:
- Extract structured data from multiple websites
- Process batches of URLs with user-defined column specifications
- Automatically detect and extract data elements using AI
- Work with custom extraction instructions to fine-tune results
- Format scraped data directly into Excel tables
- Report progress and handle errors while a batch is running
- Support market research, competitor analysis, and data aggregation
How to Use
- Download the app and install xlwings Lite from the Add-in button in Excel.
- Set up the MASTER sheet:
  - Enter your Jina API key and Google Gemini API key.
  - Configure optional parameters such as delays and retry settings.
- Define data columns in the COLUMN_INPUTS sheet:
  - Specify column names and detailed descriptions.
  - Add custom extraction instructions if needed.
- Add URLs in the URL_LIST sheet: Enter the URLs you want to scrape, one per row.
- Run the macro: Use the xlwings tab to run "scrape_urls_from_list".
- View results: Check the DATA sheet for your structured extracted data.
How It Works
The Dynamic Web Scraper operates through a multi-layer architecture:
Excel-Python Bridge:
- xlwings creates a bidirectional connection between Excel and Python.
- Configuration data is read from MASTER, COLUMN_INPUTS, and URL_LIST sheets.
- Results are written back to the DATA sheet as structured tables.
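The configuration step above can be sketched in plain Python. In xlwings, a range's value comes back as nested lists for a two-dimensional range and as a flat list for a single column, so the helpers below work on those shapes. The sheet layouts (name/value pairs in MASTER, a header row in COLUMN_INPUTS), the setting names, and the helper names are all assumptions for illustration. The app's actual embedded code may be organized differently.

```python
from typing import Any, Dict, List


def parse_master(rows: List[List[Any]]) -> Dict[str, Any]:
    """Turn (name, value) rows into a settings dict, skipping blank rows.

    Assumes a two-column layout; the real MASTER sheet may differ.
    """
    settings: Dict[str, Any] = {}
    for row in rows:
        if not row or row[0] is None:
            continue
        key = str(row[0]).strip()
        if key:
            settings[key] = row[1] if len(row) > 1 else None
    return settings


def parse_columns(rows: List[List[Any]], has_header: bool = True) -> List[Dict[str, str]]:
    """Turn (column name, description) rows into a list of column specs."""
    body = rows[1:] if has_header else rows
    columns = []
    for row in body:
        if not row or row[0] is None or not str(row[0]).strip():
            continue
        description = row[1] if len(row) > 1 and row[1] is not None else ""
        columns.append({"name": str(row[0]).strip(),
                        "description": str(description).strip()})
    return columns


def parse_urls(values: List[Any]) -> List[str]:
    """Keep only cells that look like http(s) URLs (drops headers and blanks)."""
    urls = []
    for value in values:
        if isinstance(value, str) and value.strip().lower().startswith(("http://", "https://")):
            urls.append(value.strip())
    return urls
```

Keeping the parsing separate from the Excel calls makes it possible to check the logic without opening a workbook.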
Web Scraping Layer:
- Jina AI API handles the rendering of webpages.
- Fully rendered page content is returned in markdown format.
- Sequential processing respects rate limits with configurable delays.
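A minimal sketch of the scraping loop described above, under some assumptions: it targets the Jina Reader endpoint (`https://r.jina.ai/` followed by the target URL, with the API key sent as a bearer token), and the function names, retry policy, and default values are hypothetical. The HTTP call itself is passed in as `fetch`, because the source does not say which HTTP client the app uses.

```python
import time
from typing import Callable, Dict, List, Tuple

JINA_READER_PREFIX = "https://r.jina.ai/"  # assumed endpoint for this sketch


def build_reader_request(url: str, api_key: str) -> Tuple[str, Dict[str, str]]:
    """Return the request URL and headers for fetching a page as markdown."""
    return JINA_READER_PREFIX + url, {"Authorization": f"Bearer {api_key}"}


def scrape_sequentially(
    urls: List[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 1.0,
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Fetch URLs one at a time, pausing between requests and retrying failures.

    `fetch` takes a URL and returns page content, raising on failure.
    Errors are recorded per URL instead of aborting the whole batch.
    """
    results: Dict[str, dict] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # spacing between consecutive URLs
        last_error = None
        for attempt in range(max_retries + 1):
            try:
                results[url] = {"ok": True, "content": fetch(url)}
                break
            except Exception as exc:
                last_error = exc
                if attempt < max_retries:
                    sleep(delay_seconds)  # wait before retrying the same URL
        else:
            # Every attempt failed: keep the error message for reporting.
            results[url] = {"ok": False, "error": str(last_error)}
    return results
```

Passing `fetch` and `sleep` as parameters keeps the loop testable with stand-ins, and a failed URL produces an error entry rather than stopping the remaining URLs.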
AI-Powered Data Extraction:
- Google's Gemini large language model analyzes the scraped content.
- A specialized prompt combines webpage content with column definitions.
- The model returns the values that match each user-defined column, following any custom extraction instructions.
- JSON responses are parsed into structured Excel tables.
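The prompt-and-parse step above might look roughly like the sketch below. The prompt wording, the `build_prompt` and `parse_rows` names, and the expected reply shape (a JSON array of objects keyed by column name) are assumptions for illustration, not the app's actual prompt. The call to Gemini is left out, so only the text going in and the JSON coming back are shown.

```python
import json
import re
from typing import Any, Dict, List

FENCE = "`" * 3  # markdown code fence that models often wrap around JSON


def build_prompt(markdown: str, columns: List[Dict[str, str]], instructions: str = "") -> str:
    """Combine page content, column definitions, and optional instructions."""
    column_lines = "\n".join(f'- "{c["name"]}": {c["description"]}' for c in columns)
    parts = [
        "Extract the following fields from the webpage content.",
        "Return a JSON array of objects using exactly these keys:",
        column_lines,
    ]
    if instructions.strip():
        parts.append("Additional instructions: " + instructions.strip())
    parts.append("Webpage content:\n" + markdown)
    return "\n\n".join(parts)


def parse_rows(response_text: str, column_names: List[str]) -> List[List[Any]]:
    """Parse a JSON reply into a header row plus one row per extracted record."""
    text = response_text.strip()
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    if fenced:
        text = fenced.group(1).strip()
    data = json.loads(text)  # raises ValueError on malformed JSON
    if isinstance(data, dict):
        data = [data]  # tolerate a single object instead of an array
    table = [list(column_names)]
    for record in data:
        if isinstance(record, dict):
            # Missing keys become None, which shows up as an empty cell.
            table.append([record.get(name) for name in column_names])
    return table
```

The nested-list result has the same shape as a range value, so it could be written to the DATA sheet in a single assignment.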
Practical Applications:
- Market research across multiple e-commerce sites
- Competitor analysis and price monitoring
- Building contact lists from business directories
- Sports statistics and performance data collection
- Financial data aggregation from multiple sources
- News and content monitoring from various websites
Source Code & Resources
Source Code Location: All Python source code is embedded inside the Excel file. Download the Excel file from the download link on this page to get the complete code.
- xlwings Lite: Official xlwings Lite website with installation instructions and examples.
- xlwings Documentation: Comprehensive documentation with Excel object reference and API documentation.
- Jina AI Web Scraping API: Dashboard for the Jina AI web scraping API used in this app.
- Google Gemini API: Documentation for the Google Gemini API used for AI-powered data extraction.
Created by Felix Zumstein, xlwings Lite delivers a powerful and flexible solution for integrating Python with Excel - enabling native Excel support for databases, AI agents, LLMs, advanced analytics, machine learning, APIs, web services, and complete automation workflows.