# Dual Search Engine Content Discovery | Google + Bing API + AI Analysis
## 📋 n8n Community Submission Package (Condensed)
### SEO-Optimized Title:
```
Dual Search Engine Content Discovery | Google + Bing API + AI Analysis + Knowledge Base Builder
```
### Workflow Description:
Automatically search Google and Bing, scrape web pages, extract clean content, and build a searchable knowledge base with AI-powered analysis.
---
#### 🎯 What This Workflow Does
Searches both Google and Bing APIs with 34 pre-configured queries across 6 categories, fetches web pages, extracts and cleans content using Cheerio, calculates relevance scores, filters low-quality results, analyzes with AI, and stores high-value content in a searchable knowledge base with metadata and categorization.
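The 50/50 engine split can be illustrated by alternating engines over the flattened query list. This is a minimal Python sketch, not the workflow's own n8n Code node. The category names and query strings are hypothetical placeholders, because the actual 34 queries are not listed in this description, and alternating by position is an assumed routing rule.

```python
from typing import Dict, List, Tuple

# Hypothetical placeholder queries; the workflow's real 34 queries across
# 6 categories are not reproduced in this description.
QUERIES: Dict[str, List[str]] = {
    "legal": ["appeals court ruling contract dispute", "new case law data privacy"],
    "industry_news": ["industry regulation update"],
    "technical_docs": ["api rate limiting best practices"],
}


def route_queries(queries: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Return (engine, category, query) triples, alternating Google and Bing.

    Alternating by position gives an exact 50/50 split whenever the number
    of queries is even (34 queries -> 17 Google + 17 Bing).
    """
    flat = [(category, q) for category, qs in queries.items() for q in qs]
    engines = ("google", "bing")
    return [(engines[i % 2], category, q) for i, (category, q) in enumerate(flat)]


if __name__ == "__main__":
    for engine, category, query in route_queries(QUERIES):
        print(f"{engine:6} {category:15} {query}")
```

With 34 queries the alternation yields exactly 17 Google and 17 Bing calls per run; any deterministic rule (for example, splitting by category) works as long as each engine's quota is respected.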
#### ✨ Key Features
- Dual Search Engines: 50/50 split between Google Custom Search and Bing Search API v7
- 34 Pre-Configured Queries: Legal cases, industry news, technical docs, education, competitive intel, local services
- Intelligent Content Extraction: Cheerio-based HTML parsing removes ads, navigation, and extracts main content
- Relevance Scoring: Keyword matching algorithm (8 terms) with 40% threshold
- AI Analysis: Optional content analysis for insights and summaries
- Multi-Storage: Knowledge base + content library + specialized legal processing
- Rate Limited: 3-second delay between requests for responsible scraping
- Cost Optimized: Stays within or just over the free API tiers (Google: 100 queries/day, Bing: 1,000 queries/month)
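The relevance score can be sketched as the fraction of scoring keywords that appear in a page. The exact formula the workflow uses is not given, so this Python sketch assumes "distinct keywords matched ÷ total keywords". The eight keywords are hypothetical placeholders; only the 0.4 threshold comes from the description.

```python
import re
from typing import Iterable

# Hypothetical keyword list; the workflow's actual 8 scoring terms are not
# given in this description.
KEYWORDS = ["court", "ruling", "regulation", "compliance",
            "api", "documentation", "market", "competitor"]
THRESHOLD = 0.4  # default relevance threshold from the workflow description


def relevance_score(text: str, keywords: Iterable[str] = KEYWORDS) -> float:
    """Fraction of keywords found at least once as whole words (case-insensitive)."""
    kws = list(keywords)
    if not kws:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for kw in kws
               if re.search(rf"\b{re.escape(kw.lower())}\b", lowered))
    return hits / len(kws)


def passes_filter(text: str, threshold: float = THRESHOLD) -> bool:
    """Keep a page only if its assumed keyword score meets the threshold."""
    return relevance_score(text) >= threshold
```

Under this assumed formula with 8 keywords, a page needs at least 4 matches to pass, because 3/8 = 0.375 falls just below 0.4.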
#### 📚 Perfect For
Knowledge base building, competitive intelligence, market research, content curation, SEO research, industry monitoring, legal research, trend analysis
#### 🚀 Setup Requirements
- n8n instance
- Google Custom Search API (free tier: 100/day) + Search Engine ID
- Bing Search API v7 (free tier: 1000/month)
- Database (Supabase/PostgreSQL/MySQL)
- Optional: AI API for content analysis (Claude/GPT-4)
- Cheerio library (included in n8n)
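The two APIs take credentials differently: the Google Custom Search JSON API expects the API key and Search Engine ID (`cx`) as query parameters, while Bing Web Search v7 expects the key in the `Ocp-Apim-Subscription-Key` header. The Python sketch below builds one request per engine and maps both response formats onto a common shape, which is what the "parse and normalize" step needs. The key and `cx` values are placeholders, and the common field names (`title`, `url`, `snippet`, `engine`) are an assumption, not the workflow's schema.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import urlencode

GOOGLE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"


def build_request(engine: str, query: str, api_key: str,
                  cx: str = "") -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for one search call requesting 10 results."""
    if engine == "google":
        # Custom Search JSON API: key and Search Engine ID go in the query string.
        params = {"key": api_key, "cx": cx, "q": query, "num": "10"}
        return f"{GOOGLE_ENDPOINT}?{urlencode(params)}", {}
    if engine == "bing":
        # Bing Web Search v7: the subscription key goes in a request header.
        params = {"q": query, "count": "10"}
        return (f"{BING_ENDPOINT}?{urlencode(params)}",
                {"Ocp-Apim-Subscription-Key": api_key})
    raise ValueError(f"unknown engine: {engine}")


def normalize_results(engine: str, payload: Dict[str, Any]) -> List[Dict[str, str]]:
    """Map either API's JSON response onto one {title, url, snippet, engine} shape."""
    if engine == "google":
        items = payload.get("items", [])
        return [{"title": i.get("title", ""), "url": i.get("link", ""),
                 "snippet": i.get("snippet", ""), "engine": engine} for i in items]
    if engine == "bing":
        items = payload.get("webPages", {}).get("value", [])
        return [{"title": i.get("name", ""), "url": i.get("url", ""),
                 "snippet": i.get("snippet", ""), "engine": engine} for i in items]
    raise ValueError(f"unknown engine: {engine}")
```

Google returns at most 10 results per request (`num` is capped at 10), so each query contributes about 10 URLs.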
#### 🔧 What's Included
- Complete workflow JSON
- 13 detailed sticky notes
- 34 pre-configured search queries (6 categories)
- Google and Bing API setup guides
- Database schema examples
- Content extraction and cleaning logic
- Relevance scoring algorithm
- Optimization strategies
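The included database schema examples are not reproduced here; as a stand-in, the Python sketch below uses the standard-library `sqlite3` module so it runs without a server. Every table and column name is hypothetical, and deduplicating on a `UNIQUE` URL column is an assumed design choice (Google and Bing often return the same pages). On Supabase/PostgreSQL the same idea maps to a `JSONB` metadata column and `INSERT ... ON CONFLICT (url) DO NOTHING`.

```python
import json
import sqlite3

# Hypothetical schema; SQLite stands in for Supabase/PostgreSQL/MySQL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS knowledge_base (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT NOT NULL UNIQUE,
    title TEXT,
    category TEXT NOT NULL,
    engine TEXT CHECK (engine IN ('google', 'bing')),
    relevance REAL NOT NULL,
    content TEXT NOT NULL,
    metadata TEXT,  -- JSON string; PostgreSQL would typically use JSONB
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def store_article(conn: sqlite3.Connection, article: dict) -> bool:
    """Insert one article; return False if its URL is already stored."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO knowledge_base "
        "(url, title, category, engine, relevance, content, metadata) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (article["url"], article.get("title"), article["category"],
         article.get("engine"), article["relevance"], article["content"],
         json.dumps(article.get("metadata", {}))),
    )
    conn.commit()
    return cur.rowcount == 1
```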
#### 🎨 Customization Options
- Modify search queries for your industry
- Adjust relevance threshold (default: 0.4)
- Change schedule frequency (default: 12 hours)
- Add/remove content categories
- Customize keyword lists for scoring
- Add domain whitelists/blacklists
- Adjust rate limiting delays
- Enable/disable storage paths
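Domain whitelists and blacklists can be applied to search-result URLs before any page is fetched, which also saves scraping time. In this Python sketch (function names hypothetical), a listed domain matches itself and its subdomains, a leading `www.` is ignored, and a whitelist, when given, keeps only matching hosts.

```python
from typing import Iterable, List, Optional, Set
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lower-cased hostname without a leading 'www.' ('' if unparseable)."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def _matches(host: str, domains: Set[str]) -> bool:
    # "example.com" matches example.com and subdomains like blog.example.com,
    # but not notexample.com
    return any(host == d or host.endswith("." + d) for d in domains)


def filter_urls(urls: Iterable[str], blacklist: Iterable[str] = (),
                whitelist: Optional[Iterable[str]] = None) -> List[str]:
    """Drop blacklisted hosts; if a whitelist is given, keep only its hosts."""
    block = {d.lower() for d in blacklist}
    allow = {d.lower() for d in whitelist} if whitelist is not None else None
    kept: List[str] = []
    for url in urls:
        host = _host(url)
        if not host or _matches(host, block):
            continue
        if allow is not None and not _matches(host, allow):
            continue
        kept.append(url)
    return kept
```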
#### 🔍 How It Works
1. Every 12 hours, generate 34 search queries
2. Route queries to Google (17) or Bing (17) APIs
3. Parse and normalize search results (10 per query = 340 URLs)
4. Fetch each page's HTML with a 20-second timeout
5. Remove unwanted elements (ads, navigation) with Cheerio and extract the main content
6. Calculate relevance score using keyword matching
7. Keep only pages with relevance ≥ 0.4 (about 200-250 pages pass per run)
8. Optionally analyze with AI
9. Store in knowledge base with metadata
10. Apply special processing to legal-category content
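Steps 4 and 5 use Cheerio inside n8n; the Python sketch below substitutes the standard-library `html.parser` for Cheerio to show the same idea. It drops text inside an assumed list of boilerplate tags (scripts, styles, navigation, headers, footers, sidebars, forms, iframes), keeps the page title separately, and collapses whitespace. It does not reproduce any class- or id-based ad selectors the workflow may use.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Union

# Assumed boilerplate tags to drop; an approximation of "remove ads and
# navigation", not the workflow's actual Cheerio selectors.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer",
             "aside", "form", "iframe"}


class MainTextExtractor(HTMLParser):
    """Collect the page title and visible text outside boilerplate tags."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a SKIP_TAGS element
        self.in_title = False
        self.title_parts: List[str] = []
        self.text_parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        elif self.skip_depth == 0:
            self.text_parts.append(data)


def extract_main_text(html: str) -> Dict[str, Union[str, int]]:
    """Return the cleaned title, body text, and word count of one page."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    text = re.sub(r"\s+", " ", " ".join(parser.text_parts)).strip()
    title = " ".join("".join(parser.title_parts).split())
    return {"title": title, "text": text, "word_count": len(text.split())}
```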
#### 📊 Expected Performance
- Per Run: 34 queries, ~340 URLs found, ~200-250 pass filter
- Daily: 68 searches, ~400-500 quality articles stored
- Monthly: ~12,000-15,000 articles added to knowledge base
- API Costs: ~$0-0.20/month (within, or just over, the free tiers)
- Runtime: ~20-25 minutes per execution
- Storage Growth: ~500-750MB/month
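The per-run, daily, and monthly figures follow from a few multiplications. This Python sketch reproduces them; the 30-day month and applying the 3-second delay once per page fetch are assumptions.

```python
from typing import Dict, Tuple, Union


def estimate(runs_per_day: int = 2,            # schedule: every 12 hours
             queries_per_run: int = 34,
             results_per_query: int = 10,
             passing_per_run: Tuple[int, int] = (200, 250),  # stated range
             delay_s: float = 3,               # assumed: one delay per fetch
             days_per_month: int = 30) -> Dict[str, Union[int, float, Tuple[int, int]]]:
    """Reproduce the expected-performance arithmetic."""
    urls_per_run = queries_per_run * results_per_query
    stored_per_day = tuple(p * runs_per_day for p in passing_per_run)
    return {
        "urls_per_run": urls_per_run,
        "searches_per_day": queries_per_run * runs_per_day,
        "stored_per_day": stored_per_day,
        "stored_per_month": tuple(d * days_per_month for d in stored_per_day),
        # approximate lower bound on runtime from rate-limit delays alone
        "min_delay_minutes_per_run": urls_per_run * delay_s / 60,
    }
```

The delays alone account for roughly 17 minutes per run (340 × 3 s), which is consistent with the stated 20-25 minute runtime once page download time is added.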
#### 💰 Cost Analysis
Google Custom Search API:
- Free tier: 100 searches/day
- This workflow: 17/run = 34/day
- Well within free tier
Bing Search API v7:
- Free tier: 1,000 searches/month
- This workflow: 17/run = 34/day = ~1,020/month (30-day month)
- Slightly over the free tier (~20 extra searches): ~$0.14/month
Total monthly cost: ~$0-0.20 (essentially free!)
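The Bing figure is an overage calculation: 34 searches/day over a 30-day month is 1,020, or 20 above the 1,000 free searches. The Python sketch below makes that explicit. The $7 per 1,000 rate is not a quoted price; it is simply the rate implied by the ~$0.14 figure above (0.14 ÷ 20 × 1,000), so check current Azure pricing before relying on it.

```python
from typing import Tuple


def overage(used: int, free_quota: int) -> int:
    """Searches beyond the free quota (never negative)."""
    return max(0, used - free_quota)


def bing_monthly_cost(searches_per_day: int = 34, days: int = 30,
                      free_quota: int = 1000,
                      usd_per_1000: float = 7.0) -> Tuple[int, float]:
    """Return (extra searches, estimated USD) for one month of Bing calls.

    usd_per_1000 is an assumption: the rate implied by the ~$0.14 estimate,
    not a published price.
    """
    extra = overage(searches_per_day * days, free_quota)
    return extra, extra * usd_per_1000 / 1000
```

A 31-day month raises the overage to 54 searches, still well under a dollar at the implied rate, while Google's 34/day stays far below its 100/day limit.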
#### 🏷️ Tags
`web-scraping` `google-search-api` `bing-search-api` `content-discovery` `knowledge-base` `cheerio` `html-parsing` `ai-content-analysis` `competitive-intelligence` `market-research` `automated-research` `dual-search`
---
Version: 2.0
Difficulty: Intermediate
Setup Time: 45-60 minutes
Requires: Google API, Bing API, Database
---
Pro Tips:
- Start with only the Google API to simplify setup
- Test with 5-10 queries before running all 34
- Monitor API usage daily for the first week
- Adjust the relevance threshold based on your content quality needs
- Add a domain blacklist for known spam sites
- Use the legal content path only if needed
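The rate limiting (3-second delay) and 20-second fetch timeout described above, together with the tip to test on a small batch first, fit a simple sequential fetch loop. This Python sketch uses only the standard library; the `User-Agent` string and function names are hypothetical, and the `fetch` and `sleep` parameters exist so the loop can be exercised without network access.

```python
import time
import urllib.request
from typing import Callable, Dict, List, Optional

FETCH_TIMEOUT_S = 20.0  # per-page timeout stated in the workflow description
DELAY_S = 3.0           # pause between consecutive requests


def fetch_html(url: str, timeout: float = FETCH_TIMEOUT_S) -> str:
    """Download one page; the User-Agent string is a hypothetical placeholder."""
    req = urllib.request.Request(url, headers={"User-Agent": "content-discovery-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_all(urls: List[str],
              fetch: Callable[[str], str] = fetch_html,
              delay: float = DELAY_S,
              sleep: Callable[[float], None] = time.sleep,
              limit: Optional[int] = None) -> Dict[str, Optional[str]]:
    """Fetch URLs sequentially with a fixed delay; failed pages map to None.

    `limit` caps how many URLs are fetched, for small trial runs.
    """
    results: Dict[str, Optional[str]] = {}
    targets = urls if limit is None else urls[:limit]
    for i, url in enumerate(targets):
        if i > 0:
            sleep(delay)  # wait between requests, not before the first one
        try:
            results[url] = fetch(url)
        except (OSError, ValueError, LookupError):
            # timeouts and HTTP/URL errors (OSError subclasses), malformed
            # URLs, unknown charsets: record the failure and keep going
            results[url] = None
    return results
```

For a first trial run, passing `limit=10` fetches only a handful of pages, in the spirit of testing with 5-10 queries before running all 34.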