X (Twitter) Bulk Scraper/Monitor/Alerts + Vision
Deprecated. Monitor X (formerly Twitter) for specific content. Extract data, monitor accounts, and optionally run image-based alerts using cloud vision APIs. Perfect for brand reputation management and for tracking tweets, hashtags, specific images, and user activity.
Pricing
Pay per usage
X (Twitter) Bulk Scrape/Monitor + Vision AI
Monitor X/Twitter accounts, extract tweets, filter by keywords/hashtags, and run AI vision analysis on images using 6 different AI providers.
Overview
This Apify Actor scrapes X (formerly Twitter) posts from multiple accounts, filters by keywords/hashtags, and optionally runs AI vision analysis on images to detect objects, brands, or custom content patterns. Perfect for social media monitoring, brand tracking, and competitive intelligence.
Key Features
Core Scraping
- Hyperdrive Mode: Lightning-fast RSS-based scraping with automatic fallback to web scraping
- Bulk Processing: Monitor up to 100 Twitter accounts simultaneously
- Smart Filtering: Filter by keywords, hashtags, or require images
- Dual Datasets: Separate outputs for tweets and vision alerts
- Automatic Retry: Robust error handling with multiple Nitter instance fallbacks
AI Vision Analysis (Optional)
Analyze tweet images using 6 industry-leading AI providers:
- Google Gemini 2.0 Flash - Latest multimodal AI with base64 encoding
- OpenAI GPT-4o Vision - Advanced image understanding and analysis
- Google Cloud Vision - Label detection, OCR, safe search, object localization
- Azure Computer Vision - Tags, objects, brands, faces, adult content detection
- AWS Rekognition - Label detection and content moderation
- Custom Webhooks - Integrate your own vision API
Alert System
- Webhook Notifications: Get instant alerts when vision pipelines trigger
- Flexible Configuration: Per-pipeline or global webhook URLs
- Confidence Scoring: Filter alerts by AI confidence thresholds
- Label Matching: Trigger on specific detected objects or keywords
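The confidence-scoring and label-matching bullets above can be pictured as a small predicate. This is a minimal sketch, not the Actor's actual code: the `{ name, score }` label shape is taken from the alerts dataset example further down, while the helper name `matchLabels` and the case-insensitive comparison are assumptions made for illustration.

```javascript
// Hypothetical helper: returns the labels that should trigger an alert.
// `labels` uses the { name, score } shape from the alerts dataset example;
// `triggerLabels` and `threshold` mirror the Google Cloud Vision config fields.
function matchLabels(labels, { triggerLabels = [], threshold = 0 } = {}) {
  // Assumption: label names are compared case-insensitively.
  const wanted = new Set(triggerLabels.map((l) => l.toLowerCase()));
  return labels.filter(
    (label) => label.score >= threshold && wanted.has(label.name.toLowerCase()),
  );
}

const hits = matchLabels(
  [
    { name: 'Car', score: 0.93 },
    { name: 'Vehicle', score: 0.61 },
    { name: 'Tree', score: 0.99 },
  ],
  { triggerLabels: ['car', 'vehicle'], threshold: 0.8 },
);
console.log(hits.length > 0 ? 'alert' : 'no alert', hits);
```

Here only `Car` both clears the 0.8 threshold and appears in the trigger list, so a single label would raise the alert.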
Input Configuration
Basic Example
{"usernames":["apify","openai"],"maxItems":100,"preferRss":true}
Complete Example with Vision Analysis
{"usernames":["apify","elonmusk","openai"],"searchTerms":["AI","automation","web scraping"],"hashtags":["webscraping","machinelearning"],"maxItems":500,"preferRss":true,"requireImages":false,"rssTimeoutSecs":10,"visionPipelines":[{"name":"Product Launch Detector","provider":"gemini_vision","enabled":true,"configJson":"{\"prompt\":\"Is this a product launch or announcement?\",\"triggerKeywords\":[\"launch\",\"new\",\"announcement\"],\"model\":\"gemini-2.0-flash-exp\"}","alertWebhookUrl":"https://your-webhook.com/product-alerts"},{"name":"Brand Monitor","provider":"openai_vision","enabled":true,"configJson":"{\"prompt\":\"Identify brands and logos\",\"triggerKeywords\":[\"Tesla\",\"Apple\",\"Nike\"],\"model\":\"gpt-4o\"}","alertWebhookUrl":""},{"name":"Object Detector","provider":"google_vision","enabled":true,"configJson":"{\"threshold\":0.8,\"triggerLabels\":[\"car\",\"vehicle\"],\"maxLabels\":10}"}]}
Input Fields
| Field | Type | Required | Description |
|---|---|---|---|
| usernames | array | Yes | X/Twitter usernames to monitor (without @ symbol) |
| searchTerms | array | No | Filter tweets containing these keywords |
| hashtags | array | No | Filter tweets containing these hashtags |
| maxItems | integer | No | Maximum tweets to collect (default: 100) |
| preferRss | boolean | No | Use RSS scraping first (default: true) |
| requireImages | boolean | No | Only collect tweets with images (default: false) |
| rssTimeoutSecs | integer | No | RSS fetch timeout in seconds (default: 10) |
| visionPipelines | array | No | AI vision analysis configuration |
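To make the interaction of `searchTerms`, `hashtags`, and `requireImages` concrete, here is a rough sketch of a filter over the tweet shape shown in the Output section. It is an illustration only: the function name `passesFilters`, the OR logic between terms and hashtags, and the plain substring matching are all assumptions, and the Actor may combine the filters differently.

```javascript
// Hypothetical filter mirroring the searchTerms, hashtags and requireImages inputs.
// Assumption: a tweet passes if it matches ANY search term or ANY hashtag.
function passesFilters(tweet, { searchTerms = [], hashtags = [], requireImages = false } = {}) {
  if (requireImages && !tweet.imageUrl) return false;

  const text = `${tweet.title ?? ''} ${tweet.description ?? ''}`.toLowerCase();
  const tags = (tweet.tags ?? []).map((t) => t.replace(/^#/, '').toLowerCase());

  const termHit = searchTerms.some((term) => text.includes(term.toLowerCase()));
  const tagHit = hashtags.some((tag) => tags.includes(tag.replace(/^#/, '').toLowerCase()));

  // With no text filters configured, anything that survived requireImages passes.
  const hasTextFilters = searchTerms.length > 0 || hashtags.length > 0;
  return !hasTextFilters || termHit || tagHit;
}

const sampleTweet = {
  title: 'Check out our new Actor for web scraping!',
  description: 'Check out our new Actor...',
  tags: ['#webscraping', '#automation'],
};
console.log(passesFilters(sampleTweet, { hashtags: ['webscraping'] }));
console.log(passesFilters(sampleTweet, { requireImages: true }));
```

Substring matching is deliberately naive: a short term such as "AI" would also match words that merely contain those letters, so a stricter filter would need word-boundary checks.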
Vision Pipeline Configuration
Each pipeline in the visionPipelines array accepts the following fields:
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Descriptive name for the pipeline |
| provider | string | Yes | AI provider: gemini_vision, openai_vision, google_vision, azure_cv, aws_rekognition, custom_webhook |
| enabled | boolean | No | Enable/disable this pipeline (default: true) |
| configJson | string | No | Provider-specific configuration as JSON string |
| alertWebhookUrl | string | No | Webhook URL for alerts (overrides the environment variable) |
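Because `configJson` is a string rather than a nested object, hand-escaping the inner quotes is easy to get wrong. One way around that, sketched below, is to build the provider config as a normal object and serialize it with `JSON.stringify`. The helper name `makePipeline` is hypothetical; the field names come from the table above.

```javascript
// configJson must be a string, so serialize the provider config instead of
// hand-escaping quotes. `makePipeline` is a hypothetical convenience helper.
function makePipeline(name, provider, config, alertWebhookUrl = '') {
  return {
    name,
    provider,
    enabled: true,
    configJson: JSON.stringify(config),
    alertWebhookUrl,
  };
}

const input = {
  usernames: ['apify'],
  maxItems: 50,
  requireImages: true,
  visionPipelines: [
    makePipeline('Object Detector', 'google_vision', {
      threshold: 0.8,
      triggerLabels: ['car', 'vehicle'],
      maxLabels: 10,
    }),
  ],
};

// Prints the Actor input with configJson correctly escaped.
console.log(JSON.stringify(input, null, 2));
```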
Provider-Specific Configuration
Gemini Vision
{"prompt":"Describe what you see in detail","triggerKeywords":["product","launch"],"model":"gemini-2.0-flash-exp"}
OpenAI Vision
{"prompt":"Identify brands and logos","triggerKeywords":["Nike","Apple"],"model":"gpt-4o","maxTokens":500}
Google Cloud Vision
{"threshold":0.8,"triggerLabels":["car","vehicle"],"maxLabels":10}
Azure Computer Vision
{"minConfidence":0.7,"targetTags":["car","person"],"blockAdult":false}
AWS Rekognition
{"minConfidence":0.7,"targetLabels":["Car","Person"],"blockUnsafe":true}
Custom Webhook
{"webhookUrl":"https://your-api.com/analyze","timeout":20000,"headers":{"Authorization":"Bearer YOUR_TOKEN"}}
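For the custom webhook provider, the config above maps naturally onto a `fetch` call. The sketch below only builds the request arguments so that it can be inspected without network access. Two points are assumptions, since the README does not document them: that `timeout` is in milliseconds, and that the request body is `{ imageUrl }`. The helper name `buildWebhookRequest` is also hypothetical.

```javascript
// Builds the arguments for fetch() from a custom_webhook config.
// Assumptions: `timeout` is in milliseconds and the body is { imageUrl };
// the Actor's real payload format is not documented in this README.
function buildWebhookRequest(config, imageUrl) {
  const { webhookUrl, timeout = 20000, headers = {} } = config;
  if (!webhookUrl) throw new Error('custom_webhook config needs a webhookUrl');
  return {
    url: webhookUrl,
    timeoutMs: timeout,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', ...headers },
      body: JSON.stringify({ imageUrl }),
    },
  };
}

const request = buildWebhookRequest(
  {
    webhookUrl: 'https://your-api.com/analyze',
    timeout: 20000,
    headers: { Authorization: 'Bearer YOUR_TOKEN' },
  },
  'https://pbs.twimg.com/media/abc123.jpg',
);
console.log(request.url, request.options.method, request.timeoutMs);

// Sending it (not executed here) could look like:
// await fetch(request.url, { ...request.options, signal: AbortSignal.timeout(request.timeoutMs) });
```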
Output
Main Dataset (Tweets)
Each scraped tweet contains:
{"title":"Check out our new Actor for web scraping!","link":"https://x.com/apify/status/1234567890","author":"apify","published":"2026-01-15T10:30:00Z","description":"Check out our new Actor...","tags":["#webscraping","#automation"],"imageUrl":"https://pbs.twimg.com/media/abc123.jpg","visionAlertsCount":2,"scrapedUsername":"apify","collectedAt":"2026-01-15T10:35:00Z","sourceType":"rss","instance":"nitter.net"}
Alerts Dataset (Vision Triggers)
Each triggered alert contains:
{"pipelineName":"Product Launch Detector","provider":"gemini_vision","itemLink":"https://x.com/apify/status/1234567890","imageUrl":"https://pbs.twimg.com/media/abc123.jpg","labels":[{"name":"product","score":0.95},{"name":"announcement","score":0.88}],"score":0.95,"analysis":"This image shows a new product launch announcement...","triggeredAt":"2026-01-15T10:35:00Z"}
Output Views
The Actor provides multiple pre-configured output views:
- tweets - Full dataset JSON
- tweetsTable - Simplified table view
- tweetsCSV - CSV export
- tweetsWithImages - Images only
- visionAlerts - All vision alerts
- visionAlertsTable - Simplified alerts view
- visionAlertsCSV - Alerts CSV export
- highConfidenceAlerts - 90%+ confidence only
- runStats - Actor run statistics
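If you export the alerts dataset and want to reproduce something like the highConfidenceAlerts view locally, a simple filter on the top-level `score` field is enough. This is a sketch under the assumption that "90%+ confidence" means `score >= 0.9`; the function name `highConfidence` is made up for the example.

```javascript
// Keeps alerts whose top-level score is at least `minScore`.
// Assumption: 0.9 corresponds to the "90%+ confidence" view described above.
function highConfidence(alerts, minScore = 0.9) {
  return alerts.filter(
    (alert) => typeof alert.score === 'number' && alert.score >= minScore,
  );
}

const alerts = [
  { pipelineName: 'Product Launch Detector', score: 0.95 },
  { pipelineName: 'Brand Monitor', score: 0.72 },
  { pipelineName: 'Object Detector' }, // no score: excluded
];
console.log(highConfidence(alerts).map((a) => a.pipelineName));
```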
Environment Variables
Configure AI providers via environment variables in the Actor settings:
Required (if using vision analysis)
| Variable | Description | Example |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for GPT-4o Vision | sk-... |
| GEMINI_API_KEY | Google Gemini API key | AIza... |
| GOOGLE_APPLICATION_CREDENTIALS | Google Cloud credentials JSON | {"type":"service_account",...} |
| AZURE_CV_ENDPOINT | Azure Computer Vision endpoint | https://your-resource.cognitiveservices.azure.com/ |
| AZURE_CV_KEY | Azure Computer Vision API key | abc123... |
| AWS_ACCESS_KEY_ID | AWS access key for Rekognition | AKIA... |
| AWS_SECRET_ACCESS_KEY | AWS secret key | abc123... |
| AWS_REGION | AWS region (optional) | us-east-1 (default) |
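A quick pre-flight check can catch missing credentials before a run starts. The sketch below maps each provider to the variables it appears to need; that mapping is inferred from the descriptions in the table above rather than confirmed by the Actor's source, and `missingEnvVars` is a hypothetical helper.

```javascript
// Environment variables each provider appears to need (inferred from the table above).
// AWS_REGION is optional, so it is not listed.
const REQUIRED_ENV = {
  openai_vision: ['OPENAI_API_KEY'],
  gemini_vision: ['GEMINI_API_KEY'],
  google_vision: ['GOOGLE_APPLICATION_CREDENTIALS'],
  azure_cv: ['AZURE_CV_ENDPOINT', 'AZURE_CV_KEY'],
  aws_rekognition: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY'],
  custom_webhook: [],
};

// Returns the names of variables that enabled pipelines need but `env` lacks.
function missingEnvVars(pipelines, env = process.env) {
  const missing = new Set();
  for (const pipeline of pipelines) {
    if (pipeline.enabled === false) continue; // disabled pipelines need nothing
    for (const name of REQUIRED_ENV[pipeline.provider] ?? []) {
      if (!env[name]) missing.add(name);
    }
  }
  return [...missing];
}

const missing = missingEnvVars(
  [
    { name: 'Brand Monitor', provider: 'openai_vision' },
    { name: 'Moderation', provider: 'azure_cv' },
    { name: 'Old pipeline', provider: 'aws_rekognition', enabled: false },
  ],
  { OPENAI_API_KEY: 'sk-example', AZURE_CV_KEY: 'example' },
);
console.log(missing);
```

In this example only `AZURE_CV_ENDPOINT` is reported, because the AWS pipeline is disabled and the OpenAI key is present.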
Optional
| Variable | Description |
|---|---|
| ALERT_WEBHOOK_URL | Global webhook URL for all alerts |
| WEBHOOK_<PIPELINE_NAME> | Pipeline-specific webhook (e.g., WEBHOOK_PRODUCT_DETECTOR) |
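With three possible places to define a webhook URL, it helps to spell out one plausible lookup order. The README states that a pipeline's `alertWebhookUrl` overrides the environment variable; everything else in this sketch is an assumption, including the rule for turning a pipeline name into `WEBHOOK_<PIPELINE_NAME>` and the choice to prefer the pipeline-specific variable over `ALERT_WEBHOOK_URL`.

```javascript
// Assumed naming rule: upper-case the name and replace runs of other
// characters with underscores, e.g. "Product Detector" -> "WEBHOOK_PRODUCT_DETECTOR".
function pipelineEnvName(pipelineName) {
  return 'WEBHOOK_' + pipelineName.trim().toUpperCase().replace(/[^A-Z0-9]+/g, '_');
}

// Assumed precedence: pipeline field, then pipeline-specific variable, then global.
function resolveWebhookUrl(pipeline, env = process.env) {
  return (
    pipeline.alertWebhookUrl ||
    env[pipelineEnvName(pipeline.name)] ||
    env.ALERT_WEBHOOK_URL ||
    null
  );
}

const env = {
  ALERT_WEBHOOK_URL: 'https://your-webhook.com/all-alerts',
  WEBHOOK_PRODUCT_DETECTOR: 'https://your-webhook.com/product-alerts',
};
console.log(resolveWebhookUrl({ name: 'Product Detector', alertWebhookUrl: '' }, env));
console.log(resolveWebhookUrl({ name: 'Brand Monitor' }, env));
```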
Setting Environment Variables
Via Apify Console:
1. Go to your Actor → Settings → Environment variables
2. Click "Add variable"
3. Enter name and value
4. Check "Secret" for sensitive data
Via .actor/actor.json:
{"environmentVariables":{"OPENAI_API_KEY":"@openai-key","GEMINI_API_KEY":"@gemini-key"}}
Note: Use @secret-name syntax to reference Apify secrets.
Use Cases
1. Brand Monitoring
Monitor brand mentions and visual content across competitor accounts:
- Track logo appearances in images
- Detect product placements
- Monitor sentiment around brand discussions
2. Product Launch Detection
Get instant alerts when competitors announce new products:
- Analyze images for product unveils
- Detect "new" or "launching" keywords
- Track announcement patterns
3. Content Moderation
Filter and flag inappropriate content:
- Adult content detection (Azure/AWS)
- Unsafe content filtering
- Brand safety monitoring
4. Competitor Analysis
Track competitor social media activity:
- Monitor posting frequency
- Analyze content themes
- Track image-based campaigns
5. Social Media Intelligence
Aggregate insights from multiple accounts:
- Trending topics detection
- Hashtag performance tracking
- Engagement pattern analysis
6. Market Research
Gather visual data for market analysis:
- Product feature comparisons
- Packaging design trends
- Campaign creative analysis
Quick Start
1. Basic Tweet Scraping (No Vision)
{"usernames":["apify"],"maxItems":50}
2. Keyword Filtering
{"usernames":["techcrunch","theverge"],"searchTerms":["AI","ChatGPT"],"maxItems":100}
3. Image-Only Collection
{"usernames":["nasa","spacex"],"requireImages":true,"maxItems":50}
4. With Gemini Vision
{"usernames":["producthunt"],"requireImages":true,"visionPipelines":[{"name":"Product Detector","provider":"gemini_vision","enabled":true,"configJson":"{\"prompt\":\"Describe this product\",\"triggerKeywords\":[\"app\",\"software\"]}"}]}
Performance & Limits
- Speed: 50-100 tweets per minute (RSS mode)
- Concurrent Accounts: Up to 100 usernames
- Vision Processing: ~2-5 seconds per image per provider
- Memory: 512MB recommended (1GB for heavy vision usage)
- Timeout: 300 seconds default (adjust in Actor settings)
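The figures above can be combined into a back-of-the-envelope run-time estimate, which is useful for deciding whether the 300-second default timeout is enough. The sketch assumes that scraping and vision calls run one after another with no parallelism, which may be pessimistic; `estimateRunSeconds` is a hypothetical helper.

```javascript
// Rough best/worst-case run time in seconds, using the listed figures:
// 50-100 tweets per minute and 2-5 seconds per image per provider.
// Assumption: everything runs sequentially.
function estimateRunSeconds({ tweets, images, pipelines }) {
  const scrape = { min: (tweets / 100) * 60, max: (tweets / 50) * 60 };
  const vision = { min: images * pipelines * 2, max: images * pipelines * 5 };
  return { min: scrape.min + vision.min, max: scrape.max + vision.max };
}

const estimate = estimateRunSeconds({ tweets: 100, images: 40, pipelines: 2 });
console.log(estimate); // { min: 220, max: 520 }
```

For 100 tweets with 40 images and two pipelines, the upper bound of 520 seconds exceeds the 300-second default, so under these assumptions the timeout would need to be raised in the Actor settings.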
Troubleshooting
No Items Collected
Possible causes:
- Bot protection blocking Nitter instances
- Invalid usernames
- User accounts have no recent posts
- Filters are too restrictive
Solutions:
- Verify usernames are correct (without @ symbol)
- Try again at a different time of day
- Reduce filter restrictions
- Check Actor logs for specific errors
Vision Analysis Not Working
Possible causes:
- Missing API credentials in environment variables
- Invalid API keys
- API rate limits exceeded
- Image URLs inaccessible
Solutions:
- Verify all required environment variables are set
- Check API key validity in provider dashboard
- Review Actor logs for specific API errors
- Ensure images are publicly accessible
Webhook Alerts Not Received
Possible causes:
- Invalid webhook URL
- Webhook endpoint timeout
- Firewall blocking Apify IPs
Solutions:
- Test webhook URL with curl/Postman
- Increase webhook timeout in config
- Verify webhook endpoint accepts POST requests
- Check webhook logs for incoming requests
Architecture
Data Flow
1. Input Validation - Verify usernames and configuration
2. Instance Discovery - Fetch working Nitter instances from status page
3. RSS Scraping - Try RSS feeds from multiple instances
4. Web Scraping Fallback - Parse HTML if RSS fails
5. Content Filtering - Apply keyword/hashtag filters
6. Vision Processing - Run enabled AI pipelines on images
7. Alert Triggering - Send webhooks for matched patterns
8. Data Storage - Save to Apify datasets
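The RSS-first, web-fallback part of this flow can be sketched as an ordered list of attempts that is walked until one returns items. This is an illustration of the pattern only, not the Actor's implementation: the `fetchers` object with `rss` and `web` functions is hypothetical, and so is the `'web'` value for `sourceType` (the output example only shows `'rss'`).

```javascript
// Orders the fetch attempts: every instance over RSS first, then HTML scraping.
function planAttempts(instances, preferRss = true) {
  const rss = instances.map((instance) => ({ instance, sourceType: 'rss' }));
  const web = instances.map((instance) => ({ instance, sourceType: 'web' }));
  return preferRss ? [...rss, ...web] : web;
}

// Walks the plan and returns the first non-empty result.
// `fetchers` is a hypothetical map of async functions:
//   { rss(instance, username), web(instance, username) }, each resolving to an array.
async function scrapeWithFallback(username, instances, fetchers, preferRss = true) {
  const attempts = planAttempts(instances, preferRss);
  const errors = [];
  for (const { instance, sourceType } of attempts) {
    try {
      const items = await fetchers[sourceType](instance, username);
      if (items.length > 0) {
        // Tag each item with where it came from, as in the output example.
        return items.map((item) => ({ ...item, sourceType, instance }));
      }
    } catch (err) {
      errors.push(`${sourceType}@${instance}: ${err.message}`);
    }
  }
  if (attempts.length > 0 && errors.length === attempts.length) {
    throw new Error(`All attempts failed for ${username}: ${errors.join('; ')}`);
  }
  return []; // reachable instances, but no posts found
}

console.log(planAttempts(['nitter.net', 'nitter.example'])); // second host is a placeholder
```

Keeping the ordering in `planAttempts` separate from the network loop makes the fallback policy easy to inspect and change without touching the fetch code.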
Technical Stack
- Runtime: Node.js 18 (Apify SDK 3.x)
- HTTP Client: Axios
- HTML Parsing: Cheerio
- RSS Parsing: rss-parser
- AI Providers: Native REST APIs
- Image Processing: Base64 encoding for Gemini/OpenAI
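The Base64 step mentioned in the last bullet amounts to turning downloaded image bytes into a string that can travel inside a JSON request. A minimal sketch using Node's built-in `Buffer` follows; the helper names are hypothetical, and how the Actor actually downloads the bytes (for example with Axios) is not shown here.

```javascript
// Encodes raw image bytes as Base64. Buffer is a Node.js global, so no import is needed.
function toBase64(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// Data URL form, which can be passed where an API accepts an image URL
// with inline data (assumption: the provider supports data URLs).
function toDataUrl(bytes, mimeType = 'image/jpeg') {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}

const sample = Uint8Array.from([0xff, 0xd8, 0xff]); // first bytes of a JPEG file
console.log(toDataUrl(sample)); // data:image/jpeg;base64,/9j/
```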
Changelog
Version 1.0.0 (2026-02-01)
- Initial release
- RSS-first scraping with web fallback
- 6 AI vision providers
- Webhook alert system
- Dual dataset output
License
Apache-2.0
Support & Resources
- Apify Documentation
- Apify Discord Community
- Report Issues
- Feature Requests
- Contact Support
Credits
Built with ❤️ using:
- Apify Platform
- Nitter instances
- OpenAI GPT-4o Vision
- Google Gemini
- Google Cloud Vision
- Azure Computer Vision
- AWS Rekognition
Made by [dubz]
