URL: https://apify.com/dtrungtin/fuzzy-search-dataset-actor

Pricing

from $0.001 / actor start

Fuzzy Search Dataset Actor

Search any Apify dataset using typo-tolerant fuzzy matching.

Rating: 0.0 (0)

Developer: Tin (Maintained by Community)

Actor stats: 0 bookmarked, 3 total users, 3 monthly active users, last modified 2 months ago

Search any Apify dataset using typo-tolerant fuzzy matching. Point this Actor at an existing dataset, provide a query, and get back ranked results, even when spellings are imperfect. Try it directly in Apify Console.

What does Fuzzy Search Dataset do?

This Actor loads records from any Apify dataset and runs a fuzzy full-text search across one or more fields. It handles typos, partial matches, and word-order variations automatically. For example, searching "iphon pro mx" can still return results for "iPhone 15 Pro Max".
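
To make "typo-tolerant" concrete, here is a minimal sketch of fuzzy scoring built on Python's standard difflib module. It is an illustration only: the Actor's actual matching algorithm is not documented here, and the fuzzy_score function and its token-averaging rule are assumptions made for this example. The score follows the same convention as the Actor's output (lower is better).

```python
from difflib import SequenceMatcher


def fuzzy_score(query: str, text: str) -> float:
    """Return a distance-like score: 0.0 = every query token matched exactly."""
    # Assumption: compare token by token and average the best similarity per
    # query token. This is NOT the Actor's real algorithm, only a stand-in.
    query_tokens = query.lower().split()
    text_tokens = text.lower().split()
    if not query_tokens or not text_tokens:
        return 1.0
    best = [
        max(SequenceMatcher(None, q, t).ratio() for t in text_tokens)
        for q in query_tokens
    ]
    return 1.0 - sum(best) / len(best)


titles = [
    "Apple iPhone 15 Pro Max",
    "Samsung Galaxy S24 Ultra",
    "iPhone 14 Pro Max",
]

# Rank titles from best (lowest score) to worst for a misspelled query.
ranked = sorted(titles, key=lambda title: fuzzy_score("iphon pro mx", title))
for title in ranked:
    print(f"{fuzzy_score('iphon pro mx', title):.2f}  {title}")
```

Because each query token is matched independently, word order in the record does not affect this sketch's score, which mirrors the word-order tolerance described above.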

It's ideal for post-processing scraped data: after collecting a large dataset with another Actor, use this one to instantly build a search layer on top of it without any external infrastructure.

Why use Fuzzy Search Dataset?

  • No search engine needed: works directly on any Apify dataset without Elasticsearch, Algolia, or similar tools
  • Typo-tolerant: handles misspellings, abbreviations, and partial queries out of the box
  • Multi-field search: search across title, description, brand, nested fields like product.name, or any combination
  • Tunable relevance: control strictness, field weights, and minimum match length to fit your data
  • Automation-ready: trigger via API, schedule it, or chain it after a scraping Actor in a workflow
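
Nested fields such as product.name are addressed with dot notation. The sketch below shows one plausible way to resolve such a path against a record. The get_path helper is hypothetical, and how the Actor itself treats missing keys or arrays is not documented here.

```python
from typing import Any


def get_path(record: dict, path: str, default: Any = None) -> Any:
    """Resolve a dotted field path such as "product.name" in a nested record."""
    current: Any = record
    for key in path.split("."):
        # Stop early if the path leaves the nested dictionaries.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {
    "title": "Apple iPhone 15 Pro Max",
    "product": {"name": "iPhone 15 Pro Max", "brand": "Apple"},
}

print(get_path(record, "product.name"))
print(get_path(record, "product.sku", default="<missing>"))
```

Checking a few records this way before a run is a quick way to confirm that the paths you plan to pass in fields actually exist in your dataset.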

How to use Fuzzy Search Dataset

  1. Run a scraping Actor to build the source dataset (or use any existing dataset you already have in Apify Console).
  2. Copy the dataset ID from the dataset URL or the Storage section of Apify Console.
  3. Open this Actor and paste the dataset ID into the Dataset ID field.
  4. Enter your search query and choose which fields to search.
  5. Run the Actor. Results are written to its output dataset, ranked by relevance score.
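
The same steps can be scripted against the Apify API. The sketch below only builds the HTTP request for starting a run and does not send it. The Actor ID is inferred from this page's URL (the API expects "~" in place of "/"), the token is a placeholder, and build_run_request is a hypothetical helper name.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumption: derived from the store URL dtrungtin/fuzzy-search-dataset-actor.
ACTOR_ID = "dtrungtin~fuzzy-search-dataset-actor"


def build_run_request(
    token: str, dataset_id: str, query: str, fields: list[str]
) -> urllib.request.Request:
    """Build (but do not send) a POST request that starts an Actor run."""
    payload = {"datasetId": dataset_id, "query": query, "fields": fields}
    return urllib.request.Request(
        url=f"{API_BASE}/acts/{ACTOR_ID}/runs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


request = build_run_request(
    token="<YOUR_APIFY_TOKEN>",  # placeholder, replace with a real token
    dataset_id="UoYaa1QjGdgdJrSHA",
    query="iphon pro max",
    fields=["title", "description"],
)
print(request.get_method(), request.full_url)
# To actually start the run: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect before spending any platform credits.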

Input

Configure the Actor in the Input tab or pass a JSON object via the API.

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| datasetId | string | yes | none | ID of the Apify dataset to search |
| query | string | yes | none | Text to search for |
| fields | array of strings | no | ["title"] | Dataset fields to search (supports dot notation for nested fields) |
| limit | integer | no | 20 | Maximum number of results to return (1 to 1000) |
| threshold | number | no | 0.35 | Fuzzy strictness: 0.0 = exact only, 1.0 = match anything |
| ignoreLocation | boolean | no | true | Allow matches anywhere in the text, not just at the start |
| minMatchCharLength | integer | no | 2 | Minimum characters in a token before it counts as a match |
| includeScore | boolean | no | true | Attach a relevance score to each result (lower = better match) |
| includeMatches | boolean | no | false | Include matched text ranges, useful for keyword highlighting |
| extendedSearch | boolean | no | false | Enable advanced query syntax (^starts-with, !exclude, =exact) |
| weights | object | no | none | Per-field importance weights, e.g. {"title": 0.7, "description": 0.3} |
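
When the input is generated by a script, it can help to apply the documented defaults and range checks locally before starting a run. The following is a minimal sketch based only on the table above. normalize_input is a hypothetical helper, and the Actor's own validation and error messages may differ.

```python
import copy

# Defaults copied from the input table; "weights" has no default.
DEFAULTS = {
    "fields": ["title"],
    "limit": 20,
    "threshold": 0.35,
    "ignoreLocation": True,
    "minMatchCharLength": 2,
    "includeScore": True,
    "includeMatches": False,
    "extendedSearch": False,
}


def normalize_input(raw: dict) -> dict:
    """Fill in documented defaults and reject obviously invalid values."""
    for required in ("datasetId", "query"):
        value = raw.get(required)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{required}' is required and must be a non-empty string")
    # Deep-copy so callers cannot mutate the shared default list.
    merged = {**copy.deepcopy(DEFAULTS), **raw}
    if not 1 <= merged["limit"] <= 1000:
        raise ValueError("'limit' must be between 1 and 1000")
    if not 0.0 <= merged["threshold"] <= 1.0:
        raise ValueError("'threshold' must be between 0.0 and 1.0")
    return merged


run_input = normalize_input({"datasetId": "UoYaa1QjGdgdJrSHA", "query": "iphon pro max"})
print(run_input["fields"], run_input["limit"], run_input["threshold"])
```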

Example input:

{
  "datasetId": "UoYaa1QjGdgdJrSHA",
  "query": "iphon pro max",
  "fields": ["title", "description"],
  "limit": 10,
  "threshold": 0.35,
  "weights": {
    "title": 0.8,
    "description": 0.2
  }
}

Output

Results are pushed to the Actor's default dataset as a single object containing the query, total result count, and an array of ranked matches.

Example output:

{
  "query": "iphon pro max",
  "totalResults": 2,
  "results": [
    {
      "rank": 1,
      "score": 0.04,
      "item": {
        "title": "Apple iPhone 15 Pro Max",
        "description": "6.7-inch Super Retina XDR display, A17 Pro chip",
        "brand": "Apple",
        "price": 1199
      }
    },
    {
      "rank": 2,
      "score": 0.18,
      "item": {
        "title": "iPhone 14 Pro Max",
        "description": "48MP main camera, Dynamic Island",
        "brand": "Apple",
        "price": 999
      }
    }
  ]
}

You can download results in JSON, CSV, Excel, or HTML from the dataset tab in Apify Console or via the Apify API.

Output data fields

| Field | Format | Description |
|---|---|---|
| query | text | The search query that was used |
| totalResults | number | Number of results returned |
| results[].rank | number | 1-based position in results (1 = best match) |
| results[].score | number | Relevance score (0.0 = perfect match, 1.0 = no match) |
| results[].item | object | Full original record from the source dataset |
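
Once the output object has been downloaded, the fields above can be read with a few lines of code. This sketch assumes the object has already been parsed from JSON into a dict. summarize is a hypothetical helper, and it tolerates a missing score because includeScore can be switched off.

```python
def summarize(output: dict, label_field: str = "title") -> list[str]:
    """Turn the Actor's output object into one readable line per result."""
    lines = []
    for result in output.get("results", []):
        label = result.get("item", {}).get(label_field, "<no label>")
        score = result.get("score")  # absent when includeScore is false
        score_text = f"{score:.2f}" if score is not None else "n/a"
        lines.append(f"#{result['rank']} score={score_text} {label}")
    return lines


# Sample shaped like the example output in this document.
sample_output = {
    "query": "iphon pro max",
    "totalResults": 2,
    "results": [
        {"rank": 1, "score": 0.04, "item": {"title": "Apple iPhone 15 Pro Max", "price": 1199}},
        {"rank": 2, "score": 0.18, "item": {"title": "iPhone 14 Pro Max", "price": 999}},
    ],
}

for line in summarize(sample_output):
    print(line)
```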

Tips and advanced options

Tuning the threshold:

  • 0.2: strict, good for product codes and exact names
  • 0.35: balanced (default), works well for product titles and descriptions
  • 0.6: loose, useful for free-text fields or short queries
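
One way to picture the threshold is as a cutoff on the relevance score: a record is kept only when its score is at or below the threshold. That reading is an assumption consistent with the score and threshold scales documented above, not a statement about the Actor's internals. The first two scores below come from the example output, while the other two records and scores are hypothetical.

```python
def filter_by_threshold(scored_titles, threshold):
    """Keep (score, title) pairs with score <= threshold, best match first."""
    kept = [pair for pair in scored_titles if pair[0] <= threshold]
    return sorted(kept, key=lambda pair: pair[0])


scored_titles = [
    (0.04, "Apple iPhone 15 Pro Max"),
    (0.18, "iPhone 14 Pro Max"),
    (0.30, "iPhone 15 Pro"),       # hypothetical record and score
    (0.48, "iPhone 13 Pro case"),  # hypothetical record and score
]

# Count how many records survive each of the suggested settings.
counts = {t: len(filter_by_threshold(scored_titles, t)) for t in (0.2, 0.35, 0.6)}
print(counts)
```

Under this reading, raising the threshold can only add results, which is why loosening it is a reasonable first step when a query returns nothing.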

Multi-field search with weights:

To boost title matches above description matches, set weights to {"title": 0.8, "description": 0.2}. Weights must sum to 1.0 across all searched fields.
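
Since the weights have to sum to 1.0, a quick local check can catch mistakes before a run. The sketch below uses a floating-point tolerance because sums such as 0.7 + 0.3 are not exactly 1.0 in binary arithmetic. validate_weights is a hypothetical helper, and rejecting weights for fields that are not searched is an assumption of this example rather than documented Actor behavior.

```python
import math


def validate_weights(fields: list[str], weights: dict[str, float]) -> None:
    """Raise ValueError unless the weights cover searched fields and sum to 1.0."""
    # Assumption: a weight for a field that is not searched is a mistake.
    unknown = [name for name in weights if name not in fields]
    if unknown:
        raise ValueError(f"weights given for fields that are not searched: {unknown}")
    total = sum(weights.get(name, 0.0) for name in fields)
    # Compare with a tolerance instead of == to absorb rounding error.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights must sum to 1.0, got {total}")


validate_weights(["title", "description"], {"title": 0.8, "description": 0.2})
validate_weights(["title", "description"], {"title": 0.7, "description": 0.3})
print("weights are valid")
```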

Advanced query syntax (when extendedSearch is enabled):

| Syntax | Meaning | Example |
|---|---|---|
| =iphone | Exact match | =iPhone 15 |
| ^apple | Starts with | ^Apple |
| !samsung | Exclude | !Samsung |
| 'pro | Includes token | 'pro |
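
To show how the prefixes in the table change the meaning of each term, here is a small sketch that classifies the tokens of a query. It assumes terms are separated by whitespace and that a term without a prefix is matched fuzzily. The Actor's real parser is not documented here and may treat multi-word terms differently, and classify_token and parse_query are hypothetical names.

```python
PREFIXES = {
    "=": "exact",
    "^": "starts_with",
    "!": "exclude",
    "'": "include",
}


def classify_token(token: str) -> tuple[str, str]:
    """Split one query term into (operator, text) based on its first character."""
    if len(token) > 1 and token[0] in PREFIXES:
        return PREFIXES[token[0]], token[1:]
    return "fuzzy", token  # assumption: unprefixed terms are matched fuzzily


def parse_query(query: str) -> list[tuple[str, str]]:
    """Classify every whitespace-separated term in the query."""
    return [classify_token(token) for token in query.split()]


for operator, text in parse_query("^Apple 'pro !Samsung max"):
    print(f"{operator:12} {text}")
```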

Performance: The Actor loads the entire dataset into memory. For datasets over 100k records, consider filtering the source dataset first or increasing the Actor's memory allocation.

Pricing / Cost estimation

This Actor processes data in-memory without using a browser or proxy, so compute costs are low. A typical run over 10,000 records completes in under 30 seconds. Apify provides a free tier sufficient for many use cases.

FAQ and support

Is this legal? This Actor only reads from datasets you own or have access to on the Apify platform. It does not scrape any external websites.

Can I use this with datasets from other Actors? Yes, as long as you have access to the dataset ID, this Actor can read it.

The results are empty or not what I expected. Try raising the threshold value (e.g. to 0.5), making sure ignoreLocation is still enabled (it is on by default), or adding more fields to the fields array.

Found a bug or want a feature? Open an issue in the Issues tab on this Actor's page. Custom solutions are also available; reach out via the Apify platform.
