dotnet add package Softellect.AddressProcessor --version 10.0.100.3
NuGet\Install-Package Softellect.AddressProcessor -Version 10.0.100.3
<PackageReference Include="Softellect.AddressProcessor" Version="10.0.100.3" />
Directory.Packages.props: <PackageVersion Include="Softellect.AddressProcessor" Version="10.0.100.3" />
Project file: <PackageReference Include="Softellect.AddressProcessor" />
paket add Softellect.AddressProcessor --version 10.0.100.3
#r "nuget: Softellect.AddressProcessor, 10.0.100.3"
#:package Softellect.AddressProcessor@10.0.100.3
Install as a Cake Addin: #addin nuget:?package=Softellect.AddressProcessor&version=10.0.100.3
Install as a Cake Tool: #tool nuget:?package=Softellect.AddressProcessor&version=10.0.100.3
Updated 2024-11-26
This document describes the internal logic and main interfaces of the F# Address Processor (AP) service.
The problem AP was designed to solve is substantially different from the one solved by USAddress.AddressParser (UAP). UAP uses pattern matching to split a single address into parts: full street name (including the street number), city, state, and zip. As a result, when the input is a collection of addresses, or an address with extra, erroneous, or missing information, UAP often splits it incorrectly.
AP, on the other hand, was designed to quickly handle bad addresses and collections of addresses. In particular, a collection of addresses is not assumed to be several valid addresses separated by some token (such as a space, comma, or semicolon), but rather a string in which the common part of the address is not repeated. For example, "660-680 N 9 ST & GARAGE BLYTHE CA 92225" contains two addresses, "660 N 9 ST BLYTHE CA 92225" and "680 N 9 ST BLYTHE CA 92225", plus some extra words that should be ignored. In addition, users might omit parts of the address, mistype, and/or use various abbreviations. There are over 200 known USPS street type abbreviations, some of which have up to 3 "flavors". For example, "ST GEORGE STREET" might be entered as "ST GEORGE STREET", "ST GEORGE STR", "ST GEORGE", or even "GEORGE ST". To handle this, AP uses the full address table, preprocessed and partitioned as necessary.
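As a rough illustration of street type standardization (not AP's code), the F# below maps a few flavors of two USPS street suffixes to a single abbreviation. The table `streetTypeFlavors` and the function `standardizeSuffix` are hypothetical, and standardizing only the last word is an assumption made so that the leading "ST" (Saint) in "ST GEORGE" is left alone.

```fsharp
open System

/// Illustrative subset only: a few "flavors" of two USPS street suffixes mapped
/// to their standard abbreviation. The real list has 200+ suffixes.
let streetTypeFlavors =
    Map.ofList
        [ ("STREET", "ST"); ("STR", "ST"); ("STRT", "ST")
          ("AVENUE", "AVE"); ("AV", "AVE"); ("AVEN", "AVE") ]

/// Assumption: only the last word (the suffix position) is standardized,
/// so the leading "ST" in "ST GEORGE STREET" is not touched.
let standardizeSuffix (street : string) =
    let words = street.ToUpperInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries)
    if words.Length > 0 then
        let last = words.[words.Length - 1]
        // Replace a known flavor with its standard form; keep unknown words as-is.
        words.[words.Length - 1] <- defaultArg (Map.tryFind last streetTypeFlavors) last
    String.Join(" ", words)
```

Here "St George Street" and "ST GEORGE STR" both become "ST GEORGE ST". Standardization alone cannot recover the dropped suffix in "ST GEORGE" or the reordering in "GEORGE ST"; those variants are what the preprocessed address table is for.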
To address the issue of different street types and directions, AP performs data cleaning and standardization as follows.
`let GarbagePattern = @"%[^a-zA-Z0-9 - /]%"`

Once the data is cleaned, it is partitioned by removing the house number column and storing the resulting StreetFullName, City, State, and ZipCode in the table StreetZips. Further partitioning stores aggregate information in the tables ZipCodeCities (a map from each zip code to all cities in that zip code) and StateCities (a map from each state to all cities in that state).
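A minimal F# sketch of the cleaning and partitioning steps follows. GarbagePattern appears to be a SQL LIKE-style pattern; the sketch assumes it means "keep letters, digits, spaces, hyphens and slashes" and strips everything else with a .NET regex. `AddressRow`, `clean`, `streetZips` and `citiesBy` are hypothetical names; only the table and column names come from the text.

```fsharp
open System.Text.RegularExpressions

/// Hypothetical row of the full address table (column names taken from the text).
type AddressRow =
    { HouseNumber : string
      StreetFullName : string
      City : string
      State : string
      ZipCode : string }

/// Assumed .NET reading of GarbagePattern: any character other than a letter,
/// digit, space, hyphen or slash is garbage and is removed.
let garbage = Regex(@"[^a-zA-Z0-9 \-/]")

/// Remove garbage, upper-case, and collapse repeated whitespace.
let clean (s : string) =
    let stripped = garbage.Replace(s, "").ToUpperInvariant()
    Regex.Replace(stripped, @"\s+", " ").Trim()

let cleanRow (r : AddressRow) =
    { r with
        StreetFullName = clean r.StreetFullName
        City = clean r.City
        State = clean r.State
        ZipCode = clean r.ZipCode }

/// StreetZips: drop the house number and keep distinct (street, city, state, zip).
let streetZips (rows : AddressRow list) =
    rows
    |> List.map (fun r -> (r.StreetFullName, r.City, r.State, r.ZipCode))
    |> List.distinct

/// Group the distinct cities under a key such as the zip code or the state.
let citiesBy (key : AddressRow -> string) (rows : AddressRow list) : Map<string, Set<string>> =
    rows
    |> List.groupBy key
    |> List.map (fun (k, rs) -> (k, rs |> List.map (fun r -> r.City) |> Set.ofList))
    |> Map.ofList

/// ZipCodeCities: zip code -> cities; StateCities: state -> cities.
let zipCodeCities rows = citiesBy (fun r -> r.ZipCode) rows
let stateCities rows = citiesBy (fun r -> r.State) rows
```

Two rows such as "N 9 St." and "N 9 ST" in Blythe, CA 92225 clean to the same street and therefore collapse to a single StreetZips entry.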
AP parses the address string backwards. Note that "-" is not a token separator for AP: all parts joined by "-" are glued together into a single word. AP handles "-" internally because the logic differs greatly depending on where the hyphen appears.
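The tokenization described above can be sketched as follows. Splitting on whitespace only is an assumption (the full separator set is not listed here), and `splitHouseNumbers` is a hypothetical helper covering just one hyphen position: a house-number range such as "660-680" read as two house numbers, as in the "660-680 N 9 ST" example.

```fsharp
open System

/// Split on whitespace only (an assumption). "-" is not a separator,
/// so "660-680" stays a single word.
let tokenize (address : string) : string list =
    address.Split([| ' '; '\t' |], StringSplitOptions.RemoveEmptyEntries) |> List.ofArray

/// AP parses backwards, so the word list is reversed: the zip code comes first.
let backwards (address : string) = tokenize address |> List.rev

/// Hypothetical helper for one hyphen position only: "660-680" in the
/// house-number position read as two house numbers. Other positions need
/// different logic, which this document does not describe.
let splitHouseNumbers (word : string) : int list option =
    match word.Split([| '-' |]) with
    | [| a; b |] ->
        match Int32.TryParse a, Int32.TryParse b with
        | (true, x), (true, y) -> Some [ x; y ]
        | _ -> None
    | _ -> None
```

For the example input, `backwards` yields the words starting with "92225", "CA", "BLYTHE" and ending with the single word "660-680".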
AP uses two standardized circular rule collections (RuleInfo). Each rule collection contains the following rules (and some supporting data, which is not described here):
- zipRule : Rule - attempts to extract the zip code from the last word of the string.
- stateRule : Rule - attempts to extract the state.
- cityRule : Rule - attempts to extract the city.
- streetRule : Rule - attempts to extract the street.
- numberRule : bool -> Rule - attempts to extract the house number.
- newAddress : Rule - applies the new address and restarts processing.

If a rule succeeds, it removes the word(s) it processed and passes the remaining string on. The newAddress rule creates an address and restarts processing if any words are left.
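The circular rule idea can be sketched as a toy model. This is not AP's implementation: `Address`, `Rule`, the lookup sets and every rule body are hypothetical simplifications, the look-ahead and look-behind checks are omitted, AP's extra `bool` argument to numberRule is dropped, and discarding a word that no rule can consume is an assumption. Each rule either consumes words from the front of the reversed word list or is skipped; after one trip around the collection, the newAddress step emits the address and restarts on the remaining words.

```fsharp
open System

/// Hypothetical address shape for this sketch (not AP's actual type).
type Address =
    { Number : string option
      Street : string list
      City : string option
      State : string option
      Zip : string option }

let emptyAddress = { Number = None; Street = []; City = None; State = None; Zip = None }

/// A rule sees the remaining words, last input word first. On success it returns
/// the updated address and the words it did not consume; on failure, None.
type Rule = Address -> string list -> (Address * string list) option

let isDigits (w : string) = w.Length > 0 && (w |> Seq.forall Char.IsDigit)

// Toy lookup sets standing in for AP's state and city maps.
let knownStates = set [ "CA" ]
let knownCities = set [ "BLYTHE" ]

let zipRule : Rule =
    fun a ws ->
        match ws with
        | w :: rest when w.Length = 5 && isDigits w -> Some({ a with Zip = Some w }, rest)
        | _ -> None

let stateRule : Rule =
    fun a ws ->
        match ws with
        | w :: rest when knownStates.Contains w -> Some({ a with State = Some w }, rest)
        | _ -> None

let cityRule : Rule =
    fun a ws ->
        match ws with
        | w :: rest when knownCities.Contains w -> Some({ a with City = Some w }, rest)
        | _ -> None

/// Toy street rule: every word up to the next all-digit word (the house number).
let streetRule : Rule =
    fun a ws ->
        match List.takeWhile (isDigits >> not) ws with
        | [] -> None
        | street -> Some({ a with Street = List.rev street }, List.skip (List.length street) ws)

let numberRule : Rule =
    fun a ws ->
        match ws with
        | w :: rest when isDigits w -> Some({ a with Number = Some w }, rest)
        | _ -> None

/// One trip around the circle: each rule either consumes words or is skipped.
let applyRules (rules : Rule list) (ws : string list) : Address * string list =
    let step (a, rest) (rule : Rule) =
        match rule a rest with
        | Some next -> next
        | None -> (a, rest)
    List.fold step (emptyAddress, ws) rules

/// The newAddress step: emit the address built on each trip and restart on the rest.
let parse (rules : Rule list) (input : string) : Address list =
    let rec loop (ws : string list) (acc : Address list) =
        match ws with
        | [] -> acc // addresses are found last-first, so consing restores input order
        | _ :: tail ->
            let a, rest = applyRules rules ws
            // Assumption: if no rule consumed anything, drop the word as noise.
            if List.length rest = List.length ws then loop tail acc
            else loop rest (a :: acc)
    let words = input.Split(' ', StringSplitOptions.RemoveEmptyEntries) |> List.ofArray |> List.rev
    loop words []
```

For example, `parse [ zipRule; stateRule; cityRule; streetRule; numberRule ] "123 MAIN ST BLYTHE CA 92225 456 ELM ST BLYTHE CA 92225"` returns two addresses, 123 MAIN ST and 456 ELM ST, both in BLYTHE, CA 92225.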
The rules use look-ahead and look-behind checks. It is now easier to explain how AP works using an example: "660-680 N 9 ST & GARAGE BLYTHE CA 92225":
- `zipUpdater : AsyncUpdater<ZipCode, ZipMap>`
- `stateCityUpdater : AsyncUpdater<State * City, StateCityMap>`
- `zipToCityUpdater : AsyncUpdater<ZipCode, ZipToCityMap>`
- `stateToCityUpdater : AsyncUpdater<State, StateToCityMap>`
- `wordMap : Map<ZipCode, Map<string, string>>`
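AsyncUpdater's internals are not shown in this document. As a hedged illustration of on-demand loading, the hypothetical `OnDemandCache` below uses an F# `MailboxProcessor` to load the value for a key (for instance, the part of a map that belongs to one zip code) the first time it is requested and to reuse it afterwards.

```fsharp
/// Hypothetical stand-in for AsyncUpdater<'K, 'V>: the value for a key is loaded
/// the first time it is requested and cached afterwards. An agent serializes the
/// requests, so each key is loaded at most once.
type OnDemandCache<'K, 'V when 'K : comparison>(load : 'K -> Async<'V>) =
    let agent =
        MailboxProcessor<'K * AsyncReplyChannel<'V>>.Start(fun inbox ->
            let rec loop (cache : Map<'K, 'V>) =
                async {
                    let! (key, reply) = inbox.Receive()
                    match Map.tryFind key cache with
                    | Some v ->
                        reply.Reply v
                        return! loop cache
                    | None ->
                        // Loading inside the agent keeps the sketch simple but
                        // blocks other requests while a slow load is running.
                        let! v = load key
                        reply.Reply v
                        return! loop (Map.add key v cache)
                }
            loop Map.empty)

    /// Returns the cached value for the key, loading it on first use.
    member _.Get(key : 'K) : Async<'V> =
        agent.PostAndAsyncReply(fun reply -> (key, reply))
```

Running `load` inside the agent guarantees that each key is loaded once, at the cost of serializing loads; a production version might start loads concurrently and cache the pending task instead.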
A full description of these maps is beyond the scope of this document, so only the primary one, zipUpdater : AsyncUpdater<ZipCode, ZipMap>, is described in detail. AsyncUpdater dynamically loads the part of the map, called ZipMap, that belongs to a given zip code. ZipMap is a type abbreviation:
`type ZipMap = Map<ZipCode, Map<list<string>, list<StreetCityState>>>`
That is, ZipMap maps a ZipCode to an inner map. Each key of the inner map is a list<string>, one of the valid sorted word combinations of a street name, and each value is a list<StreetCityState> (Street, City, State triples) containing every StreetCityState whose full street name has a valid sorted sublist of words equal to that key. The set of valid sublists is obtained from the set of all sublists of the street name's words by applying a weighting function; see toValidSubLists for details. The main idea is that "Massachusetts Ave" should match the input "Massachusetts Ave" or "Massachusetts", but not "Ave". Perfect matches are always returned, and partial matches are sorted by rank so that the best match is returned.
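The ZipMap shape and the valid-sublist idea can be sketched in F#. Treating ZipCode as a string, the `StreetCityState` record, `subLists`, `validSubLists` and `lookup` are all hypothetical. In particular, the weighting here is a crude stand-in for toValidSubLists that simply rejects sublists made only of street types and directions, and the ranking of partial matches is omitted.

```fsharp
open System

/// Hypothetical shapes; AP's real definitions are not shown in this document.
type ZipCode = string
type StreetCityState = { Street : string; City : string; State : string }
type ZipMap = Map<ZipCode, Map<string list, StreetCityState list>>

/// Every non-empty, order-preserving sublist of the words (2^n - 1 of them).
let rec subLists (words : string list) : string list list =
    match words with
    | [] -> []
    | w :: rest ->
        let tails = subLists rest
        [ w ] :: (List.map (fun t -> w :: t) tails @ tails)

/// Stand-in weighting: a sublist is valid only if it contains at least one word
/// that is not a street type or direction.
let streetTypes = set [ "ST"; "AVE"; "RD"; "BLVD"; "N"; "S"; "E"; "W" ]
let validSubLists words =
    subLists words |> List.filter (List.exists (fun w -> not (streetTypes.Contains w)))

let words (s : string) =
    s.ToUpperInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries) |> List.ofArray

/// Key every street under each of its sorted valid sublists, per zip code.
let buildZipMap (rows : (ZipCode * StreetCityState) list) : ZipMap =
    rows
    |> List.groupBy fst
    |> List.map (fun (zip, entries) ->
        let inner =
            entries
            |> List.collect (fun (_, scs) ->
                validSubLists (words scs.Street) |> List.map (fun key -> (List.sort key, scs)))
            |> List.groupBy fst
            |> List.map (fun (key, pairs) -> (key, pairs |> List.map snd |> List.distinct))
            |> Map.ofList
        (zip, inner))
    |> Map.ofList

/// Exact lookup of the sorted input words within one zip code.
let lookup (zipMap : ZipMap) (zip : ZipCode) (street : string) : StreetCityState list =
    match Map.tryFind zip zipMap with
    | Some inner -> defaultArg (Map.tryFind (List.sort (words street)) inner) []
    | None -> []
```

With a toy entry "Massachusetts Ave", lookups for "MASSACHUSETTS" and "ave massachusetts" both find it, while "AVE" returns an empty list. Because the number of sublists grows as 2^n - 1 in the number of words, this approach only stays cheap for short street names.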
| Product | Versions (compatible and additional computed target framework versions) |
|---|---|
| .NET | net10.0 is compatible. Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
This package is not used by any NuGet packages.
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 10.0.100.3 | 7,068 | 1/5/2026 |
| 9.0.100.2 | 15,940 | 11/26/2024 |
| 9.0.100.1 | 243 | 11/26/2024 |