parser Package
cities Module
models Module
numbered_streets Module
parsing Module
- class ebpub.geocoder.parser.parsing.Location(*args)
Bases: dict
A dict-like object with only a few valid keys: ('number', 'pre_dir', 'prefix', 'street', 'suffix', 'post_dir', 'city', 'state', 'zip')
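A restricted-key dict of this kind can be sketched as follows. This is only an illustration written for Python 3, not the ebpub implementation: pre-filling every key with None and raising KeyError for unknown keys are both assumptions.

```python
class Location(dict):
    """Dict that accepts only a fixed set of address-component keys."""

    VALID_KEYS = ('number', 'pre_dir', 'prefix', 'street', 'suffix',
                  'post_dir', 'city', 'state', 'zip')

    def __init__(self, *args, **kwargs):
        super().__init__()
        # Assumption: every valid key starts out present but unset.
        for key in self.VALID_KEYS:
            super().__setitem__(key, None)
        # Route initial values through __setitem__ so they are validated.
        for key, value in dict(*args, **kwargs).items():
            self[key] = value

    def __setitem__(self, key, value):
        # Assumption: an unknown key is an error rather than silently ignored.
        if key not in self.VALID_KEYS:
            raise KeyError('invalid key: %r' % (key,))
        super().__setitem__(key, value)


loc = Location(number='1972', street='DAWSON')
print(loc['number'], loc['city'])
```

Note that dict.update() bypasses __setitem__, so a stricter version would override it as well.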
- class ebpub.geocoder.parser.parsing.Standardizer(d)
Bases: object
Replaces a suffix, directional, state, etc. with the preferred standard form.
For example, given the text "avenu" for suffixes, returns "AVE".
>>> suff_standardizer = Standardizer(suffixes)
>>> suff_standardizer("avenu")
'AVE'
>>> dir_standardizer = Standardizer(DIRECTIONALS)
>>> dir_standardizer("north")
'N'
>>> dir_standardizer("n")
'N'
>>> pre_standardizer = Standardizer(prefixes)
>>> pre_standardizer("US hwy")
'US HIGHWAY'
>>> pre_standardizer("SR")
'STATE ROUTE'
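One way such a callable could work is a reverse lookup from every known variant to its preferred form. The sketch below assumes the input dict maps a standard form to either a single variant or a list of variants, and that unknown input is returned upper-cased; the sample dictionaries are hypothetical and are not ebpub's real tables.

```python
class Standardizer(object):
    """Maps any known variant of a term to its preferred upper-case form."""

    def __init__(self, d):
        self.lookup = {}
        for standard, variants in d.items():
            if isinstance(variants, str):
                variants = [variants]
            preferred = standard.upper()
            # The standard form maps to itself.
            self.lookup[preferred] = preferred
            for variant in variants:
                self.lookup[variant.upper()] = preferred

    def __call__(self, text):
        key = text.upper()
        # Assumption: unknown text is passed through upper-cased.
        return self.lookup.get(key, key)


# Hypothetical sample data for illustration only.
sample_suffixes = {'ave': ['av', 'avenu', 'avenue'], 'st': ['str', 'street']}
sample_directionals = {'n': 'north', 's': 'south'}

suff = Standardizer(sample_suffixes)
direction = Standardizer(sample_directionals)
print(suff('avenu'), direction('north'), direction('n'))
```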
- ebpub.geocoder.parser.parsing.abbrev_regex(d, case_insensitive=True, matches_entirely=True)
Returns a regular expression pattern that matches an abbreviation:
>>> suffixes = {
...     'av': ['ave', 'avenue'],
...     'st': ['str', 'street'],
...     'rd': 'road'
... }
>>> regex = abbrev_regex(suffixes)
>>> re.search(regex, "Ave")
<_sre.SRE_Match object at ...>
>>> re.search(regex, " Ave ") == None
True
>>> regex = abbrev_regex(suffixes, case_insensitive=False)
>>> re.search(regex, "str")
<_sre.SRE_Match object at ...>
>>> re.search(regex, "Str") == None
True
>>> regex = abbrev_regex(suffixes, matches_entirely=False)
>>> re.search(regex, " Road ")
<_sre.SRE_Match object at ...>
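A pattern with this behaviour could be built roughly as below. This is a sketch under assumptions, not the library's code: it collects every key and variant into one alternation, anchors it when matches_entirely is true, and prepends an inline (?i) flag for case-insensitivity.

```python
import re


def abbrev_regex(d, case_insensitive=True, matches_entirely=True):
    """Return a pattern string matching any key or variant in d."""
    forms = set()
    for abbrev, variants in d.items():
        forms.add(abbrev)
        if isinstance(variants, str):
            variants = [variants]
        forms.update(variants)
    # Longest forms first so 'street' is tried before 'st'.
    ordered = sorted(forms, key=lambda f: (-len(f), f))
    pattern = '(?:%s)' % '|'.join(re.escape(f) for f in ordered)
    if matches_entirely:
        pattern = '^%s$' % pattern
    if case_insensitive:
        pattern = '(?i)' + pattern
    return pattern


suffixes = {'av': ['ave', 'avenue'], 'st': ['str', 'street'], 'rd': 'road'}
print(bool(re.search(abbrev_regex(suffixes), 'Ave')))
```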
- ebpub.geocoder.parser.parsing.address_combinations()
Generator that yields a list of strings for every possible combination of address tokens. For example:
['number', 'pre_dir', 'street']
['number', 'street', 'city', 'state']
There were about 6240 combinations at last count.
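The general idea of enumerating token combinations can be illustrated with itertools. The sketch below is deliberately simplified and makes its own assumptions: tokens keep a fixed order, only 'street' is required, and every other token is independently optional. It therefore yields far fewer combinations than the count quoted above and does not reproduce the library's actual rules.

```python
from itertools import combinations

# Token order follows the Location keys documented on this page.
TOKENS = ('number', 'pre_dir', 'prefix', 'street', 'suffix',
          'post_dir', 'city', 'state', 'zip')


def address_combinations(tokens=TOKENS, required=('street',)):
    """Yield ordered token lists containing every required token."""
    optional = [t for t in tokens if t not in required]
    for size in range(len(optional) + 1):
        for chosen in combinations(optional, size):
            keep = set(chosen) | set(required)
            # Rebuild the list in canonical token order.
            yield [t for t in tokens if t in keep]


combos = list(address_combinations())
print(len(combos))
```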
- ebpub.geocoder.parser.parsing.normalize(location)
Normalizes an address string for parsing and comparison.
>>> normalize(u"1972 n. dawson ave. chicago il")
u'1972 N DAWSON AVE CHICAGO IL'
>>> normalize(u"1972 n. dawson ave., chicago il")
u'1972 N DAWSON AVE CHICAGO IL'
>>> normalize(u"n kimball ave & w diversey ave")
u'N KIMBALL AVE & W DIVERSEY AVE'
>>> normalize(None)
u'NONE'
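The documented results are consistent with upper-casing, dropping periods and commas, and collapsing whitespace. A minimal Python 3 sketch under that assumption follows; the real function may strip other punctuation as well, and in Python 3 the result is a plain str rather than a u'' literal.

```python
import re


def normalize(location):
    """Upper-case, drop periods and commas, collapse whitespace."""
    # str() turns None into 'None', matching the documented 'NONE' result.
    text = str(location).upper()
    # Assumption: only '.' and ',' are removed; '&' must survive
    # because it separates the two streets of an intersection.
    text = re.sub(r'[.,]', ' ', text)
    return ' '.join(text.split())


print(normalize('1972 n. dawson ave., chicago il'))
```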
- ebpub.geocoder.parser.parsing.number_standardizer(s)
Removes the second number in hyphenated addresses such as '123-02', as used in NYC. Note that this also removes the second number in address ranges, and non-digit prefixes or suffixes:
>>> number_standardizer('1-2')
'1'
>>> number_standardizer('100-200')
'100'
>>> number_standardizer('12A-12B')
'12'
>>> number_standardizer('x')
'x'
>>> number_standardizer('257b')
'257'
>>> number_standardizer('9L00')
'9'
>>> number_standardizer('W01')
'01'
>>> number_standardizer('9 8 7 6 5')
'9'
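All of the documented cases are reproduced by keeping only the first run of digits and returning the input unchanged when it contains none. The sketch below implements that rule; whether the library uses exactly this approach is an assumption.

```python
import re


def number_standardizer(s):
    """Return the first run of digits in s, or s itself if there is none."""
    match = re.search(r'\d+', s)
    return match.group(0) if match else s


for sample in ('1-2', '12A-12B', 'x', 'W01'):
    print(sample, '->', number_standardizer(sample))
```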
- ebpub.geocoder.parser.parsing.parse(location)
Given a location string, return a list of possible valid results as Location instances.
- ebpub.geocoder.parser.parsing.prefix_regex(case_insensitive=True, matches_entirely=True)
Returns a regex that matches any single token appearing in the known prefixes:
>>> regex = prefix_regex()
>>> re.search(regex, 'HWY')
<_sre.SRE_Match object at ...>
>>> re.search(regex, 'NY')
<_sre.SRE_Match object at ...>
>>> print re.search(regex, 'nope')
None
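Matching individual tokens of multi-word prefixes could be done by splitting every prefix form on whitespace before building the alternation. In the sketch below the PREFIXES table and the extra prefixes parameter are hypothetical, added only so the example is self-contained; the documented function takes no such argument.

```python
import re

# Hypothetical sample table: standard form -> known variants.
PREFIXES = {
    'US HIGHWAY': ['US HWY'],
    'STATE ROUTE': ['SR', 'STATE RTE'],
    'NY ROUTE': ['NY RTE'],
}


def prefix_regex(prefixes=PREFIXES, case_insensitive=True,
                 matches_entirely=True):
    """Return a pattern matching any single word used in a prefix."""
    tokens = set()
    for standard, variants in prefixes.items():
        for form in [standard] + list(variants):
            tokens.update(form.split())
    ordered = sorted(tokens, key=lambda t: (-len(t), t))
    pattern = '(?:%s)' % '|'.join(re.escape(t) for t in ordered)
    if matches_entirely:
        pattern = '^%s$' % pattern
    if case_insensitive:
        pattern = '(?i)' + pattern
    return pattern


print(bool(re.search(prefix_regex(), 'HWY')))
```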
- ebpub.geocoder.parser.parsing.strip_unit(location)
Given an address string, strips the apartment number, suite number, etc.
>>> strip_unit('200 E 31st st')
'200 E 31st st'
>>> strip_unit('200 E 31st st unit 123')
'200 E 31st st'
>>> strip_unit('123 W broadway apt B')
'123 W broadway'
>>> strip_unit('99 s northshore drive apt. B')
'99 s northshore drive'
>>> strip_unit('45 carlton ave #12')
'45 carlton ave'
>>> strip_unit('148 lafayette st suite 13')
'148 lafayette st'
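A trailing unit designator can be removed with a single anchored regular expression. The sketch below covers only the designators that appear in the examples here (unit, apt, apt., suite, #); the set of keywords the library actually recognizes is not stated, so this list is an assumption.

```python
import re

# A unit keyword followed by its value, or '#' followed by its value,
# at the very end of the string.
_UNIT_RE = re.compile(
    r'\s+(?:(?:unit|apt\.?|suite)\s+|#\s*)\S+\s*$',
    re.IGNORECASE,
)


def strip_unit(location):
    """Remove a trailing apartment, suite, or unit designation."""
    return _UNIT_RE.sub('', location)


print(strip_unit('45 carlton ave #12'))
```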
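- ebpub.geocoder.parser.parsing.token_split()
findall(string[, pos[, endpos]]) --> list.
Return a list of all non-overlapping matches of pattern in string.
(This is the standard docstring of a compiled regular expression's findall method, which suggests that token_split is such a bound method rather than a function with its own documentation.)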
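A name documented with the findall docstring is typically created by binding the findall method of a compiled pattern. The sketch below shows that idiom; the pattern r'\S+' is a placeholder assumption, since the pattern the library compiles is not shown on this page.

```python
import re

# Assumption: tokens are runs of non-whitespace characters.
token_split = re.compile(r'\S+').findall

tokens = token_split('1972 N DAWSON AVE')
print(tokens)
```

Because token_split is the bound method itself, its __doc__ is the findall docstring, which is what a documentation generator would pick up.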